Evaluating Output

To evaluate MUKALMA's output, we used nlg-eval, a free tool made available by Maluuba Inc., an artificial intelligence company that has been acquired by Microsoft. To acquire and set up nlg-eval, please follow the guide in its repository

nlg-eval compares the actual and expected output of an NLG system on several unsupervised, automated metrics. The gold standard we compare our output against is the Wizard of Wikipedia dialogue dataset, which can be found here

These are some of the metrics that nlg-eval produces results for (a short usage sketch follows the list):

  • BLEU-1
  • METEOR
  • ROUGE-L
  • Skip Thoughts Cosine Similarity
  • Embedding Average Cosine Similarity
  • Vector Extrema Cosine Similarity
  • Greedy Matching Score
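
If you prefer to call nlg-eval from Python rather than from the command line, the snippet below is a minimal sketch of how a hypothesis file can be scored against a reference file to obtain these metrics. The file names are placeholders, and it assumes nlg-eval has already been installed by following its repository guide.

```python
# Minimal sketch: score a hypothesis file against a reference file with nlg-eval.
# File names are placeholders; both files should be aligned line-by-line.
from nlgeval import compute_metrics

metrics = compute_metrics(
    hypothesis="hypothesis.txt",      # one generated response per line
    references=["reference.txt"],     # gold responses, one per line
)
print(metrics)  # dictionary of metric name -> score (BLEU, METEOR, ROUGE-L, ...)
```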

Steps to reproduce our results

  • Using mukalma_out.txt and wizards_out.txt, run nlg-eval to produce the metrics for MUKALMA
  • Using dialo_out.txt and wizards_out.txt, run nlg-eval to produce the metrics for DialoGPT (see the sketch after this list)
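
A minimal sketch of those two comparisons using nlg-eval's Python interface, assuming the output files are in the current working directory:

```python
# Sketch: score the MUKALMA and DialoGPT outputs against the Wizard of Wikipedia
# reference outputs (wizards_out.txt).
from nlgeval import compute_metrics

for system, hyp_file in [("MUKALMA", "mukalma_out.txt"), ("DialoGPT", "dialo_out.txt")]:
    metrics = compute_metrics(hypothesis=hyp_file, references=["wizards_out.txt"])
    print(system, metrics)
```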

Steps to evaluate MUKALMA using a larger amount of data

The following instructions evaluate MUKALMA on the data given in the 'test_topic_split.json' file. To evaluate MUKALMA on a different data file from the Wizard of Wikipedia dataset, place the chosen file in the 'eval' directory and change the filename from 'test_topic_split.json' to the chosen file on line 6 of 'DataGeneration.py', as illustrated below.
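
For illustration only (the actual variable name on line 6 of 'DataGeneration.py' may differ), the change amounts to swapping the filename assigned there:

```python
# DataGeneration.py, line 6 (hypothetical variable name -- check the actual file).
# Point this at whichever Wizard of Wikipedia split you placed in the 'eval' directory.
DATA_FILE = "test_topic_split.json"   # e.g. change to "valid_topic_split.json"
```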

Running the Evaluation

  • Set up and run MUKALMA.
  • Run the 'data-generation.py' file
  • Upon execution, provide the endpoint URL of the running model
  • Once execution is complete, the following files will be generated:
    • wizard_statements.txt
    • apprentice_statements.txt
    • mukalma_output.txt
  • Use the mukalma_output.txt and wizard_statements.txt files with nlg-eval to produce the required metrics, as in the sketch below
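
For example, a sketch using nlg-eval's Python interface, assuming the generated files are in the current working directory:

```python
# Sketch: evaluate the generated MUKALMA responses against the wizard statements
# produced by the data-generation step.
from nlgeval import compute_metrics

metrics = compute_metrics(
    hypothesis="mukalma_output.txt",
    references=["wizard_statements.txt"],
)
for name, score in metrics.items():
    print(f"{name}: {score:.4f}")
```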