While originally used to evaluate machine translations, [[BLEU]] has been used successfully to evaluate paraphrase generation models as well. However, paraphrases often have several lexically different but equally valid solutions, which hurts BLEU and other similar evaluation metrics, as illustrated in the example below.<ref name=Chen>{{cite conference|last1=Chen|first1=David|last2=Dolan|first2=William|title=Collecting Highly Parallel Data for Paraphrase Evaluation|booktitle=Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies|place=Portland, Oregon|year=2011|pages=190–200|url=https://dl.acm.org/citation.cfm?id=2002497}}</ref>
Metrics specifically designed to evaluate paraphrase generation include PEM<ref name=Liu>{{cite conference|last1=Liu|first1=Chang|last2=Dahlmeier|first2=Daniel|last3=Ng|first3=Hwee Tou|title=PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts|year=2010}}</ref>,
ParaMetric<ref name=Burch2>{{cite conference|last1=Callison-Burch|first1=Chris|last2=Cohn|first2=Trevor|last3=Lapata|first3=Mirella|title=ParaMetric: An Automatic Evaluation Metric for Paraphrasing|booktitle=Proceedings of the 22nd International Conference on Computational Linguistics|place=Manchester|year=2008|pages=97-104|url=https://pdfs.semanticscholar.org/be0d/0df960833c1bea2a39ba9a17e5ca958018cd.pdf}}</ref>,
and PINC<ref name=Chen/>. PEM (paraphrase evaluation metric) attempts to evaluate the "adequacy, fluency, and lexical dissimilarity" of paraphrases.
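
The sensitivity of BLEU to surface wording can be seen in a small, hypothetical example (the sentences and the use of NLTK's <code>sentence_bleu</code> are illustrative choices, not taken from the cited works): a candidate that reuses the reference's words scores far higher than an equally valid paraphrase expressed with different words.

<syntaxhighlight lang="python">
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference sentence and two candidate paraphrases.
reference = "the cat sat on the mat".split()
similar_wording = "the cat sat on a mat".split()            # reuses most reference words
different_wording = "a feline rested upon the rug".split()  # equally valid, different words

smooth = SmoothingFunction().method1  # smoothing so short sentences get non-zero scores

# BLEU is driven by n-gram overlap with the reference, so the lexically
# similar candidate scores much higher even though both are acceptable paraphrases.
print(sentence_bleu([reference], similar_wording, smoothing_function=smooth))
print(sentence_bleu([reference], different_wording, smoothing_function=smooth))
</syntaxhighlight>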
== References ==
{{Reflist}}