While originally used to evaluate machine translations, [[BLEU]] has also been used successfully to evaluate paraphrase generation models. However, paraphrases often have several lexically different but equally valid solutions, which hurts BLEU and other similar evaluation metrics.<ref name=Chen>{{cite conference|last1=Chen|first1=David|last2=Dolan|first2=William|title=Collecting Highly Parallel Data for Paraphrase Evaluation|booktitle=Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies|place=Portland, Oregon|year=2011|pages=190–200|url=https://dl.acm.org/citation.cfm?id=2002497}}</ref>
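A minimal sketch of this weakness, using NLTK's sentence-level BLEU (the sentence pair and smoothing choice are illustrative, not drawn from the cited work):

<syntaxhighlight lang="python">
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# A single reference and a lexically different but equally valid paraphrase.
reference = "the cat sat on the mat".split()
candidate = "a feline rested on the rug".split()

# Smoothing avoids zero scores when higher-order n-grams have no matches.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(score)  # very low, even though the candidate is an acceptable paraphrase
</syntaxhighlight>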
Metrics designed specifically to evaluate paraphrase generation include ParaMetric<ref name=Burch2>{{cite conference|last1=Callison-Burch|first1=Chris|last2=Cohn|first2=Trevor|last3=Lapata|first3=Mirella|title=ParaMetric: An Automatic Evaluation Metric for Paraphrasing|booktitle=Proceedings of the 22nd International Conference on Computational Linguistics|place=Manchester|year=2008|pages=97–104|url=https://pdfs.semanticscholar.org/be0d/0df960833c1bea2a39ba9a17e5ca958018cd.pdf}}</ref> and PINC (paraphrase in n-gram changes).<ref name=Chen/> PEM (paraphrase evaluation metric) attempts to evaluate the "adequacy, fluency, and lexical dissimilarity" of paraphrases using pivot-language [[n-gram|N-grams]]. However, a major drawback of PEM is that it must be trained on a large, in-domain parallel corpus as well as with human judges.<ref name=Chen/> In other words, it is tantamount to training a paraphrase recognition system in order to evaluate a paraphrase generation system.
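A short sketch of the PINC computation, which rewards n-gram dissimilarity between the source sentence and the candidate paraphrase, following the formulation described by Chen and Dolan (the helper names and example sentences here are illustrative assumptions):

<syntaxhighlight lang="python">
def ngrams(tokens, n):
    """All n-grams of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def pinc(source, candidate, max_n=4):
    """Average, over n = 1..max_n, of the fraction of candidate n-grams
    that do NOT appear in the source; higher means more lexical novelty."""
    scores = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        if not cand:
            continue
        src = ngrams(source, n)
        scores.append(1.0 - len(cand & src) / len(cand))
    return sum(scores) / len(scores) if scores else 0.0

source = "the cat sat on the mat".split()
candidate = "a feline rested on the rug".split()
print(pinc(source, candidate))  # near 1.0: the candidate shares few n-grams with the source
</syntaxhighlight>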
== References ==