Metrics specifically designed to evaluate paraphrase generation include paraphrase in n-gram change (PINC)<ref name=Chen /> and paraphrase evaluation metric (PEM)<ref name=Liu>{{cite conference|last1=Liu|first1=Chang|last2=Dahlmeier|first2=Daniel|last3=Ng|first3=Hwee Tou|title=PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts|book-title=Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing|place=MIT, Massachusetts|year=2010|pages=923–932|url=http://www.aclweb.org/anthology/D10-1090}}</ref> along with the aforementioned ParaMetric. PINC is designed to be used in conjunction with BLEU to help cover its inadequacies. Since BLEU has difficulty measuring lexical dissimilarity, PINC measures the lack of n-gram overlap between a source sentence and a candidate paraphrase. It is essentially the [[Jaccard index|Jaccard distance]] between the two sentences, excluding n-grams that appear in the source sentence, so that some semantic equivalence is maintained. PEM, on the other hand, attempts to evaluate the "adequacy, fluency, and lexical dissimilarity" of paraphrases by returning a single-value heuristic calculated using [[n-gram]] overlap in a pivot language. A large drawback of PEM, however, is that it must be trained using a large, in-domain parallel corpus as well as human judges.<ref name=Chen /> In other words, it is tantamount to training a paraphrase recognition system in order to evaluate a paraphrase generation system.
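As a rough illustration of the computation described above, the following is a minimal sketch of PINC in Python. The whitespace tokenization, the maximum n-gram order, and all function and variable names are illustrative assumptions, not details taken from the cited paper.

<syntaxhighlight lang="python">
def ngrams(tokens: list, n: int) -> set:
    """Return the set of n-grams of order n from a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def pinc(source: str, candidate: str, max_n: int = 4) -> float:
    """Average, over n-gram orders 1..max_n, of the fraction of
    candidate n-grams that do NOT appear in the source sentence.
    Higher scores indicate greater lexical dissimilarity."""
    src_tokens = source.lower().split()
    cand_tokens = candidate.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand_tokens, n)
        if not cand_ngrams:
            continue  # candidate is shorter than n tokens
        overlap = len(cand_ngrams & ngrams(src_tokens, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0


# A heavily reworded candidate scores close to 1.
print(pinc("the cat sat on the mat", "a feline rested upon the rug"))
</syntaxhighlight>

A score near 1 rewards rewording, which is why PINC is paired with BLEU: BLEU checks that meaning is preserved while PINC checks that the wording has actually changed.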
== See also ==