While originally used to evaluate machine translations, [[BLEU]] has been used successfully to evaluate paraphrase generation models as well. However, paraphrases often have several lexically different but equally valid solutions, which hurts BLEU and other similar evaluation metrics, as illustrated in the example below.<ref name=Chen>{{cite conference|last1=Chen|first1=David|last2=Dolan|first2=William|title=Collecting Highly Parallel Data for Paraphrase Evaluation|booktitle=Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies|place=Portland, Oregon|year=2011|pages=190–200|url=https://dl.acm.org/citation.cfm?id=2002497}}</ref>
Metrics specifically designed to evaluate paraphrase generation include PEM<ref name=Liu>{{cite conference|last1=Liu|first1=Chang|last2=Dahlmeier|first2=Daniel|last3=Ng|first3=Hwee Tou|title=PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts|year=2010}}</ref>,
ParaMetric<ref name=Burch2>{{cite conference|last1=Callison-Burch|first1=Chris|last2=Cohn|first2=Trevor|last3=Lapata|first3=Mirella|title=ParaMetric: An Automatic Evaluation Metric for Paraphrasing|booktitle=Proceedings of the 22nd International Conference on Computational Linguistics|place=Manchester|year=2008|pages=97-104|url=https://pdfs.semanticscholar.org/be0d/0df960833c1bea2a39ba9a17e5ca958018cd.pdf}}</ref>,
and PINC<ref name=Chen/>. PEM (paraphrase evaluation metric) attempts to evaluate the "adequacy, fluency, and lexical dissimilarity" of paraphrases.
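
The sensitivity of BLEU to surface wording can be seen in a small, hypothetical example (the sentences and the use of NLTK's <code>sentence_bleu</code> are illustrative choices, not taken from the cited works): a candidate that reuses the reference's words scores far higher than an equally valid paraphrase expressed with different words.

<syntaxhighlight lang="python">
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference sentence and two candidate paraphrases.
reference = "the cat sat on the mat".split()
similar_wording = "the cat sat on a mat".split()            # reuses most reference words
different_wording = "a feline rested upon the rug".split()  # equally valid, different words

smooth = SmoothingFunction().method1  # smoothing so short sentences get non-zero scores

# BLEU is driven by n-gram overlap with the reference, so the lexically
# similar candidate scores much higher even though both are acceptable paraphrases.
print(sentence_bleu([reference], similar_wording, smoothing_function=smooth))
print(sentence_bleu([reference], different_wording, smoothing_function=smooth))
</syntaxhighlight>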
== References ==
{{Reflist}}