The simplest way to evaluate paraphrase generation is through human judges, but such evaluation tends to be time-consuming. Automated evaluation also proves challenging, as it is essentially as difficult a problem as paraphrase recognition.{{Citation needed}}
While originally developed to evaluate machine translation, [[BLEU]] has also been used successfully to evaluate paraphrase generation models. However, a given sentence typically has several lexically different yet equally valid paraphrases, which limits the usefulness of BLEU and similar reference-based evaluation metrics.<ref name=Chen>{{cite conference|last1=Chen|first1=David|last2=Dolan|first2=William|title=Collecting Highly Parallel Data for Paraphrase Evaluation|booktitle=Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies|place=Portland, Oregon|year=2011|pages=190-200|url=https://dl.acm.org/citation.cfm?id=2002497}}</ref>
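As a rough illustration, sentence-level BLEU can be computed against several reference paraphrases with an off-the-shelf library such as NLTK. The sentences below are invented examples, and smoothing is applied only to avoid zero scores on short inputs; this is a sketch, not a prescribed evaluation setup.

<syntaxhighlight lang="python">
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Invented example: two reference paraphrases and one candidate paraphrase.
references = [
    "the cat sat on the mat".split(),
    "a cat was sitting on the mat".split(),
]
candidate = "the cat was sitting on the mat".split()

# BLEU rewards n-gram overlap with any of the references; smoothing avoids
# zero scores when higher-order n-grams are absent from short sentences.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
</syntaxhighlight>

A candidate that rephrases the meaning with different words can still receive a low BLEU score if none of the references happens to use the same wording, which is the limitation noted above.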
Other methods include PEM,<ref name=Liu>{{cite conference|last1=Liu|first1=Chang|last2=Dahlmeier|first2=Daniel|last3=Ng|first3=Hwee Tou|title=PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts|booktitle=Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing|place=MIT, Massachusetts|year=2010|pages=923-932|url=http://www.aclweb.org/anthology/D10-1090}}</ref>
ParaMetric,<ref name=Burch2>{{cite conference|last1=Callison-Burch|first1=Chris|last2=Cohn|first2=Trevor|last3=Lapata|first3=Mirella|title=ParaMetric: An Automatic Evaluation Metric for Paraphrasing|booktitle=Proceedings of the 22nd International Conference on Computational Linguistics|place=Manchester|year=2008|pages=97-104|url=https://pdfs.semanticscholar.org/be0d/0df960833c1bea2a39ba9a17e5ca958018cd.pdf}}</ref>
and PINC.<ref name=Chen/> Their drawbacks are...
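As a sketch of how one such metric can work, the following implements the PINC score as described by Chen and Dolan: the average, over n-gram orders, of the fraction of candidate n-grams that do not occur in the source sentence, so higher scores indicate more lexical novelty. Tokenization is simplified to whitespace splitting and the example sentences are invented.

<syntaxhighlight lang="python">
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def pinc(source, candidate, max_n=4):
    """Sketch of PINC: average fraction of candidate n-grams that do NOT
    appear in the source sentence, taken over n = 1..max_n."""
    src, cand = source.split(), candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue  # candidate is shorter than n tokens
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0

print(pinc("the cat sat on the mat", "a cat was sitting on the mat"))
</syntaxhighlight>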
== References ==