Paraphrasing (computational linguistics)
Evaluating paraphrase generation poses difficulties similar to those of evaluating [[machine translation]]. The quality of a paraphrase often depends on its context, on whether it is being used as a summary, and on how it was generated, among other factors. Additionally, a good paraphrase is usually lexically dissimilar from its source phrase.
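One simple proxy for lexical dissimilarity is word-level overlap between the source phrase and the paraphrase. The following sketch (in Python; the example sentences and the <code>jaccard_overlap</code> helper are purely illustrative) shows how low such overlap can be even for an acceptable paraphrase:

<syntaxhighlight lang="python">
# Minimal sketch: word-level Jaccard overlap as a rough proxy for
# lexical similarity between a source phrase and a paraphrase.
# Example sentences are illustrative only.

def jaccard_overlap(a: str, b: str) -> float:
    """Fraction of shared word types between two phrases."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

source = "the task was finished ahead of schedule"
paraphrase = "they completed the work earlier than planned"

# A good paraphrase can share very few words with its source,
# which is why surface-overlap metrics can under-rate it.
print(jaccard_overlap(source, paraphrase))  # about 0.08
</syntaxhighlight>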
 
The simplest method of evaluating paraphrase generation is through human judges. Unfortunately, evaluation by human judges tends to be time-consuming. Automated approaches to evaluation prove challenging, as the problem is essentially as difficult as paraphrase recognition.<ref name=needed>{{Citation needed}}</ref>
 
While originally developed to evaluate machine translation, [[BLEU]] has also been used successfully to evaluate paraphrase generation models. However, paraphrases often have several lexically different but equally valid solutions, which hurts BLEU and similar overlap-based evaluation metrics.<ref name=Chen>{{cite conference|last1=Chen|first1=David|last2=Dolan|first2=William|title=Collecting Highly Parallel Data for Paraphrase Evaluation|booktitle=Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies|place=Portland, Oregon|year=2011|pages=190–200|url=https://dl.acm.org/citation.cfm?id=2002497}}</ref>
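As a minimal sketch of how BLEU can be applied to paraphrase scoring, the following Python example assumes the NLTK library and uses illustrative sentences; supplying several reference paraphrases partially compensates for the fact that many lexically different paraphrases are equally valid:

<syntaxhighlight lang="python">
# Minimal sketch of BLEU-based paraphrase scoring, assuming NLTK is
# installed; sentences are illustrative only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Candidate paraphrase produced by some generation model.
candidate = "they completed the work earlier than planned".split()

# Several reference paraphrases; more references mean more chances
# for a lexically divergent but valid candidate to match n-grams.
references = [
    "the task was finished ahead of schedule".split(),
    "they finished the job before the deadline".split(),
]

# Smoothing avoids a zero score when higher-order n-grams are unmatched.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
</syntaxhighlight>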