Paraphrasing (computational linguistics)
 
== Evaluation ==
There are multiple methods that can be used to evaluate paraphrases. Since paraphrase recognition can be posed as a classification problem, most standard evaluation metrics such as [[accuracy]], [[F1 score]], or a [[receiver operating characteristic|ROC curve]] can be used. Additionally, agreement with human judgments can serve as a point of comparison.
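For illustration, these classification metrics can be computed with off-the-shelf tools. The following is a minimal sketch, assuming scikit-learn and a small hypothetical set of gold labels and model scores (the data values are made up for the example):

<syntaxhighlight lang="python">
# Illustrative sketch: scoring a paraphrase-recognition classifier with
# standard metrics, assuming scikit-learn and a labelled test set.
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical gold labels (1 = paraphrase pair, 0 = not) and model outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.92, 0.35, 0.78, 0.61, 0.40, 0.15, 0.55, 0.70]  # predicted probabilities
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]             # thresholded decisions

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1:      ", f1_score(y_true, y_pred))
print("ROC AUC: ", roc_auc_score(y_true, y_prob))  # area under the ROC curve
</syntaxhighlight>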
 
Paraphrase generation, like machine translation, involves multiple factors that can affect its evaluation. The quality of a paraphrase often depends on its context, on whether it is being used as a summary, and on how it was generated, among other factors. Additionally, a good paraphrase is usually lexically dissimilar from its source phrase.
 
The simplest method used to evaluate paraphrase generation is through the use of human judges. Unfortunately, evaluation through human judges tends to be time-consuming. Automated approaches to evaluation also prove challenging, as evaluation is essentially a problem as difficult as paraphrase recognition.{{Citation needed}}
 
While originally designed to evaluate machine translation, [[BLEU]] has also been used successfully to evaluate paraphrase generation models. However, paraphrases often have several lexically different but equally valid solutions, which hurts BLEU and other similar evaluation metrics.<ref name=Chen>{{cite conference|last1=Chen|first1=David|last2=Dolan|first2=William|year=2011|title=Collecting Highly Parallel Data for Paraphrase Evaluation|booktitle=Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies|place=Portland, Oregon|pages=190–200|url=https://dl.acm.org/citation.cfm?id=2002497}}</ref>
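As an illustration, BLEU can be computed between a generated paraphrase and one or more reference paraphrases. The following is a minimal sketch, assuming NLTK's BLEU implementation and made-up example sentences; smoothing is applied because short, lexically divergent paraphrases often have no higher-order ''n''-gram matches:

<syntaxhighlight lang="python">
# Illustrative sketch: scoring a generated paraphrase against reference
# paraphrases with BLEU, assuming NLTK's implementation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "the quick brown fox jumps over the lazy dog".split(),
    "a fast brown fox leaps over a lazy dog".split(),
]
candidate = "the speedy brown fox jumps over the sleepy dog".split()

# Smoothing avoids zero scores when some higher-order n-grams never match.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print("BLEU:", score)
</syntaxhighlight>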
 
Other proposed metrics include PEM (paraphrase evaluation metric) and PINC (paraphrase in ''n''-gram changes). PINC measures the fraction of candidate ''n''-grams that do not appear in the source sentence; it rewards lexical dissimilarity but does not itself measure semantic adequacy, so it is usually reported together with BLEU.<ref name=Chen/> PEM instead estimates adequacy, fluency, and lexical dissimilarity with the help of a pivot language, which requires a large in-domain parallel corpus as well as human judges to train.
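The following is a minimal sketch of the PINC computation, assuming the definition above (the average, over ''n'' = 1…4, of the fraction of candidate ''n''-grams absent from the source); the function and variable names are illustrative, not from a particular library:

<syntaxhighlight lang="python">
# Minimal sketch of a PINC-style score: the average fraction of candidate
# n-grams that do NOT appear in the source sentence, for n = 1..max_n.
# Higher values indicate more lexical novelty relative to the source.
def pinc(source_tokens, candidate_tokens, max_n=4):
    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    total = 0.0
    counted = 0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate_tokens, n)
        if not cand:          # candidate shorter than n tokens
            continue
        src = ngrams(source_tokens, n)
        total += 1.0 - len(cand & src) / len(cand)
        counted += 1
    return total / counted if counted else 0.0

print(pinc("the quick brown fox".split(), "a fast brown fox".split()))
</syntaxhighlight>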
 
== Issues ==