Paraphrasing (computational linguistics)

Since paraphrases carry the same semantic meaning, they should have similar skip-thought vectors. Thus a simple [[logistic regression]] classifier can be trained to good performance using the absolute difference and the component-wise product of two skip-thought vectors as input.
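
The feature construction described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: random vectors stand in for real skip-thought encodings (the encoder is not shown), "paraphrase" pairs are simulated as slightly perturbed copies, and the helper names `pair_features`, `train_logreg`, and `predict` are hypothetical. The classifier is plain logistic regression fitted by batch gradient descent.

```python
import numpy as np

def pair_features(u, v):
    """Concatenate |u - v| and the component-wise product u * v."""
    return np.concatenate([np.abs(u - v), u * v], axis=-1)

def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with batch gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(paraphrase)
        grad = p - y                            # derivative of log loss w.r.t. logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(w, b, X):
    """Label 1 (paraphrase) when the logit is positive."""
    return (X @ w + b > 0).astype(int)

# Synthetic stand-ins for skip-thought vectors (assumption, not real encodings).
rng = np.random.default_rng(0)
dim, n = 16, 400
u = rng.normal(size=(n, dim))
labels = rng.integers(0, 2, size=n)
# Paraphrase pairs: v is u plus small noise; non-paraphrases: unrelated vectors.
v = np.where(labels[:, None] == 1,
             u + 0.1 * rng.normal(size=(n, dim)),
             rng.normal(size=(n, dim)))

X = pair_features(u, v)
w, b = train_logreg(X[:300], labels[:300])
acc = (predict(w, b, X[300:]) == labels[300:]).mean()
print(f"held-out accuracy: {acc:.2f}")
```

The absolute difference captures how far apart the two vectors are, while the product captures whether they agree in sign and magnitude per dimension; together they let a linear model separate the pairs without seeing the order of the two sentences.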
 
== Evaluation and challenges ==
There are multiple methods that can be used to evaluate paraphrases. Since paraphrase recognition is a classification problem, most standard evaluation metrics, such as [[accuracy]], [[f1 score]], or a [[receiver operating characteristic|ROC curve]], can be used. Additionally, results can be compared against human judgments.
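
The classification metrics named above can be computed directly from a recognizer's outputs. The following Python sketch is illustrative only: the function names and the toy labels and scores are hypothetical, and the ROC curve is summarized here by its area (AUC), computed as the probability that a randomly chosen paraphrase pair receives a higher score than a randomly chosen non-paraphrase pair (ties count one half).

```python
def accuracy(y_true, y_pred):
    """Fraction of pairs whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (paraphrase) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy gold labels (1 = paraphrase) and classifier scores (hypothetical).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.667
```

Accuracy and F1 depend on the chosen threshold, whereas the ROC curve (and its area) is computed from the raw scores and so describes the classifier across all thresholds.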
 
As with machine translation, the evaluation of paraphrase generation can be affected by multiple factors. The quality of a paraphrase often depends on its context, on whether it is being used as a summary, and on how it was generated, among other factors.
 
The simplest way to evaluate paraphrase generation is through the use of human judges. Unfortunately, evaluation by human judges tends to be time-consuming. Among automated approaches, [[BLEU]], although originally designed to evaluate machine translation, has also been used successfully to evaluate paraphrase generation models.
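
To make the automated approach concrete, here is a minimal from-scratch sketch of sentence-level BLEU in Python: clipped (modified) n-gram precisions for n = 1 to `max_n`, combined by a uniformly weighted geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization and applies no smoothing; the function names and example sentences are hypothetical, and this is not the reference implementation of any particular toolkit.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        max_ref = Counter()
        for ref in refs:
            max_ref |= ngrams(ref, n)  # element-wise max count over references
        # Clip each candidate n-gram count by its maximum reference count.
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision += math.log(clipped / total) / max_n
    c = len(cand)
    # Effective reference length: closest to the candidate, shorter on ties.
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)

ref = "the cat sat on the mat"
print(bleu(ref, [ref]))                                                # 1.0
print(round(bleu("the cat is on the mat", [ref], max_n=2), 3))         # 0.707
print(bleu("the cat is on the mat", [ref]))                            # 0.0
```

The last line shows why unsmoothed sentence-level BLEU is harsh on short outputs: a fluent paraphrase with no matching 4-gram scores zero. Established implementations such as sacreBLEU or NLTK's `sentence_bleu` add standardized tokenization and optional smoothing, so their scores will not necessarily match this sketch.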
 
== Issues ==
Another issue facing the field of machine paraphrasing is the lack of comprehensive data sets.<ref name=Chen>{{cite conference|last1=Chen|first1=David|last2=Dolan|first2=William|title=Collecting Highly Parallel Data for Paraphrase Evaluation|booktitle=Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies|place=Portland, Oregon|pages=190-200|url=https://dl.acm.org/citation.cfm?id=2002497}}</ref>