Paraphrasing (computational linguistics): Difference between revisions

mNo edit summary
Line 15:
Accordingly, the training algorithm consists of four steps. First, sentences describing similar events with similar structure are clustered together, with similarity judged by [[n-gram]] overlap. Second, patterns are induced by computing a multiple-sequence alignment between the sentences in each cluster, producing a ''lattice''. During this step, areas of high variability are taken to be instances of arguments and are replaced with ''slots''; an area of high variability is defined as a region between words shared by more than 50% of the cluster's sentences. Third, lattices are matched between corpora based on matching or similar arguments within their slots. Finally, new paraphrases are generated by taking a new sentence, determining which sentence cluster it most closely belongs to, and selecting an appropriately matching lattice. If a matching lattice is found, the slot arguments are identified and then used to generate as many new paraphrases as there are lattices in the matching cluster.
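
The first two steps can be illustrated with a minimal sketch. It is not the published implementation: it assumes Jaccard overlap of bigrams as the similarity measure, and it replaces multiple-sequence alignment with a simple per-word frequency test, so it only approximates how slots are located. The function names and the <code>SLOT</code> token are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(a, b, n=2):
    """Jaccard overlap of two sentences' n-gram sets (Jaccard is an assumption)."""
    ga, gb = ngrams(a.split(), n), ngrams(b.split(), n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def backbone_words(cluster, threshold=0.5):
    """Words that occur in more than `threshold` of the cluster's sentences."""
    counts = Counter(w for s in cluster for w in set(s.split()))
    return {w for w, c in counts.items() if c / len(cluster) > threshold}


def mark_slots(sentence, backbone):
    """Collapse each run of non-backbone words into a single SLOT token."""
    out = []
    for w in sentence.split():
        if w in backbone:
            out.append(w)
        elif not out or out[-1] != "SLOT":
            out.append("SLOT")
    return out


cluster = [
    "a bomb exploded in baghdad killing ten people",
    "a bomb exploded in kabul killing three people",
    "a bomb exploded in cairo killing five people",
]
backbone = backbone_words(cluster)
print(mark_slots(cluster[0], backbone))
```

In this toy cluster the city and the casualty count vary across sentences, so they fall between the shared words and are treated as slots.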
 
=== Phrase-based Machine Translation ===
Paraphrases can also be generated through the use of [[statistical machine translation#Phrase-based translation|phrase-based translation]], as proposed by Bannard and Callison-Burch.<ref name=Bannard>{{cite conference|last1=Bannard|first1=Colin|last2=Callison-Burch|first2=Chris|title=Paraphrasing with Bilingual Parallel Corpora|booktitle=Proceedings of the 43rd Annual Meeting of the ACL|place=Ann Arbor, Michigan|pages=597–604|year=2005|url=https://dl.acm.org/citation.cfm?id=1219914}}</ref> The central idea is to align phrases through a pivot language in order to produce potential paraphrases in the original language. For example, the phrase "under control" in an English sentence is aligned with the phrase "unter kontrolle" in its German counterpart. The phrase "unter kontrolle" is then found in another German sentence whose aligned English phrase is "in check", a paraphrase of "under control".
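
The pivoting step can be sketched as follows. This is a minimal illustration that assumes the phrase alignments have already been extracted as (English, German) pairs; the function name and the toy data are hypothetical, and no alignment model is implemented here.

```python
from collections import defaultdict


def pivot_paraphrases(aligned_pairs, phrase):
    """Return phrases that share at least one pivot-language phrase with `phrase`."""
    by_pivot = defaultdict(set)   # pivot phrase -> original-language phrases
    pivots_of = defaultdict(set)  # original-language phrase -> pivot phrases
    for e, f in aligned_pairs:
        by_pivot[f].add(e)
        pivots_of[e].add(f)

    candidates = set()
    for f in pivots_of[phrase]:
        candidates |= by_pivot[f]
    candidates.discard(phrase)    # a phrase is not its own paraphrase
    return sorted(candidates)


pairs = [
    ("under control", "unter kontrolle"),
    ("in check", "unter kontrolle"),
    ("the house", "das haus"),
]
print(pivot_paraphrases(pairs, "under control"))
```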
 
Line 24:
<math>\Pr(e_2|f)</math> and <math>\Pr(f|e_1)</math> can be approximated by the relative frequencies of the corresponding aligned phrase pairs. Adding <math>S</math> as a prior is modeled by calculating the probability of forming <math>S</math> when <math>e_1</math> is substituted with <math>e_2</math>.
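
The frequency-based estimate can be sketched as below. The sketch assumes the pivot formulation in which the paraphrase probability is the sum over pivot phrases <math>f</math> of <math>\Pr(e_2|f)\Pr(f|e_1)</math>, with both factors taken as relative frequencies of aligned phrase pairs. The sentence prior <math>S</math> is left out, and the function name and counts are hypothetical.

```python
from collections import Counter, defaultdict


def paraphrase_probs(aligned_pairs, e1):
    """Estimate Pr(e2 | e1) as the sum over f of Pr(e2 | f) * Pr(f | e1)."""
    pair_counts = Counter(aligned_pairs)              # count(e, f)
    e_counts = Counter(e for e, _ in aligned_pairs)   # count(e)
    f_counts = Counter(f for _, f in aligned_pairs)   # count(f)

    scores = defaultdict(float)
    for (e, f), c in pair_counts.items():
        if e != e1:
            continue
        p_f_given_e1 = c / e_counts[e1]
        for (e2, f2), c2 in pair_counts.items():
            if f2 == f and e2 != e1:
                scores[e2] += (c2 / f_counts[f]) * p_f_given_e1
    return dict(scores)


# Toy alignment counts, invented for illustration only.
pairs = (
    [("under control", "unter kontrolle")] * 3
    + [("in check", "unter kontrolle")]
    + [("under control", "im griff")]
    + [("in hand", "im griff")]
)
probs = paraphrase_probs(pairs, "under control")
best = max(probs, key=probs.get)
print(best, probs[best])
```

Taking the highest-scoring candidate corresponds to choosing the most probable paraphrase; a sentence-level prior would rescore the candidates in context.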
 
=== Recursive Autoencoders ===
<ref name=Socher>{{Citation|last1=Socher|first1=Richard|last2=Huang|first2=Eric|last3=Pennington|first3=Jeffrey|last4=Ng|first4=Andrew|last5=Manning|first5=Christopher|title=Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection|booktitle=Advances in Neural Information Processing Systems 24|year=2011|url=http://www.socher.org/index.php/Main/DynamicPoolingAndUnfoldingRecursiveAutoencodersForParaphraseDetection}}</ref>