This is achieved by first clustering similar sentences together using [[n-gram]] overlap. Recurring patterns are found within clusters by using multiple sequence alignment. Then the positions of argument words are determined by finding areas of high variability within each cluster, that is, the stretches between words shared by more than 50% of a cluster's sentences. Pairings between patterns are then found by comparing similar variable words between different corpora. Finally, new paraphrases can be generated by choosing a matching cluster for a source sentence, then substituting the source sentence's argument into any number of patterns in the cluster.
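The pipeline above can be sketched in a few dozen lines. This is a minimal illustration under simplifying assumptions, not the original system: clustering is a greedy pass over bigram Jaccard overlap with an assumed threshold of 0.3, and true multiple sequence alignment is replaced by a crude stand-in that reads the cluster's first sentence left to right, keeping words found in more than half of the cluster's sentences and collapsing everything else into a slot. The names `cluster`, `induce_pattern`, `pair_patterns`, `paraphrase` and the `{X}` slot marker are hypothetical, and the toy corpora are invented.

```python
import re
from collections import Counter

SLOT = "{X}"  # hypothetical placeholder marking an argument position

def ngrams(tokens, n=2):
    """Set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap(a, b, n=2):
    """Jaccard overlap of two token lists' n-gram sets, from 0.0 to 1.0."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def cluster(sentences, threshold=0.3):
    """Greedy clustering: a sentence joins the first cluster whose seed
    (first member) it overlaps by at least `threshold`, else starts a new one."""
    clusters = []
    for s in sentences:
        tokens = s.lower().split()
        for c in clusters:
            if overlap(tokens, c[0]) >= threshold:
                c.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters

def induce_pattern(members):
    """Words in more than 50% of the cluster's sentences stay fixed; the variable
    stretches between them become slots. Returns (pattern, filler words).
    Reading the seed sentence left to right is a crude stand-in for alignment."""
    freq = Counter(w for toks in members for w in set(toks))
    shared = {w for w, c in freq.items() if c > len(members) / 2}
    pattern = []
    for w in members[0]:
        if w in shared:
            pattern.append(w)
        elif not pattern or pattern[-1] != SLOT:
            pattern.append(SLOT)
    fillers = {w for toks in members for w in toks if w not in shared}
    return pattern, fillers

def pair_patterns(patterns_a, patterns_b):
    """Pair patterns from two corpora whose slot fillers overlap."""
    return [(pa, pb) for pa, fa in patterns_a for pb, fb in patterns_b
            if SLOT in pa and SLOT in pb and fa & fb]

def paraphrase(sentence, pairs):
    """Match the sentence against a source pattern, then substitute its
    arguments into the paired target pattern."""
    results = []
    for src, tgt in pairs:
        regex = " ".join("(.+?)" if t == SLOT else re.escape(t) for t in src)
        m = re.fullmatch(regex, sentence.lower())
        if m and tgt.count(SLOT) == len(m.groups()):
            args = iter(m.groups())
            results.append(" ".join(next(args) if t == SLOT else t for t in tgt))
    return results

# Invented toy corpora standing in for two comparable news sources.
corpus_a = [
    "rebels attacked the town of kirkuk yesterday",
    "rebels attacked the town of mosul yesterday",
    "rebels attacked the town of basra on monday",
]
corpus_b = [
    "the town of kirkuk was attacked by rebels",
    "the town of mosul was attacked by rebels",
]
pats_a = [induce_pattern(c) for c in cluster(corpus_a)]
pats_b = [induce_pattern(c) for c in cluster(corpus_b)]
pairs = pair_patterns(pats_a, pats_b)
print(paraphrase("Rebels attacked the town of Tikrit yesterday", pairs))
# ['the town of tikrit was attacked by rebels']
```

Because the two patterns share the fillers "kirkuk" and "mosul", they are paired, and an unseen argument ("tikrit") can be moved from one pattern into the other. The sketch only paraphrases in the direction from the first corpus to the second; a fuller treatment would align sentences properly and score pairings rather than accept any filler overlap.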
=== Phrase-based machine translation ===
Paraphrases can also be generated through the use of [[statistical machine translation#Phrase-based translation|phrase-based translation]], as proposed by Bannard and Callison-Burch.<ref name=Bannard>{{cite conference |last1=Bannard|first1=Colin|last2=Callison-Burch|first2=Chris|title=Paraphrasing Bilingual Parallel Corpora |conference=Proceedings of the 43rd Annual Meeting of the ACL |place=Ann Arbor, Michigan|pages=597–604|year=2005|url=https://dl.acm.org/citation.cfm?id=1219914}}</ref> The core idea is to align phrases in a [[pivot language]] to produce potential paraphrases in the original language. For example, the phrase "under control" in an English sentence is aligned with the phrase "unter Kontrolle" in its German counterpart. The phrase "unter Kontrolle" is then found in another German sentence whose aligned English phrase is "in check", a paraphrase of "under control".
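The pivoting step can be made concrete with the scoring used by Bannard and Callison-Burch, which marginalizes over pivot phrases: p(e2 | e1) = Σ_f p(f | e1) p(e2 | f), with e2 ≠ e1. The sketch below is a minimal illustration, assuming the (English, German) phrase pairs have already been extracted from a word-aligned parallel corpus and that both conditional probabilities are estimated by relative frequency. The phrase pairs and the function names `translation_probs` and `paraphrases` are hypothetical.

```python
from collections import Counter, defaultdict

def translation_probs(aligned_pairs):
    """Relative-frequency estimates of p(f|e) and p(e|f) from a list of
    aligned (english_phrase, foreign_phrase) occurrences."""
    pair_counts = Counter(aligned_pairs)
    e_counts = Counter(e for e, _ in aligned_pairs)
    f_counts = Counter(f for _, f in aligned_pairs)
    p_f_given_e = defaultdict(dict)
    p_e_given_f = defaultdict(dict)
    for (e, f), c in pair_counts.items():
        p_f_given_e[e][f] = c / e_counts[e]
        p_e_given_f[f][e] = c / f_counts[f]
    return p_f_given_e, p_e_given_f

def paraphrases(e1, p_f_given_e, p_e_given_f):
    """Score candidates by p(e2|e1) = sum over pivots f of p(f|e1) * p(e2|f),
    skipping e2 == e1; returns (phrase, score) pairs, best first."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f[f].items():
            if e2 != e1:
                scores[e2] += p_f * p_e2
    return sorted(scores.items(), key=lambda kv: -kv[1])

phrase_pairs = [  # hypothetical (English, German) phrase alignments
    ("under control", "unter Kontrolle"),
    ("under control", "unter Kontrolle"),
    ("in check", "unter Kontrolle"),
    ("under control", "im Griff"),
    ("in hand", "im Griff"),
]
p_fe, p_ef = translation_probs(phrase_pairs)
ranked = paraphrases("under control", p_fe, p_ef)
print([(e, round(p, 3)) for e, p in ranked])
# [('in check', 0.222), ('in hand', 0.167)]
```

Here "in check" wins because it shares the more frequent pivot "unter Kontrolle" with "under control" (2/3 × 1/3 = 2/9), while "in hand" is reached only through "im Griff" (1/3 × 1/2 = 1/6).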