Revision as of 23:35, 31 August 2021 by Ars rhetorica (→Evaluation: Added information on common dataset for training paraphrase detectors.)
Revision as of 15:19, 18 February 2022 by Mubassarj (m →Multiple sequence alignment; Tag: Reverted)
Line 11:
* finding pairings between such patterns that represent paraphrases, e.g. "{{mvar|X}} (injured/wounded) {{mvar|Y}} people, {{mvar|Z}} seriously" and "{{mvar|Y}} were (wounded/hurt) by {{mvar|X}}, among them {{mvar|Z}} were in serious condition"

This is achieved by first clustering similar sentences together using [[n-gram]] overlap. Recurring patterns are found within clusters by using multi-sequence alignment. Then the positions of argument words are determined by finding areas of high variability within each cluster, i.e. the spans between words shared by more than 50% of a cluster's sentences. Pairings between patterns are then found by comparing similar variable words between different corpora. Finally, new ~~paraphrases~~[paraphrasers] can be generated by choosing a matching cluster for a source sentence, then substituting the source sentence's argument into any number of patterns in the cluster.

=== Phrase-based Machine Translation ===
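Returning to the multiple sequence alignment procedure described above: the step that locates argument positions (areas of high variability between words shared by more than 50% of a cluster's sentences) can be illustrated with a minimal Python sketch. This is a simplification and not the method from the cited work. It swaps a true multi-sequence alignment for plain word counts over the cluster, and the names <code>backbone_words</code>, <code>to_pattern</code> and the <code>SLOT</code> marker are hypothetical, as is the toy cluster.

```python
from collections import Counter


def backbone_words(cluster, threshold=0.5):
    """Return the words that occur in more than `threshold` of the sentences.

    Assumption: word counts across the cluster stand in for a real
    multi-sequence alignment, which would also respect word order.
    """
    counts = Counter()
    for sentence in cluster:
        # Count each word at most once per sentence.
        counts.update(set(sentence.lower().split()))
    cutoff = threshold * len(cluster)
    return {word for word, count in counts.items() if count > cutoff}


def to_pattern(sentence, backbone):
    """Replace each run of non-backbone words with a single SLOT marker."""
    pattern = []
    for word in sentence.lower().split():
        if word in backbone:
            pattern.append(word)
        elif not pattern or pattern[-1] != "SLOT":
            pattern.append("SLOT")
    return pattern


# Hypothetical cluster of similar sentences (e.g. grouped by n-gram overlap).
cluster = [
    "a bomb injured 12 people",
    "a blast injured 7 people",
    "a gunman injured 30 people",
]

backbone = backbone_words(cluster)
patterns = [to_pattern(sentence, backbone) for sentence in cluster]
print(patterns[0])
```

In this toy cluster every sentence reduces to the same pattern, with the variable words collapsed into slots. Pairing patterns across corpora and generating new paraphrases would then operate on such slotted patterns; those steps are not shown here.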

Paraphrasing (computational linguistics): Difference between revisions