Content deleted Content added
→Evaluation: Added information on common dataset for training paraphrase detectors. |
Tag: Reverted |
Line 11:
* finding pairings between such patterns that represent paraphrases, e.g. "{{mvar|X}} (injured/wounded) {{mvar|Y}} people, {{mvar|Z}} seriously" and "{{mvar|Y}} were (wounded/hurt) by {{mvar|X}}, among them {{mvar|Z}} were in serious condition"
This is achieved by first clustering similar sentences together using [[n-gram]] overlap. Recurring patterns are found within clusters by using multi-sequence alignment. Then the positions of argument words are determined by finding areas of high variability within each cluster, i.e. between words shared by more than 50% of a cluster's sentences. Pairings between patterns are then found by comparing similar variable words between different corpora. Finally new
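
The clustering and argument-finding steps can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the published method: the greedy clustering, the Jaccard measure over bigrams, the <code>threshold</code> value, and all function names are hypothetical choices, and a simple word-frequency count stands in for multi-sequence alignment. Only the "shared by more than 50% of a cluster's sentences" rule comes from the description above.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap(a, b, n=2):
    """Jaccard overlap of the n-gram sets of two token lists (an assumed measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def cluster(sentences, threshold=0.3, n=2):
    """Greedy clustering: a sentence joins the first cluster whose first
    member it overlaps with sufficiently, else it starts a new cluster."""
    clusters = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for members in clusters:
            if overlap(tokens, members[0], n) >= threshold:
                members.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters

def backbone_words(members):
    """Words that occur in more than 50% of a cluster's sentences."""
    counts = Counter(word for tokens in members for word in set(tokens))
    return {word for word, k in counts.items() if k > len(members) / 2}

def to_pattern(tokens, backbone):
    """Collapse each run of non-backbone words into one argument slot."""
    pattern = []
    for word in tokens:
        if word in backbone:
            pattern.append(word)
        elif not pattern or pattern[-1] != "<SLOT>":
            pattern.append("<SLOT>")
    return pattern

sentences = [
    "a bomb injured 12 people in baghdad",
    "a bomb injured 7 people in kabul",
    "a bomb injured 30 people in basra",
    "stocks fell sharply on monday",
]
clusters = cluster(sentences)
backbone = backbone_words(clusters[0])
pattern = to_pattern(clusters[0][0], backbone)
```

Here the three bombing reports fall into one cluster, the words they share form the fixed part of the pattern, and the varying numbers and place names collapse into argument slots. Pairing patterns across corpora is not covered by this sketch.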
=== Phrase-based Machine Translation ===