=== Multiple sequence alignment ===
Barzilay and Lee<ref name=Barzilay>{{cite conference|last1=Barzilay|first1=Regina|last2=Lee|first2=Lillian|title=Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment|conference=Proceedings of HLT-NAACL 2003|date=May–June 2003|url=http://www.cs.cornell.edu/home/llee/papers/statpar.home.html}}</ref> proposed a method to generate paraphrases using monolingual [[parallel text|parallel corpora]], namely news articles covering the same event on the same day. Training consists of using [[multiple sequence alignment|multi-sequence alignment]] to generate sentence-level paraphrases from an unannotated corpus. This is done by
* finding recurring patterns in each individual corpus, e.g. "{{mvar|X}} (injured/wounded) {{mvar|Y}} people, {{mvar|Z}} seriously" where {{mvar|X, Y, Z}} are variables
* finding pairings between such patterns that represent paraphrases, e.g. "{{mvar|X}} (injured/wounded) {{mvar|Y}} people, {{mvar|Z}} seriously" and "{{mvar|Y}} were (wounded/hurt) by {{mvar|X}}, among them {{mvar|Z}} were in serious condition"
This is achieved by first clustering similar sentences together using [[n-gram]] overlap. Recurring patterns are found within clusters by using multi-sequence alignment. Then the position of argument words
=== Phrase-based machine translation ===
Paraphrases can also be generated through the use of [[statistical machine translation#Phrase-based translation|phrase-based translation]] as proposed by Bannard and Callison-Burch.<ref name=Bannard>{{cite conference |last1=Bannard|first1=Colin|last2=Callison-Burch|first2=Chris|title=Paraphrasing Bilingual Parallel Corpora |conference=Proceedings of the 43rd Annual Meeting of the ACL |place=Ann Arbor, Michigan|pages=597–604|year=2005|url=https://dl.acm.org/citation.cfm?id=1219914}}</ref> The chief concept consists of aligning phrases in a [[pivot language]] to produce potential paraphrases in the original language. For example, the phrase "under control" in an English sentence is aligned with the phrase "unter kontrolle" in its German counterpart. The phrase "unter kontrolle" is then found in another German sentence with the aligned English phrase being "in check
The probability distribution can be modeled as <math>\Pr(e_2 | e_1)</math>, the probability that phrase <math>e_2</math> is a paraphrase of <math>e_1</math>, which is equivalent to <math>\Pr(e_2|f) \Pr(f|e_1)</math> summed over all <math>f</math>, a potential phrase translation in the pivot language. Additionally, the sentence <math>e_1</math> is added as a prior to add context to the paraphrase. Thus the optimal paraphrase, <math>\hat{e_2}</math>, can be modeled as:
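The pivot sum described above, leaving out the sentence-context prior, can be sketched in Python as follows. This is a minimal illustration, not Bannard and Callison-Burch's implementation: the phrase tables, their probabilities, the function names, and the choice to exclude <math>e_1</math> itself as a candidate are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy phrase tables; every probability here is made up.
# P_F_GIVEN_E[e][f] = Pr(f | e): English phrase -> pivot-language phrase
P_F_GIVEN_E = {
    "under control": {"unter kontrolle": 0.8, "im griff": 0.2},
}
# P_E_GIVEN_F[f][e] = Pr(e | f): pivot-language phrase -> English phrase
P_E_GIVEN_F = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}


def paraphrase_probs(e1):
    """Return Pr(e2 | e1) = sum over f of Pr(e2 | f) * Pr(f | e1)."""
    scores = defaultdict(float)
    for f, p_f in P_F_GIVEN_E.get(e1, {}).items():
        for e2, p_e2 in P_E_GIVEN_F.get(f, {}).items():
            if e2 != e1:  # assumption: a phrase is not its own paraphrase
                scores[e2] += p_e2 * p_f
    return dict(scores)


def best_paraphrase(e1):
    """Pick the candidate with the highest pivot probability, if any."""
    probs = paraphrase_probs(e1)
    return max(probs, key=probs.get) if probs else None
```

With these toy tables, `best_paraphrase("under control")` selects "in check", because the pivot "unter kontrolle" carries most of the probability mass.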
=== Long short-term memory ===
There has been success in using [[long short-term memory]] (LSTM) models to generate paraphrases.<ref name=Prakash>{{Citation|last1=Prakash|first1=Aaditya|last2=Hasan|first2=Sadid A.|last3=Lee|first3=Kathy|last4=Datla|first4=Vivek|last5=Qadir|first5=Ashequl|last6=Liu|first6=Joey|last7=Farri|first7=Oladimeji|title=Neural Paraphrase Generation with Stacked Residual LSTM Networks|year=2016|arxiv=1610.03098|bibcode=2016arXiv161003098P}}</ref> In short, the model consists of an encoder and decoder component, both implemented using variations of a stacked [[Vanishing gradient problem#Residual networks|residual]] LSTM. First, the encoding LSTM takes a [[one-hot]] encoding of all the words in a sentence as input and produces a final hidden vector, which can
== Paraphrase recognition ==
=== Recursive autoencoders ===
Paraphrase recognition has been attempted by Socher et al.<ref name=Socher>{{Citation |last1=Socher|first1=Richard |last2=Huang |first2=Eric |last3=Pennington |first3=Jeffrey |last4=Ng |first4=Andrew |last5=Manning |first5=Christopher |title=Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection |chapter=Advances in Neural Information Processing Systems 24 |year=2011 |url=http://www.socher.org/index.php/Main/DynamicPoolingAndUnfoldingRecursiveAutoencodersForParaphraseDetection}}</ref> through the use of recursive [[autoencoder]]s. The main concept is to produce a vector representation of a sentence
Given a sentence <math>W</math> with <math>m</math> words, the autoencoder is designed to take two <math>n</math>-dimensional [[word embedding]]s as input and produce an <math>n</math>-dimensional vector as output. The same autoencoder is applied to every pair of words in <math>W</math> to produce <math>\lfloor m/2 \rfloor</math> vectors. The autoencoder is then applied recursively with the new vectors as inputs until a single vector is produced. Given an odd number of inputs, the first vector is forwarded as
Given two sentences <math>W_1</math> and <math>W_2</math> of length 4 and 3 respectively, the autoencoders would produce 7 and 5 vector representations, including the initial word embeddings. The [[Euclidean distance]] is then taken between every pair of vectors from <math>W_1</math> and <math>W_2</math> to produce a similarity matrix <math>S \in \mathbb{R}^{7 \times 5}</math>. <math>S</math> is then subject to a dynamic min-[[convolutional neural network#Pooling layer|pooling layer]] to produce a fixed-size <math>n_p \times n_p</math> matrix. Since the size of <math>S</math> varies with the lengths of the two sentences, its rows and columns are each split into <math>n_p</math> roughly even sections, and the minimum of each resulting region is kept. The output is then normalized to have mean 0 and standard deviation 1 and is fed into a fully connected layer with a [[softmax function|softmax]] output. The dynamic pooling to softmax model is trained using pairs of known paraphrases.
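The distance matrix, dynamic min-pooling, and normalization steps can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not Socher et al.'s code: the function names are hypothetical, the split points are one simple way to obtain "roughly even" sections, and matrices smaller than <math>n_p</math> in either dimension are rejected here instead of being handled.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity_matrix(vecs1, vecs2):
    """Distance between every vector of sentence 1 and every vector of sentence 2."""
    return [[euclidean(u, v) for v in vecs2] for u in vecs1]


def split_bounds(length, parts):
    """Split range(length) into `parts` contiguous, roughly even (start, end) spans."""
    return [(i * length // parts, (i + 1) * length // parts) for i in range(parts)]


def dynamic_min_pool(S, n_p):
    """Pool a variable-size matrix down to n_p x n_p by taking region minima."""
    rows, cols = len(S), len(S[0])
    if rows < n_p or cols < n_p:
        # Assumption: small matrices are simply not supported in this sketch.
        raise ValueError("matrix is smaller than the pooling grid")
    row_spans = split_bounds(rows, n_p)
    col_spans = split_bounds(cols, n_p)
    return [
        [min(S[r][c] for r in range(r0, r1) for c in range(c0, c1))
         for (c0, c1) in col_spans]
        for (r0, r1) in row_spans
    ]


def normalize(M):
    """Shift and scale all entries to mean 0 and standard deviation 1."""
    flat = [x for row in M for x in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((x - mean) ** 2 for x in flat) / len(flat))
    if std == 0:
        return [[0.0 for _ in row] for row in M]
    return [[(x - mean) / std for x in row] for row in M]
```

For the 7 × 5 case in the text with <math>n_p = 3</math>, the rows are grouped as 2, 2 and 3 and the columns as 1, 2 and 2, so every pair of sentences yields a 3 × 3 input for the classifier regardless of sentence length.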
=== Skip-thought vectors ===
Skip-thought vectors are an attempt to create a vector representation of the semantic meaning of a sentence,
Since paraphrases carry the same semantic meaning, they should have similar skip-thought vectors. Thus a simple [[logistic regression]] can be trained to
== Evaluation ==
The evaluation of paraphrase generation faces difficulties similar to those of evaluating [[machine translation]].
Metrics specifically designed to evaluate paraphrase generation include paraphrase in n-gram change (PINC)<ref name=Chen /> and paraphrase evaluation metric (PEM)<ref name=Liu>{{cite conference|last1=Liu|first1=Chang|last2=Dahlmeier|first2=Daniel|last3=Ng|first3=Hwee Tou|title=PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts |conference=Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing |place=MIT, Massachusetts |year=2010 |pages=923–932 |url=http://www.aclweb.org/anthology/D10-1090}}</ref> along with the aforementioned ParaMetric. PINC is designed to be used
The Quora Question Pairs Dataset, which contains hundreds of thousands of duplicate questions, has become a common dataset for the evaluation of paraphrase detectors.<ref>{{cite web |title=Paraphrase Identification on Quora Question Pairs |url=https://paperswithcode.com/sota/paraphrase-identification-on-quora-question|website=Papers with Code}}</ref> The best-performing models for paraphrase detection for the last three years have all used the Transformer architecture, and all have relied on large amounts of pre-training with more general data before fine-tuning with the question pairs.