Content deleted Content added
Divyang444 (talk | contribs) Tag: Reverted |
Anachronist (talk | contribs) m Reverted edits by Divyang444 (talk) to last version by Monkbot |
||
Line 23:
=== Long short-term memory ===
There has been success in using [[long short-term memory]] (LSTM) models to generate paraphrases.<ref name=Prakash>{{Citation|last1=Prakash|first1=Aaditya|last2=Hasan|first2=Sadid A.|last3=Lee|first3=Kathy|last4=Datla|first4=Vivek|last5=Qadir|first5=Ashequl|last6=Liu|first6=Joey|last7=Farri|first7=Oladimeji|title=Neural Paraphrase Generation with Staked Residual LSTM Networks|year=2016|arxiv=1610.03098|bibcode=2016arXiv161003098P}}</ref> In short, the model consists of an encoder and decoder component, both implemented using variations of a stacked [[Vanishing gradient problem#Residual networks|residual]] LSTM. First, the encoding LSTM takes a [[one-hot]] encoding of all the words in a sentence as input and produces a final hidden vector, which can be viewed as a representation of the input sentence. The decoding LSTM then takes the hidden vector as input and generates new sentence, terminating in an end-of-sentence token. The encoder and decoder are trained to take a phrase and reproduce the one-hot distribution of a corresponding paraphrase by minimizing [[perplexity]] using simple [[stochastic gradient descent]]. New paraphrases are generated by inputting a new phrase to the encoder and passing the output to the decoder.
== Paraphrase recognition ==
|