Given a sentence <math>W</math> with <math>m</math> words, the autoencoder is designed to take two <math>n</math>-dimensional [[word embedding|word embeddings]] as input and produce a single <math>n</math>-dimensional vector as output. The same autoencoder is applied to every adjacent pair of words in <math>W</math> to produce <math>\lfloor m/2 \rfloor</math> vectors. The autoencoder is then applied recursively, taking the newly produced vectors as inputs, until a single vector remains. When a level has an odd number of inputs, the first vector is forwarded unchanged to the next level of recursion. The autoencoder is trained to reproduce every vector in the full recursion tree, including the initial word embeddings.
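
The following minimal sketch (using [[NumPy]]) illustrates how the recursive application of a single autoencoder yields every vector in the recursion tree. The embedding dimension, the randomly initialized encoder weights <code>W_e</code> and <code>b_e</code>, and the function names are illustrative assumptions rather than details of the original model; the decoder and the training procedure are omitted.
<syntaxhighlight lang="python">
import numpy as np

n = 4                                  # embedding dimension (assumed for illustration)
rng = np.random.default_rng(0)
W_e = rng.normal(size=(n, 2 * n))      # encoder weights (random stand-ins for trained weights)
b_e = rng.normal(size=n)               # encoder bias

def encode_pair(v1, v2):
    """Compress two n-dimensional vectors into one n-dimensional vector."""
    return np.tanh(W_e @ np.concatenate([v1, v2]) + b_e)

def recursive_encode(embeddings):
    """Apply the autoencoder recursively and return every vector in the
    recursion tree, including the initial word embeddings."""
    all_vectors = list(embeddings)
    level = list(embeddings)
    while len(level) > 1:
        next_level = []
        # With an odd number of inputs, the first vector is forwarded as is.
        if len(level) % 2 == 1:
            next_level.append(level[0])
            level = level[1:]
        for i in range(0, len(level), 2):
            v = encode_pair(level[i], level[i + 1])
            next_level.append(v)
            all_vectors.append(v)
        level = next_level
    return all_vectors

# A 4-word sentence yields 4 + 2 + 1 = 7 vectors in the recursion tree.
sentence = [rng.normal(size=n) for _ in range(4)]
print(len(recursive_encode(sentence)))  # 7
</syntaxhighlight>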
 
Given two sentences <math>W_1</math> and <math>W_2</math> of lengths 4 and 3 respectively, the autoencoders would produce 7 and 5 vector representations, including the initial word embeddings. The [[Euclidean distance]] is then taken between every pair of vectors from <math>W_1</math> and <math>W_2</math> to produce a similarity matrix <math>S \in \mathbb{R}^{7 \times 5}</math>. Because the dimensions of <math>S</math> vary with the lengths of the sentences, <math>S</math> is split into <math>n_p</math> roughly even sections along each axis and passed through a dynamic min-[[convolutional neural network#Pooling layer|pooling layer]] to produce a fixed-size <math>n_p \times n_p</math> matrix. The output is then normalized to have mean 0 and standard deviation 1 and fed into a fully connected layer with a [[softmax function|softmax]] output. The model from the dynamic pooling layer through the softmax output is trained using pairs of known paraphrases.
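
Continuing the sketch above, the comparison step can be illustrated as follows. The function names, the choice of <math>n_p = 2</math>, the random stand-in vectors, and the whole-matrix normalization are simplifying assumptions for demonstration rather than details from the original model.
<syntaxhighlight lang="python">
import numpy as np

def similarity_matrix(vecs1, vecs2):
    """Pairwise Euclidean distances between two sets of recursion-tree vectors."""
    return np.array([[np.linalg.norm(a - b) for b in vecs2] for a in vecs1])

def dynamic_min_pool(S, n_p=2):
    """Split S into an n_p x n_p grid of roughly even sections, take each
    section's minimum, and normalize to mean 0 and standard deviation 1."""
    rows = np.array_split(np.arange(S.shape[0]), n_p)
    cols = np.array_split(np.arange(S.shape[1]), n_p)
    pooled = np.array([[S[np.ix_(r, c)].min() for c in cols] for r in rows])
    return (pooled - pooled.mean()) / pooled.std()

# Trees of 7 and 5 vectors (sentences of length 4 and 3) give S in R^(7 x 5);
# random vectors stand in for the autoencoder outputs here.
rng = np.random.default_rng(0)
vecs1 = [rng.normal(size=4) for _ in range(7)]
vecs2 = [rng.normal(size=4) for _ in range(5)]
S = similarity_matrix(vecs1, vecs2)
pooled = dynamic_min_pool(S, n_p=2)
print(pooled.shape)  # (2, 2) -- a fixed-size input for the softmax classifier
</syntaxhighlight>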
 
=== Skip-thought vectors ===