IBM alignment models
* Model 5: fixed the deficiency problem.
* Model 6: Model 4 combined with a [[Hidden Markov model|HMM]] alignment model in a log-linear way
 
== Mathematical setup ==
The IBM alignment models treat translation as a conditional probability model. For each source-language ("foreign") sentence <math>f</math>, we generate both a target-language ("English") sentence <math>e</math> and an alignment <math>a</math>. The problem is then to find a good statistical model for <math>p(e, a|f)</math>, the probability that we would generate English sentence <math>e</math> and alignment <math>a</math> given a foreign sentence <math>f</math>.
 
The meaning of an alignment grows increasingly complicated as the model version number increases. See Model 1 for the simplest and most understandable version.
 
== Model 1 ==
 
=== Word alignment ===
IBM Model 1 is weak at handling reordering and the addition or dropping of words. In most cases, words that follow each other in one language would have a different order after translation, but IBM Model 1 treats all kinds of reordering as equally likely.
IBM Model 1 uses very simplistic assumptions on the statistical model, in order to keep the algorithm tractable.
 
The alignment model is this: given any foreign-English sentence pair <math>(e, f)</math>, an alignment for the sentence pair is a function of type <math>\{1, 2, ..., l_e\} \to \{0, 1, 2, ..., l_f\}</math>. That is, we assume that the English word at ___location <math>i</math> is "explained" by the foreign word at ___location <math>a(i)</math>. For example, consider the following pair of sentences<blockquote>It will surely rain tomorrow -- 明日 は きっと 雨 だ</blockquote>We can align some English words to corresponding Japanese words, but not all of them:<blockquote>it -> ?
 
will -> ?
 
surely -> きっと
 
rain -> 雨
 
tomorrow -> 明日</blockquote>This happens in general due to the different grammar and conventions of speech in different languages. English sentences require a subject, and when there is no subject available, a [[dummy pronoun]] ''it'' is used. Japanese verbs do not have different forms for future and present tense, and the future tense is implied by the noun 明日 (tomorrow). Conversely, the [[Topic marker|topic-marker]] は and the grammar word だ (roughly "to be") do not correspond to any word in the English sentence.
 
So, we can write the alignment as <blockquote>1-> 0; 2 -> 0; 3 -> 3; 4 -> 4; 5 -> 1</blockquote>where 0 means that there is no corresponding alignment.
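Concretely, an alignment of this kind is often stored as a simple array indexed by English word position, with 0 standing for the special NULL word. A minimal sketch of the example above (the variable names are illustrative, not from any particular toolkit):

```python
# Alignment for: "It will surely rain tomorrow" -- "明日 は きっと 雨 だ"
# English positions are 1-based; foreign position 0 is the special NULL word.
english = ["it", "will", "surely", "rain", "tomorrow"]
foreign = ["明日", "は", "きっと", "雨", "だ"]

# alignment[i-1] gives the foreign position that "explains" English word i.
alignment = [0, 0, 3, 4, 1]  # i.e. 1->0; 2->0; 3->3; 4->4; 5->1

for i, a_i in enumerate(alignment, start=1):
    src = "NULL" if a_i == 0 else foreign[a_i - 1]
    print(f"{english[i - 1]} -> {src}")
```

Note that は and だ are never chosen by this alignment, matching the observation that they correspond to no English word.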
 
Thus, we see that the alignment function is in general a function of type <math>\{1, 2, ..., l_e\} \to \{0, 1, 2, ..., l_f\}</math>.
 
Later models allow one English word to be aligned with multiple foreign words.
 
=== Statistical model ===
Given the above definition of alignment, we can define the statistical model used by Model 1:
 
* Start with a "dictionary". Its entries are of form <math>t(e_i | f_j)</math>, which can be interpreted as saying "the foreign word <math>f_j</math> is translated to the English word <math>e_i</math> with probability <math>t(e_i | f_j)</math>".
 
* After being given a foreign sentence <math>f</math> with length <math>l_f</math>, we first generate the English sentence length <math>l_e</math> uniformly from the range <math>\mathrm{Uniform}[1, 2, ..., N]</math>. In particular, it does not depend on <math>f</math> or <math>l_f</math>.
* Then, we generate an alignment uniformly at random from the set of all possible alignment functions <math>\{1, 2, ..., l_e\} \to \{0, 1, 2, ..., l_f\}</math>.
* Finally, generate each English word <math>e_1, e_2, ..., e_{l_e}</math> independently of every other English word; the word <math>e_i</math> is generated according to <math>t(e_i|f_{a(i)})</math>.
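Putting the three steps together, Model 1 assigns the pair a probability of the form <math>p(e, a|f) = \frac{\epsilon}{(l_f+1)^{l_e}} \prod_{i=1}^{l_e} t(e_i|f_{a(i)})</math>, where <math>\epsilon</math> absorbs the uniform length choice and <math>(l_f+1)^{l_e}</math> counts all possible alignment functions. A minimal sketch of this computation; the toy dictionary and its probabilities are illustrative assumptions, not trained values:

```python
def model1_prob(e, f, a, t, epsilon=1.0):
    """p(e, a | f) under IBM Model 1:
    epsilon / (l_f + 1)**l_e * prod_i t(e_i | f_{a(i)})."""
    l_e, l_f = len(e), len(f)
    p = epsilon / (l_f + 1) ** l_e  # uniform choice over all alignments
    for i, e_word in enumerate(e):
        f_word = "NULL" if a[i] == 0 else f[a[i] - 1]
        p *= t.get((e_word, f_word), 0.0)  # lexical translation probability
    return p

# Toy dictionary with hand-picked (not trained) probabilities:
t = {("rain", "雨"): 0.9, ("surely", "きっと"): 0.8,
     ("tomorrow", "明日"): 0.9, ("it", "NULL"): 0.5, ("will", "NULL"): 0.4}
e = ["it", "will", "surely", "rain", "tomorrow"]
f = ["明日", "は", "きっと", "雨", "だ"]
a = [0, 0, 3, 4, 1]
p = model1_prob(e, f, a, t)
```

Note how the factor <math>(l_f+1)^{l_e}</math> makes every alignment equally likely a priori; only the dictionary entries distinguish good alignments from bad ones.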
 
 
Another problem while aligning is the fertility (the notion that input words would produce a specific number of output words after translation). In most cases one input word will be translated into one single word, but some words will produce multiple words or even get dropped (produce no words at all). The fertility of word models addresses this aspect of translation. While adding additional components increases the complexity of models, the main principles of IBM Model 1 are constant.<ref>{{cite journal | last1 = Wołk | first1 = K. | last2 = Marasek | first2 = K. | title = Real-Time Statistical Speech Translation | journal = Advances in Intelligent Systems and Computing | publisher = Springer | volume = 275 | pages = 107–114 | issn = 2194-5357 | isbn = 978-3-319-05950-1| date = 2014-04-07 | doi = 10.1007/978-3-319-05951-8_11 | arxiv = 1509.09090 | s2cid = 15361632 }}</ref>
In Model 1, <math>P(a|e,f)</math> defines the alignment probability, where <math>e</math> and <math>f</math> are the English and French sentences respectively, when translating from French to English.
 
The model is trained by [[Expectation–maximization algorithm|expectation–maximization]]. For a detailed derivation of the EM algorithm used for Model 1, see chapter 4 of <ref>{{Cite book |last=Koehn |first=Philipp |url=https://books.google.com/books?id=4v_Cx1wIMLkC&newbks=0&hl=en |title=Statistical Machine Translation |date=2010 |publisher=Cambridge University Press |isbn=978-0-521-87415-1 |language=en}}</ref>.
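The EM loop for Model 1 alternates between computing expected word-pair counts under the current dictionary (E-step) and renormalizing those counts into new translation probabilities (M-step). A compact sketch following the standard Model 1 pseudocode; the toy corpus is illustrative:

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """EM training for IBM Model 1 on (english, foreign) sentence pairs.
    Foreign sentences are augmented with a NULL token. Returns t(e|f)."""
    e_vocab = {e for (es, _) in corpus for e in es}
    # Initialize t(e|f) uniformly over the English vocabulary.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(e, f)
        total = defaultdict(float)   # expected counts c(f)
        for es, fs in corpus:
            fs = ["NULL"] + fs
            for e in es:
                # E-step: posterior that each foreign word generated e.
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize expected counts into probabilities.
        for (e, f) in count:
            t[(e, f)] = count[(e, f)] / total[f]
    return t

corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"]),
          (["a", "book"], ["ein", "buch"])]
t = train_model1(corpus)
```

On this tiny corpus the co-occurrence statistics quickly concentrate probability on the intuitive pairs, e.g. <code>t[("the", "das")]</code> grows while <code>t[("book", "das")]</code> shrinks.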
 
== Model 3 ==