* latent variables: the alignments <math>\{a^{(k)}\}_k</math>
In this form, this is exactly the kind of problem solved by the [[expectation–maximization algorithm]]. Due to the simplistic assumptions, the algorithm has a closed-form, efficiently computable solution. For a detailed derivation of the algorithm, see <ref name=":0">{{Cite book |last=Koehn |first=Philipp |url=https://books.google.com/books?id=4v_Cx1wIMLkC&newbks=0&hl=en |title=Statistical Machine Translation |date=2010 |publisher=Cambridge University Press |isbn=978-0-521-87415-1 |language=en |chapter=4. Word-Based Models}}</ref> chapter 4 and <ref>{{Cite web |title=CS288, Spring 2020, Lecture 05: Statistical Machine Translation |url=https://cal-cs288.github.io/sp20/slides/cs288_sp20_05_statistical_translation_1up.pdf |url-status=live |archive-url=https://web.archive.org/web/20201024011801/https://cal-cs288.github.io/sp20/slides/cs288_sp20_05_statistical_translation_1up.pdf |archive-date=24 Oct 2020}}</ref>.
In short, the EM algorithm goes as follows:<blockquote>INPUT. a corpus of English-foreign sentence pairs <math>\{(e^{(k)}, f^{(k)})\}_k</math>
=== Limitations ===
There are several limitations to the IBM model 1.<ref name=":0" />
* No fluency: Given any sentence pair <math>(e, f)</math>, any reordering of the English sentence is equally likely: <math>p(e|f) = p(e'|f)</math> for any permutation <math>e'</math> of the English sentence <math>e</math>.
* No length preference: All translation lengths are equally likely: <math>\sum_{e\text{ has length }l}p(e|f) = \frac 1N</math> for each <math>l \in \{1, 2, ..., N\}</math>.
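The no-fluency limitation can be checked directly: with the alignments summed out, <math>p(e|f) = \frac{\epsilon}{(l_f+1)^{l_e}} \prod_i \sum_j t(e_i|f_j)</math>, which depends on <math>e</math> only through the multiset of its words. A minimal Python sketch (the probability table <code>t</code> is made up purely for illustration):

```python
from itertools import permutations
from math import prod

def model1_likelihood(e_sent, f_sent, t, eps=1.0):
    """p(e|f) under IBM Model 1, alignments summed out in closed form:
    eps / (l_f + 1)^{l_e} * prod_i sum_j t(e_i | f_j)."""
    f_sent = ["NULL"] + list(f_sent)  # NULL token permits unaligned English words
    return (eps / len(f_sent) ** len(e_sent)) * prod(
        sum(t.get((f, e), 0.0) for f in f_sent) for e in e_sent)

# Hypothetical lexical translation probabilities t(e|f).
t = {("das", "the"): 0.7, ("das", "house"): 0.1, ("NULL", "the"): 0.1,
     ("Haus", "house"): 0.8, ("Haus", "the"): 0.1, ("NULL", "house"): 0.05}

f = ["das", "Haus"]
# Every permutation of the English words receives the same score.
scores = [model1_likelihood(list(p), f, t) for p in permutations(["the", "house"])]
```

Because the product over English positions is commutative, both orderings of "the house" get identical probability, which is exactly the fluency failure described above.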
The model is also weak at reordering and at adding or dropping words. In most language pairs, words that follow each other in one language appear in a different order after translation, but IBM Model 1 treats all reorderings as equally likely.
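The EM updates for the model have a closed form, and the whole training loop fits in a few lines. The following is an illustrative toy implementation (the corpus and variable names are invented for this sketch), not the reference algorithm from the cited sources:

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(e|f).

    corpus: list of (english_words, foreign_words) pairs. A NULL token is
    prepended to each foreign sentence so English words may stay unaligned."""
    t = defaultdict(lambda: 1.0)  # uniform (unnormalised) initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(f)
        for e_sent, f_sent in corpus:
            f_sent = ["NULL"] + f_sent
            for e in e_sent:
                # E-step: posterior over alignments of e, in closed form.
                z = sum(t[(f, e)] for f in f_sent)
                for f in f_sent:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[f] += p
        # M-step: renormalise the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[f]
    return t

corpus = [(["the", "house"], ["das", "Haus"]),
          (["the", "book"], ["das", "Buch"]),
          (["a", "book"], ["ein", "Buch"])]
t = train_ibm1(corpus)
```

Even on this three-sentence toy corpus, co-occurrence statistics pull the table in the right direction: "das" comes to prefer "the" over "book", and "Buch" prefers "book" over "the".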