The '''IBM alignment models''' are a sequence of increasingly complex models used in [[statistical machine translation]] to train a translation model and an alignment model, starting with lexical translation probabilities and moving to reordering and word duplication.<ref name=":1">{{Cite journal |last1=Brown |first1=Peter F. |author-link1=Peter Fitzhugh Brown |last2=Pietra |first2=Vincent J. Della |last3=Pietra |first3=Stephen A. Della |last4=Mercer |first4=Robert L. |author-link4=Robert Mercer |date=1993-06-01 |title=The mathematics of statistical machine translation: parameter estimation |url=https://dl.acm.org/doi/10.5555/972470.972474 |journal=Comput. Linguist. |volume=19 |issue=2 |pages=263–311 |issn=0891-2017}}</ref><ref>{{cite web | url = http://www.statmt.org/survey/Topic/IBMModels | title = IBM Models | date = 11 September 2015 | publisher = SMT Research Survey Wiki | access-date = 26 October 2015}}</ref> They underpinned the majority of statistical machine translation systems for almost twenty years starting in the early 1990s, until [[neural machine translation]] began to dominate. These models offer a principled probabilistic formulation and (mostly) tractable inference.<ref>{{cite web |author=Yarin Gal |author2=Phil Blunsom |date=12 June 2013 |title=A Systematic Bayesian Treatment of the IBM Alignment Models |url=http://mlg.eng.cam.ac.uk/yarin/PDFs/PY-IBM_presentation.pdf |archive-url=https://web.archive.org/web/20160304071924/http://mlg.eng.cam.ac.uk/yarin/PDFs/PY-IBM_presentation.pdf |archive-date=4 Mar 2016 |access-date=26 October 2015 |publisher=University of Cambridge}}</ref>
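The first stage of the sequence, estimating lexical translation probabilities, can be illustrated with a minimal sketch of expectation–maximization for a Model 1 style lexical model. This is an illustration under stated assumptions, not the original IBM implementation: the function name <code>train_ibm_model1</code>, the sentence-pair corpus format, the uniform initialization, and the <code>"NULL"</code> token (which lets a foreign word align to no English word) are all choices made here for illustration. It estimates only the word translation table ''t''(''f''&nbsp;|&nbsp;''e''); the reordering and word-duplication (fertility) parameters of the later models are not modelled.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Hypothetical EM sketch estimating lexical translation probabilities t(f | e).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Assumption: a "NULL" token is prepended to every English sentence so that
    a foreign word may be generated by no real English word.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    # Start from a uniform translation table (an assumed initialization).
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)

        # E-step: each foreign word spreads one unit of count over the
        # English words in its sentence, in proportion to the current t(f | e).
        for fs, es in corpus:
            es = ["NULL"] + list(es)
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta

        # M-step: renormalize expected counts so that sum_f t(f | e) = 1.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})

    return dict(t)
```

On a toy corpus such as the following (German–English pairs chosen purely for illustration), the estimated table would be expected to shift probability for ''house'' toward ''Haus'', because ''das'' is better explained by ''the'', which co-occurs with it in two sentence pairs:

```python
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus, iterations=20)
print(t[("Haus", "house")], t[("das", "house")])
```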
 
The IBM alignment models were published in parts in 1988<ref>{{Cite journal |last1=Brown |first1=P. |last2=Cocke |first2=J. |last3=Della Pietra |first3=S. |last4=Della Pietra |first4=V. |last5=Jelinek |first5=F. |last6=Mercer |first6=R. |last7=Roossin |first7=P. |date=1988 |title=A Statistical Approach to Language Translation |url=https://aclanthology.org/C88-1016/ |journal=Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics}}</ref> and 1990,<ref>{{Cite journal |last1=Brown |first1=Peter F. |last2=Cocke |first2=John |last3=Della Pietra |first3=Stephen A. |last4=Della Pietra |first4=Vincent J. |last5=Jelinek |first5=Fredrick |last6=Lafferty |first6=John D. |last7=Mercer |first7=Robert L. |last8=Roossin |first8=Paul S. |date=1990 |title=A Statistical Approach to Machine Translation |url=https://aclanthology.org/J90-2002/ |journal=Computational Linguistics |volume=16 |issue=2 |pages=79–85}}</ref> and the complete series was published in 1993.<ref name=":1" /> Every author of the 1993 paper subsequently joined the hedge fund [[Renaissance Technologies]].<ref>{{Cite web |last=walutowyjohn |date=2013-01-28 |title=A Visionary Gift: Della Pietra Family Endows Biomedical Imaging Chair - SBU News |url=https://news.stonybrook.edu/alumni/a-visionary-gift-della-pietra-family-endows-biomedical-imaging-chair-2/ |access-date=2025-01-06 |website=Stony Brook University News |language=en-US}}</ref>
 
The original work on statistical machine translation at [[IBM]] proposed five models; a sixth, Model 6, was proposed later. The sequence of the six models can be summarized as: