Hidden Markov model: Difference between revisions

Line 32:
''a'' — state transition probabilities<br/>
''b'' — output probabilities]]
In its discrete form, a hidden Markov process can be visualized as a generalization of the [[urn problem]] with replacement (where each item from the urn is returned to the original urn before the next step).<ref>{{cite journal |author=Lawrence R. Rabiner |author-link=Lawrence Rabiner |date=February 1989 |title=A tutorial on Hidden Markov Models and selected applications in speech recognition |url=http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf |journal=Proceedings of the IEEE |volume=77 |issue=2 |pages=257–286 |citeseerx=10.1.1.381.3454 |doi=10.1109/5.18626 |s2cid=13618539}} [https://www.cs.cornell.edu/courses/cs481/2004fa/rabiner.pdf]</ref> Consider this example: in a room that is not visible to an observer there is a genie. The room contains urns X1, X2, X3, ... each of which contains a known mix of balls, with each ball having a unique label y1, y2, y3, ... . The genie chooses an urn in that room and randomly draws a ball from that urn. It then puts the ball onto a conveyor belt, where the observer can observe the sequence of the balls but not the sequence of urns from which they were drawn. The genie has some procedure to choose urns; the choice of the urn for the ''n''-th ball depends only upon a random number and the choice of the urn for the {{nowrap|(''n'' − 1)-th}} ball. The choice of urn does not directly depend on the urns chosen before this single previous urn; therefore, this is called a [[Markov process]]. It can be described by the upper part of Figure 1.
 
The Markov process cannot be observed, only the sequence of labeled balls, thus this arrangement is called a ''hidden Markov process''. This is illustrated by the lower part of the diagram shown in Figure 1, where one can see that balls y1, y2, y3, y4 can be drawn at each state. Even if the observer knows the composition of the urns and has just observed a sequence of three balls, ''e.g.'' y1, y2 and y3 on the conveyor belt, the observer still cannot be ''sure'' which urn (''i.e.'', at which state) the genie has drawn the third ball from. However, the observer can work out other information, such as the likelihood that the third ball came from each of the urns.
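
The generative process in this example can be sketched with a short simulation. The following Python code assumes two urns and three ball labels with illustrative (hypothetical) starting, transition and emission probabilities; it only demonstrates how the observer sees the sequence of ball labels while the sequence of urns remains hidden.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical urn example: 2 urns (hidden states) and 3 ball labels (observations).
# All probabilities below are illustrative, not taken from the article.
states = ["Urn 1", "Urn 2"]
labels = ["y1", "y2", "y3"]

start_prob = np.array([0.6, 0.4])        # which urn the genie starts with
trans_prob = np.array([[0.7, 0.3],       # a: P(next urn | current urn)
                       [0.4, 0.6]])
emit_prob = np.array([[0.5, 0.4, 0.1],   # b: P(ball label | urn)
                      [0.1, 0.3, 0.6]])

rng = np.random.default_rng(0)

def simulate(n_steps):
    """Simulate the genie: a hidden urn sequence plus the observed ball labels."""
    state = rng.choice(len(states), p=start_prob)
    hidden, observed = [], []
    for _ in range(n_steps):
        hidden.append(states[state])
        ball = rng.choice(len(labels), p=emit_prob[state])
        observed.append(labels[ball])
        state = rng.choice(len(states), p=trans_prob[state])
    return hidden, observed

hidden, observed = simulate(10)
print("observed balls:", observed)   # what the observer sees
print("hidden urns   :", hidden)     # what stays hidden
</syntaxhighlight>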
Line 118:
 
=== Statistical significance ===
For some of the above problems, it may also be interesting to ask about [[statistical significance]]. What is the probability that a sequence drawn from some [[null distribution]] will have an HMM probability (in the case of the forward algorithm) or a maximum state sequence probability (in the case of the Viterbi algorithm) at least as large as that of a particular output sequence?<ref>{{Cite journal |last1=Newberg |first1=L. |doi=10.1186/1471-2105-10-212 |title=Error statistics of hidden Markov model and hidden Boltzmann model results |journal=BMC Bioinformatics |volume=10 |article-number=212 |year=2009 |pmid=19589158 |pmc=2722652 |doi-access=free}} {{open access}}</ref> When an HMM is used to evaluate the relevance of a hypothesis for a particular output sequence, the statistical significance indicates the [[false positive rate]] associated with failing to reject the hypothesis for the output sequence.
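
As a rough illustration, such a significance value can be estimated by Monte Carlo: compute the forward-algorithm likelihood of the observed sequence and compare it with the likelihoods of sequences drawn from a null distribution. The Python sketch below assumes a small hypothetical HMM and an i.i.d. uniform null; the parameters and test sequence are illustrative only.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

# Illustrative HMM (hypothetical): 2 hidden states, 3 output symbols.
start = np.array([0.5, 0.5])
trans = np.array([[0.8, 0.2], [0.3, 0.7]])
emit = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])

def forward_log_prob(obs):
    """Log-likelihood of an observation sequence (scaled forward algorithm)."""
    alpha = start * emit[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        log_p += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_p

observed = [0, 1, 2, 2, 2, 1, 2, 2]      # the sequence whose significance is tested
obs_ll = forward_log_prob(observed)

# Null distribution: i.i.d. uniform symbols of the same length (one simple choice).
null_lls = [forward_log_prob(rng.integers(0, 3, size=len(observed)))
            for _ in range(10_000)]
p_value = np.mean([ll >= obs_ll for ll in null_lls])
print(f"estimated p-value: {p_value:.4f}")
</syntaxhighlight>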
 
== Learning ==
Line 132:
* [[Computational finance]]<ref>{{cite journal |doi=10.1007/s10614-016-9579-y |volume=49 |issue=4 |title=Parallel Optimization of Sparse Portfolios with AR-HMMs |year=2016 |journal=Computational Economics |pages=563–578 |last1=Sipos |first1=I. Róbert |last2=Ceffer |first2=Attila |last3=Levendovszky |first3=János|s2cid=61882456}}</ref><ref>{{cite journal |doi=10.1016/j.eswa.2016.01.015 |volume=53 |title=A novel corporate credit rating system based on Student's-t hidden Markov models |year=2016 |journal=Expert Systems with Applications |pages=87–105 |last1=Petropoulos |first1=Anastasios |last2=Chatzis |first2=Sotirios P. |last3=Xanthopoulos |first3=Stylianos}}</ref>
* [[Single-molecule experiment|Single-molecule kinetic analysis]]<ref>{{cite journal |last1=Nicolai |first1=Christopher |date=2013 |doi=10.1142/S1793048013300053 |title=Solving Ion Channel Kinetics with the QuB Software |journal=Biophysical Reviews and Letters |volume=8 |issue=3n04 |pages=191–211}}</ref>
* [[Neuroscience]]<ref>{{cite journal |doi=10.1002/hbm.25835 |title=Spatiotemporally Resolved Multivariate Pattern Analysis for M/EEG |journal=Human Brain Mapping |date=2022 |last1=Higgins |first1=Cameron |last2=Vidaurre |first2=Diego |last3=Kolling |first3=Nils |last4=Liu |first4=Yunzhe |last5=Behrens |first5=Tim |last6=Woolrich |first6=Mark |volume=43 |issue=10 |pages=3062–3085 |pmid=35302683 |pmc=9188977}}</ref><ref>{{Cite journal |last1=Diomedi |first1=S. |last2=Vaccari |first2=F. E. |last3=Galletti |first3=C. |last4=Hadjidimitrakis |first4=K. |last5=Fattori |first5=P. |date=2021-10-01 |title=Motor-like neural dynamics in two parietal areas during arm reaching |url=https://www.sciencedirect.com/science/article/pii/S0301008221001301 |journal=Progress in Neurobiology |language=en |volume=205 |article-number=102116 |doi=10.1016/j.pneurobio.2021.102116 |pmid=34217822 |issn=0301-0082|hdl=11585/834094 |s2cid=235703641 |hdl-access=free}}</ref>
* [[Cryptanalysis]]
* [[Speech recognition]], including [[Siri]]<ref>{{cite book|last1=Domingos|first1=Pedro|title=The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World|url=https://archive.org/details/masteralgorithmh0000domi|url-access=registration|date=2015|publisher=Basic Books|isbn=9780465061921|page=[https://archive.org/details/masteralgorithmh0000domi/page/37 37]|language=en}}</ref>
Line 155:
 
== History ==
Hidden Markov models were described in a series of statistical papers by [[Leonard E. Baum]] and other authors in the second half of the 1960s.<ref>{{cite journal |last=Baum |first=L. E. |author2=Petrie, T. |title=Statistical Inference for Probabilistic Functions of Finite State Markov Chains |journal=The Annals of Mathematical Statistics |year=1966 |volume=37 |issue=6 |pages=1554–1563 |doi=10.1214/aoms/1177699147|doi-access=free}}</ref><ref>{{Cite journal |last1=Baum |first1=L. E. |last2=Eagon |first2=J. A. |doi=10.1090/S0002-9904-1967-11751-8 |title=An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology |journal=[[Bulletin of the American Mathematical Society]] |volume=73 |issue=3 |pages=360 |year=1967 |zbl=0157.11101 |url=http://projecteuclid.org/euclid.bams/1183528841 |doi-access=free}}</ref><ref>{{cite journal |last=Baum |first=L. E. |author2=Sell, G. R. |title=Growth transformations for functions on manifolds |journal=Pacific Journal of Mathematics |year=1968 |volume=27 |issue=2 |pages=211–227 |doi=10.2140/pjm.1968.27.211 |doi-access=free}}</ref><ref>{{Cite journal |last1=Baum |first1=L. E. |author-link1=Leonard E. Baum |last2=Petrie |first2=T. |last3=Soules |first3=G. |last4=Weiss |first4=N. |title=A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains |doi=10.1214/aoms/1177697196 |journal=[[The Annals of Mathematical Statistics]] |volume=41 |issue=1 |pages=164–171 |year=1970 |jstor=2239727 |zbl=0188.49603 |mr=287613 |doi-access=free}}</ref><ref>{{cite journal |last=Baum |first=L.E. |title=An Inequality and Associated Maximization Technique in Statistical Estimation of Probabilistic Functions of a Markov Process |journal=Inequalities |year=1972 |volume=3 |pages=1–8}}</ref> One of the first applications of HMMs was [[speech recognition]], starting in the mid-1970s.<ref>{{Cite journal |last1=Baker |first1=J. |author-link1=James K. Baker |doi=10.1109/TASSP.1975.1162650 |title=The DRAGON system—An overview |journal=IEEE Transactions on Acoustics, Speech, and Signal Processing |volume=23 |pages=24–29 |year=1975}}</ref><ref>{{Cite journal |last1=Jelinek |first1=F. |last2=Bahl |first2=L. |last3=Mercer |first3=R. |doi=10.1109/TIT.1975.1055384 |title=Design of a linguistic statistical decoder for the recognition of continuous speech |journal=[[IEEE Transactions on Information Theory]] |volume=21 |issue=3 |pages=250 |year=1975}}</ref><ref>{{cite book |title=Hidden Markov Models for Speech Recognition |url=https://archive.org/details/hiddenmarkovmode0000huan |publisher=Edinburgh University Press |year=1990 |isbn=978-0-7486-0162-2 |author1=Xuedong Huang |author2=M. Jack |author3=Y. Ariki |author-link1=Xuedong Huang}}</ref><ref>{{cite book |title=Spoken Language Processing |publisher=Prentice Hall |year=2001 |isbn=978-0-13-022616-7 |author1=Xuedong Huang |author2=Alex Acero |author3=Hsiao-Wuen Hon |author-link1=Xuedong Huang}}</ref> From a linguistics point of view, hidden Markov models are equivalent to stochastic regular grammars.<ref>{{Cite book |last1=Carrasco |first1=Rafael C. |last2=Oncina |first2=Jose |date=1994 |editor-last=Carrasco |editor-first=Rafael C. 
|editor2-last=Oncina |editor2-first=Jose |chapter=Learning stochastic regular grammars by means of a state merging method |chapter-url=https://link.springer.com/chapter/10.1007/3-540-58473-0_144 |title=Grammatical Inference and Applications |series=Lecture Notes in Computer Science |volume=862 |language=en |___location=Berlin, Heidelberg |publisher=Springer |pages=139–152 |doi=10.1007/3-540-58473-0_144 |isbn=978-3-540-48985-6}}</ref>
 
In the second half of the 1980s, HMMs began to be applied to the analysis of biological sequences,<ref>{{cite journal |doi=10.1016/0022-2836(86)90289-5 |author=M. Bishop and E. Thompson |title=Maximum Likelihood Alignment of DNA Sequences |journal=[[Journal of Molecular Biology]] |volume=190 |issue=2 |pages=159–165 |year=1986 |pmid=3641921}} {{subscription required}} {{closed access}}</ref> in particular [[DNA]]. Since then, they have become ubiquitous in the field of [[bioinformatics]].<ref name=durbin>{{Durbin 1998|mode=cs1}}</ref>
Line 163:
In the hidden Markov models considered above, the state space of the hidden variables is discrete, while the observations themselves can either be discrete (typically generated from a [[categorical distribution]]) or continuous (typically from a [[Gaussian distribution]]). Hidden Markov models can also be generalized to allow continuous state spaces. Examples of such models are those where the Markov process over hidden variables is a [[linear dynamical system]], with a linear relationship among related variables and where all hidden and observed variables follow a [[Gaussian distribution]]. In simple cases, such as the linear dynamical system just mentioned, exact inference is tractable (in this case, using the [[Kalman filter]]); however, in general, exact inference in HMMs with continuous latent variables is infeasible, and approximate methods must be used, such as the [[extended Kalman filter]] or the [[particle filter]].
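
As a minimal sketch of the continuous-state case, the Python code below assumes a one-dimensional linear-Gaussian state-space model with hypothetical parameters; the Kalman filter recursion then gives the exact filtered posterior mean and variance of the hidden state at each time step.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scalar linear dynamical system:
#   x_t = a * x_{t-1} + process noise,   y_t = x_t + observation noise
a, q, r = 0.9, 0.1, 0.5   # transition coefficient, process variance, observation variance

# Simulate a hidden trajectory and its noisy observations.
T = 50
x = np.zeros(T)
y = np.zeros(T)
x[0] = rng.normal(0.0, 1.0)
y[0] = x[0] + rng.normal(0.0, np.sqrt(r))
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(0.0, np.sqrt(q))
    y[t] = x[t] + rng.normal(0.0, np.sqrt(r))

# Kalman filter: exact Gaussian posterior over the hidden state, step by step.
mean, var = 0.0, 1.0                             # prior on x_0
filtered_means = []
for t in range(T):
    if t > 0:
        mean, var = a * mean, a * a * var + q    # predict x_t from x_{t-1}
    k = var / (var + r)                          # Kalman gain
    mean = mean + k * (y[t] - mean)              # update with observation y[t]
    var = (1.0 - k) * var
    filtered_means.append(mean)

print("last filtered estimate:", filtered_means[-1], "true state:", x[-1])
</syntaxhighlight>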
 
Inference in hidden Markov models has also been studied in [[Nonparametric statistics|nonparametric]] settings, where the dependency structure enables [[identifiability]] of the model<ref>{{Cite journal |last1=Gassiat |first1=E. |last2=Cleynen |first2=A. |last3=Robin |first3=S. |date=2016-01-01 |title=Inference in finite state space non parametric Hidden Markov Models and applications |url=https://doi.org/10.1007/s11222-014-9523-8 |journal=Statistics and Computing |language=en |volume=26 |issue=1 |pages=61–71 |doi=10.1007/s11222-014-9523-8 |issn=1573-1375|url-access=subscription }}</ref> and where the limits of learnability are still being explored.<ref>{{Cite journal |last1=Abraham |first1=Kweku |last2=Gassiat |first2=Elisabeth |last3=Naulet |first3=Zacharie |date=March 2023 |title=Fundamental Limits for Learning Hidden Markov Model Parameters |url=https://ieeexplore.ieee.org/document/9917566 |journal=IEEE Transactions on Information Theory |volume=69 |issue=3 |pages=1777–1794 |doi=10.1109/TIT.2022.3213429 |arxiv=2106.12936 |bibcode=2023ITIT...69.1777A |issn=0018-9448}}</ref>
 
=== Bayesian modeling of the transitions probabilities ===
Line 190:
== Measure theory ==
{{See also|Subshift of finite type}}
[[File:Blackwell_HMM_example.png|thumb|193x193px|The hidden part of a hidden Markov model, whose observable states are non-Markovian.]]
Given a Markov transition matrix and an invariant distribution on the states, a probability measure can be imposed on the set of subshifts. For example, consider the Markov chain shown in the figure, on the states <math>A, B_1, B_2</math>, with invariant distribution <math>\pi = (2/7, 4/7, 1/7)</math>. By ignoring the distinction between <math>B_1</math> and <math>B_2</math>, the space of subshifts on <math>A, B_1, B_2</math> is projected onto another space of subshifts on <math>A, B</math>, and this projection also pushes the probability measure down to a probability measure on the subshifts on <math>A, B</math>.
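
The loss of the Markov property under such a projection can be checked empirically. The Python sketch below simulates a hypothetical three-state chain on <math>A, B_1, B_2</math> (the transition matrix is illustrative and is not the one in the figure), lumps <math>B_1</math> and <math>B_2</math> into a single observed symbol B, and estimates how the law of the next symbol depends on the symbol before the current one.

<syntaxhighlight lang="python">
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)

# Hypothetical chain on (A, B1, B2); lumping B1 and B2 into "B" generally
# destroys the Markov property of the observed process.
states = ["A", "B1", "B2"]
P = np.array([[0.0, 1.0, 0.0],    # A  -> B1
              [0.5, 0.0, 0.5],    # B1 -> A or B2
              [1.0, 0.0, 0.0]])   # B2 -> A
lump = {"A": "A", "B1": "B", "B2": "B"}

# Simulate a long trajectory and record only the lumped (observed) symbols.
n = 200_000
state = 0
obs = []
for _ in range(n):
    obs.append(lump[states[state]])
    state = rng.choice(3, p=P[state])

# Estimate P(next = "A" | current = "B") separately for each previous symbol.
counts = defaultdict(lambda: [0, 0])   # prev symbol -> [# next == "A", # total]
for prev, cur, nxt in zip(obs, obs[1:], obs[2:]):
    if cur == "B":
        counts[prev][1] += 1
        counts[prev][0] += int(nxt == "A")

for prev, (hits, total) in sorted(counts.items()):
    print(f"P(next=A | current=B, previous={prev}) ~ {hits / total:.3f}")
# The two estimates differ, so the observed process is not Markov of order 1.
</syntaxhighlight>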
 
Line 230:
* [https://web.archive.org/web/20120415032315/http://www.ee.washington.edu/research/guptalab/publications/EMbookChenGupta2010.pdf Fitting HMM's with expectation-maximization – complete derivation]
* [http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html A step-by-step tutorial on HMMs] {{Webarchive|url=https://web.archive.org/web/20170813231824/http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html |date=2017-08-13}} ''(University of Leeds)''
* [https://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html Hidden Markov Models] ''(an exposition using basic mathematics)''
* [http://jedlik.phy.bme.hu/~gerjanos/HMM/node2.html Hidden Markov Models] ''(by Narada Warakagoda)''
* Hidden Markov Models: Fundamentals and Applications [http://www.eecis.udel.edu/~lliao/cis841s06/hmmtutorialpart1.pdf Part 1], [http://www.eecis.udel.edu/~lliao/cis841s06/hmmtutorialpart2.pdf Part 2] ''(by V. Petrushin)''