[[File:HiddenMarkovModel.svg|thumb|Figure 1. Probabilistic parameters of a hidden Markov model<br/>''X'' — states<br/>''y'' — possible observations<br/>''a'' — state transition probabilities<br/>''b'' — output probabilities]]
In its discrete form, a hidden Markov process can be visualized as a generalization of the [[urn problem]] with replacement (where each item from the urn is returned to the original urn before the next step).<ref>{{cite journal |author=Lawrence R. Rabiner |author-link=Lawrence Rabiner |date=February 1989 |title=A tutorial on Hidden Markov Models and selected applications in speech recognition |url=http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf |journal=Proceedings of the IEEE |volume=77 |issue=2 |pages=257–286 |citeseerx=10.1.1.381.3454 |doi=10.1109/5.18626 |s2cid=13618539}}</ref> In this setup, a genie in a room hidden from the observer draws balls labeled y1, y2, y3, ... from a set of urns and places each drawn ball on a conveyor belt; the choice of urn for each draw depends only on the urn used for the previous draw, so the sequence of urns is a Markov process.
The Markov process itself cannot be observed, only the sequence of labeled balls, so this arrangement is called a ''hidden Markov process''. This is illustrated by the lower part of the diagram shown in Figure 1, where one can see that balls y1, y2, y3, y4 can be drawn at each state. Even if the observer knows the composition of the urns and has just observed a sequence of three balls, ''e.g.'' y1, y2 and y3 on the conveyor belt, the observer still cannot be ''sure'' from which urn (''i.e.'', at which state) the genie drew the third ball. However, the observer can work out other information, such as the likelihood that the third ball came from each of the urns.
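A minimal simulation can make the urn analogy concrete. The sketch below uses made-up transition and output probabilities (not the values in Figure 1), with three urns as hidden states and four ball labels as observations; only the ball sequence would be visible to the observer.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical parameters for illustration only (not those of Figure 1):
# three urns (hidden states) and four ball labels (observations).
rng = np.random.default_rng(0)
A = np.array([[0.6, 0.3, 0.1],       # state transition probabilities a[i, j]
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
B = np.array([[0.5, 0.2, 0.2, 0.1],  # output probabilities b[i, k] for balls y1..y4
              [0.1, 0.4, 0.4, 0.1],
              [0.2, 0.1, 0.2, 0.5]])
pi = np.array([0.6, 0.3, 0.1])       # initial state distribution

def sample_hmm(T):
    """Simulate the genie: walk the hidden chain and draw one ball per step."""
    states, balls = [], []
    s = rng.choice(3, p=pi)
    for _ in range(T):
        states.append(s)
        balls.append(rng.choice(4, p=B[s]))
        s = rng.choice(3, p=A[s])
    return states, balls

states, balls = sample_hmm(10)
print("hidden urns:   ", states)   # invisible to the observer
print("observed balls:", balls)    # the only thing seen on the conveyor belt
</syntaxhighlight>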
== Inference ==
The probability of observing a sequence
: <math>Y=y(0), y(1),\dots,y(L-1), </math>
of length ''L'' is given by
:<math>P(Y)=\sum_{X}P(Y\mid X)P(X),</math>
where the sum runs over all possible hidden-node sequences
: <math>X=x(0), x(1), \dots, x(L-1). </math>
Applying the principle of [[dynamic programming]], this problem, too, can be handled efficiently using the [[forward algorithm]].
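Evaluating the sum directly requires enumerating all <math>N^L</math> hidden sequences for <math>N</math> states, whereas the forward recursion computes the same quantity in <math>O(LN^2)</math> time. The sketch below is an illustrative implementation with assumed toy parameters, not code from any particular library; the brute-force function evaluates the defining sum for comparison.

<syntaxhighlight lang="python">
import numpy as np
from itertools import product

def sequence_probability(Y, A, B, pi):
    """Forward algorithm: P(Y) = sum_X P(Y|X) P(X), via dynamic programming."""
    alpha = pi * B[:, Y[0]]               # alpha[i] = P(y(0), x(0)=i)
    for obs in Y[1:]:
        alpha = (alpha @ A) * B[:, obs]   # alpha[j] = P(y(0..t), x(t)=j)
    return alpha.sum()

def brute_force_probability(Y, A, B, pi):
    """Direct evaluation of the defining sum over all hidden sequences X."""
    total = 0.0
    for X in product(range(len(pi)), repeat=len(Y)):
        p = pi[X[0]] * B[X[0], Y[0]]
        for t in range(1, len(Y)):
            p *= A[X[t - 1], X[t]] * B[X[t], Y[t]]
        total += p
    return total

# Toy parameters (assumed for illustration): 2 hidden states, 2 output symbols
A = np.array([[0.7, 0.3], [0.4, 0.6]])    # transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])    # output probabilities
pi = np.array([0.5, 0.5])                 # initial state distribution
Y = [0, 1, 1, 0]
print(sequence_probability(Y, A, B, pi), brute_force_probability(Y, A, B, pi))
</syntaxhighlight>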
==== Filtering ====
The task is to compute, given the model's parameters and a sequence of observations, the distribution over hidden states of the last latent variable at the end of the sequence, i.e. to compute <math>P(x(t)\mid y(1),\dots,y(t))</math>, where <math>t</math> is the length of the observation sequence.
This problem can be handled efficiently using the [[forward algorithm]]. An example is when the algorithm is applied to a hidden Markov network to determine <math>\mathrm{P}\big(h_t \mid v_{1:t}\big)</math>.
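A sketch of the filtering computation, using the same assumed toy parameters as above: the forward variables are renormalized at every step, which both yields the filtered distribution and avoids numerical underflow on long sequences.

<syntaxhighlight lang="python">
import numpy as np

def filtered_posterior(Y, A, B, pi):
    """Filtering: P(x(t) | y(1), ..., y(t)) via a normalized forward pass."""
    alpha = pi * B[:, Y[0]]
    alpha = alpha / alpha.sum()
    for obs in Y[1:]:
        alpha = (alpha @ A) * B[:, obs]
        alpha = alpha / alpha.sum()
    return alpha                          # distribution over the last hidden state

# Toy parameters (assumed for illustration)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(filtered_posterior([0, 1, 1, 0], A, B, pi))
</syntaxhighlight>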
==== Smoothing ====
This is similar to filtering but asks about the distribution of a latent variable somewhere in the middle of a sequence, i.e. to compute <math>P(x(k)\mid y(1),\dots,y(t))</math> for some <math>k < t</math>.
The [[forward-backward algorithm]] is a good method for computing the smoothed values for all hidden state variables.
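A minimal forward-backward sketch with assumed toy parameters: the forward messages summarize the observations up to each position, the backward messages summarize those after it, and their normalized product gives the smoothed marginal at every position.

<syntaxhighlight lang="python">
import numpy as np

def smoothed_posteriors(Y, A, B, pi):
    """Forward-backward: P(x(k) | y(1), ..., y(t)) for every position k."""
    L, N = len(Y), len(pi)
    alpha = np.zeros((L, N))                  # alpha[t, i] = P(y(0..t), x(t)=i)
    alpha[0] = pi * B[:, Y[0]]
    for t in range(1, L):
        alpha[t] = (alpha[t - 1] @ A) * B[:, Y[t]]
    beta = np.zeros((L, N))                   # beta[t, i] = P(y(t+1..L-1) | x(t)=i)
    beta[-1] = 1.0
    for t in range(L - 2, -1, -1):
        beta[t] = A @ (B[:, Y[t + 1]] * beta[t + 1])
    gamma = alpha * beta                      # proportional to P(x(t), y(0..L-1))
    return gamma / gamma.sum(axis=1, keepdims=True)

# Toy parameters (assumed for illustration)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(smoothed_posteriors([0, 1, 1, 0], A, B, pi))   # one distribution per time step
</syntaxhighlight>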
=== Statistical significance ===
For some of the above problems, it may also be interesting to ask about [[statistical significance]]. What is the probability that a sequence drawn from some [[null distribution]] will have an HMM probability (in the case of the forward algorithm) or a maximum state sequence probability (in the case of the Viterbi algorithm) at least as large as that of a particular output sequence?<ref>{{Cite journal |last1=Newberg |first1=L. |doi=10.1186/1471-2105-10-212 |title=Error statistics of hidden Markov model and hidden Boltzmann model results |journal=BMC Bioinformatics |volume=10 |article-number=212 |year=2009 |doi-access=free}}</ref>
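One way to approach this question is a Monte Carlo estimate: score many sequences drawn from the null distribution with the forward algorithm and report the fraction that score at least as high as the observed sequence. The sketch below assumes, purely for illustration, an i.i.d. uniform null distribution over output symbols and the same toy HMM parameters as above.

<syntaxhighlight lang="python">
import numpy as np

def forward_log_prob(Y, A, B, pi):
    """log P(Y) under the HMM, computed with a scaled forward pass."""
    alpha = pi * B[:, Y[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for obs in Y[1:]:
        alpha = (alpha @ A) * B[:, obs]
        log_p += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_p

def empirical_p_value(observed, A, B, pi, n_null=10000, seed=0):
    """Fraction of null sequences whose HMM score is at least the observed score.
    Null model: i.i.d. uniform over symbols (an assumption for this sketch)."""
    rng = np.random.default_rng(seed)
    n_symbols = B.shape[1]
    score = forward_log_prob(observed, A, B, pi)
    null_scores = [forward_log_prob(rng.integers(n_symbols, size=len(observed)), A, B, pi)
                   for _ in range(n_null)]
    return float(np.mean(np.array(null_scores) >= score))

# Toy parameters (assumed for illustration)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(empirical_p_value([0, 0, 0, 1, 0, 0], A, B, pi))
</syntaxhighlight>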
== Learning ==
== Applications ==
* [[Computational finance]]<ref>{{cite journal |doi=10.1007/s10614-016-9579-y |volume=49 |issue=4 |title=Parallel Optimization of Sparse Portfolios with AR-HMMs |year=2016 |journal=Computational Economics |pages=563–578 |last1=Sipos |first1=I. Róbert |last2=Ceffer |first2=Attila |last3=Levendovszky |first3=János|s2cid=61882456}}</ref><ref>{{cite journal |doi=10.1016/j.eswa.2016.01.015 |volume=53 |title=A novel corporate credit rating system based on Student's-t hidden Markov models |year=2016 |journal=Expert Systems with Applications |pages=87–105 |last1=Petropoulos |first1=Anastasios |last2=Chatzis |first2=Sotirios P. |last3=Xanthopoulos |first3=Stylianos}}</ref>
* [[Single-molecule experiment|Single-molecule kinetic analysis]]<ref>{{cite journal |last1=Nicolai |first1=Christopher |date=2013 |doi=10.1142/S1793048013300053 |title=Solving Ion Channel Kinetics with the QuB Software |journal=Biophysical Reviews and Letters |volume=8 |issue=3n04 |pages=191–211}}</ref>
* [[Neuroscience]]<ref>{{cite journal |doi=10.1002/hbm.25835 |title=Spatiotemporally Resolved Multivariate Pattern Analysis for M/EEG |journal=Human Brain Mapping |date=2022 |last1=Higgins |first1=Cameron |last2=Vidaurre |first2=Diego |last3=Kolling |first3=Nils |last4=Liu |first4=Yunzhe |last5=Behrens |first5=Tim |last6=Woolrich |first6=Mark |volume=43 |issue=10 |pages=3062–3085 |pmid=35302683 |pmc=9188977}}</ref><ref>{{Cite journal |last1=Diomedi |first1=S. |last2=Vaccari |first2=F. E. |last3=Galletti |first3=C. |last4=Hadjidimitrakis |first4=K. |last5=Fattori |first5=P. |date=2021-10-01 |title=Motor-like neural dynamics in two parietal areas during arm reaching |url=https://www.sciencedirect.com/science/article/pii/S0301008221001301 |journal=Progress in Neurobiology |language=en |volume=205}}</ref>
* [[Cryptanalysis]]
* [[Speech recognition]], including [[Siri]]<ref>{{cite book|last1=Domingos|first1=Pedro|title=The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World|url=https://archive.org/details/masteralgorithmh0000domi|url-access=registration|date=2015|publisher=Basic Books|isbn=9780465061921|page=[https://archive.org/details/masteralgorithmh0000domi/page/37 37]|language=en}}</ref>
== History ==
Hidden Markov models were described in a series of statistical papers by [[Leonard E. Baum]] and other authors in the second half of the 1960s.<ref>{{cite journal |last=Baum |first=L. E. |author2=Petrie, T. |title=Statistical Inference for Probabilistic Functions of Finite State Markov Chains |journal=The Annals of Mathematical Statistics |year=1966 |volume=37 |issue=6 |pages=1554–1563 |doi=10.1214/aoms/1177699147|doi-access=free}}</ref><ref>{{Cite journal |last1=Baum |first1=L. E. |last2=Eagon |first2=J. A. |doi=10.1090/S0002-9904-1967-11751-8 |title=An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology |journal=[[Bulletin of the American Mathematical Society]] |volume=73 |issue=3 |pages=360 |year=1967 |zbl=0157.11101 |url=http://projecteuclid.org/euclid.bams/1183528841 |doi-access=free}}</ref><ref>{{cite journal |last=Baum |first=L. E. |author2=Sell, G. R. |title=Growth transformations for functions on manifolds |journal=Pacific Journal of Mathematics |year=1968 |volume=27 |issue=2 |pages=211–227 |doi=10.2140/pjm.1968.27.211 |doi-access=free}}</ref><ref>{{Cite journal |last1=Baum |first1=L. E. |author-link1=Leonard E. Baum |last2=Petrie |first2=T. |last3=Soules |first3=G. |last4=Weiss |first4=N. |title=A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains |doi=10.1214/aoms/1177697196 |journal=[[The Annals of Mathematical Statistics]] |volume=41 |issue=1 |pages=164–171 |year=1970 |jstor=2239727 |zbl=0188.49603 |mr=287613 |doi-access=free}}</ref><ref>{{cite journal |last=Baum |first=L.E. |title=An Inequality and Associated Maximization Technique in Statistical Estimation of Probabilistic Functions of a Markov Process |journal=Inequalities |year=1972 |volume=3 |pages=1–8}}</ref> One of the first applications of HMMs was [[speech recognition]], starting in the mid-1970s.<ref>{{Cite journal |last1=Baker |first1=J. |author-link1=James K. Baker |doi=10.1109/TASSP.1975.1162650 |title=The DRAGON system—An overview |journal=IEEE Transactions on Acoustics, Speech, and Signal Processing |volume=23 |pages=24–29 |year=1975}}</ref><ref>{{Cite journal |last1=Jelinek |first1=F. |last2=Bahl |first2=L. |last3=Mercer |first3=R. |doi=10.1109/TIT.1975.1055384 |title=Design of a linguistic statistical decoder for the recognition of continuous speech |journal=[[IEEE Transactions on Information Theory]] |volume=21 |issue=3 |pages=250 |year=1975}}</ref><ref>{{cite book |title=Hidden Markov Models for Speech Recognition |url=https://archive.org/details/hiddenmarkovmode0000huan |publisher=Edinburgh University Press |year=1990 |isbn=978-0-7486-0162-2 |author1=Xuedong Huang |author2=M. Jack |author3=Y. Ariki |author-link1=Xuedong Huang}}</ref><ref>{{cite book |title=Spoken Language Processing |publisher=Prentice Hall |year=2001 |isbn=978-0-13-022616-7 |author1=Xuedong Huang |author2=Alex Acero |author3=Hsiao-Wuen Hon |author-link1=Xuedong Huang}}</ref> From the linguistics point of view, hidden Markov models are equivalent to stochastic regular grammar.<ref>{{Cite book |last1=Carrasco |first1=Rafael C. |last2=Oncina |first2=Jose |date=1994 |editor-last=Carrasco |editor-first=Rafael C. 
|editor2-last=Oncina |editor2-first=Jose |chapter=Learning stochastic regular grammars by means of a state merging method |chapter-url=https://link.springer.com/chapter/10.1007/3-540-58473-0_144 |title=Grammatical Inference and Applications |series=Lecture Notes in Computer Science |volume=862 |language=en |___location=Berlin, Heidelberg |publisher=Springer |pages=139–152 |doi=10.1007/3-540-58473-0_144 |isbn=978-3-540-48985-6}}</ref>
In the second half of the 1980s, HMMs began to be applied to the analysis of biological sequences,<ref>{{cite journal |doi=10.1016/0022-2836(86)90289-5 |author=M. Bishop and E. Thompson |title=Maximum Likelihood Alignment of DNA Sequences |journal=[[Journal of Molecular Biology]] |volume=190 |issue=2 |pages=159–165 |year=1986 |pmid=3641921}} {{subscription required}} {{closed access}}</ref> in particular [[DNA]]. Since then, they have become ubiquitous in the field of [[bioinformatics]].<ref name=durbin>{{Durbin 1998|mode=cs1}}</ref>
== Extensions ==
In the hidden Markov models considered above, the state space of the hidden variables is discrete, while the observations themselves can either be discrete (typically generated from a [[categorical distribution]]) or continuous (typically from a [[Gaussian distribution]]). Hidden Markov models can also be generalized to allow continuous state spaces. Examples of such models are those where the Markov process over hidden variables is a [[linear dynamical system]], with a linear relationship among related variables and where all hidden and observed variables follow a [[Gaussian distribution]]. In simple cases, such as the linear dynamical system just mentioned, exact inference is tractable (in this case, using the [[Kalman filter]]); however, in general, exact inference in HMMs with continuous latent variables is infeasible, and approximate methods must be used, such as the [[extended Kalman filter]] or the [[particle filter]].
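For the linear-Gaussian case mentioned above, exact inference reduces to the Kalman filter's predict and update recursions. The sketch below is a minimal illustrative implementation of that standard recursion, applied to an assumed one-dimensional random-walk model; the model matrices are made up for the example.

<syntaxhighlight lang="python">
import numpy as np

def kalman_filter(ys, F, H, Q, R, m0, P0):
    """Exact filtering for the linear-Gaussian state-space model
       x(t) = F x(t-1) + process noise ~ N(0, Q)
       y(t) = H x(t)   + observation noise ~ N(0, R)"""
    m, P = m0, P0
    means = []
    for y in ys:
        # Predict step: propagate the state estimate through the dynamics
        m, P = F @ m, F @ P @ F.T + Q
        # Update step: correct with the new observation
        S = H @ P @ H.T + R                    # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        m = m + K @ (y - H @ m)
        P = (np.eye(len(m)) - K @ H) @ P
        means.append(m)
    return np.array(means)

# Illustrative 1-D random walk observed with noise (assumed parameters)
F = H = np.array([[1.0]])
Q = np.array([[0.01]])
R = np.array([[0.25]])
ys = [np.array([v]) for v in [0.1, 0.3, 0.2, 0.5, 0.4]]
print(kalman_filter(ys, F, H, Q, R, np.array([0.0]), np.array([[1.0]])))
</syntaxhighlight>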
More recently, inference in hidden Markov models has also been studied in [[Nonparametric statistics|nonparametric]] settings, where the dependency structure enables [[identifiability]] of the model<ref>{{Cite journal |last1=Gassiat |first1=E. |last2=Cleynen |first2=A. |last3=Robin |first3=S. |date=2016-01-01 |title=Inference in finite state space non parametric Hidden Markov Models and applications |url=https://doi.org/10.1007/s11222-014-9523-8 |journal=Statistics and Computing |language=en |volume=26 |issue=1 |pages=61–71 |doi=10.1007/s11222-014-9523-8 |issn=1573-1375 |url-access=subscription}}</ref> and the limits of learnability are still under exploration.<ref>{{Cite journal |last1=Abraham |first1=Kweku |last2=Gassiat |first2=Elisabeth |last3=Naulet |first3=Zacharie |date=March 2023 |title=Fundamental Limits for Learning Hidden Markov Model Parameters}}</ref>
=== Bayesian modeling of the transition probabilities ===
== Measure theory ==
{{See also|Subshift of finite type}}
[[File:Blackwell_HMM_example.png|thumb|193x193px|The hidden part of a hidden Markov model whose observable states are non-Markovian]]
Given a Markov transition matrix and an invariant distribution on the states, a probability measure can be imposed on the set of subshifts. For example, consider the Markov chain shown in the figure on the states <math>A, B_1, B_2</math>, with invariant distribution <math>\pi = (2/7, 4/7, 1/7)</math>. By ignoring the distinction between <math>B_1</math> and <math>B_2</math>, the space of subshifts on <math>A, B_1, B_2</math> is projected onto a space of subshifts on <math>A, B</math>, and this projection also pushes the probability measure down to a probability measure on the subshifts on <math>A, B</math>.
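To illustrate the pushed-forward measure, the sketch below uses an assumed transition matrix with the stated invariant distribution <math>(2/7, 4/7, 1/7)</math> (not necessarily the matrix in the figure) and computes the probability of short observed words over <math>\{A, B\}</math> by summing the Markov measure over all hidden words that project onto them.

<syntaxhighlight lang="python">
import numpy as np
from itertools import product

# An illustrative transition matrix on (A, B1, B2) whose invariant
# distribution is (2/7, 4/7, 1/7); assumed for this sketch only.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 1.00, 0.00]])
pi = np.array([2/7, 4/7, 1/7])
assert np.allclose(pi @ P, pi)          # pi is indeed invariant for P

label = ['A', 'B', 'B']                 # the projection merges B1 and B2 into B

def word_probability(word):
    """Probability of an observed word over {A, B}: sum the Markov measure
    over every hidden path that projects onto the word."""
    total = 0.0
    for path in product(range(3), repeat=len(word)):
        if any(label[s] != c for s, c in zip(path, word)):
            continue
        p = pi[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a, b]
        total += p
    return total

print(word_probability('AB'), word_probability('BB'))
print(word_probability('ABA') + word_probability('ABB'))  # consistency: equals P('AB')
</syntaxhighlight>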
== See also ==
* [[Sequential dynamical system]]
* [[Stochastic context-free grammar]]
* [[Variable-order Markov model]]
* [[Viterbi algorithm]]
== External links ==
* [https://web.archive.org/web/20120415032315/http://www.ee.washington.edu/research/guptalab/publications/EMbookChenGupta2010.pdf Fitting HMM's with expectation-maximization – complete derivation]
* [http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html A step-by-step tutorial on HMMs] {{Webarchive|url=https://web.archive.org/web/20170813231824/http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html |date=2017-08-13}} ''(University of Leeds)''
* [http://jedlik.phy.bme.hu/~gerjanos/HMM/node2.html Hidden Markov Models] ''(by Narada Warakagoda)''
* Hidden Markov Models: Fundamentals and Applications [http://www.eecis.udel.edu/~lliao/cis841s06/hmmtutorialpart1.pdf Part 1], [http://www.eecis.udel.edu/~lliao/cis841s06/hmmtutorialpart2.pdf Part 2] ''(by V. Petrushin)''