'''Markov decision process''' ('''MDP'''), also called a [[Stochastic dynamic programming|stochastic dynamic program]] or stochastic control problem, is a model for [[sequential decision making]] when [[Outcome (probability)|outcomes]] are uncertain.<ref>{{Cite book |last=Puterman |first=Martin L. |title=Markov decision processes: discrete stochastic dynamic programming |date=1994 |publisher=Wiley |isbn=978-0-471-61977-2 |series=Wiley series in probability and mathematical statistics. Applied probability and statistics section |___location=New York}}</ref>
Originating from [[operations research]] in the 1950s,<ref>{{Cite journal |last=Schneider |first=S. |last2=Wagner |first2=D. H. |date=1957-02-26 |title=Error detection in redundant systems |url=https://dl.acm.org/doi/10.1145/1455567.1455587 |journal=Papers presented at the February 26-28, 1957, western joint computer conference: Techniques for reliability |series=IRE-AIEE-ACM '57 (Western) |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=115–121 |doi=10.1145/1455567.1455587 |isbn=978-1-4503-7861-1}}</ref><ref>{{Cite journal |last=Bellman |first=Richard |date=1958-09-01 |title=Dynamic programming and stochastic control processes |url=https://linkinghub.elsevier.com/retrieve/pii/S0019995858800030 |journal=Information and Control |volume=1 |issue=3 |pages=228–239 |doi=10.1016/S0019-9958(58)80003-0 |issn=0019-9958}}</ref> MDPs have since gained recognition in a variety of fields, including
==Background==