'''Markov decision process''' ('''MDP'''), also called a [[Stochastic dynamic programming|stochastic dynamic program]] or stochastic control problem, is a model for [[sequential decision making]] when [[Outcome (probability)|outcomes]] are uncertain.<ref>{{Cite book |last=Puterman |first=Martin L. |title=Markov decision processes: discrete stochastic dynamic programming |date=1994 |publisher=Wiley |isbn=978-0-471-61977-2 |series=Wiley series in probability and mathematical statistics. Applied probability and statistics section |___location=New York}}</ref>
Originating from [[operations research]] in the 1950s,<ref>{{Cite journal |last=Schneider |first=S. |last2=Wagner |first2=D. H. |date=1957-02-26 |title=Error detection in redundant systems |url=https://dl.acm.org/doi/10.1145/1455567.1455587 |journal=Papers presented at the February 26-28, 1957, western joint computer conference: Techniques for reliability |series=IRE-AIEE-ACM '57 (Western) |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=115–121 |doi=10.1145/1455567.1455587 |isbn=978-1-4503-7861-1}}</ref><ref>{{Cite journal |last=Bellman |first=Richard |date=1958-09-01 |title=Dynamic programming and stochastic control processes |url=https://linkinghub.elsevier.com/retrieve/pii/S0019995858800030 |journal=Information and Control |volume=1 |issue=3 |pages=228–239 |doi=10.1016/S0019-9958(58)80003-0 |issn=0019-9958}}</ref> MDPs have since gained recognition in a variety of fields, including
==Background==