Markov decision process

{{Short description|Mathematical model}}
In mathematics, a '''Markov decision process''' ('''MDP''') is a [[discrete-time]] [[stochastic]] [[Optimal control theory|control]] process. It provides a mathematical framework for modeling [[decision making]] in situations where outcomes are partly [[Randomness#In mathematics|random]] and partly under the control of a decision maker. MDPs are useful for studying [[optimization problem]]s solved via [[dynamic programming]]. MDPs were known at least as early as the 1950s;<ref>{{cite journal|first=R.|last=Bellman|author-link=Richard E. Bellman|url=http://www.iumj.indiana.edu/IUMJ/FULLTEXT/1957/6/56038|title=A Markovian Decision Process|journal=Journal of Mathematics and Mechanics|volume=6|year=1957|issue=5|pages=679–684|jstor=24900506}}</ref> a core body of research on Markov decision processes resulted from [[Ronald A. Howard|Ronald Howard]]'s 1960 book, ''Dynamic Programming and Markov Processes''.<ref>{{cite book|first=Ronald A.|last=Howard|title=Dynamic Programming and Markov Processes|publisher=The M.I.T. Press|year=1960|url=http://web.mit.edu/dimitrib/www/dpchapter.pdf}}</ref> They are used in many disciplines, including [[robotics]], [[automatic control]], [[economics]] and [[manufacturing]]. The name of MDPs comes from the Russian mathematician [[Andrey Markov]] as they are an extension of [[Markov chain]]s.
 
At each time step, the process is in some state <math>s</math>, and the decision maker may choose any action <math>a</math> that is available in state <math>s</math>. The process responds at the next time step by randomly moving into a new state <math>s'</math> and giving the decision maker a corresponding reward <math>R_a(s,s')</math>. The probability of moving into each possible next state <math>s'</math> depends only on the current state <math>s</math> and the chosen action <math>a</math>; this transition probability is written <math>P_a(s,s')</math>.
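
Concretely, one such time step can be simulated as in the following minimal sketch. It assumes a small finite MDP whose transition probabilities <math>P_a(s,s')</math> and rewards <math>R_a(s,s')</math> are stored in plain dictionaries; the names <code>P</code>, <code>R</code>, and <code>step</code> are illustrative rather than standard.

<syntaxhighlight lang="python">
import random

# A toy finite MDP stored in plain dictionaries (illustrative names):
#   P[s][a] -> list of (next_state, probability) pairs, i.e. P_a(s, s')
#   R[a][(s, s_next)] -> reward R_a(s, s') for the transition s -> s' under a
P = {
    "s0": {"go": [("s1", 0.8), ("s0", 0.2)], "stay": [("s0", 1.0)]},
    "s1": {"go": [("s1", 1.0)], "stay": [("s1", 1.0)]},
}
R = {
    "go":   {("s0", "s1"): 1.0, ("s0", "s0"): 0.0, ("s1", "s1"): 0.0},
    "stay": {("s0", "s0"): 0.0, ("s1", "s1"): 0.0},
}

def step(s, a):
    """One MDP time step: sample s' from P_a(s, .) and return it with R_a(s, s')."""
    next_states, probs = zip(*P[s][a])
    s_next = random.choices(next_states, weights=probs, k=1)[0]
    return s_next, R[a][(s, s_next)]

# Taking action "go" in state "s0" moves to "s1" with probability 0.8
# (reward 1.0) and stays in "s0" with probability 0.2 (reward 0.0).
s_next, reward = step("s0", "go")
print(s_next, reward)
</syntaxhighlight>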