Markov decision process
==Algorithms==
 
Solutions for MDPs with finite state and action spaces may be found through a variety of methods such as [[dynamic programming]]. The algorithms in this section apply to MDPs with finite state and action spaces and explicitly given transition probabilities and reward functions, but the basic concepts may be extended to handle other problem classes, for example using [[function approximation]]. Also, some processes with countably infinite state and action spaces can be ''exactly'' reduced to ones with finite state and action spaces.<ref name="Wrobel 1984">{{cite journal|first=A.|last=Wrobel|title=On Markovian decision models with a finite skeleton|journal=Zeitschrift für Operations Research|date=1984|volume=28|issue=1|pages=17–27|doi=10.1007/bf01919083|s2cid=2545336}}</ref>
 
The standard family of algorithms to calculate optimal policies for finite state and action MDPs requires storage for two arrays indexed by state: ''value'' <math>V</math>, which contains real values, and ''policy'' <math>\pi</math>, which contains actions. At the end of the algorithm, <math>\pi</math> will contain the solution and <math>V(s)</math> will contain the discounted sum of the rewards to be earned (on average) by following that solution from state <math>s</math>.
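The interplay between the two arrays can be sketched with value iteration, one member of this family, on a hypothetical two-state, two-action MDP (the transition probabilities, rewards, and discount factor below are purely illustrative):

```python
# Hypothetical toy MDP, for illustration only:
# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the reward.
P = {
    0: {0: [(1.0, 0)], 1: [(0.7, 1), (0.3, 0)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {0: {0: 0.0, 1: 5.0}, 1: {0: 1.0, 1: 2.0}}
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality update to V
# until it stops changing, then read the greedy policy pi off the final V.
V = {s: 0.0 for s in P}
for _ in range(1000):
    delta = 0.0
    for s in P:
        # Q-value of each action: immediate reward plus discounted
        # expected value of the successor state.
        q = {a: R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
             for a in P[s]}
        best = max(q.values())
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-10:  # converged to (numerical) fixed point
        break

# pi(s): the action maximizing the one-step lookahead on the converged V.
pi = {s: max(P[s], key=lambda a: R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a]))
      for s in P}
```

After convergence, `V[s]` holds the expected discounted return from state `s` under the optimal policy, and `pi` holds that policy, mirroring the two arrays described above.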