Markov decision process
==Algorithms==
 
Solutions for MDPs with finite state and action spaces may be found through a variety of methods such as [[dynamic programming]]. The algorithms in this section apply to MDPs with finite state and action spaces and explicitly given transition probabilities and reward functions, but the basic concepts may be extended to handle other problem classes, for example using [[function approximation]]. Also, some processes with countably infinite state and action spaces can be ''exactly'' reduced to ones with finite state and action spaces.<ref name="Wrobel 1984">{{cite journal|first=A.|last=Wrobel|title=On Markovian decision models with a finite skeleton|journal=Zeitschrift für Operations Research|date=1984|volume=28|issue=1|pages=17–27|doi=10.1007/bf01919083|s2cid=2545336}}</ref>
 
The standard family of algorithms to calculate optimal policies for finite state and action MDPs requires storage for two arrays indexed by state: ''value'' <math>V</math>, which contains real values, and ''policy'' <math>\pi</math>, which contains actions. At the end of the algorithm, <math>\pi</math> will contain the solution and <math>V(s)</math> will contain the discounted sum of the rewards to be earned (on average) by following that solution from state <math>s</math>.
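The interplay between the two arrays can be sketched with value iteration, one member of this family, on a hypothetical two-state, two-action MDP (the transition probabilities, rewards, and discount factor below are purely illustrative):

```python
# Hypothetical toy MDP, for illustration only:
# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the reward.
P = {
    0: {0: [(1.0, 0)], 1: [(0.7, 1), (0.3, 0)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {0: {0: 0.0, 1: 5.0}, 1: {0: 1.0, 1: 2.0}}
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality update to V
# until it stops changing, then read the greedy policy pi off the final V.
V = {s: 0.0 for s in P}
for _ in range(1000):
    delta = 0.0
    for s in P:
        # Q-value of each action: immediate reward plus discounted
        # expected value of the successor state.
        q = {a: R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
             for a in P[s]}
        best = max(q.values())
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-10:  # converged to (numerical) fixed point
        break

# pi(s): the action maximizing the one-step lookahead on the converged V.
pi = {s: max(P[s], key=lambda a: R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a]))
      for s in P}
```

After convergence, `V[s]` holds the expected discounted return from state `s` under the optimal policy, and `pi` holds that policy, mirroring the two arrays described above.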