Markov decision process

 
While this function is also unknown, experience during learning is based on <math>(s, a)</math> pairs (together with the outcome <math>s'</math>; that is, "I was in state <math>s</math> and I tried doing <math>a</math> and <math>s'</math> happened"). Thus, one has an array <math>Q</math> and uses experience to update it directly. This is known as [[Q-learning]].
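As a rough illustration (not part of the original formulation), the tabular update just described might be sketched as follows; the simulator interface (<code>reset</code>/<code>step</code>), the state and action sets, and the step-size, discount, and exploration parameters are assumptions made for the example:

<syntaxhighlight lang="python">
import random

def q_learning(simulator, states, actions, episodes=1000,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn Q(s, a) directly from observed (s, a, s', r) experience."""
    # The array Q, indexed by state-action pairs, is all that is stored.
    Q = {(s, a): 0.0 for s in states for a in actions}

    for _ in range(episodes):
        s = simulator.reset()              # restart from an initial state
        done = False
        while not done:
            # epsilon-greedy choice of the action to try in state s
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s_next, r, done = simulator.step(s, a)   # "I tried a and s' happened"
            # move Q[s, a] toward the observed reward plus the best estimated
            # value of the successor state
            best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
</syntaxhighlight>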
 
Reinforcement learning can solve Markov decision processes without explicit specification of the transition probabilities, which value and policy iteration require. Instead, the transition probabilities are accessed through a simulator that is typically restarted many times from a uniformly random initial state. Reinforcement learning can also be combined with function approximation to address problems with a very large number of states.
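A minimal sketch of such a combination with a linear function approximator is given below; the feature map <code>phi</code>, the simulator interface, and all hyperparameters are assumptions of the example rather than part of any particular method described above:

<syntaxhighlight lang="python">
import random

def linear_q_learning(simulator, phi, actions, n_features,
                      episodes=1000, alpha=0.01, gamma=0.95, epsilon=0.1):
    """Q-learning where Q(s, a) is approximated by the dot product w . phi(s, a)."""
    w = [0.0] * n_features                       # weight vector replaces the Q table

    def q(s, a):
        return sum(wi * fi for wi, fi in zip(w, phi(s, a)))

    for _ in range(episodes):
        s = simulator.reset()                    # e.g. a uniformly random initial state
        done = False
        while not done:
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: q(s, b))
            s_next, r, done = simulator.step(s, a)
            target = r if done else r + gamma * max(q(s_next, b) for b in actions)
            td_error = target - q(s, a)
            features = phi(s, a)
            for i in range(n_features):          # gradient step on the weights
                w[i] += alpha * td_error * features[i]
            s = s_next
    return w
</syntaxhighlight>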
 
==Other scopes==