Markov decision process

 
While this function is also unknown, experience during learning is based on <math>(s, a)</math> pairs (together with the outcome <math>s'</math>; that is, "I was in state <math>s</math> and I tried doing <math>a</math> and <math>s'</math> happened"). Thus, one has an array <math>Q</math> and uses experience to update it directly. This is known as [[Q-learning]].
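As a rough illustration (not part of the original formulation), the tabular update just described might be sketched as follows; the simulator interface (<code>reset</code>/<code>step</code>), the state and action sets, and the step-size, discount, and exploration parameters are assumptions made for the example:

<syntaxhighlight lang="python">
import random

def q_learning(simulator, states, actions, episodes=1000,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn Q(s, a) directly from observed (s, a, s', r) experience."""
    # The array Q, indexed by state-action pairs, is all that is stored.
    Q = {(s, a): 0.0 for s in states for a in actions}

    for _ in range(episodes):
        s = simulator.reset()              # restart from an initial state
        done = False
        while not done:
            # epsilon-greedy choice of the action to try in state s
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s_next, r, done = simulator.step(s, a)   # "I tried a and s' happened"
            # move Q[s, a] toward the observed reward plus the best estimated
            # value of the successor state
            best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
</syntaxhighlight>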
 
Reinforcement learning can solve Markov decision processes without explicit specification of the transition probabilities, which value and policy iteration require. Instead, the transition probabilities are accessed through a simulator that is typically restarted many times from a uniformly random initial state. Reinforcement learning can also be combined with function approximation to address problems with a very large number of states.
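A minimal sketch of such a combination with a linear function approximator is given below; the feature map <code>phi</code>, the simulator interface, and all hyperparameters are assumptions of the example rather than part of any particular method described above:

<syntaxhighlight lang="python">
import random

def linear_q_learning(simulator, phi, actions, n_features,
                      episodes=1000, alpha=0.01, gamma=0.95, epsilon=0.1):
    """Q-learning where Q(s, a) is approximated by the dot product w . phi(s, a)."""
    w = [0.0] * n_features                       # weight vector replaces the Q table

    def q(s, a):
        return sum(wi * fi for wi, fi in zip(w, phi(s, a)))

    for _ in range(episodes):
        s = simulator.reset()                    # e.g. a uniformly random initial state
        done = False
        while not done:
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: q(s, b))
            s_next, r, done = simulator.step(s, a)
            target = r if done else r + gamma * max(q(s_next, b) for b in actions)
            td_error = target - q(s, a)
            features = phi(s, a)
            for i in range(n_features):          # gradient step on the weights
                w[i] += alpha * td_error * features[i]
            s = s_next
    return w
</syntaxhighlight>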
 
==Other scopes==