Added the concept of <math>H</math>-step return, which is common in learning theory
Line 192:
While this function is also unknown, experience during learning is based on <math>(s, a)</math> pairs (together with the outcome <math>s'</math>; that is, "I was in state <math>s</math> and I tried doing <math>a</math> and <math>s'</math> happened"). Thus, one has an array <math>Q</math> and uses experience to update it directly. This is known as [[Q-learning]].
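As a rough illustration of updating the array <math>Q</math> directly from experience, the following is a minimal sketch of one tabular Q-learning step. The reward <code>r</code>, the step size <code>alpha</code>, the discount factor <code>gamma</code>, and the helper name <code>q_update</code> do not appear in the text above and are assumptions made for the example.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update for the experience (s, a, r, s_next).

    alpha (step size) and gamma (discount factor) are illustrative defaults,
    not values taken from the article.
    """
    # Value of the best action available in the observed next state.
    best_next = max(Q[(s_next, b)] for b in actions)
    # Move Q(s, a) part of the way toward the one-step target.
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical two-action problem; unseen (state, action) pairs start at 0.
Q = defaultdict(float)
actions = ["left", "right"]

# "I was in state 0, tried 'right', received reward 1.0, and state 1 happened."
first = q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
second = q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
```

Each call uses only the observed experience and the current contents of <code>Q</code>; no model of the transition probabilities is consulted, which is the point made in the paragraph above.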
==Other scopes==