Tag: Reverted
=== Reinforcement learning ===
[[Reinforcement learning]] is a process in which an agent learns to make decisions through trial and error. The problem is often modeled mathematically as a [[Markov decision process]] (MDP), in which the agent at every timestep is in a state <math>s</math>, takes an action <math>a</math>, receives a scalar reward, and transitions to the next state <math>s'</math> according to the environment dynamics <math>p(s'|s, a)</math>. The agent attempts to learn a policy <math>\pi(a|s)</math>, a mapping from states to actions, that maximizes its expected return (the expected sum of rewards). In reinforcement learning, as opposed to [[optimal control]], the algorithm has access to the dynamics <math>p(s'|s, a)</math> only through sampling.
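The sampling-only setting can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the text above: a hypothetical five-state chain environment, a reward of 1 for reaching the last state, a 10% chance that an action is reversed, a discount factor, and tabular Q-learning as one possible learning algorithm among many. The learner never reads the transition probabilities; it only calls <code>step</code>, which draws a sample from <math>p(s'|s, a)</math>.

```python
import random

# Hypothetical toy MDP (illustrative assumption): a chain of states 0..4.
N_STATES = 5          # state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
SLIP = 0.1            # probability that the chosen action is reversed


def step(s, a, rng):
    """Draw one sample (s', r, done) from the dynamics p(s'|s, a)."""
    if rng.random() < SLIP:
        a = 1 - a                          # the environment reverses the action
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def q_learning(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning that uses only sampled transitions from step()."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:     # explore with probability epsilon
                a = rng.choice(ACTIONS)
            else:                          # otherwise act greedily, ties broken at random
                best = max(Q[s])
                a = rng.choice([b for b in ACTIONS if Q[s][b] == best])
            s2, r, done = step(s, a, rng)
            # Bootstrapped target; no bootstrap term once the episode has ended.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


def greedy_policy(Q):
    """Deterministic policy: the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The greedy policy read off the learned table plays the role of <math>\pi(a|s)</math> here. An optimal-control method would instead be given <code>SLIP</code> and the transition rule directly and could plan without interacting with the environment.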