Deep reinforcement learning

 
=== Reinforcement learning ===
[[File:Markov_diagram_v2.svg|alt=Diagram explaining the loop recurring in reinforcement learning algorithms|thumb|Diagram of the loop recurring in reinforcement learning algorithms]]
[[Reinforcement learning]] is a process in which an agent learns to make decisions through trial and error. This problem is often modeled mathematically as a [[Markov decision process]] (MDP), where an agent at every timestep is in a state <math>s</math>, takes an action <math>a</math>, receives a scalar reward, and transitions to the next state <math>s'</math> according to the environment dynamics <math>p(s'|s, a)</math>. The agent attempts to learn a policy <math>\pi(a|s)</math>, or mapping from observations to actions, in order to maximize its return (the expected sum of rewards). In reinforcement learning, as opposed to [[optimal control]], the algorithm has access to the dynamics <math>p(s'|s, a)</math> only through sampling.
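The loop described above can be illustrated with a minimal sketch. It is not drawn from any particular source. The environment <code>ChainMDP</code> is a hypothetical five-state chain with a "slip" probability, and it exposes its dynamics <math>p(s'|s, a)</math> only through sampled calls to <code>step</code>. Tabular [[Q-learning]] (<code>q_learning</code>, also hypothetical) is used here as one illustrative algorithm. The discount factor <math>\gamma</math> is an added assumption, so the quantity maximized is the discounted sum of rewards. No neural network is involved; deep reinforcement learning would replace the table <code>q</code> with a function approximator.

```python
import random


class ChainMDP:
    """Hypothetical toy MDP: states 0..n-1 on a line, actions 0 = left, 1 = right.
    Reaching the last state gives reward 1 and ends the episode.
    With probability `slip`, the chosen action is replaced by the opposite one."""

    def __init__(self, n_states=5, slip=0.1, seed=0):
        self.n_states = n_states
        self.slip = slip
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Sample s' ~ p(s'|s, a); the agent never sees these probabilities directly.
        if self.rng.random() < self.slip:
            action = 1 - action
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1,
               seed=0, max_steps=100):
    """Tabular Q-learning that learns only from sampled transitions (s, a, r, s')."""
    rng = random.Random(seed)
    n_actions = 2
    q = [[0.0] * n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(n_actions) if q[s][i] == best])
            s_next, r, done = env.step(a)
            # Bootstrapped target; terminal states contribute no future value.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


def greedy_policy(q):
    """Deterministic policy pi(s) = argmax_a Q(s, a)."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]
```

In this toy setting, <code>greedy_policy(q_learning(ChainMDP()))</code> is expected to choose action 1 (move right) in every non-terminal state, because the only reward lies at the right end of the chain. The entry for the terminal state is never updated and carries no meaning.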