Revision as of 00:59, 6 December 2023 by Tsesea (→Algorithms; tag: Reverted), compared with revision as of 03:30, 6 December 2023 by Tsesea (→Overview; tag: Reverted)
=== Reinforcement learning ===
[[File:Concept of Reinforcement Learning.jpg|alt=Diagram explaining the loop recurring in reinforcement learning algorithms|thumb|Diagram of the loop recurring in reinforcement learning algorithms]]
[[Reinforcement learning]] is a process in which an agent learns to make decisions through trial and error. This problem is often modeled mathematically as a [[Markov decision process]] (MDP), where at every timestep an agent is in a state <math>s</math>, takes an action <math>a</math>, receives a scalar reward, and transitions to the next state <math>s'</math> according to the environment dynamics <math>p(s'|s, a)</math>. The agent attempts to learn a policy <math>\pi(a|s)</math>, a mapping from observations to actions, in order to maximize its return (the expected sum of rewards). In reinforcement learning (as opposed to [[optimal control]]), the algorithm has access to the dynamics <math>p(s'|s, a)</math> only through sampling.
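The agent-environment loop described above can be illustrated with a minimal sketch. The two-state MDP, its transition probabilities, the reward rule, and all function names below are hypothetical and chosen only for illustration; they are not taken from any particular library or from the article's sources. The agent never reads the transition table directly and only observes sampled transitions, and the return of a policy is estimated by averaging the sum of rewards over many sampled episodes.

```python
import random

# Hypothetical two-state, two-action MDP, invented for illustration.
# P[s][a] lists (next_state, probability) pairs, i.e. p(s'|s, a).
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.7), (1, 0.3)], 1: [(0, 0.1), (1, 0.9)]},
}


def reward(s, a, s_next):
    """Scalar reward: 1 for arriving in state 1, else 0 (an assumption)."""
    return 1.0 if s_next == 1 else 0.0


def step(s, a, rng):
    """Sample s' ~ p(.|s, a); the agent sees only this sample, never P."""
    next_states, probs = zip(*P[s][a])
    s_next = rng.choices(next_states, weights=probs, k=1)[0]
    return s_next, reward(s, a, s_next)


def random_policy(s, rng):
    """A uniform stochastic policy pi(a|s) over the two actions."""
    return rng.choice([0, 1])


def rollout(policy_fn, horizon, rng, s0=0):
    """Run the state -> action -> reward -> next-state loop once."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy_fn(s, rng)
        s, r = step(s, a, rng)
        total += r
    return total


def estimate_return(policy_fn, horizon=20, episodes=2000, seed=0):
    """Monte Carlo estimate of the expected sum of rewards."""
    rng = random.Random(seed)
    totals = [rollout(policy_fn, horizon, rng) for _ in range(episodes)]
    return sum(totals) / episodes
```

Calling <code>estimate_return(random_policy)</code> gives a sampled estimate of the return for the uniform policy. Comparing such estimates across candidate policies is the basic signal a learning algorithm would use to improve <math>\pi(a|s)</math>.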

Deep reinforcement learning: Difference between revisions