Deep reinforcement learning

 
In '''model-free''' deep reinforcement learning algorithms, a policy <math>\pi(a|s)</math> is learned without explicitly modeling the forward dynamics. The policy can be optimized to maximize returns by directly estimating the policy gradient,<ref name="williams1992"/> but this estimate suffers from high variance, making it impractical for use with function approximation in deep RL. Subsequent algorithms have been developed for more stable learning and are widely applied.<ref name="schulman2015trpo"/><ref name="schulman2017ppo"/> Another class of model-free deep reinforcement learning algorithms relies on [[dynamic programming]], inspired by [[temporal difference learning]] and [[Q-learning]]. In discrete action spaces, these algorithms usually learn a neural network Q-function <math>Q(s, a)</math> that estimates the future returns from taking action <math>a</math> in state <math>s</math>.<ref name="DQN1"/> In continuous action spaces, these algorithms often learn both a value estimate and a policy.<ref name="lillicrap2015ddpg"/><ref name="mnih2016a3c"/><ref name="haarnoja2018sac"/>
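
The following is a minimal sketch, not taken from the cited works, of the kind of temporal-difference update used to train a neural network Q-function in a discrete action space. It uses [[PyTorch]]; the state dimension, number of actions, network size, hyperparameters, and the randomly generated batch of transitions are illustrative assumptions standing in for data from a replay buffer.

<syntaxhighlight lang="python">
# Minimal DQN-style temporal-difference update (illustrative sketch).
# All dimensions and hyperparameters below are assumed for the example.
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99

# Neural network Q-function Q(s, a): one output per discrete action.
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # frozen copy used for stable targets

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# Placeholder batch of transitions (s, a, r, s', done); in practice these
# would be sampled from a replay buffer of environment interactions.
batch = 32
s = torch.randn(batch, state_dim)
a = torch.randint(0, n_actions, (batch,))
r = torch.randn(batch)
s_next = torch.randn(batch, state_dim)
done = torch.zeros(batch)

# Temporal-difference target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
with torch.no_grad():
    target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values

# Q(s, a) for the actions actually taken, and the squared TD error as the loss.
q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
</syntaxhighlight>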
 
 
{| class="wikitable sortable" style="font-size: 96%;"