{| class="wikitable"
! Algorithm !! Model !! Policy !! Action space !! State space !! Operator
|-
|Distributional Soft Actor-Critic ||Model-free ||Off-policy ||Continuous ||Continuous ||Value distribution
|}
 
It was once assumed that deep reinforcement learning (DRL) would follow naturally from combining tabular RL with deep neural networks, and that its design would be a trivial task. In practice, DRL is fundamentally complicated because it inherits serious challenges from both reinforcement learning and deep learning. Some of these challenges, including non-i.i.d. sequential data, easy divergence, value overestimation, and sample inefficiency, can be particularly destructive if not properly addressed. A number of empirical but useful tricks have been proposed to address these issues, and they form the basis of many advanced DRL algorithms. These tricks include experience replay (ExR), parallel exploration (PEx), separated target network (STN), delayed policy update (DPU), constrained policy update (CPU), clipped actor criterion (CAC), double Q-functions (DQF), bounded double Q-functions (BDQ), distributional return function (DRF), entropy regularization (EnR), and soft value function (SVF).
[[File:Challenges and Tricks of Deep RL.jpg|thumb|Challenges and tricks in deep reinforcement learning algorithms]]
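As an illustration, two of these tricks, experience replay (ExR) and the separated target network (STN), are commonly combined in DQN-style value learning. The following minimal sketch is an illustration only: the network sizes, buffer capacity, and helper names such as <code>store</code> and <code>sync_target</code> are arbitrary choices for the example, not part of any particular algorithm's specification. It shows how a replay buffer supplies near-i.i.d. minibatches while a slowly synchronized target network stabilizes the bootstrapped targets.

<syntaxhighlight lang="python">
# Minimal sketch of experience replay (ExR) and a separated target network
# (STN) in a DQN-style update. Dimensions and hyperparameters are
# placeholders chosen for the example only.
import random
from collections import deque

import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())   # STN: a frozen copy of the online network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

replay = deque(maxlen=10_000)                     # ExR: buffer of past transitions


def store(s, a, r, s_next, done):
    """Append one transition to the replay buffer."""
    replay.append((s, a, r, s_next, done))


def update(batch_size=32):
    """One gradient step on a random minibatch drawn from the buffer."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)     # ExR: break temporal correlation
    s, a, r, s_next, done = map(torch.tensor, zip(*batch))
    s, s_next = s.float(), s_next.float()

    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                         # STN: bootstrap from the slow target network
        q_next = target_net(s_next).max(dim=1).values
        target = r.float() + gamma * (1 - done.float()) * q_next

    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def sync_target():
    """Refresh the target network, typically every few hundred updates."""
    target_net.load_state_dict(q_net.state_dict())
</syntaxhighlight>

Sampling uniformly from the buffer makes consecutive gradient steps less correlated than learning directly from the sequential stream of experience, while updating the target network only occasionally keeps the regression target from shifting at every step.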
 
== Research ==