== Algorithms ==
[[File:Challenges and Tricks of Deep RL.jpg|thumb|Challenges and tricks in deep reinforcement learning algorithms <ref name="Li-2023"/>]]
It was once believed that deep reinforcement learning (DRL) would emerge naturally from combining tabular RL with deep neural networks, and that its design would be a trivial task. In practice, deep reinforcement learning is fundamentally complicated because it inherits serious challenges from both reinforcement learning and deep learning. Some of these challenges, including non-i.i.d. sequential data, easy divergence, overestimation, and sample inefficiency, yield particularly destructive outcomes if they are not treated properly. A number of empirical but useful tricks have been proposed to address these issues, and they form the basis of many advanced DRL algorithms. These tricks include experience replay (ExR), parallel exploration (PEx), separated target network (STN), delayed policy update (DPU), constrained policy update (CPU), clipped actor criterion (CAC), double Q-functions (DQF), bounded double Q-functions (BDQ), distributional return function (DRF), entropy regularization (EnR), and soft value function (SVF).<ref name="Li-2023">{{cite book |last1=Li |first1=Shengbo |title=Reinforcement Learning for Sequential Decision and Optimal Control |date=2023 |publisher=Springer |___location=Singapore |isbn=978-9-811-97783-1 |pages=1–460 |doi=10.1007/978-981-19-7784-8 |s2cid=257928563 |edition=First |url=https://link.springer.com/book/10.1007/978-981-19-7784-8}}</ref>
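
For illustration, the following is a minimal sketch, not taken from the cited source, of how two of these tricks, experience replay (ExR) and a separated target network (STN), can be combined in a simple Q-learning update. The linear Q-function, hyperparameters, and function names are illustrative assumptions chosen only for this example.

<syntaxhighlight lang="python">
# Illustrative sketch (not from the cited source): experience replay (ExR) and a
# separated target network (STN) for a toy Q-learning agent with a linear
# Q-function. All names and hyperparameters are hypothetical choices.
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS = 4, 2
GAMMA, LR, BATCH, SYNC_EVERY = 0.99, 1e-2, 32, 100

rng = np.random.default_rng(0)
online_w = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))  # online Q-weights
target_w = online_w.copy()                                      # separated target network
replay = deque(maxlen=10_000)                                   # experience replay buffer


def q_values(w, s):
    """Linear Q-function: one value per action for state s."""
    return w @ s


def store(s, a, r, s_next, done):
    """Append one transition; the oldest transitions are evicted automatically."""
    replay.append((s, a, r, s_next, done))


def train_step(step):
    """Sample a minibatch uniformly (breaking temporal correlation in the data)
    and update the online weights toward targets from the frozen target network."""
    global target_w
    if len(replay) < BATCH:
        return
    for s, a, r, s_next, done in random.sample(list(replay), BATCH):
        target = r if done else r + GAMMA * q_values(target_w, s_next).max()
        td_error = target - q_values(online_w, s)[a]
        online_w[a] += LR * td_error * s          # semi-gradient TD update
    if step % SYNC_EVERY == 0:
        target_w = online_w.copy()                # periodic sync of target network
</syntaxhighlight>

Sampling uniformly from the replay buffer decorrelates consecutive transitions, while keeping the target weights fixed between periodic syncs stabilizes the bootstrapped targets; both measures address the non-i.i.d. data and easy-divergence issues mentioned above.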