Deep reinforcement learning

{{Short description|Machine learning that combines deep learning and reinforcement learning}}
{{Machine learning}}
'''Deep reinforcement learning''' ('''deep RL''') is a subfield of [[machine learning]] that combines [[reinforcement learning]] (RL) and [[deep learning]]. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the [[state space]].<ref name="link.springer.com">{{cite book |last1=Li |first1=Shengbo Eben |title=Reinforcement Learning for Sequential Decision and Optimal Control |date=2023 |___location=Springer Verlag, Singapore |isbn=978-9-811-97783-1 |pages=1–460 |doi=10.1007/978-981-19-7784-8 |s2cid=257928563 |edition=First |url=https://link.springer.com/book/10.1007/978-981-19-7784-8}}</ref> Deep RL algorithms are able to take in very large inputs (e.g. every pixel rendered to the screen in a video game) and decide what actions to perform to optimize an objective (e.g. maximizing the game score). Deep reinforcement learning has been used for a diverse set of applications, including [[robotics]], [[video game]]s, [[natural language processing]], [[computer vision]], education, transportation, finance, and [[Health care|healthcare]].<ref name="francoislavet2018"/>
 
== Overview ==
=== Reinforcement learning ===
[[File:Markov_diagram_v2.svg|alt=Diagram explaining the loop recurring in reinforcement learning algorithms|thumb|Diagram of the loop recurring in reinforcement learning algorithms]]
[[Reinforcement learning]] is a process in which an agent learns to make decisions through trial and error. The problem is often modeled mathematically as a [[Markov decision process]] (MDP): at every timestep the agent is in a state <math>s</math>, takes an action <math>a</math>, receives a scalar reward, and transitions to the next state <math>s'</math> according to the environment dynamics <math>p(s'|s, a)</math>. The agent attempts to learn a policy <math>\pi(a|s)</math>, a mapping from states to actions, in order to maximize its return (the expected sum of rewards). In reinforcement learning (as opposed to [[optimal control]]) the algorithm only has access to the dynamics <math>p(s'|s, a)</math> through sampling.<ref name="link.springer.com"/>
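The agent–environment loop above can be sketched with tabular Q-learning on a toy MDP. This is a minimal illustration, not any particular published algorithm; the two-state environment, its transition probabilities, and all hyperparameter values are invented for the example. Note that the agent only ever samples from <code>transition</code>; it never reads the underlying probabilities, matching the sampling-only access described above.

```python
import random

random.seed(0)  # make the illustrative run reproducible

# A made-up 2-state, 2-action MDP. transition(s, a) returns
# (next_state, reward); the agent sees only these samples, never p(s'|s, a).
def transition(state, action):
    if state == 0:
        if action == 1:  # "advance": reaches state 1 with probability 0.8
            return (1, 1.0) if random.random() < 0.8 else (0, 0.0)
        return 0, 0.0    # "stay": no reward
    return 0, 0.0        # state 1 resets to state 0 with no reward

# Q[s][a] estimates the return from taking action a in state s,
# then acting greedily afterwards.
Q = [[0.0, 0.0], [0.0, 0.0]]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # step size, discount, exploration rate

state = 0
for step in range(5000):
    # epsilon-greedy policy pi(a|s): mostly exploit, occasionally explore
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max((0, 1), key=lambda a: Q[state][a])
    next_state, reward = transition(state, action)
    # One-step temporal-difference update toward r + gamma * max_a' Q(s', a')
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])
    state = next_state

# After training, "advance" should be valued above "stay" in state 0.
```

After enough samples the learned values reflect the environment dynamics even though the agent never saw <math>p(s'|s, a)</math> directly, which is the essential distinction from optimal control drawn above.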
 
=== Deep reinforcement learning ===