User:ZachsGenericUsername/sandbox/Deep reinforcement learning: Difference between revisions
Deep reinforcement learning has been applied to a variety of problems, including:
* The [[AlphaZero]] algorithm, developed by [[DeepMind]], which has achieved superhuman performance in games such as [[chess]], [[shogi]], and [[Go (game)|Go]].
* Image enhancement models such as [[Generative adversarial network|GANs]] and [[U-Net]], which have achieved much higher performance than earlier methods on tasks such as [[Super-resolution imaging|super-resolution]] and [[Image segmentation|segmentation]].<ref>{{Cite book|url=https://www.worldcat.org/oclc/1163522253|title=Deep reinforcement learning fundamentals, research and applications|date=2020|publisher=Springer|others=Dong, Hao., Ding, Zihan., Zhang, Shanghang.|isbn=978-981-15-4095-0|location=Singapore|oclc=1163522253}}</ref>
* [[Deep Q-learning|Deep Q-networks]], [[Model-free (reinforcement learning)|model-free]] learning algorithms that analyze a situation and produce the action the agent should take.
== Training ==
==== Q-Learning ====
[[Q-learning]] attempts to determine the optimal action for a given state. It does so by estimating a Q value, or quality of an action, which can be loosely defined as a function {{Math|''Q''(''s'', ''a'')}} that takes in a state ''s'' and an action ''a'' and outputs the perceived quality of taking that action in that state.
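The Q function can be illustrated with a minimal tabular sketch in Python. It assumes the standard Q-learning update rule, in which the estimate for (''s'', ''a'') is moved a fraction of the way toward the observed reward ''r'' plus the discounted best Q value of the next state. The function name <code>q_update</code>, the learning rate <code>alpha</code>, the discount factor <code>gamma</code>, and the toy state and action names are hypothetical choices for illustration, not taken from the cited sources.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, done=False, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update for the transition (s, a, r, s_next).

    Q maps (state, action) pairs to estimated quality.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Best estimated quality available from the next state (zero if the episode ended)
    best_next = 0.0 if done else max(Q[(s_next, a_next)] for a_next in actions)
    # Target: immediate reward plus discounted estimate of future quality
    target = r + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Usage with hypothetical states "s0"/"s1" and actions "left"/"right"
Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
print(q_update(Q, "s0", "right", 1.0, "s1", actions))  # 0.5
print(q_update(Q, "s0", "right", 1.0, "s1", actions))  # 0.75
```

Repeating the same transition moves the estimate progressively closer to the target, which is why a table of Q values converges only if each state-action pair is visited many times.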
==== Deep Q-Learning ====
[[Deep Q-learning]] takes the principles of standard Q-learning but approximates the Q values with an [[artificial neural network]] instead of storing them in a table.<ref>https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/</ref>
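As a rough illustration of replacing the Q table with a network, the following pure-Python sketch uses a hypothetical one-hidden-layer <code>TinyQNetwork</code> that maps a state vector to one Q value per action, and takes a gradient step that moves ''Q''(''s'', ''a'') toward the same target used in tabular Q-learning. It is a deliberate simplification: practical deep Q-networks add components such as experience replay and a separate target network, which are omitted here, and the layer sizes, learning rate and initialization are arbitrary assumptions.

```python
import math
import random


class TinyQNetwork:
    """Hypothetical one-hidden-layer network: state vector in, one Q value per action out."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        # Small random weights; biases start at zero
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def _hidden(self, state):
        # tanh activations of the hidden layer
        return [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                for row, b in zip(self.w1, self.b1)]

    def q_values(self, state):
        # Linear output layer: one Q estimate per action
        h = self._hidden(state)
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def train_step(self, s, a, r, s_next, done, gamma=0.9, lr=0.05):
        """One gradient step on 0.5 * (Q(s, a) - target)**2; returns the loss before the step."""
        # Target as in tabular Q-learning; no separate target network in this sketch
        target = r if done else r + gamma * max(self.q_values(s_next))
        h = self._hidden(s)
        q_sa = sum(w * hj for w, hj in zip(self.w2[a], h)) + self.b2[a]
        error = q_sa - target  # derivative of the loss with respect to Q(s, a)
        # Hidden-layer gradients use the output weights before they are updated
        for j, hj in enumerate(h):
            grad_pre = error * self.w2[a][j] * (1.0 - hj * hj)  # tanh'(z) = 1 - tanh(z)**2
            for i, x in enumerate(s):
                self.w1[j][i] -= lr * grad_pre * x
            self.b1[j] -= lr * grad_pre
        # Only the output unit for the chosen action receives a gradient
        for j, hj in enumerate(h):
            self.w2[a][j] -= lr * error * hj
        self.b2[a] -= lr * error
        return 0.5 * error * error


# Usage: repeatedly train on one terminal transition with reward 1.0
net = TinyQNetwork(n_inputs=2, n_hidden=8, n_actions=2)
state = [1.0, 0.0]
for _ in range(300):
    net.train_step(state, a=0, r=1.0, s_next=state, done=True)
print(net.q_values(state)[0])  # close to 1.0
```

Because the network shares its weights across all states, an update for one state also changes the Q values predicted for similar states, which is what lets this approach generalize where a table cannot.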
==== Exploration-exploitation dilemma ====
The exploration-exploitation dilemma is the problem of deciding whether to pursue actions that are already known to yield high reward (exploitation) or to try other actions in order to discover ones that may yield even greater reward (exploration).
Under a greedy policy, the agent always chooses the action that maximizes the Q value in the current state:
:<math>a = \underset{a'}{\operatorname{arg\,max}}\, Q(s, a')</math>
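The greedy rule can be written directly as an argmax over the Q values of the current state. The sketch below also shows ε-greedy selection, a common (but not the only) way of handling the exploration-exploitation trade-off, which this section does not itself describe: with probability ε the agent picks an action uniformly at random, otherwise it acts greedily. The function names and the example Q values are hypothetical.

```python
import random


def greedy_action(q_values):
    """Index of the action with the largest Q value; ties go to the lowest index."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        # Exploration: any action, chosen uniformly at random
        return rng.randrange(len(q_values))
    # Exploitation: the action currently believed to be best
    return greedy_action(q_values)


q = [0.2, 1.5, -0.3]  # hypothetical Q values for three actions in state s
print(greedy_action(q))               # 1
print(epsilon_greedy_action(q, 0.0))  # 1, since epsilon = 0 never explores
```

Setting ε = 1 gives purely random behaviour, while ε = 0 reduces to the greedy policy above; decaying ε over the course of training is a common compromise between the two.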
<nowiki>https://search-proquest-com.proxy.library.ucsb.edu:9443/docview/1136383063?accountid=14522</nowiki>