Deep reinforcement learning has been used for a variety of applications in the past, some of which include:
 
=== Video Games ===
The [[AlphaZero]] algorithm, developed by [[DeepMind]], has achieved superhuman performance in many games.
 
=== Image Enhancement ===
Image enhancement models, such as GANs and U-Net, have attained much higher performance than previous methods such as [[Super-resolution imaging|super-resolution]] and segmentation.<ref>{{Cite book|url=https://www.worldcat.org/oclc/1163522253|title=Deep reinforcement learning fundamentals, research and applications|date=2020|publisher=Springer|others=Dong, Hao., Ding, Zihan., Zhang, Shanghang.|isbn=978-981-15-4095-0|___location=Singapore|oclc=1163522253}}</ref>
 
 
 
== Training ==
 
==== Q-Learning ====
[[Deep Q-learning|Deep Q]] networks are learning algorithms without a specified model that analyze a situation and produce an action the agent should take.
 
[[Q-learning]] attempts to determine the optimal action given a specific state. The way this method determines the Q value, or quality of an action, can be loosely defined by a function taking in a state ''s'' and an action ''a'' and outputting the perceived quality of that action:
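In tabular Q-learning this quality estimate is learned iteratively with the standard update rule, where ''r'' is the reward received, ''s''&prime; is the next state, ''α'' is the learning rate, and ''γ'' is the discount factor:

<math display="block">Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]</math>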
 
==== Deep Q-Learning ====
[[Deep Q-learning|Deep Q-Learning]] takes the principles of standard Q-learning but approximates the Q values using an artificial neural network. In many applications there is far too much input data to account for (e.g. the millions of pixels on a computer screen), which would make the standard process of determining quality values attached to states and actions take a prohibitively long time. By using a neural network to process the data and predict a Q value for each available action, the algorithm can be much more efficient.
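As a minimal sketch of this idea (not DeepMind's implementation; the two-layer network, its dimensions, and the function names here are illustrative assumptions), a Q-network maps a state vector to one estimated Q value per available action, and the agent acts on the highest estimate:

```python
import numpy as np

def init_q_network(state_dim, hidden_dim, n_actions, seed=0):
    """Randomly initialize a tiny two-layer Q-network (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (state_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, n_actions)),
        "b2": np.zeros(n_actions),
    }

def q_values(params, state):
    """Forward pass: state vector -> one predicted Q value per action."""
    h = np.maximum(0.0, state @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def greedy_action(params, state):
    """Pick the action with the highest predicted Q value."""
    return int(np.argmax(q_values(params, state)))

params = init_q_network(state_dim=4, hidden_dim=16, n_actions=2)
state = np.array([0.1, -0.2, 0.05, 0.0])
action = greedy_action(params, state)
```

A real deep Q-network would be trained by minimizing the error between the predicted Q values and the targets given by the update rule above, but the forward pass shown here is the core of how the network replaces a lookup table.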
 
 
==== Exploration-exploitation dilemma ====
The exploration-exploitation dilemma is the problem of deciding whether to pursue actions that are already known to yield success or to explore other pathways in order to discover greater success. There are two main approaches to learning policies that address this problem: greedy and epsilon-greedy.
 
Under the greedy learning policy, the agent always chooses the action that maximizes the Q value for the current state:
 
<math display="block">a = \arg\max_{a} Q(s,a)</math>
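The epsilon-greedy policy softens this by exploring a random action with probability ''ε'' and otherwise acting greedily. A minimal sketch (the Q values for a single state are illustrative assumptions):

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest Q value (exploit)."""
    n_actions = len(q_row)
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_row[a])

q_row = [0.1, 0.9, 0.3]                       # Q values for one state (illustrative)
action = epsilon_greedy(q_row, epsilon=0.0)   # epsilon=0 is pure greedy -> 1
```

Setting ''ε'' = 0 recovers the pure greedy policy; in practice ''ε'' is often decayed over training so the agent explores early and exploits later.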
 
 