Deep reinforcement learning has been used for a variety of applications in the past, some of which include:
 
=== Video Games ===
The [[AlphaZero]] algorithm, developed by [[DeepMind]], has achieved superhuman performance in many games.
 
=== Image Enhancement ===
Image enhancement models, such as GANs and U-Net, have attained much higher performance than previous methods such as [[Super-resolution imaging|super-resolution]] and segmentation.<ref>{{Cite book|url=https://www.worldcat.org/oclc/1163522253|title=Deep reinforcement learning fundamentals, research and applications|date=2020|publisher=Springer|others=Dong, Hao., Ding, Zihan., Zhang, Shanghang.|isbn=978-981-15-4095-0|___location=Singapore|oclc=1163522253}}</ref>
 
 
 
== Training ==
 
==== Q-Learning ====
[[Deep Q-learning|Deep Q]] networks are learning algorithms without a specified model that analyze a situation and produce an action the agent should take.
 
[[Q-learning]] attempts to determine the optimal action given a specific state. The way this method determines the Q value, or quality of an action, can be loosely defined by a function taking in a state ''s'' and an action ''a'' and outputting the perceived quality of that action:
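In tabular Q-learning this quality estimate is learned iteratively with the standard update rule, where ''r'' is the reward received, ''s''&prime; is the next state, ''α'' is the learning rate, and ''γ'' is the discount factor:

<math display="block">Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]</math>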
 
==== Deep Q-Learning ====
[[Deep Q-learning|Deep Q-Learning]] takes the principles of standard Q-learning but approximates the Q values using an artificial neural network. In many applications there is far too much input data to account for (e.g. the millions of pixels on a computer screen), which would make the standard process of determining quality values attached to states and actions take a prohibitively long time. By using a neural network to process the data and predict a Q value for each available action, the algorithm can be much more efficient.
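As a minimal sketch of this idea (not DeepMind's implementation; the two-layer network, its dimensions, and the function names here are illustrative assumptions), a Q-network maps a state vector to one estimated Q value per available action, and the agent acts on the highest estimate:

```python
import numpy as np

def init_q_network(state_dim, hidden_dim, n_actions, seed=0):
    """Randomly initialize a tiny two-layer Q-network (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (state_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, n_actions)),
        "b2": np.zeros(n_actions),
    }

def q_values(params, state):
    """Forward pass: state vector -> one predicted Q value per action."""
    h = np.maximum(0.0, state @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def greedy_action(params, state):
    """Pick the action with the highest predicted Q value."""
    return int(np.argmax(q_values(params, state)))

params = init_q_network(state_dim=4, hidden_dim=16, n_actions=2)
state = np.array([0.1, -0.2, 0.05, 0.0])
action = greedy_action(params, state)
```

A real deep Q-network would be trained by minimizing the error between the predicted Q values and the targets given by the update rule above, but the forward pass shown here is the core of how the network replaces a lookup table.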
 
 
==== Exploration-exploitation dilemma ====
The exploration-exploitation dilemma is the problem of deciding whether to pursue actions that are already known to yield success or to explore other pathways in order to discover greater success. There are two main approaches to learning policies that address this problem: greedy and epsilon-greedy.
 
Under the greedy learning policy, the agent always chooses the action that maximizes the Q value for the current state:
 
<math display="block">a = \arg\max_{a} Q(s,a)</math>
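The epsilon-greedy policy softens this by exploring a random action with probability ''ε'' and otherwise acting greedily. A minimal sketch (the Q values for a single state are illustrative assumptions):

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest Q value (exploit)."""
    n_actions = len(q_row)
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_row[a])

q_row = [0.1, 0.9, 0.3]                       # Q values for one state (illustrative)
action = epsilon_greedy(q_row, epsilon=0.0)   # epsilon=0 is pure greedy -> 1
```

Setting ''ε'' = 0 recovers the pure greedy policy; in practice ''ε'' is often decayed over training so the agent explores early and exploits later.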
 
 