 
=== Reinforcement Learning ===
[[File:Markov diagram v2.svg|alt=Diagram explaining the loop recurring in reinforcement learning algorithms|thumb|Diagram of the loop recurring in reinforcement learning algorithms]][[Reinforcement learning]] is a process in which an agent learns to perform an action through trial and error.<ref>https://arxiv.org/abs/2001.00119</ref> In this process, the agent receives a reward indicating whether its previous action was good or bad and aims to optimize its behavior based on this reward.
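As a concrete illustration of this loop, the following minimal sketch trains a tabular [[Q-learning]] agent by trial and error; the toy <code>GridEnv</code> environment, the update rule shown, and all hyperparameter values are illustrative assumptions rather than a reference implementation.

<syntaxhighlight lang="python">
import random
from collections import defaultdict

class GridEnv:
    """Toy 1-D world: start at cell 0; reaching cell 4 ends the episode."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 4
        reward = 1.0 if done else 0.0  # reward signals how good the action was
        return self.pos, reward, done

env = GridEnv()
q = defaultdict(float)            # Q-table: (state, action) -> estimated return
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # Trial and error: usually exploit the best-known action, sometimes explore.
        if random.random() < epsilon:
            action = random.choice((0, 1))
        else:
            # Break ties randomly so an untrained agent still moves around.
            action = max(random.sample((0, 1), 2), key=lambda a: q[(state, a)])
        next_state, reward, done = env.step(action)
        # Optimize behavior based on the reward: nudge the estimate toward
        # the received reward plus the discounted best future value.
        best_next = max(q[(next_state, 0)], q[(next_state, 1)])
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
</syntaxhighlight>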
 
* The [[AlphaZero]] algorithm, developed by [[DeepMind]], which has achieved superhuman performance in many games.
* Image enhancement models such as GANs and U-Net, which have attained much higher performance than previous methods on tasks such as [[Super-resolution imaging|super-resolution]] and segmentation<ref>{{Cite book|url=https://www.worldcat.org/oclc/1163522253|title=Deep reinforcement learning fundamentals, research and applications|date=2020|publisher=Springer|others=Dong, Hao., Ding, Zihan., Zhang, Shanghang.|isbn=978-981-15-4095-0|___location=Singapore|oclc=1163522253}}</ref>
* Procedural level generation in video games<ref>{{Cite web|title=Deep Reinforcement Learning for Procedural Content Generation of 3D Virtual Environments|url=https://asmedigitalcollection.asme.org/computingengineering/article-abstract/20/5/051005/1074423/Deep-Reinforcement-Learning-for-Procedural-Content?redirectedFrom=fulltext|access-date=2020-10-29|website=asmedigitalcollection.asme.org|language=en}}</ref>
 
 
 
== Training ==
In order to have a functional agent, the algorithm must be trained with a certain goal. There are different techniques used to train agents, each with its own benefits.
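One common way to make the training goal concrete is to track the average return per episode and stop once the agent reliably achieves the goal. The sketch below reuses the hypothetical <code>GridEnv</code> and Q-table from the example above; the episode count and step cap are assumptions.

<syntaxhighlight lang="python">
def average_return(env, policy, episodes=20):
    """Average reward collected per episode under the given policy."""
    total = 0.0
    for _ in range(episodes):
        state, done, steps = env.reset(), False, 0
        while not done and steps < 100:  # step cap guards against a stuck policy
            state, reward, done = env.step(policy(state))
            total += reward
            steps += 1
    return total / episodes

# Evaluate the greedy policy induced by the learned Q-values.
greedy = lambda s: max((0, 1), key=lambda a: q[(s, a)])
print(average_return(GridEnv(), greedy))
</syntaxhighlight>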
 
=== Challenges ===
=== Optimizations ===
 
* '''Reward shaping''' is the process of giving an agent intermediate rewards, customized to fit the task, while it learns. For example, if an agent learning the game [[Atari Breakout]] gets a positive reward every time it hits the ball and breaks a brick, rather than only when it completes a level, it will learn the task faster because it has to do less random guessing. However, this method reduces the generalizability of the algorithm, since the reward triggers must be tweaked for each individual circumstance, making it not an optimal solution; a minimal sketch follows this list.<ref>https://arxiv.org/abs/1903.02020</ref>
* '''Curiosity driven exploration'''<ref>https://arxiv.org/abs/1910.10840</ref>
* '''Auxiliary reward signals'''
* '''Hindsight experience replay'''<ref>https://arxiv.org/abs/1707.01495</ref>
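Below is a minimal sketch of the reward-shaping idea from the first bullet. The Breakout-like environment and its <code>"brick_broken"</code> info key are hypothetical assumptions, not an actual Atari interface; the wrapper simply adds an intermediate bonus on top of the environment's own reward.

<syntaxhighlight lang="python">
class ShapedRewardEnv:
    """Wraps an environment, adding an intermediate reward per brick broken."""
    def __init__(self, env, brick_bonus=0.1):
        self.env = env
        self.brick_bonus = brick_bonus

    def reset(self):
        return self.env.reset()

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        # Task-specific trigger: exactly the part that would need re-tweaking
        # for every new game, which is why shaping hurts generalizability.
        if info.get("brick_broken", False):
            reward += self.brick_bonus
        return state, reward, done, info
</syntaxhighlight>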
 
== Generalization ==
 
When using reinforcement learning, the model must be aware of its environment, which is usually specified manually. When reinforcement learning is combined with deep learning, which is very good at extracting features from raw data (e.g. pixels or raw image files), the algorithm gains the benefits of reinforcement learning without being told what its environment looks like. With this layer of abstraction, deep reinforcement learning algorithms can become generalized, and the same model can be used for different tasks. Automatic feature extraction can also provide much better accuracy than hand-crafted features.<ref>{{Cite web|url=https://ucsb-primo.hosted.exlibrisgroup.com/primo-explore/fulldisplay?docid=TN_proquest2074058918&vid=UCSB&search_scope=default_scope&tab=default_tab&lang=en_US&context=PC|access-date=2020-10-22|website=ucsb-primo.hosted.exlibrisgroup.com|language=en}}</ref>
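To illustrate automatic feature extraction from raw pixels, here is a hedged sketch of a policy network written with [[PyTorch]]; the frame size, layer shapes, and number of actions are illustrative assumptions. The convolutional layers learn features directly from pixels, so no hand-specified description of the environment is required, and the same feature extractor can feed a new output head for a different task.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class PixelPolicy(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        # Automatic feature extraction from raw 84x84 grayscale frames.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Swapping only this head lets the same extractor serve another task.
        self.head = nn.Linear(32 * 9 * 9, n_actions)

    def forward(self, pixels):
        return self.head(self.features(pixels))

policy = PixelPolicy(n_actions=4)
frame = torch.zeros(1, 1, 84, 84)   # raw pixels, no hand-crafted state features
action_values = policy(frame)       # one score per action
</syntaxhighlight>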
 
 
== References ==<!--- See http://en.wikipedia.org/wiki/Wikipedia:Footnotes on how to create references using <ref></ref> tags, these references will then appear here automatically -->
{{Reflist}}
== External links ==
 
*https://ucsb-primo.hosted.exlibrisgroup.com/permalink/f/12e9sm9/TN_arxiv1810.12282
*https://asmedigitalcollection.asme.org/computingengineering/article-abstract/20/5/051005/1074423/Deep-Reinforcement-Learning-for-Procedural-Content?redirectedFrom=fulltext
<!--- Categories --->