{{Userspace draft|source=ArticleWizard|date=October 2020}}
'''Deep reinforcement learning (DRL)''' is a [[machine learning]] method that combines principles from both [[reinforcement learning]] and [[deep learning]] to obtain the benefits of both.
== Overview ==
=== Deep reinforcement learning ===
Deep reinforcement learning combines reinforcement learning's technique of rewarding an agent based on its actions with deep learning's use of a [[neural network]] to process complex input data.
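This combination can be sketched in a few lines of Python. The sketch below is purely illustrative: the class and variable names are invented rather than taken from any particular library, and the weight-update step driven by the reward is omitted.
<syntaxhighlight lang="python">
import numpy as np

# A minimal two-layer network that maps a state vector to one
# score ("Q-value") per possible action. In deep reinforcement
# learning, a network like this replaces the lookup table used
# in classical reinforcement learning.
class QNetwork:
    def __init__(self, state_size, num_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (state_size, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, num_actions))

    def q_values(self, state):
        h = np.maximum(0.0, state @ self.w1)  # ReLU hidden layer
        return h @ self.w2                    # one score per action

# The agent picks the action the network scores highest; the
# environment's reward would later be used to update the weights
# (that update step is omitted in this sketch).
net = QNetwork(state_size=4, num_actions=2)
action = int(np.argmax(net.q_values(np.zeros(4))))
</syntaxhighlight>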
== Applications ==
Deep reinforcement learning has been used in a variety of applications, including:
* [[Deep Q-learning|Deep Q]] networks: model-free learning algorithms that analyze a situation and produce an action the agent should take (a sketch of the update these networks learn follows this list).
* The [[AlphaZero]] algorithm, developed by [[DeepMind]], which has achieved superhuman performance in games such as chess, [[shogi]], and [[Go (game)|Go]].
* Image enhancement models such as [[Generative adversarial network|GANs]] and U-Net, which have attained much higher performance than previous methods on tasks such as [[Super-resolution imaging|super-resolution]] and segmentation.<ref>{{Cite book|title=Deep Reinforcement Learning: Fundamentals, Research and Applications|editor1=Hao Dong|editor2=Zihan Ding|editor3=Shanghang Zhang|date=2020|publisher=Springer|___location=Singapore|isbn=978-981-15-4095-0|oclc=1163522253|url=https://www.worldcat.org/oclc/1163522253}}</ref>
* Procedural level generation in video games.<ref>{{Cite web|title=Fix me :(|url=https://ucsb-primo.hosted.exlibrisgroup.com/primo-explore/fulldisplay?docid=TN_proquest2073529169&vid=UCSB&search_scope=default_scope&tab=default_tab&lang=en_US&context=PC|access-date=2020-10-29|website=ucsb-primo.hosted.exlibrisgroup.com|language=en}}</ref>
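As a rough illustration of the first item above, the quantity a deep Q network is trained to predict is the classical Q-learning target: the observed reward plus the discounted best score the network assigns to the next state. The sketch below uses made-up values and is not a complete implementation.
<syntaxhighlight lang="python">
import numpy as np

# Q-learning target that the deep Q network's output is trained
# to match: the reward just received plus the discounted value
# of the best action available in the next state.
def q_target(reward, next_q_values, done, gamma=0.99):
    if done:  # terminal step: no future reward to add
        return reward
    return reward + gamma * float(np.max(next_q_values))

# next_q_values would come from the network's forward pass on the
# next state; the numbers here are made up for illustration.
target = q_target(reward=1.0, next_q_values=np.array([0.3, 0.8]), done=False)
</syntaxhighlight>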
== Training ==
In order to have a functional agent, the algorithm must be trained with a certain goal.
=== Challenges ===
Some challenges that have to be overcome when training an agent include:
* '''Frequency of rewards:''' When the goal is too difficult for the learning algorithm to reach, the agent may never complete it and so is never rewarded. Additionally, if a reward is only received at the end of a task, the algorithm has no way to differentiate between good and bad behavior during the task. For example, if an agent learning to play [[Pong]] makes many correct moves but ultimately loses the point, the single negative reward gives it no way to determine which movements of the paddle were good and which were not; the reward is too sparse.<ref>https://arxiv.org/abs/2001.00119</ref> A short illustration of this problem follows this list.
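The sketch below illustrates the sparse-reward problem described above using a hypothetical Pong-like episode: every step earns zero reward and only the final step is scored, so nothing in the reward signal distinguishes good paddle movements from bad ones.
<syntaxhighlight lang="python">
import random

# Hypothetical Pong-like episode: the agent moves its paddle for
# many steps, but the only nonzero reward arrives when the point
# is finally lost.
def play_episode(num_steps=100):
    rewards = []
    for step in range(num_steps):
        action = random.choice(["up", "down", "stay"])  # paddle move
        is_last = step == num_steps - 1
        rewards.append(-1.0 if is_last else 0.0)
    return rewards

rewards = play_episode()
assert all(r == 0.0 for r in rewards[:-1])  # no signal until the end
</syntaxhighlight>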
=== Optimizations ===
* '''Reward shaping:''' Giving the agent intermediate rewards, shaped to the task, during learning. For example, an agent learning to play [[Atari Breakout]] might receive a small positive reward every time it hits the ball and breaks a brick, rather than a reward only for completing a level. While this reduces the time it takes an agent to learn a task, because its actions are more guided and less random, it also reduces the generalizability of the algorithm, since the intermediate rewards must be tweaked for each individual task.<ref>https://arxiv.org/abs/1903.02020</ref> A minimal sketch of a shaped reward follows this list.
* '''Auxiliary reward signals'''
* '''Curiosity-driven exploration'''
* '''Hindsight experience replay'''
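The following is a minimal sketch of reward shaping as described in the first item above, assuming a hypothetical Breakout-like game that reports named events; the event names and reward values are invented for illustration.
<syntaxhighlight lang="python">
# Shaped reward for a hypothetical Breakout-like game: a small
# intermediate reward for each brick broken, instead of a single
# reward on level completion.
def shaped_reward(event):
    if event == "brick_broken":
        return 0.1   # task-specific intermediate reward
    if event == "level_complete":
        return 1.0   # the original sparse reward
    return 0.0

# Hand-tuning these values to one game is what makes shaped
# rewards less generalizable across tasks.
events = ["brick_broken", "brick_broken", "level_complete"]
total = sum(shaped_reward(e) for e in events)  # 1.2
</syntaxhighlight>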
== Generalization ==