{{Short description|Machine learning that combines deep learning and reinforcement learning}}
{{Machine learning}}
'''Deep reinforcement learning''' ('''deep RL''') is a subfield of [[machine learning]] that combines [[reinforcement learning]] (RL) and [[deep learning]]. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the [[state space]]. Deep RL algorithms can take in very large inputs (e.g. every pixel rendered to the screen in a video game) and decide what actions to perform to optimize an objective (e.g. maximizing the game score). Deep reinforcement learning has been used for a diverse set of applications, including [[robotics]], [[video game]]s, [[natural language processing]], and [[computer vision]].
== Overview ==
<ref name="francoislavet2018">{{Cite journal|last1=Francois-Lavet|first1=Vincent|last2=Henderson|first2=Peter|last3=Islam|first3=Riashat|last4=Bellemare|first4=Marc G.|last5=Pineau|first5=Joelle|date=2018|title=An Introduction to Deep Reinforcement Learning|journal=Foundations and Trends in Machine Learning|volume=11|issue=3–4|pages=219–354|arxiv=1811.12560|bibcode=2018arXiv181112560F|doi=10.1561/2200000071|issn=1935-8237|s2cid=54434537}}</ref>
<ref name="Hassabis">{{cite speech |last1=Demis |first1=Hassabis | date=March 11, 2016 |title= Artificial Intelligence and the Future. |url= https://www.youtube.com/watch?v=8Z2eLTSCuBk}}</ref>
<ref name="TD-Gammon">{{cite journal
<ref name="sutton1996">{{cite book |last1=Sutton |first1=Richard |last2=Barto |first2=Andrew |date=September 1996 |title=Reinforcement Learning: An Introduction |publisher=Athena Scientific}}</ref>
<ref name="tsitsiklis1996">{{cite book |last1=Bertsekas |first2=Dimitri |last2=Tsitsiklis |first1=John |date=September 1996 |title=Neuro-Dynamic Programming |url=http://athenasc.com/ndpbook.html |publisher=Athena Scientific |isbn=1-886529-10-8}}</ref>
<ref name="openaihand">{{Cite web|title=OpenAI - Solving Rubik's Cube With A Robot Hand|url=https://openai.com/blog/solving-rubiks-cube/|website=OpenAI}}</ref>
<ref name="openaihandarxiv">{{Cite conference|title= Solving Rubik's Cube with a Robot Hand |last1=OpenAI |display-authors=etal|date=2019|arxiv=1910.07113 }}</ref>
<ref name="deepmindcooling">{{Cite web|title=DeepMind AI Reduces Google Data Centre Cooling Bill by 40% |url=https://deepmind.com/blog/article/deepmind-ai-reduces-google-data-centre-cooling-bill-40|website=DeepMind|date=14 May 2024 }}</ref>
<ref name="neurips2021ml4ad">{{Cite web|title=Machine Learning for Autonomous Driving Workshop @ NeurIPS 2021|url=https://ml4ad.github.io/|website=NeurIPS 2021|date=December 2021}}</ref>
<ref name="williams1992">{{Cite journal|last1=Williams|first1=Ronald J|journal=Machine Learning|pages=229–256|title = Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning|date=1992|volume=8|issue=3–4|doi=10.1007/BF00992696|s2cid=2332513|doi-access=free}}</ref>