ce |
Citation bot (talk | contribs) Added bibcode. Removed URL that duplicated identifier. | Use this bot. Report bugs. | Suggested by Headbomb | #UCB_toolbar |
Line 1:
{{Short description|Reinforcement learning algorithms that combine policy and value estimation}}
The '''actor-critic algorithm''' (AC) is a family of [[reinforcement learning]] (RL) algorithms that combine policy-based RL algorithms, such as [[policy gradient method]]s, with value-based RL algorithms, such as value iteration, [[Q-learning]], [[State–action–reward–state–action|SARSA]], and [[Temporal difference learning|TD learning]].<ref>{{Cite journal |last1=Arulkumaran |first1=Kai |last2=Deisenroth |first2=Marc Peter |last3=Brundage |first3=Miles |last4=Bharath |first4=Anil Anthony |date=November 2017 |title=Deep Reinforcement Learning: A Brief Survey
An AC algorithm consists of two main components: an "'''actor'''" that determines which actions to take according to a policy function, and a "'''critic'''" that evaluates those actions according to a value function.<ref>{{Cite journal |last1=Konda |first1=Vijay |last2=Tsitsiklis |first2=John |date=1999 |title=Actor-Critic Algorithms |url=https://proceedings.neurips.cc/paper/1999/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=MIT Press |volume=12}}</ref> Some AC algorithms are on-policy, while others are off-policy. Some apply only to continuous or only to discrete action spaces, while others work in both cases.
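The division of labour between actor and critic can be illustrated with a minimal sketch. The example below is not taken from the cited sources: the five-state chain environment, the tabular softmax actor, the tabular state-value critic, the use of the one-step TD error as the learning signal, and all constants and function names are assumptions chosen only for illustration.

```python
import math
import random

# Hypothetical toy problem: a chain of states 0..4, where 0 and 4 are terminal.
N_STATES = 5
START = 2
GAMMA = 0.9          # discount factor (assumed)
ALPHA_ACTOR = 0.1    # actor step size (assumed)
ALPHA_CRITIC = 0.1   # critic step size (assumed)


def policy(theta, s):
    """Actor: softmax over the action preferences stored for state s."""
    m = max(theta[s])
    exps = [math.exp(p - m) for p in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def step(s, a):
    """Action 0 moves left, action 1 moves right; reward 1 only at the right end."""
    s_next = s - 1 if a == 0 else s + 1
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    done = s_next in (0, N_STATES - 1)
    return s_next, reward, done


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]  # actor parameters
    v = [0.0] * N_STATES                           # critic: state-value estimates
    for _ in range(episodes):
        s, done = START, False
        while not done:
            probs = policy(theta, s)
            a = 0 if rng.random() < probs[0] else 1
            s_next, r, done = step(s, a)
            # Critic evaluates the action through the one-step TD error.
            target = r if done else r + GAMMA * v[s_next]
            delta = target - v[s]
            v[s] += ALPHA_CRITIC * delta
            # Actor follows the policy gradient of log pi(a|s), scaled by delta.
            for b in range(2):
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += ALPHA_ACTOR * delta * grad
            s = s_next
    return theta, v


theta, v = train()
print(policy(theta, START))  # [P(left), P(right)] in the start state
```

In this sketch the critic's TD error is the only feedback the actor receives, so the actor never sees the raw return directly. Other members of the family replace the tabular components with function approximators or use a different critic signal.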
Line 80:
* {{Cite book |last=Bertsekas |first=Dimitri P. |title=Reinforcement learning and optimal control |date=2019 |publisher=Athena Scientific |isbn=978-1-886529-39-7 |edition=2 |location=Belmont, Massachusetts}}
* {{Cite book |last=Szepesvári |first=Csaba |title=Algorithms for Reinforcement Learning |date=2010 |publisher=Springer International Publishing |isbn=978-3-031-00423-0 |edition=1 |series=Synthesis Lectures on Artificial Intelligence and Machine Learning |location=Cham}}
* {{Cite journal |last1=Grondman |first1=Ivo |last2=Busoniu |first2=Lucian |last3=Lopes |first3=Gabriel A. D. |last4=Babuska |first4=Robert |date=November 2012 |title=A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
{{Artificial intelligence navbox}}
[[Category:Reinforcement learning]]