Revision as of 20:51, 25 July 2025 by Headbomb. Edit summary: ce
Revision as of 20:51, 25 July 2025 by Citation bot. Edit summary: Added bibcode. Removed URL that duplicated identifier. Suggested by Headbomb. #UCB_toolbar
Line 1:
{{Short description\|Reinforcement learning algorithms that combine policy and value estimation}}
The '''actor-critic algorithm''' (AC) is a family of [[reinforcement learning]] (RL) algorithms that combine policy-based RL algorithms, such as [[policy gradient method]]s, with value-based RL algorithms, such as value iteration, [[Q-learning]], [[State–action–reward–state–action\|SARSA]], and [[Temporal difference learning\|TD learning]].<ref>{{Cite journal \|last1=Arulkumaran \|first1=Kai \|last2=Deisenroth \|first2=Marc Peter \|last3=Brundage \|first3=Miles \|last4=Bharath \|first4=Anil Anthony \|date=November 2017 \|title=Deep Reinforcement Learning: A Brief Survey ~~\|url=https://ieeexplore.ieee.org/document/8103164~~ \|journal=IEEE Signal Processing Magazine \|volume=34 \|issue=6 \|pages=26–38 \|doi=10.1109/MSP.2017.2743240 \|arxiv=1708.05866 \|bibcode=2017ISPM...34...26A \|issn=1053-5888}}</ref>

An AC algorithm consists of two main components: an "'''actor'''" that determines which actions to take according to a policy function, and a "'''critic'''" that evaluates those actions according to a value function.<ref>{{Cite journal \|last1=Konda \|first1=Vijay \|last2=Tsitsiklis \|first2=John \|date=1999 \|title=Actor-Critic Algorithms \|url=https://proceedings.neurips.cc/paper/1999/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html \|journal=Advances in Neural Information Processing Systems \|publisher=MIT Press \|volume=12}}</ref> Some AC algorithms are on-policy and some are off-policy. Some apply only to continuous or only to discrete action spaces, while others work in both cases.

Line 80:
* {{Cite book \|last=Bertsekas \|first=Dimitri P.
\|title=Reinforcement learning and optimal control \|date=2019 \|publisher=Athena Scientific \|isbn=978-1-886529-39-7 \|edition=2 \|location=Belmont, Massachusetts}}
* {{Cite book \|last=Szepesvári \|first=Csaba \|title=Algorithms for Reinforcement Learning \|date=2010 \|publisher=Springer International Publishing \|isbn=978-3-031-00423-0 \|edition=1 \|series=Synthesis Lectures on Artificial Intelligence and Machine Learning \|location=Cham}}
* {{Cite journal \|last1=Grondman \|first1=Ivo \|last2=Busoniu \|first2=Lucian \|last3=Lopes \|first3=Gabriel A. D. \|last4=Babuska \|first4=Robert \|date=November 2012 \|title=A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients ~~\|url=https://ieeexplore.ieee.org/document/6392457~~ \|journal=IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews \|volume=42 \|issue=6 \|pages=1291–1307 \|doi=10.1109/TSMCC.2012.2218595 \|bibcode=2012ITHMS..42.1291G \|issn=1094-6977}}

{{Artificial intelligence navbox}}

[[Category:Reinforcement learning]]

Actor-critic algorithm: Difference between revisions