{{Short description|Reinforcement learning algorithms that combine policy and value estimation}}
The '''actor-critic algorithm''' (AC) is a family of [[reinforcement learning]] (RL) algorithms that combine policy-based RL methods, such as [[Policy gradient method|policy gradient methods]], with value-based RL methods, such as value iteration, [[Q-learning]], [[State–action–reward–state–action|SARSA]], and [[Temporal difference learning|TD learning]].<ref>{{Cite journal |last1=Arulkumaran |first1=Kai |last2=Deisenroth |first2=Marc Peter |last3=Brundage |first3=Miles |last4=Bharath |first4=Anil Anthony |date=November 2017 |title=Deep Reinforcement Learning: A Brief Survey |url=https://ieeexplore.ieee.org/document/8103164 |journal=IEEE Signal Processing Magazine |volume=34 |issue=6 |pages=26–38 |doi=10.1109/MSP.2017.2743240 |arxiv=1708.05866 |bibcode=2017ISPM...34...26A |issn=1053-5888}}</ref>
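The combination can be illustrated with a minimal sketch: a tabular actor (softmax policy, updated by the policy gradient) paired with a tabular TD(0) critic (state-value table), run on an invented two-state toy environment. The environment, step counts, and learning rates here are assumptions made for illustration, not part of any standard benchmark.

```python
import numpy as np

# Minimal tabular actor-critic sketch on a hypothetical 2-state MDP.
# Actor: softmax policy over logits theta, updated with the policy
# gradient, using the critic's TD error as the advantage estimate.
# Critic: state-value table V, updated by TD(0).
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # actor parameters (logits)
V = np.zeros(n_states)                   # critic value estimates
gamma, alpha_actor, alpha_critic = 0.9, 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def env_step(s, a):
    # Toy dynamics (invented for this sketch): action 0 stays in the
    # current state, action 1 moves to the other state; being in
    # state 1 after the transition yields reward +1.
    s2 = s if a == 0 else 1 - s
    return s2, (1.0 if s2 == 1 else 0.0)

s = 0
for t in range(2000):
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)
    s2, r = env_step(s, a)
    td_error = r + gamma * V[s2] - V[s]   # critic's TD error
    V[s] += alpha_critic * td_error       # TD(0) critic update
    grad_log_pi = -probs                  # gradient of log pi(a|s)
    grad_log_pi[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_log_pi  # actor update
    s = s2

# The learned policy in state 0 should strongly prefer action 1
# (moving to the rewarding state 1).
print(softmax(theta[0])[1])
```

The key structural point the sketch shows is the division of labor: the critic supplies a learned baseline via the TD error, which reduces the variance of the actor's policy-gradient update relative to using raw returns.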