{{Short description|Reinforcement learning algorithms that combine policy and value estimation}}
The '''actor-critic algorithm''' (AC) is a family of [[reinforcement learning]] (RL) algorithms that combine policy-based RL methods, such as [[Policy gradient method|policy gradient methods]], with value-based RL methods, such as value iteration, [[Q-learning]], [[State–action–reward–state–action|SARSA]], and [[Temporal difference learning|TD learning]].<ref>{{Cite journal |last1=Arulkumaran |first1=Kai |last2=Deisenroth |first2=Marc Peter |last3=Brundage |first3=Miles |last4=Bharath |first4=Anil Anthony |date=November 2017 |title=Deep Reinforcement Learning: A Brief Survey |url=https://ieeexplore.ieee.org/document/8103164 |journal=IEEE Signal Processing Magazine |volume=34 |issue=6 |pages=26–38 |doi=10.1109/MSP.2017.2743240 |arxiv=1708.05866 |bibcode=2017ISPM...34...26A |issn=1053-5888}}</ref>
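The combination can be illustrated with a minimal sketch: a tabular actor (softmax policy, updated by the policy gradient) paired with a tabular TD(0) critic (state-value table), run on an invented two-state toy environment. The environment, step counts, and learning rates here are assumptions made for illustration, not part of any standard benchmark.

```python
import numpy as np

# Minimal tabular actor-critic sketch on a hypothetical 2-state MDP.
# Actor: softmax policy over logits theta, updated with the policy
# gradient, using the critic's TD error as the advantage estimate.
# Critic: state-value table V, updated by TD(0).
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # actor parameters (logits)
V = np.zeros(n_states)                   # critic value estimates
gamma, alpha_actor, alpha_critic = 0.9, 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def env_step(s, a):
    # Toy dynamics (invented for this sketch): action 0 stays in the
    # current state, action 1 moves to the other state; being in
    # state 1 after the transition yields reward +1.
    s2 = s if a == 0 else 1 - s
    return s2, (1.0 if s2 == 1 else 0.0)

s = 0
for t in range(2000):
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)
    s2, r = env_step(s, a)
    td_error = r + gamma * V[s2] - V[s]   # critic's TD error
    V[s] += alpha_critic * td_error       # TD(0) critic update
    grad_log_pi = -probs                  # gradient of log pi(a|s)
    grad_log_pi[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_log_pi  # actor update
    s = s2

# The learned policy in state 0 should strongly prefer action 1
# (moving to the rewarding state 1).
print(softmax(theta[0])[1])
```

The key structural point the sketch shows is the division of labor: the critic supplies a learned baseline via the TD error, which reduces the variance of the actor's policy-gradient update relative to using raw returns.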