{{Short description|Reinforcement learning algorithm that combines policy and value estimation}}
{{Orphan|date=January 2025}}
The '''actor-critic algorithm''' (AC) is a family of [[reinforcement learning]] (RL) algorithms that combine policy-based and value-based methods. An AC algorithm consists of two main components: an "'''actor'''" that determines which actions to take according to a policy function, and a "'''critic'''" that evaluates those actions according to a value function.<ref>{{Cite journal |last=Konda |first=Vijay |last2=Tsitsiklis |first2=John |date=1999 |title=Actor-Critic Algorithms |url=https://proceedings.neurips.cc/paper/1999/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=MIT Press |volume=12}}</ref> Some AC algorithms are on-policy and some are off-policy; some apply only to discrete action spaces, some only to continuous ones, and some work with both.
AC algorithms are among the main families of algorithms used in modern RL.<ref>{{Cite journal |last=Arulkumaran |first=Kai |last2=Deisenroth |first2=Marc Peter |last3=Brundage |first3=Miles |last4=Bharath |first4=Anil Anthony |date=November 2017 |title=Deep Reinforcement Learning: A Brief Survey |journal=IEEE Signal Processing Magazine |volume=34 |issue=6 |pages=26–38}}</ref>
== Overview ==
The actor-critic method belongs to the family of [[policy gradient method]]s. The critic estimates the quality of the actor's actions: it may estimate the state value function <math>V(s)</math>, the action value function <math>Q(s,a)</math>, the advantage function <math>A(s,a)</math>, or any combination thereof.
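These quantities are related to one another: the advantage function is defined as the difference between the action value and the state value,
<math display="block">A(s,a) = Q(s,a) - V(s).</math>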
The actor is a parameterized function <math>\pi_\theta</math>, where <math>\theta</math> denotes its parameters. It takes the state of the environment <math>s</math> as input and produces a [[probability distribution]] <math>\pi_\theta(\cdot | s)</math> over actions.
If the action space is discrete, then <math>\sum_{a} \pi_\theta(a | s) = 1</math>. If the action space is continuous, then <math>\int_{a} \pi_\theta(a | s) da = 1</math>.
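As a minimal sketch of the discrete case (not from any particular AC implementation; the linear score function, the dimensions <code>n_features</code> and <code>n_actions</code>, and the function name <code>pi</code> are assumptions chosen for illustration), the actor can be realized as a softmax over parameterized scores, so that the resulting probabilities sum to one:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sizes only.
n_features, n_actions = 4, 3
rng = np.random.default_rng(0)
theta = rng.normal(size=(n_features, n_actions))  # actor parameters (theta)

def pi(state, theta):
    """Softmax policy pi_theta(. | s): a probability distribution over actions."""
    logits = state @ theta
    stabilized = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return stabilized / stabilized.sum()

state = rng.normal(size=n_features)       # stand-in for an environment state s
probs = pi(state, theta)
action = rng.choice(n_actions, p=probs)   # the actor samples an action from pi_theta(. | s)
assert np.isclose(probs.sum(), 1.0)       # discrete case: the probabilities sum to 1
</syntaxhighlight>

In the continuous case, the actor commonly outputs the parameters of a probability density instead, for example the mean and standard deviation of a Gaussian from which the action is sampled.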
== References ==
{{Reflist|30em}}
* {{Cite journal |last=Konda |first=Vijay R. |last2=Tsitsiklis |first2=John N. |date=January 2003 |title=On Actor-Critic Algorithms |journal=SIAM Journal on Control and Optimization |volume=42 |issue=4 |pages=1143–1166}}
* {{Cite book |last=Sutton |first=Richard S. |title=Reinforcement learning: an introduction |last2=Barto |first2=Andrew G. |date=2018 |publisher=The MIT Press |isbn=978-0-262-03924-6 |edition=2 |series=Adaptive computation and machine learning series |___location=Cambridge, Massachusetts}}
* {{Cite book |last=Bertsekas |first=Dimitri P. |title=Reinforcement learning and optimal control |date=2019 |publisher=Athena Scientific |isbn=978-1-886529-39-7 |edition=2 |___location=Belmont, Massachusetts}}