Actor-critic algorithm: Difference between revisions

OAbot (talk | contribs)
m Open access bot: url-access updated in citation with #oabot.
Corrected use of comma next to an equation.
Line 9:
 
=== Actor ===
The '''actor''' uses a policy function <math>\pi(a|s)</math>, while the critic estimates either the [[value function]] <math>V(s)</math>, the action-value Q-function <math>Q(s,a)</math>, the advantage function <math>A(s,a)</math>, or any combination thereof.
 
The actor is a parameterized function <math>\pi_\theta</math>, where <math>\theta</math> denotes the parameters of the actor. The actor takes the state of the environment <math>s</math> as its argument and produces a [[probability distribution]] <math>\pi_\theta(\cdot | s)</math> over actions.
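
As an illustration, the mapping from a state to a distribution over actions can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: it assumes a discrete action space and a linear-softmax parameterization, neither of which is required by the definition above, and the names <code>softmax</code>, <code>actor</code>, and <code>sample_action</code> are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor(theta, state):
    """Return pi_theta(. | s) as a list of action probabilities.

    Assumption: theta holds one weight vector per discrete action, and the
    score of an action is the dot product of its weights with the state.
    """
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    return softmax(logits)


def sample_action(theta, state, rng=random):
    """Draw one action index from the distribution pi_theta(. | s)."""
    probs = actor(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Three actions, two state features; all-zero parameters give a uniform policy.
theta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
state = [1.0, -2.0]
probs = actor(theta, state)
action = sample_action(theta, state)
```

With all parameters set to zero every action receives the same score, so the policy is uniform. Changing <code>theta</code> shifts probability mass toward actions whose weight vectors align with the state.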