=== Actor ===
The '''actor''' uses a policy function <math>\pi(a|s)</math>, while the critic estimates a value quantity such as the [[value function]] <math>V(s)</math> or the action-value Q-function <math>Q(s,a)</math>.

The actor is a parameterized function <math>\pi_\theta</math>, where <math>\theta</math> denotes the parameters of the actor. The actor takes the state of the environment <math>s</math> as its argument and produces a [[probability distribution]] over actions, <math>\pi_\theta(\cdot | s)</math>.
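
As an illustration only, the mapping from a state to <math>\pi_\theta(\cdot | s)</math> can be sketched for a discrete action space. The linear-softmax parameterization, the function names <code>softmax</code>, <code>actor</code> and <code>sample_action</code>, and the example values are all assumptions made for this sketch; in practice the actor is often a neural network.

```python
import math
import random


def softmax(logits):
    """Turn real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor(theta, s):
    """Return pi_theta(. | s) as a list with one probability per action.

    theta: one weight vector per action (hypothetical linear parameterization)
    s:     state feature vector
    """
    logits = [sum(w * x for w, x in zip(row, s)) for row in theta]
    return softmax(logits)


def sample_action(theta, s, rng=random):
    """Draw an action index a ~ pi_theta(. | s)."""
    probs = actor(theta, s)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Example values (assumed): 3 actions, 2 state features, all-zero parameters.
theta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
s = [1.0, -2.0]
probs = actor(theta, s)
```

With all parameters set to zero, every action receives the same score, so the resulting distribution is uniform over the three actions. Changing <code>theta</code> shifts probability mass toward the actions whose scores increase.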