Revision as of 20:22, 25 May 2025 by OAbot (Open access bot: url-access updated in citation with #oabot); revision as of 13:20, 4 July 2025 by 213.147.161.159 (Corrected use of comma next to an equation).
=== Actor ===
The '''actor''' uses a policy function <math>\pi(a|s)</math>, while the critic estimates the [[value function]] <math>V(s)</math>, the action-value Q-function <math>Q(s,a)</math>, the advantage function <math>A(s,a)</math>, or any combination thereof. The actor is a parameterized function <math>\pi_\theta</math>, where <math>\theta</math> denotes the actor's parameters. It takes the state of the environment <math>s</math> as input and produces a [[probability distribution]] <math>\pi_\theta(\cdot | s)</math> over actions.
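As an illustration of how a parameterized actor turns a state into a distribution <math>\pi_\theta(\cdot | s)</math>, the following is a minimal sketch that assumes a discrete action set and a linear-softmax parameterization: <math>\theta</math> is taken to be a matrix with one row of weights per action, and <math>s</math> a numeric feature vector. The softmax choice and the helper names <code>policy_probs</code> and <code>sample_action</code> are hypothetical and used only for illustration; actor-critic methods may use any differentiable policy, such as a neural network.

```python
import math
import random
from typing import List, Sequence

# Hypothetical linear-softmax actor: theta[a] holds the weights for action a,
# and s is a feature vector describing the environment state.
def policy_probs(theta: Sequence[Sequence[float]], s: Sequence[float]) -> List[float]:
    """Return pi_theta(. | s) as a list of action probabilities."""
    logits = [sum(w * x for w, x in zip(row, s)) for row in theta]  # one score per action
    m = max(logits)                                   # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]                  # normalize so the probabilities sum to 1

def sample_action(theta: Sequence[Sequence[float]], s: Sequence[float],
                  rng: random.Random) -> int:
    """Draw an action index a ~ pi_theta(. | s)."""
    probs = policy_probs(theta, s)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Usage: three actions, two state features (values chosen only for illustration).
theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
s = [2.0, 0.0]
probs = policy_probs(theta, s)        # roughly [0.665, 0.090, 0.245]
action = sample_action(theta, s, random.Random(0))
```

Training the actor then amounts to adjusting <math>\theta</math> so that actions the critic rates highly become more probable under this distribution.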

Actor-critic algorithm: Difference between revisions