Revision as of 05:29, 21 January 2025 by Cosmia Nebula (→Overview: critic; Tag: Visual edit)
Revision as of 05:30, 21 January 2025 by Cosmia Nebula (m →Actor; Tag: 2017 wikitext editor)
Line 17: The goal of policy optimization is to improve the actor, that is, to find some <math>\theta</math> that maximizes the expected episodic reward <math>J(\theta)</math>:<math display="block"> J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{T} \gamma^t r_t\right] </math>where <math> \gamma

Actor-critic algorithm: Difference between revisions