Revision as of 03:55, 25 January 2025 by Cosmia Nebula (edit summary: natural policy gradient; Tag: Visual edit) → Revision as of 03:59, 25 January 2025 by Cosmia Nebula (minor edit; edit summary: →Overview; Tag: Visual edit)
Line 11: The goal of policy optimization is to find some <math>\theta</math> that maximizes the expected episodic reward <math>J(\theta)</math>:<math display="block"> J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{i\in 0:T} \gamma^i R_i \Big| S_0 = s_0 \right] </math>where <math> \gamma

Policy gradient method: Difference between revisions