Revision as of 15:51, 24 May 2025 by 2001:9e8:2d75:e600:c1d0:9d69:6873:b26a (→Actor-critic methods: fixed small error)
Revision as of 16:43, 22 June 2025 by 2a02:2455:17f2:1c00:921b:eff:fef8:85f3 (No edit summary)
== Variance reduction ==
REINFORCE is an '''on-policy''' algorithm, meaning that the trajectories used for the update must be sampled from the current policy <math>\pi_\theta</math>. The resulting gradient estimates can have high variance, because the returns <math>R(\tau)</math> can vary significantly between trajectories. Many variants of REINFORCE have been introduced to address this, under the title of '''[[variance reduction]]'''.

=== REINFORCE with baseline ===
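The effect of a baseline on the variance of the REINFORCE estimate can be illustrated numerically. The following is a minimal sketch, not a reference implementation: it assumes a one-step, two-action problem with a softmax policy, and the function name <code>reinforce_grad_samples</code>, the reward values, the noise level and the constant baseline of 10.5 are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def reinforce_grad_samples(theta, rewards, n_samples, baseline=0.0, seed=0):
    """Per-sample REINFORCE gradient estimates for a one-step softmax policy.

    Hypothetical helper: each sample is (R - baseline) * grad log pi(a),
    where the return R is the mean reward of the action plus unit Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    probs = softmax(theta)
    actions = rng.choice(len(theta), size=n_samples, p=probs)
    # For a softmax policy, grad_theta log pi(a) = one_hot(a) - probs.
    score = np.eye(len(theta))[actions] - probs
    returns = rewards[actions] + rng.normal(0.0, 1.0, size=n_samples)
    return (returns - baseline)[:, None] * score


theta = np.zeros(2)                # uniform policy over two actions (assumed)
rewards = np.array([10.0, 11.0])   # assumed mean returns per action

plain = reinforce_grad_samples(theta, rewards, 20000)
based = reinforce_grad_samples(theta, rewards, 20000, baseline=10.5)

print("mean without baseline:", plain.mean(axis=0))
print("mean with baseline:   ", based.mean(axis=0))
print("total variance without baseline:", plain.var(axis=0).sum())
print("total variance with baseline:   ", based.var(axis=0).sum())
```

Both runs use the same seed, so they see the same sampled actions and returns. The two sample means should agree closely, because subtracting a constant baseline leaves the expected gradient unchanged, while the per-sample variance should be much smaller with the baseline in this setup.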

Policy gradient method: Difference between revisions