Policy gradient method: Difference between revisions

 
== Variance reduction ==
REINFORCE is an '''on-policy''' algorithm, meaning that the trajectories used for each update must be sampled from the current policy <math>\pi_\theta</math>. Its updates can have high variance, because the returns <math>R(\tau)</math> can vary significantly between trajectories. Many variants of REINFORCE have been introduced under the heading of '''[[variance reduction]]'''.
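
The spread of the estimator can be illustrated on a toy problem. The following is a minimal sketch, not a reference implementation: it assumes a one-step, two-action environment with Gaussian rewards and a softmax policy, and the reward means, noise level, and helper names (<code>sample_gradient</code>, <code>exact_gradient</code>, <code>batch_estimate</code>) are all hypothetical choices made for illustration. It compares the variance of a single-trajectory REINFORCE estimate with that of an average over a batch of trajectories sampled from the same policy.

```python
import math
import random
import statistics


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gradient(theta, mean_rewards, rng, noise_std=1.0):
    """One-sample REINFORCE estimate grad log pi(a) * R for a one-step episode."""
    probs = softmax(theta)
    # On-policy: the action is drawn from the current policy pi_theta.
    action = rng.choices(range(len(theta)), weights=probs)[0]
    reward = rng.gauss(mean_rewards[action], noise_std)
    # For a softmax policy: d/dtheta_k log pi(a) = 1[k == a] - pi(k).
    return [((1.0 if k == action else 0.0) - probs[k]) * reward
            for k in range(len(theta))]


def exact_gradient(theta, mean_rewards):
    """Exact gradient of the expected reward, for comparison."""
    probs = softmax(theta)
    expected = sum(p * r for p, r in zip(probs, mean_rewards))
    return [probs[k] * (mean_rewards[k] - expected) for k in range(len(theta))]


def batch_estimate(theta, mean_rewards, rng, batch_size):
    """Average of batch_size independent one-sample estimates."""
    grads = [sample_gradient(theta, mean_rewards, rng) for _ in range(batch_size)]
    return [sum(g[k] for g in grads) / batch_size for k in range(len(theta))]


rng = random.Random(0)
theta = [0.0, 0.0]            # assumed policy parameters (uniform policy)
mean_rewards = [10.0, 11.0]   # assumed mean reward of each action

# First gradient component, estimated from single trajectories and from batches.
singles = [sample_gradient(theta, mean_rewards, rng)[0] for _ in range(20000)]
batches = [batch_estimate(theta, mean_rewards, rng, 16)[0] for _ in range(2000)]

var_single = statistics.pvariance(singles)
var_batch = statistics.pvariance(batches)

print("exact gradient component:  ", exact_gradient(theta, mean_rewards)[0])
print("mean of single estimates:  ", statistics.fmean(singles))
print("variance, single estimates:", var_single)
print("variance, batch-16 average:", var_batch)
```

In this setting the single-trajectory estimates are unbiased but widely scattered around the exact gradient, because each one is scaled by a large return even though the two actions differ only slightly in value. Averaging over more trajectories shrinks the variance, at the cost of more samples per update.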
 
=== REINFORCE with baseline ===