== Variance reduction ==
REINFORCE is an '''on-policy''' algorithm, meaning that the trajectories used for the update must be sampled from the current policy <math>\pi_\theta</math>. Each update is estimated from the Monte Carlo returns of a limited number of sampled trajectories, so the updates can have high variance: the returns <math>R(\tau)</math> can vary significantly between trajectories. Many variants of REINFORCE have been developed to reduce this variance.
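The effect can be illustrated with a minimal sketch. It assumes a one-step episode (a two-armed bandit), a softmax policy over two logits <math>\theta</math>, and Gaussian reward noise; the function names and reward values are hypothetical and chosen only for illustration. Each call draws one on-policy sample and forms the single-sample REINFORCE estimate <math>R(\tau)\,\nabla_\theta \ln \pi_\theta(a)</math>. Shifting every reward by a constant leaves the expected gradient unchanged, but it inflates the variance of the estimate.

```python
import math
import random
import statistics


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient_sample(theta, mean_rewards, noise_std, rng, reward_offset=0.0):
    """One-sample REINFORCE estimate of dJ/dtheta for a one-step episode.

    Hypothetical setup: the action is sampled from the current softmax policy
    (on-policy), the return is the action's mean reward plus a constant offset
    plus Gaussian noise.
    """
    probs = softmax(theta)
    a = rng.choices(range(len(theta)), weights=probs)[0]
    ret = mean_rewards[a] + reward_offset + rng.gauss(0.0, noise_std)
    # Gradient of log-softmax w.r.t. the logits: onehot(a) - probs.
    return [ret * ((1.0 if k == a else 0.0) - probs[k]) for k in range(len(theta))]


def estimate_stats(n_samples, reward_offset=0.0, seed=0):
    """Empirical mean and variance of the first gradient component."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]        # uniform policy: pi = [0.5, 0.5]
    mean_rewards = [1.0, 0.0]  # assumed arm rewards
    firsts = [
        reinforce_gradient_sample(theta, mean_rewards, 1.0, rng, reward_offset)[0]
        for _ in range(n_samples)
    ]
    return statistics.fmean(firsts), statistics.pvariance(firsts)


if __name__ == "__main__":
    for offset in (0.0, 10.0):
        mean, var = estimate_stats(100_000, reward_offset=offset, seed=1)
        print(f"offset={offset:5.1f}  mean={mean:.3f}  variance={var:.3f}")
```

In this setup the exact gradient component is <math>\pi_0 (r_0 - J) = 0.25</math> for both offsets, while the variance of the single-sample estimate is analytically 0.3125 with no offset and about 27.8 with an offset of 10. The estimate stays unbiased because <math>\mathbb{E}_{a \sim \pi_\theta}[\nabla_\theta \ln \pi_\theta(a)] = 0</math>, so a constant added to every return contributes nothing to the mean but does add noise.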
=== REINFORCE with baseline ===