=== REINFORCE with baseline ===
A common way to reduce the variance of the gradient estimate is the '''REINFORCE with baseline''' algorithm, based on the following identity:<math display="block">\nabla_\theta J(\theta)= \mathbb{E}_{\pi_\theta}\left[\sum_{t\in 0:T} \nabla_\theta\ln\pi_\theta(A_t| S_t)\left(\sum_{\tau \in t:T} (\gamma^\tau R_\tau) - b(S_t)\right)
\Big|S_0 = s_0 \right]</math>for any function <math>b: \text{States} \to \R</math>. This can be proven by applying the previous lemma.
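The key step can be sketched for a single time step, assuming a discrete action set (an assumption made here only for brevity). Conditioned on the state, the baseline term has zero expectation:<math display="block">\mathbb{E}_{\pi_\theta}\left[\nabla_\theta\ln\pi_\theta(A_t| S_t)\, b(S_t) \,\Big|\, S_t = s\right] = b(s)\sum_a \pi_\theta(a| s)\,\nabla_\theta\ln\pi_\theta(a| s) = b(s)\,\nabla_\theta \sum_a \pi_\theta(a| s) = b(s)\,\nabla_\theta 1 = 0</math>Subtracting <math>b(S_t)</math> therefore leaves the expected gradient unchanged, while a well-chosen baseline can reduce its variance.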
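The algorithm uses the modified gradient estimator<math display="block">g_t \leftarrow
\frac 1N \sum_{k=1}^N \left[\sum_{j\in 0:T} \nabla_{\theta}\ln\pi_\theta(A_{j,k}| S_{j,k})\left(\sum_{\tau \in j:T} (\gamma^\tau R_{\tau,k}) - b(S_{j,k})\right)\right]</math>where <math>k</math> indexes the <math>N</math> sampled trajectories and <math>j</math> indexes the time steps within each trajectory.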
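As an illustration, the following is a minimal sketch of such an estimator, not a reference implementation. It assumes a tabular softmax policy whose logits are stored in an array <code>theta</code> of shape (states, actions), trajectories given as lists of (state, action, reward) tuples, and a baseline given as one value per state. The function names and data layout are hypothetical choices made for this example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def reinforce_baseline_gradient(theta, trajectories, baseline, gamma):
    """Average, over sampled trajectories, of
    grad log pi(A_j | S_j) * (discounted reward-to-go - b(S_j)).

    Assumed layout (hypothetical, for illustration only):
      theta        -- float array (n_states, n_actions) of softmax logits
      trajectories -- list of lists of (state, action, reward) tuples
      baseline     -- array (n_states,) holding b(s) for each state
    """
    grad = np.zeros_like(theta)
    for traj in trajectories:
        rewards = np.array([r for _, _, r in traj], dtype=float)
        # gamma^tau * R_tau for every step tau of this trajectory
        discounted = rewards * gamma ** np.arange(len(traj))
        # reward-to-go: sum of gamma^tau * R_tau over tau >= j
        to_go = np.cumsum(discounted[::-1])[::-1]
        for (s, a, _), g in zip(traj, to_go):
            # For a softmax policy, the gradient of log pi(a|s) with respect
            # to the logits of state s is onehot(a) - pi(.|s).
            grad_logp = -softmax(theta[s])
            grad_logp[a] += 1.0
            grad[s] += grad_logp * (g - baseline[s])
    return grad / len(trajectories)
```

Passing an all-zero <code>baseline</code> gives the same estimator with no baseline subtracted, so the two variants can be compared on the same sampled trajectories.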
=== Actor-critic methods ===