The goal of the policy gradient method is to optimize <math>J(\theta)</math> by [[Gradient descent|gradient ascent]] on the policy gradient <math>\nabla_\theta J(\theta)</math>.
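Concretely, a gradient-ascent step updates the policy parameters as<math display="block">\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)</math>for a step size (learning rate) <math>\alpha > 0</math>.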
As detailed on the [[Policy gradient method#Actor-critic methods|policy gradient method]] page, there are many [[Unbiased estimator|unbiased estimators]] of the policy gradient:<math display="block">\nabla_\theta J(\theta) = E_{\pi_\theta}\left[\sum_{0 \leq j \leq T} \nabla_\theta \ln \pi_\theta(A_j | S_j) \cdot \Psi_j \,\Big|\, S_0 = s_0 \right]</math>where <math display="inline">\Psi_j</math> is a linear sum of the following: