Policy gradient method: Difference between revisions

Content deleted Content added
Line 213:
</math>
where:
* <math>g = \nabla_\theta \mathcal{L}(\theta_t, \theta) \big|_{\theta = \theta_t}</math> is the policy gradient.
* <math>F = \nabla_\theta^2 \bar{D}_{\text{KL}}(\pi_{\theta} \| \pi_{\theta_t}) \big|_{\theta = \theta_t}</math> is the Fisher information matrix.