Revision as of 04:45, 25 January 2025 by Cosmia Nebula (→Formulation; tag: Visual edit)
Revision as of 04:46, 25 January 2025 by Cosmia Nebula (→Formulation; tag: 2017 wikitext editor)
Line 213:
</math>
where:
* <math>g = \nabla_\theta \mathcal{L}(\theta_t, \theta) \big|_{\theta = \theta_t}</math> is the policy gradient.
* <math>F = \nabla_\theta^2 \bar{D}_{\text{KL}}(\pi_{\theta} \| \pi_{\theta_t}) \big|_{\theta = \theta_t}</math> is the Fisher information matrix.
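The identification of <math>F</math> with the Fisher information matrix can be checked numerically. The following is a minimal sketch, not part of the article's formulation: it assumes a single-state softmax policy whose parameters <math>\theta</math> are the action logits, so the state average in <math>\bar{D}_{\text{KL}}</math> is trivial. The helper names (`softmax`, `kl`, `numerical_hessian`, `fisher_softmax`) and the example logits are hypothetical. For this parameterization the Fisher information matrix has the closed form <math>\operatorname{diag}(p) - p p^\top</math>, and the sketch compares it with a finite-difference Hessian of the KL divergence at <math>\theta = \theta_t</math>.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(theta, theta_t):
    # D_KL(pi_theta || pi_theta_t) for a single-state softmax policy (assumption).
    p = softmax(theta)
    q = softmax(theta_t)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

def shifted(x, i, di, j, dj):
    # Copy of x with di added to coordinate i and dj added to coordinate j.
    y = list(x)
    y[i] += di
    y[j] += dj
    return y

def numerical_hessian(f, x, eps=1e-4):
    # Central finite-difference approximation of the Hessian of f at x.
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            hessian[i][j] = (
                f(shifted(x, i, eps, j, eps))
                - f(shifted(x, i, eps, j, -eps))
                - f(shifted(x, i, -eps, j, eps))
                + f(shifted(x, i, -eps, j, -eps))
            ) / (4 * eps * eps)
    return hessian

def fisher_softmax(theta):
    # Closed-form Fisher information of a softmax over logits: diag(p) - p p^T.
    p = softmax(theta)
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]

theta_t = [0.5, -1.0, 2.0]  # hypothetical current logits
F_numeric = numerical_hessian(lambda th: kl(th, theta_t), theta_t)
F_exact = fisher_softmax(theta_t)
max_abs_error = max(
    abs(F_numeric[i][j] - F_exact[i][j])
    for i in range(len(theta_t))
    for j in range(len(theta_t))
)
print("max |Hessian of KL - Fisher| =", max_abs_error)
```

The gradient of the KL divergence vanishes at <math>\theta = \theta_t</math>, so its Hessian is the leading term of the local expansion, which is why it can serve as the metric <math>F</math> in the constrained update.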

Policy gradient method: Difference between revisions