Revision as of 04:41, 25 January 2025 by Cosmia Nebula (m →Formulation, Tag: Visual edit) | Revision as of 04:41, 25 January 2025 by Cosmia Nebula (→Formulation, Tag: Visual edit)
Line 221: </math>So far, this is essentially the same as the natural gradient method. However, TRPO improves upon it with two modifications:
* Use the [[conjugate gradient method]] to solve for <math>x</math> in <math>Fx = g</math> iteratively, without explicit matrix inversion.
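
The conjugate gradient step can be illustrated with a minimal sketch in plain Python. It assumes <math>F</math> is symmetric positive definite and is available only through a matrix-vector product. The names <code>conjugate_gradient</code>, <code>fvp</code>, <code>iters</code>, and <code>tol</code>, as well as the 2×2 toy matrix, are hypothetical choices made for illustration and do not come from the article or from any particular TRPO implementation.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(
    fvp: Callable[[Vector], Vector],  # hypothetical callback: v -> F v
    g: Vector,
    iters: int = 10,
    tol: float = 1e-10,
) -> Vector:
    """Approximately solve F x = g using only products F v.

    Assumes F is symmetric positive definite. F is never formed
    explicitly and never inverted.
    """
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, which equals g because x starts at 0
    p = list(g)  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:  # squared residual norm is small enough: stop
            break
        Fp = fvp(p)
        alpha = rr / dot(p, Fp)  # exact step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, Fp)]
        rr_new = dot(r, r)
        beta = rr_new / rr  # keeps the new direction F-conjugate to p
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Toy usage: a small symmetric positive definite matrix stands in for F.
F = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]


def fvp(v: Vector) -> Vector:
    """Matrix-vector product F v for the toy matrix above."""
    return [dot(row, v) for row in F]


x = conjugate_gradient(fvp, g)
```

In this toy case the exact solution is <math>x = (1/11, 7/11)</math>, which the iteration reaches after two steps up to floating-point error. The routine touches <math>F</math> only through <code>fvp</code>, so in a larger setting the product <math>Fv</math> could be computed without ever storing <math>F</math>.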

Policy gradient method: Difference between revisions