Revision as of 07:05, 13 March 2025 by Dhtwiki: correcting according to "Articles for possible copyedit from 2025-2-20 dump" – "is,This", "gradientwhich", "estimatorand", "sinceby", "tryinguntil", "advantageunder", "advantagewhere"
Revision as of 02:39, 13 April 2025 by Cosmia Nebula: merge hidden blocks (Tag: 2017 wikitext editor)
Line 49:
{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=Proof}}
{{Math proof|title=Proof of Lemma|proof=
Use the [[reparameterization trick#REINFORCE estimator|reparameterization trick]].
Line 82:
\end{aligned}
</math>
}}
~~}}{{hidden end}}~~
{{Math proof|title=Proof of two identities|proof=▼
~~{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=Proof}}~~
▲{{Math proof|title=Proof|proof=
Applying the [[reparameterization trick#REINFORCE estimator|reparameterization trick]],

Policy gradient method: Difference between revisions