Policy gradient method: Difference between revisions

correcting according to "Articles for possible copyedit from 2025-2-20 dump" – "is,This", "gradientwhich", "estimatorand", "sinceby", "tryinguntil", "advantageunder", "advantagewhere"
merge hidden blocks
Line 49:
{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=Proof}}
 
{{Math proof|title=Proof of Lemma|proof=
 
Use the [[reparameterization trick#REINFORCE estimator|reparameterization trick]].
Line 82:
\end{aligned}
</math>
}}
}}{{hidden end}}
 
{{Math proof|title=Proof of two identities|proof=
{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=Proof}}
 
{{Math proof|title=Proof|proof=
Applying the [[reparameterization trick#REINFORCE estimator|reparameterization trick]],