Content deleted Content added
No edit summary
Line 124:
The algorithm uses the modified gradient estimator<math display="block">g_t \leftarrow
\frac 1N \sum_{k=1}^N \left[\sum_{j\in 0:T} \nabla_{\theta_t}\ln\pi_\theta(A_{j,k}| S_{
=== Actor-critic methods ===