\max_{\theta_{t+1}} J(\theta_t) + (\theta_{t+1} - \theta_t)^T \nabla_\theta J(\theta_t)\\
\bar{D}_{KL}(\pi_{\theta_{t+1}} \| \pi_{\theta_{t}}) \leq \epsilon
\end{cases}</math> where the KL divergence between two policies is '''averaged''' over the state distribution
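
As an illustration of the averaged constraint, the following is a minimal sketch for discrete action spaces, not a reference implementation. It assumes that each policy is given as a table of action probabilities over a batch of sampled states, and that those states were drawn from whichever state distribution the constraint averages over. The function names <code>average_kl</code> and <code>satisfies_trust_region</code> are hypothetical.

```python
import numpy as np


def average_kl(probs_new, probs_old):
    """Mean over sampled states of KL(pi_new(.|s) || pi_old(.|s)).

    Both arguments have shape (num_states, num_actions). Each row is a
    probability distribution with strictly positive entries (assumption).
    """
    probs_new = np.asarray(probs_new, dtype=float)
    probs_old = np.asarray(probs_old, dtype=float)
    # Per-state KL divergence: sum_a p_new(a|s) * log(p_new(a|s) / p_old(a|s)).
    per_state_kl = np.sum(
        probs_new * (np.log(probs_new) - np.log(probs_old)), axis=1
    )
    # Averaging over the sampled states approximates the expectation
    # over the state distribution.
    return float(per_state_kl.mean())


def satisfies_trust_region(probs_new, probs_old, epsilon):
    """Check the constraint: averaged KL divergence is at most epsilon."""
    return average_kl(probs_new, probs_old) <= epsilon
```

For example, <code>satisfies_trust_region(p, p, 0.01)</code> holds for any valid table <code>p</code>, because the divergence of a policy from itself is zero.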
=== Fisher information approximation ===