Content deleted Content added
→Overview: critic |
m →Actor |
||
Line 17:
The goal of policy optimization is to improve the actor. That is, to find some <math>\theta</math> that maximizes the expected episodic reward <math>J(\theta)</math>:<math display="block">
J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{T} \gamma^t r_t\right]
</math>where <math>
\gamma
|