Actor-critic algorithm

The goal of policy optimization is to improve the actor, that is, to find parameters <math>\theta</math> that maximize the expected discounted episodic return <math>J(\theta)</math>:<math display="block">
J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{T} \gamma^t r_t\right]
</math>where <math>
\gamma