=== Actor-critic methods ===
{{Main|Actor-critic algorithm}}
If <math display="inline">b_i</math> is chosen well, such that <math display="inline">b_i(S_t) \approx \mathbb{E}\left[\sum_{\tau \in t:T} (\gamma^\tau R_\tau) \,\Big|\, S_t\right] = \gamma^t V^{\pi_{\theta_i}}(S_t)</math>, the variance of the gradient estimate is reduced. Substituting this baseline into the policy gradient gives
<math display="block">\nabla_{\theta_i} J(\theta_i) = \mathbb{E}_{\pi_{\theta_i}}\left[ \sum_{t \in 0:T} \nabla_{\theta_i} \ln \pi_{\theta_i}(A_t \mid S_t) \, \gamma^t \left( \sum_{\tau \in t:T} \gamma^{\tau - t} R_\tau - V^{\pi_{\theta_i}}(S_t) \right) \Big| S_0 = s_0 \right]</math>
Note that, as the policy <math>\pi_{\theta_i}</math> updates, the value function <math>V^{\pi_{\theta_i}}(S_t)</math> updates as well, so the baseline should also be updated. One common approach is to train a separate function that estimates the value function and to use that as the baseline. This is one of the [[actor-critic method]]s, where the policy function is the actor and the value function is the critic.
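As an illustration, the following sketch implements this scheme in PyTorch. It is not taken from any particular published implementation: the environment interface (<code>reset()</code>, <code>step()</code>), network sizes, and hyperparameters are assumptions made for the example, and, as is common in practice, the explicit <math>\gamma^t</math> weighting of the gradient terms is dropped. The actor is a policy network updated with the REINFORCE gradient, while the critic is a separate value network fitted to the observed returns and used as the baseline.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

gamma = 0.99

# Hypothetical sizes for illustration: 4-dimensional states, 2 discrete actions.
actor = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))   # policy network (the actor), outputs action logits
critic = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))  # value network (the critic), estimates V(s)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def run_episode(env):
    """Collect one trajectory under the current policy. `env` is assumed to
    expose reset() -> state and step(action) -> (state, reward, done)."""
    states, actions, rewards = [], [], []
    state, done = env.reset(), False
    while not done:
        s = torch.as_tensor(state, dtype=torch.float32)
        dist = torch.distributions.Categorical(logits=actor(s))
        a = dist.sample()
        state, reward, done = env.step(a.item())
        states.append(s)
        actions.append(a)
        rewards.append(reward)
    return states, actions, rewards

def update(states, actions, rewards):
    # Discounted returns-to-go: G_t = sum_{tau >= t} gamma^(tau - t) R_tau.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)

    s = torch.stack(states)
    a = torch.stack(actions)
    values = critic(s).squeeze(-1)  # baseline V(S_t); retrained as the policy changes

    # Critic step: regress the value estimate toward the observed return.
    critic_loss = ((returns - values) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor step: policy gradient with the critic as baseline (advantage = G_t - V(S_t)).
    log_probs = torch.distributions.Categorical(logits=actor(s)).log_prob(a)
    advantage = (returns - values).detach()  # no gradient flows through the baseline
    actor_loss = -(log_probs * advantage).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
</syntaxhighlight>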