Revision as of 11:15, 25 May 2025 by OAbot (m Open access bot: url-access updated in citation with #oabot.)
Revision as of 02:40, 27 June 2025 by Maxeto0910 (→Value iteration; Tag: Visual edit)
Line 75:
====Value iteration====
In value iteration {{harv|Bellman|1957}}, which is also called [[backward induction]], the <math>\pi</math> function is not used; instead, the value of <math>\pi(s)</math> is calculated within <math>V(s)</math> whenever it is needed. Substituting the calculation of <math>\pi(s)</math> into the calculation of <math>V(s)</math> gives the combined step{{explain|reason=The derivation of the substitution is needed|date=July 2018}}:
:<math> V_{i+1}(s) := \max_a \left\{ \sum_{s'} P_a(s,s') \left( R_a(s,s') + \gamma V_i(s') \right) \right\}, </math>
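The combined step can be illustrated with a minimal Python sketch. It is not taken from the article: the function names <code>value_iteration</code> and <code>greedy_policy</code>, the dictionary layout of the transition model, the stopping tolerance <code>tol</code>, and the two-state example are all assumptions made for illustration. The sketch further assumes a finite MDP in which every state has at least one action and every successor state appears as a key of the model.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
# Hypothetical layout: model[s][a] is a list of (next_state, probability, reward)
# triples, i.e. s', P_a(s, s') and R_a(s, s') from the update rule.
Model = Dict[State, Dict[Action, List[Tuple[State, float, float]]]]


def backup(outcomes: List[Tuple[State, float, float]],
           V: Dict[State, float], gamma: float) -> float:
    """Expected value of one action: sum over s' of P * (R + gamma * V(s'))."""
    return sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)


def value_iteration(model: Model, gamma: float = 0.9,
                    tol: float = 1e-8, max_iter: int = 10_000) -> Dict[State, float]:
    """Repeat V_{i+1}(s) := max_a backup(...) until the largest change is below tol."""
    V = {s: 0.0 for s in model}  # V_0 is assumed to be all zeros
    for _ in range(max_iter):
        V_next = {
            s: max(backup(outcomes, V, gamma) for outcomes in actions.values())
            for s, actions in model.items()
        }
        delta = max(abs(V_next[s] - V[s]) for s in model)
        V = V_next
        if delta < tol:
            break
    return V


def greedy_policy(model: Model, V: Dict[State, float],
                  gamma: float) -> Dict[State, Action]:
    """Recover pi(s) afterwards as the action that attains the maximum."""
    return {
        s: max(actions, key=lambda a: backup(actions[a], V, gamma))
        for s, actions in model.items()
    }


# Illustrative two-state MDP (made-up numbers).
mdp: Model = {
    "A": {"stay": [("A", 1.0, 0.0)], "go": [("B", 1.0, 1.0)]},
    "B": {"stay": [("B", 1.0, 2.0)]},
}

if __name__ == "__main__":
    values = value_iteration(mdp, gamma=0.5)
    print(values)
    print(greedy_policy(mdp, values, gamma=0.5))
```

In this example the policy is never stored during the iteration; it is read off once from the converged values, which mirrors the statement that <math>\pi(s)</math> is computed only when needed.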

Markov decision process: Difference between revisions