Revision as of 04:48, 20 August 2024 by Condordellanebbia (minor edit: →Reinforcement learning) | Revision as of 04:52, 20 August 2024 by Condordellanebbia (minor edit, no edit summary)
In this variant, the steps are preferentially applied to states which are in some way important – whether based on the algorithm (there were large changes in <math>V</math> or <math>\pi</math> around those states recently) or based on use (those states are near the starting state, or otherwise of interest to the person or program using the algorithm).

===Computational complexity===
Algorithms for finding optimal policies with [[time complexity]] polynomial in the size of the problem representation exist for finite MDPs. Thus, [[decision problem]]s based on MDPs are in computational [[complexity class]] [[P (complexity)|P]].<ref>{{cite journal

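The variant that concentrates updates on "important" states can be illustrated with a small prioritized-sweeping sketch. It assumes a finite MDP with a known model, encoded as a hypothetical nested dict `mdp[s][a] = [(probability, next_state, reward), ...]`, and uses the magnitude of the Bellman error (how much <math>V(s)</math> would change under one update) as the measure of importance, which is one algorithm-based choice among several. The function names `backup` and `prioritized_sweeping` and the parameters `theta` and `max_updates` are illustrative, not taken from the article.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical encoding: mdp[s][a] = list of (probability, next_state, reward).
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def backup(mdp: MDP, V: Dict[int, float], s: int, gamma: float) -> float:
    """Bellman optimality backup for one state: best expected one-step return."""
    return max(
        sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
        for outcomes in mdp[s].values()
    )


def prioritized_sweeping(mdp: MDP, gamma: float = 0.9, theta: float = 1e-8,
                         max_updates: int = 100_000) -> Dict[int, float]:
    V = {s: 0.0 for s in mdp}

    # Predecessors of each state: states that can reach it with nonzero probability.
    preds: Dict[int, set] = {s: set() for s in mdp}
    for s, actions in mdp.items():
        for outcomes in actions.values():
            for p, s2, _ in outcomes:
                if p > 0:
                    preds[s2].add(s)

    # Max-priority queue via negated errors (heapq is a min-heap).
    heap: List[Tuple[float, int]] = []
    for s in mdp:
        err = abs(backup(mdp, V, s, gamma) - V[s])
        if err > theta:
            heapq.heappush(heap, (-err, s))

    updates = 0
    while heap and updates < max_updates:
        _, s = heapq.heappop(heap)
        new_v = backup(mdp, V, s, gamma)
        if abs(new_v - V[s]) <= theta:
            continue  # stale queue entry: this state no longer needs an update
        V[s] = new_v
        updates += 1
        # V[s] changed, so only its predecessors' backups can have changed.
        for sp in preds[s]:
            err = abs(backup(mdp, V, sp, gamma) - V[sp])
            if err > theta:
                heapq.heappush(heap, (-err, sp))
    return V


if __name__ == "__main__":
    # Toy MDP: in state 0, action 0 stays and earns 1; action 1 moves to state 1,
    # which earns 2 per step forever.
    toy = {0: {0: [(1.0, 0, 1.0)], 1: [(1.0, 1, 0.0)]},
           1: {0: [(1.0, 1, 2.0)]}}
    print(prioritized_sweeping(toy, gamma=0.9))  # approximately {0: 18.0, 1: 20.0}
```

With <math>\gamma = 0.9</math>, state 1 is worth <math>2/(1-0.9) = 20</math>, and moving there from state 0 (worth <math>0.9 \cdot 20 = 18</math>) beats staying (worth <math>1/(1-0.9) = 10</math>). The queue keeps stale entries and re-checks them on removal, which is simpler than updating priorities in place.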
Markov decision process: Difference between revisions