Markov decision process

{{main|Reinforcement learning}}
 
[[Reinforcement learning]] uses MDPs where the probabilities and rewards are unknown.<ref>{{cite journal|author1=Shoham, Y.|author2= Powers, R.|author3= Grenager, T. |year=2003|title= Multi-agent reinforcement learning: a critical survey |pages= 1–13|journal= Technical Report, Stanford University|url=http://jmvidal.cse.sc.edu/library/shoham03a.pdf|access-date=2018-12-12}}</ref>
 
===Reinforcement learning for discrete MDPs===
For the purpose of this section, it is useful to define a further function, which corresponds to taking the action <math>a</math> and then continuing optimally (or according to whatever policy one currently has):
:<math>\ Q(s,a) = \sum_{s'} P_a(s,s') (R_a(s,s') + \gamma V(s')).\ </math>
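
The following is a minimal illustrative sketch (not part of the standard presentation) of evaluating this function when the model is known, assuming the transition probabilities and rewards are stored as NumPy arrays indexed as <code>P[a, s, s']</code> and <code>R[a, s, s']</code>:

<syntaxhighlight lang="python">
import numpy as np

def q_value(P, R, V, gamma, s, a):
    """Q(s, a) = sum over s' of P_a(s, s') * (R_a(s, s') + gamma * V(s')).

    Assumed (hypothetical) array layout: P[a, s, s'] is the probability of
    moving from state s to s' under action a, and R[a, s, s'] is the
    immediate reward for that transition; V is a value estimate per state.
    """
    return np.sum(P[a, s] * (R[a, s] + gamma * V))

# Illustrative example with 2 states and 2 actions (numbers are arbitrary).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.4, 0.6]]])
R = np.zeros((2, 2, 2))
R[0, 0, 1] = 1.0          # reward for reaching state 1 from state 0 under action 0
V = np.array([0.0, 1.0])  # current estimate of the value function
print(q_value(P, R, V, gamma=0.9, s=0, a=0))
</syntaxhighlight>

In reinforcement learning proper, the arrays <code>P</code> and <code>R</code> are not available, and <math>Q(s,a)</math> is instead estimated from observed transitions.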