[[Reinforcement learning]] uses MDPs where the probabilities or rewards are unknown.<ref>{{cite journal|author1=Shoham, Y.|author2= Powers, R.|author3= Grenager, T. |year=2003|title= Multi-agent reinforcement learning: a critical survey |pages= 1–13|journal= Technical Report, Stanford University|url=http://jmvidal.cse.sc.edu/library/shoham03a.pdf|access-date=2018-12-12}}</ref>
===Reinforcement Learning for discrete MDPs===
For the purpose of this section, it is useful to define a further function, which corresponds to taking the action <math>a</math> and then continuing optimally (or according to whatever policy one currently has):