Markov decision process: Difference between revisions

{{Short description|Mathematical model for sequential decision making under uncertainty}}
 
'''Markov decision process''' ('''MDP'''), also called a [[Stochastic dynamic programming|stochastic dynamic program]] or stochastic control problem, is a model for [[sequential decision making]] when [[Outcome (probability)|outcomes]] are uncertain.<ref>{{Cite book |last=Puterman |first=Martin L. |title=Markov decision processes: discrete stochastic dynamic programming |date=1994 |publisher=Wiley |isbn=978-0-471-61977-2 |series=Wiley series in probability and mathematical statistics. Applied probability and statistics section |___location=New York}}</ref>
 
=== Learning automata ===
{{main|Learning automata}}
Another application of MDPs in [[machine learning]] theory is called learning automata. This is also a type of reinforcement learning when the environment is stochastic. The first detailed study of '''learning automata''' is the survey by [[Kumpati S. Narendra|Narendra]] and Thathachar (1974), in which they were originally described explicitly as [[finite-state automata]].<ref>{{Cite journal|last1=Narendra|first1=K. S.|author-link=Kumpati S. Narendra|last2=Thathachar|first2=M. A. L.|year=1974 |title=Learning Automata – A Survey|journal=IEEE Transactions on Systems, Man, and Cybernetics|volume=SMC-4|issue=4|pages=323–334|doi=10.1109/TSMC.1974.5408453|issn=0018-9472|citeseerx=10.1.1.295.2280}}</ref> Similar to reinforcement learning, a learning automata algorithm also has the advantage of solving the problem when the transition probabilities or rewards are unknown. The difference between learning automata and Q-learning is that the former omits the memory of Q-values and instead updates the action probabilities directly to find the learning result. Learning automata is a learning scheme with a rigorous proof of convergence.<ref name="NarendraEtAl1989">{{Cite book|url=https://archive.org/details/learningautomata00nare|url-access=registration|title=Learning automata: An introduction|last1=Narendra|first1=Kumpati S.|author-link=Kumpati S. Narendra|last2=Thathachar|first2=Mandayam A. L.|year=1989|publisher=Prentice Hall|isbn=9780134855585|language=en}}</ref>
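The direct update of action probabilities described above can be illustrated with a minimal sketch of one classical update rule, the linear reward-inaction (L<sub>R-I</sub>) scheme: when the chosen action is rewarded, probability mass shifts toward it by a learning rate λ; when it is penalized, the probabilities are left unchanged. The function name and λ value here are illustrative, not from the source.

```python
def update_lri(p, chosen, rewarded, lam=0.1):
    """Linear reward-inaction (L_R-I) update for a learning automaton.

    p        -- current action-probability vector (sums to 1)
    chosen   -- index of the action that was taken
    rewarded -- whether the environment returned a favorable response
    lam      -- learning rate, 0 < lam < 1
    """
    if rewarded:
        # Shrink every probability by (1 - lam), then give the
        # freed mass lam to the chosen action; this is equivalent to
        # p[chosen] += lam * (1 - p[chosen]) and p[j] -= lam * p[j].
        p = [pi * (1 - lam) for pi in p]
        p[chosen] += lam
    # On a penalty, L_R-I makes no change (the "inaction" part).
    return p
```

Note that the vector still sums to one after each update, and no value function (no Q-values) is stored: the probabilities themselves are the learned state.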
 
In learning automata theory, '''a stochastic automaton''' consists of: