Markov decision process: Difference between revisions

{{Short description|Mathematical model for sequential decision making under uncertainty}}
 
'''Markov decision process''' ('''MDP'''), also called a [[Stochastic dynamic programming|stochastic dynamic program]] or stochastic control problem, is a model for [[sequential decision making]] when [[Outcome (probability)|outcomes]] are uncertain.<ref>{{Cite book |last=Puterman |first=Martin L. |title=Markov decision processes: discrete stochastic dynamic programming |date=1994 |publisher=Wiley |isbn=978-0-471-61977-2 |series=Wiley series in probability and mathematical statistics. Applied probability and statistics section |___location=New York}}</ref>
 
=== Learning automata ===
{{main|Learning automata}}
Another application of MDPs in [[machine learning]] theory is called learning automata. This is also a type of reinforcement learning when the environment is stochastic. The first detailed study of '''learning automata''' is the survey by [[Kumpati S. Narendra|Narendra]] and Thathachar (1974), in which they were originally described explicitly as [[finite-state automata]].<ref>{{Cite journal|last1=Narendra|first1=K. S.|author-link=Kumpati S. Narendra|last2=Thathachar|first2=M. A. L.|year=1974 |title=Learning Automata – A Survey|journal=IEEE Transactions on Systems, Man, and Cybernetics|volume=SMC-4|issue=4|pages=323–334|doi=10.1109/TSMC.1974.5408453|issn=0018-9472|citeseerx=10.1.1.295.2280}}</ref> Similar to reinforcement learning, a learning automata algorithm also has the advantage of solving the problem when the transition probabilities or rewards are unknown. The difference between learning automata and Q-learning is that the former omits the memory of Q-values and instead updates the action probabilities directly to find the learning result. Learning automata is a learning scheme with a rigorous proof of convergence.<ref name="NarendraEtAl1989">{{Cite book|url=https://archive.org/details/learningautomata00nare|url-access=registration|title=Learning automata: An introduction|last1=Narendra|first1=Kumpati S.|author-link=Kumpati S. Narendra|last2=Thathachar|first2=Mandayam A. L.|year=1989|publisher=Prentice Hall|isbn=9780134855585|language=en}}</ref>
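The direct update of action probabilities described above can be illustrated with a minimal sketch of one classical update rule, the linear reward-inaction (L<sub>R-I</sub>) scheme: when the chosen action is rewarded, probability mass shifts toward it by a learning rate λ; when it is penalized, the probabilities are left unchanged. The function name and λ value here are illustrative, not from the source.

```python
def update_lri(p, chosen, rewarded, lam=0.1):
    """Linear reward-inaction (L_R-I) update for a learning automaton.

    p        -- current action-probability vector (sums to 1)
    chosen   -- index of the action that was taken
    rewarded -- whether the environment returned a favorable response
    lam      -- learning rate, 0 < lam < 1
    """
    if rewarded:
        # Shrink every probability by (1 - lam), then give the
        # freed mass lam to the chosen action; this is equivalent to
        # p[chosen] += lam * (1 - p[chosen]) and p[j] -= lam * p[j].
        p = [pi * (1 - lam) for pi in p]
        p[chosen] += lam
    # On a penalty, L_R-I makes no change (the "inaction" part).
    return p
```

Note that the vector still sums to one after each update, and no value function (no Q-values) is stored: the probabilities themselves are the learned state.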
 
In learning automata theory, '''a stochastic automaton''' consists of: