Dynamic programming: Difference between revisions

m Fixed reference links that were broken
Citation bot (talk | contribs)
Add: s2cid, pages, volume, journal, year, title, doi, author pars. 1-1. Removed URL that duplicated unique identifier. Converted bare reference to cite template. Removed parameters. Formatted dashes. Some additions/deletions were actually parameter name changes.
Line 18:
The solution to this problem is an optimal control law or policy <math>\mathbf{u}^{\ast} = h(\mathbf{x}(t), t)</math>, which produces an optimal trajectory <math>\mathbf{x}^{\ast}</math> and a [[cost-to-go function]] <math>J^{\ast}</math>. The latter obeys the fundamental equation of dynamic programming:
:<math>- J_{t}^{\ast} = \min_{\mathbf{u}} \left\{ f \left( \mathbf{x}(t), \mathbf{u}(t), t \right) + J_{x}^{\ast \mathsf{T}} \mathbf{g} \left( \mathbf{x}(t), \mathbf{u}(t), t \right) \right\}</math>
a [[partial differential equation]] known as the [[Hamilton–Jacobi–Bellman equation]], in which <math>J_{x}^{\ast} = \frac{\partial J^{\ast}}{\partial \mathbf{x}} = \left[ \frac{\partial J^{\ast}}{\partial x_{1}} ~~~~ \frac{\partial J^{\ast}}{\partial x_{2}} ~~~~ \dots ~~~~ \frac{\partial J^{\ast}}{\partial x_{n}} \right]^{\mathsf{T}}</math> and <math>J_{t}^{\ast} = \frac{\partial J^{\ast}}{\partial t}</math>. One finds the minimizing <math>\mathbf{u}</math> in terms of <math>t</math>, <math>\mathbf{x}</math>, and the unknown function <math>J_{x}^{\ast}</math> and then substitutes the result into the Hamilton–Jacobi–Bellman equation to get the partial differential equation to be solved with boundary condition <math>J \left( t_{1} \right) = b \left( \mathbf{x}(t_{1}), t_{1} \right)</math>.<ref>{{cite book |first1=M. I. |last1=Kamien |authorlink=Morton Kamien |first2=N. L. |last2=Schwartz |authorlink2=Nancy Schwartz |title=Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management |___location=New York |publisher=Elsevier |edition=Second |year=1991 |isbn=978-0-444-01609-6 |url=https://books.google.com/books?id=0IoGUn8wjDQC&pg=PA261 |page=261 }}</ref> In practice, this generally requires [[Numerical partial differential equations|numerical techniques]] for some discrete approximation to the exact optimization relationship.
 
Alternatively, the continuous process can be approximated by a discrete system, which leads to the following recurrence relation, analogous to the Hamilton–Jacobi–Bellman equation:
:<math>J_{k}^{\ast}\left(\mathbf{x}\right) = \min_{\mathbf{u}} \left\{ \hat{f}\left(\mathbf{x}, \mathbf{u}\right) + J_{k-1}^{\ast}\left(\hat{\mathbf{g}}\left(\mathbf{x}, \mathbf{u}\right)\right) \right\}</math>
at the <math>k</math>-th stage of <math>n</math> equally spaced discrete time intervals, where <math>\hat{f}</math> and <math>\hat{\mathbf{g}}</math> denote discrete approximations to <math>f</math> and <math>\mathbf{g}</math>.
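This backward recursion can be evaluated numerically on a discretized state grid. The following is a minimal sketch of such a computation; the quadratic running cost, Euler-discretized scalar dynamics, grids, and function names are illustrative assumptions, not part of the formulation above:

<syntaxhighlight lang="python">
# Backward induction on a discretized state grid: a sketch of the
# recurrence J_k(x) = min_u { f(x, u) + J_{k-1}(g(x, u)) }.
# The cost, dynamics, and grids below are assumed for illustration only.
x_grid = [i * 0.1 for i in range(-20, 21)]   # discretized states
u_grid = [i * 0.5 for i in range(-4, 5)]     # discretized controls
dt, n_stages = 0.1, 30                       # time step and horizon

f = lambda x, u: (x * x + u * u) * dt        # running cost (assumed quadratic)
g = lambda x, u: x + u * dt                  # Euler-discretized dynamics
b = lambda x: x * x                          # terminal cost

def nearest(x):
    """Index of the grid point closest to state x (crude interpolation)."""
    return min(range(len(x_grid)), key=lambda i: abs(x_grid[i] - x))

# J[i] holds the cost-to-go at x_grid[i]; start from the terminal condition.
J = [b(x) for x in x_grid]
for _ in range(n_stages):
    # One Bellman backup per stage, sweeping the whole state grid.
    J = [min(f(x, u) + J[nearest(g(x, u))] for u in u_grid) for x in x_grid]
</syntaxhighlight>

In practice finer grids and proper interpolation would be used, but the structure, a terminal condition followed by repeated minimizations over the control, mirrors the recurrence above.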
Line 368:
For ''n'' = 1 the problem is trivial, namely ''S''(1,''h'',''t'') = "move a disk from rod ''h'' to rod ''t''" (there is only one disk left).
 
The number of moves required by this solution is 2<sup>''n''</sup>&nbsp;&minus;&nbsp;1. If the objective is to '''maximize''' the number of moves (without cycling) then the dynamic programming [[Bellman equation|functional equation]] is slightly more complicated and 3<sup>''n''</sup>&nbsp;&minus;&nbsp;1 moves are required.<ref>{{Citation |author=Moshe Sniedovich |title=OR/MS Games: 2. The Towers of Hanoi Problem |journal=INFORMS Transactions on Education |volume=3 |issue=1 |year=2002 |pages=34–51 |doi=10.1287/ited.3.1.45 |postscript=.}}</ref>
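The minimal-move recursion ''S''(''n'',''h'',''t'') translates directly into code. The following is a minimal sketch; the rod labelling 1, 2, 3 and the function name are illustrative choices, not part of the original formulation:

<syntaxhighlight lang="python">
def solve(n, h, t):
    """Moves that transfer n disks from rod h to rod t (rods labeled 1, 2, 3).

    Implements S(n, h, t): clear the top n-1 disks onto the spare rod,
    move the largest disk, then restack the n-1 disks on top of it.
    """
    if n == 1:
        return [(h, t)]               # S(1, h, t): move a disk from h to t
    r = 6 - h - t                     # the remaining (spare) rod
    return solve(n - 1, h, r) + [(h, t)] + solve(n - 1, r, t)

moves = solve(3, 1, 3)
assert len(moves) == 2 ** 3 - 1       # matches the 2^n - 1 move count
</syntaxhighlight>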
 
=== Egg dropping puzzle ===
Line 394:
: ''W''(''n'',''k'') = minimum number of trials required to identify the value of the critical floor under the worst-case scenario given that the process is in state ''s'' = (''n'',''k'').
 
Then it can be shown that<ref name="sniedovich_03">{{Cite journal |doi=10.1287/ited.4.1.48 |title=OR/MS Games: 4. The Joy of Egg-Dropping in Braunschweig and Hong Kong |year=2003 |last1=Sniedovich |first1=Moshe |journal=INFORMS Transactions on Education |volume=4 |issue=1 |pages=48–64}}</ref>
 
: ''W''(''n'',''k'') = 1 + min{max(''W''(''n'' &minus; 1, ''x'' &minus; 1), ''W''(''n'',''k'' &minus; ''x'')): ''x'' = 1, 2, ..., ''k'' }
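
This functional equation can be computed directly with memoization. The following is a minimal sketch; the base cases assumed here are the standard ones (zero floors need no trials, and with one egg the floors must be tested one by one from the bottom), and the function name is illustrative:

<syntaxhighlight lang="python">
from functools import lru_cache

@lru_cache(maxsize=None)
def W(n, k):
    """Minimum worst-case number of trials in state s = (n, k)."""
    if k == 0:
        return 0          # no floors to test
    if n == 1:
        return k          # one egg: test floors 1..k from the bottom up
    # Dropping from floor x either breaks the egg (state (n-1, x-1))
    # or not (state (n, k-x)); guard against the worst case of the two,
    # then choose the best floor x.
    return 1 + min(max(W(n - 1, x - 1), W(n, k - x)) for x in range(1, k + 1))

print(W(2, 100))          # the classic 2-egg, 100-floor instance: prints 14
</syntaxhighlight>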
Line 540:
}}
 
The word ''dynamic'' was chosen by Bellman to capture the time-varying aspect of the problems, and because it sounded impressive.<ref name="Eddy">{{cite journal |last=Eddy |first=S. R. |authorlink=Sean Eddy |title=What is Dynamic Programming? |journal=Nature Biotechnology |volume=22 |issue=7 |pages=909–910 |year=2004 |doi=10.1038/nbt0704-909 |pmid=15229554 |s2cid=5352062 }}</ref> The word ''programming'' referred to the use of the method to find an optimal ''program'', in the sense of a military schedule for training or logistics. This usage is the same as that in the phrases ''[[linear programming]]'' and ''mathematical programming'', a synonym for [[mathematical optimization]].<ref>{{cite book |last1=Nocedal |first1=J. |last2=Wright |first2=S. J. |title=Numerical Optimization |url=https://archive.org/details/numericaloptimiz00noce_639 |url-access=limited |page=[https://archive.org/details/numericaloptimiz00noce_639/page/n21 9] |publisher=Springer |year=2006 }}</ref>
 
The above explanation of the origin of the term may not be entirely accurate. As Russell and Norvig write in their book, referring to the above story: "This cannot be strictly true, because his first paper using the term (Bellman, 1952) appeared before Wilson became Secretary of Defense in 1953."<ref>{{cite book |last1=Russell |first1=S. |last2=Norvig |first2=P. |title=Artificial Intelligence: A Modern Approach |edition=3rd |publisher=Prentice Hall |year=2009 |isbn=978-0-13-207148-2 }}</ref> There is also a comment in a speech by [http://a2c2.org/awards/richard-e-bellman-control-heritage-award/2004-00-00t000000/harold-j-kushner Harold J. Kushner], in which he remembers Bellman: "On the other hand, when I asked him the same question, he replied that he was trying to upstage Dantzig's linear programming by adding dynamic. Perhaps both motivations were true."
 
== Algorithms that use dynamic programming ==