In terms of mathematical optimization, dynamic programming usually refers to simplifying a decision by breaking it down into a sequence of decision steps over time. This is done by defining a sequence of '''value functions''' ''V''<sub>1</sub>, ''V''<sub>2</sub>, ..., ''V''<sub>''n''</sub> taking ''y'' as an argument representing the '''[[State variable|state]]''' of the system at times ''i'' from 1 to ''n''. The definition of ''V''<sub>''n''</sub>(''y'') is the value obtained in state ''y'' at the last time ''n''. The values ''V''<sub>''i''</sub> at earlier times ''i'' = ''n'' −1, ''n'' − 2, ..., 2, 1 can be found by working backwards, using a [[Recursion|recursive]] relationship called the [[Bellman equation]]. For ''i'' = 2, ..., ''n'', ''V''<sub>''i''−1</sub> at any state ''y'' is calculated from ''V''<sub>''i''</sub> by maximizing a simple function (usually the sum) of the gain from a decision at time ''i'' − 1 and the function ''V''<sub>''i''</sub> at the new state of the system if this decision is made. Since ''V''<sub>''i''</sub> has already been calculated for the needed states, the above operation yields ''V''<sub>''i''−1</sub> for those states. Finally, ''V''<sub>1</sub> at the initial state of the system is the value of the optimal solution. The optimal values of the decision variables can be recovered, one by one, by tracking back the calculations already performed.
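This backward recursion can be expressed directly in code. The following is a minimal Python sketch; the problem data (<code>states</code>, <code>decisions</code>, <code>gain</code>, <code>transition</code>, <code>terminal_value</code>) are illustrative assumptions rather than part of any particular problem:

<syntaxhighlight lang="python">
def backward_induction(states, decisions, gain, transition, terminal_value, n):
    """Compute the value functions V_n, ..., V_1 by working backwards.

    gain(y, d, i)     -- gain from making decision d in state y at time i
    transition(y, d)  -- the new state of the system after that decision
    terminal_value(y) -- V_n(y), the value obtained in state y at time n
    """
    V = {n: {y: terminal_value(y) for y in states}}
    for i in range(n - 1, 0, -1):          # i = n-1, n-2, ..., 1
        V[i] = {
            y: max(gain(y, d, i) + V[i + 1][transition(y, d)]
                   for d in decisions(y))
            for y in states
        }
    return V  # V[1][y0] is the value of the optimal solution from state y0
</syntaxhighlight>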
In [[control theory]], a typical problem is to find an admissible control <math>\mathbf{u}^{\ast}</math> which causes the system <math>\dot{\mathbf{x}}(t) = \mathbf{g} \left( \mathbf{x}(t), \mathbf{u}(t), t \right)</math> to follow an admissible trajectory <math>\mathbf{x}^{\ast}</math> on a continuous time interval <math>t_{0} \leq t \leq t_{1}</math> that minimizes a [[Loss function|cost function]]

:<math>J = b \left( \mathbf{x}(t_{1}), t_{1} \right) + \int_{t_{0}}^{t_{1}} f \left( \mathbf{x}(t), \mathbf{u}(t), t \right) \, \mathrm{d}t</math>
The solution to this problem is an optimal control law or policy <math>\mathbf{u}^{\ast} = h(\mathbf{x}(t), t)</math>, which produces an optimal trajectory <math>\mathbf{x}^{\ast}</math> and a [[cost-to-go function]] <math>J^{\ast}</math>. The latter obeys the fundamental equation of dynamic programming:
:<math>- J_{t}^{\ast} = \min_{\mathbf{u}} \left\{ f \left( \mathbf{x}(t), \mathbf{u}(t), t \right) + J_{x}^{\ast\mathsf{T}} \mathbf{g} \left( \mathbf{x}(t), \mathbf{u}(t), t \right) \right\}</math>

a partial differential equation known as the [[Hamilton–Jacobi–Bellman equation]], in which <math>J_{x}^{\ast} = \frac{\partial J^{\ast}}{\partial \mathbf{x}}</math> and <math>J_{t}^{\ast} = \frac{\partial J^{\ast}}{\partial t}</math>. One finds the minimizing <math>\mathbf{u}</math> in terms of <math>t</math>, <math>\mathbf{x}</math>, and the unknown function <math>J_{x}^{\ast}</math> and then substitutes the result into the Hamilton–Jacobi–Bellman equation to get the partial differential equation to be solved with boundary condition <math>J \left( t_{1} \right) = b \left( \mathbf{x}(t_{1}), t_{1} \right)</math>. In practice, this generally requires numerical techniques for some discrete approximation to the exact optimization relationship.
Alternatively, the continuous process can be approximated by a discrete system, which leads to the following recurrence relation, analogous to the Hamilton–Jacobi–Bellman equation:

:<math>J_{k}^{\ast} \left( \mathbf{x}_{n-k} \right) = \min_{\mathbf{u}_{n-k}} \left\{ \hat{f} \left( \mathbf{x}_{n-k}, \mathbf{u}_{n-k} \right) + J_{k-1}^{\ast} \left( \hat{\mathbf{g}} \left( \mathbf{x}_{n-k}, \mathbf{u}_{n-k} \right) \right) \right\}</math>

at the <math>k</math>-th stage of <math>n</math> equally spaced discrete time intervals, and where <math>\hat{f}</math> and <math>\hat{\mathbf{g}}</math> denote discrete approximations to <math>f</math> and <math>\mathbf{g}</math>. This functional equation is known as the [[Bellman equation]], which can be solved for an exact solution of the discrete approximation of the optimization equation.
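On a grid, this recurrence can be evaluated directly. The following Python sketch is illustrative only: the scalar dynamics <code>g_hat</code>, stage cost <code>f_hat</code>, grids, and horizon are assumptions chosen for the example, and next states that leave the grid are clamped by the interpolation:

<syntaxhighlight lang="python">
import numpy as np

x_grid = np.linspace(-2.0, 2.0, 81)   # discretized state space
u_grid = np.linspace(-1.0, 1.0, 41)   # discretized control space
n = 50                                 # number of equally spaced stages

def f_hat(x, u): return x**2 + u**2    # discrete approximation to the stage cost f
def g_hat(x, u): return 0.9 * x + u    # discrete approximation to the dynamics g

J = np.zeros_like(x_grid)              # J_0^* = 0 (no terminal cost assumed)
for k in range(1, n + 1):              # k-th stage, working backwards in time
    J_new = np.empty_like(x_grid)
    for i, x in enumerate(x_grid):
        x_next = g_hat(x, u_grid)      # candidate next states, one per control
        # minimize f_hat + J_{k-1}^* over controls; J is interpolated on the grid
        J_new[i] = np.min(f_hat(x, u_grid) + np.interp(x_next, x_grid, J))
    J = J_new                          # J now holds J_k^* on the state grid
</syntaxhighlight>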
==== Example from economics: Ramsey's problem of optimal saving ====
{{See also|Ramsey–Cass–Koopmans model}}
In economics, the objective is generally to maximize (rather than minimize) some dynamic [[social welfare function]]. In Ramsey's problem, this function relates amounts of consumption to levels of [[utility]]. Loosely speaking, the planner faces the trade-off between contemporaneous consumption and future consumption (via investment in [[Capital (economics)|capital]] that is used in production), known as [[intertemporal choice]]. A discrete approximation to the transition equation of capital is given by
:<math>k_{t+1} = \hat{g} \left( k_{t}, c_{t} \right) = f(k_{t}) - c_{t}</math>
where <math>c</math> is consumption, <math>k</math> is capital, and <math>f</math> is a [[production function]]. An initial capital stock <math>k_0 > 0</math> is assumed.
Let <math>c_t</math> be consumption in period {{mvar|t}}, and assume consumption yields [[utility]] <math>u(c_t) = \ln(c_t)</math> as long as the consumer lives. Assume the consumer is impatient, so that he [[Discounting|discounts]] future utility by a factor {{mvar|b}} each period, where <math>0 < b < 1</math>. Let <math>k_t</math> be capital in period {{mvar|t}}. Assume initial capital is a given amount <math>k_0 > 0</math>, and suppose that this period's capital and consumption determine next period's capital as <math>k_{t+1} = A k^a_t - c_t</math>, where {{mvar|A}} is a positive constant and <math>0 < a < 1</math>. Assume capital cannot be negative. Then the consumer's decision problem can be written as follows:

:<math>\max \sum_{t=0}^{T} b^{t} \ln(c_{t})</math> subject to <math>k_{t+1} = A k^a_t - c_t \geq 0</math> for all <math>t = 0, 1, 2, \ldots, T</math>
Written this way, the problem looks complicated, because it involves solving for all the choice variables <math>c_0, c_1, c_2, \ldots , c_T</math>. (The capital <math>k_0</math> is not a choice variable—the consumer's initial capital is taken as given.)
The dynamic programming approach to solve this problem involves breaking it apart into a sequence of smaller decisions. To do so, we define a sequence of ''value functions'' <math>V_t(k)</math>, for <math>t = 0, 1, 2, \ldots, T, T+1</math>, which represent the value of having any amount of capital {{mvar|k}} at each time {{mvar|t}}. There is (by assumption) no utility from having capital after death, <math>V_{T+1}(k) = 0</math>.
The value of any quantity of capital at any previous time can be calculated by [[backward induction]] using the Bellman equation. In this problem, for each <math>t = 0, 1, 2, \ldots, T</math>, the Bellman equation is
: <math>V_t(k_t) \, = \, \max \left( \ln(c_t) + b V_{t+1}(k_{t+1}) \right)</math> subject to <math>k_{t+1}=Ak^a_t - c_t \geq 0</math>
This problem is much simpler than the one we wrote down before, because it involves only two decision variables, <math>c_t</math> and <math>k_{t+1}</math>. Intuitively, instead of choosing his whole lifetime plan at birth, the consumer can take things one step at a time. At time {{mvar|t}}, his current capital <math>k_t</math> is given, and he only needs to choose current consumption <math>c_t</math> and saving <math>k_{t+1}</math>.
To actually solve this problem, we work backwards. For simplicity, the current level of capital is denoted as {{mvar|k}}. <math>V_{T+1}(k)</math> is already known, so using the Bellman equation once we can calculate <math>V_T(k)</math>, and so on until we get to <math>V_0(k)</math>, which is the value of the initial decision problem for the whole lifetime.
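This backward pass can also be carried out numerically on a grid of capital levels. The following Python sketch is illustrative: the parameter values <code>A</code>, <code>a</code>, <code>b</code>, <code>T</code> and the grid are assumptions, and next-period capital is restricted to grid points:

<syntaxhighlight lang="python">
import numpy as np

A, a, b, T = 5.0, 0.34, 0.95, 10       # illustrative parameter values
k_grid = np.linspace(1e-3, 10.0, 200)  # admissible levels of capital k

V = np.zeros(len(k_grid))              # V_{T+1}(k) = 0: no utility after death
policies = []                          # optimal consumption rule for each t
for t in range(T, -1, -1):             # t = T, T-1, ..., 0
    V_new = np.empty_like(V)
    c_opt = np.empty_like(V)
    for i, k in enumerate(k_grid):
        output = A * k**a
        feasible = k_grid < output     # choices of k_{t+1} that leave c_t > 0
        c = output - k_grid[feasible]  # consumption implied by each choice
        values = np.log(c) + b * V[feasible]   # ln(c_t) + b V_{t+1}(k_{t+1})
        best = np.argmax(values)
        V_new[i], c_opt[i] = values[best], c[best]
    V = V_new                          # V now holds V_t on the capital grid
    policies.insert(0, c_opt)          # policies[t][i]: consumption at k_grid[i]
</syntaxhighlight>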
Working backwards, it can be shown that the value function at time <math>t=T-j</math> is

: <math>V_{T-j}(k) \, = \, a \sum_{i=0}^{j} a^{i} b^{i} \ln k + v_{T-j}</math>

where each <math>v_{T-j}</math> is a constant, and the optimal amount to consume at time <math>t=T-j</math> is

: <math>c_{T-j}(k) \, = \, \frac{A k^{a}}{\sum_{i=0}^{j} a^{i} b^{i}}</math>

which, by summing the [[geometric series]] in the denominator, can be simplified to

: <math>c_{T-j}(k) \, = \, A k^{a} \frac{1 - a b}{1 - (a b)^{j+1}}</math>
We see that it is optimal to consume a larger fraction of current wealth as one gets older, finally consuming all remaining wealth in period {{mvar|T}}, the last period of life.
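This behaviour can be read off the closed-form policy: the fraction of output consumed, <math>(1-ab)/(1-(ab)^{j+1})</math>, increases as the number of remaining periods {{mvar|j}} falls. A short numerical check in Python (parameter values are illustrative):

<syntaxhighlight lang="python">
a, b = 0.34, 0.95                 # illustrative parameter values
for j in (10, 5, 1, 0):           # j periods remain after time t = T - j
    frac = (1 - a * b) / (1 - (a * b) ** (j + 1))
    print(f"j = {j:2d}: consume {frac:.3f} of output A*k**a")
# j = 0 gives 1.000: all remaining wealth is consumed in period T
</syntaxhighlight>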