== History ==
An interior point method was discovered by Soviet mathematician I. I. Dikin in 1967.<ref>{{Cite journal |last1=Dikin |first1=I.I. |year=1967 |title=Iterative solution of problems of linear and quadratic programming. |url=https://zbmath.org/?q=an:0189.19504 |journal=Dokl. Akad. Nauk SSSR |volume=174 |issue=1 |pages=747–748|zbl=0189.19504 }}</ref> The method was reinvented in the U.S. in the mid-1980s. In 1984, [[Narendra Karmarkar]] developed a method for [[linear programming]] called [[Karmarkar's algorithm]],<ref>{{cite conference |last1=Karmarkar |first1=N. |year=1984 |title=Proceedings of the sixteenth annual ACM symposium on Theory of computing – STOC '84 |pages=302 |doi=10.1145/800057.808695 |isbn=0-89791-133-4 |archive-url=https://web.archive.org/web/20131228145520/http://retis.sssup.it/~bini/teaching/optim2010/karmarkar.pdf |archive-date=28 December 2013 |doi-access=free |chapter-url=http://retis.sssup.it/~bini/teaching/optim2010/karmarkar.pdf |chapter=A new polynomial-time algorithm for linear programming |url-status=dead}}</ref> which runs in
Any convex optimization problem can be transformed into minimizing (or maximizing) a [[linear function]] over a convex set by converting to the [[Epigraph (mathematics)|epigraph]] form.<ref name=":3">{{cite book |
[[Yurii Nesterov]] and [[Arkadi Nemirovski]] came up with a special class of such barriers that can be used to encode any convex set. They guarantee that the number of [[iteration]]s of the algorithm is bounded by a polynomial in the dimension and accuracy of the solution.<ref>{{Cite journal |mr=2115066 |doi=10.1090/S0273-0979-04-01040-7 |title=The interior-point revolution in optimization: History, recent developments, and lasting consequences |year=2004 |last1=Wright |first1=Margaret H. |journal=Bulletin of the American Mathematical Society |volume=42 |pages=39–57|doi-access=free }}</ref><ref name=":0" />
The class of primal-dual path-following interior-point methods is considered the most successful. [[Mehrotra predictor–corrector method|Mehrotra's predictor–corrector algorithm]] provides the basis for most implementations of this class of methods.<ref>{{cite journal |last=Potra |first=Florian A. |author2=Stephen J. Wright |title=Interior-point methods |journal=Journal of Computational and Applied Mathematics |volume=124 |year=2000 |issue=1–2 |pages=281–302 |doi=10.1016/S0377-0427(00)00433-7|doi-access=free |bibcode=2000JCoAM.124..281P }}</ref>
== Definitions ==
\text{subject to}\quad & g_i(x) \leq 0 \text{ for } i = 1, \dots, m. \\
\end{aligned}
</math>We assume that the constraint functions belong to some family (e.g. quadratic functions), so that the program can be represented by a finite ''vector of coefficients'' (e.g. the coefficients of the quadratic functions). The dimension of this coefficient vector is called the ''size'' of the program. A ''numerical solver'' for a given family of programs is an algorithm that, given the coefficient vector, generates a sequence of approximate solutions ''x<sub>t</sub>'' for ''t''=1,2,..., using finitely many arithmetic operations. A numerical solver is called ''convergent'' if, for any
<math display="block">
\begin{aligned}
& f(x_{t}) - f^{*} \leq \epsilon, \\
& g_{i}(x_{t}) \leq \epsilon \quad \text{for} \quad i = 1, \dots, m, \\
& x_{t} \in G,
\end{aligned}
</math>
where ''f''<sup>*</sup> is the optimal value of the objective. A solver is called ''polynomial'' if the total number of arithmetic operations in the first ''T'' steps is at most<blockquote>poly(problem-size) * log(''V''/''ε''),</blockquote>where ''V'' is some data-dependent constant, e.g., the difference between the largest and smallest value in the feasible set. In other words, ''V''/''ε'' is the "relative accuracy" of the solution, that is, the accuracy relative to the largest coefficient, and log(''V''/''ε'') represents the number of "accuracy digits". Therefore, a solver is ''polynomial'' if each additional digit of accuracy requires a number of operations that is polynomial in the problem size.
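As a small worked illustration of "accuracy digits" (the numbers are hypothetical and chosen only for convenience), take ''V'' = 100 and ''ε'' = 10<sup>−6</sup>:
<math display="block">\log_{10}(V/\epsilon) = \log_{10}\left(\frac{10^{2}}{10^{-6}}\right) = \log_{10}(10^{8}) = 8.</math>
Tightening the tolerance to ''ε'' = 10<sup>−7</sup> raises this to 9, so for a polynomial solver the extra digit costs at most a further poly(problem-size) arithmetic operations. The base of the logarithm only changes the bound by a constant factor.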
== Types ==
* '''Potential reduction methods''': [[Karmarkar algorithm|Karmarkar's algorithm]] was the first one.
* '''Path-following methods''': the algorithms of [[James Renegar]]<ref name=":1">{{Cite journal |last=Renegar |first=James |date=1988-01-01 |title=A polynomial-time algorithm, based on Newton's method, for linear programming |url=https://doi.org/10.1007/BF01580724 |journal=Mathematical Programming |language=en |volume=40 |issue=1 |pages=59–93 |doi=10.1007/BF01580724 |issn=1436-4646|url-access=subscription }}</ref> and Clovis Gonzaga<ref name=":2">{{Citation |last=Gonzaga |first=Clovis C. |title=An Algorithm for Solving Linear Programming Problems in O(n3L) Operations |date=1989 |url=https://doi.org/10.1007/978-1-4613-9617-8_1 |work=Progress in Mathematical Programming: Interior-Point and Related Methods |pages=1–28 |editor-last=Megiddo |editor-first=Nimrod |access-date=2023-11-22 |place=New York, NY |publisher=Springer |language=en |doi=10.1007/978-1-4613-9617-8_1 |isbn=978-1-4613-9617-8|url-access=subscription }}</ref> were the first ones.
* '''Primal-dual methods'''.
* The barrier function b(x).
* A policy for determining the penalty parameters ''t<sub>i</sub>''.
* The unconstrained-optimization solver used to solve (''P<sub>i</sub>'') and find ''x<sub>i</sub>'', such as [[Newton's method]]. Note that we can use each ''x<sub>i</sub>'' as a starting-point for solving the next problem (''P<sub>i+1</sub>'').
The main challenge in proving that the method runs in polynomial time is that, as the penalty parameter grows, the solution approaches the boundary and the function becomes steeper. Solvers such as [[Newton's method]] then need more time, which makes it hard to prove that the total runtime is polynomial.
* The constraints (and the objective) are linear functions;
* The barrier function is [[Logarithmic barrier function|logarithmic]]: b(x) := - sum''<sub>j</sub>'' log(''-g<sub>j</sub>''(''x'')).
* The penalty parameter ''t'' is updated geometrically, that is, <math>t_{i+1} := \mu \cdot t_i</math>, where ''μ'' is a constant (they took <math>\mu = 1+0.001/\sqrt{m}</math>, where ''m'' is the number of inequality constraints);
* The solver is Newton's method, and a ''single'' step of Newton is done for each single step in ''t''.
They proved that, in this case, the difference ''x<sub>i</sub>'' - ''x''*(''t<sub>i</sub>'') remains at most 0.01, and f(''x<sub>i</sub>'') - f* is at most 2*''m''/''t<sub>i</sub>''. Thus, the solution accuracy is proportional to 1/''t<sub>i</sub>'', so to add a single accuracy-digit, it is
[[Yurii Nesterov]] extended the idea from linear to non-linear programs. He noted that the main property of the logarithmic barrier, used in the above proofs, is that it is [[self-concordant]] with a finite barrier parameter. Therefore, many other classes of convex programs can be solved in polynomial time using a path-following method, if we can find a suitable self-concordant barrier function for their feasible region.<ref name=":0" />{{Rp|___location=Sec.1}}
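Returning to the linear case, the scheme listed above (logarithmic barrier, geometric update of ''t'', one Newton step per update) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the algorithm as published: the test problem (a linear objective over the box 0 ≤ ''x<sub>j</sub>'' ≤ 1, i.e. ''m'' = 2''n'' linear constraints), the function name <code>short_step_box_lp</code>, the starting value of ''t'', the update factor 1 + 0.001/√''m'', and the stopping rule 2''m''/''t'' ≤ ''ε'' are all choices made here. The Newton decrement is used as a proxy for closeness to the central path.

```python
import math

def short_step_box_lp(c, eps=1e-4):
    """Short-step path following for: minimize c.x subject to 0 <= x_j <= 1.

    The box is described by m = 2n linear inequalities, and the barrier is
    b(x) = -sum_j [log(x_j) + log(1 - x_j)].
    """
    n = len(c)
    m = 2 * n
    mu = 1.0 + 0.001 / math.sqrt(m)          # assumed geometric update factor
    x = [0.5] * n                            # analytic center of the box
    norm_c = math.sqrt(sum(cj * cj for cj in c))
    t = 0.005 * math.sqrt(8.0) / norm_c      # Newton decrement equals 0.005 here

    def decrement_and_direction(x, t):
        # Gradient and (diagonal) Hessian of f_t(x) = t*c.x + b(x).
        g = [t * cj - 1.0 / xj + 1.0 / (1.0 - xj) for cj, xj in zip(c, x)]
        h = [1.0 / xj ** 2 + 1.0 / (1.0 - xj) ** 2 for xj in x]
        lam = math.sqrt(sum(gj * gj / hj for gj, hj in zip(g, h)))
        return lam, [gj / hj for gj, hj in zip(g, h)]

    steps, worst = 0, 0.0
    while 2.0 * m / t > eps:                 # stop once the bound 2m/t is below eps
        t *= mu                              # geometric update of the penalty
        _, d = decrement_and_direction(x, t)
        x = [xj - dj for xj, dj in zip(x, d)]  # a single full Newton step
        steps += 1
        lam, _ = decrement_and_direction(x, t)
        worst = max(worst, lam)              # closeness measured after the step
    return x, steps, worst

x, steps, worst = short_step_box_lp([1.0, -2.0])
print(x, steps, worst)
```

Because the update factor is so close to 1, the loop takes tens of thousands of iterations even on this two-variable example; the point of the sketch is that the iterate stays close to the central path with only one Newton step per update.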
=== Details ===
We are given a convex optimization problem (P) in "standard form":<blockquote>'''minimize ''c''<sup>T</sup>''x'' s.t. ''x'' in ''G''''', </blockquote>where ''G'' is convex and closed. We can also assume that ''G'' is bounded (we can easily make it bounded by adding a constraint |''x''|≤''R'' for some sufficiently large ''R'').<ref name=":0" />{{Rp|___location=Sec.4}}
To use the interior-point method, we need a [[self-concordant barrier]] for ''G''. Let ''b'' be an ''M''-self-concordant barrier for ''G'', where ''M''≥1 is the self-concordance parameter. We assume that we can efficiently compute the value of ''b'', its gradient, and its [[Hessian matrix|Hessian]] at every point ''x'' in the interior of ''G''.
For every ''t''>0, we define the ''penalized objective'' '''f<sub>t</sub>(x) := t''c''<sup>T</sup>''x +'' b(''x'')'''. We define the path of minimizers by: '''x*(t) := arg min f<sub>t</sub>(x)'''. We
For each ''t<sub>i</sub>'', we find an approximate minimizer of <math>f_{t_i}</math>, denoted by ''x<sub>i</sub>''. The approximate minimizer is chosen to satisfy the following "closeness condition" (where ''L'' is the ''path tolerance''):<blockquote><math>\sqrt{[\nabla_x f_{t_i}(x_i)]^T [\nabla_x^2 f_{t_i}(x_i)]^{-1} [\nabla_x f_{t_i}(x_i)]} \leq L</math>.</blockquote>To find ''x<sub>i</sub>''<sub>+1</sub>, we start with ''x<sub>i</sub>'' and apply the [[damped Newton method]] to <math>f_{t_{i+1}}</math>. We apply several steps of this method, until the closeness condition is satisfied for <math>t_{i+1}</math>. The first point that satisfies this condition is denoted by ''x<sub>i</sub>''<sub>+1</sub>.<ref name=":0" />{{Rp|___location=Sec.4}}
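The loop described above (increase ''t'', then take damped Newton steps until the closeness condition holds) can be sketched on a small example. Everything specific here is an assumption made for illustration: the set ''G'' is taken to be the unit ball with the barrier ''b''(''x'') = −ln(1 − |''x''|<sup>2</sup>), which is self-concordant with ''M'' = 1; the names <code>newton_direction</code> and <code>path_following_ball</code>, the values ''L'' = 0.25 and ''r'' = 1, the starting value of ''t'', and the stopping rule 2''M''/''t'' ≤ ''ε'' are hypothetical choices. The damped step uses the step size 1/(1 + ''λ''), where ''λ'' is the left-hand side of the closeness condition.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def newton_direction(x, c, t):
    """Return (H^{-1} g, lambda) for f_t(x) = t*c.x - ln(1 - |x|^2)."""
    s = 1.0 - dot(x, x)                       # slack; positive in the interior
    g = [t * cj + 2.0 * xj / s for cj, xj in zip(c, x)]   # gradient of f_t
    a, b = 2.0 / s, 4.0 / (s * s)             # Hessian = a*I + b*x*x^T
    k = b * dot(x, g) / (a + b * dot(x, x))   # Sherman-Morrison correction
    d = [(gj - k * xj) / a for gj, xj in zip(g, x)]       # d = H^{-1} g
    lam = math.sqrt(max(dot(g, d), 0.0))      # left side of the closeness condition
    return d, lam

def path_following_ball(c, eps=1e-6, L=0.25, r=1.0):
    """Minimize c.x over the unit ball by following the path of minimizers."""
    M = 1.0                                   # barrier parameter (assumed set)
    mu = 1.0 + r / math.sqrt(M)               # penalty update factor
    x = [0.0] * len(c)                        # analytic center of the ball
    t = L * math.sqrt(2.0) / math.sqrt(dot(c, c))   # closeness equals L at x = 0
    newton_steps = 0
    while 2.0 * M / t > eps:                  # heuristic stopping rule (assumption)
        t *= mu
        while True:
            d, lam = newton_direction(x, c, t)
            if lam <= L:                      # closeness condition satisfied
                break
            x = [xj - dj / (1.0 + lam) for xj, dj in zip(x, d)]  # damped Newton step
            newton_steps += 1
    return x, t, newton_steps

c = [1.0, -2.0]
x, t, newton_steps = path_following_ball(c)
print(x, dot(c, x), newton_steps)             # the exact minimum value is -sqrt(5)
```

The damped step keeps every iterate strictly inside the ball, so the barrier and its derivatives stay well defined throughout.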
=== Practical considerations ===
The theoretical guarantees assume that the penalty parameter is increased at the rate <math>\mu = \left(1+r/\sqrt{M}\right)</math>, so the worst-case number of required Newton steps is <math>O(\sqrt{M})</math>. In theory, if ''μ'' is larger (e.g. 2 or more), then the worst-case number of required Newton steps is in <math>O(M)</math>. However, in practice, larger ''μ'' leads to much faster convergence. These methods are called ''long-step methods''.<ref name=":0" />{{Rp|___location=Sec.4.6}} In practice, if ''μ'' is between 3 and 100, then the program converges within 20–40 Newton steps, regardless of the number of constraints (though the runtime of each Newton step of course grows with the number of constraints). The exact value of ''μ'' within this range has little effect on the
== Potential-reduction methods ==
For potential-reduction methods, the problem is presented in the ''conic form'':<ref name=":0" />{{Rp|___location=Sec.5}} <blockquote>'''minimize ''c''<sup>T</sup>''x'' s.t. ''x'' in ''{b+L}
* A. The feasible set ''{b+L}
* B. We are given in advance a strictly-feasible solution <math>\hat{x}</math>, that is, a feasible solution in the interior of ''K''.
* C. We know in advance the optimal objective value, c*, of the problem.
* D. We are given an ''M''-logarithmically-homogeneous [[self-concordant barrier]] ''F'' for the cone ''K''.
Assumptions A, B and D are needed in most interior-point methods. Assumption C is specific to Karmarkar's approach; it can be alleviated by using a "sliding objective value". It is possible to further reduce the program to the ''Karmarkar format'':<blockquote>'''minimize ''s''<sup>T</sup>''x'' s.t. ''x'' in ''M
The method is based on the following [[
Note that in path-following methods the expression is <math>\sqrt{M}</math> rather than ''M'', which is better in theory. But in practice, Karmarkar's method allows taking much larger steps towards the goal, so it may converge much faster than the theoretical guarantees.
:<math>(x,\lambda) \to (x + \alpha p_x, \lambda + \alpha p_\lambda).</math>[[File:Interior_Point_Trajectory.webm|center|thumb|400x400px|Trajectory of the iterates of ''x'' by using the interior point method.]]
== Types of
Here are some special cases of convex programs that can be solved efficiently by interior-point methods.<ref name=":0" />{{Rp|___location=Sec.10}}
The function <math>b</math> is self-concordant with parameter ''M''=''m'' (the number of constraints). Therefore, the cost of each Newton step (the Newton complexity) in the path-following method is O(''mn''<sup>2</sup>), and the total runtime complexity is O(''m''<sup>3/2</sup> ''n''<sup>2</sup>).{{Clarify|reason=This is the cost for an approximate solution - not an exact solution. The text does not elaborate on this.|date=November 2023}}
===[[Quadratically constrained quadratic program]]s===
Given a quadratically constrained quadratic program of the form:
<math display="block">\begin{aligned}
& f_j(x) := x^\top A_j x + b_j^\top x + c_j \leq 0 \quad \text{ for all } j = 1, \dots, m,
\end{aligned}</math>
where all matrices ''A<sub>j</sub>'' are [[Positive semidefinite matrices|positive-semidefinite
We can apply path-following methods with the barrier
<math display="block">b(x) := -\sum_{j=1}^m \ln(-f_j(x)).</math> The function <math>b</math> is a self-concordant barrier with parameter ''M''=''m''. The Newton complexity is O(''(m+n)n''<sup>2</sup>), and the total runtime complexity is O(''m''<sup>1/2</sup> (m+n) ''n''<sup>2</sup>).
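For reference, the derivatives that a Newton step needs can be written out explicitly. The following is a short derivation rather than a statement from the cited source; it assumes that each ''A<sub>j</sub>'' is symmetric and that ''x'' is strictly feasible, i.e. <math>f_j(x) < 0</math> for all ''j''. Note that <math>b_j</math> denotes the coefficient vectors of the constraints, while <math>b(x)</math> denotes the barrier.
<math display="block">\begin{aligned}
\nabla f_j(x) &= 2 A_j x + b_j, \\
\nabla b(x) &= \sum_{j=1}^m \frac{\nabla f_j(x)}{-f_j(x)}, \\
\nabla^2 b(x) &= \sum_{j=1}^m \left[ \frac{\nabla f_j(x)\, \nabla f_j(x)^\top}{f_j(x)^2} + \frac{2 A_j}{-f_j(x)} \right].
\end{aligned}</math>
Every term in the Hessian is positive-semidefinite, because <math>-f_j(x) > 0</math> and each ''A<sub>j</sub>'' is positive-semidefinite. Assembling the ''m'' terms takes O(''mn''<sup>2</sup>) operations and solving the resulting ''n''×''n'' linear system takes O(''n''<sup>3</sup>), which is consistent with the Newton complexity O(''(m+n)n''<sup>2</sup>) stated above.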
===L<sub>p</sub> norm approximation===
Consider a problem of the form
<math display="block">\begin{aligned}
\operatorname{minimize}\quad & \sum_j |v_j - u_j^\top x|_p
\end{aligned},</math>
where each <math>u_j</math> is a vector, each <math>v_j</math> is a scalar, and <math>|\cdot|_p</math> is an [[Lp norm|L<sub>p</sub> norm]] with <math>1< p < \infty.</math> After converting to the standard form, we can apply path-following methods with a self-concordant barrier with parameter ''M''=4''m''. The Newton complexity is O(''(m+n)n''<sup>2</sup>), and the total runtime complexity is O(''m''<sup>1/2</sup> (m+n) ''n''<sup>2</sup>).
===[[Geometric program]]s===
Consider the problem
<math display="block">\begin{aligned}
\end{aligned}</math>
There is a self-concordant barrier with parameter 2''k''+''m''. The path-following method has Newton complexity O(''mk''<sup>2</sup>+''k''<sup>3</sup>+''n''<sup>3</sup>) and total complexity O((''k+m'')<sup>1/2</sup>[''mk''<sup>2</sup>+''k''<sup>3</sup>+''n''<sup>3</sup>]).
=== [[Semidefinite program]]s ===
* {{cite book |last1=Bonnans |first1=J. Frédéric |last2=Gilbert |first2=J. Charles |last3=Lemaréchal |first3=Claude |authorlink3=Claude Lemaréchal |last4=Sagastizábal |first4=Claudia A. |author4-link=Claudia Sagastizábal |title=Numerical optimization: Theoretical and practical aspects |url=https://www.springer.com/mathematics/applications/book/978-3-540-35445-1 |edition=Second revised ed. of translation of 1997 <!-- ''Optimisation numérique: Aspects théoriques et pratiques'' --> French |series=Universitext |publisher=Springer-Verlag |___location=Berlin |year=2006 |pages=xiv+490 |isbn=978-3-540-35445-1 |doi=10.1007/978-3-540-35447-5 |mr=2265882}}
* {{cite book |title=Numerical Optimization |first=Jorge |last=Nocedal |author2=Stephen Wright |year=1999 |publisher=Springer |___location=New York, NY |isbn=978-0-387-98793-4}}
*{{Cite book | last1=Press | first1=WH | last2=Teukolsky | first2=SA | last3=Vetterling | first3=WT | last4=Flannery | first4=BP | year=2007 | title=Numerical Recipes: The Art of Scientific Computing | edition=3rd | publisher=Cambridge University Press |
{{Optimization algorithms|convex}}