Content deleted Content added
→Properties: Fixed typo Tags: canned edit summary Mobile edit Mobile app edit Android app edit |
mNo edit summary |
||
(26 intermediate revisions by 16 users not shown) | |||
Line 1:
[[Image:ConvexFunction.svg|thumb|300px|right|Convex function on an [[interval (mathematics)|interval]].]]{{Use American English|date = March 2019}}
{{Short description|Real function with secant line between points above the graph itself}}
[[Image:Epigraph convex.svg|right|thumb|300px|A function (in black) is convex if and only if the region above its [[Graph of a function|graph]] (in green) is a [[convex set]].]]
Line 5:
[[File:Convex vs. Not-convex.jpg|thumb|right|300px|Convex vs. Not convex]]
In [[mathematics]], a [[real-valued function]] is called '''convex''' if the [[line segment]] between any two distinct points on the [[graph of a function|graph of the function]] lies above or on the graph between the two points. Equivalently, a function is convex if its [[epigraph (mathematics)|''epigraph'']] (the set of points on or above the graph of the function) is a [[convex set]].
In simple terms, a convex function graph is shaped like a cup <math>\cup</math> (or a straight line like a linear function), while a [[concave function]]'s graph is shaped like a cap <math>\cap</math>.
A twice-[[differentiable function|differentiable]] function of a single variable is convex [[if and only if]] its [[second derivative]] is nonnegative on its entire [[___domain of a function|___domain]].<ref>{{Cite web|url=https://www.stat.cmu.edu/~larry/=stat705/Lecture2.pdf |title=Lecture Notes 2|website=www.stat.cmu.edu|access-date=3 March 2017}}</ref> Well-known examples of convex functions of a single variable include a [[linear function]] <math>f(x) = cx</math> (where <math>c</math> is a [[real number]]), a [[quadratic function]] <math>cx^2</math> (<math>c</math> as a nonnegative real number) and an [[exponential function]] <math>ce^x</math> (<math>c</math> as a nonnegative real number).
Convex functions play an important role in many areas of mathematics. They are especially important in the study of [[optimization]] problems where they are distinguished by a number of convenient properties. For instance, a strictly convex function on an open set has no more than one minimum. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and as a result, they are the most well-understood functionals in the [[calculus of variations]]. In [[probability theory]], a convex function applied to the [[expected value]] of a [[random variable]] is always bounded above by the expected value of the convex function of the random variable. This result, known as [[Jensen's inequality]], can be used to deduce inequalities such as the [[inequality of arithmetic and geometric means|arithmetic–geometric mean inequality]] and [[Hölder's inequality]].▼
▲Convex functions play an important role in many areas of mathematics. They are especially important in the study of [[optimization]] problems where they are distinguished by a number of convenient properties. For instance, a strictly convex function on an [[open set]] has no more than one [[maximum and minimum|minimum]]. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and as a result, they are the most well-understood functionals in the [[calculus of variations]]. In [[probability theory]], a convex function applied to the [[expected value]] of a [[random variable]] is always bounded above by the expected value of the convex function of the random variable. This result, known as [[Jensen's inequality]], can be used to deduce [[inequality (mathematics)|inequalities]] such as the [[inequality of arithmetic and geometric means|arithmetic–geometric mean inequality]] and [[Hölder's inequality]].
==Definition==
Line 19 ⟶ 22:
<li>For all <math>0 \leq t \leq 1</math> and all <math>x_1, x_2 \in X</math>:
<math display=block>f\left(t x_1 + (1-t) x_2\right) \leq t f\left(x_1\right) + (1-t) f\left(x_2\right)</math>
The right hand side represents the straight line between <math>\left(x_1, f\left(x_1\right)\right)</math> and <math>\left(x_2, f\left(x_2\right)\right)</math> in the graph of <math>f</math> as a function of <math>t;</math> increasing <math>t</math> from <math>0</math> to <math>1</math> or decreasing <math>t</math> from <math>1</math> to <math>0</math> sweeps this line. Similarly, the argument of the function <math>f</math> in the left hand side represents the straight line between <math>x_1</math> and <math>x_2</math> in <math>X</math> or the <math>x</math>-axis of the graph of <math>f.</math> So, this condition requires that the straight line between any pair of points on the curve of <math>f</math>
</li>
<li>For all <math>0 < t < 1</math> and all <math>x_1, x_2 \in X</math> such that <math>x_1
<math display=block>f\left(t x_1 + (1-t) x_2\right) \leq t f\left(x_1\right) + (1-t) f\left(x_2\right)</math>
Line 34 ⟶ 37:
<math display=block>f\left(t x_1 + (1-t) x_2\right) < t f\left(x_1\right) + (1-t) f\left(x_2\right)</math>
A strictly convex function <math>f</math> is a function such that the straight line between any pair of points on the curve <math>f</math> is above the curve <math>f</math> except for the intersection points between the straight line and the curve. An example of a function which is convex but not strictly convex is <math>f(x,y) = x^2 + y</math>. This function is not strictly convex because any two points sharing an x coordinate will have a straight line between them, while any two points NOT sharing an x coordinate will have a greater value of the function than the points between them.
The function <math>f</math> is said to be '''{{em|[[Concave function|concave]]}}''' (resp. '''{{em|strictly concave}}''') if <math>-f</math> (<math>f</math> multiplied by −1) is convex (resp. strictly convex).
Line 47 ⟶ 50:
=== Functions of one variable ===
* Suppose <math>f</math> is a function of one [[real number|real]] variable defined on an interval, and let <math display=block>R(x_1, x_2) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}</math> (note that <math>R(x_1, x_2)</math> is the slope of the purple line in the
* A convex function <math>f</math> of one real variable defined on some [[open interval]] <math>C</math> is [[Continuous function|continuous]] on <math>C
* A [[differentiable function|differentiable]] function of one variable is convex on an interval if and only if its [[derivative]] is [[monotonically non-decreasing]] on that interval. If a function is differentiable and convex then it is also [[continuously differentiable]].
* A differentiable function of one variable is convex on an interval if and only if its graph lies above all of its [[tangent]]s:<ref name="boyd">{{cite book| title=Convex Optimization| first1=Stephen P.|last1=Boyd |first2=Lieven| last2=Vandenberghe | year = 2004 |publisher=Cambridge University Press| isbn=978-0-521-83378-3| url= https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf#page=83 |format=pdf | access-date=October 15, 2011}}</ref>{{rp|69}} <math display=block>f(x) \geq f(y) + f'(y) (x-y)</math> for all <math>x</math> and <math>y</math> in the interval.
* A twice differentiable function of one variable is convex on an interval if and only if its [[second derivative]] is non-negative there; this gives a practical test for convexity. Visually, a twice differentiable convex function "curves up", without any bends the other way ([[inflection point]]s). If its second derivative is positive at all points then the function is strictly convex, but the [[
**This property and the above property in terms of "...its derivative is monotonically non-decreasing..." are not equal since if <math>f''</math> is non-negative on an interval <math>X</math> then <math>f'</math> is monotonically non-decreasing on <math>X</math> while its converse is not true, for example, <math>f'</math> is monotonically non-decreasing on <math>X</math> while its derivative <math>f''</math> is not defined at some points on <math>X</math>.
* If <math>f</math> is a convex function of one real variable, and <math>f(0)\le 0</math>, then <math>f</math> is [[Superadditivity|superadditive]] on the [[positive reals]], that is <math>f(a + b) \geq f(a) + f(b)</math> for positive real numbers <math>a</math> and <math>b</math>.
{{math proof|proof=
Since <math>f</math> is convex, by using one of the convex function definitions above and letting <math>x_2 = 0,</math> it follows that for all real <math>0 \leq t \leq 1,</math> <math display=block>
\begin{align} f(tx_1) & = f(t x_1 + (1-t) \cdot 0) \\ & \leq t f(x_1) + (1-t) f(0) \\ & \leq t f(x_1). \ \end{align} </math> From <math>f(tx_1)\leq t f(x_1) \begin{align} f(a) + f(b) & = f \left((a+b) \frac{a}{a+b} \right) + f \left((a+b) \frac{b}{a+b} \right) \\ & \leq \frac{a}{a+b} f(a+b) + \frac{b}{a+b} & = f(a+b).\\
\end{align}</math>
Namely, <math>f(a) + f(b) \leq f(a+b)</math>.
}}
* A function <math>f</math> is midpoint convex on an interval <math>C</math> if for all <math>x_1, x_2 \in C</math> <math display=block>f\!\left(\frac{x_1 + x_2}{2}\right) \leq \frac{f(x_1) + f(x_2)}{2}.</math> This condition is only slightly weaker than convexity. For example, a real-valued [[Lebesgue measurable function]] that is midpoint-convex is convex: this is a theorem of [[Wacław Sierpiński|Sierpiński]].<ref>{{cite book|last=Donoghue|first=William F.| title= Distributions and Fourier Transforms|year=1969|publisher=Academic Press | isbn=9780122206504 |url= https://books.google.com/books?id=P30Y7daiGvQC&pg=PA12|access-date=August 29, 2012|page=12}}</ref> In particular, a continuous function that is midpoint convex will be convex.
=== Functions of several variables ===
* A function that is marginally convex in each individual variable is not necessarily (jointly) convex. For example, the function <math>f(x, y) = x y</math> is [[bilinear map|marginally linear]], and thus marginally convex, in each variable, but not (jointly) convex.
* A function <math>f : X \to [-\infty, \infty]</math> valued in the [[extended real
* A differentiable function <math>f</math> defined on a convex ___domain is convex if and only if <math>f(x) \geq f(y) + \nabla f(y)^T \cdot (x-y)</math> holds for all <math>x, y</math> in the ___domain.
* A twice differentiable function of several variables is convex on a convex set if and only if its [[Hessian matrix]] of second [[partial derivative]]s is [[Positive-definite matrix|positive semidefinite]] on the interior of the convex set.
Line 91 ⟶ 107:
==Strongly convex functions==
The concept of strong convexity extends and parametrizes the notion of strict convexity. Intuitively, a strongly-convex function is a function that grows as fast as a quadratic function.<ref>{{Cite web |title=Strong convexity · Xingyu Zhou's blog |url=https://xingyuzhou.org/blog/notes/strong-convexity |access-date=2023-09-27 |website=xingyuzhou.org}}</ref> A strongly convex function is also strictly convex, but not vice versa. If a one-dimensional function <math>f</math> is twice continuously differentiable and the ___domain is the real line, then we can characterize it as follows:
*<math>f</math> convex if and only if <math>f''(x) \ge 0</math> for all <math>x.</math>▼
*<math>f</math> strictly convex if <math>f''(x) > 0</math> for all <math>x</math> (note: this is sufficient, but not necessary).▼
*<math>f</math> strongly convex if and only if <math>f''(x) \ge m > 0</math> for all <math>x.</math>▼
For example, let <math>f</math> be strictly convex, and suppose there is a sequence of points <math>(x_n)</math> such that <math>f''(x_n) = \tfrac{1}{n}</math>. Even though <math>f''(x_n) > 0</math>, the function is not strongly convex because <math>f''(x)</math> will become arbitrarily small.▼
or, more generally,
<math display=block>\langle \nabla f(x) - \nabla f(y), x-y \rangle \ge m \|x-y\|^2 </math>
Line 120 ⟶ 140:
<math display=block>x\mapsto f(x) -\frac m 2 \|x\|^2</math>
is convex.
▲*<math>f</math> convex if and only if <math>f''(x) \ge 0</math> for all <math>x.</math>
▲*<math>f</math> strictly convex if <math>f''(x) > 0</math> for all <math>x</math> (note: this is sufficient, but not necessary).
▲*<math>f</math> strongly convex if and only if <math>f''(x) \ge m > 0</math> for all <math>x.</math>
▲For example, let <math>f</math> be strictly convex, and suppose there is a sequence of points <math>(x_n)</math> such that <math>f''(x_n) = \tfrac{1}{n}</math>. Even though <math>f''(x_n) > 0</math>, the function is not strongly convex because <math>f''(x)</math> will become arbitrarily small.
A twice continuously differentiable function <math>f</math> on a compact ___domain <math>X</math> that satisfies <math>f''(x)>0</math> for all <math>x\in X</math> is strongly convex. The proof of this statement follows from the [[extreme value theorem]], which states that a continuous function on a compact set has a maximum and minimum.
Line 132 ⟶ 145:
Strongly convex functions are in general easier to work with than convex or strictly convex functions, since they are a smaller class. Like strictly convex functions, strongly convex functions have unique minima on compact sets.
===
If ''f'' is a strongly-convex function with parameter ''m'', then:<ref name=":0">{{Cite web |last=Nemirovsky and Ben-Tal |date=2023 |title=Optimization III: Convex Optimization |url=http://www2.isye.gatech.edu/~nemirovs/OPTIIILN2023Spring.pdf}}</ref>{{Rp|___location=Prop.6.1.4}}
* For every real number ''r'', the [[level set]] {''x'' | ''f''(''x'') ≤ ''r''} is [[Compact space|compact]].
* The function ''f'' has a unique [[global minimum]] on ''R<sup>n</sup>''.
== Uniformly convex functions ==
A uniformly convex function,<ref name="Zalinescu">{{cite book|title=Convex Analysis in General Vector Spaces|author=C. Zalinescu|publisher=World Scientific|year=2002|isbn=9812380671}}</ref><ref name="Bauschke">{{cite book|page=[https://archive.org/details/convexanalysismo00hhba/page/n161 144]|title=Convex Analysis and Monotone Operator Theory in Hilbert Spaces |url=https://archive.org/details/convexanalysismo00hhba|url-access=limited|author=H. Bauschke and P. L. Combettes |publisher=Springer |year=2011 |isbn=978-1-4419-9467-7}}</ref> with modulus <math>\phi</math>, is a function <math>f</math> that, for all <math>x, y</math> in the ___domain and <math>t \in [0, 1],</math> satisfies
<math display="block">f(tx+(1-t)y) \le t f(x)+(1-t)f(y) - t(1-t) \phi(\|x-y\|)</math>
where <math>\phi</math> is a function that is non-negative and vanishes only at 0. This is a generalization of the concept of strongly convex function; by taking <math>\phi(\alpha) = \tfrac{m}{2} \alpha^2</math> we recover the definition of strong convexity.
Line 146 ⟶ 164:
* The [[absolute value]] function <math>f(x)=|x|</math> is convex (as reflected in the [[triangle inequality]]), even though it does not have a derivative at the point <math>x = 0.</math> It is not strictly convex.
* The function <math>f(x)=|x|^p</math> for <math>p \ge 1</math> is convex.
* The [[exponential function]] <math>f(x)=e^x</math> is convex. It is also strictly convex, since <math>f''(x)=e^x >0 </math>, but it is not strongly convex since the second derivative can be arbitrarily close to zero. More generally, the function <math>g(x) = e^{f(x)}</math> is [[Logarithmically convex function|logarithmically convex]] if <math>f</math> is a convex function. The term "superconvex" is sometimes used instead.<ref>{{Cite journal | last1 = Kingman | first1 = J. F. C. | doi = 10.1093/qmath/12.1.283 | title = A Convexity Property of Positive Matrices | journal = The Quarterly Journal of Mathematics | volume = 12 | pages = 283–284 | year = 1961 | bibcode = 1961QJMat..12..283K }}</ref>
* The function <math>f</math> with ___domain [0,1] defined by <math>f(0) = f(1) = 1, f(x) = 0</math> for <math>0 < x < 1</math> is convex; it is continuous on the open interval <math>(0, 1),</math> but not continuous at 0 and 1.
* The function <math>x^3</math> has second derivative <math>6 x</math>; thus it is convex on the set where <math>x \geq 0</math> and [[concave function|concave]] on the set where <math>x \leq 0.</math>
|