Convex function: Difference between revisions

Content deleted Content added
Mudthomas (talk | contribs)
Clarified that the two points need to be distinct.
Tags: Mobile edit Mobile web edit
Knief (talk | contribs)
mNo edit summary
 
(31 intermediate revisions by 21 users not shown)
Line 1:
[[Image:ConvexFunction.svg|thumb|300px|right|Convex function on an [[interval (mathematics)|interval]].]]{{Use American English|date = March 2019}}
{{Short description|Real function with secant line between points above the graph itself}}
[[Image:Epigraph convex.svg|right|thumb|300px|A function (in black) is convex if and only if the region above its [[Graph of a function|graph]] (in green) is a [[convex set]].]]
Line 5:
[[File:Convex vs. Not-convex.jpg|thumb|right|300px|Convex vs. Not convex]]
 
In [[mathematics]], a [[real-valued function]] is called '''convex''' if the [[line segment]] between any two distinct points on the [[graph of a function|graph of the function]] lies above or on the graph between the two points. Equivalently, a function is convex if its [[epigraph (mathematics)|''epigraph'']] (the set of points on or above the graph of the function) is a [[convex set]]. A twice-differentiable function of a single variable is convex [[if and only if]] its second derivative is nonnegative on its entire ___domain.<ref>{{Cite web|url=https://www.stat.cmu.edu/~larry/=stat705/Lecture2.pdf |title=Lecture Notes 2|website=www.stat.cmu.edu|access-date=3 March 2017}}</ref> Well-known examples of convex functions of a single variable include the [[quadratic function]] <math>x^2</math> and the [[exponential function]] <math>e^x</math>. In simple terms, a convex function refers to a function whose graph is shaped like a cup <math>\cup</math>, while a [[concave function]]'s graph is shaped like a cap <math>\cap</math>.
In simple terms, a convex function graph is shaped like a cup <math>\cup</math> (or a straight line like a linear function), while a [[concave function]]'s graph is shaped like a cap <math>\cap</math>.
 
A twice-[[differentiable function|differentiable]] function of a single variable is convex [[if and only if]] its [[second derivative]] is nonnegative on its entire [[___domain of a function|___domain]].<ref>{{Cite web|url=https://www.stat.cmu.edu/~larry/=stat705/Lecture2.pdf |title=Lecture Notes 2|website=www.stat.cmu.edu|access-date=3 March 2017}}</ref> Well-known examples of convex functions of a single variable include a [[linear function]] <math>f(x) = cx</math> (where <math>c</math> is a [[real number]]), a [[quadratic function]] <math>cx^2</math> (<math>c</math> as a nonnegative real number) and an [[exponential function]] <math>ce^x</math> (<math>c</math> as a nonnegative real number).
Convex functions play an important role in many areas of mathematics. They are especially important in the study of [[optimization]] problems where they are distinguished by a number of convenient properties. For instance, a strictly convex function on an open set has no more than one minimum. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and as a result, they are the most well-understood functionals in the [[calculus of variations]]. In [[probability theory]], a convex function applied to the [[expected value]] of a [[random variable]] is always bounded above by the expected value of the convex function of the random variable. This result, known as [[Jensen's inequality]], can be used to deduce inequalities such as the [[inequality of arithmetic and geometric means|arithmetic&ndash;geometric mean inequality]] and [[Hölder's inequality]].
 
Convex functions play an important role in many areas of mathematics. They are especially important in the study of [[optimization]] problems where they are distinguished by a number of convenient properties. For instance, a strictly convex function on an [[open set]] has no more than one [[maximum and minimum|minimum]]. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and as a result, they are the most well-understood functionals in the [[calculus of variations]]. In [[probability theory]], a convex function applied to the [[expected value]] of a [[random variable]] is always bounded above by the expected value of the convex function of the random variable. This result, known as [[Jensen's inequality]], can be used to deduce [[inequality (mathematics)|inequalities]] such as the [[inequality of arithmetic and geometric means|arithmetic&ndash;geometric mean inequality]] and [[Hölder's inequality]].
 
==Definition==
Line 19 ⟶ 22:
<li>For all <math>0 \leq t \leq 1</math> and all <math>x_1, x_2 \in X</math>:
<math display=block>f\left(t x_1 + (1-t) x_2\right) \leq t f\left(x_1\right) + (1-t) f\left(x_2\right)</math>
The right hand side represents the straight line between <math>\left(x_1, f\left(x_1\right)\right)</math> and <math>\left(x_2, f\left(x_2\right)\right)</math> in the graph of <math>f</math> as a function of <math>t;</math> increasing <math>t</math> from <math>0</math> to <math>1</math> or decreasing <math>t</math> from <math>1</math> to <math>0</math> sweeps this line. Similarly, the argument of the function <math>f</math> in the left hand side represents the straight line between <math>x_1</math> and <math>x_2</math> in <math>X</math> or the <math>x</math>-axis of the graph of <math>f.</math> So, this condition requires that the straight line between any pair of points on the curve of <math>f</math> to be above or just meetsmeeting the graph.<ref>{{Cite web|last=|first=|date=|title=Concave Upward and Downward|url=https://www.mathsisfun.com/calculus/concave-up-down-convex.html|url-status=live|archive-url=https://web.archive.org/web/20131218034748/http://www.mathsisfun.com:80/calculus/concave-up-down-convex.html |archive-date=2013-12-18 |access-date=|website=}}</ref>
</li>
<li>For all <math>0 < t < 1</math> and all <math>x_1, x_2 \in X</math> such that <math>x_1 \neq x_2</math>:
<math display=block>f\left(t x_1 + (1-t) x_2\right) \leq t f\left(x_1\right) + (1-t) f\left(x_2\right)</math>
 
Line 34 ⟶ 37:
<math display=block>f\left(t x_1 + (1-t) x_2\right) < t f\left(x_1\right) + (1-t) f\left(x_2\right)</math>
 
A strictly convex function <math>f</math> is a function such that the straight line between any pair of points on the curve <math>f</math> is above the curve <math>f</math> except for the intersection points between the straight line and the curve. An example of a function which is convex but not strictly convex is <math>f(x,y) = x^2 + y</math>. This function is not strictly convex because any two points sharing an x coordinate will have a straight line between them, while any two points NOT sharing an x coordinate will have a greater value of the function than the points between them.
 
The function <math>f</math> is said to be '''{{em|[[Concave function|concave]]}}''' (resp. '''{{em|strictly concave}}''') if <math>-f</math> (<math>f</math> multiplied by −1) is convex (resp. strictly convex).
 
==Alternative naming==
The term ''convex'' is often referred to as ''convex down'' or ''concave upward'', and the term [[Concave function |concave]] is often referred as ''concave down'' or ''convex upward''.<ref>{{Cite book|last=Stewart|first=James|title=Calculus|publisher=Cengage Learning|year=2015|isbn=978-1305266643|edition=8th|pages=223–224}}</ref><ref>{{cite book|last1=W. Hamming|first1=Richard|url=https://books.google.com/books?id=WLIbeA1aWvUC|title=Methods of Mathematics Applied to Calculus, Probability, and Statistics|publisher=Courier Corporation|year=2012|isbn=978-0-486-13887-9|edition=illustrated|page=227}} [https://books.google.com/books?id=WLIbeA1aWvUC&pg=PA227 Extract of page 227]</ref><ref>{{cite book|last1=Uvarov|first1=Vasiliĭ Borisovich|url=https://books.google.com/books?id=GzQnAQAAIAAJ|title=Mathematical Analysis|publisher=Mir Publishers|year=1988|isbn=978-5-03-000500-3|edition=|page=126-127}}</ref> If the term "convex" is used without an "up" or "down" keyword, then it refers strictly to a cup shaped graph <math>\cup</math>. As an example, [[Jensen's inequality]] refers to an inequality involving a convex or convex-(updown), function.<ref>{{cite book |title=The Probability Companion for Engineering and Computer Science |edition=illustrated |first1=Adam |last1=Prügel-Bennett |publisher=Cambridge University Press |year=2020 |isbn=978-1-108-48053-6 |page=160 |url=https://books.google.com/books?id=coHCDwAAQBAJ}} [https://books.google.com/books?id=coHCDwAAQBAJ&pg=PA160 Extract of page 160]</ref>
 
== Properties ==
Line 47 ⟶ 50:
=== Functions of one variable ===
 
* Suppose <math>f</math> is a function of one [[real number|real]] variable defined on an interval, and let <math display=block>R(x_1, x_2) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}</math> (note that <math>R(x_1, x_2)</math> is the slope of the purple line in the abovefirst drawing; the function <math>R</math> is [[Symmetric function|symmetric]] in <math>(x_1, x_2),</math> means that <math>R</math> does not change by exchanging <math>x_1</math> and <math>x_2</math>). <math>f</math> is convex if and only if <math>R(x_1, x_2)</math> is [[monotonically non-decreasing]] in <math>x_1,</math> for every fixed <math>x_2</math> (or vice versa). This characterization of convexity is quite useful to prove the following results.
* A convex function <math>f</math> of one real variable defined on some [[open interval]] <math>C</math> is [[Continuous function|continuous]] on <math>C. </math>. Moreover, <math>f</math> admits [[Semi-differentiability|left and right derivatives]], and these are [[monotonically non-decreasing]]. In addition, the left derivative is left-continuous and the right-derivative is right-continuous. As a consequence, <math>f</math> is [[differentiable function|differentiable]] at all but at most [[countable|countably many]] points, the set on which <math>f</math> is not differentiable can however still be dense. If <math>C</math> is closed, then <math>f</math> may fail to be continuous at the endpoints of <math>C</math> (an example is shown in the [[#Examples|examples section]]).
* A [[differentiable function|differentiable]] function of one variable is convex on an interval if and only if its [[derivative]] is [[monotonically non-decreasing]] on that interval. If a function is differentiable and convex then it is also [[continuously differentiable]] (due to [[Darboux's theorem (analysis)|Darboux's theorem]]).
* A differentiable function of one variable is convex on an interval if and only if its graph lies above all of its [[tangent]]s:<ref name="boyd">{{cite book| title=Convex Optimization| first1=Stephen P.|last1=Boyd |first2=Lieven| last2=Vandenberghe | year = 2004 |publisher=Cambridge University Press| isbn=978-0-521-83378-3| url= https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf#page=83 |format=pdf | access-date=October 15, 2011}}</ref>{{rp|69}} <math display=block>f(x) \geq f(y) + f'(y) (x-y)</math> for all <math>x</math> and <math>y</math> in the interval.
* A twice differentiable function of one variable is convex on an interval if and only if its [[second derivative]] is non-negative there; this gives a practical test for convexity. Visually, a twice differentiable convex function "curves up", without any bends the other way ([[inflection point]]s). If its second derivative is positive at all points then the function is strictly convex, but the [[Theorem#Converseconverse (logic)|converse]] does not hold. For example, the second derivative of <math>f(x) = x^4</math> is <math>f''(x) = 12x^{2}</math>, which is zero for <math>x = 0,</math> but <math>x^4</math> is strictly convex.
**This property and the above property in terms of "...its derivative is monotonically non-decreasing..." are not equal since if <math>f''</math> is non-negative on an interval <math>X</math> then <math>f'</math> is monotonically non-decreasing on <math>X</math> while its converse is not true, for example, <math>f'</math> is monotonically non-decreasing on <math>X</math> while its derivative <math>f''</math> is not defined at some points on <math>X</math>.
* If <math>f</math> is a convex function of one real variable, and <math>f(0)\le 0</math>, then <math>f</math> is [[Superadditivity|superadditive]] on the [[positive reals]], that is <math>f(a + b) \geq f(a) + f(b)</math> for positive real numbers <math>a</math> and <math>b</math>.
{{math proof|proof=
Since <math>f</math> is convex, by using one of the convex function definitions above and letting <math>x_2 = 0,</math> it follows that for all real <math>0 \leq t \leq 1,</math> <math display=block>
\begin{align}
f(tx_1) & = f(t x_1 + (1-t) \cdot 0) \\
& \leq t f(x_1) + (1-t) f(0) \\
& \leq t f(x_1). \rightarrow\
\end{align}
</math> From <math>f(tx_1)\leq t f(x_1).</math> From this, it follows that <math display=block>
\begin{align}
f(a) + f(b) & = f \left((a+b) \frac{a}{a+b} \right) + f \left((a+b) \frac{b}{a+b} \right) \\
& \leq \frac{a}{a+b} f(a+b) + \frac{b}{a+b} f(a+b) = f(a+b) \rightarrow f(a) + f(b) \leq f(a+b).</math>
& = f(a+b).\\
\end{align}</math>
Namely, <math>f(a) + f(b) \leq f(a+b)</math>.
}}
 
* A function <math>f</math> is midpoint convex on an interval <math>C</math> if for all <math>x_1, x_2 \in C</math> <math display=block>f\!\left(\frac{x_1 + x_2}{2}\right) \leq \frac{f(x_1) + f(x_2)}{2}.</math> This condition is only slightly weaker than convexity. For example, a real-valued [[Lebesgue measurable function]] that is midpoint-convex is convex: this is a theorem of [[Wacław Sierpiński|SierpinskiSierpiński]].<ref>{{cite book|last=Donoghue|first=William F.| title= Distributions and Fourier Transforms|year=1969|publisher=Academic Press | isbn=9780122206504 |url= https://books.google.com/books?id=P30Y7daiGvQC&pg=PA12|access-date=August 29, 2012|page=12}}</ref> In particular, a continuous function that is midpoint convex will be convex.
 
=== Functions of several variables ===
 
* A function that is marginally convex in each individual variable is not necessarily (jointly) convex. For example, the function <math>f(x, y) = x y</math> is [[bilinear map|marginally linear]], and thus marginally convex, in each variable, but not (jointly) convex.
* A function <math>f : X \to [-\infty, \infty]</math> valued in the [[extended real numbernumbers]]s <math>[-\infty, \infty] = \R \cup \{\pm\infty\}</math> is convex if and only if its [[Epigraph (mathematics)|epigraph]] <math display=block>\{(x, r) \in X \times \R ~:~ r \geq f(x)\}</math> is a convex set.
* A differentiable function <math>f</math> defined on a convex ___domain is convex if and only if <math>f(x) \geq f(y) + \nabla f(y)^T \cdot (x-y)</math> holds for all <math>x, y</math> in the ___domain.
* A twice differentiable function of several variables is convex on a convex set if and only if its [[Hessian matrix]] of second [[partial derivative]]s is [[Positive-definite matrix|positive semidefinite]] on the interior of the convex set.
Line 91 ⟶ 107:
==Strongly convex functions==
 
The concept of strong convexity extends and parametrizes the notion of strict convexity. Intuitively, a strongly-convex function is a function that grows as fast as a quadratic function.<ref>{{Cite web |title=Strong convexity · Xingyu Zhou's blog |url=https://xingyuzhou.org/blog/notes/strong-convexity |access-date=2023-09-27 |website=xingyuzhou.org}}</ref> A strongly convex function is also strictly convex, but not vice versa. If a one-dimensional function <math>f</math> is twice continuously differentiable and the ___domain is the real line, then we can characterize it as follows:
*<math>f</math> convex if and only if <math>f''(x) \ge 0</math> for all <math>x.</math>
*<math>f</math> strictly convex if <math>f''(x) > 0</math> for all <math>x</math> (note: this is sufficient, but not necessary).
*<math>f</math> strongly convex if and only if <math>f''(x) \ge m > 0</math> for all <math>x.</math>
 
For example, let <math>f</math> be strictly convex, and suppose there is a sequence of points <math>(x_n)</math> such that <math>f''(x_n) = \tfrac{1}{n}</math>. Even though <math>f''(x_n) > 0</math>, the function is not strongly convex because <math>f''(x)</math> will become arbitrarily small.
 
AMore generally, a differentiable function <math>f</math> is called strongly convex with parameter <math>m > 0</math> if the following inequality holds for all points <math>x, y</math> in its ___domain:<ref name="bertsekas">{{cite book|page=[https://archive.org/details/convexanalysisop00bert_476/page/n87 72]|title=Convex Analysis and Optimization|url=https://archive.org/details/convexanalysisop00bert_476|url-access=limited|author=Dimitri Bertsekas| others= Contributors: Angelia Nedic and Asuman E. Ozdaglar|publisher=Athena Scientific|year=2003|isbn=9781886529458}}</ref><math display="block">(\nabla f(x) - \nabla f(y) )^T (x-y) \ge m \|x-y\|_2^2 </math>
<math display=block>(\nabla f(x) - \nabla f(y) )^T (x-y) \ge m \|x-y\|_2^2 </math>
or, more generally,
<math display=block>\langle \nabla f(x) - \nabla f(y), x-y \rangle \ge m \|x-y\|^2 </math>
Line 120 ⟶ 140:
<math display=block>x\mapsto f(x) -\frac m 2 \|x\|^2</math>
is convex.
 
The distinction between convex, strictly convex, and strongly convex can be subtle at first glance. If <math>f</math> is twice continuously differentiable and the ___domain is the real line, then we can characterize it as follows:
*<math>f</math> convex if and only if <math>f''(x) \ge 0</math> for all <math>x.</math>
*<math>f</math> strictly convex if <math>f''(x) > 0</math> for all <math>x</math> (note: this is sufficient, but not necessary).
*<math>f</math> strongly convex if and only if <math>f''(x) \ge m > 0</math> for all <math>x.</math>
 
For example, let <math>f</math> be strictly convex, and suppose there is a sequence of points <math>(x_n)</math> such that <math>f''(x_n) = \tfrac{1}{n}</math>. Even though <math>f''(x_n) > 0</math>, the function is not strongly convex because <math>f''(x)</math> will become arbitrarily small.
 
A twice continuously differentiable function <math>f</math> on a compact ___domain <math>X</math> that satisfies <math>f''(x)>0</math> for all <math>x\in X</math> is strongly convex. The proof of this statement follows from the [[extreme value theorem]], which states that a continuous function on a compact set has a maximum and minimum.
Line 132 ⟶ 145:
Strongly convex functions are in general easier to work with than convex or strictly convex functions, since they are a smaller class. Like strictly convex functions, strongly convex functions have unique minima on compact sets.
 
===Uniformly Properties of strongly-convex functions ===
If ''f'' is a strongly-convex function with parameter ''m'', then:<ref name=":0">{{Cite web |last=Nemirovsky and Ben-Tal |date=2023 |title=Optimization III: Convex Optimization |url=http://www2.isye.gatech.edu/~nemirovs/OPTIIILN2023Spring.pdf}}</ref>{{Rp|___location=Prop.6.1.4}}
 
* For every real number ''r'', the [[level set]] {''x'' | ''f''(''x'') ≤ ''r''} is [[Compact space|compact]].
* The function ''f'' has a unique [[global minimum]] on ''R<sup>n</sup>''.
 
== Uniformly convex functions ==
A uniformly convex function,<ref name="Zalinescu">{{cite book|title=Convex Analysis in General Vector Spaces|author=C. Zalinescu|publisher=World Scientific|year=2002|isbn=9812380671}}</ref><ref name="Bauschke">{{cite book|page=[https://archive.org/details/convexanalysismo00hhba/page/n161 144]|title=Convex Analysis and Monotone Operator Theory in Hilbert Spaces |url=https://archive.org/details/convexanalysismo00hhba|url-access=limited|author=H. Bauschke and P. L. Combettes |publisher=Springer |year=2011 |isbn=978-1-4419-9467-7}}</ref> with modulus <math>\phi</math>, is a function <math>f</math> that, for all <math>x, y</math> in the ___domain and <math>t \in [0, 1],</math> satisfies
<math display="block">f(tx+(1-t)y) \le t f(x)+(1-t)f(y) - t(1-t) \phi(\|x-y\|)</math>
where <math>\phi</math> is a function that is non-negative and vanishes only at&nbsp;0. This is a generalization of the concept of strongly convex function; by taking <math>\phi(\alpha) = \tfrac{m}{2} \alpha^2</math> we recover the definition of strong convexity.
 
Line 146 ⟶ 164:
* The [[absolute value]] function <math>f(x)=|x|</math> is convex (as reflected in the [[triangle inequality]]), even though it does not have a derivative at the point <math>x = 0.</math> It is not strictly convex.
* The function <math>f(x)=|x|^p</math> for <math>p \ge 1</math> is convex.
* The [[exponential function]] <math>f(x)=e^x</math> is convex. It is also strictly convex, since <math>f''(x)=e^x >0 </math>, but it is not strongly convex since the second derivative can be arbitrarily close to zero. More generally, the function <math>g(x) = e^{f(x)}</math> is [[Logarithmically convex function|logarithmically convex]] if <math>f</math> is a convex function. The term "superconvex" is sometimes used instead.<ref>{{Cite journal | last1 = Kingman | first1 = J. F. C. | doi = 10.1093/qmath/12.1.283 | title = A Convexity Property of Positive Matrices | journal = The Quarterly Journal of Mathematics | volume = 12 | pages = 283–284 | year = 1961 | bibcode = 1961QJMat..12..283K }}</ref>
* The function <math>f</math> with ___domain [0,1] defined by <math>f(0) = f(1) = 1, f(x) = 0</math> for <math>0 < x < 1</math> is convex; it is continuous on the open interval <math>(0, 1),</math> but not continuous at 0 and&nbsp;1.
* The function <math>x^3</math> has second derivative <math>6 x</math>; thus it is convex on the set where <math>x \geq 0</math> and [[concave function|concave]] on the set where <math>x \leq 0.</math>