Convex function: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 09:56, 22 March 2023 edit Mudthomas (talk \| contribs) 27 edits Clarified that the two points need to be distinct. Tags: Mobile edit Mobile web edit ← Previous edit		Latest revision as of 19:23, 1 August 2025 edit undo Knief (talk \| contribs) 1 edit mNo edit summary
(31 intermediate revisions by 21 users not shown)
Line 1: [[Image:ConvexFunction.svg\|thumb\|300px\|right\|Convex function on an [[interval (mathematics)\|interval]].]]{{Use American English\|date = March 2019}} {{Short description\|Real function with secant line between points above the graph itself}} [[Image:Epigraph convex.svg\|right\|thumb\|300px\|A function (in black) is convex if and only if the region above its [[Graph of a function\|graph]] (in green) is a [[convex set]].]] Line 5: [[File:Convex vs. Not-convex.jpg\|thumb\|right\|300px\|Convex vs. Not convex]] In [[mathematics]], a [[real-valued function]] is called '''convex''' if the [[line segment]] between any two distinct points on the [[graph of a function\|graph of the function]] lies above or on the graph between the two points. Equivalently, a function is convex if its [[epigraph (mathematics)\|''epigraph'']] (the set of points on or above the graph of the function) is a [[convex set]]. A twice-differentiable function of a single variable is convex [[if and only if]] its second derivative is nonnegative on its entire ___domain.<ref>{{Cite web\|url=https://www.stat.cmu.edu/~larry/=stat705/Lecture2.pdf \|title=Lecture Notes 2\|website=www.stat.cmu.edu\|access-date=3 March 2017}}</ref> Well-known examples of convex functions of a single variable include the [[quadratic function]] <math>x^2</math> and the [[exponential function]] <math>e^x</math>. In simple terms, a convex function refers to a function whose graph is shaped like a cup <math>\cup</math>, while a [[concave function]]'s graph is shaped like a cap <math>\cap</math>. In simple terms, a convex function graph is shaped like a cup <math>\cup</math> (or a straight line like a linear function), while a [[concave function]]'s graph is shaped like a cap <math>\cap</math>. A twice-[[differentiable function\|differentiable]] function of a single variable is convex [[if and only if]] its [[second derivative]] is nonnegative on its entire [[___domain of a function\|___domain]].<ref>{{Cite web\|url=https://www.stat.cmu.edu/~larry/=stat705/Lecture2.pdf \|title=Lecture Notes 2\|website=www.stat.cmu.edu\|access-date=3 March 2017}}</ref> Well-known examples of convex functions of a single variable include a [[linear function]] <math>f(x) = cx</math> (where <math>c</math> is a [[real number]]), a [[quadratic function]] <math>cx^2</math> (<math>c</math> as a nonnegative real number) and an [[exponential function]] <math>ce^x</math> (<math>c</math> as a nonnegative real number). Convex functions play an important role in many areas of mathematics. They are especially important in the study of [[optimization]] problems where they are distinguished by a number of convenient properties. For instance, a strictly convex function on an open set has no more than one minimum. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and as a result, they are the most well-understood functionals in the [[calculus of variations]]. In [[probability theory]], a convex function applied to the [[expected value]] of a [[random variable]] is always bounded above by the expected value of the convex function of the random variable. This result, known as [[Jensen's inequality]], can be used to deduce inequalities such as the [[inequality of arithmetic and geometric means\|arithmetic–geometric mean inequality]] and [[Hölder's inequality]].▼ ▲Convex functions play an important role in many areas of mathematics. They are especially important in the study of [[optimization]] problems where they are distinguished by a number of convenient properties. For instance, a strictly convex function on an [[open set]] has no more than one [[maximum and minimum\|minimum]]. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and as a result, they are the most well-understood functionals in the [[calculus of variations]]. In [[probability theory]], a convex function applied to the [[expected value]] of a [[random variable]] is always bounded above by the expected value of the convex function of the random variable. This result, known as [[Jensen's inequality]], can be used to deduce [[inequality (mathematics)\|inequalities]] such as the [[inequality of arithmetic and geometric means\|arithmetic–geometric mean inequality]] and [[Hölder's inequality]]. ==Definition== Line 19 ⟶ 22: <li>For all <math>0 \leq t \leq 1</math> and all <math>x_1, x_2 \in X</math>: <math display=block>f\left(t x_1 + (1-t) x_2\right) \leq t f\left(x_1\right) + (1-t) f\left(x_2\right)</math> The right hand side represents the straight line between <math>\left(x_1, f\left(x_1\right)\right)</math> and <math>\left(x_2, f\left(x_2\right)\right)</math> in the graph of <math>f</math> as a function of <math>t;</math> increasing <math>t</math> from <math>0</math> to <math>1</math> or decreasing <math>t</math> from <math>1</math> to <math>0</math> sweeps this line. Similarly, the argument of the function <math>f</math> in the left hand side represents the straight line between <math>x_1</math> and <math>x_2</math> in <math>X</math> or the <math>x</math>-axis of the graph of <math>f.</math> So, this condition requires that the straight line between any pair of points on the curve of <math>f</math> to be above or just ~~meets~~meeting the graph.<ref>{{Cite web\|last=\|first=\|date=\|title=Concave Upward and Downward\|url=https://www.mathsisfun.com/calculus/concave-up-down-convex.html\|url-status=live\|archive-url=https://web.archive.org/web/20131218034748/http://www.mathsisfun.com:80/calculus/concave-up-down-convex.html \|archive-date=2013-12-18 \|access-date=\|website=}}</ref> </li> <li>For all <math>0 < t < 1</math> and all <math>x_1, x_2 \in X</math> such that <math>x_1 \neq x_2</math>: <math display=block>f\left(t x_1 + (1-t) x_2\right) \leq t f\left(x_1\right) + (1-t) f\left(x_2\right)</math> Line 34 ⟶ 37: <math display=block>f\left(t x_1 + (1-t) x_2\right) < t f\left(x_1\right) + (1-t) f\left(x_2\right)</math> A strictly convex function <math>f</math> is a function such that the straight line between any pair of points on the curve <math>f</math> is above the curve <math>f</math> except for the intersection points between the straight line and the curve. An example of a function which is convex but not strictly convex is <math>f(x,y) = x^2 + y</math>. This function is not strictly convex because any two points sharing an x coordinate will have a straight line between them, while any two points NOT sharing an x coordinate will have a greater value of the function than the points between them. The function <math>f</math> is said to be '''{{em\|[[Concave function\|concave]]}}''' (resp. '''{{em\|strictly concave}}''') if <math>-f</math> (<math>f</math> multiplied by −1) is convex (resp. strictly convex). ==Alternative naming== The term ''convex'' is often referred to as ''convex down'' or ''concave upward'', and the term [[Concave function \|concave]] is often referred as ''concave down'' or ''convex upward''.<ref>{{Cite book\|last=Stewart\|first=James\|title=Calculus\|publisher=Cengage Learning\|year=2015\|isbn=978-1305266643\|edition=8th\|pages=223–224}}</ref><ref>{{cite book\|last1=W. Hamming\|first1=Richard\|url=https://books.google.com/books?id=WLIbeA1aWvUC\|title=Methods of Mathematics Applied to Calculus, Probability, and Statistics\|publisher=Courier Corporation\|year=2012\|isbn=978-0-486-13887-9\|edition=illustrated\|page=227}} [https://books.google.com/books?id=WLIbeA1aWvUC&pg=PA227 Extract of page 227]</ref><ref>{{cite book\|last1=Uvarov\|first1=Vasiliĭ Borisovich\|url=https://books.google.com/books?id=GzQnAQAAIAAJ\|title=Mathematical Analysis\|publisher=Mir Publishers\|year=1988\|isbn=978-5-03-000500-3\|edition=\|page=126-127}}</ref> If the term "convex" is used without an "up" or "down" keyword, then it refers strictly to a cup shaped graph <math>\cup</math>. As an example, [[Jensen's inequality]] refers to an inequality involving a convex or convex-(updown), function.<ref>{{cite book \|title=The Probability Companion for Engineering and Computer Science \|edition=illustrated \|first1=Adam \|last1=Prügel-Bennett \|publisher=Cambridge University Press \|year=2020 \|isbn=978-1-108-48053-6 \|page=160 \|url=https://books.google.com/books?id=coHCDwAAQBAJ}} [https://books.google.com/books?id=coHCDwAAQBAJ&pg=PA160 Extract of page 160]</ref> == Properties == Line 47 ⟶ 50: === Functions of one variable === * Suppose <math>f</math> is a function of one [[real number\|real]] variable defined on an interval, and let <math display=block>R(x_1, x_2) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}</math> (note that <math>R(x_1, x_2)</math> is the slope of the purple line in the ~~above~~first drawing; the function <math>R</math> is [[Symmetric function\|symmetric]] in <math>(x_1, x_2),</math> means that <math>R</math> does not change by exchanging <math>x_1</math> and <math>x_2</math>). <math>f</math> is convex if and only if <math>R(x_1, x_2)</math> is [[monotonically non-decreasing]] in <math>x_1,</math> for every fixed <math>x_2</math> (or vice versa). This characterization of convexity is quite useful to prove the following results. * A convex function <math>f</math> of one real variable defined on some [[open interval]] <math>C</math> is [[Continuous function\|continuous]] on <math>C. </math>. Moreover, <math>f</math> admits [[Semi-differentiability\|left and right derivatives]], and these are [[monotonically non-decreasing]]. In addition, the left derivative is left-continuous and the right-derivative is right-continuous. As a consequence, <math>f</math> is [[differentiable function\|differentiable]] at all but at most [[countable\|countably many]] points, the set on which <math>f</math> is not differentiable can however still be dense. If <math>C</math> is closed, then <math>f</math> may fail to be continuous at the endpoints of <math>C</math> (an example is shown in the [[#Examples\|examples section]]). * A [[differentiable function\|differentiable]] function of one variable is convex on an interval if and only if its [[derivative]] is [[monotonically non-decreasing]] on that interval. If a function is differentiable and convex then it is also [[continuously differentiable]] ~~(due to [[Darboux's theorem (analysis)\|Darboux's theorem]])~~. * A differentiable function of one variable is convex on an interval if and only if its graph lies above all of its [[tangent]]s:<ref name="boyd">{{cite book\| title=Convex Optimization\| first1=Stephen P.\|last1=Boyd \|first2=Lieven\| last2=Vandenberghe \| year = 2004 \|publisher=Cambridge University Press\| isbn=978-0-521-83378-3\| url= https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf#page=83 \|format=pdf \| access-date=October 15, 2011}}</ref>{{rp\|69}} <math display=block>f(x) \geq f(y) + f'(y) (x-y)</math> for all <math>x</math> and <math>y</math> in the interval. * A twice differentiable function of one variable is convex on an interval if and only if its [[second derivative]] is non-negative there; this gives a practical test for convexity. Visually, a twice differentiable convex function "curves up", without any bends the other way ([[inflection point]]s). If its second derivative is positive at all points then the function is strictly convex, but the [[~~Theorem#Converse~~converse (logic)\|converse]] does not hold. For example, the second derivative of <math>f(x) = x^4</math> is <math>f''(x) = 12x^{2}</math>, which is zero for <math>x = 0,</math> but <math>x^4</math> is strictly convex. *This property and the above property in terms of "...its derivative is monotonically non-decreasing..." are not equal since if <math>f''</math> is non-negative on an interval <math>X</math> then <math>f'</math> is monotonically non-decreasing on <math>X</math> while its converse is not true, for example, <math>f'</math> is monotonically non-decreasing on <math>X</math> while its derivative <math>f''</math> is not defined at some points on <math>X</math>. If <math>f</math> is a convex function of one real variable, and <math>f(0)\le 0</math>, then <math>f</math> is [[Superadditivity\|superadditive]] on the [[positive reals]], that is <math>f(a + b) \geq f(a) + f(b)</math> for positive real numbers <math>a</math> and <math>b</math>. {{math proof\|proof= Since <math>f</math> is convex, by using one of the convex function definitions above and letting <math>x_2 = 0,</math> it follows that for all real <math>0 \leq t \leq 1,</math> <math display=block> \begin{align} f(tx_1) & = f(t x_1 + (1-t) \cdot 0) \\ & \leq t f(x_1) + (1-t) f(0) \\ & \leq t f(x_1). \~~rightarrow~~\ \end{align} </math> From <math>f(tx_1)\leq t f(x_1).</math> ~~From this~~, it follows that <math display=block> \begin{align} f(a) + f(b) & = f \left((a+b) \frac{a}{a+b} \right) + f \left((a+b) \frac{b}{a+b} \right) \\ & \leq \frac{a}{a+b} f(a+b) + \frac{b}{a+b} ~~f(a+b) =~~ f(a+b) \~~rightarrow f(a) + f(b)~~ \~~leq f(a+b).</math>~~ & = f(a+b).\\ \end{align}</math> Namely, <math>f(a) + f(b) \leq f(a+b)</math>. }} * A function <math>f</math> is midpoint convex on an interval <math>C</math> if for all <math>x_1, x_2 \in C</math> <math display=block>f\!\left(\frac{x_1 + x_2}{2}\right) \leq \frac{f(x_1) + f(x_2)}{2}.</math> This condition is only slightly weaker than convexity. For example, a real-valued [[Lebesgue measurable function]] that is midpoint-convex is convex: this is a theorem of [[Wacław Sierpiński\|~~Sierpinski~~Sierpiński]].<ref>{{cite book\|last=Donoghue\|first=William F.\| title= Distributions and Fourier Transforms\|year=1969\|publisher=Academic Press \| isbn=9780122206504 \|url= https://books.google.com/books?id=P30Y7daiGvQC&pg=PA12\|access-date=August 29, 2012\|page=12}}</ref> In particular, a continuous function that is midpoint convex will be convex. === Functions of several variables === * A function that is marginally convex in each individual variable is not necessarily (jointly) convex. For example, the function <math>f(x, y) = x y</math> is [[bilinear map\|marginally linear]], and thus marginally convex, in each variable, but not (jointly) convex. * A function <math>f : X \to [-\infty, \infty]</math> valued in the [[extended real ~~number~~numbers]]s <math>[-\infty, \infty] = \R \cup \{\pm\infty\}</math> is convex if and only if its [[Epigraph (mathematics)\|epigraph]] <math display=block>\{(x, r) \in X \times \R ~:~ r \geq f(x)\}</math> is a convex set. * A differentiable function <math>f</math> defined on a convex ___domain is convex if and only if <math>f(x) \geq f(y) + \nabla f(y)^T \cdot (x-y)</math> holds for all <math>x, y</math> in the ___domain. * A twice differentiable function of several variables is convex on a convex set if and only if its [[Hessian matrix]] of second [[partial derivative]]s is [[Positive-definite matrix\|positive semidefinite]] on the interior of the convex set. Line 91 ⟶ 107: ==Strongly convex functions== The concept of strong convexity extends and parametrizes the notion of strict convexity. Intuitively, a strongly-convex function is a function that grows as fast as a quadratic function.<ref>{{Cite web \|title=Strong convexity · Xingyu Zhou's blog \|url=https://xingyuzhou.org/blog/notes/strong-convexity \|access-date=2023-09-27 \|website=xingyuzhou.org}}</ref> A strongly convex function is also strictly convex, but not vice versa. If a one-dimensional function <math>f</math> is twice continuously differentiable and the ___domain is the real line, then we can characterize it as follows: <math>f</math> convex if and only if <math>f''(x) \ge 0</math> for all <math>x.</math>▼ <math>f</math> strictly convex if <math>f''(x) > 0</math> for all <math>x</math> (note: this is sufficient, but not necessary).▼ <math>f</math> strongly convex if and only if <math>f''(x) \ge m > 0</math> for all <math>x.</math>▼ For example, let <math>f</math> be strictly convex, and suppose there is a sequence of points <math>(x_n)</math> such that <math>f''(x_n) = \tfrac{1}{n}</math>. Even though <math>f''(x_n) > 0</math>, the function is not strongly convex because <math>f''(x)</math> will become arbitrarily small.▼ AMore generally, a differentiable function <math>f</math> is called strongly convex with parameter <math>m > 0</math> if the following inequality holds for all points <math>x, y</math> in its ___domain:<ref name="bertsekas">{{cite book\|page=[https://archive.org/details/convexanalysisop00bert_476/page/n87 72]\|title=Convex Analysis and Optimization\|url=https://archive.org/details/convexanalysisop00bert_476\|url-access=limited\|author=Dimitri Bertsekas\| others= Contributors: Angelia Nedic and Asuman E. Ozdaglar\|publisher=Athena Scientific\|year=2003\|isbn=9781886529458}}</ref><math display="block">(\nabla f(x) - \nabla f(y) )^T (x-y) \ge m \\|x-y\\|_2^2 </math> ~~<math display=block>(\nabla f(x) - \nabla f(y) )^T (x-y) \ge m \\|x-y\\|_2^2 </math>~~ or, more generally, <math display=block>\langle \nabla f(x) - \nabla f(y), x-y \rangle \ge m \\|x-y\\|^2 </math> Line 120 ⟶ 140: <math display=block>x\mapsto f(x) -\frac m 2 \\|x\\|^2</math> is convex. The distinction between convex, strictly convex, and strongly convex can be subtle at first glance. If <math>f</math> is twice continuously differentiable and the ___domain is the real line, then we can characterize it as follows: ▲<math>f</math> convex if and only if <math>f''(x) \ge 0</math> for all <math>x.</math> ▲<math>f</math> strictly convex if <math>f''(x) > 0</math> for all <math>x</math> (note: this is sufficient, but not necessary). ▲<math>f</math> strongly convex if and only if <math>f''(x) \ge m > 0</math> for all <math>x.</math> ▲For example, let <math>f</math> be strictly convex, and suppose there is a sequence of points <math>(x_n)</math> such that <math>f''(x_n) = \tfrac{1}{n}</math>. Even though <math>f''(x_n) > 0</math>, the function is not strongly convex because <math>f''(x)</math> will become arbitrarily small. A twice continuously differentiable function <math>f</math> on a compact ___domain <math>X</math> that satisfies <math>f''(x)>0</math> for all <math>x\in X</math> is strongly convex. The proof of this statement follows from the [[extreme value theorem]], which states that a continuous function on a compact set has a maximum and minimum. Line 132 ⟶ 145: Strongly convex functions are in general easier to work with than convex or strictly convex functions, since they are a smaller class. Like strictly convex functions, strongly convex functions have unique minima on compact sets. ===~~Uniformly~~ Properties of strongly-convex functions === If ''f'' is a strongly-convex function with parameter ''m'', then:<ref name=":0">{{Cite web \|last=Nemirovsky and Ben-Tal \|date=2023 \|title=Optimization III: Convex Optimization \|url=http://www2.isye.gatech.edu/~nemirovs/OPTIIILN2023Spring.pdf}}</ref>{{Rp\|___location=Prop.6.1.4}} * For every real number ''r'', the [[level set]] {''x'' \| ''f''(''x'') ≤ ''r''} is [[Compact space\|compact]]. * The function ''f'' has a unique [[global minimum]] on ''R<sup>n</sup>''. == Uniformly convex functions == A uniformly convex function,<ref name="Zalinescu">{{cite book\|title=Convex Analysis in General Vector Spaces\|author=C. Zalinescu\|publisher=World Scientific\|year=2002\|isbn=9812380671}}</ref><ref name="Bauschke">{{cite book\|page=[https://archive.org/details/convexanalysismo00hhba/page/n161 144]\|title=Convex Analysis and Monotone Operator Theory in Hilbert Spaces \|url=https://archive.org/details/convexanalysismo00hhba\|url-access=limited\|author=H. Bauschke and P. L. Combettes \|publisher=Springer \|year=2011 \|isbn=978-1-4419-9467-7}}</ref> with modulus <math>\phi</math>, is a function <math>f</math> that, for all <math>x, y</math> in the ___domain and <math>t \in [0, 1],</math> satisfies <math display="block">f(tx+(1-t)y) \le t f(x)+(1-t)f(y) - t(1-t) \phi(\\|x-y\\|)</math> where <math>\phi</math> is a function that is non-negative and vanishes only at 0. This is a generalization of the concept of strongly convex function; by taking <math>\phi(\alpha) = \tfrac{m}{2} \alpha^2</math> we recover the definition of strong convexity. Line 146 ⟶ 164: * The [[absolute value]] function <math>f(x)=\|x\|</math> is convex (as reflected in the [[triangle inequality]]), even though it does not have a derivative at the point <math>x = 0.</math> It is not strictly convex. * The function <math>f(x)=\|x\|^p</math> for <math>p \ge 1</math> is convex. * The [[exponential function]] <math>f(x)=e^x</math> is convex. It is also strictly convex, since <math>f''(x)=e^x >0 </math>, but it is not strongly convex since the second derivative can be arbitrarily close to zero. More generally, the function <math>g(x) = e^{f(x)}</math> is [[Logarithmically convex function\|logarithmically convex]] if <math>f</math> is a convex function. The term "superconvex" is sometimes used instead.<ref>{{Cite journal \| last1 = Kingman \| first1 = J. F. C. \| doi = 10.1093/qmath/12.1.283 \| title = A Convexity Property of Positive Matrices \| journal = The Quarterly Journal of Mathematics \| volume = 12 \| pages = 283–284 \| year = 1961 \| bibcode = 1961QJMat..12..283K }}</ref> * The function <math>f</math> with ___domain [0,1] defined by <math>f(0) = f(1) = 1, f(x) = 0</math> for <math>0 < x < 1</math> is convex; it is continuous on the open interval <math>(0, 1),</math> but not continuous at 0 and 1. * The function <math>x^3</math> has second derivative <math>6 x</math>; thus it is convex on the set where <math>x \geq 0</math> and [[concave function\|concave]] on the set where <math>x \leq 0.</math>