{{Use dmy dates|date=December 2023}}
{{Calculus}}
In [[mathematics]], the '''inverse function theorem''' is a theorem that asserts that, if a real function ''f'' has a continuous derivative near a point where its derivative is nonzero, then, near this point, ''f'' has an [[inverse function]]. The inverse function is also [[differentiable function|differentiable]], and the [[inverse function rule]] expresses its derivative as the multiplicative inverse of the derivative of ''f''.
In [[multivariable calculus]], this theorem can be generalized to any [[continuously differentiable]], [[vector-valued function]] whose [[Jacobian determinant]] is nonzero at a point in its ___domain, giving a formula for the [[Jacobian matrix]] of the inverse. There are also versions of the inverse function theorem for [[complex numbers|complex]] [[holomorphic function]]s, for differentiable maps between [[manifold]]s, for differentiable functions between [[Banach space]]s, and so forth.
The theorem applies verbatim to [[complex-valued function]]s of a [[complex number|complex variable]]. It generalizes to functions from
''n''-[[tuples]] (of real or complex numbers) to ''n''-tuples, and to functions between [[vector space]]s of the same finite dimension, by replacing "derivative" with "[[Jacobian matrix]]" and "nonzero derivative" with "nonzero [[Jacobian determinant]]".
The theorem was first established by [[Émile Picard|Picard]] and [[Édouard Goursat|Goursat]] using an iterative scheme: the basic idea is to prove a [[fixed point theorem]] using the [[contraction mapping theorem]].
==Statements==
For functions of a single [[Variable (mathematics)|variable]], the theorem states that if <math>f</math> is a [[continuously differentiable]] function with nonzero derivative at the point <math>a</math>, then <math>f</math> is injective (or bijective onto the image) in a neighborhood of <math>a</math>, the inverse is continuously differentiable near <math>b=f(a)</math>, and the derivative of the inverse function at <math>b</math> is the reciprocal of the derivative of <math>f</math> at <math>a</math>:
<math display=block>\bigl(f^{-1}\bigr)'(b) = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}.</math>
It can happen that a function <math>f</math> may be injective near a point <math>a</math> while <math>f'(a) = 0</math>. An example is <math>f(x) = (x - a)^3</math>. In fact, for such a function, the inverse cannot be differentiable at <math>b = f(a)</math>, since if <math>f^{-1}</math> were differentiable at <math>b</math>, then, by the chain rule, <math>1 = (f^{-1} \circ f)'(a) = (f^{-1})'(b)f'(a)</math>, which implies <math>f'(a) \ne 0</math>. (The situation is different for holomorphic functions; see [[#Holomorphic inverse function theorem]] below.)
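The failure of differentiability at <math>b</math> can also be seen numerically. The following is a minimal illustrative sketch (function names are ad hoc, not from any source), showing that the difference quotients of the inverse of <math>f(x) = x^3</math> at <math>b = 0</math> grow without bound:

```python
def f(x):
    return x ** 3

def g(y):
    # real cube root: the inverse of f on all of R
    return y ** (1.0 / 3.0) if y >= 0 else -((-y) ** (1.0 / 3.0))

def inverse_difference_quotient(h):
    # (g(h) - g(0)) / h = h**(-2/3), which blows up as h -> 0
    return (g(h) - g(0.0)) / h

q_coarse = inverse_difference_quotient(1e-3)   # about 1e2
q_fine = inverse_difference_quotient(1e-6)     # about 1e4
```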
For functions of more than one variable, the theorem states that if <math>f</math> is a continuously differentiable function from an open subset <math>A</math> of <math>\mathbb{R}^n</math> into <math>\mathbb{R}^n</math>, and the derivative <math>f'(a)</math> is invertible at a point <math>a</math> (that is, the determinant of the Jacobian matrix of <math>f</math> at <math>a</math> is nonzero), then there exist neighborhoods <math>U</math> of <math>a</math> in <math>A</math> and <math>V</math> of <math>b = f(a)</math> such that <math>f(U) \subset V</math> and <math>f : U \to V</math> is bijective.<ref name="Hörmander">{{cite book |first=Lars |last=Hörmander |author-link=Lars Hörmander |title=The Analysis of Linear Partial Differential Operators I: Distribution Theory and Fourier Analysis |publisher=Springer |year=2015 |edition=2nd |series=Classics in Mathematics |isbn= 978-3-642-61497-2}}</ref> Writing <math>f=(f_1,\ldots,f_n)</math>, this means that the system of {{Mvar|n}} equations <math>y_i = f_i(x_1, \dots, x_n)</math> has a unique solution for <math>x_1, \dots, x_n</math> in terms of <math>y_1, \dots, y_n</math> when <math>x \in U, y \in V</math>. Note that the theorem ''does not'' say <math>f</math> is bijective onto the image wherever <math>f'</math> is invertible, but that it is locally bijective where <math>f'</math> is invertible.
There are two variants of the inverse function theorem.<ref name="Hörmander" /> Given a continuously differentiable map <math>f : U \to \mathbb{R}^m</math>, the first is
*The derivative <math>f'(a)</math> is surjective (i.e., the Jacobian matrix representing it has rank <math>m</math>) if and only if there exists a continuously differentiable function <math>g</math> on a neighborhood <math>V</math> of <math>b = f(a)</math> such that <math>f \circ g = I</math> near <math>b</math>,
and the second is
*The derivative <math>f'(a)</math> is injective if and only if there exists a continuously differentiable function <math>g</math> on a neighborhood <math>V</math> of <math>b = f(a)</math> such that <math>g \circ f = I</math> near <math>a</math>.
In the first case (when <math>f'(a)</math> is surjective), the point <math>b = f(a)</math> is called a [[regular value]]. Since <math>m = \dim \ker(f'(a)) + \dim \operatorname{im}(f'(a))</math>, the first case is equivalent to saying <math>b = f(a)</math> is not in the image of [[Critical point (mathematics)|critical point]]s, i.e., points at which the derivative of <math>f</math> fails to be surjective.
These variants are restatements of the inverse function theorem. Indeed, in the first case when <math>f'(a)</math> is surjective, we can find an (injective) linear map <math>T</math> such that <math>f'(a) \circ T = I</math>. Define <math>h(x) = a + Tx</math> so that we have:
\end{bmatrix}.
</math>
Its Jacobian matrix at <math>(x, y)</math> is:
:<math>
\begin{bmatrix}
{e^x \cos y} & {-e^x \sin y}\\
{e^x \sin y} & {e^x \cos y}\\
\end{bmatrix}
</math>
with
:<math>
\det \begin{bmatrix}
{e^x \cos y} & {-e^x \sin y}\\
{e^x \sin y} & {e^x \cos y}\\
\end{bmatrix} =
e^{2x} \cos^2 y + e^{2x} \sin^2 y =
e^{2x}.
</math>
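As an illustrative check (not part of the standard presentation), the determinant formula can be verified numerically by finite differences; the sample point and step size below are ad hoc choices:

```python
import math

def F(x, y):
    # F(x, y) = (e^x cos y, e^x sin y)
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian_det(x, y, h=1e-6):
    # central finite differences for the 2x2 Jacobian, then its determinant
    fxp, fxm = F(x + h, y), F(x - h, y)
    fyp, fym = F(x, y + h), F(x, y - h)
    a = (fxp[0] - fxm[0]) / (2 * h)   # d(e^x cos y)/dx
    b = (fyp[0] - fym[0]) / (2 * h)   # d(e^x cos y)/dy
    c = (fxp[1] - fxm[1]) / (2 * h)   # d(e^x sin y)/dx
    d = (fyp[1] - fym[1]) / (2 * h)   # d(e^x sin y)/dy
    return a * d - b * c

# matches det = e^{2x} at any (x, y), e.g. (0.3, 1.2):
approx = jacobian_det(0.3, 1.2)
exact = math.exp(2 * 0.3)
```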
Since the fixed point theorem applies in infinite-dimensional (Banach space) settings, this proof generalizes immediately to the infinite-dimensional version of the inverse function theorem<ref>{{Cite web|url=https://r-grande.github.io/Expository/Inverse%20Function%20Theorem.pdf |title=Inverse Function Theorem|last=Jaffe|first=Ethan}}</ref> (see [[Inverse function theorem#Generalizations|Generalizations]] below).
An alternate proof in finite dimensions hinges on the [[extreme value theorem]] for functions on a [[compact set]].<ref name="spivak_manifolds">{{harvnb|Spivak|1965|loc=pages 31–35 }}</ref> This approach has the advantage that the proof generalizes to a situation where there is no Cauchy completeness (see {{section link||Over_a_real_closed_field}}).
Yet another proof uses [[Newton's method]], which has the advantage of providing an [[effective method|effective version]] of the theorem: bounds on the derivative of the function imply an estimate of the size of the neighborhood on which the function is invertible.<ref name="hubbard_hubbard">{{cite book |first1=John H. |last1=Hubbard |author-link=John H. Hubbard |first2=Barbara Burke |last2=Hubbard|author2-link=Barbara Burke Hubbard |title=Vector Analysis, Linear Algebra, and Differential Forms: A Unified Approach |edition=Matrix |year=2001 }}</ref>
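A minimal single-variable sketch of this Newton-based approach (the function <math>f(x) = x + \tfrac14 \sin x</math> and all names are illustrative choices; the explicit bound <math>f' \ge 3/4</math> plays the role of the derivative estimate):

```python
import math

def f(x):
    # strictly increasing C^1 function: f'(x) = 1 + 0.25*cos(x) >= 0.75 > 0
    return x + 0.25 * math.sin(x)

def fprime(x):
    return 1 + 0.25 * math.cos(x)

def local_inverse(y, x0=0.0, tol=1e-12, max_iter=50):
    # Newton iteration for the equation f(x) = y; the uniform lower bound
    # on f' keeps every Newton step well defined
    x = x0
    for _ in range(max_iter):
        step = (f(x) - y) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

y = f(0.7)
x = local_inverse(y)
inv_deriv = 1.0 / fprime(x)   # (f^{-1})'(y) by the inverse function theorem
```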
=== Proof for single-variable functions ===
We want to prove the following: ''Let <math>D \subseteq \R</math> be an open set with <math>x_0 \in D</math>, let <math>f: D \to \R</math> be a continuously differentiable function, and suppose that <math>f'(x_0) \ne 0</math>. Then there exists an open interval <math>I</math> with <math>x_0 \in I</math> such that <math>f</math> maps <math>I</math> bijectively onto the open interval <math>J = f(I)</math>, and such that the inverse function <math>f^{-1} : J \to I</math> is continuously differentiable, and for any <math>y \in J</math>, if <math>x \in I</math> is such that <math>f(x) = y</math>, then <math>(f^{-1})'(y) = \dfrac{1}{f'(x)}</math>.''
We may without loss of generality assume that <math>f'(x_0) > 0</math>. Given that <math>D</math> is an open set and <math>f'</math> is continuous at <math>x_0</math>, there exists <math>r > 0</math> such that <math>(x_0 - r, x_0 + r) \subseteq D</math> and<math display="block">|f'(x) - f'(x_0)| < \dfrac{f'(x_0)}{2} \qquad \text{for all } |x - x_0| < r.</math>
In particular,<math display="block">f'(x) > \dfrac{f'(x_0)}{2} >0 \qquad \text{for all } |x - x_0| < r.</math>
This shows that <math>f</math> is strictly increasing on <math>(x_0 - r, x_0 + r)</math>. Let <math>\delta > 0</math> be such that <math>\delta < r</math>. Then <math>[x_0 - \delta, x_0 + \delta] \subseteq (x_0 - r, x_0 + r)</math>. By the intermediate value theorem, we find that <math>f</math> maps the interval <math>[x_0 - \delta, x_0 + \delta]</math> bijectively onto <math>[f(x_0 - \delta), f(x_0 + \delta)]</math>. Set <math>I = (x_0-\delta, x_0+\delta)</math> and <math>J = (f(x_0 - \delta),f(x_0 + \delta))</math>. Then <math>f: I \to J</math> is a bijection and the inverse <math>f^{-1}: J \to I</math> exists. The fact that <math>f^{-1}: J \to I</math> is differentiable follows from the differentiability of <math>f</math>. In particular, the result follows from the fact that if <math>f: I \to \R</math> is a strictly monotonic and continuous function that is differentiable at <math>x_0 \in I</math> with <math>f'(x_0) \ne 0</math>, then <math>f^{-1}: f(I) \to \R</math> is differentiable at <math>y_0 = f(x_0)</math> with <math>(f^{-1})'(y_0) = \dfrac{1}{f'(x_0)}</math> (a standard result in analysis). This completes the proof.
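The statement can be checked numerically. In the following illustrative sketch (ad hoc names and tolerances), the inverse of the strictly increasing function <math>f(x) = x^3 + x</math> is computed by bisection, and its difference quotient at <math>b = f(1) = 2</math> is compared with <math>1/f'(1) = 1/4</math>:

```python
def f(x):
    return x ** 3 + x   # strictly increasing: f'(x) = 3x^2 + 1 > 0

def f_inverse(y, lo=-10.0, hi=10.0):
    # bisection, valid because f is strictly increasing on [lo, hi]
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a = 1.0
b = f(a)   # b = 2.0
h = 1e-6
quotient = (f_inverse(b + h) - f_inverse(b - h)) / (2 * h)
expected = 1.0 / (3 * a ** 2 + 1)   # 1 / f'(a) = 0.25
```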
=== A proof using successive approximation ===
To prove existence, it can be assumed after an affine transformation that <math>f(0)=0</math> and <math>f^\prime(0)=I</math>, so that <math> a=b=0</math>.
By the [[Mean value theorem#Mean value theorem for vector-valued functions|mean value theorem for vector-valued functions]], for a differentiable function <math>u:[0,1]\to\mathbb R^m</math>, <math display="inline">\|u(1)-u(0)\|\le \sup_{0\le t\le 1} \|u^\prime(t)\|</math>. Setting <math>u(t)=f(x+t(x^\prime -x)) - x-t(x^\prime-x)</math>, it follows that
:<math>\|f(x) - f(x^\prime) - x + x^\prime\| \le \|x -x^\prime\|\,\sup_{0\le t \le 1} \|f^\prime(x+t(x^\prime -x))-I\|.</math>
To check that <math>g=f^{-1}</math> is C<sup>1</sup>, write <math>g(y+k) = x+h</math> so that
<math>f(x+h)=f(x)+k</math>. By the inequalities above, <math>\|h-k\| <\|h\|/2</math> so that <math>\|h\|/2<\|k\| < 2\|h\|</math>.
On the other hand, if <math>A=f^\prime(x)</math>, then <math>\|A-I\|<1/2</math>. Using the [[geometric series]] for <math>B=I-A</math>, it follows that <math>\|A^{-1}\| < 2</math>. But then
:<math> {\|g(y+k) -g(y) - f^\prime(g(y))^{-1}k \| \over \|k\|}
Here is a proof based on the [[contraction mapping theorem]]. Specifically, following T. Tao,<ref>Theorem 17.7.2 in {{cite book|mr=3310023|last1=Tao|first1=Terence|title=Analysis. II|edition=Third edition of 2006 original|series=Texts and Readings in Mathematics|volume=38|publisher=Hindustan Book Agency|___location=New Delhi|year=2014|isbn=978-93-80250-65-6|zbl=1300.26003}}</ref> it uses the following consequence of the contraction mapping theorem.
{{math_theorem|name=Lemma|math_statement=Let <math>B(0, r)</math> denote an open ball of radius ''r'' in <math>\mathbb{R}^n</math> with center 0, and let <math>g : B(0, r) \to \mathbb{R}^n</math> be a map with a constant <math>0 < c < 1</math> such that
:<math>|g(y) - g(x)| \le c|y-x|</math>
for all <math>x, y</math> in <math>B(0, r)</math>. Then for <math>f = I + g</math>, we have
:<math>(1 - c)|x - y| \le |f(x) - f(y)|</math>
for all <math>x, y</math> in <math>B(0, r)</math>; in particular, ''f'' is injective. If, moreover, <math>g(0) = 0</math>, then
:<math>B(0, (1-c)r) \subset f(B(0, r)) \subset B(0, (1+c)r)</math>.}}
Basically, the lemma says that a small perturbation of the identity map by a contraction map is injective and preserves a ball in some sense. Assuming the lemma for a moment, we prove the theorem first. As in the above proof, it is enough to prove the special case when <math>a = 0, b = f(a) = 0</math> and <math>f'(0) = I</math>. Let <math>g = f - I</math>. The [[mean value inequality]] applied to <math>t \mapsto g(x + t(y - x))</math> says:
As <math>k \to 0</math>, we have <math>h \to 0</math> and <math>|h|/|k|</math> is bounded. Hence, <math>g</math> is differentiable at <math>y</math> with the derivative <math>g'(y) = f'(g(y))^{-1}</math>. Also, <math>g'</math> is the same as the composition <math>\iota \circ f' \circ g</math> where <math>\iota : T \mapsto T^{-1}</math>; so <math>g'</math> is continuous.
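The iteration underlying this proof can be sketched numerically (illustrative only, with <math>g(x) = 0.3 \cos x</math> as an ad hoc contraction and the ball bookkeeping omitted):

```python
import math

def g(x):
    # a contraction with constant c = 0.3 < 1, since |g'(x)| = |-0.3 sin x| <= 0.3
    return 0.3 * math.cos(x)

def f(x):
    # perturbation of the identity: f = I + g, as in the lemma
    return x + g(x)

def invert_by_contraction(y, iterations=100):
    # iterate F(x) = y - g(x); F is a strict contraction, so the iterates
    # converge to the unique fixed point x = y - g(x), i.e. f(x) = y
    x = y
    for _ in range(iterations):
        x = y - g(x)
    return x

y = f(0.5)
x = invert_by_contraction(y)
```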
It remains to show the lemma. First, since <math>x - y = f(x) - f(y) - (g(x) - g(y))</math>,
:<math>|x - y| \le |f(x) - f(y)| + |g(x) - g(y)| \le |f(x) - f(y)| + c|x - y|,</math>
which is to say
:<math>(1 - c)|x - y| \le |f(x) - f(y)|.</math>
In particular, if <math>f(x) = f(y)</math>, then <math>|x - y| \le c|x - y|</math>, which is a contradiction unless <math>y = x</math>. (This part does not need the assumption <math>g(0) = 0</math>.) Next we show <math>f(B(0, r)) \supset B(0, (1-c)r)</math>. The idea is to note that this is equivalent to, given a point <math>y</math> in <math>B(0, (1-c) r)</math>, finding a fixed point of the map
:<math>F : \overline{B}(0, r') \to \overline{B}(0, r'), \, x \mapsto y - g(x)</math>
where <math>0 < r' < r</math> such that <math>|y| \le (1-c)r'</math> and the bar means a closed ball. To find a fixed point, we use the contraction mapping theorem; checking that <math>F</math> is a well-defined strict-contraction mapping is straightforward. Finally, we have <math>f(B(0, r)) \subset B(0, (1+c)r)</math> since
*given a map <math>f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m</math>, if <math>f(a, b) = 0</math>, <math>f</math> is continuously differentiable in a neighborhood of <math>(a, b)</math> and the derivative of <math>y \mapsto f(a, y)</math> at <math>b</math> is invertible, then there exists a differentiable map <math>g : U \to V</math> for some neighborhoods <math>U, V</math> of <math>a, b</math> such that <math>f(x, g(x)) = 0</math>. Moreover, if <math>f(x, y) = 0, x \in U, y \in V</math>, then <math>y = g(x)</math>; i.e., <math>g(x)</math> is the unique solution.
To see this, consider the map <math>F(x, y) = (x, f(x, y))</math>. By the inverse function theorem, <math>F : U \times V \to W</math> has the inverse <math>G</math> for some neighborhoods <math>U, V, W</math>. We then have:
:<math>(x, y) = F(G_1(x, y), G_2(x, y)) = (G_1(x, y), f(G_1(x, y), G_2(x, y))),</math>
implying <math>x = G_1(x, y)</math> and <math>y = f(x, G_2(x, y)).</math> Thus <math>g(x) = G_2(x, 0)</math> has the required property. <math>\square</math>
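A numerical sketch of the implicit function theorem (ad hoc example: the circle <math>x^2 + y^2 = 1</math> near <math>(0, 1)</math>, solved by Newton's method in <math>y</math> alone):

```python
def f(x, y):
    # f(x, y) = x^2 + y^2 - 1, with f(0, 1) = 0 and df/dy(0, 1) = 2 != 0
    return x ** 2 + y ** 2 - 1

def dfdy(x, y):
    return 2 * y

def g(x, y0=1.0, tol=1e-12, max_iter=50):
    # Newton's method in y alone: solves f(x, y) = 0 near (0, 1)
    y = y0
    for _ in range(max_iter):
        step = f(x, y) / dfdy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

# near x = 0 the implicit solution is g(x) = sqrt(1 - x^2)
y_sol = g(0.3)
```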
The lemma implies the following (a sort of) global version of the inverse function theorem:
{{math_theorem|name=Inverse function theorem|math_statement=<ref>Ch. I., § 3, Exercise 10. and § 8, Exercise 14. in V. Guillemin, A. Pollack. "Differential Topology". Prentice-Hall Inc., 1974. ISBN 0-13-212605-2.</ref> Let <math>f : U \to V</math> be a map between open subsets of <math>\mathbb{R}^n</math> or, more generally, of manifolds. Assume <math>f</math> is continuously differentiable (or is <math>C^k</math>). If <math>f</math> is injective on a closed subset <math>A \subset U</math> and if the Jacobian matrix of <math>f</math> is invertible at each point of <math>A</math>, then <math>f</math> is injective on a neighborhood <math>A'</math> of <math>A</math> and <math>f^{-1} : f(A') \to A'</math> is continuously differentiable (or is <math>C^k</math>).}}
Note that if <math>A</math> is a point, then the above is the usual inverse function theorem.
===Selections===
When <math>f: \mathbb{R}^n \to \mathbb{R}^m</math> with <math>m\leq n</math>, <math>f</math> is <math>k</math> times [[continuously differentiable]], and the Jacobian <math>A=\nabla f(\overline{x})</math> at a point <math>\overline{x}</math> is of [[rank (linear algebra)|rank]] <math>m</math>, the inverse of <math>f</math> may not be unique. However, there exists a local [[Choice function#Choice function of a multivalued map|selection function]] <math>s</math> such that <math>f(s(y)) = y</math> for all <math>y</math> in a [[neighborhood (mathematics)|neighborhood]] of <math>\overline{y} = f(\overline{x})</math>, <math>s(\overline{y}) = \overline{x}</math>, <math>s</math> is <math>k</math> times continuously differentiable in this neighborhood, and <math>\nabla s(\overline{y}) = A^T(A A^T)^{-1}</math> (<math>\nabla s(\overline{y})</math> is the [[Moore–Penrose pseudoinverse]] of <math>A</math>).<ref>{{cite book |last1=Dontchev |first1=Asen L. |last2=Rockafellar |first2=R. Tyrrell |title=Implicit Functions and Solution Mappings: A View from Variational Analysis |date=2014 |publisher=Springer-Verlag |___location=New York |isbn=978-1-4939-1036-6 |page=54 |edition=Second}}</ref>
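For a toy linear example (illustrative only; for a nonlinear <math>f</math> a Newton-type correction would be needed), the pseudoinverse formula gives an explicit selection:

```python
def f(x1, x2):
    # a linear map R^2 -> R with Jacobian A = [2, 1], which has full rank m = 1
    return 2 * x1 + x2

def s(y):
    # selection built from the Moore-Penrose pseudoinverse of A:
    # A^+ = A^T (A A^T)^{-1} = [2/5, 1/5]^T, so s(y) = A^+ y and f(s(y)) = y
    return (0.4 * y, 0.2 * y)

x1, x2 = s(3.0)
```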
=== Over a real closed field ===
The inverse function theorem also holds over a [[real closed field]] ''k'' (or an [[o-minimal structure]]).<ref>Chapter 7, Theorem 2.11. in {{cite book |doi=10.1017/CBO9780511525919|title=Tame Topology and O-minimal Structures. London Mathematical Society lecture note series, no. 248|year=1998 |last1=Dries |first1=L. P. D. van den |authorlink = Lou van den Dries|isbn=9780521598385|publisher=Cambridge University Press|___location=Cambridge, New York, and Oakleigh, Victoria }}</ref> Precisely, the theorem holds for a semialgebraic (or definable) map between open subsets of <math>k^n</math> that is continuously differentiable.
The usual proof of the IFT uses Banach's fixed point theorem, which relies on Cauchy completeness. That part of the argument is replaced by the use of the [[extreme value theorem]], which does not need completeness. Explicitly, in {{section link||A_proof_using_the_contraction_mapping_principle}}, Cauchy completeness is used only to establish the inclusion <math>B(0, r/2) \subset f(B(0, r))</math>. Here, we shall directly show <math>B(0, r/4) \subset f(B(0, r))</math> instead (which is enough). Given a point <math>y</math> in <math>B(0, r/4)</math>, consider the function <math>P(x) = |f(x) - y|^2</math> defined on a neighborhood of <math>\overline{B}(0, r)</math>. If <math>P'(x) = 0</math>, then <math>0 = P'(x) = 2[f_1(x) - y_1 \cdots f_n(x) - y_n]f'(x)</math> and so <math>f(x) = y</math>, since <math>f'(x)</math> is invertible. Now, by the extreme value theorem, <math>P</math> attains a minimum at some point <math>x_0</math> on the closed ball <math>\overline{B}(0, r)</math>, which can be shown to lie in <math>B(0, r)</math> using <math>2^{-1}|x| \le |f(x)|</math>. Since <math>P'(x_0) = 0</math>, <math>f(x_0) = y</math>, which proves the claimed inclusion. <math>\square</math>
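The minimization argument can be sketched numerically in one variable (illustrative only; exhaustive search over a closed interval stands in for the extreme value theorem):

```python
def f(x):
    # f'(x) = 1 + 0.3*x^2 > 0, so f' is invertible everywhere
    return x + 0.1 * x ** 3

def argmin_P(y, r=1.0, n=200001):
    # minimize P(x) = (f(x) - y)^2 over the closed interval [-r, r] by
    # exhaustive search; existence of the minimum is the extreme value
    # theorem, and at an interior minimum P'(x) = 0 forces f(x) = y
    best_x, best_p = -r, (f(-r) - y) ** 2
    for i in range(n):
        x = -r + 2 * r * i / (n - 1)
        p = (f(x) - y) ** 2
        if p < best_p:
            best_x, best_p = x, p
    return best_x

x0 = argmin_P(0.55)   # approximate solution of f(x) = 0.55 in (-1, 1)
```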
Alternatively, one can deduce the theorem from the one over real numbers by [[Tarski's principle]].{{citation needed|date=December 2024}}
==See also==