Inverse function theorem: Difference between revisions

{{Use dmy dates|date=December 2023}}
{{Calculus}}
In [[real analysis]], a branch of [[mathematics]], the '''inverse function theorem''' is a [[theorem]] that asserts that, if a [[real function]] ''f'' has a [[differentiable function|continuous derivative]] near a point where its derivative is nonzero, then, near this point, ''f'' has an [[inverse function]]. The inverse function is also [[differentiable function|differentiable]], and the ''[[inverse function rule]]'' expresses its derivative as the [[multiplicative inverse]] of the derivative of ''f''.
 
The theorem applies verbatim to [[complex-valued function]]s of a [[complex number|complex variable]]. It generalizes to functions from
''n''-[[tuples]] (of real or complex numbers) to ''n''-tuples, and to functions between [[vector space]]s of the same finite dimension, by replacing "derivative" with "[[Jacobian matrix]]" and "nonzero derivative" with "nonzero [[Jacobian determinant]]".

If the function of the theorem belongs to a higher [[differentiability class]], the same is true for the inverse function. There are also versions of the inverse function theorem for [[complex numbers|complex]] [[holomorphic function]]s, for differentiable maps between [[manifold]]s, for differentiable functions between [[Banach space]]s, and so forth.
 
The theorem was first established by [[Émile Picard|Picard]] and [[Édouard Goursat|Goursat]] using an iterative scheme: the basic idea is to prove a [[fixed point theorem]] using the [[contraction mapping theorem]].
==Statements==
For functions of a single [[Variable (mathematics)|variable]], the theorem states that if <math>f</math> is a [[continuously differentiable]] function with nonzero derivative at the point <math>a</math>, then <math>f</math> is injective (or bijective onto the image) in a neighborhood of <math>a</math>, the inverse is continuously differentiable near <math>b=f(a)</math>, and the derivative of the inverse function at <math>b</math> is the reciprocal of the derivative of <math>f</math> at <math>a</math>:
<math display=block>\bigl(f^{-1}\bigr)'(b) = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}.</math>
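The reciprocal formula lends itself to a numerical check. The following sketch uses the illustrative choices <math>f(x) = x^3 + x</math> (whose derivative <math>3x^2 + 1</math> never vanishes) and <math>a = 1</math>; these are assumptions for the example, not taken from any source. It compares a finite-difference estimate of <math>(f^{-1})'(b)</math> with <math>1/f'(a)</math>:

```python
# Check (f^{-1})'(b) = 1/f'(a) for the illustrative choice
# f(x) = x^3 + x, a = 1, b = f(1) = 2; f'(x) = 3x^2 + 1 > 0.
def f(x):
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(y, tol=1e-12):
    # Bisection: f is strictly increasing, so the inverse is well defined,
    # and |f(x)| >= |x| gives the initial bracket.
    lo, hi = -abs(y) - 1, abs(y) + 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a, b = 1.0, f(1.0)
h = 1e-6
numeric = (f_inverse(b + h) - f_inverse(b - h)) / (2 * h)  # estimates (f^{-1})'(b)
exact = 1 / f_prime(a)                                     # = 1/4
print(abs(numeric - exact) < 1e-5)
```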
 
It can happen that a function <math>f</math> may be injective near a point <math>a</math> while <math>f'(a) = 0</math>. An example is <math>f(x) = (x - a)^3</math>. In fact, for such a function, the inverse cannot be differentiable at <math>b = f(a)</math>, since if <math>f^{-1}</math> were differentiable at <math>b</math>, then, by the chain rule, <math>1 = (f^{-1} \circ f)'(a) = (f^{-1})'(b)f'(a)</math>, which implies <math>f'(a) \ne 0</math>. (The situation is different for holomorphic functions; see [[#Holomorphic inverse function theorem]] below.)
 
There are two variants of the inverse function theorem.<ref name="Hörmander" /> Given a continuously differentiable map <math>f : U \to \mathbb{R}^m</math>, the first is
*The derivative <math>f'(a)</math> is surjective (i.e., the Jacobian matrix representing it has rank <math>m</math>) if and only if there exists a continuously differentiable function <math>g</math> on a neighborhood <math>V</math> of <math>b = f(a)</math> such that <math>f \circ g = I</math> near <math>b</math>,
and the second is
*The derivative <math>f'(a)</math> is injective if and only if there exists a continuously differentiable function <math>g</math> on a neighborhood <math>V</math> of <math>b = f(a)</math> such that <math>g \circ f = I</math> near <math>a</math>.
 
In the first case (when <math>f'(a)</math> is surjective), the point <math>b = f(a)</math> is called a [[regular value]]. Since <math>m = \dim \ker(f'(a)) + \dim \operatorname{im}(f'(a))</math>, the first case is equivalent to saying that <math>b = f(a)</math> is not in the image of [[Critical point (mathematics)#Critical point of a differentiable map|critical points]] (a critical point is a point <math>a</math> such that the kernel of <math>f'(a)</math> is nonzero). The statement in the first case is a special case of the [[submersion theorem]].
 
These variants are restatements of the inverse function theorem. Indeed, in the first case when <math>f'(a)</math> is surjective, we can find an (injective) linear map <math>T</math> such that <math>f'(a) \circ T = I</math>. Define <math>h(x) = a + Tx</math> so that we have:
:<math>
F(x,y)=
\begin{bmatrix}
{e^x \cos y}\\
{e^x \sin y}\\
\end{bmatrix}.
</math>
Its Jacobian matrix at <math>(x, y)</math> is:
:<math>
JF(x,y)=
\begin{bmatrix}
{e^x \cos y} & {-e^x \sin y}\\
{e^x \sin y} & {e^x \cos y}
\end{bmatrix}
</math>
with the determinant:
:<math>
\det JF(x,y)=
e^{2x} \cos^2 y + e^{2x} \sin^2 y=
e^{2x}.
</math>
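Since the determinant <math>e^{2x}</math> is nowhere zero, the theorem applies at every point. As a numerical sketch (the evaluation point <math>(0.3, 1.2)</math> is an arbitrary illustrative choice), the Jacobian can be approximated by central finite differences and its determinant compared with <math>e^{2x}</math>:

```python
import math

# Finite-difference check that the Jacobian determinant of
# F(x, y) = (e^x cos y, e^x sin y) equals e^{2x}.
def F(x, y):
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian(x, y, h=1e-6):
    # Central differences in each variable; columns are dF/dx and dF/dy.
    fx = [(F(x + h, y)[i] - F(x - h, y)[i]) / (2 * h) for i in range(2)]
    fy = [(F(x, y + h)[i] - F(x, y - h)[i]) / (2 * h) for i in range(2)]
    return [[fx[0], fy[0]], [fx[1], fy[1]]]

x, y = 0.3, 1.2
J = jacobian(x, y)
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(abs(det - math.exp(2 * x)) < 1e-6)
```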
 
Yet another proof uses [[Newton's method]], which has the advantage of providing an [[effective method|effective version]] of the theorem: bounds on the derivative of the function imply an estimate of the size of the neighborhood on which the function is invertible.<ref name="hubbard_hubbard">{{cite book |first1=John H. |last1=Hubbard |author-link=John H. Hubbard |first2=Barbara Burke |last2=Hubbard|author2-link=Barbara Burke Hubbard |title=Vector Analysis, Linear Algebra, and Differential Forms: A Unified Approach |edition=Matrix |year=2001 }}</ref>
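As a minimal sketch of this idea (with an illustrative function and starting point, not the bound-tracking construction of the cited proof), the Newton iteration <math>z \mapsto z - f'(z)^{-1}(f(z) - y)</math> computes the local inverse pointwise; here it recovers <math>\log 2</math> as the inverse of the exponential at <math>y = 2</math>:

```python
import math

# Sketch: compute the local inverse f^{-1}(y) by Newton iteration,
# starting from a point z0 near the solution (illustrative choices).
def local_inverse(f, f_prime, y, z0, steps=50):
    z = z0
    for _ in range(steps):
        z = z - (f(z) - y) / f_prime(z)
    return z

# exp has derivative exp, which is never zero; its local inverse is log.
z = local_inverse(math.exp, math.exp, y=2.0, z0=1.0)
print(abs(z - math.log(2)) < 1e-12)
```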
 
=== Proof for single-variable functions ===
We want to prove the following: ''Let <math>D \subseteq \R</math> be an open set with <math>x_0 \in D, f: D \to \R</math> a continuously differentiable function defined on <math>D</math>, and suppose that <math>f'(x_0) \ne 0</math>. Then there exists an open interval <math>I</math> with <math>x_0 \in I</math> such that <math>f</math> maps <math>I</math> bijectively onto the open interval <math>J = f(I)</math>, and such that the inverse function <math>f^{-1} : J \to I</math> is continuously differentiable, and for any <math>y \in J</math>, if <math>x \in I</math> is such that <math>f(x) = y</math>, then <math>(f^{-1})'(y) = \dfrac{1}{f'(x)}</math>.''
 
We may without loss of generality assume that <math>f'(x_0) > 0</math>. Given that <math>D</math> is an open set and <math>f'</math> is continuous at <math>x_0</math>, there exists <math>r > 0</math> such that <math>(x_0 - r, x_0 + r) \subseteq D</math> and<math display="block">|f'(x) - f'(x_0)| < \dfrac{f'(x_0)}{2} \qquad \text{for all } |x - x_0| < r.</math>
 
In particular,<math display="block">f'(x) > \dfrac{f'(x_0)}{2} >0 \qquad \text{for all } |x - x_0| < r.</math>
 
This shows that <math>f</math> is strictly increasing on <math>(x_0 - r, x_0 + r)</math>. Let <math>\delta > 0</math> be such that <math>\delta < r</math>. Then <math>[x_0 - \delta, x_0 + \delta] \subseteq (x_0 - r, x_0 + r)</math>. By the intermediate value theorem, we find that <math>f</math> maps the interval <math>[x_0 - \delta, x_0 + \delta]</math> bijectively onto <math>[f(x_0 - \delta), f(x_0 + \delta)]</math>. Denote <math>I = (x_0-\delta, x_0+\delta)</math> and <math>J = (f(x_0 - \delta),f(x_0 + \delta))</math>. Then <math>f: I \to J</math> is a bijection and the inverse <math>f^{-1}: J \to I</math> exists. The fact that <math>f^{-1}: J \to I</math> is differentiable follows from the differentiability of <math>f</math>. In particular, the result follows from the fact that if <math>f: I \to \R</math> is a strictly monotonic and continuous function that is differentiable at <math>x_0 \in I</math> with <math>f'(x_0) \ne 0</math>, then <math>f^{-1}: f(I) \to \R</math> is differentiable at <math>y_0 = f(x_0)</math> with <math>(f^{-1})'(y_0) = \dfrac{1}{f'(x_0)}</math> (a standard result in analysis). This completes the proof.
 
=== A proof using successive approximation ===
To check that <math>g=f^{-1}</math> is C<sup>1</sup>, write <math>g(y+k) = x+h</math> so that
<math>f(x+h)=f(x)+k</math>. By the inequalities above, <math>\|h-k\| <\|h\|/2</math> so that <math>\|h\|/2<\|k\| < 2\|h\|</math>.
On the other hand, if <math>A=f^\prime(x)</math>, then <math>\|A-I\|<1/2</math>. Using the [[geometric series]] for <math>B=I-A</math>, it follows that <math>\|A^{-1}\| < 2</math>. But then
 
:<math> {\|g(y+k) -g(y) - f^\prime(g(y))^{-1}k \| \over \|k\|}
Here is a proof based on the [[contraction mapping theorem]]. Specifically, following T. Tao,<ref>Theorem 17.7.2 in {{cite book|mr=3310023|last1=Tao|first1=Terence|title=Analysis. II|edition=Third edition of 2006 original|series=Texts and Readings in Mathematics|volume=38|publisher=Hindustan Book Agency|___location=New Delhi|year=2014|isbn=978-93-80250-65-6|zbl=1300.26003}}</ref> it uses the following consequence of the contraction mapping theorem.
 
{{math_theorem|name=Lemma|math_statement=Let <math>B(0, r)</math> denote an open ball of radius ''r'' in <math>\mathbb{R}^n</math> with center 0 and <math>g : B(0, r) \to \mathbb{R}^n</math> a map with a constant <math>0 < c < 1</math> such that
:<math>|g(y) - g(x)| \le c|y-x|</math>
for all <math>x, y</math> in <math>B(0, r)</math>. Then for <math>f = I + g</math>, we have
:<math>(1-c)|x - y| \le |f(x) - f(y)|;</math>
in particular, ''f'' is injective. If, moreover, <math>g(0) = 0</math>, then
:<math>B(0, (1-c)r) \subset f(B(0, r)) \subset B(0, (1+c)r)</math>.
 
(More generally, the statement remains true if <math>\mathbb{R}^n</math> is replaced by a Banach space.) Also, the first part of the lemma is true for any normed space.}}
 
Basically, the lemma says that a small perturbation of the identity map by a contraction map is injective and preserves a ball in some sense. Assuming the lemma for a moment, we prove the theorem first. As in the above proof, it is enough to prove the special case when <math>a = 0, b = f(a) = 0</math> and <math>f'(0) = I</math>. Let <math>g = f - I</math>. The [[mean value inequality]] applied to <math>t \mapsto g(x + t(y - x))</math> says:
As <math>k \to 0</math>, we have <math>h \to 0</math> and <math>|h|/|k|</math> is bounded. Hence, <math>g</math> is differentiable at <math>y</math> with the derivative <math>g'(y) = f'(g(y))^{-1}</math>. Also, <math>g'</math> is the same as the composition <math>\iota \circ f' \circ g</math> where <math>\iota : T \mapsto T^{-1}</math>; so <math>g'</math> is continuous.
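The fixed-point mechanism behind this proof can be sketched in one dimension. In the sketch below, the contraction <math>g(x) = 0.3\sin x</math> (constant <math>c = 0.3 < 1</math>) and target value <math>y = 1</math> are illustrative assumptions, not part of the proof: to solve <math>f(x) = y</math> with <math>f = I + g</math>, iterate <math>x \mapsto y - g(x)</math>.

```python
import math

# Sketch of the fixed-point construction: f = I + g with the
# illustrative contraction g(x) = 0.3*sin(x), constant c = 0.3 < 1.
# A fixed point of x -> y - g(x) satisfies f(x) = x + g(x) = y.
g = lambda x: 0.3 * math.sin(x)
f = lambda x: x + g(x)

y = 1.0
x = 0.0
for _ in range(200):  # error shrinks by a factor c each step
    x = y - g(x)

print(abs(f(x) - y) < 1e-12)
```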
 
It remains to show the lemma. First, we have:
:<math>|x - y| - |f(x) - f(y)| \le |g(x) - g(y)| \le c|x - y|,</math>
which is to say
:<math>(1 - c)|x - y| \le |f(x) - f(y)|,</math>
which proves the first part. Next, we show <math>f(B(0, r)) \supset B(0, (1-c)r)</math>. The idea is to note that this is equivalent to, given a point <math>y</math> in <math>B(0, (1-c) r)</math>, finding a fixed point of the map
:<math>F : \overline{B}(0, r') \to \overline{B}(0, r'), \, x \mapsto y - g(x)</math>
where <math>0 < r' < r</math> is such that <math>|y| \le (1-c)r'</math> and the bar means a closed ball. To find a fixed point, we use the contraction mapping theorem; checking that <math>F</math> is a well-defined strict-contraction mapping is straightforward. Finally, we have <math>f(B(0, r)) \subset B(0, (1+c)r)</math> since
*given a map <math>f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m</math>, if <math>f(a, b) = 0</math>, <math>f</math> is continuously differentiable in a neighborhood of <math>(a, b)</math> and the derivative of <math>y \mapsto f(a, y)</math> at <math>b</math> is invertible, then there exists a differentiable map <math>g : U \to V</math> for some neighborhoods <math>U, V</math> of <math>a, b</math> such that <math>f(x, g(x)) = 0</math>. Moreover, if <math>f(x, y) = 0, x \in U, y \in V</math>, then <math>y = g(x)</math>; i.e., <math>g(x)</math> is a unique solution.
To see this, consider the map <math>F(x, y) = (x, f(x, y))</math>. By the inverse function theorem, <math>F : U \times V \to W</math> has the inverse <math>G</math> for some neighborhoods <math>U, V, W</math>. We then have:
:<math>(x, y) = F(G_1(x, y), G_2(x, y)) = (G_1(x, y), f(G_1(x, y), G_2(x, y))),</math>
implying <math>x = G_1(x, y)</math> and <math>y = f(x, G_2(x, y)).</math> Thus <math>g(x) = G_2(x, 0)</math> has the required property. <math>\square</math>
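As a numerical sketch of this consequence (the function <math>f(x, y) = x^2 + y^2 - 1</math> and the base point <math>(0, 1)</math> are illustrative assumptions), the implicit map <math>g</math> can be computed pointwise by solving <math>f(x, y) = 0</math> for <math>y</math>; the Newton step in the <math>y</math>-variable is justified precisely because <math>\partial f/\partial y</math> is invertible near the base point:

```python
import math

# Illustrative implicit-function computation: f(x, y) = x^2 + y^2 - 1,
# with f(0, 1) = 0 and df/dy = 2y invertible at (0, 1), so locally
# y = g(x) = sqrt(1 - x^2).  Solve f(x, y) = 0 for y by Newton in y.
def f(x, y):
    return x**2 + y**2 - 1

def df_dy(x, y):
    return 2 * y

def g(x, y0=1.0, steps=50):
    y = y0
    for _ in range(steps):
        y = y - f(x, y) / df_dy(x, y)
    return y

x = 0.3
print(abs(g(x) - math.sqrt(1 - x**2)) < 1e-12)
print(abs(f(x, g(x))) < 1e-12)
```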
 
The lemma implies the following (a sort of) global version of the inverse function theorem:
 
{{math_theorem|name=Inverse function theorem|math_statement=<ref>Ch. I., § 3, Exercise 10. and § 8, Exercise 14. in V. Guillemin, A. Pollack. "Differential Topology". Prentice-Hall Inc., 1974. ISBN 0-13-212605-2.</ref> Let <math>f : U \to V</math> be a map between open subsets of <math>\mathbb{R}^n, \mathbb{R}^m</math> or more generally of manifolds. Assume <math>f</math> is continuously differentiable (or is <math>C^k</math>). If <math>f</math> is injective on a closed subset <math>A \subset U</math> and if the Jacobian matrix of <math>f</math> is invertible at each point of <math>A</math>, then <math>f</math> is injective on a neighborhood <math>A'</math> of <math>A</math> and <math>f^{-1} : f(A') \to A'</math> is continuously differentiable (or is <math>C^k</math>).}}
 
Note that if <math>A</math> is a point, then the above is the usual inverse function theorem.
 
=== Over a real closed field ===
The inverse function theorem also holds over a [[real closed field]] ''k'' (or an [[o-minimal structure]]).<ref>Chapter 7, Theorem 2.11. in {{cite book |doi=10.1017/CBO9780511525919|title=Tame Topology and O-minimal Structures. London Mathematical Society lecture note series, no. 248|year=1998 |last1=Dries |first1=L. P. D. van den |authorlink = Lou van den Dries|isbn=9780521598385|publisher=Cambridge University Press|___location=Cambridge, New York, and Oakleigh, Victoria }}</ref> Precisely, the theorem holds for a semialgebraic (or definable) map between open subsets of <math>k^n</math> that is continuously differentiable.
 
The usual proof of the IFT uses Banach's fixed point theorem, which relies on the Cauchy completeness. That part of the argument is replaced by the use of the [[extreme value theorem]], which does not need completeness. Explicitly, in {{section link||A_proof_using_the_contraction_mapping_principle}}, the Cauchy completeness is used only to establish the inclusion <math>B(0, r/2) \subset f(B(0, r))</math>. Here, we shall show <math>B(0, r/4) \subset f(B(0, r))</math> instead (which is enough). Given a point <math>y</math> in <math>B(0, r/4)</math>, consider the function <math>P(x) = |f(x) - y|^2</math> defined on a neighborhood of <math>\overline{B}(0, r)</math>. If <math>P'(x) = 0</math>, then <math>0 = P'(x) = 2[f_1(x) - y_1 \cdots f_n(x) - y_n]f'(x)</math> and so <math>f(x) = y</math>, since <math>f'(x)</math> is invertible. Now, by the extreme value theorem, <math>P</math> admits a minimum at some point <math>x_0</math> on the closed ball <math>\overline{B}(0, r)</math>, which can be shown to lie in <math>B(0, r)</math> using <math>2^{-1}|x| \le |f(x)|</math>. Since <math>P'(x_0) = 0</math>, <math>f(x_0) = y</math>, which proves the claimed inclusion. <math>\square</math>
 
Alternatively, one can deduce the theorem from the one over real numbers by [[Tarski's principle]].{{citation needed|date=December 2024}}
 
==See also==