{{Use dmy dates|date=December 2023}}
{{Calculus}}
In [[mathematics]], specifically [[differential calculus]], the '''inverse function theorem''' gives a sufficient condition for a [[function (mathematics)|function]] to be [[inverse function|invertible]] in a [[neighborhood (mathematics)|neighborhood]] of a point in its [[___domain of a function|___domain]]: namely, that its derivative is continuous and nonzero at the point. The theorem also gives a formula for the [[derivative]] of the inverse function.
The theorem applies verbatim to [[complex-valued function]]s of a [[complex number|complex variable]]. It generalizes to functions from
''n''-[[tuples]] (of real or complex numbers) to ''n''-tuples, and to functions between [[vector space]]s of the same finite dimension, by replacing "derivative" with "[[Jacobian matrix]]" and "nonzero derivative" with "nonzero [[Jacobian determinant]]".
In [[multivariable calculus]], this theorem can be generalized to any [[continuously differentiable]], [[vector-valued function]] whose [[Jacobian determinant]] is nonzero at a point in its ___domain, giving a formula for the [[Jacobian matrix]] of the inverse. There are also versions of the inverse function theorem for [[complex numbers|complex]] [[holomorphic function]]s, for differentiable maps between [[manifold]]s, for differentiable functions between [[Banach space]]s, and so forth.
The theorem was first established by [[Émile Picard|Picard]] and [[Édouard Goursat|Goursat]] using an iterative scheme: the basic idea is to prove a [[fixed point theorem]] using the [[contraction mapping theorem]].
==Statements==
For functions of a single [[Variable (mathematics)|variable]], the theorem states that if <math>f</math> is a [[continuously differentiable]] function with nonzero derivative at the point <math>a</math>, then <math>f</math> is injective (or bijective onto the image) in a neighborhood of <math>a</math>, the inverse is continuously differentiable near <math>b=f(a)</math>, and the derivative of the inverse function at <math>b</math> is the reciprocal of the derivative of <math>f</math> at <math>a</math>:
<math display=block>\bigl(f^{-1}\bigr)'(b) = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}.</math>
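For example, take <math>f(x) = x^2</math> near <math>a = 2</math>, so that <math>b = f(2) = 4</math>. Since <math>f'(2) = 4 \ne 0</math>, the theorem applies, and indeed the local inverse <math>f^{-1}(y) = \sqrt{y}</math> satisfies
<math display=block>\bigl(f^{-1}\bigr)'(4) = \frac{1}{2\sqrt{4}} = \frac{1}{4} = \frac{1}{f'(2)}.</math>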
A function <math>f</math> may be injective near a point <math>a</math> while <math>f'(a) = 0</math>. An example is <math>f(x) = (x - a)^3</math>. In fact, for such a function, the inverse cannot be differentiable at <math>b = f(a)</math>, since if <math>f^{-1}</math> were differentiable at <math>b</math>, then, by the chain rule, <math>1 = (f^{-1} \circ f)'(a) = (f^{-1})'(b)f'(a)</math>, which implies <math>f'(a) \ne 0</math>. (The situation is different for holomorphic functions; see [[#Holomorphic inverse function theorem]] below.)
There are two variants of the inverse function theorem.<ref name="Hörmander" /> Given a continuously differentiable map <math>f : U \to \mathbb{R}^m</math>, the first is
*The derivative <math>f'(a)</math> is surjective (i.e., the Jacobian matrix representing it has rank <math>m</math>) if and only if there exists a continuously differentiable function <math>g</math> on a neighborhood <math>V</math> of <math>b = f(a)</math> such that <math>f \circ g = I</math> near <math>b</math>,
and the second is
*The derivative <math>f'(a)</math> is injective if and only if there exists a continuously differentiable function <math>g</math> on a neighborhood <math>V</math> of <math>b = f(a)</math> such that <math>g \circ f = I</math> near <math>a</math>.
In the first case (when <math>f'(a)</math> is surjective), the point <math>b = f(a)</math> is called a [[regular value]]. Since <math>m = \dim \ker(f'(a)) + \dim \operatorname{im}(f'(a))</math>, the first case is equivalent to saying <math>b = f(a)</math> is not in the image of [[critical point (mathematics)|critical point]]s (a critical point being a point <math>a</math> at which the kernel of <math>f'(a)</math> is nonzero).
These variants are restatements of the inverse function theorem. Indeed, in the first case, when <math>f'(a)</math> is surjective, we can find an (injective) linear map <math>T</math> such that <math>f'(a) \circ T = I</math>. Define <math>h(x) = a + Tx</math>; then <math>(f \circ h)'(0) = f'(a) \circ T = I</math>, so by the usual inverse function theorem <math>f \circ h</math> has a continuously differentiable local inverse near <math>b</math>, and <math>g = h \circ (f \circ h)^{-1}</math> satisfies <math>f \circ g = I</math> near <math>b</math>. The second case, when <math>f'(a)</math> is injective, is handled similarly.
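As a simple instance of the first variant, consider the projection <math>f : \mathbb{R}^2 \to \mathbb{R}</math>, <math>f(x, y) = x</math>. Its derivative <math>f'(a) = \begin{bmatrix} 1 & 0 \end{bmatrix}</math> is surjective at every point <math>a</math>, and <math>g(t) = (t, 0)</math> is a continuously differentiable map with <math>f \circ g = I</math>.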
==Example==
Consider the [[vector-valued function]] <math>F : \mathbb{R}^2 \to \mathbb{R}^2</math> defined by:
:<math>
F(x, y) =
\begin{bmatrix}
e^x \cos y \\
e^x \sin y \\
\end{bmatrix}.
</math>
The Jacobian matrix of <math>F</math> at <math>(x, y)</math> is:
:<math>
\begin{bmatrix}
e^x \cos y & -e^x \sin y \\
e^x \sin y & e^x \cos y \\
\end{bmatrix}
</math>
with determinant:
:<math>
\det
\begin{bmatrix}
e^x \cos y & -e^x \sin y \\
e^x \sin y & e^x \cos y \\
\end{bmatrix} =
e^{2x} \cos^2 y + e^{2x} \sin^2 y =
e^{2x}.
</math>
The determinant <math>e^{2x}</math> is nonzero everywhere. Thus the theorem guarantees that, for every point <math>p</math> in <math>\mathbb{R}^2</math>, there exists a neighborhood about <math>p</math> over which <math>F</math> is invertible.
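The theorem moreover gives the Jacobian matrix of the local inverse at <math>F(x, y)</math> as the inverse of the matrix above; inverting the <math>2 \times 2</math> matrix explicitly,
:<math>
\begin{bmatrix}
e^x \cos y & -e^x \sin y \\
e^x \sin y & e^x \cos y \\
\end{bmatrix}^{-1}
=
e^{-x}
\begin{bmatrix}
\cos y & \sin y \\
-\sin y & \cos y \\
\end{bmatrix},
</math>
which is defined for every <math>(x, y)</math>, consistent with the determinant being everywhere nonzero.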
==Methods of proof==
The proof most commonly seen in textbooks relies on the [[contraction mapping principle]], also known as the [[Banach fixed-point theorem]].
Since the fixed point theorem applies in infinite-dimensional (Banach space) settings, this proof generalizes immediately to the infinite-dimensional version of the inverse function theorem<ref>{{Cite web|url=https://r-grande.github.io/Expository/Inverse%20Function%20Theorem.pdf |title=Inverse Function Theorem|last=Jaffe|first=Ethan}}</ref> (see [[Inverse function theorem#Generalizations|Generalizations]] below).
An alternate proof in finite dimensions hinges on the [[extreme value theorem]] for functions on a [[compact set]].<ref name="spivak_manifolds">{{harvnb|Spivak|1965|loc=pages 31–35}}</ref> This approach has the advantage that the proof generalizes to a situation where there is no Cauchy completeness (see {{section link||Over_a_real_closed_field}}).
Yet another proof uses [[Newton's method]], which has the advantage of providing an [[effective method|effective version]] of the theorem: bounds on the derivative of the function imply an estimate of the size of the neighborhood on which the function is invertible.<ref name="hubbard_hubbard">{{cite book |first1=John H. |last1=Hubbard |author-link=John H. Hubbard |first2=Barbara Burke |last2=Hubbard|author2-link=Barbara Burke Hubbard |title=Vector Analysis, Linear Algebra, and Differential Forms: A Unified Approach |edition=Matrix |year=2001 }}</ref>
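In outline (a sketch of such a scheme, not the cited text's precise formulation): to solve <math>f(x) = y</math> for <math>x</math> near <math>a</math>, one iterates
<math display=block>x_0 = a, \qquad x_{n+1} = x_n - f'(x_n)^{-1}\bigl(f(x_n) - y\bigr),</math>
and explicit bounds on <math>f'</math> (and on its modulus of continuity) yield both the convergence of the iteration and a lower bound on the radius of the ball of values <math>y</math> for which it converges.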
=== Proof for single-variable functions ===
We want to prove the following: ''Let <math>D \subseteq \R</math> be an open set with <math>x_0 \in D</math>, let <math>f: D \to \R</math> be a continuously differentiable function, and suppose that <math>f'(x_0) \ne 0</math>. Then there exists an open interval <math>I</math> with <math>x_0 \in I</math> such that <math>f</math> maps <math>I</math> bijectively onto the open interval <math>J = f(I)</math>, and such that the inverse function <math>f^{-1} : J \to I</math> is continuously differentiable, and for any <math>y \in J</math>, if <math>x \in I</math> is such that <math>f(x) = y</math>, then <math>(f^{-1})'(y) = \dfrac{1}{f'(x)}</math>.''
We may without loss of generality assume that <math>f'(x_0) > 0</math>. Given that <math>D</math> is an open set and <math>f'</math> is continuous at <math>x_0</math>, there exists <math>r > 0</math> such that <math>(x_0 - r, x_0 + r) \subseteq D</math> and
<math display="block">|f'(x) - f'(x_0)| < \dfrac{f'(x_0)}{2} \qquad \text{for all } |x - x_0| < r.</math>
In particular,<math display="block">f'(x) > \dfrac{f'(x_0)}{2} >0 \qquad \text{for all } |x - x_0| < r.</math>
This shows that <math>f</math> is strictly increasing on <math>(x_0 - r, x_0 + r)</math>. Let <math>\delta > 0</math> be such that <math>\delta < r</math>. Then <math>[x_0 - \delta, x_0 + \delta] \subseteq (x_0 - r, x_0 + r)</math>. Since <math>f</math> is continuous and strictly increasing, the intermediate value theorem shows that <math>f</math> maps the interval <math>[x_0 - \delta, x_0 + \delta]</math> bijectively onto <math>[f(x_0 - \delta), f(x_0 + \delta)]</math>. Let <math>I = (x_0-\delta, x_0+\delta)</math> and <math>J = (f(x_0 - \delta),f(x_0 + \delta))</math>. Then <math>f: I \to J</math> is a bijection and the inverse <math>f^{-1}: J \to I</math> exists. The fact that <math>f^{-1}: J \to I</math> is differentiable follows from the differentiability of <math>f</math>. In particular, the result follows from the fact that if <math>f: I \to \R</math> is a strictly monotonic and continuous function that is differentiable at <math>x_0 \in I</math> with <math>f'(x_0) \ne 0</math>, then <math>f^{-1}: f(I) \to \R</math> is differentiable at <math>y_0 = f(x_0)</math> with <math>(f^{-1})'(y_0) = \dfrac{1}{f'(x_0)}</math> (a standard result in analysis). This completes the proof.
=== A proof using successive approximation ===
To check that <math>g=f^{-1}</math> is C<sup>1</sup>, write <math>g(y+k) = x+h</math> so that
<math>f(x+h)=f(x)+k</math>. By the inequalities above, <math>\|h-k\| <\|h\|/2</math> so that <math>\|h\|/2<\|k\| < 2\|h\|</math>.
On the other hand, if <math>A=f^\prime(x)</math>, then <math>\|A-I\|<1/2</math>. Using the [[geometric series]] for <math>B=I-A</math>, it follows that <math>\|A^{-1}\| < 2</math>. But then
:<math> {\|g(y+k) -g(y) - f^\prime(g(y))^{-1}k \| \over \|k\|} = {\|h - A^{-1}k\| \over \|k\|} \le 4\,{\|f(x+h) - f(x) - f^\prime(x)h\| \over \|h\|} \to 0</math>
as <math>k \to 0</math>, since <math>f</math> is differentiable at <math>x</math>. Hence <math>g</math> is differentiable with <math>g^\prime(y) = f^\prime(g(y))^{-1}</math>; and since <math>g^\prime</math> is the composition of the continuous maps <math>g</math>, <math>f^\prime</math> and matrix inversion, <math>g</math> is C<sup>1</sup>.

=== A proof using the contraction mapping principle ===
Here is a proof based on the [[contraction mapping theorem]]. Specifically, following T. Tao,<ref>Theorem 17.7.2 in {{cite book|mr=3310023|last1=Tao|first1=Terence|title=Analysis. II|edition=Third edition of 2006 original|series=Texts and Readings in Mathematics|volume=38|publisher=Hindustan Book Agency|___location=New Delhi|year=2014|isbn=978-93-80250-65-6|zbl=1300.26003}}</ref> it uses the following consequence of the contraction mapping theorem.
{{math_theorem|name=Lemma|math_statement=Let <math>B(0, r)</math> denote an open ball of radius ''r'' in <math>\mathbb{R}^n</math> with center 0, and let <math>g : B(0, r) \to \mathbb{R}^n</math> be a map with a constant <math>0 < c < 1</math> such that
:<math>|g(y) - g(x)| \le c|y-x|</math>
for all <math>x, y</math> in <math>B(0, r)</math>. Then, writing <math>f(x) = x + g(x)</math>, we have
:<math>(1-c)|x - y| \le |f(x) - f(y)|</math>
for all <math>x, y</math> in <math>B(0, r)</math>; in particular, ''f'' is injective. If, moreover, <math>g(0) = 0</math>, then
:<math>B(0, (1-c)r) \subset f(B(0, r)) \subset B(0, (1+c)r)</math>.}}
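For example, in dimension one, take <math>g(x) = x^2/(4r)</math> on <math>B(0, r) \subset \R</math>: then <math>|g(y) - g(x)| = |y + x||y - x|/(4r) \le \tfrac{1}{2}|y - x|</math>, so the lemma applies with <math>c = 1/2</math>, and <math>g(0) = 0</math>. Here <math>f(x) = x + x^2/(4r)</math> maps <math>B(0, r)</math> onto <math>(-3r/4,\, 5r/4)</math>, which indeed contains <math>B(0, r/2)</math> and is contained in <math>B(0, 3r/2)</math>.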
Basically, the lemma says that a small perturbation of the identity map by a contraction map is injective and preserves a ball in some sense. Assuming the lemma for a moment, we prove the theorem first. As in the above proof, it is enough to prove the special case when <math>a = 0, b = f(a) = 0</math> and <math>f'(0) = I</math>. Let <math>g = f - I</math>. The [[mean value inequality]] applied to <math>t \mapsto g(x + t(y - x))</math> says:
:<math>|g(y) - g(x)| \le |y - x| \sup_{0 \le t \le 1} \|g'(x + t(y - x))\|.</math>
As <math>k \to 0</math>, we have <math>h \to 0</math> and <math>|h|/|k|</math> is bounded. Hence, <math>g</math> is differentiable at <math>y</math> with the derivative <math>g'(y) = f'(g(y))^{-1}</math>. Also, <math>g'</math> is the same as the composition <math>\iota \circ f' \circ g</math> where <math>\iota : T \mapsto T^{-1}</math>; so <math>g'</math> is continuous.
It remains to show the lemma. First,
:<math>|f(x) - f(y)| \ge |x - y| - |g(y) - g(x)| \ge |x - y| - c|x - y|,</math>
which is to say
:<math>(1 - c)|x - y| \le |f(x) - f(y)|.</math>
In particular, if <math>x \ne y</math> then <math>f(x) \ne f(y)</math>; thus <math>f</math> is injective. (This part does not need the assumption <math>g(0) = 0</math>.) Next we show <math>f(B(0, r)) \supset B(0, (1-c)r)</math>. The idea is to note that this is equivalent to, given a point <math>y</math> in <math>B(0, (1-c) r)</math>, finding a fixed point of the map
:<math>F : \overline{B}(0, r') \to \overline{B}(0, r'), \, x \mapsto y - g(x)</math>
where <math>0 < r' < r</math> is such that <math>|y| \le (1-c)r'</math> and the bar means a closed ball. To find a fixed point, we use the contraction mapping theorem; checking that <math>F</math> is a well-defined strict-contraction mapping is straightforward. Finally, we have <math>f(B(0, r)) \subset B(0, (1+c)r)</math>, since for <math>x \in B(0, r)</math>, <math>|f(x)| \le |x| + |g(x)| \le (1+c)|x| < (1+c)r</math> (recall <math>|g(x)| = |g(x) - g(0)| \le c|x|</math>). <math>\square</math>
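Unwinding the contraction mapping theorem here makes the approximation scheme explicit: starting from <math>x_0 = 0</math>, the iterates
:<math>x_{n+1} = F(x_n) = y - g(x_n)</math>
satisfy <math>|x_{n+1} - x_n| \le c\,|x_n - x_{n-1}|</math>, so they converge geometrically to the unique solution of <math>f(x) = x + g(x) = y</math>.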
=== Implicit function theorem ===
The inverse function theorem implies the [[implicit function theorem]]:
*given a map <math>f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m</math>, if <math>f(a, b) = 0</math>, <math>f</math> is continuously differentiable in a neighborhood of <math>(a, b)</math> and the derivative of <math>y \mapsto f(a, y)</math> at <math>b</math> is invertible, then there exists a differentiable map <math>g : U \to V</math> for some neighborhoods <math>U, V</math> of <math>a, b</math> such that <math>f(x, g(x)) = 0</math>. Moreover, if <math>f(x, y) = 0, x \in U, y \in V</math>, then <math>y = g(x)</math>; i.e., <math>g(x)</math> is a unique solution.
To see this, consider the map <math>F(x, y) = (x, f(x, y))</math>; its derivative at <math>(a, b)</math> is invertible, since the derivative of <math>y \mapsto f(a, y)</math> at <math>b</math> is. By the inverse function theorem, <math>F : U \times V \to W</math> has an inverse <math>G</math> for some neighborhoods <math>U, V, W</math>. We then have:
:<math>(x, y) = F(G_1(x, y), G_2(x, y)) = (G_1(x, y), f(G_1(x, y), G_2(x, y))),</math>
implying <math>x = G_1(x, y)</math> and <math>y = f(x, G_2(x, y)).</math> Thus <math>g(x) = G_2(x, 0)</math> has the required property. <math>\square</math>
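For example, take <math>f(x, y) = x^2 + y^2 - 1</math> near the point <math>(a, b) = (0, 1)</math>: the derivative of <math>y \mapsto f(0, y)</math> at <math>y = 1</math> is <math>2 \ne 0</math>, and indeed <math>g(x) = \sqrt{1 - x^2}</math> satisfies <math>f(x, g(x)) = 0</math> on a neighborhood of <math>0</math>.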
The lemma implies the following (a sort of) global version of the inverse function theorem:
{{math_theorem|name=Inverse function theorem|math_statement=<ref>Ch. I., § 3, Exercise 10. and § 8, Exercise 14. in V. Guillemin, A. Pollack. "Differential Topology". Prentice-Hall Inc., 1974. ISBN 0-13-212605-2.</ref> Let <math>f : U \to V</math> be a map between open subsets of <math>\mathbb{R}^n</math> or, more generally, of manifolds. Assume <math>f</math> is continuously differentiable (or is <math>C^k</math>). If <math>f</math> is injective on a closed subset <math>A \subset U</math> and if the Jacobian matrix of <math>f</math> is invertible at each point of <math>A</math>, then <math>f</math> is injective on a neighborhood <math>A'</math> of <math>A</math> and <math>f^{-1} : f(A') \to A'</math> is continuously differentiable (or is <math>C^k</math>).}}
Note that if <math>A</math> is a point, then the above is the usual inverse function theorem.
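For instance, take <math>f(x) = x^2</math> on <math>U = (0, \infty)</math> with <math>A = [1, 2]</math>: <math>f</math> is injective on <math>A</math> and <math>f'(x) = 2x \ne 0</math> there, so the theorem yields a single neighborhood of <math>A</math>, such as <math>(1 - \epsilon, 2 + \epsilon)</math>, on which <math>f</math> is invertible with <math>C^1</math> inverse. The point is that the conclusion holds on a uniform neighborhood of all of <math>A</math> at once, not merely near each of its points.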
=== Over a real closed field ===
The inverse function theorem also holds over a [[real closed field]] ''k'' (or an [[o-minimal structure]]): the theorem holds for a semialgebraic (or definable) continuously differentiable map between open subsets of <math>k^n</math>.
The usual proof of the IFT uses Banach's fixed point theorem, which relies on Cauchy completeness. That part of the argument is replaced by the use of the [[extreme value theorem]], which does not need completeness. Explicitly, in {{section link||A_proof_using_the_contraction_mapping_principle}}, Cauchy completeness is used only to establish the inclusion <math>B(0, r/2) \subset f(B(0, r))</math>. This can be shown directly as follows. Given a point <math>y</math> in <math>B(0, r/2)</math>, consider the function <math>P(x) = |f(x) - y|^2</math> defined on <math>B(0, r)</math>. If <math>P'(x) = 0</math>, then <math>0 = P'(x) = 2(f(x) - y)^{\mathsf{T}} f'(x)</math> and so <math>f(x) = y</math>, since <math>f'(x)</math> is invertible. Now, by the extreme value theorem, <math>P</math> attains a minimum at some point <math>x_0</math> on the closed ball <math>\overline{B}(0, r)</math>, which can be shown to lie in <math>B(0, r)</math>. Since <math>P'(x_0) = 0</math>, <math>f(x_0) = y</math>, which proves the claimed inclusion. <math>\square</math>
Alternatively, one can deduce the theorem from the one over real numbers by [[Tarski's principle]].{{citation needed|date=December 2024}}
==See also==