Continuous mapping theorem: Difference between revisions

{{Short description|Probability theorem}}
{{Distinguish|text=the [[contraction mapping theorem]]}}
In [[probability theory]], the '''continuous mapping theorem''' states that continuous functions [[Continuous function#Heine definition of continuity|preserve limits]] even if their arguments are sequences of random variables. A continuous function, in [[Continuous function#Heine definition of continuity|Heine's definition]], is such a function that maps convergent sequences into convergent sequences: if ''x<sub>n</sub>'' → ''x'' then ''g''(''x<sub>n</sub>'') → ''g''(''x''). The ''continuous mapping theorem'' states that this will also be true if we replace the deterministic sequence {''x<sub>n</sub>''} with a sequence of random variables {''X<sub>n</sub>''}, and replace the standard notion of convergence of real numbers “→” with one of the types of [[convergence of random variables]].
 
This theorem was first proved by [[Henry Mann]] and [[Abraham Wald]] in 1943,<ref>{{cite journal | doi = 10.1214/aoms/1177731415 | last1 = Mann |first1=H. B. | last2=Wald |first2=A. | year = 1943 | title = On Stochastic Limit and Order Relationships | journal = [[Annals of Mathematical Statistics]] | volume = 14 | issue = 3 | pages = 217–226 | jstor = 2235800 | doi-access = free }}</ref> and it is therefore sometimes called the '''Mann–Wald theorem'''.<ref>{{cite book | last = Amemiya | first = Takeshi | author-link = Takeshi Amemiya | year = 1985 | title = Advanced Econometrics | publisher = Harvard University Press | location = Cambridge, MA | isbn = 0-674-00560-0 | url = https://books.google.com/books?id=0bzGQE14CwEC&pg=pA88 |page=88 }}</ref> Meanwhile, [[Denis Sargan]] refers to it as the '''general transformation theorem'''.<ref>{{cite book |first=Denis |last=Sargan |title=Lectures on Advanced Econometric Theory |location=Oxford |publisher=Basil Blackwell |year=1988 |isbn=0-631-14956-2 |pages=4–8 }}</ref>
 
==Statement==
Let {''X<sub>n</sub>''}, ''X'' be [[random element]]s defined on a [[metric space]] ''S''. Suppose a function {{nowrap|''g'': ''S''→''S′''}} (where ''S′'' is another metric space) has the set of [[Discontinuity (mathematics)|discontinuity points]] ''D<sub>g</sub>'' such that {{nowrap|1=Pr[''X'' ∈ ''D<sub>g</sub>''] = 0}}. Then<ref>{{cite book | last = Billingsley | first = Patrick | author-link = Patrick Billingsley | title = Convergence of Probability Measures | year = 1969 | publisher = John Wiley & Sons | isbn = 0-471-07242-7|page=31 (Corollary 1) }}</ref><ref>{{cite book | last = van der Vaart | first = A. W. | title = Asymptotic Statistics | year = 1998 | publisher = Cambridge University Press | location = New York | isbn = 0-521-49603-9 | url =https://books.google.com/books?id=UEuQEM5RjWgC&pg=PA7 |page=7 (Theorem 2.3) }}</ref><ref>{{harvnb|Billingsley|1999|page=21, Theorem 2.7}}</ref>
 
: <math>
\begin{align}
X_n \ \xrightarrow{\text{d}}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{\text{d}}\ g(X); \\[6pt]
X_n \ \xrightarrow{\text{p}}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{\text{p}}\ g(X); \\[6pt]
X_n \ \xrightarrow{\!\!\text{a.s.}\!\!}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{\!\!\text{a.s.}\!\!}\ g(X).
\end{align}
</math>
where the labels "d", "p", and "a.s." over the arrows denote [[convergence in distribution]], [[convergence in probability]], and [[almost sure convergence]] respectively.
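
The role of the condition Pr[''X'' ∈ ''D<sub>g</sub>''] = 0 can be seen in a small deterministic sketch. It is only an illustration, not part of the cited statement: the step function <code>g</code>, the sequence ''x<sub>n</sub>'' = limit + 1/''n'', and the helper name <code>tail_gap</code> are all assumptions chosen for the example. Constant random variables converge in all three senses, so the failure below comes entirely from the limit sitting on a discontinuity point of <code>g</code>.

```python
def g(x: float) -> float:
    """Step function (hypothetical example): continuous except at x = 0."""
    return 1.0 if x > 0 else 0.0


def tail_gap(limit: float, n_values) -> list:
    """Return |g(x_n) - g(limit)| along the sequence x_n = limit + 1/n."""
    return [abs(g(limit + 1.0 / n) - g(limit)) for n in n_values]


if __name__ == "__main__":
    ns = [1, 10, 100, 10**6]
    # Limit 0 is a discontinuity point of g: the gap never closes.
    print(tail_gap(0.0, ns))
    # Limit 1 is a continuity point of g: the gap is zero throughout.
    print(tail_gap(1.0, ns))
```

With the limit at 0, every term of the sequence maps to 1 while the limit maps to 0, so ''g''(''x<sub>n</sub>'') does not converge to ''g''(''x''). Moving the limit to a continuity point removes the problem.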
 
==Proof==
<div style="NO-align:right"><small>This proof has been adapted from {{harv|van der Vaart|1998|loc=Theorem 2.3}}.</small></div>
 
Spaces ''S'' and ''S′'' are equipped with certain metrics. For simplicity we will denote both of these metrics using the |''x''&nbsp;−&nbsp;''y''| notation, even though the metrics may be arbitrary and not necessarily Euclidean.
 
===Convergence in distribution===
We will need two equivalent characterizations from the [[portmanteau theorem]]: convergence in distribution <math>X_n\xrightarrow{d}X</math> holds if and only if
: <math> \mathbb E f(X_n) \to \mathbb E f(X)</math> for every bounded continuous functional ''f'',
and also if and only if
: <math> \limsup_{n\to\infty}\operatorname{Pr}(X_n \in F) \leq \operatorname{Pr}(X\in F)</math> for every closed set ''F''.

If ''g'' is continuous everywhere, the first characterization suffices: it is enough to prove that <math> \mathbb E f(g(X_n)) \to \mathbb E f(g(X))</math> for every bounded continuous functional ''f'', and since the composition <math>f \circ g</math> is itself a bounded continuous functional, the claim follows at once. The general case, in which ''g'' may be discontinuous on a set ''D<sub>g</sub>'' with Pr[''X''&nbsp;∈&nbsp;''D<sub>g</sub>'']&nbsp;=&nbsp;0, is slightly more technical and uses the second characterization.
Fix an arbitrary closed set ''F''⊂''S′''. Denote by ''g''<sup>−1</sup>(''F'') the pre-image of ''F'' under the mapping ''g'': the set of all points ''x''∈''S'' such that ''g''(''x'')∈''F''. Consider a sequence {''x<sub>k</sub>''} such that ''g''(''x<sub>k</sub>'')∈''F'' and ''x<sub>k</sub>''→''x''. Then this sequence lies in ''g''<sup>−1</sup>(''F''), and its limit point ''x'' belongs to the [[closure (topology)|closure]] of this set, <span style="text-decoration:overline">''g''<sup>−1</sup>(''F'')</span> (by definition of the closure). The point ''x'' may be either:
* a continuity point of ''g'', in which case ''g''(''x<sub>k</sub>'')→''g''(''x''), and hence ''g''(''x'')∈''F'' because ''F'' is a closed set, and therefore in this case ''x'' belongs to the pre-image of ''F'', or
* a discontinuity point of ''g'', so that ''x''∈''D<sub>g</sub>''.
Thus the following relationship holds:
: <math>
\overline{g^{-1}(F)} \ \subset\ g^{-1}(F) \cup D_g\ .
</math>
 
Consider the event {''g''(''X<sub>n</sub>'')∈''F''}. The probability of this event can be estimated as
: <math>
\operatorname{Pr}\big(g(X_n)\in F\big) = \operatorname{Pr}\big(X_n\in g^{-1}(F)\big) \leq \operatorname{Pr}\big(X_n\in \overline{g^{-1}(F)}\big),
</math>
and by the portmanteau theorem the [[limsup]] of the last expression is less than or equal to Pr(''X''∈<span style="text-decoration:overline">''g''<sup>−1</sup>(''F'')</span>). Using the formula we derived in the previous paragraph, this can be written as
: <math>\begin{align}
& \operatorname{Pr}\big(X\in \overline{g^{-1}(F)}\big) \leq
\operatorname{Pr}\big(X\in g^{-1}(F)\cup D_g\big) \leq \\
& \operatorname{Pr}\big(X \in g^{-1}(F)\big) + \operatorname{Pr}(X\in D_g) =
\operatorname{Pr}\big(g(X) \in F\big) + 0.
\end{align}</math>
 
On plugging this back into the original expression, it can be seen that
: <math>
\limsup_{n\to\infty} \operatorname{Pr}\big(g(X_n)\in F\big) \leq \operatorname{Pr}\big(g(X) \in F\big),
</math>
which, by the portmanteau theorem, implies that ''g''(''X<sub>n</sub>'') converges to ''g''(''X'') in distribution.
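
The distributional case can be checked numerically with a rough Monte Carlo sketch. The setup is an assumption made for illustration and does not come from the cited proof: ''Z<sub>n</sub>'' is the standardized mean of ''n'' Uniform(0, 1) draws, which converges in distribution to a standard normal ''Z'' by the central limit theorem, and ''g''(''z'') = ''z''<sup>2</sup> is continuous. The theorem then suggests that Pr(''g''(''Z<sub>n</sub>'') ≤ ''c'') should approach Pr(''Z''<sup>2</sup> ≤ ''c''). The function names are hypothetical.

```python
import math
import random


def standardized_mean(n: int, rng: random.Random) -> float:
    """sqrt(n) * (sample mean - mu) / sigma for n Uniform(0, 1) draws."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    mean = sum(rng.random() for _ in range(n)) / n
    return math.sqrt(n) * (mean - mu) / sigma


def prob_square_at_most(c: float, n: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr(g(Z_n) <= c) with g(z) = z**2."""
    rng = random.Random(seed)
    hits = sum(standardized_mean(n, rng) ** 2 <= c for _ in range(reps))
    return hits / reps


def limit_prob(c: float) -> float:
    """Pr(Z**2 <= c) for standard normal Z, i.e. Pr(|Z| <= sqrt(c))."""
    return math.erf(math.sqrt(c / 2.0))


if __name__ == "__main__":
    print("estimate:", prob_square_at_most(1.0, n=100, reps=5000))
    print("limit:   ", limit_prob(1.0))
```

The estimate carries Monte Carlo error on top of the finite-''n'' approximation, so only rough agreement with the limiting value should be expected.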
 
===Convergence in probability===
Fix an arbitrary ''ε''&nbsp;>&nbsp;0. Then for any ''δ''&nbsp;>&nbsp;0 consider the set ''B<sub>δ</sub>'' defined as
: <math>
B_\delta = \big\{x\in S \mid x\notin D_g:\ \exists y\in S:\ |x-y|<\delta,\, |g(x)-g(y)|>\varepsilon\big\}.
</math>
This is the set of continuity points ''x'' of the function ''g''(·) for which it is possible to find, within the ''δ''-neighborhood of ''x'', a point which maps outside the ''ε''-neighborhood of ''g''(''x''). By definition of continuity, this set shrinks as ''δ'' goes to zero, so that lim<sub>''δ''&nbsp;→&nbsp;0</sub>''B<sub>δ</sub>''&nbsp;=&nbsp;∅.
 
Now suppose that |''g''(''X'')&nbsp;−&nbsp;''g''(''X<sub>n</sub>'')|&nbsp;>&nbsp;''ε''. This implies that at least one of the following is true: either |''X''&nbsp;−&nbsp;''X<sub>n</sub>''|&nbsp;≥&nbsp;''δ'', or ''X''&nbsp;∈&nbsp;''D<sub>g</sub>'', or ''X''&nbsp;∈&nbsp;''B<sub>δ</sub>''. In terms of probabilities this can be written as
: <math>
\operatorname{Pr}\big(\big|g(X_n)-g(X)\big|>\varepsilon\big) \leq
\operatorname{Pr}\big(|X_n-X|\geq\delta\big) + \operatorname{Pr}(X\in B_\delta) + \operatorname{Pr}(X\in D_g).
</math>
 
On the right-hand side, the first term converges to zero as ''n''&nbsp;→&nbsp;∞ for any fixed ''δ'', by the definition of convergence in probability of the sequence {''X<sub>n</sub>''}. The second term converges to zero as ''δ''&nbsp;→&nbsp;0, since the set ''B<sub>δ</sub>'' shrinks to the empty set. The last term is equal to zero by the assumption of the theorem. Letting first ''n''&nbsp;→&nbsp;∞ and then ''δ''&nbsp;→&nbsp;0, we conclude that
: <math>
\lim_{n\to\infty}\operatorname{Pr} \big(\big|g(X_n)-g(X)\big|>\varepsilon\big) = 0,
</math>
which means that ''g''(''X<sub>n</sub>'') converges to ''g''(''X'') in probability.
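
A simulation can make the shrinking probability visible. The following is a minimal sketch under assumptions that are not in the source: ''X<sub>n</sub>'' is the mean of ''n'' Uniform(0, 1) draws, which converges in probability to 0.5 by the law of large numbers, ''g''(''x'') = ''x''<sup>2</sup>, and <code>exceed_prob</code> is a hypothetical helper that estimates Pr(|''g''(''X<sub>n</sub>'')&nbsp;−&nbsp;''g''(0.5)|&nbsp;>&nbsp;''ε'') by repeated sampling.

```python
import random


def g(x: float) -> float:
    """Continuous map used in the illustration (an arbitrary choice)."""
    return x * x


def exceed_prob(n: int, eps: float, reps: int, seed: int = 0) -> float:
    """Estimate Pr(|g(mean of n uniforms) - g(0.5)| > eps) by simulation."""
    rng = random.Random(seed)
    mu = 0.5  # mean of Uniform(0, 1), the limit in probability
    count = 0
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(g(mean) - g(mu)) > eps:
            count += 1
    return count / reps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, exceed_prob(n, eps=0.05, reps=1000))
```

The estimated probabilities should fall towards zero as ''n'' grows, in line with the limit displayed above, although any single run is subject to sampling noise.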
 
===Almost sure convergence===
By definition of the continuity of the function ''g''(·),
: <math>
\lim_{n\to\infty}X_n(\omega) = X(\omega) \quad\Rightarrow\quad \lim_{n\to\infty}g(X_n(\omega)) = g(X(\omega))
</math>
at each point ''X''(''ω'') where ''g''(·) is continuous. Therefore,
: <math>\begin{align}
\operatorname{Pr}\left(\lim_{n\to\infty}g(X_n) = g(X)\right)
&\geq \operatorname{Pr}\left(\lim_{n\to\infty}g(X_n) = g(X),\ X\notin D_g\right) \\
&\geq \operatorname{Pr}\left(\lim_{n\to\infty}X_n = X,\ X\notin D_g\right) = 1,
\end{align}</math>
because the intersection of two almost sure events is almost sure.
 
By definition, we conclude that ''g''(''X<sub>n</sub>'') converges to ''g''(''X'') almost surely.
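
The pathwise nature of this argument can be sketched by following a single simulated sample path. The example is an assumption-laden illustration rather than part of the proof: the running mean of Uniform(0, 1) draws converges almost surely to 0.5 by the strong law of large numbers, and ''g''(''x'') = 1/''x'' is discontinuous only at 0, a point the constant limit 0.5 avoids, so Pr[''X'' ∈ ''D<sub>g</sub>''] = 0. The helper name <code>g_of_running_mean</code> is hypothetical.

```python
import random


def g(x: float) -> float:
    """Reciprocal map: discontinuous (undefined) only at x = 0."""
    return 1.0 / x


def g_of_running_mean(checkpoints, seed: int = 0) -> dict:
    """Follow one sample path and record g(running mean) at the checkpoints."""
    rng = random.Random(seed)
    targets = set(checkpoints)
    total = 0.0
    values = {}
    for k in range(1, max(checkpoints) + 1):
        total += rng.random()
        if k in targets:
            values[k] = g(total / k)
    return values


if __name__ == "__main__":
    # Along this path g(running mean) should settle near g(0.5) = 2.
    for k, value in sorted(g_of_running_mean([100, 10_000, 100_000]).items()):
        print(k, value)
```

A different seed gives a different path, but almost every path is expected to show the same settling behaviour.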
 
==See also==
* [[Slutsky's theorem]]
* [[Portmanteau theorem]]
* [[Pushforward measure]]
 
==References==
===Literature===
{{refbegin}}
* {{cite book
| last = Amemiya
| first = Takeshi
| year = 1985
| title = Advanced Econometrics
| publisher = Harvard University Press
| location = Cambridge, MA
| isbn = 0-674-00560-0
| lccn = HB139.A54 1985
}}
* {{cite book
| last = Billingsley
| first = Patrick
| title = Convergence of Probability Measures
| year = 1969
| publisher = John Wiley & Sons
| isbn = 0-471-07242-7
}}
* {{cite book
| last = Billingsley
| first = Patrick
| title = Convergence of Probability Measures
| year = 1999
| publisher = John Wiley & Sons
| edition = 2nd
| isbn = 0-471-19745-9
}}
* {{cite journal
| doi = 10.1214/aoms/1177731415
| author = Mann, H.B.
|author2=Wald, A.
| year = 1943
| title = On stochastic limit and order relationships
| journal = The Annals of Mathematical Statistics
| volume = 14
| issue = 3
| pages = 217–226
| jstor = 2235800
| ref = CITEREFMannWald1943
}}
* {{cite book
| last = Van der Vaart
| first = A. W.
| title = Asymptotic statistics
| year = 1998
| publisher = Cambridge University Press
| location = New York
| isbn = 978-0-521-49603-2
| lccn = QA276 .V22 1998
| ref = CITEREFvan_der_Vaart1998
}}
{{refend}}
 
===Notes===
{{reflist}}
 
[[Category:Theorems in probability theory]]
[[Category:Theorems in statistics]]
[[Category:Articles containing proofs]]