Continuous mapping theorem

{{Short description|Probability theorem}}
{{Distinguish|text=the [[contraction mapping theorem]]}}
In [[probability theory]], the '''continuous mapping theorem''' states that continuous functions [[Continuous function#Heine definition of continuity|preserve limits]] even if their arguments are sequences of random variables. A continuous function, in [[Continuous function#Heine definition of continuity|Heine's definition]], is such a function that maps convergent sequences into convergent sequences: if ''x<sub>n</sub>'' → ''x'' then ''g''(''x<sub>n</sub>'') → ''g''(''x''). The ''continuous mapping theorem'' states that this will also be true if we replace the deterministic sequence {''x<sub>n</sub>''} with a sequence of random variables {''X<sub>n</sub>''}, and replace the standard notion of convergence of real numbers “→” with one of the types of [[convergence of random variables]].
 
This theorem was first proved by [[Henry Mann]] and [[Abraham Wald]] in 1943,<ref>{{cite journal | doi = 10.1214/aoms/1177731415 | last1 = Mann |first1=H. B. | last2=Wald |first2=A. | year = 1943 | title = On Stochastic Limit and Order Relationships | journal = [[Annals of Mathematical Statistics]] | volume = 14 | issue = 3 | pages = 217–226 | jstor = 2235800 | doi-access = free }}</ref> and it is therefore sometimes called the '''Mann–Wald theorem'''.<ref>{{cite book | last = Amemiya | first = Takeshi | author-link = Takeshi Amemiya | year = 1985 | title = Advanced Econometrics | publisher = Harvard University Press | ___location = Cambridge, MA | isbn = 0-674-00560-0 | url = https://books.google.com/books?id=0bzGQE14CwEC&pg=pA88 |page=88 }}</ref> Meanwhile, [[Denis Sargan]] refers to it as the '''general transformation theorem'''.<ref>{{cite book |first=Denis |last=Sargan |title=Lectures on Advanced Econometric Theory |___location=Oxford |publisher=Basil Blackwell |year=1988 |isbn=0-631-14956-2 |pages=4–8 }}</ref>
 
==Statement==
Let {''X<sub>n</sub>''}, ''X'' be [[random element]]s defined on a [[metric space]] ''S''. Suppose a function {{nowrap|''g'': ''S''→''S′''}} (where ''S′'' is another metric space) has the set of [[Discontinuity (mathematics)|discontinuity points]] ''D<sub>g</sub>'' such that {{nowrap|1=Pr[''X'' ∈ ''D<sub>g</sub>''] = 0}}. Then<ref>{{cite book | last = Billingsley | first = Patrick | author-link = Patrick Billingsley | title = Convergence of Probability Measures | year = 1969 | publisher = John Wiley & Sons | isbn = 0-471-07242-7 | page = 31 (Corollary 1) }}</ref><ref>{{cite book | last = van der Vaart | first = A. W. | title = Asymptotic Statistics | year = 1998 | publisher = Cambridge University Press | ___location = New York | isbn = 0-521-49603-9 | url = https://books.google.com/books?id=UEuQEM5RjWgC&pg=PA7 | page = 7 (Theorem 2.3) }}</ref>
: <math>
\begin{align}
X_n \ \xrightarrow\text{d}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow\text{d}\ g(X); \\[6pt]
X_n \ \xrightarrow\text{p}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow\text{p}\ g(X); \\[6pt]
X_n \ \xrightarrow{\!\!\text{a.s.}\!\!}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{\!\!\text{a.s.}\!\!}\ g(X).
\end{align}
</math>
where the superscripts "d", "p", and "a.s." denote [[convergence in distribution]], [[convergence in probability]], and [[almost sure convergence]] respectively.
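
The first statement can be checked numerically. The sketch below is illustrative only: the particular distributions and the function ''g'' are arbitrary choices, not part of the theorem. By the [[central limit theorem]], the standardized mean ''X<sub>n</sub>'' of ''n'' uniform variables converges in distribution to a standard normal ''Z'', so for the continuous map {{nowrap|1=''g''(''x'') = ''x''²}} the theorem gives ''g''(''X<sub>n</sub>'') → ''Z''² in distribution, and the limiting value {{nowrap|Pr[''Z''² ≤ 1] ≈ 0.6827}} can be compared with empirical frequencies.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def sample_xn(n, size):
    # Standardized mean of n Uniform(0,1) draws; by the CLT, X_n -> N(0,1) in distribution.
    u = rng.uniform(size=(size, n))
    return (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)

g = np.square  # a continuous function; the theorem gives g(X_n) -> Z^2, a chi-squared(1) variable

for n in (2, 10, 1000):
    gx = g(sample_xn(n, 100_000))
    # Empirical Pr[g(X_n) <= 1]; the limit is Pr[Z^2 <= 1] = Pr[-1 <= Z <= 1], about 0.6827.
    print(n, (gx <= 1.0).mean())
</syntaxhighlight>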
 
==Proof==
<div style="NO-align:right"><small>This proof has been adopted from {{harv|van der Vaart|1998|loc=Theorem 2.3}}</small></div>
 
The spaces ''S'' and ''S′'' are equipped with certain metrics. For simplicity, we will denote both of these metrics using the |''x''&nbsp;−&nbsp;''y''| notation, even though the metrics may be arbitrary and not necessarily Euclidean.
 
===Convergence in distribution===
We will need a particular statement from the [[portmanteau theorem]]: that convergence in distribution <math>X_n\xrightarrow{d}X</math> is equivalent to
: <math> \mathbb E f(X_n) \to \mathbb E f(X)</math> for every bounded continuous function ''f''.
 
So it suffices to prove that <math> \mathbb E f(g(X_n)) \to \mathbb E f(g(X))</math> for every bounded continuous function ''f''. For simplicity, assume first that ''g'' is continuous everywhere. Then <math> F = f \circ g</math> is itself a bounded continuous function, and the claim follows from the statement above. The general case is slightly more technical: ''F'' is then only guaranteed to be bounded and continuous at every point outside ''D<sub>g</sub>'', and one has to use the assumption that {{nowrap|1=Pr[''X'' ∈ ''D<sub>g</sub>''] = 0}}.
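
Written out, the continuous case is a single chain of equalities together with one application of the portmanteau statement to the bounded continuous function <math>F</math>:
: <math>\mathbb E f(g(X_n)) = \mathbb E F(X_n) \ \to\ \mathbb E F(X) = \mathbb E f(g(X)).</math>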
 
===Convergence in probability===
Fix an arbitrary ''ε''&nbsp;>&nbsp;0. Then for any ''δ''&nbsp;>&nbsp;0 consider the set ''B<sub>δ</sub>'' defined as
: <math>
B_\delta = \big\{x\in S\setminus D_g \ \big|\ \exists y\in S:\ |x-y|<\delta,\ |g(x)-g(y)|>\varepsilon\big\}.
</math>
This is the set of continuity points ''x'' of the function ''g''(·) for which it is possible to find, within the ''δ''-neighborhood of ''x'', a point which maps outside the ''ε''-neighborhood of ''g''(''x''). By definition of continuity, this set shrinks as ''δ'' goes to zero, so that lim<sub>''δ''&nbsp;→&nbsp;0</sub>''B<sub>δ</sub>''&nbsp;=&nbsp;∅.
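
For a concrete illustration (not part of the original proof), take ''S'' = ''S′'' = '''R''' and ''g''(''x'') = 1/''x'' for ''x'' ≠ 0, with ''g''(0) = 0, so that ''D<sub>g</sub>'' = {0}. Every ''x'' with 0 < |''x''| < min(''δ'', 1/''ε'') lies in ''B<sub>δ</sub>'' (taking ''y'' = ''x''/2 gives |''x'' − ''y''| < ''δ'' and |''g''(''x'') − ''g''(''y'')| = 1/|''x''| > ''ε''), yet any fixed ''x'' ≠ 0 leaves ''B<sub>δ</sub>'' once ''δ'' is small enough; each ''B<sub>δ</sub>'' is therefore nonempty, while their intersection is empty.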
 
Now suppose that |''g''(''X'')&nbsp;−&nbsp;''g''(''X<sub>n</sub>'')|&nbsp;>&nbsp;''ε''. This implies that at least one of the following is true: either |''X''−''X<sub>n</sub>''|&nbsp;≥&nbsp;''δ'', or ''X''&nbsp;∈&nbsp;''D<sub>g</sub>'', or ''X''&nbsp;∈&nbsp;''B<sub>δ</sub>''; indeed, if |''X''−''X<sub>n</sub>''|&nbsp;<&nbsp;''δ'' and ''X''&nbsp;∉&nbsp;''D<sub>g</sub>'', then the point ''y''&nbsp;=&nbsp;''X<sub>n</sub>'' witnesses that ''X''&nbsp;∈&nbsp;''B<sub>δ</sub>''. In terms of probabilities this can be written as
: <math>
\Pr\big(\big|g(X_n)-g(X)\big|>\varepsilon\big) \leq
\Pr\big(|X_n-X|\geq\delta\big) + \Pr(X\in B_\delta) + \Pr(X\in D_g).
</math>
 
On the right-hand side, the first term converges to zero as ''n''&nbsp;→&nbsp;∞ for any fixed ''δ'', by the definition of convergence in probability of the sequence {''X<sub>n</sub>''}. The second term converges to zero as ''δ''&nbsp;→&nbsp;0, since the sets ''B<sub>δ</sub>'' shrink to the empty set and probability measures are continuous from above. The last term is identically zero by the assumption of the theorem. Since ''ε''&nbsp;>&nbsp;0 was arbitrary and the left-hand side does not depend on ''δ'', letting ''n''&nbsp;→&nbsp;∞ and then ''δ''&nbsp;→&nbsp;0 gives
: <math>
\lim_{n\to\infty}\Pr \big(\big|g(X_n)-g(X)\big|>\varepsilon\big) = 0,
</math>
which means that ''g''(''X<sub>n</sub>'') converges to ''g''(''X'') in probability.
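
This statement, too, can be checked by simulation. The sketch below is illustrative only (the distribution of ''X'', the noise, and the function ''g'' are arbitrary choices): it estimates Pr(|''g''(''X<sub>n</sub>'') − ''g''(''X'')| > ''ε'') for ''X<sub>n</sub>'' = ''X'' + ''N''(0, 1/''n''²), a sequence that converges to ''X'' in probability.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
g = np.exp    # any continuous function g will do
eps = 0.1

x = rng.normal(size=100_000)                         # realizations of X
for n in (1, 10, 100):
    xn = x + rng.normal(scale=1.0 / n, size=x.size)  # X_n = X + N(0, 1/n^2), so X_n -> X in probability
    # Monte Carlo estimate of Pr(|g(X_n) - g(X)| > eps); it should tend to zero.
    print(n, np.mean(np.abs(g(xn) - g(x)) > eps))
</syntaxhighlight>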
 
=== Almost sure convergence ===
By definition of the continuity of the function ''g''(·),
: <math>
\lim_{n\to\infty}X_n(\omega) = X(\omega) \quad\Rightarrow\quad \lim_{n\to\infty}g(X_n(\omega)) = g(X(\omega))
</math>
at each point ''X''(''ω'') where ''g''(·) is continuous. Therefore,
: <math>\begin{align}
\Pr\left(\lim_{n\to\infty}g(X_n) = g(X)\right)
&\geq \Pr\left(\lim_{n\to\infty}g(X_n) = g(X),\ X\notin D_g\right) \\
&\geq \Pr\left(\lim_{n\to\infty}X_n = X,\ X\notin D_g\right) = 1,
\end{align}</math>
because the intersection of two almost sure events is almost sure.
 
By definition, we conclude that ''g''(''X<sub>n</sub>'') converges to ''g''(''X'') almost surely.
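
The condition {{nowrap|1=Pr[''X'' ∈ ''D<sub>g</sub>''] = 0}} cannot be dropped. As a standard illustration (not taken from the cited sources), let ''g'' be the indicator function of the interval [''c'',&nbsp;∞) and let ''X<sub>n</sub>'' = ''X'' − 1/''n'', so that ''X<sub>n</sub>'' → ''X'' almost surely. If Pr[''X'' = ''c''] > 0, then on the event {''X'' = ''c''} we have ''g''(''X<sub>n</sub>'') = 0 for all ''n'' while ''g''(''X'') = 1, so ''g''(''X<sub>n</sub>'') does not converge to ''g''(''X'') almost surely.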
 
==See also==
* [[Slutsky's theorem]]
* [[Portmanteau theorem]]
* [[Pushforward measure]]
 
==References==
{{reflist}}
===Literature===
{{refbegin}}
* {{cite book
| last = Amemiya
| first = Takeshi
| year = 1985
| title = Advanced Econometrics
| publisher = Harvard University Press
| ___location = Cambridge, MA
| isbn = 0-674-00560-0
}}
* {{cite book
| last = Billingsley
| first = Patrick
| title = Convergence of Probability Measures
| year = 1969
| publisher = John Wiley & Sons
| isbn = 0-471-07242-7
}}
* {{cite book
| last = Billingsley
| first = Patrick
| title = Convergence of Probability Measures
| year = 1999
| publisher = John Wiley & Sons
| edition = 2nd
| isbn = 0-471-19745-9
}}
* {{cite journal
| last1 = Mann
| first1 = H. B.
| last2 = Wald
| first2 = A.
| year = 1943
| title = On Stochastic Limit and Order Relationships
| journal = The Annals of Mathematical Statistics
| volume = 14
| issue = 3
| pages = 217–226
| doi = 10.1214/aoms/1177731415
| jstor = 2235800
| doi-access = free
}}
* {{cite book
| last = van der Vaart
| first = A. W.
| title = Asymptotic Statistics
| year = 1998
| publisher = Cambridge University Press
| ___location = New York
| isbn = 978-0-521-49603-2
| ref = CITEREFvan_der_Vaart1998
}}
{{refend}}
[[Category:Theorems in probability theory]]
[[Category:Theorems in statistics]]
[[Category:Articles containing proofs]]