Consistent estimator

In practice one constructs an estimator as a function of an available sample of [[sample size|size]] ''n'', and then imagines being able to keep collecting data and expanding the sample ''ad infinitum''. In this way one would obtain a sequence of estimates indexed by ''n'', and consistency is a property of what occurs as the sample size “grows to infinity”. If the sequence of estimates can be mathematically shown to converge in probability to the true value ''θ''<sub>0</sub>, it is called a consistent estimator; otherwise the estimator is said to be '''inconsistent'''.
 
Consistency as defined here is sometimes referred to as '''weak consistency'''. When we replace convergence in probability with [[almost sure convergence]], then the estimator is said to be '''strongly consistent'''. Consistency is related to [[bias of an estimator|bias]]; see [[#Bias versus consistency|bias versus consistency]].
 
== Definition ==
Formally speaking, an [[estimator]] ''T<sub>n</sub>'' of parameter ''θ'' is said to be '''weakly consistent''' if it [[convergence in probability|'''converges in probability''']] to the true value of the parameter:{{sfn|Amemiya|1985|loc=Definition 3.4.2}}
: <math>
\underset{n\to\infty}{\operatorname{plim}}\;T_n = \theta.
</math>
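That is, for every ''ε'' > 0,
: <math>
\lim_{n\to\infty} \Pr\big(\left|T_n - \theta\right| > \varepsilon\big) = 0.
</math>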
 
An [[estimator]] ''T<sub>n</sub>'' of parameter ''θ'' is said to be '''strongly consistent''' if it '''converges almost surely''' to the true value of the parameter:
: <math>
\Pr\big(\lim_{n\to\infty}T_n = \theta\big) = 1.
</math>
 
A more rigorous definition takes into account the fact that ''θ'' is actually unknown, and thus the convergence in probability must take place for every possible value of this parameter. Suppose {{nowrap|{''p<sub>θ</sub>'': ''θ'' ∈ Θ}}} is a family of distributions (the [[parametric model]]), and {{nowrap|1=''X<sup>θ</sup>'' = {''X''<sub>1</sub>, ''X''<sub>2</sub>, … : ''X<sub>i</sub>'' ~ ''p<sub>θ</sub>''}}} is an infinite [[statistical sample|sample]] from the distribution ''p<sub>θ</sub>''. Let { ''T<sub>n</sub>''(''X<sup>θ</sup>'') } be a sequence of estimators for some parameter ''g''(''θ''). Usually ''T<sub>n</sub>'' will be based on the first ''n'' observations of a sample. Then this sequence {''T<sub>n</sub>''} is said to be (weakly) '''consistent''' if {{sfn|Lehman|Casella|1998|page=332}}
: <math>
\underset{n\to\infty}{\operatorname{plim}}\;T_n(X^{\theta}) = g(\theta),\ \ \text{for all}\ \theta\in\Theta.
</math>
 
This definition uses ''g''(''θ'') instead of simply ''θ'', because often one is interested in estimating a certain function or a sub-vector of the underlying parameter. In the next example, we estimate the ___location parameter of the model, but not the scale:
 
== Examples ==
Suppose one has a sequence of independent observations {''X''<sub>1</sub>, ''X''<sub>2</sub>, …} from a [[normal distribution|normal]] ''N''(''μ'', ''σ''<sup>2</sup>) distribution, and ''μ'' is estimated by the sample mean ''T<sub>n</sub>'' of the first ''n'' observations. Since ''T<sub>n</sub>'' is itself normally distributed, with mean ''μ'' and variance ''σ''<sup>2</sup>/''n'', one has
: <math>
\Pr\big(\left|T_n-\mu\right|\ge\varepsilon\big) = 2\left(1-\Phi\left(\frac{\sqrt{n}\,\varepsilon}{\sigma}\right)\right) \to 0
</math>
as ''n'' tends to infinity, for any fixed {{nowrap|''ε'' > 0}}. Therefore, the sequence ''T<sub>n</sub>'' of sample means is consistent for the population mean&nbsp;''μ'' (recalling that <math>\Phi</math> is the [[Cumulative distribution function|cumulative distribution]] of the standard normal distribution).
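
The convergence of this tail probability to zero can be checked numerically. The following is a minimal simulation sketch (not part of the article; the values of ''μ'', ''σ'', ''ε'', the seed, and the replication count are arbitrary illustrative choices), comparing the empirical tail probability with the formula above.

<syntaxhighlight lang="python">
import math

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, eps = 5.0, 2.0, 0.1  # illustrative parameter choices
reps = 1_000                    # Monte Carlo replications per sample size

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for n in [10, 100, 1_000, 10_000]:
    # `reps` independent sample means, each over n i.i.d. N(mu, sigma^2) draws
    t_n = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    empirical = np.mean(np.abs(t_n - mu) >= eps)   # Pr(|T_n - mu| >= eps)
    theoretical = 2.0 * (1.0 - std_normal_cdf(math.sqrt(n) * eps / sigma))
    print(f"n={n:>6}: empirical={empirical:.4f}  theoretical={theoretical:.4f}")
</syntaxhighlight>

Both columns shrink toward zero as ''n'' grows, which is exactly the consistency of the sample mean.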
 
== Establishing consistency ==
Consistency amounts to convergence in probability, so any theorem, lemma, or inequality that establishes convergence in probability may be used to prove it. For example:

* To demonstrate consistency directly from the definition one can use the inequality (with ''h'' a non-negative function)
:: <math>
\Pr\big(h(T_n-\theta)\ge\varepsilon\big) \le \frac{\operatorname{E}\big[h(T_n-\theta)\big]}{\varepsilon},
</math>
the most common choice for the function ''h'' being either the absolute value (in which case it is known as the [[Markov inequality]]) or the quadratic function (respectively [[Chebyshev's inequality]]).
 
* Another useful result is the [[continuous mapping theorem]]: if ''T<sub>n</sub>'' is consistent for ''θ'' and ''g''(·) is a real-valued function continuous at the point ''θ'', then ''g''(''T<sub>n</sub>'') will be consistent for ''g''(''θ''):{{sfn|Amemiya|1985|loc=Theorem 3.2.6}}
:: <math>
T_n\ \xrightarrow{p}\ \theta\ \quad\Rightarrow\quad g(T_n)\ \xrightarrow{p}\ g(\theta)
</math>
 
* [[Slutsky's theorem]] can be used to combine several different estimators, or an estimator with a non-random convergent sequence; both this and the continuous mapping theorem are illustrated in the simulation sketch after this list. If ''T<sub>n</sub>''&nbsp;→<sup style="position:relative;top:-.2em;left:-1em;">''d''</sup>''α'', and ''S<sub>n</sub>''&nbsp;→<sup style="position:relative;top:-.2em;left:-1em;">''p''</sup>''β'', then {{sfn|Amemiya|1985|loc=Theorem 3.2.7}}
:: <math>\begin{align}
& T_n + S_n \ \xrightarrow{d}\ \alpha+\beta, \\
& T_n S_n \ \xrightarrow{d}\ \alpha\beta, \\
& T_n / S_n \ \xrightarrow{d}\ \alpha/\beta, \text{ provided that } \beta\ne0
\end{align}</math>
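
As an illustration of the last two results, here is a minimal simulation sketch (not from the article; the true values ''μ'' = 3, ''σ'' = 1.5, the seed, and the sample sizes are arbitrary choices). The sample mean and sample variance are consistent for ''μ'' and ''σ''<sup>2</sup>; the continuous mapping theorem then makes the sample standard deviation consistent for ''σ'', and a Slutsky-type combination makes their ratio consistent for ''μ''/''σ''.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 3.0, 1.5  # illustrative true parameters

for n in [100, 10_000, 1_000_000]:
    x = rng.normal(mu, sigma, size=n)
    t_n = x.mean()        # sample mean: consistent for mu
    s2_n = x.var(ddof=1)  # sample variance: consistent for sigma^2
    s_n = np.sqrt(s2_n)   # continuous mapping (sqrt): consistent for sigma
    ratio = t_n / s_n     # Slutsky-type combination: consistent for mu/sigma
    print(f"n={n:>9}: mean={t_n:.4f}  sd={s_n:.4f}  ratio={ratio:.4f}  "
          f"(targets: {mu}, {sigma}, {mu / sigma:.4f})")
</syntaxhighlight>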
 
=== Unbiased but not consistent ===
An estimator can be [[biased estimator|unbiased]] but not consistent. For example, for an [[iid]] sample {''x''{{su|b=1}},..., ''x{{su|b=n}}''} one can use ''T{{su|b=n}}''(''X'') = ''x''{{su|b=n}} as the estimator of the mean E[''X'']. Note that here the sampling distribution of ''T{{su|b=n}}'' is the same as the underlying distribution (for any ''n'', as it ignores all points but the last). So E[''T{{su|b=n}}''(''X'')] = E[''X''] for any ''n'', hence it is unbiased, but it does not converge to any value.
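
The contrast can be seen in a minimal simulation sketch (not from the article; the true mean ''μ'' = 4, the unit variance, the seed, and the replication count are arbitrary choices): the spread of the last observation never shrinks, while the sample mean concentrates.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
mu, reps = 4.0, 1_000  # illustrative true mean and Monte Carlo replications

for n in [10, 100, 10_000]:
    x = rng.normal(mu, 1.0, size=(reps, n))
    last = x[:, -1]          # T_n(X) = x_n: unbiased but ignores n-1 points
    s_mean = x.mean(axis=1)  # sample mean: unbiased and consistent
    print(f"n={n:>6}: last obs: avg={last.mean():.3f}, sd={last.std():.3f} | "
          f"sample mean: avg={s_mean.mean():.3f}, sd={s_mean.std():.3f}")
</syntaxhighlight>

Both estimators average to roughly ''μ'' (unbiasedness), but only the sample mean's standard deviation shrinks toward zero as ''n'' grows.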
 
However, if a sequence of estimators is unbiased ''and'' converges to a value, then it is consistent, as it must converge to the correct value.
== See also ==
* [[Regression dilution]]
* [[Statistical hypothesis testing]]
* [[Instrumental variables estimation]]
 
== Notes ==