In practice one constructs an estimator as a function of an available sample of [[sample size|size]] ''n'', and then imagines being able to keep collecting data and expanding the sample ''ad infinitum''. In this way one would obtain a sequence of estimates indexed by ''n'', and consistency is a property of what occurs as the sample size “grows to infinity”. If the sequence of estimates can be mathematically shown to converge in probability to the true value ''θ''<sub>0</sub>, it is called a consistent estimator; otherwise the estimator is said to be '''inconsistent'''.
Consistency as defined here is sometimes referred to as '''weak consistency'''. When we replace convergence in probability with [[almost sure convergence]], then the estimator is said to be '''strongly consistent'''. Consistency is related to [[bias of an estimator|bias]]; see [[#Bias versus consistency|bias versus consistency]].
== Definition ==
Formally speaking, an [[estimator]] ''T<sub>n</sub>'' of parameter ''θ'' is said to be '''weakly consistent''' if it [[convergence in probability|'''converges in probability''']] to the true value of the parameter:{{sfn|Amemiya|1985|loc=Definition 3.4.2}}
: <math>
\underset{n\to\infty}{\operatorname{plim}}\;T_n = \theta.
</math>
An [[estimator]] ''T<sub>n</sub>'' of parameter ''θ'' is said to be '''strongly consistent''' if it '''converges almost surely''' to the true value of the parameter:
: <math>
\Pr\big(\lim_{n\to\infty}T_n = \theta\big) = 1.
</math>
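A minimal simulation sketch in Python can make the weak-consistency definition concrete (the parameter value, tolerance ''ε'', and sample sizes below are arbitrary choices for illustration, not from any source): the estimated probability that the sample mean of ''N''(''θ'', 1) data misses ''θ'' by more than ''ε'' shrinks toward zero as ''n'' grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
theta, eps, reps = 2.0, 0.1, 5_000   # true parameter, tolerance, Monte Carlo replications (arbitrary)

for n in [10, 100, 1000]:
    # draw `reps` independent samples of size n from N(theta, 1); T_n is each sample mean
    T_n = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
    # estimated probability that T_n misses theta by more than eps; should shrink with n
    print(f"n={n:5d}  P(|T_n - theta| > {eps}) ~ {np.mean(np.abs(T_n - theta) > eps):.4f}")
</syntaxhighlight>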
A more rigorous definition takes into account the fact that ''θ'' is actually unknown, and thus the convergence in probability must take place for every possible value of this parameter. Suppose {{nowrap|{''p<sub>θ</sub>'': ''θ'' ∈ Θ}}} is a family of distributions (the [[parametric model]]), and {{nowrap|1=''X<sup>θ</sup>'' = {''X''<sub>1</sub>, ''X''<sub>2</sub>, … : ''X<sub>i</sub>'' ~ ''p<sub>θ</sub>''}}} is an infinite [[statistical sample|sample]] from the distribution ''p<sub>θ</sub>''. Let { ''T<sub>n</sub>''(''X<sup>θ</sup>'') } be a sequence of estimators for some parameter ''g''(''θ''). Usually ''T<sub>n</sub>'' will be based on the first ''n'' observations of a sample. Then this sequence {''T<sub>n</sub>''} is said to be (weakly) '''consistent''' if {{sfn|Lehmann|Casella|1998|page=332}}
: <math>
\underset{n\to\infty}{\operatorname{plim}}\;T_n(X^{\theta}) = g(\theta),\ \ \text{for all}\ \theta\in\Theta.
</math>
This definition uses ''g''(''θ'') instead of simply ''θ'', because often one is interested in estimating a certain function or a sub-vector of the underlying parameter. In the next example, we estimate the ___location parameter of the model, but not the scale:
== Examples ==
=== Sample mean of a normal random variable ===
Suppose one has a sequence of [[Independence (probability theory)|statistically independent]] observations {''X''<sub>1</sub>, ''X''<sub>2</sub>, ...} from a [[Normal distribution|normal ''N''(''μ'', ''σ''<sup>2</sup>)]] distribution. To estimate ''μ'' based on the first ''n'' observations, one can use the [[sample mean]]: ''T<sub>n</sub>'' = (''X''<sub>1</sub> + ... + ''X<sub>n</sub>'')/''n''. This defines a sequence of estimators, indexed by the sample size ''n''.
From the properties of the normal distribution, we know the [[sampling distribution]] of this statistic: ''T''<sub>''n''</sub> is itself normally distributed, with mean ''μ'' and variance ''σ''<sup>2</sup>/''n''. Equivalently, <math style="vertical-align:-.3em">\scriptstyle (T_n-\mu)/(\sigma/\sqrt{n})</math> has a standard normal distribution:
: <math>
\Pr\!\left[\,|T_n-\mu|\ge\varepsilon\,\right] = \Pr\!\left[\frac{\sqrt{n}\,|T_n-\mu|}{\sigma} \ge \frac{\sqrt{n}\,\varepsilon}{\sigma}\right] =
2\left(1-\Phi\left(\frac{\sqrt{n}\,\varepsilon}{\sigma}\right)\right) \to 0
</math>
as ''n'' tends to infinity, for any fixed {{nowrap|''ε'' > 0}}. Therefore, the sequence ''T<sub>n</sub>'' of sample means is consistent for the population mean ''μ'' (recalling that <math>\Phi</math> is the [[cumulative distribution function]] of the standard normal distribution).
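As a quick numerical check (a sketch only; the values of ''μ'', ''σ'', ''ε'', and the sample sizes are arbitrary), one can compare the exact tail probability from the displayed formula with a Monte Carlo estimate:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma, eps, reps = 0.0, 1.0, 0.05, 10_000   # illustrative values only

for n in [25, 100, 900]:
    # exact tail probability 2*(1 - Phi(sqrt(n)*eps/sigma)) from the formula above
    exact = 2 * (1 - norm.cdf(np.sqrt(n) * eps / sigma))
    # Monte Carlo estimate of P(|T_n - mu| >= eps) for the sample mean
    means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(f"n={n:4d}  exact={exact:.4f}  simulated={np.mean(np.abs(means - mu) >= eps):.4f}")
</syntaxhighlight>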
== Establishing consistency ==
* In order to demonstrate consistency directly from the definition one can use the inequality
:: <math>
\Pr\!\big[h(T_n-\theta)\ge\varepsilon\big] \le \frac{\operatorname{E}\big[h(T_n-\theta)\big]}{h(\varepsilon)},
</math>
: the most common choice for function ''h'' being either the absolute value (in which case it is known as [[Markov inequality|Markov's inequality]]) or the quadratic function (respectively [[Chebyshev's inequality]]); a simulation sketch after this list illustrates the quadratic case.
* Another useful result is the [[continuous mapping theorem]]: if ''T<sub>n</sub>'' is consistent for ''θ'' and ''g''(·) is a real-valued function continuous at the point ''θ'', then ''g''(''T<sub>n</sub>'') will be consistent for ''g''(''θ''):{{sfn|Amemiya|1985|loc=Theorem 3.2.6}}
:: <math>
T_n\ \xrightarrow{p}\ \theta\ \quad\Rightarrow\quad g(T_n)\ \xrightarrow{p}\ g(\theta)
</math>
* [[Slutsky's theorem]] can be used to combine several different estimators, or an estimator with a non-random convergent sequence. If {{nowrap|''T<sub>n</sub>'' →<sup>''d''</sup> ''α''}} and {{nowrap|''S<sub>n</sub>'' →<sup>''p''</sup> ''β''}}, then
:: <math>\begin{align}
& T_n + S_n \ \xrightarrow{d}\ \alpha+\beta, \\
& T_n S_n \ \xrightarrow{d}\ \alpha\beta, \\
& T_n / S_n \ \xrightarrow{d}\ \alpha/\beta, \text{ provided that } \beta\neq 0
\end{align}</math>
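As referenced in the first item above, the following sketch (Python; parameter choices are illustrative assumptions, not from any source) applies the quadratic-''h'' (Chebyshev) case of the inequality to the sample mean of ''N''(''μ'', ''σ''<sup>2</sup>) data. Since E[(''T<sub>n</sub>'' − ''μ'')<sup>2</sup>] = ''σ''<sup>2</sup>/''n'', the bound ''σ''<sup>2</sup>/(''nε''<sup>2</sup>) tends to zero, and consistency follows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, eps, reps = 0.0, 1.0, 0.2, 10_000   # illustrative values only

for n in [10, 100, 1000]:
    means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    tail = np.mean(np.abs(means - mu) >= eps)   # estimated P(|T_n - mu| >= eps)
    bound = sigma**2 / (n * eps**2)             # Chebyshev bound: E[(T_n - mu)^2] / eps^2
    print(f"n={n:5d}  tail~{tail:.4f}  <=  bound={bound:.4f}")
</syntaxhighlight>

Note that the bound is vacuous (greater than one) for small ''n''; what matters for consistency is only that it tends to zero as ''n'' grows.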
== Bias versus consistency ==

=== Unbiased but not consistent ===
An estimator can be [[biased estimator|unbiased]] but not consistent. For example, for an [[iid]] sample {''x''{{su|b=1}},..., ''x{{su|b=n}}''} one can use ''T{{su|b=n}}''(''X'') = ''x{{su|b=n}}'' as the estimator of the mean E[''x'']. Note that here the sampling distribution of ''T{{su|b=n}}'' is the same as the underlying distribution (for any ''n'', as it ignores all points but the last), so E[''T{{su|b=n}}''(''X'')] = E[''x''], and it is unbiased, but it does not converge to any value.
However, if a sequence of estimators is unbiased ''and'' converges to a value, then it is consistent, as it must converge to the correct value.
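A brief simulation sketch contrasts the two estimators (the choice of ''N''(1, 1) data and the sample sizes are assumptions purely for demonstration): both are centered on E[''x''], but only the sample mean concentrates as ''n'' grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
reps = 10_000   # Monte Carlo replications; data are N(1, 1) purely for illustration

for n in [10, 100, 1000]:
    samples = rng.normal(1.0, 1.0, size=(reps, n))
    last = samples[:, -1]          # T_n(X) = x_n: unbiased, but its spread never shrinks
    mean = samples.mean(axis=1)    # sample mean: unbiased and consistent
    print(f"n={n:5d}  sd(x_n)={last.std():.3f}  sd(sample mean)={mean.std():.3f}")
</syntaxhighlight>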
== See also ==
* [[Regression dilution]]
* [[Statistical hypothesis testing]]
* [[Instrumental variables estimation]]
== Notes ==
{{reflist}}

== References ==
* {{cite book
 | last = Amemiya
 | first = Takeshi
 | year = 1985
 | title = Advanced Econometrics
 | publisher = [[Harvard University Press]]
 | isbn = 0-674-00560-0
 | url-access = registration
 | url = https://archive.org/details/advancedeconomet00amem
 }}
* {{cite book
 | last1 = Lehmann
 | first1 = E. L.
 | last2 = Casella
 | first2 = G.
 | year = 1998
 | title = Theory of Point Estimation
 | edition = 2nd
 | publisher = Springer
 | isbn = 0-387-98502-6
 }}
* {{cite book
 | last1 = Newey
 | first1 = W. K.
 | last2 = McFadden
 | first2 = D.
 | year = 1994
 | chapter = Large sample estimation and hypothesis testing
 | title = Handbook of Econometrics
 | volume = 4
 | publisher = Elsevier Science
 | isbn = 0-444-88766-0
 }}
* {{SpringerEOM| title=Consistent estimator |id=C/c025240 |first=M. S. |last=Nikulin}}