{{Short description|Statistical estimator converging in probability to a true parameter as sample size increases}}{{broader|Consistency (statistics)}}
[[Image:Consistency of estimator.svg|thumb|250px|{''T''<sub>1</sub>, ''T''<sub>2</sub>, ''T''<sub>3</sub>, …} is a sequence of estimators for parameter ''θ''<sub>0</sub>, the true value of which is 4. This sequence is consistent: the estimators are getting more and more concentrated near the true value ''θ''<sub>0</sub>; at the same time, these estimators are biased. The limiting distribution of the sequence is a degenerate random variable which equals ''θ''<sub>0</sub> with probability 1.]]
In [[statistics]], a '''consistent estimator''' or '''asymptotically consistent estimator''' is an [[estimator]]—a rule for computing estimates of a parameter ''θ''<sub>0</sub>—having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates [[convergence in probability|converges in probability]] to ''θ''<sub>0</sub>. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to ''θ''<sub>0</sub> converges to one.
In practice one constructs an estimator as a function of an available sample of [[sample size|size]] ''n'', and then imagines being able to keep collecting data and expanding the sample ''ad infinitum''. In this way one would obtain a sequence of estimates indexed by ''n'', and consistency is a property of what occurs as the sample size “grows to infinity”. If the sequence of estimates can be mathematically shown to converge in probability to the true value ''θ''<sub>0</sub>, it is called a consistent estimator; otherwise the estimator is said to be '''inconsistent'''.
Consistency as defined here is sometimes referred to as '''weak consistency'''. When we replace convergence in probability with [[almost sure convergence]], then the estimator is said to be '''strongly consistent'''. Consistency is related to [[bias of an estimator|bias]]; see [[#Bias versus consistency|bias versus consistency]] below.
== Definition ==
Formally speaking, an [[estimator]] ''T<sub>n</sub>'' of parameter ''θ'' is said to be '''(weakly) consistent''' if it [[convergence in probability|converges in probability]] to the true value of the parameter:
: <math>
\underset{n\to\infty}{\operatorname{plim}}\;T_n = \theta.
</math>
i.e. if, for all ''ε'' > 0
: <math>
\lim_{n\to\infty}\Pr\big(|T_n-\theta| > \varepsilon\big) = 0.
</math>
An [[estimator]] ''T<sub>n</sub>'' of parameter ''θ'' is said to be '''strongly consistent''', if it '''converges almost surely''' to the true value of the parameter:
: <math>
\Pr\big(\lim_{n\to\infty}T_n = \theta\big) = 1.
</math>
A more rigorous definition takes into account the fact that ''θ'' is actually unknown, and thus, the convergence in probability must take place for every possible value of this parameter. Suppose {{nowrap|{''p<sub>θ</sub>'': ''θ'' ∈ Θ}}} is a family of distributions (the [[parametric model]]), and {{nowrap|1=''X<sup>θ</sup>'' = {''X''<sub>1</sub>, ''X''<sub>2</sub>, … : ''X<sub>i</sub>'' ~ ''p<sub>θ</sub>''}}} is an infinite [[statistical sample|sample]] from the distribution ''p<sub>θ</sub>''. Let { ''T<sub>n</sub>''(''X<sup>θ</sup>'') } be a sequence of estimators for some parameter ''g''(''θ''). Usually, ''T<sub>n</sub>'' will be based on the first ''n'' observations of a sample. Then this sequence {''T<sub>n</sub>''} is said to be (weakly) '''consistent''' if {{sfn|Lehman|Casella|1998|page=332}}
: <math>
\underset{n\to\infty}{\operatorname{plim}}\;T_n(X^{\theta}) = g(\theta),\ \ \text{for all}\ \theta\in\Theta.
</math>
This definition uses ''g''(''θ'') instead of simply ''θ'', because often one is interested in estimating a certain function or a sub-vector of the underlying parameter. In the next example, we estimate the ___location parameter of the model, but not the scale:
== Examples ==
=== Sample mean of a normal random variable ===
Suppose one has a sequence of [[Independence (probability theory)|statistically independent]] observations {''X''<sub>1</sub>, ''X''<sub>2</sub>, …} from a [[normal distribution|normal ''N''(''μ'', ''σ''<sup>2</sup>)]] distribution. To estimate ''μ'' based on the first ''n'' observations, one can use the [[sample mean]]: ''T''<sub>''n''</sub>&nbsp;= (''X''<sub>1</sub> + … + ''X''<sub>''n''</sub>)/''n''. This defines a sequence of estimators, indexed by the sample size ''n''.
From the properties of the normal distribution, we know the [[sampling distribution]] of this statistic: ''T''<sub>''n''</sub> is itself normally distributed, with mean ''μ'' and variance ''σ''<sup>2</sup>/''n''. Equivalently, <math style="vertical-align:-.3em">\scriptstyle (T_n-\mu)/(\sigma/\sqrt{n})</math> has a standard normal distribution:
: <math>
\Pr\!\left[\,|T_n-\mu|\geq\varepsilon\,\right] = \Pr\!\left[\frac{\sqrt{n}\,|T_n-\mu|}{\sigma} \geq \frac{\sqrt{n}\,\varepsilon}{\sigma}\right] = 2\left(1-\Phi\left(\frac{\sqrt{n}\,\varepsilon}{\sigma}\right)\right) \to 0
</math>
as ''n'' tends to infinity, for any fixed {{nowrap|''ε'' > 0}}. Therefore, the sequence ''T<sub>n</sub>'' of sample means is consistent for the population mean ''μ'' (recalling that <math>\Phi</math> is the [[cumulative distribution function]] of the standard normal distribution).
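This concentration can also be checked numerically. The following short simulation is only an illustrative sketch (the values of ''μ'', ''σ'', ''ε'' and the sample sizes are arbitrary choices, not taken from the text): it draws repeated normal samples of increasing size and estimates the probability that the sample mean deviates from ''μ'' by more than a fixed ''ε''.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, eps = 4.0, 2.0, 0.5   # illustrative parameter values
n_rep = 10_000                   # Monte Carlo replications per sample size

for n in [10, 100, 1_000]:
    # draw n_rep independent samples of size n and compute their sample means
    means = rng.normal(mu, sigma, size=(n_rep, n)).mean(axis=1)
    # empirical estimate of Pr(|T_n - mu| > eps)
    p_dev = np.mean(np.abs(means - mu) > eps)
    print(f"n = {n:5d}   Pr(|T_n - mu| > {eps}) ~ {p_dev:.3f}")
</syntaxhighlight>

The estimated probabilities shrink toward zero as ''n'' grows, in line with the expression 2(1&nbsp;−&nbsp;Φ(√''n''&nbsp;''ε''/''σ'')) above.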
== Establishing consistency ==
* In order to demonstrate consistency directly from the definition one can use the inequality {{sfn|Amemiya|1985|loc=equation (3.2.5)}}
:: <math>
\Pr\!\big[h(T_n-\theta)\geq h(\varepsilon)\big] \leq \frac{\operatorname{E}\big[h(T_n-\theta)\big]}{h(\varepsilon)},
</math>
the most common choice for the function ''h'' being either the absolute value (in which case it is known as the [[Markov inequality]]) or the quadratic function (respectively [[Chebyshev's inequality]]); in either case the left-hand side equals Pr(|''T<sub>n</sub>''&nbsp;−&nbsp;''θ''| ≥ ''ε''). A short worked example of the quadratic case is given after this list.
* Another useful result is the [[continuous mapping theorem]]: if ''T<sub>n</sub>'' is consistent for ''θ'' and ''g''(·) is a real-valued function continuous at the point ''θ'', then ''g''(''T<sub>n</sub>'') will be consistent for ''g''(''θ''):{{sfn|Amemiya|1985|loc=Theorem 3.2.6}}
:: <math>
T_n\ \xrightarrow{p}\ \theta\ \quad\Rightarrow\quad g(T_n)\ \xrightarrow{p}\ g(\theta)
</math>
* [[Slutsky's theorem]] can be used to combine several different estimators, or an estimator with a non-random convergent sequence. If <math>T_n \xrightarrow{d} \alpha</math> and <math>S_n \xrightarrow{p} \beta</math>, then
:: <math>\begin{align}
  & T_n + S_n \ \xrightarrow{d}\ \alpha + \beta, \\
  & T_n S_n \ \xrightarrow{d}\ \alpha \beta, \\
  & T_n / S_n \ \xrightarrow{d}\ \alpha/\beta, \text{ provided that } \beta \neq 0.
\end{align}</math>
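As a brief worked example of the quadratic (Chebyshev) case referred to above, consider the sample mean <math>\bar X_n</math> of [[iid]] observations with mean ''μ'' and finite variance ''σ''<sup>2</sup> (a standard argument, not specific to any of the sources cited in this section):
:: <math>
\Pr\!\big[\,|\bar X_n-\mu| \geq \varepsilon\,\big] \;\leq\; \frac{\operatorname{E}\big[(\bar X_n-\mu)^2\big]}{\varepsilon^2} \;=\; \frac{\sigma^2}{n\,\varepsilon^2} \;\to\; 0,
</math>
so the sample mean is a consistent estimator of ''μ'' whenever the variance is finite, without any normality assumption.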
== Bias versus consistency ==
=== Unbiased but not consistent ===
An estimator can be [[biased estimator|unbiased]] but not consistent. For example, for an [[iid]] sample {''x''{{su|b=1}},..., ''x''{{su|b=''n''}}} one can use ''T''{{su|b=''n''}}(''X'') = ''x''{{su|b=''n''}}, the last observation, as the estimator of the mean E[''X'']. Note that here the sampling distribution of ''T''{{su|b=''n''}} is the same as the underlying distribution (for any ''n'', as it ignores all points but the last), so E[''T''{{su|b=''n''}}(''X'')] = E[''X''] and it is unbiased, but it does not converge to any value.
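The point can be illustrated with a small simulation (an illustrative sketch only; the normal distribution and its parameters are arbitrary choices): the last-observation estimator stays centred on the true mean, but its dispersion does not shrink as the sample grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
mu = 4.0        # true mean (illustrative)
n_rep = 10_000  # Monte Carlo replications

for n in [10, 100, 1_000]:
    samples = rng.normal(mu, 1.0, size=(n_rep, n))
    t_last = samples[:, -1]   # T_n(X) = x_n uses only the last observation
    bias = t_last.mean() - mu
    p_dev = np.mean(np.abs(t_last - mu) > 0.5)
    print(f"n = {n:5d}   bias ~ {bias:+.3f}   Pr(|T_n - mu| > 0.5) ~ {p_dev:.3f}")
</syntaxhighlight>

The bias stays near zero for every ''n'' (unbiasedness), while the deviation probability stays near 2(1&nbsp;−&nbsp;Φ(0.5)) ≈ 0.62 instead of vanishing, so the estimator is not consistent.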
However, if a sequence of estimators is unbiased ''and'' converges to a value, then it is consistent, as it must converge to the correct value.
=== Biased but consistent ===
Alternatively, an estimator can be biased but consistent. For example, if the mean is estimated by <math>{1 \over n} \sum x_i + {1 \over n}</math> it is biased, but as <math>n \rightarrow \infty</math>, it approaches the correct value, and so it is consistent.
Important examples include the [[sample variance]] and [[sample standard deviation]]. Without [[Bessel's correction]] (that is, when using the sample size ''n'' instead of the [[Degrees of freedom (statistics)|degrees of freedom]] ''n''&nbsp;−&nbsp;1), these are both negatively biased but consistent estimators. With the correction, the corrected sample variance is unbiased, while the corrected sample standard deviation is still biased, but less so, and both are still consistent: the correction factor converges to 1 as sample size grows.
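The following sketch (illustrative parameters only) compares the two variance estimators: the divisor-''n'' version is biased downward for small samples, but its bias vanishes and its deviation probability shrinks as the sample grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0     # true variance (illustrative)
n_rep = 20_000   # Monte Carlo replications

for n in [5, 50, 500]:
    x = rng.normal(0.0, np.sqrt(sigma2), size=(n_rep, n))
    v_n = x.var(axis=1, ddof=0)    # divisor n: no Bessel's correction, biased
    v_n1 = x.var(axis=1, ddof=1)   # divisor n - 1: Bessel's correction, unbiased
    p_dev = np.mean(np.abs(v_n - sigma2) > 0.5)
    print(f"n = {n:3d}   E[divisor-n] ~ {v_n.mean():.3f}   "
          f"E[divisor-(n-1)] ~ {v_n1.mean():.3f}   Pr(|dev| > 0.5) ~ {p_dev:.3f}")
</syntaxhighlight>

The divisor-''n'' mean rises from about ''σ''<sup>2</sup>(''n''&nbsp;−&nbsp;1)/''n'' toward ''σ''<sup>2</sup>, and the probability of a large deviation falls toward zero, showing an estimator that is biased yet consistent.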
Here is another example. Let <math>T_n</math> be a sequence of estimators for <math>\theta</math>.
:<math>T_n = \begin{cases}
\theta, & \text{with probability } 1 - 1/n \\
n\delta + \theta, & \text{with probability } 1/n
\end{cases}</math>
We can see that <math>T_n \xrightarrow{p} \theta</math>, since the probability of the outlying value <math>n\delta + \theta</math> is only <math>1/n</math>, while <math>\operatorname{E}[T_n] = (1 - 1/n)\,\theta + (1/n)(n\delta + \theta) = \theta + \delta</math>, so the bias is equal to <math>\delta</math> and does not converge to zero.
== See also ==
* [[Regression dilution]]
* [[Statistical hypothesis testing]]
* [[Instrumental variables estimation]]
== Notes ==
{{Reflist}}
== References ==
* {{cite book
| last = Amemiya
| first = Takeshi | authorlink = Takeshi Amemiya | title = Advanced Econometrics
| year = 1985
| publisher = [[Harvard University Press]]
| isbn = 0-674-00560-0
| url-access = registration
| url = https://archive.org/details/advancedeconomet00amem
}}
* {{cite book
| last1 = Lehman | first1 = E. L.
| last2 = Casella | first2 = G.
| title = Theory of Point Estimation
| year = 1998
| publisher = Springer
}}
* {{cite book
| last1 = Newey | first1 = W. K.
| last2 = McFadden | first2 = D. | s2cid = 29436457
| authorlink2 = Daniel McFadden
| year = 1994
| title = Handbook of Econometrics
| volume = 4
|editor= Robert F. Engle |editor2=Daniel L. McFadden
| publisher = Elsevier Science
| isbn = 0-444-88766-0
}}
* {{SpringerEOM| title=Consistent estimator |id=C/c025240 |first=M. S. |last=Nikulin}}
*{{citation | last= Sober | first= E. | author-link= Elliott Sober | title= Likelihood and convergence | journal= [[Philosophy of Science]] | year= 1988 | volume= 55 | issue= 2 | pages= 228–237 | doi= 10.1086/289429}}.
== External links ==
{{DEFAULTSORT:Consistent estimator}}
[[Category:Estimator]]
[[Category:Asymptotic theory (statistics)]]