{{Short description|Measure of the error of an estimator}}
{{distinguish-redirect|Mean squared deviation|Mean squared displacement}}
In [[statistics]], the '''mean squared error''' ('''MSE''')<ref name=":1">{{Cite web|title=Mean Squared Error (MSE)|url=https://www.probabilitycourse.com/chapter9/9_1_5_mean_squared_error_MSE.php|access-date=2020-09-12|website=www.probabilitycourse.com}}</ref> or '''mean squared deviation''' ('''MSD''') of an [[estimator]] (of a procedure for estimating an unobserved quantity) measures the [[expected value|average]] of the squares of the [[Error (statistics)|errors]]—that is, the average squared difference between the estimated values and the true value.
The MSE is a measure of the quality of an estimator. As it is derived from the square of [[Euclidean distance]], it is always a non-negative value that decreases as the error approaches zero.
==Definition and basic properties==
The MSE either assesses the quality of a ''[[predictor (statistics)|predictor]]'' (i.e., a function mapping arbitrary inputs to a sample of values of some [[random variable]]), or of an ''[[estimator]]'' (i.e., a [[mathematical function]] mapping a [[Sample (statistics)|sample]] of data to an estimate of a [[Statistical parameter|parameter]] of the [[Statistical population|population]] from which the data is sampled). In the context of prediction, understanding the [[prediction interval]] can also be useful as it provides a range within which a future observation will fall, with a certain probability. The definition of an MSE differs according to whether one is describing a predictor or an estimator.
===Predictor===
If a vector of <math>n</math> predictions is generated from a sample of <math>n</math> data points on all variables, and <math>Y</math> is the vector of observed values of the variable being predicted, with <math>\hat{Y}</math> being the predicted values (e.g. as from a [[least-squares fit]]), then the within-sample MSE of the predictor is computed as
:<math>\operatorname{MSE}=\frac{1}{n} \sum_{i=1}^n \left(Y_i-\hat{Y_i}\right)^2.</math>
In other words, the MSE is the ''mean'' <math display="inline">\left(\frac{1}{n} \sum_{i=1}^n \right)</math> of the ''squares of the errors'' <math display="inline">\left(Y_i-\hat{Y_i}\right)^2</math>. This is an easily computable quantity for a particular sample (and hence is sample-dependent).
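A minimal numerical sketch of this within-sample computation (the values below are illustrative only, not drawn from any source cited in this article):

<syntaxhighlight lang="python">
# Within-sample MSE: the mean of the squared differences between
# observed values Y_i and predicted values Y_hat_i.
observed  = [3.0, -0.5, 2.0, 7.0]   # hypothetical Y_i
predicted = [2.5,  0.0, 2.0, 8.0]   # hypothetical Y_hat_i

n = len(observed)
mse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n
print(mse)  # 0.375
</syntaxhighlight>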
In [[Matrix_multiplication|matrix]] notation,
:<math>\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(e_i)^2=\frac{1}{n}\mathbf e^\mathsf T \mathbf e</math>
where <math>e_i</math> is <math>(Y_i-\hat{Y_i})</math> and <math>\mathbf e</math> is an <math>n \times 1</math> column vector of errors.
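The matrix form can be checked with the same illustrative residuals as above (a sketch assuming [[NumPy]] is available):

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical residual vector e = Y - Y_hat.
e = np.array([0.5, -0.5, 0.0, -1.0])

# MSE as (1/n) * e^T e, matching the matrix notation above.
mse = (e @ e) / e.size
print(mse)  # 0.375
</syntaxhighlight>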
The MSE can also be computed on ''q'' data points that were not used in estimating the model, either because they were held back for this purpose, or because these data have been newly obtained. Within this process, known as [[cross-validation (statistics)|cross-validation]], the MSE is often called the test MSE,<ref>{{cite book |first1=Gareth |last1=James |first2=Daniela |last2=Witten |first3=Trevor |last3=Hastie |first4=Robert |last4=Tibshirani |title=An Introduction to Statistical Learning |publisher=Springer |year=2013}}</ref> and is computed as
:<math>\operatorname{MSE} = \frac{1}{q} \sum_{i=n+1}^{n+q} \left(Y_i-\hat{Y_i}\right)^2.</math>
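A sketch of this held-out evaluation (the synthetic data, the linear model, and the NumPy calls are assumptions for illustration):

<syntaxhighlight lang="python">
import numpy as np

# Fit a least-squares line on n training points, then evaluate the
# test MSE on q points that were held back from the fit.
rng = np.random.default_rng(0)
n, q = 20, 5
x_train = rng.uniform(0, 10, n)
x_test = rng.uniform(0, 10, q)
y_train = 2.0 * x_train + 1.0 + rng.normal(0, 1, n)
y_test = 2.0 * x_test + 1.0 + rng.normal(0, 1, q)

slope, intercept = np.polyfit(x_train, y_train, 1)  # least-squares fit
y_pred = slope * x_test + intercept

test_mse = np.mean((y_test - y_pred) ** 2)  # average over the q held-out points
print(test_mse)
</syntaxhighlight>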
===Estimator===
:<math>\operatorname{MSE}(\hat{\theta})=\operatorname{E}_{\theta}\left[(\hat{\theta}-\theta)^2\right].</math>
This definition depends on the unknown parameter, but the MSE is ''a priori'' a property of an estimator. The MSE could be a function of unknown parameters, in which case any estimator of the MSE based on estimates of these parameters would be a function of the data (and thus a random variable).
The MSE can be written as the sum of the [[variance]] of the estimator and the squared [[Bias_of_an_estimator|bias]] of the estimator, providing a useful way to calculate the MSE and implying that in the case of unbiased estimators, the MSE and variance are equivalent.<ref name="wackerly">{{cite book |first1=Dennis |last1=Wackerly |first2=William|last2=Mendenhall |first3=Richard L.|last3=Scheaffer |title=Mathematical Statistics with Applications |publisher=Thomson Higher Education|___location=Belmont, CA, USA |year=2008 |edition=7 |isbn=978-0-495-38508-0}}</ref>
====Proof of variance and bias relationship====
:<math>\begin{align}
\operatorname{MSE}(\hat{\theta})
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\theta\right)^2\right] \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}] + \operatorname{E}_\theta[\hat{\theta}]-\theta\right)^2\right] \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}]\right)^2 + 2\left(\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}]\right)\left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right) + \left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right)^2\right] \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}]\right)^2\right] + \operatorname{E}_\theta\left[2\left(\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}]\right)\left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right)\right] + \operatorname{E}_\theta\left[\left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right)^2\right] \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}]\right)^2\right] + 2\left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right)\operatorname{E}_\theta\left[\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}]\right] + \left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right)^2 && \operatorname{E}_\theta[\hat{\theta}]-\theta = \text{constant} \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}]\right)^2\right] + 2\left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right)\left(\operatorname{E}_\theta[\hat{\theta}] - \operatorname{E}_\theta[\hat{\theta}]\right) + \left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right)^2 && \operatorname{E}_\theta[\hat{\theta}] = \text{constant} \\
&= \operatorname{E}_\theta\left[\left(\hat{\theta}-\operatorname{E}_\theta[\hat{\theta}]\right)^2\right] + \left(\operatorname{E}_\theta[\hat{\theta}]-\theta\right)^2 \\
&= \operatorname{Var}_\theta(\hat{\theta}) + \operatorname{Bias}_\theta(\hat{\theta},\theta)^2
\end{align}</math>
An even shorter proof can be achieved using the well-known formula that for a random variable <math display="inline">X</math>, <math display="inline">\mathbb{E}(X^2) = \operatorname{Var}(X) + (\mathbb{E}(X))^2</math>. By substituting <math display="inline">X</math> with <math display="inline">\hat\theta-\theta</math>, we have
:<math display="block">\begin{aligned} \operatorname{MSE}(\hat{\theta}) &= \mathbb{E}[(\hat\theta-\theta)^2] \\
&= \operatorname{Var}(\hat{\theta} - \theta) + (\mathbb{E}[\hat\theta - \theta])^2 \\
&= \operatorname{Var}(\hat\theta) + \operatorname{Bias}^2(\hat\theta,\theta)
\end{aligned}</math>
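The decomposition can also be verified numerically. The following sketch (illustrative, not drawn from the cited references) uses a deliberately biased estimator of a normal mean, <math>\hat\theta = \overline{X} + 0.5</math>:

<syntaxhighlight lang="python">
import numpy as np

# Monte Carlo check that MSE(theta_hat) = Var(theta_hat) + Bias(theta_hat)^2
# for the deliberately biased estimator theta_hat = sample mean + 0.5.
rng = np.random.default_rng(1)
theta, n, trials = 3.0, 10, 200_000

estimates = rng.normal(theta, 1.0, (trials, n)).mean(axis=1) + 0.5

mse = np.mean((estimates - theta) ** 2)
var = np.var(estimates)
bias_sq = (np.mean(estimates) - theta) ** 2
print(mse, var + bias_sq)  # equal up to floating-point rounding
</syntaxhighlight>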
==In regression==
In regression analysis, "mean squared error", often referred to as [[mean squared prediction error]] or "out-of-sample mean squared error", can also refer to the mean value of the [[squared deviations]] of the predictions from the true values, over an out-of-sample [[test set|test space]], generated by a model estimated over a [[training set|particular sample space]]. This also is a known, computed quantity, and it varies by sample and by out-of-sample test space.
In the context of [[gradient descent]] algorithms, it is common to introduce a factor of <math>1/2</math> to the MSE for ease of computation after taking the derivative. So a value which is technically half the mean of squared errors may be called the MSE.
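Concretely, the factor of <math>1/2</math> cancels the exponent when differentiating a single squared error with respect to the prediction:
:<math>\frac{\partial}{\partial \hat{Y_i}} \left[\frac{1}{2}\left(Y_i-\hat{Y_i}\right)^2\right] = -\left(Y_i-\hat{Y_i}\right),</math>
so the resulting gradient carries no factor of 2.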
==Examples==
===Mean===
Suppose we have a random sample of size <math>n</math> from a population, <math>X_1,\dots,X_n</math>. Suppose the sample units were chosen [[Sampling with replacement|with replacement]]. That is, the <math>n</math> units are selected one at a time, and previously selected units are still eligible for selection for all <math>n</math> draws. The usual estimator for the population mean <math>\mu</math> is the sample average
:<math>\overline{X}=\frac{1}{n}\sum_{i=1}^n X_i </math>
which has an expected value equal to the true mean <math>\mu</math> (so it is unbiased) and a mean squared error of
:<math>\operatorname{MSE}\left(\overline{X}\right)=\operatorname{E}\left[\left(\overline{X}-\mu\right)^2\right]=\left(\frac{\sigma}{\sqrt{n}}\right)^2 = \frac{\sigma^2}{n}</math>
where <math>\sigma^2</math> is the [[Sample variance#Population variance|population variance]].
For a [[Gaussian distribution]], this is the best unbiased estimator (that is, the one with the lowest MSE among all unbiased estimators), but not, say, for a [[Uniform distribution (continuous)|uniform distribution]].
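The result <math>\sigma^2/n</math> can be checked by simulation; the following sketch (hypothetical parameter values, assuming NumPy) draws many samples and averages the squared error of the sample mean:

<syntaxhighlight lang="python">
import numpy as np

# Empirical MSE of the sample mean over many simulated samples;
# the theoretical value is sigma**2 / n = 4 / 25 = 0.16.
rng = np.random.default_rng(2)
mu, sigma, n, trials = 5.0, 2.0, 25, 100_000

sample_means = rng.normal(mu, sigma, (trials, n)).mean(axis=1)
print(np.mean((sample_means - mu) ** 2))  # close to 0.16
</syntaxhighlight>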
===Variance===
==Applications==
*Minimizing MSE is a key criterion in selecting estimators: see [[minimum mean-square error]]. Among unbiased estimators, minimizing the MSE is equivalent to minimizing the variance, and the estimator that does this is the [[minimum variance unbiased estimator]]. However, a biased estimator may have lower MSE; see [[estimator bias]].
==Loss function==
Squared error loss is one of the most widely used [[loss function]]s in statistics, though its widespread use stems more from mathematical convenience than from considerations of actual loss in applications.
===Criticism===
*[[Mean percentage error]]
*[[Mean square quantization error]]
*[[Reduced chi-squared statistic]]
*[[Mean squared displacement]]
*[[Mean squared prediction error]]
*[[Minimum mean square error]]
*[[Overfitting]]
*[[Peak signal-to-noise ratio]]