Mean squared prediction error: Difference between revisions

Revision as of 16:57, 30 May 2018 by Loraof (→Computation of MSPE over out-of-sample data: hatnote to main). Latest revision as of 14:41, 15 November 2024 by Revirvlkodlaku (Added short description).
(18 intermediate revisions by 9 users not shown)
{{Short description|Statistics concept}}

In [[statistics]] the '''mean squared prediction error''' ('''MSPE'''), also known as '''mean squared error of the predictions''', of a [[smoothing]], [[curve fitting]], or [[regression (statistics)|regression]] procedure is the [[expected value]] of the [[Square (algebra)|squared]] '''prediction errors''' ('''PE'''), the [[squared deviation|square difference]] between the fitted values implied by the predictive function <math>\widehat{g}</math> and the values of the (unobservable) [[true value]] ''g''. It is an inverse measure of the '''''explanatory power''''' of <math>\widehat{g},</math> and can be used in the process of [[cross-validation (statistics)|cross-validation]] of an estimated model. Knowledge of ''g'' would be required in order to calculate the MSPE exactly; in practice, MSPE is estimated.<ref>{{cite book |first1=Robert S. |last1=Pindyck |authorlink=Robert Pindyck |first2=Daniel L. |last2=Rubinfeld |authorlink2=Daniel L. Rubinfeld |title=Econometric Models & Economic Forecasts |location=New York |publisher=McGraw-Hill |edition=3rd |year=1991 |isbn=0-07-050098-3 |chapter=Forecasting with Time-Series Models |pages=[https://archive.org/details/econometricmodel00pind/page/516 516–535] |chapter-url=https://archive.org/details/econometricmodel00pind/page/516 }}</ref>
==Formulation==

If the smoothing or fitting procedure has [[projection matrix]] (i.e., hat matrix) ''L'', which maps the vector of observed values <math>y</math> to the vector of [[predicted value]]s <math>\hat{y}=Ly,</math> then PE and MSPE are formulated as:

:<math>\operatorname{PE}_i=g(x_i)-\widehat{g}(x_i),</math>

:<math>\operatorname{MSPE}=\operatorname{E}\left[\operatorname{PE}_i^2\right]=\sum_{i=1}^n \operatorname{PE}_i^2/n.</math>

The MSPE can be decomposed into two terms: the squared [[bias (statistics)|bias]] (mean error) of the fitted values and the [[variance]] of the fitted values:

:<math>\operatorname{MSPE}=\operatorname{ME}^2 + \operatorname{VAR},</math>

:<math>\operatorname{ME}=\operatorname{E}\left[ \widehat{g}(x_i) - g(x_i)\right]</math>

:<math>\operatorname{VAR}=\operatorname{E}\left[\left(\widehat{g}(x_i) - \operatorname{E}\left[\widehat{g}(x_i)\right]\right)^2\right].</math>

The quantity {{math|SSPE{{=}}''n''MSPE}} is called the '''sum squared prediction error'''. The '''root mean squared prediction error''' is the square root of MSPE: {{math|RMSPE{{=}}{{sqrt|MSPE}}}}.
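The quantities in the formulation above can be illustrated with a small simulation. The following is a minimal sketch, not a reference implementation: it assumes a known true function ''g'' (a sine curve), Gaussian noise, and a cubic polynomial least-squares fit as the linear smoother, all of which are hypothetical choices made only so that the MSPE can be computed exactly. Expectations are approximated by averaging over repeated simulated samples at fixed design points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: g is known here only because the data are simulated.
n, reps, sigma = 50, 2000, 0.5
x = np.linspace(0.0, 1.0, n)
g = np.sin(2.0 * np.pi * x)            # assumed true values g(x_i)

# Hat matrix L of a cubic polynomial least-squares fit, so that y_hat = L @ y.
X = np.vander(x, 4)                    # columns: x^3, x^2, x, 1
L = X @ np.linalg.pinv(X)

# Each row of y is one simulated sample y = g + noise at the same design points.
y = g + sigma * rng.standard_normal((reps, n))
g_hat = y @ L.T                        # row r holds the fitted values L @ y_r

# Prediction errors and their mean square, averaged over replications per point.
pe = g - g_hat
mspe_i = np.mean(pe**2, axis=0)

# Mean error (bias) and variance of the fitted values at each x_i.
me_i = np.mean(g_hat - g, axis=0)
var_i = np.var(g_hat, axis=0)          # population variance (ddof=0)

# Average over the design points.
mspe = mspe_i.mean()
decomposition = np.mean(me_i**2) + np.mean(var_i)

sspe = n * mspe                        # sum squared prediction error
rmspe = np.sqrt(mspe)                  # root mean squared prediction error

print(f"MSPE = {mspe:.4f}, ME^2 + VAR = {decomposition:.4f}, RMSPE = {rmspe:.4f}")
```

Because the variance is computed with <code>ddof=0</code>, the simulated MSPE and the sum of squared bias and variance agree up to floating-point rounding. With real data ''g'' is not available, so this calculation is only possible in a simulation of this kind.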
==Computation of MSPE over out-of-sample data==
{{Further|Cross-validation (statistics)}}

The mean squared prediction error can be computed exactly in two contexts. First, with a [[sample (statistics)|data sample]] of length ''n'', the [[data analyst]] may run the [[regression analysis|regression]] over only ''q'' of the data points (with ''q'' < ''n''), holding back the other ''n – q'' data points with the specific purpose of using them to compute the estimated model’s MSPE out of sample (i.e., not using data that were used in the model estimation process). Since the regression process is tailored to the ''q'' in-sample points, the in-sample MSPE will normally be smaller than the out-of-sample one computed over the ''n – q'' held-back points. If the MSPE increases only slightly out of sample compared to in sample, the model is viewed favorably. If two models are to be compared, the one with the lower MSPE over the ''n – q'' out-of-sample data points is viewed more favorably, regardless of the models’ relative in-sample performances. The out-of-sample MSPE in this context is exact for the out-of-sample data points that it was computed over, but is merely an estimate of the model’s MSPE for the mostly unobserved population from which the data were drawn.

==Estimation of MSPE over the population==
{{disputed|section|date=May 2018|reason=this needs to be checked against a source}}

When the model has been estimated over all available data with none held back, the MSPE of the model over the entire [[statistical population|population]] of mostly unobserved data can be estimated as follows.
:<math>n\cdot\operatorname{MSPE}(L)=g^{\text{T}}(I-L)^{\text{T}}(I-L)g+\sigma^2\operatorname{tr}\left[L^{\text{T}} L\right].</math>

Using in-sample data values, the first term on the right side is equivalent to

:<math>\sum_{i=1}^n\left(\operatorname{E}\left[g(x_i)-\widehat{g}(x_i)\right]\right)^2 =\operatorname{E}\left[\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2\right]-\sigma^2\operatorname{tr}\left[\left(I-L\right)^{\text{T}}\left(I-L\right)\right].</math>

Thus, since <math>\operatorname{tr}\left[\left(I-L\right)^{\text{T}}\left(I-L\right)\right]=n-2\operatorname{tr}\left[L\right]+\operatorname{tr}\left[L^{\text{T}} L\right],</math> substituting into the first equation gives

:<math>n\cdot\operatorname{MSPE}(L)=\operatorname{E}\left[\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2\right]-\sigma^2\left(n-2\operatorname{tr}\left[L\right]\right).</math>

[[Colin Mallows]] advocated this method in the construction of his model selection statistic [[Mallows's Cp|''C<sub>p</sub>'']], which is a normalized version of the estimated MSPE:

:<math>C_p=\frac{\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2}{\widehat{\sigma}^2}-n+2p,</math>

where ''p'' is the number of estimated parameters and <math>\widehat{\sigma}^2</math> is computed from the version of the model that includes all possible regressors. That concludes this proof.

==See also==
* [[Akaike information criterion]]
* [[Bias-variance tradeoff]]
* [[Mean squared error]]
* [[Errors and residuals in statistics]]
* [[Law of total variance]]
* [[Mallows's Cp|Mallows's ''C<sub>p</sub>'']]
* [[Model selection]]

== References ==
{{reflist}}

{{Machine learning evaluation metrics}}

{{DEFAULTSORT:Mean Squared Prediction Error}}