Local regression: Difference between revisions

Zaqrfv (talk | contribs)
Localized subsets of data: working on it. TBC.
Zaqrfv (talk | contribs)
Localized subsets of data; Bandwidth: small fixes. still TBC.
Line 77:
===Localized subsets of data; Bandwidth===
 
The bandwidth <math>h</math> controls the resolution of the local regression estimate. If <math>h</math> is too small, the estimate may show high-resolution features that represent noise in the data, rather than any real structure in the mean function. Conversely, if <math>h</math> is too large, the estimate will only show low-resolution features, and important structure may be lost. This is the ''bias-variance tradeoff'': if <math>h</math> is too small, the estimate exhibits large variance, while at large <math>h</math> the estimate exhibits large bias.
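
The tradeoff can be illustrated numerically. The following is a minimal sketch, not a reference implementation: it assumes a local linear fit with tricube weights and a fixed bandwidth, and the simulated sine-curve data, the noise level, the three bandwidth values, and the helper names <code>tricube</code> and <code>local_linear</code> are all chosen here purely for illustration.

```python
import numpy as np

def tricube(u):
    """Tricube weight function; zero for |u| >= 1."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def local_linear(x, y, x0, h):
    """Local linear estimate of the mean function at x0 with bandwidth h."""
    w = tricube((x - x0) / h)
    # Design matrix centred at x0, so the intercept is the fitted value at x0.
    X = np.column_stack([np.ones_like(x), x - x0])
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)  # weighted least squares
    return beta[0]

# Simulated data (an assumption for this sketch): smooth mean plus noise.
rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)
mu = np.sin(2 * np.pi * x)
y = mu + 0.3 * rng.standard_normal(n)

results = {}
for h in (0.02, 0.1, 1.0):
    fit = np.array([local_linear(x, y, x0, h) for x0 in x])
    mse = np.mean((fit - mu) ** 2)          # error against the known mean
    rough = np.mean(np.diff(fit, 2) ** 2)   # wiggliness of the fitted curve
    results[h] = (mse, rough)
    print(f"h={h:<5} mse={mse:.4f} roughness={rough:.2e}")
```

In this setup the smallest bandwidth gives a visibly rough fit dominated by noise, the largest gives a smooth fit that misses the shape of the sine curve, and the intermediate value has the lowest error against the true mean. The comparison is only possible because the true mean function is known in a simulation.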
 
Careful choice of bandwidth is therefore crucial when applying local regression. Mathematical methods for bandwidth selection require, firstly, formal criteria to assess the performance of an estimate. One such criterion is prediction error: if a new observation is made at <math>\tilde x</math>, how well does the estimate <math>\hat\mu(\tilde x)</math> predict the new response <math>\tilde Y</math>?
Line 94:
 
In global bandwidth selection, these measures can be integrated over the <math>x</math> space ("mean integrated squared error", often used in theoretical work), or averaged over the actual <math>x_i</math> (more useful for practical implementations). Some standard techniques from model selection can be readily adapted to local regression:
# [[cross-validation (statistics)|Cross Validation]], which estimates the mean-squared prediction error.
# [[Mallows's Cp|Mallows's <math>C_p</math>]] and [[Akaike's Information Criterion]], which estimate mean squared estimation error.
# Other methods which attempt to estimate the bias and variance components of the estimation error directly.
Any of these criteria can be minimized to produce an automatic bandwidth selector. Cleveland and Devlin<ref name="clevedev" /> prefer a graphical method (the ''M''-plot) to visually display the bias-variance trade-off and guide bandwidth choice.
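
As an illustration of the first technique, the sketch below selects a bandwidth by minimizing a leave-one-out cross-validation score over a grid. It is a hedged example under stated assumptions: the local linear fit with tricube weights, the simulated data, the bandwidth grid, and the helper names <code>local_linear</code> and <code>loocv_score</code> are choices made here, and the brute-force loop is used for clarity rather than efficiency.

```python
import numpy as np

def tricube(u):
    """Tricube weight function; zero for |u| >= 1."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def local_linear(x, y, x0, h):
    """Local linear estimate of the mean function at x0 with bandwidth h."""
    w = tricube((x - x0) / h)
    X = np.column_stack([np.ones_like(x), x - x0])
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta[0]

def loocv_score(x, y, h):
    """Leave-one-out estimate of the mean squared prediction error."""
    n = len(x)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i, then predict it
        errs[i] = y[i] - local_linear(x[keep], y[keep], x[i], h)
    return np.mean(errs ** 2)

# Simulated data (an assumption for this sketch).
rng = np.random.default_rng(1)
n = 150
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

# Candidate bandwidths; the grid itself is an arbitrary illustrative choice.
grid = np.array([0.02, 0.05, 0.08, 0.12, 0.2, 0.35, 0.6, 1.0])
scores = np.array([loocv_score(x, y, h) for h in grid])
h_cv = grid[np.argmin(scores)]

for h, s in zip(grid, scores):
    print(f"h={h:<5} cv={s:.4f}")
print("selected bandwidth:", h_cv)
```

The score estimates prediction error, so it includes the noise variance of a new observation and cannot fall to zero. Cross-validation scores are often nearly flat around their minimum, which is one reason a graphical display of the trade-off can be a useful complement to an automatic selector.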