Each of these components has been the subject of extensive study; a summary is provided below.
 
===Localized subsets of data: bandwidth===
 
The bandwidth <math>h</math> controls the resolution of the local regression estimate. If <math>h</math> is too small, the estimate may show high-resolution features that represent noise in the data, rather than any real structure in the mean function. Conversely, if <math>h</math> is too large, the estimate will only show low-resolution features, and important structure may be lost. This is the ''bias-variance tradeoff'': when <math>h</math> is too small the estimate has large variance, and when <math>h</math> is too large it has large bias.
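 
The effect of the bandwidth can be illustrated with a small simulation. The following sketch (illustrative only: the function <code>local_linear</code>, the Gaussian kernel, and the simulated data are assumptions, not taken from any particular implementation) evaluates a local linear fit on a grid with a very small and a very large bandwidth:
<syntaxhighlight lang="python">
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear fit at x0 using a Gaussian kernel with bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # local design matrix
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]                                  # intercept = estimate at x0

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(0, 10, 100)

fit_small = [local_linear(x0, x, y, h=0.1) for x0 in grid]  # wiggly: high variance
fit_large = [local_linear(x0, x, y, h=5.0) for x0 in grid]  # oversmoothed: high bias
</syntaxhighlight>
Plotting <code>fit_small</code> and <code>fit_large</code> against <code>grid</code> shows the two failure modes: the small-bandwidth fit chases the noise, while the large-bandwidth fit flattens out the underlying sine curve.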
 
Careful choice of bandwidth is therefore crucial when applying local regression. Mathematical methods for bandwidth selection require, firstly, formal criteria to assess the performance of an estimate. One such criterion is prediction error: if a new observation is made at <math>\tilde x</math>, how well does the estimate <math>\hat\mu(\tilde x)</math> predict the new response <math>\tilde Y</math>?
 
Performance is often assessed using a squared-error loss function. The mean squared prediction error is
<math display="block">
\begin{align}
E \left( \tilde Y - \hat\mu(\tilde x) \right )^2
&= E \left ( \tilde Y - \mu(\tilde x) + \mu(\tilde x) - \hat\mu(\tilde x) \right )^2 \\
&= E \left (\tilde Y - \mu(\tilde x) \right )^2
+ E \left ( \mu(\tilde x)-\hat\mu(\tilde x) \right )^2.
\end{align}
</math>
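The cross term from expanding the square vanishes, because the new observation is independent of the data used to construct <math>\hat\mu</math> and <math>E \left( \tilde Y - \mu(\tilde x) \right ) = 0</math>:
<math display="block">
2\, E \left( \left( \tilde Y - \mu(\tilde x) \right ) \left( \mu(\tilde x) - \hat\mu(\tilde x) \right ) \right )
= 2\, E \left( \tilde Y - \mu(\tilde x) \right ) E \left( \mu(\tilde x) - \hat\mu(\tilde x) \right ) = 0.
</math>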
The first term, <math>E \left (\tilde Y - \mu(\tilde x) \right )^2</math>, is the random variation of the new observation; it does not depend on the local regression estimate at all. The second term, <math> E \left ( \mu(\tilde x)-\hat\mu(\tilde x) \right )^2</math>, is the mean squared estimation error. This relation shows that, for squared error loss, minimizing prediction error and estimation error are equivalent problems.
 
In global bandwidth selection, these measures can be integrated over the <math>x</math> space ("mean integrated squared error", often used in theoretical work), or averaged over the actual <math>x_i</math> (more useful for practical implementations). Some standard techniques from model selection can be readily adapted to local regression:
# [[Cross-validation (statistics)|Cross-validation]], which estimates the mean squared prediction error.
# [[Mallows's Cp|Mallows' ''C<sub>p</sub>'']] and the [[Akaike information criterion]], which estimate the mean squared estimation error.
# Other methods which attempt to estimate the bias and variance components of the estimation error directly.
Any of these criteria can be minimized to produce an automatic bandwidth selector. Cleveland and Devlin<ref name="clevedev" /> prefer a graphical method (the ''M''-plot) to visually display the bias-variance trade-off and guide bandwidth choice.
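 
As an illustration, an automatic selector based on leave-one-out cross-validation can be sketched as follows (this reuses the illustrative <code>local_linear</code> function and simulated data from the earlier sketch; the grid of candidate bandwidths is arbitrary):
<syntaxhighlight lang="python">
import numpy as np

def loo_cv_score(x, y, h):
    """Leave-one-out estimate of mean squared prediction error for bandwidth h."""
    n = len(x)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i                     # drop observation i
        pred = local_linear(x[i], x[mask], y[mask], h)
        errors.append((y[i] - pred) ** 2)
    return np.mean(errors)

bandwidths = np.linspace(0.2, 3.0, 15)               # candidate global bandwidths
scores = [loo_cv_score(x, y, h) for h in bandwidths]
h_best = bandwidths[int(np.argmin(scores))]          # cross-validation selector
</syntaxhighlight>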
 
The subsets of data used for each weighted least squares fit in LOESS are determined by a nearest neighbors algorithm. A user-specified input to the procedure called the "bandwidth" or "smoothing parameter" determines how much of the data is used to fit each local polynomial. The smoothing parameter, <math>\alpha</math>, is the fraction of the total number ''n'' of data points that are used in each local fit. The subset of data used in each weighted least squares fit thus comprises the <math>n\alpha</math> points (rounded to the next largest integer) whose explanatory variables' values are closest to the point at which the response is being estimated.<ref name="NIST">NIST, [http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm "LOESS (aka LOWESS)"], section 4.1.4.4, ''NIST/SEMATECH e-Handbook of Statistical Methods,'' (accessed 14 April 2017)</ref>
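 
The nearest-neighbour weighting can be sketched as follows (illustrative only: <code>loess_weights</code> is not part of any particular implementation, and the tricube weight function shown is the one conventionally used in LOESS, although other smooth weight functions could be substituted). For a fitting point <code>x0</code>, the weights are nonzero only on the <math>\lceil n\alpha \rceil</math> points closest to <code>x0</code>:
<syntaxhighlight lang="python">
import math
import numpy as np

def loess_weights(x0, x, alpha):
    """Tricube weights on the ceil(n*alpha) nearest neighbours of x0."""
    n = len(x)
    q = math.ceil(n * alpha)           # size of the local subset
    dist = np.abs(x - x0)              # distances in the explanatory variable
    d_q = np.sort(dist)[q - 1]         # distance to the q-th nearest point
    u = np.clip(dist / d_q, 0.0, 1.0)  # scaled distances; equal to 1 outside the subset
    return (1.0 - u ** 3) ** 3         # tricube kernel, zero outside the subset
</syntaxhighlight>
These weights would then enter the weighted least squares fit of the local polynomial at <code>x0</code>.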