Local regression: Difference between revisions

Zaqrfv (talk | contribs)
m fixing a citation
Zaqrfv (talk | contribs)
Localized subsets of data; Bandwidth: adding paragraph about adaptive smoothers.
Line 99:
Any of these criteria can be minimized to produce an automatic bandwidth selector. Cleveland and Devlin<ref name="clevedev" /> prefer a graphical method (the ''M''-plot) to visually display the bias-variance trade-off and guide bandwidth choice.
 
One question not addressed above is, how should the bandwidth depend upon the fitting point <math>x</math>? Often a constant bandwidth is used, while LOWESS and LOESS prefer a nearest-neighbor bandwidth, meaning ''h'' is smaller in regions with many data points. Formally, the smoothing parameter, <math>\alpha</math>, is the fraction of the total number ''n'' of data points that are used in each local fit. The subset of data used in each weighted least squares fit thus comprises the <math>n\alpha</math> points (rounded to the next largest integer) whose explanatory variables' values are closest to the point at which the response is being estimated.<ref name="NIST">NIST, [http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm "LOESS (aka LOWESS)"], section 4.1.4.4, ''NIST/SEMATECH e-Handbook of Statistical Methods,'' (accessed 14 April 2017)</ref>
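
The nearest-neighbor rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name <code>nearest_neighbor_window</code> and the sample data are hypothetical, and the local bandwidth ''h'' is assumed here to be the distance from the fitting point to the farthest selected neighbor.

```python
import math

def nearest_neighbor_window(xs, x0, alpha):
    """Return the indices of the ceil(n * alpha) points nearest to x0,
    together with the implied local bandwidth h."""
    n = len(xs)
    q = min(n, math.ceil(n * alpha))           # number of points in the local fit
    order = sorted(range(n), key=lambda i: abs(xs[i] - x0))
    idx = order[:q]                            # the q nearest neighbors of x0
    h = abs(xs[idx[-1]] - x0)                  # assumption: h = distance to farthest neighbor
    return sorted(idx), h

# Hypothetical data: dense near 0, sparse further out.
xs = [0.0, 0.1, 0.2, 0.3, 2.0, 4.0, 6.0, 8.0]
dense_idx, dense_h = nearest_neighbor_window(xs, 0.15, 0.5)
sparse_idx, sparse_h = nearest_neighbor_window(xs, 5.0, 0.5)
```

With the same <math>\alpha</math>, the window around 0.15 (where the points are dense) is much narrower than the window around 5.0, which is the sense in which ''h'' is smaller in regions with many data points.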
 
More sophisticated methods attempt to choose the bandwidth ''adaptively''; that is, choose a bandwidth at each fitting point <math>x</math> by applying criteria such as cross-validation locally within the smoothing window. An early example of this is [[Jerome H. Friedman]]'s<ref>{{cite|first=Jerome H.|last=Friedman|title=A Variable Span Smoother|date=October 1984|publisher=Technical report, Laboratory for Computational Statistics LCS 5; SLAC PUB-3477|doi=10.2172/1447470|url=http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-3477.pdf}}</ref> "supersmoother", which uses cross-validation to choose among local linear fits at different bandwidths.
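
The idea of local cross-validation can be illustrated with a simplified sketch. This is not Friedman's supersmoother algorithm. It only assumes unweighted local linear fits over the ''k'' nearest neighbors, leave-one-out residuals, and a fixed averaging window, and the names <code>local_linear_loo</code>, <code>adaptive_spans</code>, <code>candidate_ks</code> and <code>window</code> are all hypothetical.

```python
def local_linear_loo(xs, ys, i, k):
    """Predict ys[i] from a straight line fitted to the k nearest
    neighbors of xs[i], with point i itself left out."""
    others = sorted((j for j in range(len(xs)) if j != i),
                    key=lambda j: abs(xs[j] - xs[i]))[:k]
    mx = sum(xs[j] for j in others) / k
    my = sum(ys[j] for j in others) / k
    sxx = sum((xs[j] - mx) ** 2 for j in others)
    sxy = sum((xs[j] - mx) * (ys[j] - my) for j in others)
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (xs[i] - mx)

def adaptive_spans(xs, ys, candidate_ks, window):
    """For each point, pick the candidate span whose squared leave-one-out
    residuals, averaged over the `window` nearest points, are smallest."""
    n = len(xs)
    loo = {k: [(ys[i] - local_linear_loo(xs, ys, i, k)) ** 2 for i in range(n)]
           for k in candidate_ks}
    chosen = []
    for i in range(n):
        near = sorted(range(n), key=lambda j: abs(xs[j] - xs[i]))[:window]
        score = {k: sum(loo[k][j] for j in near) / len(near) for k in candidate_ks}
        chosen.append(min(candidate_ks, key=lambda k: score[k]))
    return chosen

# Hypothetical noise-free quadratic: bias dominates, so small spans are preferred.
xs = [float(i) for i in range(20)]
ys = [x ** 2 for x in xs]
candidate_ks = [2, 4, 6]
chosen = adaptive_spans(xs, ys, candidate_ks, window=5)
```

On noisy data the locally averaged criterion would instead favor larger spans wherever the underlying curve is nearly straight, so the selected span can differ from one fitting point to the next.
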

Since a polynomial of degree <math>\lambda</math> requires at least <math>\lambda+1</math> points for a fit, the smoothing parameter <math>\alpha</math> must be between <math>\left(\lambda+1\right)/n</math> and 1, where <math>\lambda</math> denotes the degree of the local polynomial.
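
As a small worked example of these bounds, the following sketch computes the admissible range of <math>\alpha</math> and the number of points used per local fit. The helper names <code>alpha_bounds</code> and <code>points_per_fit</code> are hypothetical, as are the sample values of ''n'' and the degree.

```python
import math

def alpha_bounds(n, degree):
    """Smallest and largest admissible smoothing parameter for n data
    points and a local polynomial of the given degree."""
    return (degree + 1) / n, 1.0

def points_per_fit(n, alpha, degree):
    """Number of points in each local fit, rejecting alpha values that
    fall outside the admissible range."""
    lo, hi = alpha_bounds(n, degree)
    if not lo <= alpha <= hi:
        raise ValueError(f"alpha must lie in [{lo}, {hi}]")
    return min(n, math.ceil(n * alpha))

# Hypothetical case: local quadratic fits (degree 2) on 50 data points.
lo, hi = alpha_bounds(50, 2)
q_mid = points_per_fit(50, 0.5, 2)
q_min = points_per_fit(50, lo, 2)
```

Here the lower bound is 3/50 = 0.06, at which each local quadratic is fitted to exactly three points.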
 
<math>\alpha</math> is called the smoothing parameter because it controls the flexibility of the LOESS regression function. Large values of <math>\alpha</math> produce the smoothest functions that wiggle the least in response to fluctuations in the data. The smaller <math>\alpha</math> is, the closer the regression function will conform to the data. Using too small a value of the smoothing parameter is not desirable, however, since the regression function will eventually start to capture the random error in the data.
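
The effect of <math>\alpha</math> on smoothness can be seen in a small experiment. The sketch below assumes local linear fits with tricube weights, a weight function commonly paired with LOESS but not specified in this section. The function names, the synthetic data and the roughness measure (sum of squared second differences) are illustrative assumptions.

```python
import math

def tricube(u):
    """Tricube weight for a scaled distance u >= 0 (zero at u >= 1)."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def loess_point(xs, ys, x0, alpha):
    """Weighted local linear fit at x0 using the ceil(n * alpha) nearest points."""
    n = len(xs)
    q = min(n, math.ceil(n * alpha))
    near = sorted(range(n), key=lambda j: abs(xs[j] - x0))[:q]
    h = abs(xs[near[-1]] - x0)                 # distance to the farthest neighbor
    w = [tricube(abs(xs[j] - x0) / h) if h > 0 else 1.0 for j in near]
    sw = sum(w)
    mx = sum(wj * xs[j] for wj, j in zip(w, near)) / sw
    my = sum(wj * ys[j] for wj, j in zip(w, near)) / sw
    sxx = sum(wj * (xs[j] - mx) ** 2 for wj, j in zip(w, near))
    if sxx == 0:                               # only one point carries weight
        return my
    sxy = sum(wj * (xs[j] - mx) * (ys[j] - my) for wj, j in zip(w, near))
    return my + (sxy / sxx) * (x0 - mx)

def roughness(fit):
    """Sum of squared second differences: larger means more wiggly."""
    return sum((fit[i - 1] - 2 * fit[i] + fit[i + 1]) ** 2
               for i in range(1, len(fit) - 1))

# Hypothetical data: a straight line plus an alternating +1/-1 "error".
xs = [float(i) for i in range(20)]
ys = [x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs)]

wiggly = [loess_point(xs, ys, x, 0.15) for x in xs]   # 3 points per fit
smooth = [loess_point(xs, ys, x, 1.0) for x in xs]    # all 20 points per fit
rough_wiggly, rough_smooth = roughness(wiggly), roughness(smooth)
```

With <math>\alpha = 0.15</math> the fit passes through every observation, reproducing the alternating error exactly, whereas <math>\alpha = 1</math> gives a far less wiggly curve close to the underlying line.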
 
===Degree of local polynomials===