Local regression: Difference between revisions

LOESS and LOWESS thus build on [[classical statistics|"classical" methods]], such as linear and nonlinear [[least squares regression]]. They address situations in which the classical procedures do not perform well or cannot be effectively applied without undue labor. LOESS combines much of the simplicity of linear least squares regression with the flexibility of [[Non-linear regression|nonlinear regression]]. It does this by fitting simple models to localized subsets of the data to build up a function that describes the deterministic part of the variation in the data, point by point. In fact, one of the chief attractions of this method is that the data analyst is not required to specify a global function of any form to fit a model to the data, only to fit segments of the data.
 
The trade-off for these features is increased computation. Because it is so computationally intensive, LOESS would have been practically impossible to use in the era when least squares regression was being developed. Most other modern methods for process modelling are similar to LOESS in this respect. These methods have been consciously designed to use current computational ability to the fullest possible advantage to achieve goals not easily achieved by traditional approaches.
 
A smooth curve through a set of data points obtained with this statistical technique is called a ''loess curve'', particularly when each smoothed value is given by a weighted quadratic least squares regression over the span of values of the ''y''-axis [[scattergram]] criterion variable. When each smoothed value is given by a weighted linear least squares regression over the span, this is known as a ''lowess curve''. However, some authorities treat ''lowess'' and ''loess'' as synonyms.<ref>Kristen Pavlik, US Environmental Protection Agency, ''[https://19january2021snapshot.epa.gov/sites/static/files/2016-07/documents/loess-lowess.pdf Loess (or Lowess)]'', ''Nutrient Steps'', July 2016.</ref><ref name="NIST"/>
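
The practical difference between the two variants can be seen at a peak. The following is a minimal sketch, assuming NumPy and the tricube weight function commonly used in LOESS. The weight choice and the name <code>local_poly_value</code> are illustrative assumptions. The sketch fits a weighted local polynomial of degree 1 (lowess-style) or degree 2 (loess-style) centred on a single point of a noise-free quadratic.

```python
import numpy as np

def local_poly_value(x, y, x0, h, degree):
    """Weighted least-squares polynomial fit centred at x0; returns the fitted value at x0.

    Tricube weights on |x - x0| / h are an assumption (the usual LOESS/LOWESS choice).
    """
    u = np.abs(x - x0) / h
    w = np.where(u < 1, (1 - u**3) ** 3, 0.0)
    X = np.vander(x - x0, degree + 1, increasing=True)  # columns 1, (x - x0), (x - x0)^2, ...
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0]  # the intercept is the fitted value at x0

x = np.linspace(-1, 1, 41)
y = 1 - x**2                                        # noise-free peak, true value 1 at x = 0
print(local_poly_value(x, y, 0.0, 0.5, degree=1))   # local linear: biased downward at the peak
print(local_poly_value(x, y, 0.0, 0.5, degree=2))   # local quadratic: reproduces 1 (up to rounding)
```

Because the design is symmetric about the peak, the local linear fit reduces to a weighted mean of the nearby values. That mean necessarily lies below the maximum.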
 
==History==
:<math>
\begin{align}
\sum_{j=-h}^h ( a + b j + c j^2 + d j^3) W_j &= \sum_{j=-h}^h W_j Y_j \\
\sum_{j=-h}^h ( aj + b j^2 + c j^3 + d j^4) W_j &= \sum_{j=-h}^h j W_j Y_j \\
\sum_{j=-h}^h ( aj^2 + b j^3 + c j^4 + d j^5) W_j &= \sum_{j=-h}^h j^2 W_j Y_j \\
\sum_{j=-h}^h ( aj^3 + b j^4 + c j^5 + d j^6) W_j &= \sum_{j=-h}^h j^3 W_j Y_j
\end{align}
</math>
Solving these equations for the polynomial coefficients yields the graduated value, <math>\hat Y_0 = a</math>.
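
The normal equations above can be solved numerically. The following is a minimal sketch, assuming NumPy. The function name <code>henderson_local_cubic</code> and the example weights are hypothetical. The sketch solves the 4×4 system for <math>a, b, c, d</math> and returns <math>\hat Y_0 = a</math>. Applying it to unit vectors recovers the equivalent set of convolution weights, which connects the local cubic fit to a summation formula.

```python
import numpy as np

def henderson_local_cubic(Y, W):
    """Graduated value a from the local cubic normal equations.

    Y and W hold Y_{-h..h} and W_{-h..h}, centred on the point being graduated.
    """
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    h = (len(Y) - 1) // 2
    j = np.arange(-h, h + 1)
    # Entry (k, m) of the left-hand side is sum_j W_j j^(k+m); right-hand side is sum_j j^k W_j Y_j
    M = np.array([[np.sum(W * j ** (k + m)) for m in range(4)] for k in range(4)])
    r = np.array([np.sum(j**k * W * Y) for k in range(4)])
    a, b, c, d = np.linalg.solve(M, r)
    return a

h = 5
j = np.arange(-h, h + 1)
W = (1 - (j / (h + 1)) ** 2) ** 2                     # hypothetical positive weights
Y = 2 + 0.5 * j - 0.1 * j**2 + 0.03 * j**3            # exactly cubic data
print(henderson_local_cubic(Y, W))                    # cubic is reproduced: about 2.0

# The graduated value is linear in Y, so graduating unit vectors gives the
# equivalent convolution weights (they sum to 1 and are symmetric).
kernel = np.array([henderson_local_cubic(e, W) for e in np.eye(2 * h + 1)])
```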
 
Henderson went further. In preceding years, many 'summation formula' methods of graduation had been developed, which derived graduation rules based on summation formulae (convolution of the series of observations with a chosen set of weights). Two such rules are the 15-point and 21-point rules of [[John Spencer (Actuary)|Spencer]] (1904).<ref>{{citeQ|Q127775139}}</ref> These graduation rules were carefully designed to have a quadratic-reproducing property: if the ungraduated values exactly follow a quadratic formula, then the graduated values equal the ungraduated values. This is an important property: a simple moving average, by contrast, cannot adequately model peaks and troughs in the data. Henderson's insight was to show that ''any'' such graduation rule can be represented as a local cubic (or quadratic) fit for an appropriate choice of weights.
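
The quadratic-reproducing property can be checked directly. The following is a minimal sketch, assuming NumPy and Spencer's 15-point weights as they are usually tabulated, (−3, −6, −5, 3, 21, 46, 67, 74, 67, 46, 21, 3, −5, −6, −3)/320. The sketch applies the rule and a 15-point simple moving average to exactly quadratic data. It compares only interior points, where the full window is available.

```python
import numpy as np

# Spencer's 15-point summation formula, as usually tabulated (weights sum to 1)
SPENCER_15 = np.array([-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]) / 320

def graduate(y, weights):
    """Apply a symmetric graduation rule; returns values for interior points only."""
    return np.convolve(y, weights, mode="valid")

t = np.arange(-20, 21)
y = 5 + 0.3 * t - 0.2 * t**2                  # ungraduated values exactly quadratic
interior = y[7:-7]                             # points with a full 15-point window
spencer = graduate(y, SPENCER_15)
moving_avg = graduate(y, np.ones(15) / 15)
print(np.max(np.abs(spencer - interior)))      # about 0: the quadratic is reproduced
print(np.max(np.abs(moving_avg - interior)))   # about 3.73: the curvature is smoothed away
```

The moving average is exact for straight lines but shifts a quadratic by <math>c \cdot \tfrac{1}{15}\sum_{j=-7}^{7} j^2</math>, where <math>c</math> is the quadratic coefficient. This is the peak-and-trough distortion described above.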
 
Further discussions of the historical work on graduation and local polynomial fitting can be found in [[Frederick Macaulay|Macaulay]] (1931),<ref name="mac1931">{{citeQ|Q134465853}}</ref> [[William S. Cleveland|Cleveland]] and [[Catherine Loader|Loader]] (1995),<ref name="slrpm">{{citeQ|Q132138257}}</ref> and [[Lori Murray|Murray]] and [[David Bellhouse (statistician)|Bellhouse]] (2019).<ref>{{cite Q|Q127772934}}</ref>
 
The [[Savitzky-Golay filter]], introduced by [[Abraham Savitzky]] and [[Marcel J. E. Golay]] (1964),<ref>{{cite Q|Q56769732}}</ref> significantly expanded the method. Like the earlier graduation work, their focus was data with an equally spaced predictor variable, where (excluding boundary effects) local regression can be represented as a [[convolution]]. Savitzky and Golay published extensive sets of convolution coefficients for different orders of polynomial and smoothing window widths.
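
Such convolution coefficients follow from an unweighted local polynomial fit on an equally spaced window. The following is a minimal sketch, assuming NumPy. The name <code>savgol_coefficients</code> is hypothetical. Row 0 of the pseudo-inverse of the local design matrix maps the window values to the fitted value at the centre. For a 5-point quadratic window this gives the familiar weights (−3, 12, 17, 12, −3)/35. SciPy's <code>scipy.signal.savgol_coeffs</code> computes the same kind of coefficients.

```python
import numpy as np

def savgol_coefficients(half_width, degree):
    """Convolution weights giving the smoothed value at the centre of the window."""
    j = np.arange(-half_width, half_width + 1)
    X = np.vander(j, degree + 1, increasing=True)   # columns 1, j, j^2, ...
    # Row 0 of the pseudo-inverse maps the window values to the fitted intercept
    return np.linalg.pinv(X)[0]

coef = savgol_coefficients(2, 2)
print(coef * 35)   # equals (-3, 12, 17, 12, -3) up to rounding
```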
 
Local regression methods started to appear extensively in statistics literature in the 1970s; for example, [[Charles Joel Stone|Charles J. Stone]] (1977),<ref>{{cite Q|Q56533608}}</ref> [[Vladimir Katkovnik]] (1979)<ref>{{citation |first=Vladimir|last=Katkovnik|title=Linear and nonlinear methods of nonparametric regression analysis|journal=Soviet Automatic Control|date=1979|volume=12|issue=5|pages=25–34}}</ref> and [[William S. Cleveland]] (1979).<ref name="cleve79">{{cite Q|Q30052922}}</ref> Katkovnik (1985)<ref name="katbook">{{citeQ|Q132129931}}</ref> is the earliest book devoted primarily to local regression methods.
 
Theoretical work continued to appear throughout the 1990s. Important contributions include [[Jianqing Fan]] and [[Irène Gijbels]] (1992)<ref>{{cite Q|Q132202273}}</ref> studying efficiency properties, and [[David Ruppert]] and [[Matthew P. Wand]] (1994)<ref>{{cite Q|Q132202598}}</ref> developing an asymptotic distribution theory for multivariate local regression.
One question not addressed above is how the bandwidth should depend on the fitting point <math>x</math>. Often a constant bandwidth is used, while LOWESS and LOESS prefer a nearest-neighbor bandwidth, meaning ''h'' is smaller in regions with many data points. Formally, the smoothing parameter, <math>\alpha</math>, is the fraction of the total number ''n'' of data points that are used in each local fit. The subset of data used in each weighted least squares fit thus comprises the <math>n\alpha</math> points (rounded up to the next integer) whose explanatory variables' values are closest to the point at which the response is being estimated.<ref name="NIST">NIST, [http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm "LOESS (aka LOWESS)"], section 4.1.4.4, ''NIST/SEMATECH e-Handbook of Statistical Methods,'' (accessed 14 April 2017)</ref>
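
A nearest-neighbor bandwidth can be computed directly from this definition. The following is a minimal sketch, assuming NumPy and tricube weights. The names <code>nn_bandwidth</code> and <code>loess</code> are hypothetical and do not refer to a particular library. At each fitting point, ''h'' is the distance to the <math>\lceil n\alpha \rceil</math>-th nearest observation, so it shrinks where the data are dense.

```python
import math
import numpy as np

def nn_bandwidth(x, x0, alpha):
    """Distance from x0 to its ceil(n * alpha)-th nearest observation."""
    x = np.asarray(x, dtype=float)
    q = min(len(x), math.ceil(len(x) * alpha))
    return np.sort(np.abs(x - x0))[q - 1]

def loess(x, y, alpha=0.5, degree=1):
    """Fitted values at each x_i using a nearest-neighbour window and tricube weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty(len(x))
    for i, x0 in enumerate(x):
        h = max(nn_bandwidth(x, x0, alpha), 1e-12)        # guard against h = 0 (ties)
        u = np.abs(x - x0) / h
        w = np.where(u < 1, (1 - u**3) ** 3, 0.0)          # tricube weights (assumed kernel)
        X = np.vander(x - x0, degree + 1, increasing=True)  # local polynomial design
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted

# Dense data on [0, 1], sparse data on [1.5, 10]: the bandwidth adapts to the density
x = np.concatenate([np.linspace(0, 1, 40), np.linspace(1.5, 10, 20)])
print(nn_bandwidth(x, 0.5, 0.25), nn_bandwidth(x, 8.0, 0.25))
```

With tricube weights, the observation at distance exactly ''h'' receives zero weight. Each fit is therefore effectively determined by the points strictly inside the window.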
 
More sophisticated methods attempt to choose the bandwidth ''adaptively''; that is, choose a bandwidth at each fitting point <math>x</math> by applying criteria such as cross-validation locally within the smoothing window. An early example of this is [[Jerome H. Friedman]]'s<ref>{{citation|first=Jerome H.|last=Friedman|title=A Variable Span Smoother|date=October 1984|publisher=Technical report, Laboratory for Computational Statistics LCS 5; SLAC PUB-3466|doi=10.2171/1447470|doi-broken-date=1 July 2025 |url=http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-3477.pdf}}</ref> "supersmoother", which uses cross-validation to choose among local linear fits at different bandwidths.
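
The idea of choosing the span locally by cross-validation can be illustrated as follows. This is a simplified sketch, assuming NumPy and tricube-weighted local linear fits. It is not Friedman's supersmoother algorithm, which differs in its details. The candidate spans, the <code>local_frac</code> window and the function names are hypothetical choices. For each candidate span, the sketch computes leave-one-out residuals at every observation. At each fitting point it then picks the span with the smallest mean squared leave-one-out residual among nearby observations.

```python
import math
import numpy as np

def local_linear_at(x, y, x0, q, exclude=None):
    """Tricube-weighted local linear fit at x0 using the q nearest observations.

    If `exclude` is an index, that observation is left out (leave-one-out).
    """
    keep = np.ones(len(x), dtype=bool)
    if exclude is not None:
        keep[exclude] = False
    xs, ys = x[keep], y[keep]
    d = np.abs(xs - x0)
    h = max(np.sort(d)[min(q, len(xs)) - 1], 1e-12)
    u = d / h
    w = np.where(u < 1, (1 - u**3) ** 3, 0.0)
    X = np.column_stack([np.ones_like(xs), xs - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], ys * sw, rcond=None)
    return coef[0]

def local_cv_span_smoother(x, y, spans=(0.1, 0.3, 0.6), local_frac=0.2):
    """At each x_i, use the span whose leave-one-out errors are smallest nearby."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    qs = [max(4, math.ceil(s * n)) for s in spans]
    # Squared leave-one-out residuals: one row per candidate span
    cv = np.array([[(y[i] - local_linear_at(x, y, x[i], q, exclude=i)) ** 2
                    for i in range(n)] for q in qs])
    k = max(3, math.ceil(local_frac * n))
    fitted = np.empty(n)
    chosen = np.empty(n)
    for i in range(n):
        near = np.argsort(np.abs(x - x[i]))[:k]          # local window for the criterion
        best = int(np.argmin(cv[:, near].mean(axis=1)))
        chosen[i] = spans[best]
        fitted[i] = local_linear_at(x, y, x[i], qs[best])
    return fitted, chosen

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.where(x < 0.5, 0.0, np.sin(20 * x)) + rng.normal(0, 0.1, 100)
fit, span = local_cv_span_smoother(x, y)
```

The returned <code>span</code> array can be inspected to see where narrower windows were preferred.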
 
===Degree of local polynomials===
</math>
 
An asymptotic theory for local likelihood estimation is developed in J. Fan, [[Nancy E. Heckman]] and M. P. Wand (1995);<ref>{{cite Q|Q132508409}}</ref> the book Loader (1999)<ref name="loabook">{{citeQ|Q59410587}}</ref> discusses many more applications of local likelihood.
 
====Robust local regression====
\right )
</math>
where <math>\rho(\cdot)</math> is a robustness function and <math>s</math> is a scale parameter. Discussion of the merits of different choices of robustness function is best left to the [[robust regression]] literature. The scale parameter <math>s</math> must also be estimated. References for local M-estimation include Katkovnik (1985)<ref name="katbook">{{citeQ|Q132129931}}</ref> and [[Alexandre Tsybakov]] (1986).<ref>{{citation |first=Alexandre B.|last=Tsybakov|title=Robust reconstruction of functions by the local-approximation method.|journal=Problems of Information Transmission|date=1986|volume=22|pages=133–146}}</ref>
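
A local M-estimate can be computed by iteratively reweighted least squares. The following is a minimal sketch, assuming NumPy. It uses a local linear model, tricube kernel weights, Huber's <math>\rho</math> with tuning constant 1.345, and a scale <math>s</math> estimated from the median absolute deviation of the initial local least squares residuals. All of these are illustrative choices, not ones prescribed by the text. The function names are hypothetical.

```python
import numpy as np

def huber_weight(r, c=1.345):
    """IRLS weight psi(r)/r for Huber's rho: 1 inside [-c, c], c/|r| outside."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def local_m_estimate(x, y, x0, h, n_iter=20):
    """Local linear M-estimate of the regression function at x0.

    Minimises sum_i W((x_i - x0)/h) * rho((y_i - b0 - b1 (x_i - x0)) / s)
    by iteratively reweighted least squares and returns b0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = np.abs(x - x0) / h
    kern = np.where(u < 1, (1 - u**3) ** 3, 0.0)      # tricube kernel weights (assumed)
    X = np.column_stack([np.ones_like(x), x - x0])

    def wls(w):
        sw = np.sqrt(w)
        return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

    beta = wls(kern)                                   # start from local least squares
    r = (y - X @ beta)[kern > 0]
    s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)  # MAD scale (assumed)
    for _ in range(n_iter):
        beta = wls(kern * huber_weight((y - X @ beta) / s))
    return beta[0]

x = np.linspace(0, 1, 21)
y = 1 + 2 * x
y[11] += 10.0                                          # one gross outlier near x0 = 0.5
print(local_m_estimate(x, y, 0.5, 0.3, n_iter=0))      # plain local least squares
print(local_m_estimate(x, y, 0.5, 0.3))                # M-estimate, much closer to 2
```

Huber's function bounds the influence of the outlier rather than removing it entirely. A redescending choice of <math>\rho</math> would reject the outlier completely.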
 
The robustness iterations in LOWESS and LOESS correspond to the robustness function defined by
Finally, as discussed above, LOESS is a computationally intensive method (with the exception of evenly spaced data, where the regression can then be phrased as a non-causal [[finite impulse response]] filter). LOESS is also prone to the effects of outliers in the data set, like other least squares methods. There is an iterative, [[robust statistics|robust]] version of LOESS (Cleveland, 1979) that can be used to reduce LOESS's sensitivity to [[outliers]], but too many extreme outliers can still overcome even the robust method.
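
The iterative robust procedure can be sketched as follows, assuming NumPy. The sketch uses the bisquare robustness weights <math>\delta_i = B(e_i / 6M)</math> with <math>B(u) = (1-u^2)^2</math> for <math>|u|<1</math> (0 otherwise), as commonly described for Cleveland's (1979) LOWESS. Here <math>M</math> is the median absolute residual. Local linear fits, tricube weights and the function names are illustrative assumptions. Each pass multiplies the kernel weights by the robustness weights and refits all points.

```python
import math
import numpy as np

def _local_linear(x, y, x0, q, robust_w):
    """Local linear fit at x0: tricube weights (q nearest points) times robustness weights."""
    d = np.abs(x - x0)
    h = max(np.sort(d)[q - 1], 1e-12)
    u = d / h
    w = np.where(u < 1, (1 - u**3) ** 3, 0.0) * robust_w
    X = np.column_stack([np.ones_like(x), x - x0])
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0][0]

def robust_lowess(x, y, alpha=0.5, iterations=2):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    q = min(n, math.ceil(alpha * n))
    delta = np.ones(n)                                 # robustness weights start at 1

    def smooth(d):
        return np.array([_local_linear(x, y, xi, q, d) for xi in x])

    fitted = smooth(delta)
    for _ in range(iterations):
        e = y - fitted
        m = np.median(np.abs(e))
        if m == 0:                                     # residuals mostly exact: stop
            break
        u = e / (6 * m)
        delta = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)   # bisquare weights
        fitted = smooth(delta)
    return fitted

x = np.linspace(0, 10, 50)
truth = 1 + 0.5 * x
y = truth.copy()
y[[10, 25, 40]] += 8.0                                 # three gross outliers
plain = robust_lowess(x, y, alpha=0.3, iterations=0)   # ordinary (non-robust) fit
robust = robust_lowess(x, y, alpha=0.3, iterations=2)
```

In this example the gross outliers receive zero robustness weight after the first pass. The refit then recovers the underlying line.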
 
==Further reading==
 
Books substantially covering local regression and its extensions:
* Macaulay (1931) "The Smoothing of Time Series",<ref name="mac1931">{{citeQ|Q134465853}}</ref> discusses graduation methods, with several chapters related to local polynomial fitting.
* Katkovnik (1985) "Nonparametric Identification and Smoothing of Data" (in Russian).<ref name="katbook">{{citeQ|Q132129931}}</ref>
* Fan and Gijbels (1996) "Local Polynomial Modelling and Its Applications".<ref>{{citeQ|Q134377589}}</ref>
 
Book chapters and reviews:
* Cleveland and Loader (1995) "Smoothing by Local Regression: Principles and Methods"<ref name="slrpm">{{citeQ|Q132138257}}</ref>
* "Local Regression and Likelihood", Chapter 13 of ''Observed Brain Dynamics'', Mitra and Bokil (2007)<ref>{{citeQ|Q57575432}}</ref>
* [[Rafael Irizarry (scientist)|Rafael Irizarry]], "Local Regression". Chapter 3 of "Applied Nonparametric and Modern Statistics".<ref>{{cite web|last=Irizarry|first=Rafael|title=Applied Nonparametric and Modern Statistics|url=https://rafalab.dfci.harvard.edu/pages/754/|access-date=2025-05-16}}</ref>
 
==See also==