Local regression: Difference between revisions

==Model definition==
 
Local regression uses a [[data set]] consisting of observations of one or more `independent' or `predictor' variables, and a `dependent' or `response' variable. The data set contains <math>n</math> observations. The observations of the predictor variable are denoted <math>x_1,\ldots,x_n</math>, and the corresponding observations of the response variable by <math>Y_1,\ldots,Y_n</math>.
At each point in the range of the [[data set]] a low-degree [[polynomial]] is fitted to a subset of the data, with [[explanatory variable]] values near the point whose [[response variable|response]] is being estimated. The polynomial is fitted using [[weighted least squares]], giving more weight to points near the point whose response is being estimated and less weight to points further away. The value of the regression function for the point is then obtained by evaluating the local polynomial using the explanatory variable values for that data point. The LOESS fit is complete after regression function values have been computed for each of the <math>n</math> data points. Many of the details of this method, such as the degree of the polynomial model and the weights, are flexible. The range of choices for each part of the method and typical defaults are briefly discussed next.
 
For ease of presentation, the development below assumes a single predictor variable; the extension to multiple predictors (when the <math>x_i</math> are vectors) is conceptually straightforward. A functional relationship between the predictor and response variables is assumed:
:<math>Y_i = \mu(x_i) + \epsilon_i</math>
where <math>\mu(x)</math> is the unknown `smooth' regression function to be estimated, representing the conditional expectation of the response given a value of the predictor variables. In theoretical work, the `smoothness' of this function can be formally characterized by placing bounds on higher-order derivatives. The <math>\epsilon_i</math> represent random errors; for estimation purposes these are assumed to have [[mean]] zero. Stronger assumptions (e.g., [[independence (probability theory)|independence]] and equal [[variance]]) may be made when assessing properties of the estimates.
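As an illustrative sketch (not part of the article's development), data following this model can be simulated; the choice <math>\mu(x) = \sin(2\pi x)</math> and the noise level are arbitrary assumptions made purely for illustration:

```python
import numpy as np

# Simulate data from the model Y_i = mu(x_i) + eps_i.
# mu(x) = sin(2*pi*x) is an arbitrary illustrative smooth function.
rng = np.random.default_rng(0)
n = 100
x = np.sort(rng.uniform(0.0, 1.0, n))   # predictor observations x_1, ..., x_n
mu = np.sin(2 * np.pi * x)              # the unknown smooth regression function
y = mu + rng.normal(0.0, 0.2, n)        # responses with mean-zero errors eps_i
```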
 
Local regression then estimates the function <math>\mu(x)</math>, for one value of <math>x</math> at a time. Since the function is assumed to be smooth, the most informative data points are those whose <math>x_i</math> values are close to <math>x</math>. This is formalized with a bandwidth <math>h</math> and a [[kernel (statistics)|kernel]] or weight function <math>W(\cdot)</math>, with observations assigned weights
: <math>w_i(x) = W\left ( \frac{x_i-x}{h} \right )</math>.
A typical choice of <math>W</math>, used by Cleveland in LOWESS, is <math>W(u) = (1-|u|^3)^3</math> for <math>|u|<1</math>, although any similar function (peaked at <math>u=0</math> and small or 0 for large values of <math>u</math>) can be used. Questions of bandwidth selection and specification (how large should <math>h</math> be, and should it vary depending upon the fitting point <math>x</math>?) are deferred for now.
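A minimal sketch of this weighting scheme, using the tricube kernel above (the function names are illustrative, not from any particular library):

```python
import numpy as np

def tricube(u):
    """Cleveland's tricube kernel: W(u) = (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u**3)**3, 0.0)

def local_weights(xi, x, h):
    """Smoothing weights w_i(x) = W((x_i - x) / h) for bandwidth h."""
    return tricube((xi - x) / h)
```

For example, with bandwidth <math>h=1</math>, an observation at the fitting point receives weight 1, one at distance 0.5 receives <math>(1 - 0.5^3)^3 = 0.875^3</math>, and one at distance 2 receives weight 0.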
 
A local model (usually a low-order polynomial with degree <math>p \le 3</math>), expressed as
:<math>\mu(x_i) \approx \beta_0 + \beta_1(x_i-x) + \ldots + \beta_p(x_i-x)^p</math>
is then fitted by [[weighted least squares]]: choose regression coefficients
<math>(\hat\beta_0,\ldots,\hat\beta_p)</math> to minimize
:<math>
\sum_{i=1}^n w_i(x) \left ( Y_i - \beta_0 - \beta_1(x_i-x) - \ldots - \beta_p(x_i-x)^p \right )^2.
</math>
The local regression estimate of <math>\mu(x)</math> is then simply the intercept estimate:
:<math>\hat\mu(x) = \hat\beta_0</math>
while the remaining coefficients <math>\hat\beta_j</math> can be interpreted
(up to a factor of <math>j!</math>) as estimates of the derivatives <math>\mu^{(j)}(x)</math>.
 
It is to be emphasized that the above procedure produces the estimate <math>\hat\mu(x)</math> for a single value of <math>x</math>. When considering a new value of <math>x</math>, a new set of weights <math>w_i(x)</math> must be computed, and the regression coefficients estimated afresh.
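The procedure described above can be sketched as follows. This is a bare-bones illustration assuming the tricube kernel, a fixed bandwidth <math>h</math>, and a single predictor; the function name and interface are invented for the example and do not correspond to any standard implementation:

```python
import numpy as np

def loess_fit(x_obs, y_obs, x_eval, h, p=1):
    """Local polynomial regression: estimate mu(x) at each point of x_eval.

    Uses the tricube kernel with fixed bandwidth h and local degree p.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    fitted = np.empty(len(x_eval))
    for k, x0 in enumerate(np.asarray(x_eval, dtype=float)):
        # Weights w_i(x0) = W((x_i - x0) / h), recomputed for each fitting point.
        u = np.abs(x_obs - x0) / h
        w = np.where(u < 1.0, (1.0 - u**3)**3, 0.0)
        # Local design matrix with columns (x_i - x0)^j for j = 0, ..., p.
        X = np.vander(x_obs - x0, p + 1, increasing=True)
        # Weighted least squares via sqrt-weight rescaling.
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y_obs, rcond=None)
        fitted[k] = beta[0]   # mu-hat(x0) is the intercept estimate
    return fitted
```

Because a local polynomial of degree <math>p \ge 1</math> reproduces linear functions exactly, fitting data generated from a straight line returns that line regardless of the weights.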
 
===Matrix representation of the local regression estimate===
 
As with all least squares estimates, the estimated regression coefficients can be expressed in closed form (see [[Weighted least squares]] for details):
<math display="block">\hat{\boldsymbol{\beta}} = (\mathbf{X^\textsf{T} W X})^{-1} \mathbf{X^\textsf{T} W} \mathbf{y} </math>
where <math>\hat{\boldsymbol{\beta}}</math> is a vector of the local regression coefficients;
<math>\mathbf{X}</math> is the <math>n \times (p+1)</math> [[design matrix]] with entries <math>(x_i-x)^j</math>; <math>\mathbf{W}</math> is a diagonal matrix of the smoothing weights <math>w_i(x)</math>; and <math>\mathbf{y}</math> is a vector of the responses <math>Y_i</math>.
 
This matrix representation is crucial for studying the theoretical properties of local regression estimates. With appropriate definitions of the design and weight matrices, it immediately generalizes to the multiple-predictor setting.
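The closed-form expression can be computed directly; the sketch below forms <math>\mathbf{X}</math>, <math>\mathbf{W}</math>, and <math>\hat{\boldsymbol{\beta}}</math> exactly as defined above, again assuming the tricube kernel and an invented function name (in practice the explicit diagonal matrix would be avoided for efficiency):

```python
import numpy as np

def local_beta(x_obs, y_obs, x0, h, p=1):
    """Closed-form WLS coefficients: beta-hat = (X^T W X)^{-1} X^T W y at x0."""
    x_obs = np.asarray(x_obs, dtype=float)
    # Design matrix X with entries (x_i - x0)^j, j = 0, ..., p.
    X = np.vander(x_obs - x0, p + 1, increasing=True)
    # Diagonal matrix W of smoothing weights w_i(x0), tricube kernel.
    u = np.abs(x_obs - x0) / h
    W = np.diag(np.where(u < 1.0, (1.0 - u**3)**3, 0.0))
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ np.asarray(y_obs, dtype=float))
```

For linear data the estimated intercept and slope recover the underlying line, consistent with the interpretation of <math>\hat\beta_0</math> as <math>\hat\mu(x_0)</math> and <math>\hat\beta_1</math> as a first-derivative estimate.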
 
==Selection issues: bandwidth, local model, fitting criteria==
 
Implementation of local regression requires specification and selection of several components:
# The bandwidth, and more generally the localized subsets of the data.
# The degree of local polynomial, or more generally, the form of the local model.
# The choice of weight function <math>W(\cdot)</math>.
# The choice of fitting criterion (least squares or something else).
 
Each of these components has been the subject of extensive study; a summary is provided below.
 
===Localized subsets of data===