==Model definition==
Local regression uses a [[data set]] consisting of observations of one or more `independent' or `predictor' variables, and a `dependent' or `response' variable. The data set consists of <math>n</math> observations. The observations of the predictor variable are denoted <math>x_1,\ldots,x_n</math>, and the corresponding observations of the response variable by <math>Y_1,\ldots,Y_n</math>.
For ease of presentation, the development below assumes a single predictor variable; the extension to multiple predictors (when the <math>x_i</math> are vectors) is conceptually straightforward. A functional relationship between the predictor and response variables is assumed:
:<math>Y_i = \mu(x_i) + \epsilon_i</math>
where <math>\mu(x)</math> is the unknown `smooth' regression function to be estimated, representing the conditional expectation of the response given a value of the predictor variables. In theoretical work, the `smoothness' of this function can be formally characterized by placing bounds on higher-order derivatives. The <math>\epsilon_i</math> represent random errors; for estimation purposes these are assumed to have [[mean]] zero. Stronger assumptions (e.g., [[independence (probability theory)|independence]] and equal [[variance]]) may be made when assessing properties of the estimates.
Local regression then estimates the function <math>\mu(x)</math>, for one value of <math>x</math> at a time. Since the function is assumed to be smooth, the most informative data points are those whose <math>x_i</math> values are close to <math>x</math>. This is formalized with a bandwidth <math>h</math> and a [[kernel (statistics)|kernel]] or weight function <math>W(\cdot)</math>, with observations assigned weights
: <math>w_i(x) = W\left ( \frac{x_i-x}{h} \right )</math>.
A typical choice of <math>W</math>, used by Cleveland in LOWESS, is <math>W(u) = (1-|u|^3)^3</math> for <math>|u|<1</math>, although any similar function (peaked at <math>u=0</math> and small or 0 for large values of <math>u</math>) can be used. Questions of bandwidth selection and specification (how large should <math>h</math> be, and should it vary depending upon the fitting point <math>x</math>?) are deferred for now.
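As an illustrative sketch (not part of the original article), the tricube weight function above can be computed directly; the function name and use of NumPy are choices made here for exposition:

```python
import numpy as np

def tricube_weights(xi, x, h):
    """Smoothing weights w_i(x) from Cleveland's tricube kernel.

    xi : array of predictor observations x_1, ..., x_n
    x  : the fitting point
    h  : the bandwidth
    """
    u = np.abs((xi - x) / h)
    # W(u) = (1 - |u|^3)^3 for |u| < 1, and 0 otherwise
    return np.where(u < 1, (1 - u**3)**3, 0.0)
```

Observations at the fitting point receive weight 1, and weights fall smoothly to 0 at distance <math>h</math>; observations farther than <math>h</math> from <math>x</math> are excluded from the local fit entirely.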
A local model (usually a low-order polynomial with degree <math>p \le 3</math>), expressed as
:<math>\mu(x_i) \approx \beta_0 + \beta_1(x_i-x) + \ldots + \beta_p(x_i-x)^p</math>
is then fitted by [[weighted least squares]]: choose regression coefficients
<math>(\hat\beta_0,\ldots,\hat\beta_p)</math> to minimize
:<math>
\sum_{i=1}^n w_i(x) \left ( Y_i - \beta_0 - \beta_1(x_i-x) - \ldots - \beta_p(x_i-x)^p \right )^2.
</math>
The local regression estimate of <math>\mu(x)</math> is then simply the intercept estimate:
:<math>\hat\mu(x) = \hat\beta_0</math>
while the remaining coefficients estimate derivatives of the regression function: for <math>1 \le j \le p</math>, the coefficient <math>\hat\beta_j</math> is an estimate of <math>\mu^{(j)}(x)/j!</math>.
It is to be emphasized that the above procedure produces the estimate <math>\hat\mu(x)</math> for a single value of <math>x</math>. When considering a new value of <math>x</math>, a new set of weights <math>w_i(x)</math> must be computed, and the regression coefficients estimated afresh.
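The procedure described above can be sketched in a few lines of NumPy. This is an illustration under assumptions made here (tricube weights, the helper name `loess_at`), not a transcription of any particular implementation:

```python
import numpy as np

def loess_at(x, xi, yi, h, p=1):
    """Estimate mu(x) at one fitting point x by locally weighted
    least squares: fit a degree-p polynomial in (x_i - x), then
    return the intercept estimate hat beta_0 = hat mu(x)."""
    u = np.abs((xi - x) / h)
    w = np.where(u < 1, (1 - u**3)**3, 0.0)        # tricube weights w_i(x)
    X = np.vander(xi - x, p + 1, increasing=True)  # columns (x_i - x)^j
    sw = np.sqrt(w)
    # weighted least squares, via ordinary lstsq on sqrt-weighted data
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * yi, rcond=None)
    return beta[0]
```

To trace out the whole estimated curve, `loess_at` is called once for each point on a grid of <math>x</math> values, with the weights recomputed at every fitting point.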
===Matrix representation of the local regression estimate===
As with all least squares estimates, the estimated regression coefficients can be expressed in closed form (see [[Weighted least squares]] for details):
<math display="block">\hat{\boldsymbol{\beta}} = (\mathbf{X^\textsf{T} W X})^{-1} \mathbf{X^\textsf{T} W} \mathbf{y} </math>
where <math>\hat{\boldsymbol{\beta}}</math> is a vector of the local regression coefficients;
<math>\mathbf{X}</math> is the <math>n \times (p+1)</math> [[design matrix]] with entries <math>(x_i-x)^j</math>; <math>\mathbf{W}</math> is a diagonal matrix of the smoothing weights <math>w_i(x)</math>; and <math>\mathbf{y}</math> is a vector of the responses <math>Y_i</math>.
This matrix representation is crucial for studying the theoretical properties of local regression estimates. With appropriate definitions of the design and weight matrices, it immediately generalizes to the multiple-predictor setting.
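The closed-form expression translates directly into code. The sketch below (function name chosen here for illustration) assumes the weights <math>w_i(x)</math> have already been computed:

```python
import numpy as np

def local_beta(x, xi, yi, w, p):
    """Closed-form local regression coefficients at fitting point x:
    beta_hat = (X^T W X)^{-1} X^T W y."""
    X = np.vander(xi - x, p + 1, increasing=True)  # design matrix, entries (x_i - x)^j
    W = np.diag(w)                                 # diagonal weight matrix
    # solve the normal equations rather than inverting X^T W X explicitly
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ yi)
```

Solving the normal equations (rather than forming the matrix inverse) is the standard numerically preferable route; the first entry of the returned vector is <math>\hat\mu(x)</math>.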
==Selection issues: bandwidth, local model, fitting criteria==
Implementation of local regression requires specification and selection of several components:
# The bandwidth, and more generally the localized subsets of the data.
# The degree of local polynomial, or more generally, the form of the local model.
# The choice of weight function <math>W(\cdot)</math>.
# The choice of fitting criterion (least squares or something else).
Each of these components has been the subject of extensive study; a summary is provided below.
===Localized subsets of data===