==Model definition==
 
Local regression uses a [[data set]] consisting of observations of one or more ‘independent’ or ‘predictor’ variables, and a ‘dependent’ or ‘response’ variable. The dataset consists of a number <math>n</math> of observations. The observations of the predictor variable can be denoted <math>x_1,\ldots,x_n</math>, and the corresponding observations of the response variable by <math>Y_1,\ldots,Y_n</math>.
 
For ease of presentation, the development below assumes a single predictor variable; the extension to multiple predictors (when the <math>x_i</math> are vectors) is conceptually straightforward. A functional relationship between the predictor and response variables is assumed:
:<math display="block">Y_i = \mu(x_i) + \epsilon_i</math>
where <math>\mu(x)</math> is the unknown ‘smooth’ regression function to be estimated, and represents the conditional expectation of the response, given a value of the predictor variables. In theoretical work, the ‘smoothness’ of this function can be formally characterized by placing bounds on higher order derivatives. The <math>\epsilon_i</math> represent random error; for estimation purposes these are assumed to have [[mean]] zero. Stronger assumptions (e.g., [[independence (probability theory)|independence]] and equal [[variance]]) may be made when assessing properties of the estimates.
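As an illustrative sketch (not part of the article's formal development), data following this model can be simulated; the choice of <math>\mu(x) = \sin(2\pi x)</math> and the error standard deviation are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.sort(rng.uniform(0.0, 1.0, n))   # predictor observations x_1, ..., x_n
mu = np.sin(2 * np.pi * x)              # an assumed smooth regression function mu(x)
eps = rng.normal(0.0, 0.2, n)           # mean-zero errors with equal variance
Y = mu + eps                            # responses Y_i = mu(x_i) + eps_i
```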
 
Local regression then estimates the function <math>\mu(x)</math>, for one value of <math>x</math> at a time. Since the function is assumed to be smooth, the most informative data points are those whose <math>x_i</math> values are close to <math>x</math>. This is formalized with a bandwidth <math>h</math> and a [[kernel (statistics)|kernel]] or weight function <math>W(\cdot)</math>, with observations assigned weights
: <math display="block">w_i(x) = W\left ( \frac{x_i-x}{h} \right )</math>.
A typical choice of <math>W</math>, used by Cleveland in LOWESS, is <math>W(u) = (1-|u|^3)^3</math> for <math>|u|<1</math>, although any similar function (peaked at <math>u=0</math> and small or 0 for large values of <math>u</math>) can be used. Questions of bandwidth selection and specification (how large should <math>h</math> be, and should it vary depending upon the fitting point <math>x</math>?) are deferred for now.
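A minimal sketch of the tricube weights described above (the function name is not standard, just illustrative): given observations <math>x_1,\ldots,x_n</math>, a fitting point <math>x</math>, and bandwidth <math>h</math>, the weights <math>w_i(x)</math> are

```python
import numpy as np

def tricube_weights(x_obs, x, h):
    """Weights w_i(x) = W((x_i - x)/h) with the tricube kernel
    W(u) = (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = np.abs((x_obs - x) / h)
    return np.where(u < 1, (1 - u**3)**3, 0.0)
```

Observations at the fitting point receive full weight 1, and observations at distance <math>h</math> or more receive weight 0.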
 
A local model (usually a low-order polynomial with degree <math>p \le 3</math>), expressed as
:<math display="block">\mu(x_i) \approx \beta_0 + \beta_1(x_i-x) + \ldots + \beta_p(x_i-x)^p</math>
is then fitted by [[weighted least squares]]: choose regression coefficients
<math>(\hat \beta_0,\ldots,\hat\beta_p)</math> to minimize
:<math display="block">
\sum_{i=1}^n w_i(x) \left ( Y_i - \beta_0 - \beta_1(x_i-x) - \ldots - \beta_p(x_i-x)^p \right )^2.
</math>
The local regression estimate of <math>\mu(x)</math> is then simply the intercept estimate:
:<math display="block">\hat\mu(x) = \hat\beta_0</math>
while the remaining coefficients can be interpreted as derivative estimates: by the Taylor expansion underlying the local model, <math>j!\,\hat\beta_j</math> estimates the <math>j</math>th derivative <math>\mu^{(j)}(x)</math>.
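The whole procedure, weighting followed by a weighted least squares fit of a local polynomial, can be sketched as follows (an illustrative implementation, with the tricube kernel assumed as the weight function):

```python
import numpy as np

def local_poly_fit(x_obs, y_obs, x, h, p=1):
    """Estimate mu(x) by a local polynomial fit of degree p,
    using tricube weights with bandwidth h."""
    # weights w_i(x) from the tricube kernel
    u = np.abs((x_obs - x) / h)
    w = np.where(u < 1, (1 - u**3)**3, 0.0)
    # design matrix with columns (x_i - x)^j for j = 0, ..., p
    X = np.vander(x_obs - x, N=p + 1, increasing=True)
    # weighted least squares: minimize sum_i w_i (Y_i - X beta)^2,
    # implemented by scaling rows with sqrt(w_i)
    s = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * s[:, None], y_obs * s, rcond=None)
    return beta[0]  # intercept = estimate of mu(x)
```

For data that are exactly linear, a local linear fit (<math>p=1</math>) recovers the line exactly at every fitting point, since the local model then holds without approximation error.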