==Model definition==
 
Local regression uses a [[data set]] consisting of observations of one or more ‘independent’ or ‘predictor’ variables, and a ‘dependent’ or ‘response’ variable. The dataset consists of a number <math>n</math> of observations. The observations of the predictor variable can be denoted <math>x_1,\ldots,x_n</math>, and the corresponding observations of the response variable by <math>Y_1,\ldots,Y_n</math>.
 
For ease of presentation, the development below assumes a single predictor variable; the extension to multiple predictors (when the <math>x_i</math> are vectors) is conceptually straightforward. A functional relationship between the predictor and response variables is assumed:
:<math display="block">Y_i = \mu(x_i) + \epsilon_i</math>
where <math>\mu(x)</math> is the unknown ‘smooth’ regression function to be estimated, and represents the conditional expectation of the response, given a value of the predictor variables. In theoretical work, the ‘smoothness’ of this function can be formally characterized by placing bounds on higher order derivatives. The <math>\epsilon_i</math> represent random error; for estimation purposes these are assumed to have [[mean]] zero. Stronger assumptions (e.g., [[independence (probability theory)|independence]] and equal [[variance]]) may be made when assessing properties of the estimates.
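As an illustrative sketch (not part of the article's formal development), data following this model can be simulated; the choice of <math>\mu(x) = \sin(2\pi x)</math> and the error standard deviation are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.sort(rng.uniform(0.0, 1.0, n))   # predictor observations x_1, ..., x_n
mu = np.sin(2 * np.pi * x)              # an assumed smooth regression function mu(x)
eps = rng.normal(0.0, 0.2, n)           # mean-zero errors with equal variance
Y = mu + eps                            # responses Y_i = mu(x_i) + eps_i
```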
 
Local regression then estimates the function <math>\mu(x)</math>, for one value of <math>x</math> at a time. Since the function is assumed to be smooth, the most informative data points are those whose <math>x_i</math> values are close to <math>x</math>. This is formalized with a bandwidth <math>h</math> and a [[kernel (statistics)|kernel]] or weight function <math>W(\cdot)</math>, with observations assigned weights
: <math display="block">w_i(x) = W\left ( \frac{x_i-x}{h} \right )</math>.
A typical choice of <math>W</math>, used by Cleveland in LOWESS, is <math>W(u) = (1-|u|^3)^3</math> for <math>|u|<1</math>, although any similar function (peaked at <math>u=0</math> and small or 0 for large values of <math>u</math>) can be used. Questions of bandwidth selection and specification (how large should <math>h</math> be, and should it vary depending upon the fitting point <math>x</math>?) are deferred for now.
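A minimal sketch of the tricube weights described above (the function name is not standard, just illustrative): given observations <math>x_1,\ldots,x_n</math>, a fitting point <math>x</math>, and bandwidth <math>h</math>, the weights <math>w_i(x)</math> are

```python
import numpy as np

def tricube_weights(x_obs, x, h):
    """Weights w_i(x) = W((x_i - x)/h) with the tricube kernel
    W(u) = (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = np.abs((x_obs - x) / h)
    return np.where(u < 1, (1 - u**3)**3, 0.0)
```

Observations at the fitting point receive full weight 1, and observations at distance <math>h</math> or more receive weight 0.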
 
A local model (usually a low-order polynomial with degree <math>p \le 3</math>), expressed as
:<math display="block">\mu(x_i) \approx \beta_0 + \beta_1(x_i-x) + \ldots + \beta_p(x_i-x)^p</math>
is then fitted by [[weighted least squares]]: choose regression coefficients
<math>(\hat \beta_0,\ldots,\hat\beta_p)</math> to minimize
:<math display="block">
\sum_{i=1}^n w_i(x) \left ( Y_i - \beta_0 - \beta_1(x_i-x) - \ldots - \beta_p(x_i-x)^p \right )^2.
</math>
The local regression estimate of <math>\mu(x)</math> is then simply the intercept estimate:
:<math display="block">\hat\mu(x) = \hat\beta_0</math>
while the remaining coefficients can be interpreted as derivative estimates: by the Taylor expansion underlying the local model, <math>j!\,\hat\beta_j</math> estimates the <math>j</math>th derivative <math>\mu^{(j)}(x)</math>.
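The whole procedure, weighting followed by a weighted least squares fit of a local polynomial, can be sketched as follows (an illustrative implementation, with the tricube kernel assumed as the weight function):

```python
import numpy as np

def local_poly_fit(x_obs, y_obs, x, h, p=1):
    """Estimate mu(x) by a local polynomial fit of degree p,
    using tricube weights with bandwidth h."""
    # weights w_i(x) from the tricube kernel
    u = np.abs((x_obs - x) / h)
    w = np.where(u < 1, (1 - u**3)**3, 0.0)
    # design matrix with columns (x_i - x)^j for j = 0, ..., p
    X = np.vander(x_obs - x, N=p + 1, increasing=True)
    # weighted least squares: minimize sum_i w_i (Y_i - X beta)^2,
    # implemented by scaling rows with sqrt(w_i)
    s = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * s[:, None], y_obs * s, rcond=None)
    return beta[0]  # intercept = estimate of mu(x)
```

For data that are exactly linear, a local linear fit (<math>p=1</math>) recovers the line exactly at every fitting point, since the local model then holds without approximation error.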