As mentioned above, the weight function gives the most weight to the data points nearest the point of estimation and the least weight to the data points that are furthest away. The use of the weights is based on the idea that points near each other in the explanatory variable space are more likely to be related to each other in a simple way than points that are further apart. Following this logic, points that are likely to follow the local model best influence the local model parameter estimates the most. Points that are less likely to actually conform to the local model have less influence on the local model [[Parameter#Statistics|parameter]] [[Statistical estimation|estimates]].
Cleveland (1979)<ref name="cleve79" /> sets out four requirements for the weight function:
# Non-negative: <math>W(x) > 0</math> for <math>|x| < 1</math>.
# Symmetry: <math>W(-x) = W(x)</math>.
# Monotone: <math>W(x)</math> is a nonincreasing function for <math>x \ge 0</math>.
# Bounded support: <math>W(x)=0</math> for <math>|x| \ge 1</math>.
Asymptotic efficiency of weight functions has been considered by [[V. A. Epanechnikov]] (1969)<ref>{{citeQ|Q57308723}}</ref> in the context of kernel density estimation; J. Fan (1993)<ref>{{citeQ|Q132691957}}</ref> derived similar results for local regression. Both conclude that the quadratic kernel, <math>W(x) = 1-x^2</math> for <math>|x|\le1</math>, has the greatest efficiency under a mean-squared-error loss function. See [[Kernel (statistics)#Kernel functions in common use|"kernel functions in common use"]] for further discussion of different kernels and their efficiencies.
Considerations other than MSE are also relevant to the choice of weight function. The smoothness of <math>W(x)</math> directly affects the smoothness of the estimate <math>\hat\mu(x)</math>. In particular, the quadratic kernel is not differentiable at <math>x=\pm 1</math>, and consequently <math>\hat\mu(x)</math> is not differentiable either.
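As a concrete illustration, the following is a minimal Python sketch (using NumPy; the function name is for exposition only and does not come from any particular local regression package) showing that the one-sided slopes of the quadratic kernel disagree at the boundary of its support:
<syntaxhighlight lang="python">
import numpy as np

def quadratic_kernel(x):
    """Quadratic (Epanechnikov-type) kernel: W(x) = 1 - x^2 for |x| <= 1, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 1.0 - x**2, 0.0)

# Finite-difference slopes just inside and just outside x = 1:
# the left-hand slope tends to -2 while the right-hand slope is 0,
# so W is continuous but not differentiable at x = 1.
h = 1e-6
left_slope  = (quadratic_kernel(1.0) - quadratic_kernel(1.0 - h)) / h
right_slope = (quadratic_kernel(1.0 + h) - quadratic_kernel(1.0)) / h
print(left_slope, right_slope)   # approximately -2.0 and 0.0
</syntaxhighlight>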
The [[Kernel (statistics)#Kernel functions in common use|tri-cube weight function]]
<math display="block">W(x) = (1 - |x|^3)^3; \quad |x|<1</math>
has been used in LOWESS and other local regression software; it combines higher-order differentiability with high MSE efficiency.
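For illustration, a short Python sketch (using NumPy; the helper names <code>tricube</code> and <code>local_linear_fit</code> are hypothetical and not part of LOWESS or any other package) shows how tri-cube weights enter a weighted least-squares fit at a single point of estimation:
<syntaxhighlight lang="python">
import numpy as np

def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u**3)**3, 0.0)

def local_linear_fit(x0, x, y, bandwidth):
    """Fit a weighted least-squares line around x0 and return its value at x0.
    This is a sketch of a single local fit, not a complete LOESS procedure."""
    w = tricube((x - x0) / bandwidth)           # tri-cube weights from scaled distances
    X = np.column_stack([np.ones_like(x), x - x0])
    WX = X * w[:, None]
    beta = np.linalg.solve(WX.T @ X, WX.T @ y)  # weighted normal equations
    return beta[0]                              # intercept = estimate of mu(x0)

# Example: noisy sine data, estimated at x0 = 5.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
print(local_linear_fit(5.0, x, y, bandwidth=1.5))
</syntaxhighlight>
The sketch uses a fixed bandwidth for simplicity; in full LOESS the bandwidth is usually determined by a nearest-neighbour span and the fit is repeated at every point of estimation.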
One criticism of weight functions with bounded support is that they can lead to numerical problems (for example, an unstable or singular design matrix) when fitting in regions with sparse data. For this reason, some authors prefer the Gaussian kernel or other kernels with unbounded support.
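A small numeric sketch (Python, under the same illustrative conventions as above, with a fixed rather than nearest-neighbour bandwidth) makes the criticism concrete: when no observations fall within the bandwidth, every tri-cube weight is exactly zero and the weighted design matrix is singular, while Gaussian weights remain strictly positive:
<syntaxhighlight lang="python">
import numpy as np

def tricube(u):
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u**3)**3, 0.0)

def gaussian(u):
    return np.exp(-0.5 * np.asarray(u, dtype=float)**2)

# A sparse region: no observations within one bandwidth of the fitting point x0.
x = np.array([0.0, 0.2, 0.4, 5.0, 5.1])
x0, bandwidth = 2.5, 1.0
u = (x - x0) / bandwidth

print(tricube(u))    # [0. 0. 0. 0. 0.] -- all weights vanish, so no local fit exists
print(gaussian(u))   # strictly positive weights: a (heavily extrapolated) fit still exists
</syntaxhighlight>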
===Choice of Fitting Criterion===