{{Merge to|Kernel density estimation|date=September 2010}}
[[Kernel density estimation]] is a [[nonparametric]] technique for [[density estimation]], i.e., the estimation of [[probability density function]]s, which is one of the fundamental questions in [[statistics]]. It can be viewed as a generalisation of [[histogram]] density estimation with improved statistical properties. Apart from histograms, other types of density estimators include [[parametric statistics|parametric]] estimators.
==Motivation==
Let '''X'''<sub>1</sub>, '''X'''<sub>2</sub>, ..., '''X'''<sub>''n''</sub> be a sample of ''d''-variate [[random vector]]s drawn from a common distribution described by the density function ''f''. The kernel density estimate is defined to be
:<math>\hat{f}_\bold{H}(\bold{x}) = n^{-1} \sum_{i=1}^n K_\bold{H} (\bold{x} - \bold{X}_i)</math>
where
* {{nowrap|'''x''' {{=}} (''x''<sub>1</sub>, ''x''<sub>2</sub>, ..., ''x''<sub>''d''</sub>)<sup>''T''</sup>}} and {{nowrap|'''X'''<sub>''i''</sub> {{=}} (''X''<sub>''i''1</sub>, ''X''<sub>''i''2</sub>, ..., ''X''<sub>''id''</sub>)<sup>''T''</sup>, ''i'' {{=}} 1, 2, ..., ''n''}}, are ''d''-vectors;
* '''H''' is the bandwidth (or smoothing) ''d''×''d'' matrix which is [[symmetric matrix|symmetric]] and [[positive-definite matrix|positive definite]];
* ''K'' is the [[kernel (statistics)|kernel]] function, a symmetric multivariate density;
* {{nowrap|''K''<sub>'''H'''</sub>('''x''') {{=}} {{!}}'''H'''{{!}}<sup>−1/2</sup> ''K''('''H'''<sup>−1/2</sup>'''x''')}}.
The choice of the kernel function ''K'' is not crucial to the accuracy of kernel density estimators, so we use the standard [[multivariate normal distribution|multivariate normal]] kernel throughout: {{nowrap|''K''('''x''') {{=}} (2''π'')<sup>−''d''/2</sup> exp(−{{frac|2}}'''x'''<sup>''T''</sup>'''x''')}}. In contrast, the choice of the bandwidth matrix '''H''' is the single most important factor affecting accuracy, since it controls the amount and orientation of the smoothing induced.<ref name="WJ1995">{{Cite book| author1=Wand, M.P | author2=Jones, M.C. | title=Kernel Smoothing | publisher=Chapman & Hall/CRC | ___location=London | year=1995 | isbn = 0-412-55270-1}}</ref>{{rp|36–39}} That the bandwidth matrix also induces an orientation is a basic difference between multivariate kernel density estimation and its univariate analogue, since orientation is not defined for 1D kernels. This motivates the choice of parametrisation of the bandwidth matrix. The three main parametrisation classes (in increasing order of complexity) are ''S'', the class of positive scalars times the identity matrix; ''D'', diagonal matrices with positive entries on the main diagonal; and ''F'', symmetric positive definite matrices. The ''S'' class kernels have the same amount of smoothing applied in all coordinate directions, ''D'' kernels allow different amounts of smoothing in each of the coordinates, and ''F'' kernels allow arbitrary amounts and orientations of smoothing. Historically ''S'' and ''D'' kernels have been the most widespread for computational reasons, but research indicates that important gains in accuracy can be obtained using the more general ''F'' class kernels.<ref>{{cite journal | author1=Wand, M.P. | author2=Jones, M.C. | title=Comparison of smoothing parameterizations in bivariate kernel density estimation | journal=Journal of the American Statistical Association | year=1993 | volume=88 | pages=520–528 | url=http://www.jstor.org/stable/2290332}}</ref><ref name="DH2003">{{Cite journal| doi=10.1080/10485250306039 | author1=Duong, T. | author2=Hazelton, M.L. | title=Plug-in bandwidth matrices for bivariate kernel density estimation | journal=Journal of Nonparametric Statistics | year=2003 | volume=15 | pages=17–30}}</ref>
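Since the standard normal kernel is used, the scaled kernel ''K''<sub>'''H'''</sub> is simply the ''N''('''0''', '''H''') density, which can be checked numerically. A minimal sketch in R, assuming the third-party mvtnorm package:
<pre>
library(mvtnorm)                           # assumed: provides dmvnorm()

H <- matrix(c(0.5, 0.2, 0.2, 0.4), 2, 2)   # a symmetric positive definite H
x <- c(0.3, -0.1)

# Standard multivariate normal kernel K(x) = (2*pi)^(-d/2) exp(-x'x / 2)
K <- function(u) (2 * pi)^(-length(u) / 2) * exp(-0.5 * sum(u * u))

# K_H(x) = |H|^(-1/2) K(H^(-1/2) x), with H^(-1/2) the inverse symmetric
# square root of H, computed from the eigendecomposition
e <- eigen(H)
Hinvsqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
KH <- det(H)^(-1/2) * K(Hinvsqrt %*% x)

all.equal(KH, dmvnorm(x, sigma = H))       # TRUE: K_H is the N(0, H) density
</pre>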
[[File:Kernel parametrisation class.png|thumb|250px|alt=Comparison of the three bandwidth matrix parametrisation classes.|Comparison of the three bandwidth matrix parametrisation classes: ''S'' (scalar), ''D'' (diagonal) and ''F'' (full).]]
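The three classes can be made concrete with a small illustrative sketch (the matrix entries here are arbitrary, and the third-party ks package and its kde function are assumptions, not part of this section):
<pre>
# One bandwidth matrix from each parametrisation class, for d = 2
Hs <- 0.5 * diag(2)                        # S: positive scalar times identity
Hd <- diag(c(0.3, 0.7))                    # D: diagonal, positive entries
Hf <- matrix(c(0.5, 0.2, 0.2, 0.4), 2, 2)  # F: symmetric positive definite

library(ks)                                # assumed third-party package
data(faithful)
fS <- kde(x = faithful, H = Hs)            # same smoothing in all directions
fD <- kde(x = faithful, H = Hd)            # per-coordinate smoothing
fF <- kde(x = faithful, H = Hf)            # oriented (rotated) smoothing
</pre>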
==Optimal bandwidth matrix selection==
The most commonly used optimality criterion for selecting a bandwidth matrix is the mean integrated squared error (MISE), which admits an asymptotic approximation (the AMISE) in the sense that
:<math>\operatorname{MISE} (\bold{H}) = \operatorname{AMISE} (\bold{H}) + o(n^{-1} |\bold{H}|^{-1/2} + \operatorname{tr} \, \bold{H}^2)</math>
where ''o'' indicates the usual [[big O notation|small o notation]]. Heuristically this statement implies that the AMISE is a 'good' approximation of the MISE as the sample size ''n'' → ∞.
It can be shown that any reasonable bandwidth selector '''H''' has '''H''' = ''O(n<sup>−2/(d+4)</sup>)'' where the [[big O notation]] is applied elementwise. Substituting this into the MISE formula yields that the optimal MISE is ''O(n<sup>−4/(d+4)</sup>)''.<ref name="WJ1995"/>{{rp|99–100}} Thus as ''n'' → ∞, the MISE → 0, i.e., the kernel density estimate [[convergence in mean|converges in mean square]] and thus also in probability to the true density ''f''. These modes of convergence are confirmation of the statement in the motivation section that kernel methods lead to reasonable density estimators. An ideal optimal bandwidth selector is
:<math>\bold{H}_{\operatorname{AMISE}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{AMISE} (\bold{H}).</math>
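For example, setting ''d'' = 2 makes these rates concrete:
:<math>\bold{H}_{\operatorname{AMISE}} = O(n^{-1/3}), \quad \operatorname{MISE} (\bold{H}_{\operatorname{AMISE}}) = O(n^{-2/3}),</math>
so a hundredfold increase in the sample size reduces the attainable MISE by a factor of about 100<sup>2/3</sup> ≈ 21.5.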
===Plug-in===
The plug-in (PI) estimate of the AMISE is formed by replacing the unknown curvature term '''Ψ'''<sub>4</sub> with an estimate computed using a pilot bandwidth matrix '''G''':
:<math>\operatorname{PI} (\bold{H}) = n^{-1} |\bold{H}|^{-1/2} (4\pi)^{-d/2}
+ \tfrac{1}{4} (\operatorname{vec}^T \bold{H}) \hat{\bold{\Psi}}_4 (\bold{G}) (\operatorname{vec} \, \bold{H})</math>
where <math>\hat{\bold{\Psi}}_4 (\bold{G}) = n^{-2} \sum_{i=1}^n
\sum_{j=1}^n [(\operatorname{vec} \, \operatorname{D}^2) (\operatorname{vec}^T \operatorname{D}^2)] K_\bold{G} (\bold{X}_i - \bold{X}_j)</math>. Thus <math>\hat{\bold{H}}_{\operatorname{PI}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{PI} (\bold{H})</math> is the plug-in selector.<ref>{{Cite journal| author1=Wand, M.P. | author2=Jones, M.C. | title=Multivariate plug-in bandwidth selection | journal=Computational Statistics | year=1994 | volume=9 | pages=97–177}}</ref><ref name="DH2005"/>
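In practice the plug-in criterion is minimised numerically. As an illustrative sketch in R (assuming the third-party ks package, whose Hpi function implements a plug-in selector of this kind):
<pre>
library(ks)                  # assumed: ks provides Hpi()
data(faithful)               # 272 bivariate observations

H_pi <- Hpi(x = faithful)    # plug-in bandwidth matrix: a 2 x 2
H_pi                         # symmetric positive definite matrix
</pre>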
===Smoothed cross validation===
The smoothed cross validation (SCV) criterion is
:<math>\operatorname{SCV} (\bold{H}) = n^{-1} |\bold{H}|^{-1/2} (4\pi)^{-d/2}
+ n^{-2} \sum_{i=1}^n \sum_{j=1}^n (K_{2\bold{H} + 2\bold{G}} - 2 K_{\bold{H} + 2\bold{G}}
+ K_{2\bold{G}}) (\bold{X}_i - \bold{X}_j)</math>
Thus <math>\hat{\bold{H}}_{\operatorname{SCV}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{SCV} (\bold{H})</math> is the SCV selector.<ref name="DH2005">{{Cite journal| doi=10.1111/j.1467-9469.2005.00445.x | author1=Duong, T. | author2=Hazelton, M.L. | title=Cross validation bandwidth matrices for multivariate kernel density estimation | journal=Scandinavian Journal of Statistics | year=2005 | volume=32 | pages=485–506}}</ref>
These references also contain algorithms for optimal estimation of the pilot bandwidth matrix '''G''' and establish that <math>\hat{\bold{H}}_{\operatorname{SCV}}</math> converges in probability to '''H'''<sub>AMISE</sub>.
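An analogous sketch for the SCV selector (again assuming the ks package; its Hscv function implements smoothed cross validation):
<pre>
library(ks)                  # assumed: ks provides Hscv()
data(faithful)

H_scv <- Hscv(x = faithful)  # SCV bandwidth matrix estimate (2 x 2)
H_scv
</pre>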
Using the [[mean squared error]] decomposition
:<math>\operatorname{MSE} \, \hat{f}(\bold{x};\bold{H}) = \operatorname{Var} \hat{f}(\bold{x};\bold{H}) + [\operatorname{E} \hat{f}(\bold{x};\bold{H}) - f(\bold{x})]^2</math>
we have that the MSE tends to 0, implying that the kernel density estimator is (mean square) consistent and hence converges in probability to the true density ''f''. The rate of convergence of the MSE to 0 is necessarily the same as the MISE rate noted previously, ''O(n<sup>−4/(d+4)</sup>)'', hence the convergence rate of the density estimator to ''f'' is ''O<sub>p</sub>(n<sup>−2/(d+4)</sup>)'' where ''O<sub>p</sub>'' denotes [[Big O in probability notation|order in probability]].
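This pointwise convergence can be illustrated with a small simulation (not part of the original text; the ks and mvtnorm packages are assumptions). For ''d'' = 2 the error at a fixed point should shrink at roughly the ''n''<sup>−1/3</sup> rate:
<pre>
library(ks)          # assumed: ks provides Hpi() and kde()
library(mvtnorm)     # assumed: provides rmvnorm() and dmvnorm()
set.seed(1)

x0 <- matrix(c(0, 0), nrow = 1)   # evaluation point
f0 <- dmvnorm(c(0, 0))            # true standard bivariate normal density

for (n in c(100, 1000, 10000)) {
  X <- rmvnorm(n, sigma = diag(2))
  fhat <- kde(x = X, H = Hpi(X), eval.points = x0)
  cat(sprintf("n = %5d  |fhat - f| = %.4f\n", n, abs(fhat$estimate - f0)))
}
</pre>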
For the data-based bandwidth selectors considered, the target is the AMISE bandwidth matrix. We say that a data-based selector converges to the AMISE selector at relative rate ''O<sub>p</sub>(n<sup>-α</sup>), α > 0'' if
:<math>\operatorname{vec} (\hat{\bold{H}} - \bold{H}_{\operatorname{AMISE}}) = O(n^{-2\alpha}) \operatorname{vec} \bold{H}_{\operatorname{AMISE}}.</math>
It has been established that the plug-in and smoothed cross validation selectors (given a single pilot bandwidth '''G''') both converge at a relative rate of ''O<sub>p</sub>(n<sup>−2/(d+6)</sup>)'',<ref name="DH2005"/><ref>{{Cite journal| doi=10.1016/j.jmva.2004.04.004 | author1=Duong, T. | author2=Hazelton, M.L. | title=Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation | journal=Journal of Multivariate Analysis | year=2005 | volume=93 | pages=417–433}}</ref> i.e., both these data-based selectors are consistent estimators.
==Density estimation in R with a full bandwidth matrix==
[[File:Old Faithful Geyser KDE with plugin bandwidth.png|thumb|250px|alt=Old Faithful Geyser data kernel density estimate with plug-in bandwidth matrix.|Old Faithful Geyser data kernel density estimate with plug-in bandwidth matrix.]]
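A minimal sketch of how such an estimate can be produced (assuming the third-party ks package; the function names and plotting options here are assumptions rather than a prescription):
<pre>
library(ks)                        # assumed third-party package
data(faithful)                     # Old Faithful geyser data

H <- Hpi(x = faithful)             # full (unconstrained) plug-in bandwidth
fhat <- kde(x = faithful, H = H)   # kernel density estimate

plot(fhat, display = "filled.contour")   # contour plot as in the figure
points(faithful, cex = 0.3, pch = 16)    # overlay the data points
</pre>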
==Density estimation in Matlab with a diagonal bandwidth matrix==
An alternative automatic bandwidth selection routine is implemented in Matlab for
[http://www.mathworks.com/matlabcentral/fileexchange/17204 2-dimensional data].
The routine is an automatic bandwidth selection method specifically designed
for a second order Gaussian kernel.<ref>{{Cite journal
| author1 = Botev, Z.I.
| author2 = Grotowski, J.F.
| author3 = Kroese, D.P.
| title = Kernel density estimation via diffusion
| journal = Annals of Statistics
| year = 2010
| volume = 38
| pages = 2916–2957
| doi = 10.1214/10-AOS799
}}
</ref>
The figure shows the joint density estimate that results from using the automatically selected bandwidth.
<pre>
% Overlay the original data points on the density estimate
plot(data(:,1),data(:,2),'r.','MarkerSize',5)
</pre>
===Alternative optimality criteria===
Other global error criteria used to select bandwidth matrices include the mean [[Hellinger distance]]
: <math>\operatorname{MH} (\bold{H}) = \operatorname{E} \int (\hat{f}_\bold{H} (\bold{x})^{1/2} - f(\bold{x})^{1/2})^2 \, d\bold{x} .</math>
The [[Kullback–Leibler divergence]] (KL) can be estimated using a cross-validation method, although KL cross-validation selectors can be sub-optimal even if they remain [[consistent estimator|consistent]].
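Either criterion can be approximated numerically for a single sample. A hedged sketch of a Riemann-sum estimate of the integrand of MH (the MH criterion itself takes a further expectation over samples; ks and mvtnorm are assumed packages):
<pre>
library(ks)          # assumed: ks provides Hpi() and kde()
library(mvtnorm)     # assumed: provides rmvnorm() and dmvnorm()
set.seed(1)

X <- rmvnorm(500, sigma = diag(2))           # sample from the target density
g <- seq(-4, 4, length.out = 101)
grid <- as.matrix(expand.grid(g, g))         # evaluation grid

fhat <- kde(x = X, H = Hpi(X), eval.points = grid)$estimate
f <- dmvnorm(grid)                           # true density on the grid
cell <- diff(g)[1]^2                         # area of one grid cell

MH_hat <- sum((sqrt(pmax(fhat, 0)) - sqrt(f))^2) * cell   # Riemann sum
MH_hat
</pre>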
All these optimality criteria are distance-based measures, and do not always correspond to more intuitive notions of closeness, so more visual criteria have been developed in response to this concern.<ref>{{cite journal | author1=Marron, J.S. | author2=Tsybakov, A. | title=Visual error criteria for qualitative smoothing | journal=Journal of the American Statistical Association | year=1995 | volume=90 | pages=499–507 | url=http://www.jstor.org/stable/2291060}}</ref>
==References==
{{reflist}}