{{Merge to|Kernel density estimation|date=September 2010}}
[[Kernel density estimation]] is a [[nonparametric]] technique for [[density estimation]], i.e., the estimation of [[probability density function]]s, which is one of the fundamental questions in [[statistics]]. It can be viewed as a generalisation of [[histogram]] density estimation with improved statistical properties. Apart from histograms, other types of density estimators include [[parametric statistics|parametric]] estimators.
==Motivation==
Let '''X'''<sub>1</sub>, '''X'''<sub>2</sub>, ..., '''X'''<sub>''n''</sub> be a sample of ''d''-variate [[random vector]]s drawn from a common distribution described by the density function ''f''. The kernel density estimate is defined to be
:<math>\hat{f}_\bold{H}(\bold{x}) = n^{-1} \sum_{i=1}^n K_\bold{H} (\bold{x} - \bold{X}_i)</math>
where
* {{nowrap|'''x''' {{=}} (''x''<sub>1</sub>, ''x''<sub>2</sub>, ..., ''x''<sub>''d''</sub>)<sup>''T''</sup>}} and {{nowrap|'''X'''<sub>''i''</sub> {{=}} (''X''<sub>''i''1</sub>, ''X''<sub>''i''2</sub>, ..., ''X''<sub>''id''</sub>)<sup>''T''</sup>, ''i'' {{=}} 1, 2, ..., ''n''}}, are ''d''-vectors;
* '''H''' is the bandwidth (or smoothing) ''d''×''d'' matrix which is [[symmetric matrix|symmetric]] and [[positive-definite matrix|positive definite]];
* ''K'' is the [[kernel (statistics)|kernel]] function, a symmetric multivariate density;
* {{nowrap|''K''<sub>'''H'''</sub>('''x''') {{=}} {{!}}'''H'''{{!}}<sup>−1/2</sup> ''K''('''H'''<sup>−1/2</sup>'''x''')}}.
The choice of the kernel function ''K'' is not crucial to the accuracy of kernel density estimators, so we use the standard [[multivariate normal distribution|multivariate normal]] kernel throughout: {{nowrap|''K''('''x''') {{=}} (2''π'')<sup>−''d''/2</sup> exp(−{{frac|2}}'''x'''<sup>''T''</sup>'''x''')}}. In contrast, the choice of the bandwidth matrix '''H''' is the single most important factor affecting accuracy, since it controls the amount and orientation of the smoothing induced.<ref name="WJ1995">{{Cite book| author1=Wand, M.P | author2=Jones, M.C. | title=Kernel Smoothing | publisher=Chapman & Hall/CRC | ___location=London | year=1995 | isbn = 0-412-55270-1}}</ref>{{rp|36–39}} That the bandwidth matrix also induces an orientation is a basic difference between multivariate kernel density estimation and its univariate analogue, since orientation is not defined for 1D kernels. This motivates the choice of parametrisation of the bandwidth matrix. The three main parametrisation classes (in increasing order of complexity) are ''S'', the class of positive scalars times the identity matrix; ''D'', diagonal matrices with positive entries on the main diagonal; and ''F'', symmetric positive definite matrices. The ''S'' class kernels have the same amount of smoothing applied in all coordinate directions, ''D'' kernels allow different amounts of smoothing in each of the coordinates, and ''F'' kernels allow arbitrary amounts and orientations of smoothing. Historically ''S'' and ''D'' kernels have been the most widespread for computational reasons, but research indicates that important gains in accuracy can be obtained using the more general ''F'' class kernels.<ref>{{cite journal | author1=Wand, M.P. | author2=Jones, M.C. | title=Comparison of smoothing parameterizations in bivariate kernel density estimation | journal=Journal of the American Statistical Association | year=1993 | volume=88 | pages=520–528 | url=http://www.jstor.org/stable/2290332}}</ref><ref name="DH2003">{{Cite journal| doi=10.1080/10485250306039 | author1=Duong, T. | author2=Hazelton, M.L. | title=Plug-in bandwidth matrices for bivariate kernel density estimation | journal=Journal of Nonparametric Statistics | year=2003 | volume=15 | pages=17–30}}</ref>
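Since the standard normal kernel is used, the scaled kernel ''K''<sub>'''H'''</sub> is simply the ''N''('''0''', '''H''') density, which can be checked numerically. A minimal sketch in R, assuming the third-party mvtnorm package:
<pre>
library(mvtnorm)                           # assumed: provides dmvnorm()

H <- matrix(c(0.5, 0.2, 0.2, 0.4), 2, 2)   # a symmetric positive definite H
x <- c(0.3, -0.1)

# Standard multivariate normal kernel K(x) = (2*pi)^(-d/2) exp(-x'x / 2)
K <- function(u) (2 * pi)^(-length(u) / 2) * exp(-0.5 * sum(u * u))

# K_H(x) = |H|^(-1/2) K(H^(-1/2) x), with H^(-1/2) the inverse symmetric
# square root of H, computed from the eigendecomposition
e <- eigen(H)
Hinvsqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
KH <- det(H)^(-1/2) * K(Hinvsqrt %*% x)

all.equal(KH, dmvnorm(x, sigma = H))       # TRUE: K_H is the N(0, H) density
</pre>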
[[File:Kernel parametrisation class.png|thumb|250px|alt=Comparison of the three bandwidth matrix parametrisation classes.|Comparison of the three bandwidth matrix parametrisation classes: ''S'' (scalar), ''D'' (diagonal) and ''F'' (full).]]
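The three classes can be made concrete with a small illustrative sketch (the matrix entries here are arbitrary, and the third-party ks package and its kde function are assumptions, not part of this section):
<pre>
# One bandwidth matrix from each parametrisation class, for d = 2
Hs <- 0.5 * diag(2)                        # S: positive scalar times identity
Hd <- diag(c(0.3, 0.7))                    # D: diagonal, positive entries
Hf <- matrix(c(0.5, 0.2, 0.2, 0.4), 2, 2)  # F: symmetric positive definite

library(ks)                                # assumed third-party package
data(faithful)
fS <- kde(x = faithful, H = Hs)            # same smoothing in all directions
fD <- kde(x = faithful, H = Hd)            # per-coordinate smoothing
fF <- kde(x = faithful, H = Hf)            # oriented (rotated) smoothing
</pre>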
==Optimal bandwidth matrix selection==
The most commonly used optimality criterion for selecting a bandwidth matrix is the mean integrated squared error (MISE), which admits an asymptotic approximation (the AMISE) in the sense that
:<math>\operatorname{MISE} (\bold{H}) = \operatorname{AMISE} (\bold{H}) + o(n^{-1} |\bold{H}|^{-1/2} + \operatorname{tr} \, \bold{H}^2)</math>
where ''o'' indicates the usual [[big O notation|small o notation]]. Heuristically this statement implies that the AMISE is a 'good' approximation of the MISE as the sample size ''n'' → ∞.
It can be shown that any reasonable bandwidth selector '''H''' has '''H''' = ''O(n<sup>−2/(d+4)</sup>)'' where the [[big O notation]] is applied elementwise. Substituting this into the MISE formula yields that the optimal MISE is ''O(n<sup>−4/(d+4)</sup>)''.<ref name="WJ1995"/>{{rp|99–100}} Thus as ''n'' → ∞, the MISE → 0, i.e., the kernel density estimate [[convergence in mean|converges in mean square]] and thus also in probability to the true density ''f''. These modes of convergence are confirmation of the statement in the motivation section that kernel methods lead to reasonable density estimators. An ideal optimal bandwidth selector is
:<math>\bold{H}_{\operatorname{AMISE}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{AMISE} (\bold{H}).</math>
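For example, setting ''d'' = 2 makes these rates concrete:
:<math>\bold{H}_{\operatorname{AMISE}} = O(n^{-1/3}), \quad \operatorname{MISE} (\bold{H}_{\operatorname{AMISE}}) = O(n^{-2/3}),</math>
so a hundredfold increase in the sample size reduces the attainable MISE by a factor of about 100<sup>2/3</sup> ≈ 21.5.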
===Plug-in===
The plug-in (PI) estimate of the AMISE is formed by replacing the unknown curvature term '''Ψ'''<sub>4</sub> with an estimate computed using a pilot bandwidth matrix '''G''':
:<math>\operatorname{PI} (\bold{H}) = n^{-1} |\bold{H}|^{-1/2} (4\pi)^{-d/2}
+ \tfrac{1}{4} (\operatorname{vec}^T \bold{H}) \hat{\bold{\Psi}}_4 (\bold{G}) (\operatorname{vec} \, \bold{H})</math>
where <math>\hat{\bold{\Psi}}_4 (\bold{G}) = n^{-2} \sum_{i=1}^n
\sum_{j=1}^n [(\operatorname{vec} \, \operatorname{D}^2) (\operatorname{vec}^T \operatorname{D}^2)] K_\bold{G} (\bold{X}_i - \bold{X}_j)</math>. Thus <math>\hat{\bold{H}}_{\operatorname{PI}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{PI} (\bold{H})</math> is the plug-in selector.<ref>{{Cite journal| author1=Wand, M.P. | author2=Jones, M.C. | title=Multivariate plug-in bandwidth selection | journal=Computational Statistics | year=1994 | volume=9 | pages=97–177}}</ref><ref name="DH2005"/>
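In practice the plug-in criterion is minimised numerically. As an illustrative sketch in R (assuming the third-party ks package, whose Hpi function implements a plug-in selector of this kind):
<pre>
library(ks)                  # assumed: ks provides Hpi()
data(faithful)               # 272 bivariate observations

H_pi <- Hpi(x = faithful)    # plug-in bandwidth matrix: a 2 x 2
H_pi                         # symmetric positive definite matrix
</pre>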
===Smoothed cross validation===
The smoothed cross validation (SCV) criterion is
:<math>\operatorname{SCV} (\bold{H}) = n^{-1} |\bold{H}|^{-1/2} (4\pi)^{-d/2}
+ n^{-2} \sum_{i=1}^n \sum_{j=1}^n (K_{2\bold{H} + 2\bold{G}} - 2 K_{\bold{H} + 2\bold{G}}
+ K_{2\bold{G}}) (\bold{X}_i - \bold{X}_j)</math>
Thus <math>\hat{\bold{H}}_{\operatorname{SCV}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{SCV} (\bold{H})</math> is the SCV selector.<ref name="DH2005">{{Cite journal| doi=10.1111/j.1467-9469.2005.00445.x | author1=Duong, T. | author2=Hazelton, M.L. | title=Cross validation bandwidth matrices for multivariate kernel density estimation | journal=Scandinavian Journal of Statistics | year=2005 | volume=32 | pages=485–506}}</ref>
These references also contain algorithms for optimal estimation of the pilot bandwidth matrix '''G''' and establish that <math>\hat{\bold{H}}_{\operatorname{SCV}}</math> converges in probability to '''H'''<sub>AMISE</sub>.
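An analogous sketch for the SCV selector (again assuming the ks package; its Hscv function implements smoothed cross validation):
<pre>
library(ks)                  # assumed: ks provides Hscv()
data(faithful)

H_scv <- Hscv(x = faithful)  # SCV bandwidth matrix estimate (2 x 2)
H_scv
</pre>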
Using the [[mean squared error]] decomposition
:<math>\operatorname{MSE} \, \hat{f}(\bold{x};\bold{H}) = \operatorname{Var} \hat{f}(\bold{x};\bold{H}) + [\operatorname{E} \hat{f}(\bold{x};\bold{H}) - f(\bold{x})]^2</math>
we have that the MSE tends to 0, implying that the kernel density estimator is (mean square) consistent and hence converges in probability to the true density ''f''. The rate of convergence of the MSE to 0 is necessarily the same as the MISE rate noted previously, ''O(n<sup>−4/(d+4)</sup>)'', hence the convergence rate of the density estimator to ''f'' is ''O<sub>p</sub>(n<sup>−2/(d+4)</sup>)'' where ''O<sub>p</sub>'' denotes [[Big O in probability notation|order in probability]].
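This pointwise convergence can be illustrated with a small simulation (not part of the original text; the ks and mvtnorm packages are assumptions). For ''d'' = 2 the error at a fixed point should shrink at roughly the ''n''<sup>−1/3</sup> rate:
<pre>
library(ks)          # assumed: ks provides Hpi() and kde()
library(mvtnorm)     # assumed: provides rmvnorm() and dmvnorm()
set.seed(1)

x0 <- matrix(c(0, 0), nrow = 1)   # evaluation point
f0 <- dmvnorm(c(0, 0))            # true standard bivariate normal density

for (n in c(100, 1000, 10000)) {
  X <- rmvnorm(n, sigma = diag(2))
  fhat <- kde(x = X, H = Hpi(X), eval.points = x0)
  cat(sprintf("n = %5d  |fhat - f| = %.4f\n", n, abs(fhat$estimate - f0)))
}
</pre>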
For the data-based bandwidth selectors considered, the target is the AMISE bandwidth matrix. We say that a data-based selector converges to the AMISE selector at relative rate ''O<sub>p</sub>(n<sup>-α</sup>), α > 0'' if
:<math>\operatorname{vec} (\hat{\bold{H}} - \bold{H}_{\operatorname{AMISE}}) = O(n^{-2\alpha}) \operatorname{vec} \bold{H}_{\operatorname{AMISE}}.</math>
It has been established that the plug-in and smoothed cross validation selectors (given a single pilot bandwidth '''G''') both converge at a relative rate of ''O<sub>p</sub>(n<sup>−2/(d+6)</sup>)'',<ref name="DH2005"/><ref>{{Cite journal| doi=10.1016/j.jmva.2004.04.004 | author1=Duong, T. | author2=Hazelton, M.L. | title=Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation | journal=Journal of Multivariate Analysis | year=2005 | volume=93 | pages=417–433}}</ref> i.e., both these data-based selectors are consistent estimators.
==Density estimation in R with a full bandwidth matrix==
[[File:Old Faithful Geyser KDE with plugin bandwidth.png|thumb|250px|alt=Old Faithful Geyser data kernel density estimate with plug-in bandwidth matrix.|Old Faithful Geyser data kernel density estimate with plug-in bandwidth matrix.]]
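A minimal sketch of how such an estimate can be produced (assuming the third-party ks package; the function names and plotting options here are assumptions rather than a prescription):
<pre>
library(ks)                        # assumed third-party package
data(faithful)                     # Old Faithful geyser data

H <- Hpi(x = faithful)             # full (unconstrained) plug-in bandwidth
fhat <- kde(x = faithful, H = H)   # kernel density estimate

plot(fhat, display = "filled.contour")   # contour plot as in the figure
points(faithful, cex = 0.3, pch = 16)    # overlay the data points
</pre>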
==Density estimation in Matlab with a diagonal bandwidth matrix==
An alternative automatic bandwidth selection routine is implemented in Matlab for
[http://www.mathworks.com/matlabcentral/fileexchange/17204 2-dimensional data].
The routine is an automatic bandwidth selection method specifically designed
for a second order Gaussian kernel.<ref>{{Cite journal
| author1 = Botev, Z.I.
| author2 = Grotowski, J.F.
| author3 = Kroese, D.P.
| title = Kernel density estimation via diffusion
| journal = Annals of Statistics
| year = 2010
| volume = 38
| pages = 2916–2957
| doi = 10.1214/10-AOS799
}}
</ref>
The figure shows the joint density estimate that results from using the automatically selected bandwidth.
<pre>
% Overlay the original data points on the density estimate
plot(data(:,1),data(:,2),'r.','MarkerSize',5)
</pre>
===Alternative optimality criteria===
Other global error criteria used to select bandwidth matrices include the mean [[Hellinger distance]]
: <math>\operatorname{MH} (\bold{H}) = \operatorname{E} \int (\hat{f}_\bold{H} (\bold{x})^{1/2} - f(\bold{x})^{1/2})^2 \, d\bold{x} .</math>
The [[Kullback–Leibler divergence]] (KL) can be estimated using a cross-validation method, although KL cross-validation selectors can be sub-optimal even if they remain [[consistent estimator|consistent]].
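Either criterion can be approximated numerically for a single sample. A hedged sketch of a Riemann-sum estimate of the integrand of MH (the MH criterion itself takes a further expectation over samples; ks and mvtnorm are assumed packages):
<pre>
library(ks)          # assumed: ks provides Hpi() and kde()
library(mvtnorm)     # assumed: provides rmvnorm() and dmvnorm()
set.seed(1)

X <- rmvnorm(500, sigma = diag(2))           # sample from the target density
g <- seq(-4, 4, length.out = 101)
grid <- as.matrix(expand.grid(g, g))         # evaluation grid

fhat <- kde(x = X, H = Hpi(X), eval.points = grid)$estimate
f <- dmvnorm(grid)                           # true density on the grid
cell <- diff(g)[1]^2                         # area of one grid cell

MH_hat <- sum((sqrt(pmax(fhat, 0)) - sqrt(f))^2) * cell   # Riemann sum
MH_hat
</pre>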
All these optimality criteria are distance-based measures, and do not always correspond to more intuitive notions of closeness, so more visual criteria have been developed in response to this concern.<ref>{{cite journal | author1=Marron, J.S. | author2=Tsybakov, A. | title=Visual error criteria for qualitative smoothing | journal=Journal of the American Statistical Association | year=1995 | volume=90 | pages=499–507 | url=http://www.jstor.org/stable/2291060}}</ref>
==References==
{{reflist}}