* {{nowrap|''K''<sub>'''H'''</sub>('''x''') {{=}} {{!}}'''H'''{{!}}<sup>−1/2</sup> ''K''('''H'''<sup>−1/2</sup>'''x''')}}.
The choice of the kernel function ''K'' is not crucial to the accuracy of kernel density estimators, so we use the standard [[multivariate normal distribution|multivariate normal]] kernel throughout: {{nowrap|''K''('''x''') {{=}} (2''π'')<sup>−''d''/2</sup> exp(−{{frac|2}}'''x'''<sup>''T''</sup>'''x''')}}. In contrast, the choice of the bandwidth matrix '''H''' is the single most important factor affecting accuracy, since it controls the amount and orientation of the smoothing induced.<ref name="WJ1995">{{Cite book| author1=Wand, M.P. | author2=Jones, M.C. | title=Kernel Smoothing | publisher=Chapman & Hall/CRC | ___location=London | year=1995 | isbn = 0-412-55270-1}}</ref>{{rp|36–39}} That the bandwidth matrix also induces an orientation is a basic difference between multivariate kernel density estimation and its univariate analogue, since orientation is not defined for 1D kernels. This leads to the choice of a parametrisation for the bandwidth matrix. The three main parametrisation classes (in increasing order of complexity) are ''S'', the class of positive scalars times the identity matrix; ''D'', diagonal matrices with positive entries on the main diagonal; and ''F'', symmetric positive definite matrices. The ''S'' class kernels have the same amount of smoothing applied in all coordinate directions, ''D'' kernels allow different amounts of smoothing in each of the coordinates, and ''F'' kernels allow arbitrary amounts and orientations of smoothing. Historically ''S'' and ''D'' kernels have been the most widespread for computational reasons, but research indicates that important gains in accuracy can be obtained using the more general ''F'' class kernels.<ref>{{cite journal | author1=Wand, M.P. | author2=Jones, M.C. | title=Comparison of smoothing parameterizations in bivariate kernel density estimation | journal=Journal of the American Statistical Association | year=1993 | volume=88 | pages=520–528 | url=http://www.jstor.org/stable/2290332}}</ref><ref name="DH2003">{{Cite journal| doi=10.1080/10485250306039 | author1=Duong, T. | author2=Hazelton, M.L. | title=Plug-in bandwidth matrices for bivariate kernel density estimation | journal=Journal of Nonparametric Statistics | year=2003 | volume=15 | pages=17–30}}</ref>
[[File:Kernel parametrisation class.png|thumb|center|500px|alt=Comparison of the three main bandwidth matrix parametrisation classes. Left. S positive scalar times the identity matrix. Centre. D diagonal matrix with positive entries on the main diagonal. Right. F symmetric positive definite matrix.|Comparison of the three main bandwidth matrix parametrisation classes. Left. ''S'' positive scalar times the identity matrix. Centre. ''D'' diagonal matrix with positive entries on the main diagonal. Right. ''F'' symmetric positive definite matrix.]]
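The estimator can be computed directly from these definitions, using the standard form {{nowrap|''f̂''('''x'''; '''H''') {{=}} ''n''<sup>−1</sup> Σ<sub>''i''</sub> ''K''<sub>'''H'''</sub>('''x''' − '''X'''<sub>''i''</sub>)}}; for the normal kernel, ''K''<sub>'''H'''</sub> coincides with the ''N''(0, '''H''') density. The following sketch is illustrative only (the sample, evaluation point and bandwidth values are hypothetical, not taken from the cited sources) and evaluates the estimate under one bandwidth matrix from each parametrisation class:
<syntaxhighlight lang="python">
import numpy as np

def kde(x, data, H):
    """Kernel density estimate f_hat(x; H) = n^{-1} sum_i K_H(x - X_i),
    where K_H(x) = |H|^{-1/2} K(H^{-1/2} x) and K is the standard
    multivariate normal density."""
    n, d = data.shape
    L = np.linalg.cholesky(H)                 # H = L L^T
    U = np.linalg.solve(L, (x - data).T)      # (U*U).sum(0) = (x - X_i)^T H^{-1} (x - X_i)
    quad = (U * U).sum(axis=0)
    const = (2 * np.pi) ** (-d / 2) / np.sqrt(np.linalg.det(H))
    return (const * np.exp(-0.5 * quad)).mean()

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 2))          # hypothetical bivariate sample

# One hypothetical bandwidth matrix from each parametrisation class:
H_S = 0.25 * np.eye(2)                        # S: positive scalar times the identity
H_D = np.diag([0.16, 0.36])                   # D: diagonal with positive entries
H_F = np.array([[0.25, 0.15],
                [0.15, 0.36]])                # F: symmetric positive definite

x = np.array([0.5, -0.3])
for name, H in (("S", H_S), ("D", H_D), ("F", H_F)):
    print(name, kde(x, data, H))
</syntaxhighlight>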
: <math>\operatorname{MISE} (\bold{H}) = \operatorname{AMISE} (\bold{H}) + o(n^{-1} |\bold{H}|^{-1/2} + \operatorname{tr} \, \bold{H}^2)</math>
where ''o'' indicates the usual [[big O notation|small o notation]]. Heuristically this statement implies that the AMISE is a 'good' approximation of the MISE as the sample size ''n'' → ∞.
It can be shown that any reasonable bandwidth selector '''H''' has '''H''' = ''O(n<sup>−2/(d+4)</sup>)'' where the [[big O notation]] is applied elementwise. Substituting this into the MISE formula yields that the optimal MISE is ''O(n<sup>−4/(d+4)</sup>)''.<ref name="WJ1995"/>
The ideal optimal bandwidth selector minimises the AMISE over the class ''F'' of symmetric positive definite matrices:
: <math>\bold{H}_{\operatorname{AMISE}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{AMISE} (\bold{H}).</math>
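Since '''H'''<sub>AMISE</sub> depends on the unknown density ''f'', it is not directly computable. As one concrete illustration of a selector attaining the ''O(n<sup>−2/(d+4)</sup>)'' rate, the sketch below implements the classical normal-scale (rule of thumb) bandwidth matrix; this is the standard normal reference rule rather than a selector derived in this section, and all settings are hypothetical:
<syntaxhighlight lang="python">
import numpy as np

def normal_scale_bandwidth(data):
    """Normal-scale rule: (4/(d+2))^{2/(d+4)} * n^{-2/(d+4)} * Sigma_hat.
    Every entry shrinks at the O(n^{-2/(d+4)}) rate quoted above."""
    n, d = data.shape
    scale = (4.0 / (d + 2)) ** (2.0 / (d + 4)) * n ** (-2.0 / (d + 4))
    return scale * np.cov(data, rowvar=False)

rng = np.random.default_rng(1)
for n in (100, 1000, 10000):
    H = normal_scale_bandwidth(rng.standard_normal((n, 2)))
    print(n, np.diag(H))   # for d = 2 the entries decay roughly like n^{-1/3}
</syntaxhighlight>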
In the optimal bandwidth selection section, we introduced the MISE. Its construction relies on the [[expected value]] and the [[variance]] of the density estimator<ref name="WJ1995"/>{{rp|97}}
:<math>\operatorname{E} \hat{f}(\bold{x};\bold{H}) = K_\bold{H} * f (\bold{x}) = f(\bold{x}) + \frac{1}{2} m_2(K) \operatorname{tr} (\bold{H} \operatorname{D}^2 f(\bold{x})) + O(\operatorname{tr} \, \bold{H}^2)</math>
where * is the [[convolution]] operator between two functions, and
:<math>\operatorname{Var} \hat{f}(\bold{x};\bold{H}) = n^{-1} |\bold{H}|^{-1/2} R(K) + o(n^{-1} |\bold{H}|^{-1/2}).</math>
For these two expressions to be well-defined, we require that all elements of '''H''' tend to 0 and that ''n''<sup>−1</sup>|'''H'''|<sup>−1/2</sup> tends to 0 as ''n'' tends to infinity. Assuming these two conditions, we see that the expected value tends to the true density ''f'', i.e. the kernel density estimator is asymptotically [[Bias of an estimator|unbiased]], and that the variance tends to zero. Using the standard mean squared error decomposition
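These two expressions can be checked by simulation. The following sketch (hypothetical settings, not from the cited sources) uses a standard normal target, for which the convolution ''K''<sub>'''H'''</sub> ∗ ''f'' is itself a normal density, so the exact expectation is available in closed form:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import multivariate_normal

def kde(x, data, H):
    """f_hat(x; H): for the normal kernel, K_H is the N(0, H) density."""
    return multivariate_normal(cov=H).pdf(x - data).mean()

rng = np.random.default_rng(2)
x, d, n, reps = np.array([0.5, -0.3]), 2, 2000, 400
H = 0.1 * np.eye(d)                              # hypothetical S-class bandwidth

# E f_hat(x; H) = (K_H * f)(x); for f = N(0, I) this is the N(0, I + H) density
expected = multivariate_normal(cov=np.eye(d) + H).pdf(x)

estimates = [kde(x, rng.standard_normal((n, d)), H) for _ in range(reps)]
print("convolution value:", expected)
print("Monte Carlo mean :", np.mean(estimates))  # close to the convolution value
print("Monte Carlo var  :", np.var(estimates))   # small: the n^{-1} |H|^{-1/2} term
</syntaxhighlight>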
:<math>\operatorname{MSE} \, \hat{f}(\bold{x};\bold{H}) = \operatorname{Var} \hat{f}(\bold{x};\bold{H}) + [\operatorname{E} \hat{f}(\bold{x};\bold{H}) - f(\bold{x})]^2</math>
we have that the MSE tends to 0, implying that the kernel density estimator is (mean square) consistent and hence converges in probability to the true density ''f''. The rate of convergence of the MSE to 0 is necessarily the same as the MISE rate noted previously, ''O(n<sup>−4/(d+4)</sup>)'', hence the convergence rate of the density estimator to ''f'' is ''O<sub>p</sub>(n<sup>−2/(d+4)</sup>)'' where ''O<sub>p</sub>'' denotes [[Big O in probability notation|order in probability]]. This establishes pointwise convergence. The functional convergence is established similarly by considering the behaviour of the MISE, and noting that under sufficient regularity, integration does not affect the convergence rates.
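This pointwise rate can be observed empirically. The sketch below (hypothetical settings) estimates the MSE at a point by Monte Carlo for increasing ''n'', with the bandwidth shrinking at the optimal elementwise rate:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import multivariate_normal

def kde(x, data, H):
    """f_hat(x; H): for the normal kernel, K_H is the N(0, H) density."""
    return multivariate_normal(cov=H).pdf(x - data).mean()

rng = np.random.default_rng(3)
x, d, reps = np.zeros(2), 2, 200
f_x = multivariate_normal(cov=np.eye(d)).pdf(x)      # true N(0, I) density at the origin

for n in (250, 1000, 4000):
    H = 0.5 * n ** (-2.0 / (d + 4)) * np.eye(d)      # shrinking at the O(n^{-2/(d+4)}) rate
    sq_errs = [(kde(x, rng.standard_normal((n, d)), H) - f_x) ** 2 for _ in range(reps)]
    print(n, np.mean(sq_errs))   # MSE should fall roughly like n^{-4/(d+4)} = n^{-2/3}
</syntaxhighlight>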
For the data-based bandwidth selectors considered, the target is the AMISE bandwidth matrix. We say that a data-based selector converges to the AMISE selector at relative rate ''O<sub>p</sub>(n<sup>-α</sup>), α > 0'' if
:<math>\operatorname{vec} (\hat{\bold{H}} - \bold{H}_{\operatorname{AMISE}}) = O_p(n^{-2\alpha}) \operatorname{vec} \bold{H}_{\operatorname{AMISE}}.</math>
===Alternative optimality criteria===
The MISE is the expected integrated ''L<sub>2</sub>'' distance between the density estimate and the true density function ''f''. It is the most widely used criterion, mostly due to its tractability, and most software implements MISE-based bandwidth selectors.
There are alternative optimality criteria, which attempt to cover cases where MISE is not an appropriate measure.<ref name="simonoff1996"/>{{rp|34–37, 78}} The equivalent ''L<sub>1</sub>'' measure, Mean Integrated Absolute Error, is
: <math>\operatorname{MIAE} (\bold{H}) = \operatorname{E}\, \int |\hat{f}_\bold{H} (\bold{x}) - f(\bold{x})| \, d\bold{x}.</math>
Its mathematical analysis is considerably more difficult than that of the MISE. In practice, the gain appears not to be significant.<ref>{{cite journal | author1=Hall, P. | author2=Wand, M.P. | title=Minimizing L<sub>1</sub> distance in nonparametric density estimation | journal = Journal of Multivariate Analysis | year=1988 | volume=26 | pages=59–88 | doi=10.1016/0047-259X(88)90073-5}}</ref> The corresponding ''L<sub>∞</sub>'' measure is the Mean Uniform Absolute Error
: <math>\operatorname{MUAE} (\bold{H}) = \operatorname{E}\, \operatorname{sup}_{\bold{x}} |\hat{f}_\bold{H} (\bold{x}) - f(\bold{x})|.</math>
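Neither criterion is available in closed form in general, but both can be approximated numerically. The sketch below (hypothetical grid and settings) replaces the integral by a Riemann sum and the supremum by a maximum over a grid, with the outer expectation replaced by a Monte Carlo average:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
d, n, reps = 2, 500, 50
H = 0.2 * np.eye(d)                              # hypothetical bandwidth

# evaluation grid over [-3, 3]^2 and the N(0, I) target density
g = np.linspace(-3, 3, 31)
X, Y = np.meshgrid(g, g)
grid = np.column_stack([X.ravel(), Y.ravel()])
cell = (g[1] - g[0]) ** 2                        # area element for the Riemann sum
f_true = multivariate_normal(cov=np.eye(d)).pdf(grid)

iae, uae = [], []
for _ in range(reps):
    data = rng.standard_normal((n, d))
    # f_hat on the whole grid: average the N(0, H) kernel over the data points
    f_hat = multivariate_normal(cov=H).pdf(grid[:, None, :] - data[None, :, :]).mean(axis=1)
    abs_err = np.abs(f_hat - f_true)
    iae.append(abs_err.sum() * cell)             # integrated absolute error (L1)
    uae.append(abs_err.max())                    # uniform absolute error (L_infinity)

print("MIAE estimate:", np.mean(iae))
print("MUAE estimate:", np.mean(uae))
</syntaxhighlight>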
All these optimality criteria are distance-based measures, and do not always correspond to more intuitive notions of closeness, so more visual criteria have been developed in response to this concern.<ref>{{cite journal | author1=Marron, J.S. | author2=Tsybakov, A. | title=Visual error criteria for qualitative smoothing | journal = Journal of the American Statistical Association | year=1996 | volume=90 | pages=499–507 | url=http://www.jstor.org/stable/2291060}}</ref>
==See also==
* [[Kernel density estimation]] – univariate kernel density estimation.
* [[Variable kernel density estimation]] – estimation of multivariate densities using kernels with variable bandwidth.
==References==
==References==
{{Reflist}}
[[Category:Estimation of densities]]