Multivariate kernel density estimation

Let <math>\bold{X}_1, \bold{X}_2, \dots, \bold{X}_n</math> be a <em>d</em>-variate random sample drawn from a common density function <em>f</em>. The kernel density estimate is defined to be
 
<math>\hat{f}_\bold{H}(\bold{x})= n^{-1} |\bold{H}|^{-1/2} \sum_{i=1}^n K_\bold{H} (\bold{x} - \bold{X}_i)</math>
 
where
<ul>
<li><math>\bold{x} = (x_1, x_2, \dots, x_d)^T</math> and <math>\bold{X}_i = (X_{i1}, X_{i2}, \dots, X_{id})^T, \, i = 1, 2, \dots, n</math> are <em>d</em>-vectors;</li>
<li><strong>H</strong> is the bandwidth (or smoothing) <em>d</em> × <em>d</em> matrix, which is symmetric and positive definite;</li>
<li><em>K</em> is the kernel function, a symmetric multivariate density;</li>
<li><math>K_\bold{H}(\bold{x}) = K(\bold{H}^{-1/2} \bold{x})</math>.</li>
</ul>
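To make the definition concrete, the following is a minimal sketch in Python (assuming NumPy; the function name <code>kde</code> and its signature are illustrative, not from any standard library):

<syntaxhighlight lang="python">
import numpy as np

def kde(x, data, H, kernel):
    """Evaluate the kernel density estimate f_H at the point x.

    x      : (d,) evaluation point
    data   : (n, d) array whose rows are the sample X_1, ..., X_n
    H      : (d, d) symmetric, positive definite bandwidth matrix
    kernel : callable mapping a (d,) vector to a float, the kernel K
    """
    n, d = data.shape
    # Form H^{-1/2} from the eigendecomposition of the symmetric matrix H
    vals, vecs = np.linalg.eigh(H)
    H_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    det_H = np.prod(vals)
    # f_H(x) = n^{-1} |H|^{-1/2} sum_i K(H^{-1/2} (x - X_i))
    total = sum(kernel(H_inv_sqrt @ (x - xi)) for xi in data)
    return total / (n * np.sqrt(det_H))
</syntaxhighlight>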
 
The choice of the kernel function <em>K</em> is not crucial to the accuracy of kernel density estimators, whereas the choice of the bandwidth matrix <strong>H</strong> is the single most important factor affecting their accuracy.<ref>{{cite book | author1=Wand, M.P. | author2=Jones, M.C. | title=Kernel Smoothing | publisher=Chapman & Hall | ___location=London | date=1995 | pages=36–39 | isbn=0412552701}}</ref> We therefore use the standard [[multivariate normal distribution|multivariate normal]] (Gaussian) density function as the kernel <em>K</em> throughout:
 
<math>K (\bold{x}) = (2\pi)^{-d/2} \exp(-\tfrac{1}{2} \, \bold{x}^T \bold{x}).</math>
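Continuing the sketch above, this kernel can be coded directly from the formula and plugged into <code>kde</code>; the sample and bandwidth matrix below are arbitrary illustrations:

<syntaxhighlight lang="python">
import numpy as np

def gaussian_kernel(u):
    """Standard d-variate normal density K(u) = (2*pi)^{-d/2} exp(-u'u / 2)."""
    d = u.shape[0]
    return (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * (u @ u))

# Example: evaluate the estimate at the origin for a bivariate sample
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 2))   # n = 100 draws, d = 2
H = np.array([[0.5, 0.1],
              [0.1, 0.5]])             # symmetric positive definite bandwidth
print(kde(np.zeros(2), data, H, gaussian_kernel))
</syntaxhighlight>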
== Optimal bandwidth matrix selection ==
The most commonly used optimality criterion for selecting a bandwidth matrix is the MISE, or mean integrated squared error:
 
<math>\operatorname{MISE} (\bold{H}) = E \int [\hat{f}_\bold{H} (\bold{x}) - f(\bold{x})]^2 \, d\bold{x} .</math>
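When the true density <em>f</em> is known, as in simulation studies, the MISE can be approximated numerically: the inner integral by a Riemann sum over a grid, and the outer expectation by averaging over repeated samples. A rough sketch under those assumptions, reusing <code>kde</code> and <code>gaussian_kernel</code> from above and written for clarity rather than speed (the target density, grid, and sample sizes are illustrative):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import multivariate_normal

def mise_mc(H, n=100, trials=20, grid_pts=30, lim=4.0):
    """Monte Carlo approximation of MISE(H) for a standard bivariate normal f."""
    rng = np.random.default_rng(1)
    axis = np.linspace(-lim, lim, grid_pts)
    xx, yy = np.meshgrid(axis, axis)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    f_true = multivariate_normal(mean=np.zeros(2)).pdf(grid)
    cell = (2 * lim / (grid_pts - 1)) ** 2        # area of one grid cell
    ise = []
    for _ in range(trials):
        data = rng.standard_normal((n, 2))        # a fresh sample from f
        f_hat = np.array([kde(x, data, H, gaussian_kernel) for x in grid])
        ise.append(np.sum((f_hat - f_true) ** 2) * cell)  # Riemann sum of the ISE
    return np.mean(ise)                           # average ISE approximates the MISE
</syntaxhighlight>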
 
The MISE does not in general possess a closed-form expression, so it is usual to use its asymptotic approximation (AMISE) as a proxy:
 
<math>\operatorname{MISE} (\bold{H}) = \operatorname{AMISE} (\bold{H}) + o(n^{-1} |\bold{H}|^{-1/2} + \operatorname{tr} \, \bold{H}^2)</math>
 
where [[Big O notation|o]] indicates the usual small o notation, with the limit taken as the sample size <em>n</em> → ∞. Heuristically this statement implies that the AMISE is a 'good' approximation of the MISE for large samples.
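For orientation, under standard smoothness assumptions the AMISE for a second-order kernel takes the form (see Wand and Jones, cited above)

<math>\operatorname{AMISE} (\bold{H}) = n^{-1} |\bold{H}|^{-1/2} R(K) + \tfrac{1}{4} m_2(K)^2 \int [\operatorname{tr} (\bold{H} \, \operatorname{D}^2 f(\bold{x}))]^2 \, d\bold{x}</math>

where <math>R(K) = \int K(\bold{x})^2 \, d\bold{x}</math>, <math>m_2(K) \bold{I}_d = \int \bold{x} \bold{x}^T K(\bold{x}) \, d\bold{x}</math> and <math>\operatorname{D}^2 f</math> is the Hessian matrix of <em>f</em>; for the Gaussian kernel, <math>R(K) = (4\pi)^{-d/2}</math> and <math>m_2(K) = 1</math>.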
 
The many different varieties of bandwidth selectors arise from the different estimators of the MISE or AMISE. We concentrate on the two classes of selectors which have been shown to be the most widely applicable in practice: smoothed cross validation and plug-in selectors.
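Neither class of selector is simple enough to reproduce here, but a common baseline against which both are judged is the normal scale (normal reference) rule, which plugs the multivariate normal density into the AMISE-optimal bandwidth formula. A minimal sketch, assuming that simplification (the function name is illustrative; this is not a plug-in or cross validation selector):

<syntaxhighlight lang="python">
import numpy as np

def normal_scale_bandwidth(data):
    """Normal scale rule: H = (4 / (d + 2))^{2/(d+4)} n^{-2/(d+4)} * Sigma_hat.

    AMISE-optimal when the true density is multivariate normal; elsewhere
    only a rough baseline.
    """
    n, d = data.shape
    sigma_hat = np.cov(data, rowvar=False)   # sample covariance matrix
    scale = (4.0 / (d + 2)) ** (2.0 / (d + 4)) * n ** (-2.0 / (d + 4))
    return scale * sigma_hat
</syntaxhighlight>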
 
=== Plug-in ===
 
 
=== Smoothed cross validation ===
 
 
== References ==