Multivariate kernel density estimation: Difference between revisions

Drleft (talk | contribs)
No edit summary
Line 28:
</ul>
 
The choice of the kernel function <em>K</em> is not crucial to the accuracy of kernel density estimators, so we use the standard [[multivariate normal distribution|multivariate normal]] or Gaussian density function as our kernel <em>K</em> throughout: <math>K (\bold{x}) = (2\pi)^{-d/2} \exp(-\tfrac{1}{2} \, \bold{x}^T \bold{x})</math>. In contrast, the choice of the bandwidth matrix <strong>H</strong> is the single most important factor affecting the estimator's accuracy, since it controls both the amount and the orientation of the smoothing induced.<ref name="WJ1995">{{cite book | author1=Wand, M.P. | author2=Jones, M.C. | title=Kernel Smoothing | publisher=Chapman & Hall/CRC | location=London | date=1995 | isbn = 0412552701}}</ref>{{rp|36-39}}
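
As an illustration of how the bandwidth matrix enters the estimate, the following minimal Python sketch evaluates a Gaussian kernel density estimate. It assumes the usual scaled-kernel form <math>K_\bold{H}(\bold{x}) = |\bold{H}|^{-1/2} K(\bold{H}^{-1/2} \bold{x})</math> and the estimator <math>\hat{f}_\bold{H}(\bold{x}) = n^{-1} \sum_{i=1}^n K_\bold{H}(\bold{x} - \bold{X}_i)</math>, with <strong>H</strong> symmetric and positive definite. The function names <code>gaussian_kernel</code> and <code>kde</code> are chosen here for illustration and do not come from any particular library.

```python
import numpy as np


def gaussian_kernel(x):
    """Standard d-variate normal density K(x) = (2*pi)^(-d/2) * exp(-x'x / 2).

    `x` is one point of length d or an (m, d) array of m points.
    """
    x = np.atleast_2d(x)
    d = x.shape[1]
    return (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(x * x, axis=1))


def kde(points, data, H):
    """Evaluate the density estimate at `points` (m, d) from `data` (n, d).

    Assumes H is a symmetric positive definite d x d bandwidth matrix.
    """
    points = np.atleast_2d(points)
    data = np.atleast_2d(data)
    # Symmetric inverse square root H^(-1/2) from the eigendecomposition of H.
    eigvals, eigvecs = np.linalg.eigh(H)
    H_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    det_factor = np.prod(eigvals) ** -0.5  # |H|^(-1/2)
    out = np.empty(points.shape[0])
    for j, x in enumerate(points):
        # Each row of z is H^(-1/2) (x - X_i); H^(-1/2) is symmetric.
        z = (x - data) @ H_inv_sqrt
        out[j] = det_factor * gaussian_kernel(z).mean()
    return out


# Illustrative use: the same data smoothed with two different orientations.
rng = np.random.default_rng(0)
sample = rng.normal(size=(200, 2))
H_round = 0.25 * np.eye(2)
H_tilted = np.array([[0.25, 0.20], [0.20, 0.25]])
density_round = kde([[0.5, 0.5]], sample, H_round)
density_tilted = kde([[0.5, 0.5]], sample, H_tilted)
```

The off-diagonal entries of <code>H_tilted</code> rotate the kernel's contours away from the coordinate axes, which is the sense in which <strong>H</strong> controls the orientation as well as the amount of smoothing.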
 
== Optimal bandwidth matrix selection ==
Line 50:
<li>vec is the vector operator, which stacks the columns of a matrix into a single vector, e.g. <math>\operatorname{vec}\begin{bmatrix}a & c \\ b & d\end{bmatrix} = \begin{bmatrix}a & b & c & d\end{bmatrix}^T.</math>
</ul>
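
The vec operator defined above corresponds to flattening a matrix in column-major order. A minimal NumPy sketch follows; the helper name <code>vec</code> is chosen here to match the notation and is not a NumPy function.

```python
import numpy as np


def vec(A):
    """Stack the columns of a matrix into a single 1-D vector (column-major order)."""
    return np.asarray(A).reshape(-1, order="F")


# Numeric version of the example above, with a=1, b=2, c=3, d=4.
A = np.array([[1, 3],
              [2, 4]])
print(vec(A))  # [1 2 3 4]
```

NumPy flattens in row-major order by default, so the <code>order="F"</code> argument is what makes this match the column-stacking definition.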
This formula for the AMISE is due to Wand and Jones.<ref name="WJ1995"/>{{rp|97}} The quality of the AMISE approximation to the MISE is given by
 
<math>\operatorname{MISE} (\bold{H}) = \operatorname{AMISE} (\bold{H}) + o(n^{-1} |\bold{H}|^{-1/2}) + O(\operatorname{tr} \, \bold{H}^2)</math>
 
where <em>o</em> and <em>O</em> denote the usual little-o and [[big O notation]].<ref name="WJ1995"/>{{rp|97}} Heuristically, this statement implies that the AMISE is a 'good' approximation of the MISE as the sample size <em>n</em> → ∞. An ideal optimal bandwidth selector is
 
<math>\bold{H}_{\operatorname{AMISE}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{AMISE} (\bold{H})</math>