==Definition==
The previous figure is a graphical representation of a kernel density estimate, which we now define precisely. Let '''X'''<sub>1</sub>, '''X'''<sub>2</sub>, …, '''X'''<sub>''n''</sub> be a [[random sample|sample]] of ''d''-variate [[random vector]]s drawn from a common distribution described by the [[probability density function|density function]] ''ƒ''. The kernel density estimate is defined to be
: <math> \hat{f}_\bold{H}(\bold{x})= \frac1n \sum_{i=1}^n K_\bold{H} (\bold{x} - \bold{X}_i) </math>
where
* {{nowrap|'''x''' {{=}} (''x''<sub>1</sub>, ''x''<sub>2</sub>, …, ''x<sub>d</sub>'')<sup>''T''</sup>}}, {{nowrap|'''X'''<sub>''i''</sub> {{=}} (''X''<sub>''i''1</sub>, ''X''<sub>''i''2</sub>, …, ''X<sub>id</sub>'')<sup>''T''</sup>, ''i'' {{=}} 1, 2, …, ''n''}} are ''d''-vectors;
* '''H''' is the bandwidth (or smoothing) ''d''×''d'' matrix, which is [[symmetric matrix|symmetric]] and [[positive definite matrix|positive definite]];
* ''K'' is the [[kernel (statistics)|kernel]] function, which is a symmetric multivariate density;
* {{nowrap|''K''<sub>'''H'''</sub>('''x''') {{=}} {{!}}'''H'''{{!}}<sup>−1/2</sup> ''K''('''H'''<sup>−1/2</sup>'''x''')}}.
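The definition can be evaluated directly. The following is a minimal Python sketch, assuming NumPy is available; the helper names <code>inv_sqrt_spd</code>, <code>gaussian_kernel</code> and <code>kde</code> are hypothetical. It takes '''H'''<sup>−1/2</sup> to be the symmetric inverse square root of '''H''' (computed here by eigendecomposition, an assumption of this sketch) and uses the standard Gaussian density as a default kernel. It is meant to illustrate the formula, not to be an efficient implementation.

```python
import numpy as np


def inv_sqrt_spd(H):
    """Symmetric inverse square root H^{-1/2} of a symmetric positive definite matrix."""
    # For symmetric H, eigh gives H = V diag(w) V^T; positive definiteness means w > 0.
    w, V = np.linalg.eigh(H)
    if np.any(w <= 0):
        raise ValueError("H must be positive definite")
    return V @ np.diag(w ** -0.5) @ V.T


def gaussian_kernel(u):
    """Standard d-variate normal density K(u) = (2*pi)^(-d/2) exp(-u^T u / 2), over the last axis."""
    d = u.shape[-1]
    return (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(u * u, axis=-1))


def kde(x, X, H, kernel=gaussian_kernel):
    """Evaluate f_hat_H(x) = (1/n) * sum_i K_H(x - X_i) at each row of x.

    x: (m, d) evaluation points; X: (n, d) sample; H: (d, d) bandwidth matrix.
    The scaled kernel is K_H(u) = |H|^{-1/2} K(H^{-1/2} u).
    """
    x = np.atleast_2d(x)
    X = np.atleast_2d(X)
    H_inv_sqrt = inv_sqrt_spd(H)
    det_factor = np.linalg.det(H) ** -0.5
    # diffs[j, i, :] = x_j - X_i, shape (m, n, d)
    diffs = x[:, None, :] - X[None, :, :]
    # Row-vector product u^T H^{-1/2} equals (H^{-1/2} u)^T because H^{-1/2} is symmetric.
    scaled = diffs @ H_inv_sqrt
    # Average the scaled kernel over the n sample points for each evaluation point.
    return det_factor * kernel(scaled).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal((500, 2))
    H_demo = np.array([[0.3, 0.1], [0.1, 0.2]])
    print(kde(np.array([[0.0, 0.0], [1.0, 1.0]]), sample, H_demo))
```

Because every query point is compared with every sample point, both time and memory grow with the product of the number of evaluation points and the sample size ''n''; this sketch is only practical for moderate sizes.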
The choice of the kernel function ''K'' is not crucial to the accuracy of kernel density estimators, so we use the standard [[multivariate normal distribution|multivariate normal]] (Gaussian) density function as our kernel throughout: <math>K(\bold{x}) = (2\pi)^{-d/2} \exp(-\tfrac{1}{2} \bold{x}^T \bold{x})</math>. In contrast, the choice of the bandwidth matrix '''H''' is the single most important factor affecting accuracy, since it controls both the amount and the orientation of the smoothing induced.<ref name="WJ1995">{{Cite book| author1=Wand, M.P | author2=Jones, M.C. | title=Kernel Smoothing | publisher=Chapman & Hall/CRC | ___location=London | date=1995 | isbn = 0412552701}}</ref>{{rp|36–39}}
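Substituting the Gaussian kernel into the scaled kernel ''K''<sub>'''H'''</sub> makes the role of '''H''' concrete. This short derivation assumes that '''H'''<sup>−1/2</sup> is the symmetric square root of '''H'''<sup>−1</sup>, so that <math>(\bold{H}^{-1/2})^T \bold{H}^{-1/2} = \bold{H}^{-1}</math>:
: <math>K_\bold{H}(\bold{x}) = |\bold{H}|^{-1/2} (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2} (\bold{H}^{-1/2}\bold{x})^T (\bold{H}^{-1/2}\bold{x})\right) = (2\pi)^{-d/2} |\bold{H}|^{-1/2} \exp\left(-\tfrac{1}{2} \bold{x}^T \bold{H}^{-1} \bold{x}\right).</math>
This is the density of a multivariate normal distribution with mean zero and covariance matrix '''H''', so the estimate is an equally weighted mixture of ''n'' normal densities centered at the data points '''X'''<sub>''i''</sub>, each with covariance '''H'''. Scaling '''H''' up widens every component (more smoothing), while the off-diagonal entries of '''H''' rotate the elliptical contours of the components (the orientation of smoothing).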
==Optimal bandwidth matrix selection==