Revision as of 16:33, 24 September 2010, Stpasha: m clean up inline <math> formulas and substitute named html entities using AWB
Revision as of 16:51, 24 September 2010, Stpasha: →Definition: copyedit
Line 15:
==Definition==
The previous figure is a graphical representation of a kernel density estimate, which we now define exactly. Let '''x'''<sub>1</sub>, '''x'''<sub>2</sub>, …, '''x'''<sub>''n''</sub> be a [[random sample|sample]] of ''d''-variate [[random vector]]s drawn from a common distribution described by the [[probability density function|density function]] ''ƒ''. The kernel density estimate is defined to be
: <math>
    \hat{f}_\bold{H}(\bold{x})= \frac1n \sum_{i=1}^n K_\bold{H} (\bold{x} - \bold{x}_i)
  </math>
where
* {{nowrap|'''x''' {{=}} (''x''<sub>1</sub>, ''x''<sub>2</sub>, …, ''x<sub>d</sub>'')<sup>''T''</sup>}}, {{nowrap|'''x'''<sub>''i''</sub> {{=}} (''x''<sub>''i''1</sub>, ''x''<sub>''i''2</sub>, …, ''x<sub>id</sub>'')<sup>''T''</sup>, ''i'' {{=}} 1, 2, …, ''n''}} are ''d''-vectors;
* '''H''' is the bandwidth (or smoothing) ''d×d'' matrix, which is [[symmetric matrix|symmetric]] and [[positive definite matrix|positive definite]];
* ''K'' is the [[kernel (statistics)|kernel]] function, which is a symmetric multivariate density;
* {{nowrap|''K''<sub>'''H'''</sub>('''x''') {{=}} {{!}}'''H'''{{!}}<sup>−1/2</sup> ''K''('''H'''<sup>−1/2</sup>'''x''')}}.
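As an illustration of the definition, the following is a minimal sketch in Python with NumPy, assuming the standard multivariate normal kernel, for which {{nowrap|''K''<sub>'''H'''</sub>('''u''') {{=}} (2''π'')<sup>−''d''/2</sup> {{!}}'''H'''{{!}}<sup>−1/2</sup> exp(−{{frac|1|2}}'''u'''<sup>''T''</sup>'''H'''<sup>−1</sup>'''u''')}}. The function name <code>kde_estimate</code> and its argument names are hypothetical and do not come from any particular library.

```python
import numpy as np

def kde_estimate(points, sample, H):
    """Evaluate (1/n) * sum_i K_H(x - x_i) at each row x of `points`.

    Assumes a standard multivariate normal kernel K (an assumption of
    this sketch) and a symmetric positive definite d x d matrix H.
    `points` has shape (m, d); `sample` has shape (n, d).
    """
    points = np.atleast_2d(points)
    sample = np.atleast_2d(sample)
    d = sample.shape[1]
    H_inv = np.linalg.inv(H)
    # Normalising constant of K_H: (2*pi)^(-d/2) * |H|^(-1/2)
    norm = (2.0 * np.pi) ** (-d / 2.0) * np.linalg.det(H) ** (-0.5)
    # Pairwise differences x - x_i, shape (m, n, d)
    diffs = points[:, None, :] - sample[None, :, :]
    # Quadratic forms (x - x_i)^T H^{-1} (x - x_i), shape (m, n)
    quad = np.einsum("mni,ij,mnj->mn", diffs, H_inv, diffs)
    # Average the kernel values over the n sample points
    return norm * np.exp(-0.5 * quad).mean(axis=1)
```

For a call such as <code>kde_estimate(grid, data, H)</code>, the result holds one density value per row of <code>grid</code>. Inverting '''H''' directly keeps the sketch short; a Cholesky factorisation would be a more numerically careful choice for ill-conditioned bandwidth matrices.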
The choice of the kernel function ''K'' is not crucial to the accuracy of kernel density estimators, so we use the standard [[multivariate normal distribution|multivariate normal]] kernel throughout: {{nowrap|''K''('''x''') {{=}} (2''π'')<sup>−''d''/2</sup> exp(−{{frac|1|2}}'''x'''<sup>''T''</sup>'''x''')}}. In contrast, the choice of the bandwidth matrix '''H''' is the single most important factor affecting the accuracy of the estimator, since it controls the amount and orientation of the smoothing induced.<ref name="WJ1995">{{Cite book| author1=Wand, M.P | author2=Jones, M.C. | title=Kernel Smoothing | publisher=Chapman & Hall/CRC | location=London | date=1995 | isbn = 0412552701}}</ref>{{rp|36–39}}

==Optimal bandwidth matrix selection==

Multivariate kernel density estimation: Difference between revisions