Multivariate kernel density estimation

{{Userspace draft|source=ArticleWizard|date=September 2010}}
 
[[Kernel density estimation]] is one of the most popular techniques for [[density estimation]], i.e., the estimation of [[probability density function|probability density functions]], which is one of the fundamental questions in [[statistics]].
It can be viewed as a generalisation of [[histogram]] density estimation with improved statistical properties.
Kernel density estimators were first introduced in the scientific literature for [[univariate]] data in the 1950s and 1960s<ref>{{cite journal | doi=10.1214/aoms/1177728190 | last=Rosenblatt | first=M.| title=Remarks on some nonparametric estimates of a density function | journal=Annals of Mathematical Statistics | year=1956 | volume=27 | pages=832-837}}</ref><ref>{{cite journal | doi=10.1214/aoms/1177704472| last=Parzen | first=E.| title=On estimation of a probability density function and mode | journal=Annals of Mathematical Statistics| year=1962 | volume=33 | pages=1065-1076}}</ref> and subsequently have been widely adopted. It was soon recognised that analogous estimators for multivariate data would be an important addition to [[multivariate statistics]]. Based on research carried out in the 1990s and 2000s, multivariate kernel density estimation has reached a level of maturity comparable to that of its univariate counterparts.<ref>{{cite book | author=Simonoff, J.S. | title=Smoothing Methods in Statistics | publisher=Springer | date=1996 | isbn=0387947167}}</ref>
<li><math>R(K) = \int K(\bold{x})^2 \, d\bold{x}</math>, with <math>R(K) = (4 \pi)^{-d/2}</math> when <math>K</math> is a normal kernel
<li><math>\int \bold{x} \bold{x}^T K(\bold{x}) \, d\bold{x} = m_2(K) \bold{I}_d</math>,
with <strong>I</strong><sub>d</sub> being the <em>d</em> × <em>d</em> [[identity matrix]], with <em>m</em><sub>2</sub>(<em>K</em>) = 1 for the normal kernel
<li><math>\operatorname{D}^2 f</math> is the <em>d</em> × <em>d</em> Hessian matrix of second-order partial derivatives of <math>f</math>
<li><math>\bold{\Psi}_4 = \int (\operatorname{vec} \, \operatorname{D}^2 f(\bold{x})) (\operatorname{vec}^T \operatorname{D}^2 f(\bold{x})) \, d\bold{x}</math> is a <em>d</em><sup>2</sup> × <em>d</em><sup>2</sup> matrix of integrated fourth-order partial derivatives of <em>f</em>
<li>vec is the vector operator which stacks the columns of a matrix into a single vector, e.g., <math>\operatorname{vec}\begin{bmatrix}a & c \\ b & d\end{bmatrix} = \begin{bmatrix}a & b & c & d\end{bmatrix}^T</math> (a short numerical illustration follows this list).
</ul>
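
The column-stacking convention is easy to check numerically. The following minimal Python sketch (an illustration only, not part of any reference implementation) uses the fact that NumPy's column-major ("Fortran-order") flatten performs exactly this stacking:

<syntaxhighlight lang="python">
import numpy as np

# vec stacks the columns of a matrix into a single vector.
# NumPy's Fortran-order (column-major) flatten does exactly this.
A = np.array([[1, 3],
              [2, 4]])          # columns are (1, 2) and (3, 4)

vec_A = A.flatten(order='F')    # -> array([1, 2, 3, 4])
print(vec_A)
</syntaxhighlight>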
=== Plug-in ===
The plug-in (PI) estimate of the AMISE is formed by substituting for <math>\bold{\Psi}_4</math> its estimator <math>\hat{\bold{\Psi}}_4</math>:
 
<math>\operatorname{PI}(\bold{H}) = n^{-1} |\bold{H}|^{-1/2} R(K) + \tfrac{1}{4} m_2(K)^2
(\operatorname{vec}^T \bold{H}) \hat{\bold{\Psi}}_4 (\bold{G}) (\operatorname{vec} \, \bold{H})</math>
 
where <math>\hat{\bold{\Psi}}_4 (\bold{G}) = n^{-2} \sum_{i=1}^n \sum_{j=1}^n [(\operatorname{vec} \, \operatorname{D}^2) (\operatorname{vec}^T \operatorname{D}^2)] K_\bold{G} (\bold{X}_i - \bold{X}_j)</math>. Thus <math>\hat{\bold{H}}_{\operatorname{PI}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{PI} (\bold{H})</math> is the plug-in selector<ref>{{cite journal | author1=Wand, M.P. | author2=Jones, M.C. | title=Multivariate plug-in bandwidth selection | journal=Computational Statistics | year=1994 | volume=9 | pages=97-177}}</ref><ref>{{cite journal | doi=10.1080/10485250306039 | author1=Duong, T. | author2=Hazelton, M.L. | title=Plug-in bandwidth matrices for bivariate kernel density estimation | journal=Journal of Nonparametric Statistics | year=2003 | volume=15 | pages=17-30}}</ref>. These references also contain algorithms for optimal estimation of the pilot bandwidth matrix <strong>G</strong>.
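
The two displays above translate directly into code. The following Python sketch is illustrative only: it assumes normal kernels throughout (so that <math>R(K) = (4\pi)^{-d/2}</math> and <math>m_2(K) = 1</math>), takes the pilot matrix <strong>G</strong> as given rather than estimating it optimally, and minimises PI numerically over the Cholesky factor of <strong>H</strong>; the names <code>psi4_hat</code>, <code>pi_criterion</code> and <code>plugin_selector</code> are hypothetical, not drawn from an existing library. The fourth-order derivatives of the normal kernel are evaluated with the standard multivariate Hermite-polynomial expression.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def _gauss_fourth_deriv(x, Sinv, phi):
    """d^2 x d^2 matrix of fourth-order partial derivatives of a normal
    kernel at x, via its Hermite-polynomial form; Sinv is the inverse
    bandwidth matrix and phi the kernel value at x."""
    y = Sinv @ x
    d = len(x)
    T4 = np.einsum('a,b,c,d->abcd', y, y, y, y)
    T2 = (np.einsum('a,b,cd->abcd', y, y, Sinv)
          + np.einsum('a,c,bd->abcd', y, y, Sinv)
          + np.einsum('a,d,bc->abcd', y, y, Sinv)
          + np.einsum('b,c,ad->abcd', y, y, Sinv)
          + np.einsum('b,d,ac->abcd', y, y, Sinv)
          + np.einsum('c,d,ab->abcd', y, y, Sinv))
    T0 = (np.einsum('ab,cd->abcd', Sinv, Sinv)
          + np.einsum('ac,bd->abcd', Sinv, Sinv)
          + np.einsum('ad,bc->abcd', Sinv, Sinv))
    # Hessian symmetry makes this reshape equivalent to the
    # (vec D^2)(vec^T D^2) ordering used in the text.
    return (phi * (T4 - T2 + T0)).reshape(d * d, d * d)

def psi4_hat(X, G):
    """Double-sum estimate of Psi_4 with a normal pilot kernel K_G
    (O(n^2) pairwise evaluation, as in the formula above)."""
    n, d = X.shape
    Ginv = np.linalg.inv(G)
    const = (2 * np.pi) ** (-d / 2) / np.sqrt(np.linalg.det(G))
    Psi4 = np.zeros((d * d, d * d))
    for i in range(n):
        for j in range(n):
            x = X[i] - X[j]
            phi = const * np.exp(-0.5 * x @ Ginv @ x)
            Psi4 += _gauss_fourth_deriv(x, Ginv, phi)
    return Psi4 / n**2

def pi_criterion(H, n, d, Psi4):
    """PI(H) for a normal kernel: R(K) = (4 pi)^(-d/2), m2(K) = 1."""
    detH = np.linalg.det(H)
    if detH <= 0:                       # guard: H must be positive definite
        return np.inf
    vecH = H.flatten(order='F')         # vec H
    return (4 * np.pi) ** (-d / 2) / (n * np.sqrt(detH)) \
        + 0.25 * vecH @ Psi4 @ vecH

def plugin_selector(X, G):
    """Minimise PI over positive definite H, parameterised as H = L L^T
    with L lower triangular, so positive definiteness is automatic."""
    n, d = X.shape
    Psi4 = psi4_hat(X, G)
    idx = np.tril_indices(d)

    def objective(theta):
        L = np.zeros((d, d))
        L[idx] = theta
        return pi_criterion(L @ L.T, n, d, Psi4)

    # normal-scale-flavoured starting value (an ad hoc choice)
    L0 = np.std(X, axis=0).mean() * n ** (-1.0 / (d + 4)) * np.eye(d)
    res = minimize(objective, L0[idx], method='Nelder-Mead')
    L = np.zeros((d, d))
    L[idx] = res.x
    return L @ L.T                      # the selected bandwidth matrix H_PI
</syntaxhighlight>

In practice <strong>G</strong> would itself be chosen by the pilot-selection algorithms in the references above; full implementations of these selectors are available, for instance, in the R package <code>ks</code>.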
 
=== Smoothed cross validation ===