Revision as of 11:43, 18 September 2010 by Sphilbrick (talk | contribs): remove template, as it has been reviewed
Revision as of 14:19, 18 September 2010 by Drleft (talk | contribs): (minor edit) Made changes as suggested by reviewers
Line 1:
[[Kernel density estimation]] is a technique for [[nonparametric]] [[density estimation]], i.e., estimation of [[probability density function]]s, which is one of the fundamental questions in [[statistics]]. It can be viewed as a generalisation of [[histogram]] density estimation with improved statistical properties. Apart from histograms, other types of density estimators include [[parametric statistics|parametric]], [[spline interpolation|spline]], [[wavelet]] and [[Fourier series]]. Kernel density estimators were first introduced in the scientific literature for [[univariate]] data in the 1950s and 1960s<ref>{{Cite journal| doi=10.1214/aoms/1177728190 | last=Rosenblatt | first=M.| title=Remarks on some nonparametric estimates of a density function | journal=Annals of Mathematical Statistics | year=1956 | volume=27 | pages=832–837}}</ref><ref>{{Cite journal| doi=10.1214/aoms/1177704472| last=Parzen | first=E.| title=On estimation of a probability density function and mode | journal=Annals of Mathematical Statistics| year=1962 | volume=33 | pages=1065–1076}}</ref> and subsequently have been widely adopted. It was soon recognised that analogous estimators for multivariate data would be an important addition to [[multivariate statistics]]. Based on research carried out in the 1990s and 2000s, multivariate kernel density estimation has reached a level of maturity comparable to its univariate counterpart.<ref>{{Cite book| author=Simonoff, J.S.
| title=Smoothing Methods in Statistics | publisher=Springer | date=1996 | isbn=0387947167}}</ref>

Line 28 ⟶ 29:
==Optimal bandwidth matrix selection==
The most commonly used optimality criterion for selecting a bandwidth matrix is the MISE or [[mean integrated squared error|Mean Integrated Squared Error]]
: <math>\operatorname{MISE} (\bold{H}) = E \int [\hat{f}_\bold{H} (\bold{x}) - f(\bold{x})]^2 \, d\bold{x} .</math>
This in general does not possess a closed-form expression, so it is usual to use its asymptotic approximation (AMISE) as a proxy
: <math>\operatorname{AMISE} (\bold{H}) = n^{-1} |\bold{H}|^{-1/2} R(K) + \tfrac{1}{4} m_2(K)^2 (\operatorname{vec}^T \bold{H}) \bold{\Psi}_4 (\operatorname{vec} \, \bold{H})</math>

Line 49 ⟶ 50:
The quality of the AMISE approximation to the MISE<ref name="WJ1995"/>{{rp|97}} is given by
: <math>\operatorname{MISE} (\bold{H}) = \operatorname{AMISE} (\bold{H}) + o(n^{-1} |\bold{H}|^{-1/2} + \operatorname{tr} \, \bold{H}^2)</math>
where ''o'' indicates the usual [[big O notation|small o notation]]. Heuristically this statement implies that the AMISE is a 'good' approximation of the MISE as the sample size <em>n → ∞</em>. An ideal optimal bandwidth selector is

Line 59 ⟶ 60:
===Plug-in===
The plug-in (PI) estimate of the AMISE is formed by replacing <math>\bold{\Psi}_4</math> by its estimator <math>\hat{\bold{\Psi}}_4</math>
: <math>\operatorname{PI}(\bold{H}) = n^{-1} |\bold{H}|^{-1/2} R(K) + \tfrac{1}{4} m_2(K)^2 (\operatorname{vec}^T \bold{H}) \hat{\bold{\Psi}}_4 (\operatorname{vec} \, \bold{H})</math>
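
The AMISE criterion above can be evaluated numerically once its ingredients are fixed. The following is a minimal sketch under several assumptions that are not stated in this excerpt: the kernel ''K'' is the standard bivariate normal density, so that ''R''(''K'') = (4π)<sup>−1</sup> and ''m''<sub>2</sub>(''K'') = 1; '''Ψ'''<sub>4</sub> is taken to be the integral of (vec D²''f'')(vec<sup>T</sup> D²''f''), the usual definition in the literature; the target density ''f'' is a standard bivariate normal, for which '''Ψ'''<sub>4</sub> has the closed form used below; and the bandwidth matrix is restricted to '''H''' = ''h''²'''I'''. All function and variable names are hypothetical, chosen for illustration only.

```python
import numpy as np


def gaussian_kernel_constants(d):
    """Return (R(K), m_2(K)) for the standard d-variate normal kernel."""
    return (4.0 * np.pi) ** (-d / 2.0), 1.0


def amise(H, n, psi4, R_K, m2_K):
    """AMISE(H) = n^-1 |H|^-1/2 R(K) + 1/4 m_2(K)^2 (vec^T H) Psi_4 (vec H)."""
    vec_H = H.reshape(-1, order="F")  # stack the columns of H into a vector
    variance_term = R_K / (n * np.sqrt(np.linalg.det(H)))
    bias_term = 0.25 * m2_K**2 * (vec_H @ psi4 @ vec_H)
    return variance_term + bias_term


# Assumption: Psi_4 for a standard bivariate normal target density,
# with vec ordering (11, 21, 12, 22). Used here only as a test case.
PSI4_STD_NORMAL_2D = np.array(
    [
        [3.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [1.0, 0.0, 0.0, 3.0],
    ]
) / (16.0 * np.pi)


def best_scalar_bandwidth(n, grid):
    """Grid search for h minimising AMISE(h^2 I) in the assumed test case."""
    R_K, m2_K = gaussian_kernel_constants(2)
    values = [
        amise(h**2 * np.eye(2), n, PSI4_STD_NORMAL_2D, R_K, m2_K) for h in grid
    ]
    return grid[int(np.argmin(values))]


n = 1000
grid = np.linspace(0.05, 1.0, 2000)
h_best = best_scalar_bandwidth(n, grid)
print(h_best, n ** (-1.0 / 6.0))
```

In this restricted setting the AMISE reduces to 1/(4π''nh''²) + ''h''<sup>4</sup>/(8π), so the grid minimiser should land close to ''n''<sup>−1/6</sup>. A plug-in selector would follow the same pattern with '''Ψ'''<sub>4</sub> replaced by an estimate computed from the data; that estimation step is not sketched here.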

Multivariate kernel density estimation: Difference between revisions