Sphilbrick (talk | contribs) remove template, as it has been reviewed |
m Made changes as suggested by reviewers |
Line 1:
[[Kernel density estimation]] is
It can be viewed as a generalisation of [[histogram]] density estimation with improved statistical properties.
Apart from histograms, other types of density estimators include [[parametric statistics|parametric]], [[spline interpolation|spline]], [[wavelet]] and [[Fourier series]].
Kernel density estimators were first introduced in the scientific literature for [[univariate]] data in the 1950s and 1960s<ref>{{Cite journal| doi=10.1214/aoms/1177728190 | last=Rosenblatt | first=M.| title=Remarks on some nonparametric estimates of a density function | journal=Annals of Mathematical Statistics | year=1956 | volume=27 | pages=832–837}}</ref><ref>{{Cite journal| doi=10.1214/aoms/1177704472| last=Parzen | first=E.| title=On estimation of a probability density function and mode | journal=Annals of Mathematical Statistics| year=1962 | volume=33 | pages=1065–1076}}</ref> and have subsequently been widely adopted. It was soon recognised that analogous estimators for multivariate data would be an important addition to [[multivariate statistics]]. Based on research carried out in the 1990s and 2000s, multivariate kernel density estimation has reached a level of maturity comparable to that of its univariate counterpart.<ref>{{Cite book| author=Simonoff, J.S. | title=Smoothing Methods in Statistics | publisher=Springer | date=1996 | isbn=0387947167}}</ref>
Line 28 ⟶ 29:
==Optimal bandwidth matrix selection==
The most commonly used optimality criterion for selecting a bandwidth matrix is the [[mean integrated squared error]] (MISE)
: <math>\operatorname{MISE} (\bold{H}) = E \int [\hat{f}_\bold{H} (\bold{x}) - f(\bold{x})]^2 \, d\bold{x} .</math>
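As a rough numerical illustration of this definition, the MISE can be approximated by simulation: draw repeated samples, compute the integrated squared error of each estimate on a grid, and average the results. The sketch below assumes a Gaussian kernel, a standard bivariate normal target density, and a Riemann sum over the square [-5, 5] × [-5, 5]. The function names, grid size and replicate count are illustrative choices and are not taken from the cited literature.

```python
import numpy as np


def gaussian_kde_on_grid(sample, H, grid):
    """Evaluate a Gaussian-kernel density estimate with bandwidth matrix H.

    sample: (n, d) array of observations; grid: (m, d) array of evaluation points.
    """
    n, d = sample.shape
    H_inv = np.linalg.inv(H)
    # Normalising constant of n^{-1} |H|^{-1/2} K(H^{-1/2} x) for the normal kernel K.
    norm_const = 1.0 / (n * np.sqrt((2 * np.pi) ** d * np.linalg.det(H)))
    diff = grid[:, None, :] - sample[None, :, :]            # shape (m, n, d)
    quad = np.einsum("mni,ij,mnj->mn", diff, H_inv, diff)   # (x - X_i)^T H^{-1} (x - X_i)
    return norm_const * np.exp(-0.5 * quad).sum(axis=1)


def monte_carlo_mise(n, H, replicates=20, seed=0):
    """Approximate MISE(H) for a standard bivariate normal target (an assumed example)."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(-5.0, 5.0, 61)
    cell_area = (axis[1] - axis[0]) ** 2
    gx, gy = np.meshgrid(axis, axis)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    true_f = np.exp(-0.5 * (grid ** 2).sum(axis=1)) / (2 * np.pi)

    ise_values = []
    for _ in range(replicates):
        sample = rng.standard_normal((n, 2))
        f_hat = gaussian_kde_on_grid(sample, H, grid)
        # Riemann-sum approximation of the integrated squared error.
        ise_values.append(((f_hat - true_f) ** 2).sum() * cell_area)
    # Averaging over replicates approximates the expectation E[...] in the MISE.
    return float(np.mean(ise_values))


H = 0.25 * np.eye(2)
mise_n50 = monte_carlo_mise(50, H)
mise_n400 = monte_carlo_mise(400, H)
```

With the bandwidth matrix held fixed, the estimate for the larger sample should be smaller, because only the variance part of the error shrinks with ''n''.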
The MISE does not in general possess a closed-form expression, so it is usual to use its asymptotic approximation (AMISE) as a proxy
: <math>\operatorname{AMISE} (\bold{H}) = n^{-1} |\bold{H}|^{-1/2} R(K) + \tfrac{1}{4} m_2(K)^2
(\operatorname{vec}^T \bold{H}) \bold{\Psi}_4 (\operatorname{vec} \bold{H})</math>
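The AMISE formula can be evaluated directly once its ingredients are known. The following is a minimal sketch, assuming a standard normal kernel, for which ''R''(''K'') = (4π)<sup>−''d''/2</sup> and ''m''<sub>2</sub>(''K'') = 1, and assuming that '''Ψ'''<sub>4</sub> is supplied by the caller as the ''d''<sup>2</sup> × ''d''<sup>2</sup> matrix of integrated products of second partial derivatives of ''f''. The function name and the one-dimensional check are illustrative only.

```python
import numpy as np


def amise_gaussian_kernel(H, n, psi4):
    """Evaluate AMISE(H) for a standard normal kernel (assumed here).

    H: (d, d) symmetric positive definite bandwidth matrix.
    n: sample size.
    psi4: (d*d, d*d) matrix Psi_4, supplied by the caller.
    """
    H = np.atleast_2d(np.asarray(H, dtype=float))
    psi4 = np.atleast_2d(np.asarray(psi4, dtype=float))
    d = H.shape[0]
    roughness = (4 * np.pi) ** (-d / 2)   # R(K) for the normal kernel
    m2 = 1.0                              # m_2(K) for the normal kernel
    vec_H = H.reshape(-1, order="F")      # vec stacks the columns of H
    variance_term = roughness / (n * np.sqrt(np.linalg.det(H)))
    bias_term = 0.25 * m2 ** 2 * (vec_H @ psi4 @ vec_H)
    return float(variance_term + bias_term)


# One-dimensional sanity check with a standard normal target density,
# for which Psi_4 reduces to the integral of f''(x)^2 = 3 / (8 sqrt(pi)).
n = 100
psi4_normal_1d = np.array([[3.0 / (8.0 * np.sqrt(np.pi))]])
h_opt = (4.0 / (3.0 * n)) ** 0.2          # minimiser of the 1-D AMISE in this setting
amise_at_opt = amise_gaussian_kernel([[h_opt ** 2]], n, psi4_normal_1d)
amise_smaller_h = amise_gaussian_kernel([[(0.8 * h_opt) ** 2]], n, psi4_normal_1d)
amise_larger_h = amise_gaussian_kernel([[(1.2 * h_opt) ** 2]], n, psi4_normal_1d)
```

In this one-dimensional case '''H''' = ''h''<sup>2</sup>, and the AMISE is minimised at ''h'' = (4/(3''n''))<sup>1/5</sup>, so the value at <code>h_opt</code> should be below the values at the two perturbed bandwidths.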
Line 49 ⟶ 50:
The quality of the AMISE approximation to the MISE<ref name="WJ1995"/>{{rp|97}} is given by
: <math>\operatorname{MISE} (\bold{H}) = \operatorname{AMISE} (\bold{H}) + o(n^{-1} |\bold{H}|^{-1/2} + \operatorname{tr} \, \bold{H}^2)</math>
where ''o'' indicates the usual [[big O notation|small o notation]]. Heuristically this statement implies that the AMISE is a 'good' approximation of the MISE as the sample size <em>n → ∞</em>. An ideal optimal bandwidth selector is
Line 59 ⟶ 60:
===Plug-in===
The plug-in (PI) estimate of the AMISE is formed by replacing <math>\bold{\Psi}_4</math> by its estimator <math>\hat{\bold{\Psi}}_4</math>
: <math>\operatorname{PI}(\bold{H}) = n^{-1} |\bold{H}|^{-1/2} R(K) + \tfrac{1}{4} m_2(K)^2
(\operatorname{vec}^T \bold{H}) \hat{\bold{\Psi}}_4 (\operatorname{vec} \bold{H})</math>