Line 1:
{{Userspace draft|source=ArticleWizard|date=September 2010}}
[[Kernel density estimation]] is one of the most popular techniques for [[density estimation]], i.e., the estimation of [[probability density function|probability density functions]], which is one of the fundamental questions in [[statistics]]. It can be viewed as a generalisation of [[histogram]] density estimation with improved statistical properties.
Kernel density estimators were first introduced in the scientific literature for [[univariate]] data in the 1950s and 1960s<ref>{{cite journal | doi=10.1214/aoms/1177728190 | last=Rosenblatt | first=M.| title=Remarks on some nonparametric estimates of a density function | journal=Annals of Mathematical Statistics | year=1956 | volume=27 | pages=832-837}}</ref><ref>{{cite journal | doi=10.1214/aoms/1177704472| last=Parzen | first=E.| title=On estimation of a probability density function and mode | journal=Annals of Mathematical Statistics| year=1962 | volume=33 | pages=1065-1076}}</ref> and have subsequently been widely adopted. It was soon recognised that analogous estimators for multivariate data would be an important addition to [[multivariate statistics]]. Based on research carried out in the 1990s and 2000s, multivariate kernel density estimation has reached a level of maturity comparable to that of its univariate counterpart.
Line 18 ⟶ 15:
Let <math>\bold{X}_1, \bold{X}_2, \dots, \bold{X}_n</math> be a <em>d</em>-variate random sample drawn from a common density function <em>f</em>. The kernel density estimate is defined to be
<math>\hat{f}_\bold{H}(\bold{x})= n^{-1} |\bold{H}|^{-1/2} \sum_{i=1}^n K_\bold{H} (\bold{x} - \bold{X}_i)</math>
where
Line 39 ⟶ 36:
<math>\operatorname{AMISE} (\bold{H}) = n^{-1} |\bold{H}|^{-1/2} R(K) + \tfrac{1}{4} m_2(K)^2 (\operatorname{vec}^T \bold{H}) \bold{\Psi}_4 (\operatorname{vec} \, \bold{H})</math>
where
<ul>
<li><math>R(K) = \int K(\bold{x})^2 \, d\bold{x}</math>
<li><math>\int \bold{x} \bold{x}^T K(\bold{x}) \, d\bold{x} = m_2(K) \bold{I}_d</math>
with <math>\bold{I}_d</math> being the <em>d × d</em> identity matrix
<li><math>\operatorname{D}^2 f</math> is the <em>d × d</em> Hessian matrix of second order partial derivatives of <math>f</math>
<li><math>\bold{\Psi}_4 = \int (\operatorname{vec} \, \operatorname{D}^2 f(\bold{x})) (\operatorname{vec}^T \operatorname{D}^2 f(\bold{x})) \, d\bold{x}</math>
<li>vec is the vector operator which stacks the columns of a matrix into a single vector e.g. <math>\operatorname{vec}\begin{bmatrix}a & c \\ b & d\end{bmatrix} = \begin{bmatrix}a & b & c & d\end{bmatrix}^T.</math>
</ul>
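As a minimal sketch of how these quantities combine, the following Python code evaluates the AMISE in its standard form, with variance term <math>n^{-1} |\bold{H}|^{-1/2} R(K)</math> and squared-bias term <math>\tfrac{1}{4} m_2(K)^2 (\operatorname{vec}^T \bold{H}) \bold{\Psi}_4 (\operatorname{vec} \, \bold{H})</math>. It assumes a normal kernel, for which <math>R(K) = (4\pi)^{-d/2}</math> and <math>m_2(K) = 1</math>. The function names are hypothetical, and the matrix used for <math>\bold{\Psi}_4</math> is an arbitrary placeholder, since the true value depends on the unknown density <em>f</em>.

```python
import math

def vec(matrix):
    """Stack the columns of a square matrix (list of rows) into one flat list."""
    d = len(matrix)
    return [matrix[i][j] for j in range(d) for i in range(d)]

def det(matrix):
    """Determinant by Laplace expansion along the first row (adequate for small d)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for j, a in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def amise(H, psi4, n, R_K, m2_K=1.0):
    """AMISE(H) = n^-1 |H|^-1/2 R(K) + 1/4 m2(K)^2 vec(H)^T Psi4 vec(H)."""
    variance_term = R_K / (n * math.sqrt(det(H)))
    v = vec(H)
    quad = sum(v[i] * psi4[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))
    return variance_term + 0.25 * m2_K ** 2 * quad

d = 2
R_normal = (4 * math.pi) ** (-d / 2)   # R(K) for the d-variate normal kernel
H = [[1.0, 0.0], [0.0, 1.0]]           # bandwidth matrix: symmetric, positive definite
# Placeholder for Psi_4 (a d^2 x d^2 matrix); NOT derived from any real density.
psi4 = [[1.0 if i == j else 0.0 for j in range(d * d)] for i in range(d * d)]
value = amise(H, psi4, n=100, R_K=R_normal)
```

The first term shrinks as <em>n</em> grows or as the bandwidth matrix gets larger, while the second grows with the bandwidth, which is the usual variance and bias trade-off.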
This formula for the AMISE is due to<ref name="WJ1995"/> (p. 97). The quality of the AMISE approximation to the MISE is given by
Line 57 ⟶ 56:
<math>\bold{H}_{\operatorname{AMISE}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{AMISE} (\bold{H})</math>
where <em>F</em> is the space of all symmetric, positive definite matrices.
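To illustrate the minimisation, the following Python sketch restricts the search to the subclass of <em>F</em> consisting of matrices <math>\bold{H} = h \bold{I}_d</math> with <math>h > 0</math>, which is a simplifying assumption and not the general selector. It also assumes a normal kernel and a known constant <math>c = (\operatorname{vec}^T \bold{I}_d) \bold{\Psi}_4 (\operatorname{vec} \, \bold{I}_d)</math>; the value of <em>c</em> and the function names are hypothetical. Under these assumptions the AMISE reduces to <math>n^{-1} h^{-d/2} R(K) + \tfrac{1}{4} m_2(K)^2 c h^2</math>, whose minimiser has a closed form that a grid search can be compared against.

```python
import math

def scalar_amise(h, n, d, c, R_K, m2_K=1.0):
    """AMISE restricted to H = h * I_d, with c = vec(I_d)^T Psi4 vec(I_d)."""
    return R_K / (n * h ** (d / 2)) + 0.25 * m2_K ** 2 * c * h ** 2

def grid_search_bandwidth(n, d, c, R_K, grid):
    """Return the grid value of h with the smallest restricted AMISE."""
    return min(grid, key=lambda h: scalar_amise(h, n, d, c, R_K))

d, n = 2, 100
c = 2.0                                   # hypothetical constant, not from real data
R_normal = (4 * math.pi) ** (-d / 2)      # R(K) for the d-variate normal kernel
grid = [k / 1000 for k in range(1, 1001)] # candidate values 0.001, ..., 1.000

h_grid = grid_search_bandwidth(n, d, c, R_normal, grid)
# Closed-form minimiser of the restricted AMISE when m2(K) = 1.
h_exact = (d * R_normal / (n * c)) ** (2 / (d + 4))
```

In practice <em>c</em> is unknown because it depends on <em>f</em>, so this search cannot be carried out directly on real data.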
Since this ideal selector contains the unknown density function
=== Plug-in ===
Line 82 ⟶ 81:
waiting time until the next eruption (minutes), and is contained in the base distribution of R.
This code
<pre>
Line 100 ⟶ 99:
== External links ==
* [http://www.mvstat.net/tduong/research www.mvstat.net/tduong/research]
== See also ==
* [[Kernel density estimation]]: univariate kernel density estimation