Multivariate kernel density estimation: Difference between revisions

[[Kernel density estimation]] is one of the most popular techniques for density estimation. It can be viewed as a generalisation of [[histogram]] density estimation with improved statistical properties.
Kernel density estimators were first introduced in the scientific literature for [[univariate]] data in the 1950s and 1960s<ref>{{cite journal|doi=10.1214/aoms/1177728190|last=Rosenblatt|first=M.|title=Remarks on some nonparametric estimates of a density function |url=http://projecteuclid.org/euclid.aoms/1177728190|journal=[[Annals of Mathematical Statistics]]|year=1956|volume=27|pages=832-837}}</ref><ref>{{cite journal|doi=10.1214/aoms/1177704472|last=Parzen|first=E.|title=On estimation of a probability density function and mode|url=http://projecteuclid.org/euclid.aoms/1177704472|journal=[[Annals of Mathematical Statistics]]|year=1962|volume=33|pages=1065-1076}}</ref> and have subsequently been widely adopted. It was soon recognised that analogous estimators for multivariate data would be an important addition to [[multivariate statistics]]. Based on research carried out in the 1990s and 2000s, multivariate kernel density estimation has reached a level of maturity comparable to that of its univariate counterpart.
 
 
== Motivation ==
To motivate the definition of multivariate kernel density estimators, we consider an illustrative [[bivariate]] data set drawn from ....
 
Problems with bivariate histograms.
 
 
== Definition ==
Let <math>\bold{X}_1, \bold{X}_2, \dots, \bold{X}_n</math> be a <em>d</em>-variate random sample drawn from a common density function <em>f</em>. The kernel density estimate is defined to be
 
<math>\widehat{f}_\bold{H}(\bold{x})= n^{-1} \sum_{i=1}^n K_\bold{H} (\bold{x} - \bold{X}_i)</math>
 
where
<ul>
<li><math>\bold{x} = (x_1, x_2, \dots, x_d)^T</math>, <math>\bold{X}_i = (X_{i1}, X_{i2}, \dots, X_{id})^T, i=1, 2, \dots, n.</math></li>
<li><em>K</em> is the kernel function which is a symmetric density function, with <math>K_\bold{H}(\bold{x}) = |\bold{H}|^{-1/2} K(\bold{H}^{-1/2} \bold{x})</math></li>
<li><strong>H</strong> is the bandwidth (or smoothing) matrix which is a symmetric, [[positive definite matrix|positive definite]] <em>d</em> × <em>d</em> matrix.</li>
</ul>
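
As an illustration, the definition can be evaluated numerically. The following is a minimal sketch rather than a reference implementation: it assumes a standard normal kernel for <em>K</em> (the definition only requires a symmetric density), computes the estimate as the average of the scaled kernels <math>K_\bold{H}(\bold{x} - \bold{X}_i)</math> with <math>K_\bold{H}</math> as given in the list above, and uses the hypothetical function names <code>gaussian_kernel</code> and <code>kde</code>. The sample and the bandwidth matrix in the usage lines are arbitrary choices made for the example.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard d-variate normal density evaluated at each row of u.

    Assumption: the normal kernel is one possible choice of K.
    """
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u * u, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)


def kde(x, data, H):
    """Evaluate the kernel density estimate at a single point x.

    x    : array-like of shape (d,)
    data : array-like of shape (n, d), one observation X_i per row
    H    : symmetric positive definite bandwidth matrix, shape (d, d)
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    H = np.asarray(H, dtype=float)

    # H^{-1/2} and |H| from the eigendecomposition of the symmetric matrix H.
    eigvals, eigvecs = np.linalg.eigh(H)
    H_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    det_H = np.prod(eigvals)

    # Row i of u is H^{-1/2} (x - X_i); H^{-1/2} is symmetric, so
    # right-multiplying the row vectors gives the same result.
    u = (x - data) @ H_inv_sqrt

    # K_H(x - X_i) = |H|^{-1/2} K(H^{-1/2} (x - X_i)), averaged over i.
    return np.mean(gaussian_kernel(u)) / np.sqrt(det_H)


# Usage on an arbitrary simulated bivariate sample.
rng = np.random.default_rng(0)
sample = rng.standard_normal((200, 2))
H = np.array([[0.5, 0.1], [0.1, 0.4]])
print(kde([0.0, 0.0], sample, H))
```

With a single observation at the origin and <strong>H</strong> equal to the identity matrix, the sketch returns the height of the standard bivariate normal density at its mode, 1/(2π), which gives a quick sanity check of the scaling.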
 
== References ==