Multivariate kernel density estimation: Difference between revisions

Adding short description: "Concept in statistics mathematics"
 
(One intermediate revision by one other user not shown)
Line 154:
 
<syntaxhighlight lang="matlab" style="overflow:auto;">
clear all
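% generate synthetic data: two bivariate normal clusters of 500 points each,
% the second shifted by 3.5 along the first coordinate
data = [randn(500, 2);
        randn(500, 1) + 3.5, randn(500, 1)];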
% call the routine, which has been saved in the current directory
[bandwidth, density, X, Y] = kde2d(data);
% plot the data and the density estimate
contour3(X, Y, density, 50), hold on
plot(data(:,1), data(:,2), 'r.', 'MarkerSize', 5)
</syntaxhighlight>
 
Line 197:
where ''N'' is the number of data points, ''d'' is the number of dimensions (variables), and <math>I_{\vec{A}}(\vec{t})</math> is a filter that equals 1 for 'accepted frequencies' and 0 otherwise. There are various ways to define this filter function. A simple one that works for univariate or multivariate samples is the 'lowest contiguous hypervolume filter': <math>I_{\vec{A}}(\vec{t})</math> is chosen such that the only accepted frequencies are a contiguous subset of frequencies surrounding the origin for which <math>|\hat{\varphi}(\vec{t})|^2 \ge 4(N-1)N^{-2}</math> (see <ref name=":22"/> for a discussion of this and other filter functions).
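For the univariate case, the acceptance rule can be illustrated with a minimal Python sketch. It is not the fastKDE implementation: the function names <code>ecf</code> and <code>lowest_contiguous_filter</code>, the frequency grid, and the sample size are assumptions chosen for illustration, and the ECF is taken here as the sample mean of <math>e^{i t x_j}</math>. The filter walks outward from the origin in both directions and stops at the first frequency whose squared ECF magnitude falls below <math>4(N-1)N^{-2}</math>.

```python
import cmath
import random


def ecf(samples, t):
    """Empirical characteristic function of 1-D samples at frequency t
    (assumed convention: mean of exp(i * t * x_j))."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


def lowest_contiguous_filter(samples, freqs):
    """Return a list of 0/1 filter values, one per entry of `freqs`.

    `freqs` must be sorted and must contain 0.0. A frequency is accepted
    only if it and every grid frequency between it and the origin satisfy
    |ecf(t)|^2 >= 4 (N - 1) / N^2.
    """
    n = len(samples)
    threshold = 4.0 * (n - 1) / n**2
    above = [abs(ecf(samples, t)) ** 2 >= threshold for t in freqs]

    origin = freqs.index(0.0)
    accepted = [0] * len(freqs)

    # Walk towards positive frequencies until the threshold first fails.
    i = origin
    while i < len(freqs) and above[i]:
        accepted[i] = 1
        i += 1

    # Walk towards negative frequencies in the same way.
    i = origin - 1
    while i >= 0 and above[i]:
        accepted[i] = 1
        i -= 1

    return accepted


# Illustrative data: 500 standard normal samples, frequencies in [-10, 10].
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(500)]
freqs = [k * 0.25 for k in range(-40, 41)]
filt = lowest_contiguous_filter(samples, freqs)
accepted_freqs = [t for t, keep in zip(freqs, filt) if keep]
```

The origin is always accepted, because the ECF equals 1 there and <math>4(N-1)N^{-2} \le 1</math> for every <math>N \ge 1</math>. Isolated frequencies far from the origin that exceed the threshold by chance are still rejected, since they are not connected to the origin through accepted frequencies.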
 
Direct calculation of the ''empirical characteristic function'' (ECF) is slow, since it essentially involves a direct Fourier transform of the data samples. However, the ECF can be approximated accurately using a [[Non-uniform discrete Fourier transform|non-uniform fast Fourier transform]] (nuFFT) method,<ref name=":1" /><ref name=":22"/> which increases the calculation speed by several orders of magnitude (depending on the dimensionality of the problem). The combination of this objective KDE method and the nuFFT-based ECF approximation has been referred to as ''[https://github.com/LBL-EESA/fastkde fastKDE]'' in the literature.<ref name=":22"/>
[[File:FastKDE_example.jpg|alt=A demonstration of fastKDE relative to a sample PDF. (a) True PDF, (b) a good representation with fastKDE, and (c) a slightly blurry representation.|none|thumb|664x664px|A non-trivial mixture of normal distributions: (a) the underlying PDF, (b) a fastKDE estimate on 1,000,000 samples, and (c) a fastKDE estimate on 10,000 samples.]]
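To see why direct calculation of the ECF is slow, consider the following Python sketch of the direct approach on a regular frequency grid. The helper names <code>ecf_direct</code> and <code>ecf_on_grid</code>, the grid, and the sample size are hypothetical, and the ECF is again assumed to be the sample mean of <math>e^{i \vec{t} \cdot \vec{x}_j}</math>. Every grid point requires one complex exponential per sample, so a grid with ''M'' points per axis in ''d'' dimensions costs <math>N M^d</math> exponentials. This sketch shows only the direct calculation and does not implement the nuFFT approximation.

```python
import cmath
import itertools
import random


def ecf_direct(samples, t):
    """ECF of d-dimensional samples at one frequency vector t
    (assumed convention: mean of exp(i * t . x_j))."""
    total = 0j
    for x in samples:
        phase = sum(tk * xk for tk, xk in zip(t, x))
        total += cmath.exp(1j * phase)
    return total / len(samples)


def ecf_on_grid(samples, axis_freqs):
    """Evaluate the ECF at every point of a regular d-dimensional grid.

    Returns a dict mapping each frequency tuple to its ECF value.
    Cost: len(samples) * len(axis_freqs) ** d complex exponentials.
    """
    d = len(samples[0])
    return {
        t: ecf_direct(samples, t)
        for t in itertools.product(axis_freqs, repeat=d)
    }


# Illustrative data: 200 bivariate standard normal samples, 5 x 5 grid.
random.seed(1)
samples = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200)]
axis_freqs = [-1.0, -0.5, 0.0, 0.5, 1.0]
grid = ecf_on_grid(samples, axis_freqs)

# Number of complex exponentials evaluated by the direct approach.
n_exponentials = len(samples) * len(grid)
```

Even this small example evaluates 200 × 25 = 5000 exponentials. The count grows linearly with the number of samples and exponentially with the number of dimensions, which is the cost the nuFFT-based approximation is meant to avoid.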