Multivariate kernel density estimation: Difference between revisions

Adding short description: "Concept in statistics mathematics"
Update link to fastKDE repository, which moved to GitHub
Line 197:
where ''N'' is the number of data points, ''d'' is the number of dimensions (variables), and <math>I_{\vec{A}}(\vec{t})</math> is a filter that is equal to 1 for 'accepted frequencies' and 0 otherwise. There are various ways to define this filter function. A simple one that works for univariate or multivariate samples is the 'lowest contiguous hypervolume filter', in which <math>I_{\vec{A}}(\vec{t})</math> is chosen such that the only accepted frequencies are a contiguous subset of frequencies surrounding the origin for which <math>|\hat{\varphi}(\vec{t})|^2 \ge 4(N-1)N^{-2}</math> (see <ref name=":22"/> for a discussion of this and other filter functions).
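
The following is a minimal univariate sketch of such a filter, not the reference implementation. It assumes the squared ECF magnitude has already been evaluated on an ordered frequency grid; the function names and the <code>origin</code> index argument are hypothetical and chosen only for illustration. A multivariate version would need a connected-region search around the origin rather than the two one-sided scans used here.

```python
import numpy as np


def ecf_threshold(n):
    """Stability threshold 4(N-1)/N^2 on the squared ECF magnitude."""
    return 4.0 * (n - 1) / n**2


def contiguous_filter_1d(ecf_sq, origin, n):
    """Illustrative 1-D 'lowest contiguous' filter (hypothetical helper).

    ecf_sq : squared ECF magnitudes on an ordered frequency grid
    origin : index of the zero frequency in that grid
    n      : number of data points N
    Returns a boolean mask that is True only on the contiguous run of
    above-threshold frequencies that contains the origin.
    """
    ecf_sq = np.asarray(ecf_sq, dtype=float)
    above = ecf_sq >= ecf_threshold(n)
    accepted = np.zeros(ecf_sq.shape, dtype=bool)
    if not above[origin]:
        return accepted  # nothing is accepted if the origin itself fails

    # Walk outwards from the origin until the threshold is first violated.
    lo = origin
    while lo - 1 >= 0 and above[lo - 1]:
        lo -= 1
    hi = origin
    while hi + 1 < len(above) and above[hi + 1]:
        hi += 1

    accepted[lo:hi + 1] = True
    return accepted
```

Frequencies that exceed the threshold but are separated from the origin by a below-threshold gap are rejected, which is what distinguishes this filter from a plain threshold test.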
 
Note that direct calculation of the ''empirical characteristic function'' (ECF) is slow, since it essentially involves a direct Fourier transform of the data samples. However, it has been found that the ECF can be approximated accurately using a [[Non-uniform discrete Fourier transform|non-uniform fast Fourier transform]] (nuFFT) method,<ref name=":1" /><ref name=":22"/> which increases the calculation speed by several orders of magnitude (depending on the dimensionality of the problem). The combination of this objective KDE method and the nuFFT-based ECF approximation has been referred to as ''[https://github.com/LBL-EESA/fastkde fastKDE]'' in the literature.<ref name=":22"/>
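
To illustrate why the direct calculation is slow, the sketch below evaluates the ECF by brute force using the standard definition <math>\hat{\varphi}(\vec{t}) = N^{-1}\sum_{j=1}^{N} e^{i\vec{t}\cdot\vec{x}_j}</math>. It is not the fastKDE code and does not use a nuFFT; the function name <code>direct_ecf</code> and the array layout are assumptions made for this example. The cost grows as the number of samples times the number of frequency points, which is the product a nuFFT-based approximation avoids.

```python
import numpy as np


def direct_ecf(samples, freqs):
    """Brute-force empirical characteristic function (illustrative only).

    samples : array of shape (N, d), one row per data point
    freqs   : array of shape (M, d), one row per frequency vector t
    Returns a complex array of shape (M,) holding the mean over the
    N samples of exp(i * t . x_j) for each frequency vector.
    """
    samples = np.asarray(samples, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # phase[m, j] = t_m . x_j, so this step alone needs O(N * M * d) work
    phase = freqs @ samples.T
    return np.exp(1j * phase).mean(axis=1)
```

For example, <code>direct_ecf(x, t)</code> with <code>x</code> of shape (1000000, 2) and <code>t</code> holding a 256 × 256 grid of frequencies would require on the order of 10<sup>10</sup> complex exponentials, so in practice the frequency grid would have to be processed in chunks or replaced by an approximate transform.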
[[File:FastKDE_example.jpg|alt=A demonstration of fastKDE relative to a sample PDF. (a) True PDF, (b) a good representation with fastKDE, and (c) a slightly blurry representation.|none|thumb|664x664px|A non-trivial mixture of normal distributions: (a) the underlying PDF, (b) a fastKDE estimate on 1,000,000 samples, and (c) a fastKDE estimate on 10,000 samples.]]