{{Short description|Estimator}}
[[File:Kernel density.svg|thumb|right|250px|Kernel density estimation of 100 [[normal distribution|normally distributed]] [[random number generator|random numbers]] using different smoothing bandwidths.]]
In [[statistics]], '''kernel density estimation''' ('''KDE''') is the application of [[kernel smoothing]] for [[probability density estimation]], i.e., a [[non-parametric statistics|non-parametric]] method to [[Estimation|estimate]] the [[probability density function]] of a [[random variable]] based on ''[[Kernel (statistics)|kernels]]'' as [[weight function|weights]]. KDE answers a fundamental data smoothing problem where inferences about the [[statistical population|population]] are made based on a finite data [[statistical sample|sample]]. In some fields such as [[signal processing]] and [[econometrics]] it is also termed the '''Parzen–Rosenblatt window''' method, after [[Emanuel Parzen]] and [[Murray Rosenblatt]], who are usually credited with independently creating it in its current form.<ref name="Ros1956">{{Cite journal | last1 = Rosenblatt | first1 = M. |author-link = Murray Rosenblatt| title = Remarks on Some Nonparametric Estimates of a Density Function | doi = 10.1214/aoms/1177728190 | journal = The Annals of Mathematical Statistics | volume = 27 | issue = 3 | pages = 832–837 | year = 1956 | doi-access = free }}</ref><ref name="Par1962">{{Cite journal | last1 = Parzen | first1 = E. | author-link = Emanuel Parzen| title = On Estimation of a Probability Density Function and Mode | doi = 10.1214/aoms/1177704472 | journal = [[The Annals of Mathematical Statistics]]| volume = 33 | issue = 3 | pages = 1065–1076 | year = 1962 | jstor = 2237880| doi-access = free }}</ref> A well-known application of kernel density estimation is estimating the class-conditional [[Marginal distribution#Marginal probability density function|marginal densities]] of data when using a [[naive Bayes classifier]], which can improve its prediction accuracy.<ref name=":0">{{Cite book |last1=Hastie |first1=Trevor |author-link1=Trevor Hastie |title=The Elements of Statistical Learning : Data Mining, Inference, and Prediction : with 200 full-color illustrations |last2=Tibshirani |first2=Robert |author-link2=Robert Tibshirani |last3=Friedman |first3=Jerome H. |author-link3=Jerome H. Friedman |date=2001 |publisher=Springer |isbn=0-387-95284-5 |___location=New York |oclc=46809224}}</ref>
==Definition==
Let {{math|(''x''<sub>1</sub>, ''x''<sub>2</sub>, ..., ''x<sub>n</sub>'')}} be [[Independent and identically distributed random variables|independent and identically distributed]] samples drawn from some univariate distribution with an unknown [[probability density function|density]] {{math|''f''}}. We are interested in estimating the shape of this function {{math|''f''}} at any given point {{mvar|x}}. Its ''kernel density estimator'' is
<math display="block">
\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) = \frac{1}{nh} \sum_{i=1}^n K{\left(\frac{x-x_i}{h}\right)},
</math>
where {{math|''K''}} is the [[kernel (statistics)#Nonparametric statistics|kernel]] — a non-negative function — and {{math|''h'' > 0}} is a [[smoothing]] parameter called the ''bandwidth'' or simply width.<ref name=":0" /> A kernel with subscript {{mvar|h}} is called the ''scaled kernel'' and defined as {{math|1=''K''<sub>''h''</sub>(''x'') = {{sfrac|1|''h''}} ''K''({{sfrac|''x''|''h''}})}}. Intuitively one wants to choose {{mvar|h}} as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.
A range of [[kernel (statistics)#Kernel functions in common use|kernel functions]] are commonly used: uniform, triangular, biweight, triweight, [[Kernel (statistics)|Epanechnikov]] (parabolic), normal, and others. The Epanechnikov kernel is optimal in a mean square error sense,<ref>{{cite journal |doi=10.1137/1114019 |last = Epanechnikov | first = V.A. |title=Non-parametric estimation of a multivariate probability density |journal=Theory of Probability and Its Applications |volume=14 |pages=153–158 |year=1969}}</ref> though the loss of efficiency is small for the kernels listed previously.<ref name="WJ1995">{{Cite book| last1 = Wand | first1 = M.P |last2 = Jones | first2 = M.C. |title=Kernel Smoothing |publisher=Chapman & Hall/CRC |___location=London |year=1995 |isbn=978-0-412-55270-0}}</ref> Due to its convenient mathematical properties, the normal kernel is often used, which means {{math|1=''K''(''x'') = ''ϕ''(''x'')}}, where {{mvar|ϕ}} is the [[standard normal]] density function. The kernel density estimator then becomes
<math display="block">
\hat{f}_h(x) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^n \exp\left(-\frac{(x-x_i)^2}{2h^2}\right),
</math>
where the bandwidth {{mvar|h}} plays the role of the standard deviation of each Gaussian component; in practice it is often expressed as a multiple of the sample standard deviation <math>\hat\sigma</math>.
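As a concrete illustration, the estimator with a normal kernel can be evaluated directly from the definition above; the following is a minimal sketch in Python (NumPy only; the function name is illustrative):
<syntaxhighlight lang="python">
import numpy as np

def kde_normal(x_grid, data, h):
    """Evaluate the kernel density estimate with a standard normal kernel."""
    x_grid = np.asarray(x_grid, dtype=float)
    data = np.asarray(data, dtype=float)
    u = (x_grid[:, None] - data[None, :]) / h         # (x - x_i) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)    # K((x - x_i) / h)
    return k.sum(axis=1) / (data.size * h)            # (1 / nh) * sum_i K(...)
</syntaxhighlight>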
The construction of a kernel density estimate finds interpretations in fields outside of density estimation.<ref name="bo07">{{cite tech report |first=Zdravko |last=Botev |title=Nonparametric Density Estimation via Diffusion Mixing |url=https://espace.library.uq.edu.au/view/UQ:120006 |institution=University of Queensland |year=2007 }}</ref> For example, in [[thermodynamics]], this is equivalent to the amount of heat generated when [[heat kernel]]s (the fundamental solution to the [[heat equation]]) are placed at the ___location of each data point {{math|''x<sub>i</sub>''}}. Similar methods are used to construct [[discrete Laplace operator]]s on point clouds for [[manifold learning]] (e.g. [[diffusion map]]s).
==Example==
Kernel density estimates are closely related to [[histograms]], but can be endowed with properties such as smoothness or continuity by using a suitable kernel. The diagram below, based on the following six data points, illustrates this relationship:
{| class="wikitable"
|-
! Sample
| 1 || 2 || 3 || 4 || 5 || 6
|-
! Value
| −2.1 || −1.3 || −0.4 || 1.9 || 5.1 || 6.2
|}
For the histogram, first, the horizontal axis is divided into sub-intervals, or bins, which cover the range of the data: in this case, six bins each of width 2. Whenever a data point falls inside a bin, a box of height 1/12 is placed there. If more than one data point falls inside the same bin, the boxes are stacked on top of each other.
For the kernel density estimate, normal kernels with a standard deviation of 1.5 (indicated by the red dashed lines) are placed on each of the data points ''x<sub>i</sub>''. The kernels are summed to make the kernel density estimate (solid blue curve). The smoothness of the kernel density estimate (compared to the discreteness of the histogram) illustrates how kernel density estimates converge faster to the true underlying density for continuous random variables.<ref>{{cite journal |last=Scott | first = D. |title=On optimal and data-based histograms |journal=Biometrika |year=1979 |volume=66 |pages=605–610 |doi=10.1093/biomet/66.3.605 |issue=3}}</ref>
[[File:Comparison of 1D histogram and KDE.png|thumb|center|500px|alt=Comparison of the histogram (left) and kernel density estimate (right) constructed using the same data. The six individual kernels are the red dashed curves, the kernel density estimate the blue curves. The data points are the rug plot on the horizontal axis.|Comparison of the histogram (left) and kernel density estimate (right) constructed using the same data. The six individual kernels are the red dashed curves, the kernel density estimate the blue curves. The data points are the [[rug plot]] on the horizontal axis.]]
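The solid blue curve in the right panel can be reproduced in a few lines of Python directly from the definition (a sketch; the evaluation grid is illustrative):
<syntaxhighlight lang="python">
import numpy as np

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])
h = 1.5                                  # standard deviation of each kernel
x = np.linspace(-8, 12, 400)             # evaluation grid

# Sum of six normal kernels, one centred at each data point.
u = (x[:, None] - data[None, :]) / h
kde = np.exp(-0.5 * u ** 2).sum(axis=1) / (data.size * h * np.sqrt(2 * np.pi))
</syntaxhighlight>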
==Bandwidth selection==
[[File:Comparison of 1D bandwidth selectors.png|thumb|Kernel density estimate (KDE) with different bandwidths of a random sample of 100 points from a standard normal distribution. Grey: true density (standard normal). Red: KDE with h=0.05. Black: KDE with h=0.337. Green: KDE with h=2.]]
The [[Bandwidth (computing)|bandwidth]] of the kernel is a [[free parameter]] which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a simulated [[Random number generator|random sample]] from the standard [[normal distribution]] (plotted as the blue spikes in the [[rug plot]] on the horizontal axis). The grey curve is the true density (a normal density with mean 0 and variance 1). In comparison, the red curve is ''undersmoothed'' since it contains too many spurious data artifacts arising from using a bandwidth ''h'' = 0.05, which is too small. The green curve is ''oversmoothed'' since using the bandwidth {{math|1=''h'' = 2}} obscures much of the underlying structure. The black curve with a bandwidth of ''h'' = 0.337 is considered to be optimally smoothed since its density estimate is close to the true density. An extreme situation is encountered in the limit <math>h \to 0</math> (no smoothing), where the estimate is a sum of {{mvar|n}} [[Dirac delta function|delta functions]] centered at the sample coordinates. In the other extreme limit <math>h \to \infty</math> the estimate retains the shape of the kernel used, centered on the mean of the samples (completely smooth).
The most common optimality criterion used to select this parameter is the expected ''L''<sub>2</sub> [[risk function]], also termed the [[mean integrated squared error]]:
<math display="block">\operatorname{MISE} (h) = \operatorname{E}\!\left[ \int\! {\left(\hat{f}\!_h(x) - f(x)\right)}^2 dx \right]</math>
Under weak assumptions on {{math|''f''}} and {{math|''K''}} ({{math|''f''}} is the generally unknown true density function),<ref name="Ros1956"/><ref name="Par1962"/>
<math display="block">\operatorname{MISE}(h) = \operatorname{AMISE}(h) + o{\left((nh)^{-1} + h^4\right)}</math>
where {{math|''o''}} is the [[little o notation]], and {{mvar|n}} the sample size (as above). The AMISE is the asymptotic MISE, i.e., the sum of the two leading terms,
<math display="block">\operatorname{AMISE}(h) = \frac{R(K)}{nh} + \frac{1}{4} m_2(K)^2 h^4 R(f'')</math>
where <math display="inline">R(g) = \int g(x)^2 \, dx</math> for a function {{math|''g''}}, <math display="inline">m_2(K) = \int x^2 K(x) \, dx</math> is the second moment of the kernel {{math|''K''}}, and <math>f''</math> is the second derivative of <math>f</math>. The bandwidth minimizing the AMISE is the solution of the first-order condition
<math display="block"> \frac{\partial}{\partial h} \operatorname{AMISE}(h) = -\frac{R(K)}{nh^2} + m_2(K)^2 h^3 R(f'') = 0 </math>
or
<math display="block">h_{\operatorname{AMISE}} = \frac{ R(K)^{1/5}}{m_2(K)^{2/5}R(f'')^{1/5} } n^{-1/5} = C n^{-1/5}</math>
Neither the AMISE nor the ''h''<sub>AMISE</sub> formulas can be used directly since they involve the unknown density function <math>f</math> or its second derivative <math>f''</math>. To overcome that difficulty, a variety of automatic, data-based methods have been developed to select the bandwidth. Several review studies have been undertaken to compare their efficacies,<ref>{{cite journal |author1=Park, B.U. |author2=Marron, J.S. |year=1990 |title=Comparison of data-driven bandwidth selectors |journal=Journal of the American Statistical Association |volume=85 |issue=409 |pages=66–72 |jstor=2289526 |doi=10.1080/01621459.1990.10475307|citeseerx=10.1.1.154.7321 }}</ref><ref>{{cite journal |author1=Park, B.U. |author2=Turlach, B.A. |year=1992 |title=Practical performance of several data driven bandwidth selectors (with discussion) |journal=Computational Statistics |volume=7 |pages=251–270|url=https://ideas.repec.org/p/cor/louvco/1992005.html}}</ref><ref>{{cite journal|author1=Cao, R. |author2=Cuevas, A. |author3=Manteiga, W. G. |year=1994 |title=A comparative study of several smoothing methods in density estimation |journal=Computational Statistics and Data Analysis |volume=17 |pages=153–176 |doi=10.1016/0167-9473(92)00066-Z|issue=2}}</ref><ref>{{cite journal |doi=10.2307/2291420 |author1=Jones, M.C. |author2=Marron, J.S. |author3=Sheather, S. J. |year=1996 |title=A brief survey of bandwidth selection for density estimation| journal=Journal of the American Statistical Association |volume=91 |issue=433 |pages=401–407 |jstor=2291420}}</ref><ref>{{cite journal |author=Sheather, S.J. |year=1992 |title=The performance of six popular bandwidth selection methods on some real data sets (with discussion) |journal=Computational Statistics |volume=7 |pages=225–250, 271–281}}</ref><ref>{{cite journal |author1=Agarwal, N. |author2=Aluru, N.R. |year=2010 |title=A data-driven stochastic collocation approach for uncertainty quantification in MEMS |journal=International Journal for Numerical Methods in Engineering |volume=83 |issue=5 |pages=575–597 |doi=10.1002/nme.2844 |bibcode=2010IJNME..83..575A |s2cid=84834908 |url=https://webhost.engr.illinois.edu/~aluru/Journals/IJNME10.pdf}}</ref><ref>{{cite journal |author1=Xu, X. |author2=Yan, Z. |author3=Xu, S. |year=2015 |title=Estimating wind speed probability distribution by diffusion-based kernel density method |journal=Electric Power Systems Research|volume=121 |pages=28–37 |doi=10.1016/j.epsr.2014.11.029 |bibcode=2015EPSR..121...28X }}</ref> with the general consensus that the plug-in selectors<ref name="bo07"/><ref name="bo10">{{Cite journal |author1=Botev, Z.I. |author2=Grotowski, J.F. |author3=Kroese, D.P. |title=Kernel density estimation via diffusion |journal=[[Annals of Statistics]] |volume= 38 |issue=5 |pages=2916–2957 |year=2010 |doi=10.1214/10-AOS799|arxiv=1011.2602 |s2cid=41350591 }}</ref><ref name="SJ91">{{cite journal |author1=Sheather, S.J. |author2=Jones, M.C. |year=1991 |title=A reliable data-based bandwidth selection method for kernel density estimation |journal=Journal of the Royal Statistical Society, Series B |volume=53 |issue=3 |pages=683–690 |jstor=2345597 |doi=10.1111/j.2517-6161.1991.tb01857.x}}</ref> and [[Cross-validation (statistics)|cross validation]] selectors<ref>{{cite journal |author=Rudemo, M. |year=1982 |title=Empirical choice of histograms and kernel density estimators |journal=Scandinavian Journal of Statistics |volume=9 |issue=2 |pages=65–78 |jstor=4615859}}</ref><ref>{{cite journal |author=Bowman, A.W. 
|year=1984 |title=An alternative method of cross-validation for the smoothing of density estimates |journal=Biometrika |volume=71 |pages=353–360 |doi=10.1093/biomet/71.2.353 |issue=2}}</ref><ref>{{cite journal |last1 = Hall | first1 = P. |last2 = Marron | first2 = J.S. |last3 = Park | first3 = B.U. |year=1992 |title=Smoothed cross-validation |journal=Probability Theory and Related Fields |volume=92 |pages=1–20 |doi=10.1007/BF01205233|s2cid=121181481 |doi-access=free }}</ref> are the most useful over a wide range of data sets.
Substituting any bandwidth {{mvar|h}} which has the same asymptotic order {{math|''n''<sup>−1/5</sup>}} as {{math|''h''<sub>AMISE</sub>}} into the AMISE gives that {{math|1=AMISE(''h'') = ''O''(''n''<sup>−4/5</sup>)}}, where {{math|''O''}} is the [[Big O notation|big ''O'' notation]]. It can be shown that, under weak assumptions, there cannot exist a non-parametric estimator that converges at a faster rate than the kernel estimator.<ref>{{Cite journal|doi=10.1214/aos/1176342997|last=Wahba|first=G.|title=Optimal convergence properties of variable knot, kernel, and orthogonal series methods for density estimation|journal=[[Annals of Statistics]]|year=1975|volume=3|issue=1|pages=15–29|doi-access=free}}</ref> Note that the {{math|''n''<sup>−4/5</sup>}} rate is slower than the typical {{math|''n''<sup>−1</sup>}} convergence rate of parametric methods.
If the bandwidth is not held fixed, but is varied depending upon the ___location of either the estimate (balloon estimator) or the samples (pointwise estimator), this produces a particularly powerful method termed [[variable kernel density estimation|adaptive or variable bandwidth kernel density estimation]].
Bandwidth selection for kernel density estimation of heavy-tailed distributions is relatively difficult.<ref name="Buch2005">{{Cite journal | last1 = Buch-Larsen | first1 = Tine | title = Kernel density estimation for heavy-tailed distributions using the Champernowne transformation | doi = 10.1080/02331880500439782 | journal = Statistics | volume = 39 | issue = 6 | pages = 503–518 | year = 2005 | citeseerx = 10.1.1.457.1544 | s2cid = 219697435 }}</ref>
===A rule-of-thumb bandwidth estimator===
If Gaussian basis functions are used to approximate [[univariate]] data, and the underlying density being estimated is Gaussian, the optimal choice for ''h'' (that is, the bandwidth that minimises the [[mean integrated squared error]]) is:<ref name="SI1998">{{Cite book |last=Silverman |first=B.W. |author-link=Bernard Silverman |title=Density Estimation for Statistics and Data Analysis |publisher=Chapman & Hall/CRC |___location=London |year=1986 |isbn=978-0-412-24620-3 |page=[https://archive.org/details/densityestimatio00silv_0/page/45 45] |url-access=registration |url=https://archive.org/details/densityestimatio00silv_0/page/45 }}</ref>
<math display="block">h = {\left(\frac{4\hat{\sigma}^5}{3n}\right)}^{1/5} \approx 1.06 \, \hat{\sigma} \, n^{-1/5},</math>
An <math>h</math> value is considered more robust when it improves the fit for long-tailed and skewed distributions or for bimodal mixture distributions. This is often done empirically by replacing the [[standard deviation]] <math>\hat{\sigma}</math> by the parameter <math>A</math> below:
<math display="block">A = \min\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right)</math> where [[Interquartile range|IQR]] is the interquartile range.
[[File:Kernel density estimation, comparison between rule of thumb and solve-the-equation bandwidth.png|thumb|upright=1.25|alt=Comparison between rule of thumb and solve-the-equation bandwidth|Comparison between rule of thumb and solve-the-equation bandwidth.]]
A further modification reduces the factor from 1.06 to 0.9, giving the final formula:
<math display="block">h = 0.9\, \min\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right)\, n^{-1/5}</math>
where <math>n</math> is the sample size.
This approximation is termed the ''normal distribution approximation'', Gaussian approximation, or ''[[Bernard Silverman|Silverman]]'s rule of thumb''.<ref name="SI1998" /> While this rule of thumb is easy to compute, it should be used with caution as it can yield wildly inaccurate estimates when the density is not close to being normal.
For example, when estimating the bimodal [[Gaussian mixture model]]
<math display="block">\frac{1}{2\sqrt{2\pi}} e^{-\frac{1}{2}(x-10)^2} + \frac{1}{2\sqrt{2\pi}}e^{-\frac{1}{2}(x+10)^2}</math>
from a sample of 200 points, the figure on the right shows the true density and two kernel density estimates — one using the rule-of-thumb bandwidth, and the other using a solve-the-equation bandwidth.<ref name="bo07" /><ref name="SJ91" /> The estimate based on the rule-of-thumb bandwidth is significantly oversmoothed.
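A minimal sketch of this rule of thumb in Python (NumPy only; the function name is illustrative):
<syntaxhighlight lang="python">
import numpy as np

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    sigma = data.std(ddof=1)                 # sample standard deviation
    q75, q25 = np.percentile(data, [75, 25])
    a = min(sigma, (q75 - q25) / 1.34)       # robust spread estimate A
    return 0.9 * a * n ** (-1.0 / 5.0)
</syntaxhighlight>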
==Relation to the characteristic function density estimator==
Given the sample {{math|(''x''<sub>1</sub>, ''x''<sub>2</sub>, ..., ''x<sub>n</sub>'')}}, it is natural to estimate the [[characteristic function (probability theory)|characteristic function]] {{math|1=''φ''(''t'') = E[''e''<sup>''itX''</sup>]}} as
<math display="block">
\hat\varphi(t) = \frac{1}{n} \sum_{j=1}^n e^{i t x_j}
</math>
Knowing the characteristic function, it is possible to find the corresponding probability density function through the [[Fourier transform]] formula. One difficulty with applying this inversion formula is that it leads to a diverging integral, since the estimate <math>\hat\varphi(t)</math> is unreliable for large {{mvar|t}}'s. To circumvent this problem, the estimator <math>\hat\varphi(t)</math> is multiplied by a damping function {{math|1=''ψ<sub>h</sub>''(''t'') = ''ψ''(''ht'')}}, which is equal to 1 at the origin and then falls to 0 at infinity. The "bandwidth parameter" {{mvar|h}} controls how fast we try to dampen the function <math> \hat\varphi(t)</math>. In particular when {{mvar|h}} is small, then {{math|''ψ<sub>h</sub>''(''t'')}} will be approximately one for a large range of {{mvar|t}}'s, which means that <math>\hat\varphi(t)</math> remains practically unaltered in the most important region of {{mvar|t}}'s.
The most common choice for function {{math|''ψ''}} is either the uniform function {{math|1=''ψ''(''t'') = '''1'''{−1 ≤ ''t'' ≤ 1}<nowiki/>}}, which effectively means truncating the interval of integration in the inversion formula to {{closed-closed|−1/''h'', 1/''h''}}, or the [[Gaussian function]] {{math|1=''ψ''(''t'') = ''e''<sup>''−πt''<sup>2</sup></sup>}}. Once the function {{math|''ψ''}} has been chosen, the inversion formula may be applied, and the density estimator will be
<math display="block">\begin{align}
\hat{f}(x)
&= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat\varphi(t) \psi_h(t) e^{-itx} \, dt \\[1ex]
&= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{1}{n} \sum_{j=1}^n e^{it(x_j-x)} \psi(ht) \, dt \\[1ex]
&= \frac{1}{nh} \sum_{j=1}^n \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-i(ht)\frac{x-x_j}{h}} \psi(ht) \, d(ht) \\[1ex]
&= \frac{1}{nh} \sum_{j=1}^n K{\left(\frac{x-x_j}{h}\right)},
\end{align}</math>
where {{math|''K''}} is the [[Fourier transform]] of the damping function {{math|''ψ''}}. Thus the kernel density estimator coincides with the characteristic function density estimator.
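A numerical sketch of this construction, assuming the uniform damping function (so the inversion integral is truncated to {{closed-closed|−1/''h'', 1/''h''}}; the grid size and function name are illustrative):
<syntaxhighlight lang="python">
import numpy as np

def cf_density_estimate(x_grid, data, h, m=2001):
    """Density estimate from the damped empirical characteristic function."""
    x_grid = np.asarray(x_grid, dtype=float)
    data = np.asarray(data, dtype=float)
    t = np.linspace(-1.0 / h, 1.0 / h, m)    # uniform damping: keep |t| <= 1/h
    dt = t[1] - t[0]
    phi_hat = np.exp(1j * np.outer(t, data)).mean(axis=1)   # empirical CF
    # Inverse Fourier transform, approximated by a Riemann sum over t.
    # (With uniform damping the implied sinc kernel can give small negative values.)
    integrand = phi_hat[:, None] * np.exp(-1j * np.outer(t, x_grid))
    return (integrand.sum(axis=0) * dt).real / (2 * np.pi)
</syntaxhighlight>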
== Geometric and topological features ==
Let <math>g(x) = \nabla f(x)</math> denote the gradient of the density and <math>\lambda_1(x)</math> the largest eigenvalue of its [[Hessian matrix|Hessian]]. We can extend the definition of the (global) mode to a local sense and define the local modes:
<math display="block">M = \{ x : g(x) = 0,\ \lambda_1(x) < 0 \}</math>
Namely, <math>M</math> is the collection of points at which the density function is locally maximized. A natural estimator of <math>M</math> is the plug-in from KDE,<ref>{{Cite journal|last1=Chen|first1=Yen-Chi|last2=Genovese|first2=Christopher R.|last3=Wasserman|first3=Larry|date=2016|title=A comprehensive approach to mode clustering|journal=Electronic Journal of Statistics|volume=10|issue=1|pages=210–241|doi=10.1214/15-ejs1102|issn=1935-7524|doi-access=free|arxiv=1406.1780}}</ref><ref>{{Cite book|last1=Chazal|first1=Frédéric|last2=Fasy|first2=Brittany Terese|last3=Lecci|first3=Fabrizio|last4=Rinaldo|first4=Alessandro|last5=Wasserman|first5=Larry|title=Proceedings of the thirtieth annual symposium on Computational geometry |chapter=Stochastic Convergence of Persistence Landscapes and Silhouettes |date=2014|volume=6|issue=2|pages=474–483|___location=New York, New York, USA|publisher=ACM Press|doi=10.1145/2582112.2582128|isbn=978-1-4503-2594-3|s2cid=6029340|chapter-url=https://jocg.org/index.php/jocg/article/view/2982}}</ref> namely <math>\hat{M} = \{ x : \hat{g}(x) = 0,\ \hat{\lambda}_1(x) < 0 \}</math>, where <math>\hat{g}(x)</math> and <math>\hat{\lambda}_1(x)</math> are the KDE versions of <math>g(x)</math> and <math>\lambda_1(x)</math>. Under mild assumptions, <math>\hat{M}</math> is a [[consistent estimator]] of <math>M</math>. Note that one can use the mean shift algorithm<ref>{{Cite journal|last1=Fukunaga|first1=K.|last2=Hostetler|first2=L.|date=January 1975|title=The estimation of the gradient of a density function, with applications in pattern recognition|journal=IEEE Transactions on Information Theory|volume=21|issue=1|pages=32–40|doi=10.1109/tit.1975.1055330|issn=0018-9448}}</ref><ref>{{Cite journal|last=Yizong Cheng|date=1995|title=Mean shift, mode seeking, and clustering|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=17|issue=8|pages=790–799|doi=10.1109/34.400568|issn=0162-8828|citeseerx=10.1.1.510.1222}}</ref><ref>{{Cite journal|last1=Comaniciu|first1=D.|last2=Meer|first2=P.|date=May 2002|title=Mean shift: a robust approach toward feature space analysis|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=24|issue=5|pages=603–619|doi=10.1109/34.1000236|s2cid=691081 |issn=0162-8828}}</ref> to compute the estimator <math>\hat{M}</math> numerically.
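For univariate data and a Gaussian kernel, the mean shift iteration takes a particularly simple form; the following is a sketch (the function name and convergence tolerances are illustrative):
<syntaxhighlight lang="python">
import numpy as np

def mean_shift_modes(data, h, max_iter=500, tol=1e-8):
    """Approximate the local modes of a Gaussian KDE by mean shift ascent."""
    data = np.asarray(data, dtype=float)
    x = data.copy()                         # start one ascent path per sample
    for _ in range(max_iter):
        # Gaussian weights of every sample relative to every current point.
        w = np.exp(-0.5 * ((x[:, None] - data[None, :]) / h) ** 2)
        x_new = (w * data[None, :]).sum(axis=1) / w.sum(axis=1)
        if np.max(np.abs(x_new - x)) < tol:
            x = x_new
            break
        x = x_new
    return np.unique(np.round(x, 6))        # merge paths that reach the same mode
</syntaxhighlight>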
== Statistical implementation ==
A non-exhaustive list of software implementations of kernel density estimators includes:
* In [[Analytica (software)|Analytica]] release 4.4, the ''Smoothing'' option for PDF results uses KDE, and from expressions it is available via the built-in <code>Pdf</code> function.
* In [[C (programming language)|C]]/[[C++]], [https://github.com/vmorariu/figtree FIGTree] is a library that can be used to compute kernel density estimates using normal kernels. MATLAB interface available.
* In [[C++]], [https://libagf.sf.net libagf] is a library for [[variable kernel density estimation]].
* In [[C++]], [[mlpack]] is a library that can compute KDE using many different kernels. It allows setting an error tolerance for faster computation. [[Python (programming language)|Python]] and [[R (programming language)|R]] interfaces are available.
* In [[C Sharp (programming language)|C#]] and [[F Sharp (programming language)|F#]], [[Math.NET Numerics]] is an open source library for numerical computation which includes [https://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/KernelDensity.htm kernel density estimation].
* In [[CrimeStat]], kernel density estimation is implemented using five different kernel functions – normal, uniform, quartic, negative exponential, and triangular. Both single- and dual-kernel density estimate routines are available. Kernel density estimation is also used in interpolating a Head Bang routine, in estimating a two-dimensional Journey-to-crime density function, and in estimating a three-dimensional Bayesian Journey-to-crime estimate.
* In [[ELKI]], kernel density functions can be found in the package <code>de.lmu.ifi.dbs.elki.math.statistics.kernelfunctions</code>
* In [[Environmental Systems Research Institute|ESRI]] products, kernel density mapping is managed out of the Spatial Analyst toolbox and uses the Quartic(biweight) kernel.
* In [[Microsoft Excel|Excel]], the Royal Society of Chemistry has created an add-in to run kernel density estimation based on their [http://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/Software/kerneldensities.asp Analytical Methods Committee Technical Brief 4].
* In [[gnuplot]], kernel density estimation is implemented by the <code>smooth kdensity</code> option; the datafile can contain a weight and bandwidth for each point, or the bandwidth can be set automatically<ref>{{cite book |last=Janert |first=Philipp K |title=Gnuplot in action : understanding data with graphs |year=2009 |publisher=Manning Publications |___location=Connecticut, USA |isbn=978-1-933988-39-9 }} See section 13.2.2 entitled ''Kernel density estimates''.</ref> according to "Silverman's rule of thumb" (see above).
* In [[Haskell (programming language)|Haskell]], kernel density is implemented in the [http://hackage.haskell.org/package/statistics statistics] package.
* In [[IGOR Pro]], kernel density estimation is implemented by the <code>StatsKDE</code> operation (added in Igor Pro 7.00). Bandwidth can be user-specified or estimated by means of Silverman, Scott or Bowman and [[Adelchi Azzalini|Azzalini]]. Kernel types are: Epanechnikov, Bi-weight, Tri-weight, Triangular, Gaussian and Rectangular.
* In [[Java (programming language)|Java]], the [[Weka (machine learning)|Weka]] machine learning package provides [https://weka.sourceforge.net/doc.stable/weka/estimators/KernelEstimator.html weka.estimators.KernelEstimator], among others.
* In [[JavaScript]], the visualization package [[D3js|D3.js]] offers a KDE package in its science.stats package.
* In [[JMP (statistical software)|JMP]], the Graph Builder platform utilizes kernel density estimation to provide contour plots and high density regions (HDRs) for bivariate densities, and violin plots and HDRs for univariate densities. Sliders allow the user to vary the bandwidth. Bivariate and univariate kernel density estimates are also provided by the Fit Y by X and Distribution platforms, respectively.
* In [[Julia (programming language)|Julia]], kernel density estimation is implemented in the [https://github.com/JuliaStats/KernelDensity.jl KernelDensity.jl] package.
* In [[KNIME]], 1D and 2D Kernel Density distributions can be generated and plotted using nodes from the [[Vernalis Research|Vernalis]] community contribution, e.g. [https://hub.knime.com/vernalis/extensions/com.vernalis.knime.feature/latest/com.vernalis.knime.plot.nodes.kerneldensity.KernelDensity1DPlotNodeFactory/ 1D Kernel Density Plot], among others. The underlying implementation is written in [[Java (programming language)|Java]].
* In [[MATLAB]], kernel density estimation is implemented through the <code>ksdensity</code> function (Statistics Toolbox). As of the 2018a release of MATLAB, both the bandwidth and kernel smoother can be specified, including other options such as specifying the range of the kernel density.<ref>{{Cite web|title=Kernel smoothing function estimate for univariate and bivariate data - MATLAB ksdensity|url=https://www.mathworks.com/help/stats/ksdensity.html|access-date=2020-11-05|website=www.mathworks.com}}</ref> Alternatively, a free MATLAB software package which implements an automatic bandwidth selection method<ref name="bo07"/> is available from the MATLAB Central File Exchange for
** [http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator 1-dimensional data]
** [http://www.mathworks.com/matlabcentral/fileexchange/17204-kernel-density-estimation 2-dimensional data]
** [http://www.mathworks.com/matlabcentral/fileexchange/58312-kernel-density-estimator-for-high-dimensions n-dimensional data] <br /> A free MATLAB toolbox with implementations of kernel regression, kernel density estimation, kernel estimation of the hazard function, and other methods is available on [http://www.math.muni.cz/english/science-and-research/developed-software/232-matlab-toolbox.html these pages] (this toolbox accompanies the book<ref name="HorKolZel">{{cite book|last1=Horová|first1=I.|last2=Koláček|first2=J.|last3=Zelinka|first3=J.|title=Kernel Smoothing in MATLAB: Theory and Practice of Kernel Smoothing|date=2012|publisher=World Scientific Publishing|___location=Singapore|isbn=978-981-4405-48-5}}</ref>).
* In [[Mathematica]], numeric kernel density estimation is implemented by the function <code>SmoothKernelDistribution</code><ref>{{Cite web|title = SmoothKernelDistribution—Wolfram Language Documentation|url=https://reference.wolfram.com/language/ref/SmoothKernelDistribution.html.en|access-date=2020-11-05|website=reference.wolfram.com}}</ref> and symbolic estimation is implemented using the function <code>KernelMixtureDistribution</code><ref>{{Cite web|title=KernelMixtureDistribution—Wolfram Language Documentation|url=https://reference.wolfram.com/language/ref/KernelMixtureDistribution.html.en|access-date=2020-11-05|website=reference.wolfram.com}}</ref> both of which provide data-driven bandwidths.
* In [[Minitab]], the Royal Society of Chemistry has created a macro to run kernel density estimation based on their Analytical Methods Committee Technical Brief 4.<ref>{{Cite web|title=Software for calculating kernel densities|url=https://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/Software/kerneldensities.asp|access-date=2020-11-05|website=www.rsc.org}}</ref>
* In the [[NAG Numerical Library|NAG Library]], kernel density estimation is implemented via the <code>g10ba</code> routine (available in both the Fortran<ref>{{cite web |last=The Numerical Algorithms Group |title=NAG Library Routine Document: nagf_smooth_kerndens_gauss (g10baf) |work=NAG Library Manual, Mark 23 |url=http://www.nag.co.uk/numeric/fl/nagdoc_fl23/pdf/G10/g10baf.pdf |access-date=2012-02-16 }}</ref> and the C<ref>{{cite web |last=The Numerical Algorithms Group |title=NAG Library Routine Document: nag_kernel_density_estim (g10bac) |work=NAG Library Manual, Mark 9 |url=http://www.nag.co.uk/numeric/CL/nagdoc_cl09/pdf/G10/g10bac.pdf |access-date=2012-02-16 |archive-url=https://web.archive.org/web/20111124062333/http://nag.co.uk/numeric/cl/nagdoc_cl09/pdf/G10/g10bac.pdf |archive-date=2011-11-24 |url-status=dead }}</ref> versions of the Library).
* In [https://nuklei.sourceforge.net/ Nuklei], [[C++]] kernel density methods focus on data from the Special Euclidean group <math>SE(3)</math>.
* In [[GNU Octave|Octave]], kernel density estimation is implemented by the <code>kernel_density</code> option (econometrics package).
* In [[Origin (data analysis software)|Origin]], a 2D kernel density plot can be made from its user interface, and two functions, Ksdensity for 1D and Ks2density for 2D, can be used from its [http://wiki.originlab.com/~originla/ltwiki/index.php?title=Category:LabTalk_Programming LabTalk], [[Python (programming language)|Python]], or [[C (programming language)|C]] code.
* In [[Perl]], an implementation can be found in the [http://search.cpan.org/~janert/Statistics-KernelEstimation-0.05 Statistics-KernelEstimation module]
* In [[PHP]], an implementation can be found in the [https://github.com/markrogoyski/math-php MathPHP library]
* In [[Python (programming language)|Python]], many implementations exist: [http://pythonhosted.org/PyQt-Fit/mod_kde.html pyqt_fit.kde Module] in the [https://pypi.python.org/packages/source/P/PyQt-Fit/PyQt-Fit-1.3.4.tar.gz PyQt-Fit package], [[SciPy]] (<code>scipy.stats.gaussian_kde</code>), Statsmodels (<code>KDEUnivariate</code> and <code>KDEMultivariate</code>), and [[scikit-learn]] (<code>KernelDensity</code>) (see comparison<ref>{{cite web |last=Vanderplas|first=Jake|title=Kernel Density Estimation in Python|date=2013-12-01|url=https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/|access-date=2014-03-12}}</ref>). [https://kdepy.readthedocs.io/en/latest/ KDEpy] supports weighted data and its FFT implementation is orders of magnitude faster than the other implementations. The commonly used pandas library [https://pandas.pydata.org/] offers support for kde plotting through the plot method (<code>df.plot(kind='kde')</code>[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.kde.html]). The [https://getdist.readthedocs.io getdist] package for weighted and correlated MCMC samples supports optimized bandwidth, boundary correction and higher-order methods for 1D and 2D distributions. Another commonly used package for kernel density estimation is seaborn (<code>import seaborn as sns</code>, <code>sns.kdeplot()</code>).<ref>{{Cite web|title=seaborn.kdeplot — seaborn 0.10.1 documentation|url=https://seaborn.pydata.org/generated/seaborn.kdeplot.html|website=seaborn.pydata.org|access-date=2020-05-12}}</ref> A GPU implementation of KDE also exists.<ref>{{Cite web|url=https://pypi.org/project/kde-gpu/#description|title=Kde-gpu: We implemented nadaraya waston kernel density and kernel conditional probability estimator using cuda through cupy. It is much faster than cpu version but it requires GPU with high memory}}</ref>
* In [[R (programming language)|R]], kernel density estimation is implemented through the <code>density</code> function in the base distribution; the default bandwidth selector <code>bw.nrd0</code> in the stats package uses the rule-of-thumb formula from Silverman's book. Other implementations include <code>bkde</code> in the [https://cran.r-project.org/web/packages/KernSmooth/index.html KernSmooth library], <code>ParetoDensityEstimation</code> in the [https://CRAN.R-project.org/package=DataVisualizations DataVisualizations library] (for [[pareto distribution]] density estimation), <code>kde</code> in the [https://cran.r-project.org/web/packages/ks/index.html ks library], <code>dkden</code> and <code>dbckden</code> in the [https://cran.r-project.org/web/packages/evmix/index.html evmix library] (the latter for boundary-corrected kernel density estimation for bounded support), <code>npudens</code> in the [https://cran.r-project.org/web/packages/np/index.html np library] (numeric and [[categorical variable|categorical data]]), and <code>sm.density</code> in the [https://cran.r-project.org/web/packages/sm/index.html sm library]. For an implementation of the <code>kde.R</code> function, which does not require installing any packages or libraries, see [http://web.maths.unsw.edu.au/~zdravkobotev/kde.R kde.R]. The [https://cran.r-project.org/web/packages/btb/index.html btb library], dedicated to urban analysis, implements kernel density estimation through <code>kernel_smoothing</code>.
* In [[SAS (software)|SAS]], <code>proc kde</code> can be used to estimate univariate and bivariate kernel densities.
* In [[Apache Spark]], the <code>KernelDensity()</code> class<ref>{{Cite web|title=Basic Statistics - RDD-based API - Spark 3.0.1 Documentation|url=http://spark.apache.org/docs/latest/mllib-statistics.html#kernel-density-estimation|access-date=2020-11-05|website=spark.apache.org}}</ref>
* In [[Stata]], it is implemented through <code>kdensity</code>;<ref>{{cite web|url=https://www.stata.com/manuals15/rkdensity.pdf|title=kdensity — Univariate kernel density estimation|work=Stata 15 manual}}</ref> for example <code>histogram x, kdensity</code>. Alternatively a free Stata module KDENS is available<ref>{{Citation |last=Jann |first=Ben |title=KDENS: Stata module for univariate kernel density estimation |journal=Statistical Software Components |date=2008-05-26 |url=https://ideas.repec.org/c/boc/bocode/s456410.html |publisher=Boston College Department of Economics |access-date=2022-10-15}}</ref> allowing a user to estimate 1D or 2D density functions.
* In [[Swift (programming language)|Swift]], it is implemented through <code>SwiftStats.KernelDensityEstimation</code> in the open-source statistics library [https://github.com/r0fls/swiftstats SwiftStats].
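For instance, SciPy's <code>gaussian_kde</code> (listed above) can be used as follows (a minimal sketch):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(seed=0)
data = rng.standard_normal(100)                  # 100 standard normal samples

kde = gaussian_kde(data, bw_method="silverman")  # rule-of-thumb bandwidth
x = np.linspace(-4, 4, 200)
density = kde(x)                                 # evaluate the estimate on a grid
</syntaxhighlight>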
==See also==
{{Commons category|Kernel density estimation}}
* [[Kernel (statistics)]]
* [[Kernel smoothing]]
* [[Kernel regression]]
* [[Density estimation]] (with presentation of other examples)
* [[Mean-shift]]
* [[Scale space]]: The triplets {(''x'', ''h'', KDE with bandwidth ''h'' evaluated at ''x''): all ''x'', ''h'' > 0} form a [[scale space]] representation of the data.
* [[Multivariate kernel density estimation]]
* [[Variable kernel density estimation]]
* [[Head/tail Breaks|Head/tail breaks]]
==Further reading==
* {{ cite book | last1 = Härdle | first1 = Wolfgang | last2 = Müller | first2 = Marlene | last3 = Sperlich | first3 = Stefan | last4 = Werwatz | first4 = Axel |title = Nonparametric and Semiparametric Models | publisher = Springer-Verlag | isbn = 978-3-540-20722-1 | series = Springer Series in Statistics |___location = Berlin Heidelberg | date = 2004 | pages = 39–83}}
==References==
{{Reflist|30em}}
==External links==
* [http://www.mvstat.net/tduong/research/seminars/seminar-2001-05 Introduction to kernel density estimation] A short tutorial which motivates kernel density estimators as an improvement over histograms.
* [https://www.neuralengine.org/res/kernel.html Kernel Bandwidth Optimization] A free online tool that generates an optimized kernel density estimate.
* [http://www.wessa.net/rwasp_density.wasp Free Online Software (Calculator)] computes the kernel density estimate for a data series using the following kernels: Gaussian, Epanechnikov, Rectangular, Triangular, Biweight, Cosine, and Optcosine.
* [http://pcarvalho.com/things/kerneldensityestimation/index.html Kernel Density Estimation Applet] An online interactive example of kernel density estimation. Requires .NET 3.0 or later.
{{DEFAULTSORT:Kernel density estimation}}
[[Category:Estimation of densities]]
[[Category:Nonparametric statistics]]
[[Category:Machine learning]]