Line 65:
\int \operatorname{tr}^2 (\bold{H} \widehat{\operatorname{D}^2 f}_{\bold{G}} (\bold{x})) \, d\bold{x}</math>
where <math>\widehat{\operatorname{D}^2 f}_{\bold{G}} (\bold{x}) = n^{-1} \sum_{i=1}^n \operatorname{D}^2 K_\bold{G} (\bold{x} - \bold{X}_i)</math>. Thus <math>\hat{\bold{H}}_{\operatorname{PI}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{PI} (\bold{H})</math> is the plug-in selector.<ref>{{cite journal | author1=Wand, M.P. | author2=Jones, M.C. | title=Multivariate plug-in bandwidth selection | journal=Computational Statistics | year=1994 | volume=9 | pages=97–116}}</ref><ref>{{cite journal | doi=10.1080/10485250306039 | author1=Duong, T. | author2=Hazelton, M.L. | title=Plug-in bandwidth matrices for bivariate kernel density estimation | journal=Journal of Nonparametric Statistics | year=2003 | volume=15 | pages=17–30}}</ref> These references also contain
=== Smoothed cross validation ===
Line 75:
Thus <math>\hat{\bold{H}}_{\operatorname{SCV}} = \operatorname{argmin}_{\bold{H} \in F} \, \operatorname{SCV} (\bold{H})</math> is the SCV selector.<ref>{{cite journal | doi=10.1007/BF01205233 | author1=Hall, P. | author2=Marron, J. | author3=Park, B. | title=Smoothed cross-validation | journal=Probability Theory and Related Fields | year=1992 | volume=92 | pages=1–20}}</ref><ref>{{cite journal | doi=10.1111/j.1467-9469.2005.00445.x | author1=Duong, T. | author2=Hazelton, M.L. | title=Cross validation bandwidth matrices for multivariate kernel density estimation | journal=Scandinavian Journal of Statistics | year=2005 | volume=32 | pages=485–506}}</ref>
These references also contain
== Implementation in the R statistical software ==
The [http://cran.r-project.org/web/packages/ks/index.html ks package] in the [[R programming language]] implements the plug-in and smoothed cross validation selectors. This example is based on data from the [[Old Faithful Geyser]] in Yellowstone National Park, USA. The dataset, which is included in the base distribution of R as <code>faithful</code>, contains 272 records with two measurements each: the eruption duration (minutes) and the waiting time until the next eruption (minutes).
[[Image:Old Faithful Geyser KDE with plugin bandwidth.png|thumb]]
The following code snippet computes the kernel density estimate with the plug-in bandwidth matrix. Each coloured contour bounds the smallest region that contains the corresponding probability mass: red = 25%, orange + red = 50%, yellow + orange + red = 75%. To compute the SCV selector, replace <code>Hpi</code> with <code>Hscv</code>.
<pre>
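library(ks)                              # kernel smoothing package
data(faithful)                           # Old Faithful data from base R
H <- Hpi(x=faithful)                     # plug-in bandwidth matrix
fhat <- kde(x=faithful, H=H)             # kernel density estimate using H
plot(fhat, display="filled.contour2")    # filled contour plot of the estimate
points(faithful)                         # overlay the data points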
</pre>
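As a minimal sketch of the replacement described above, and assuming the ks package is installed, the following variant computes both bandwidth matrices so that they can be compared, then plots the estimate based on the SCV selector. The object names <code>H.pi</code>, <code>H.scv</code> and <code>fhat.scv</code> are illustrative only; the contour levels are passed explicitly through the <code>cont</code> argument of the plot method.
<pre>
library(ks)
data(faithful)

# Bandwidth matrices from the two selectors (object names are illustrative)
H.pi  <- Hpi(x=faithful)
H.scv <- Hscv(x=faithful)

# Each selector returns a 2 x 2 symmetric bandwidth matrix
print(H.pi)
print(H.scv)

# Kernel density estimate using the SCV bandwidth matrix
fhat.scv <- kde(x=faithful, H=H.scv)
plot(fhat.scv, display="filled.contour2", cont=c(25, 50, 75))
points(faithful)
</pre>
The two selectors generally return different matrices for the same data, so printing them side by side is a quick check on how sensitive the resulting contours are to the choice of selector.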