Distance correlation: Difference between revisions

m Distance covariance: "squared" was incorrect here
- updated intro to be more approachable for non-experts. larger focus on the ability to detect linear + nonlinear interactions, how dcorr can be used as a statistical test, and more scope for dcorr (i.e., kernel based methods, and its use in CCA and ICA)
Line 1:
In [[statistics]] and in [[probability theory]], '''distance correlation''' or '''distance covariance''' is a measure of [[statistical dependence]] between two [[random variable]]s or two paired [[random vector]]s of arbitrary, not necessarily equal, [[Euclidean vector|dimension]]. In the limit of an infinite number of samples, the distance correlation is zero if and only if the [[multivariate random variable|random vectors]] are [[statistically independent]]. Thus, distance correlation can detect both linear and nonlinear interactions between two random vectors. This is in contrast to [[Pearson's correlation]], which can only detect linear interactions between two [[random variable]]s.
 
Distance correlation can be used to perform a [[Statistical hypothesis testing|statistical test]] of dependence with a [[permutation test]]. One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data.
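
The procedure can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes NumPy and uses the biased sample estimator of Székely, Rizzo and Bakirov (2007), in which the pairwise Euclidean distance matrices are double-centered before being multiplied elementwise. The function names, the number of permutations and the seed are illustrative choices, not part of any library.

```python
import numpy as np


def _centered_distances(a):
    """Pairwise Euclidean distance matrix of the rows of `a`, double-centered."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D input as n samples of a scalar variable
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    # Subtract row means and column means, then add back the grand mean.
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Biased sample distance correlation between paired samples x and y."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()      # squared sample distance covariance
    dvar2_x = (A * A).mean()    # squared sample distance variance of x
    dvar2_y = (B * B).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom <= 0.0:
        return 0.0              # a constant sample has no measurable dependence
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def permutation_test(x, y, n_permutations=199, seed=0):
    """Return (observed distance correlation, permutation p-value)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = distance_correlation(x, y)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling the samples of y breaks any pairing with x.
        shuffled = y[rng.permutation(len(y))]
        if distance_correlation(x, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return observed, (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 101)
    y = x ** 2  # depends on x, but not linearly
    print("Pearson correlation:", np.corrcoef(x, y)[0, 1])
    print("distance correlation and p-value:", permutation_test(x, y))
```

For the parabola in the usage example, the Pearson correlation is zero up to rounding error, while the distance correlation is clearly positive and the permutation p-value is small. The sketch recomputes both distance matrices for every shuffle for clarity; an implementation concerned with speed would center each matrix once and permute the rows and columns of one of them.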
 
Distance correlation can be put into an indirect relationship to the ordinary moments by an [[#Alternative formulation: Brownian covariance|alternative formulation]] (described below) using ideas related to [[Brownian motion]]. This has led to the use of names such as '''Brownian covariance''' and '''Brownian distance covariance'''. Other correlational metrics, including kernel-based correlational metrics (such as the Hilbert-Schmidt Independence Criterion or HSIC), can also detect linear and nonlinear interactions. Both distance correlation and kernel-based metrics can be used in methods such as [[canonical correlation analysis]] and [[independent component analysis]] to yield stronger [[statistical power]].
 
[[Image:Distance Correlation Examples.svg|thumb|400px|right|Several sets of (''x'', ''y'') points, with the distance correlation coefficient of ''x'' and ''y'' for each set. Compare to the graph on [[correlation]]]]
Line 10:
 
The classical measure of dependence, the [[Pearson product-moment correlation coefficient|Pearson correlation coefficient]],<ref>Pearson (1895)</ref> is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by [[Gabor J Szekely]] in several lectures to address this deficiency of Pearson’s [[correlation]], namely that it can easily be zero for dependent variables. A Pearson correlation of zero (uncorrelatedness) does not imply independence, whereas a distance correlation of zero does imply independence. The first results on distance correlation were published in 2007 and 2009.<ref name=SR2007>{{citation|author1=G. J. Szekely |author2=M. L. Rizzo |author3=N. K. Bakirov | year=2007| title= Measuring and Testing Independence by Correlation of Distances| journal= Annals of Statistics| volume=35| issue=6| pages=2769–2794| url=http://dx.doi.org/10.1214/009053607000000505}}.</ref><ref name=SR2009>Székely & Rizzo (2009)</ref> It was proved that distance covariance is the same as the Brownian covariance.<ref name=SR2009/> These measures are examples of [[energy distance]]s.
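
A simple illustration of uncorrelated but dependent variables, offered here as a worked example rather than one taken from the cited sources: let ''X'' take the values −1, 0 and 1 with probability 1/3 each, and let ''Y'' = ''X''<sup>2</sup>. Then

:<math>\operatorname{cov}(X, Y) = \operatorname{E}[X^3] - \operatorname{E}[X]\,\operatorname{E}[X^2] = 0 - 0 \cdot \tfrac{2}{3} = 0,</math>

so the Pearson correlation of ''X'' and ''Y'' is zero. The variables are nevertheless dependent, since

:<math>\operatorname{P}(Y = 0 \mid X = 0) = 1 \neq \tfrac{1}{3} = \operatorname{P}(Y = 0).</math>

Because the population distance correlation is zero only under independence, it is strictly positive for this pair.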
 
The distance correlation is derived from a number of other quantities that are used in its specification, specifically: '''distance variance''', '''distance standard deviation''' and '''distance covariance'''. These quantities take the same roles as the ordinary [[Moment (mathematics)|moment]]s with corresponding names in the specification of the [[Pearson product-moment correlation coefficient]].
 
==Definitions==