{{Short description|Statistical measure}}
In [[statistics]] and in [[probability theory]], '''distance correlation''' or '''distance covariance''' is a measure of [[Independence (probability theory)|dependence]] between two paired [[random vector]]s of arbitrary, not necessarily equal, [[Euclidean vector|dimension]]. The population distance correlation coefficient is zero if and only if the random vectors are [[Independence (probability theory)|independent]]. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to [[Pearson's correlation]], which can only detect linear association between two [[random variable]]s.
 
Distance correlation can be used to perform a [[Statistical hypothesis testing|statistical test]] of dependence with a [[permutation test]]. One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data.
 
[[Image:Distance Correlation Examples.svg|thumb|upright=1.8|right|Several sets of (''x'', ''y'') points, with the distance correlation coefficient of ''x'' and ''y'' for each set. Compare to the graph on [[correlation]]]]
 
==Background==
 
The classical measure of dependence, the [[Pearson product-moment correlation coefficient|Pearson correlation coefficient]],{{sfn|Pearson|1895a}}{{sfn|Pearson|1895b}} is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by [[Gábor J. Székely]] in several lectures to address this deficiency of Pearson's [[correlation]], namely that it can easily be zero for dependent variables. Correlation = 0 (uncorrelatedness) does not imply independence, while distance correlation = 0 does imply independence. The first results on distance correlation were published in 2007 and 2009.{{sfn|Székely|Rizzo|Bakirov|2007}}{{sfn|Székely|Rizzo|2009a}} It was proved that distance covariance is the same as the Brownian covariance.{{sfn|Székely|Rizzo|2009a}} These measures are examples of [[energy distance]]s.
 
The distance correlation is derived from a number of other quantities that are used in its specification, specifically: '''distance variance''', '''distance standard deviation''', and '''distance covariance'''. These quantities take the same roles as the ordinary [[Moment (mathematics)|moment]]s with corresponding names in the specification of the [[Pearson product-moment correlation coefficient]].

These distance-based measures can be put into an indirect relationship to the ordinary moments by an [[#Alternative formulation: Brownian covariance|alternative formulation]] (described below) using ideas related to [[Brownian motion]], and this has led to the use of names such as '''Brownian covariance''' and '''Brownian distance covariance'''.
 
==Definitions==
===Distance covariance===
 
Let us start with the definition of the '''sample distance covariance'''. Let (''X''<sub>''k''</sub>,&nbsp;''Y''<sub>''k''</sub>), ''k''&nbsp;= 1, 2, ..., ''n'' be a [[statistical sample]] from a pair of real-valued or vector-valued random variables (''X'',&nbsp;''Y''). First, compute the ''n'' by ''n'' [[distance matrix|distance matrices]] (''a''<sub>''j'', ''k''</sub>) and (''b''<sub>''j'', ''k''</sub>) containing all pairwise [[Euclidean distance|distances]]
 
:<math>
a_{j, k} := \|X_j-X_k\|, \qquad b_{j, k} := \|Y_j-Y_k\|, \qquad j, k = 1, 2, \ldots, n,
</math>
 
where || &sdot; || denotes the [[Euclidean norm]]. Then take all doubly centered distances
 
:<math>
A_{j, k} := a_{j, k}-\overline{a}_{j\cdot}-\overline{a}_{\cdot k} + \overline{a}_{\cdot\cdot}, \qquad
B_{j, k} := b_{j, k} - \overline{b}_{j\cdot} -\overline{b}_{\cdot k} + \overline{b}_{\cdot\cdot},
</math>
 
where <math>\textstyle \overline{a}_{j\cdot}</math> is the {{math|''j''}}-th row mean, <math>\textstyle \overline{a}_{\cdot k}</math> is the {{math|''k''}}-th column mean, and <math>\textstyle \overline{a}_{\cdot\cdot}</math> is the [[grand mean]] of the distance matrix of the {{math|''X''}} sample. The notation is similar for the {{math|''b''}} values. (In the matrices of centered distances (''A''<sub>''j'', ''k''</sub>) and (''B''<sub>''j'', ''k''</sub>) all rows and all columns sum to zero.) The squared '''sample distance covariance''' (a scalar) is simply the arithmetic average of the products ''A''<sub>''j'', ''k''</sub>''B''<sub>''j'', ''k''</sub>:
 
:<math>
\operatorname{dCov}^2_n(X,Y) := \frac{1}{n^2} \sum_{j = 1}^n \sum_{k = 1}^n A_{j, k} \, B_{j, k}.
</math>
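In code, the double centering and averaging above are direct array operations. The following is a minimal sketch in Python with [[NumPy]]; the function names <code>double_centered_distances</code> and <code>sample_dcov2</code> are illustrative, not part of any standard library.

<syntaxhighlight lang="python">
import numpy as np

def double_centered_distances(x):
    """Pairwise Euclidean distance matrix of a sample, doubly centered:
    subtract each row mean and column mean, then add back the grand mean."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a scalar sample as vectors in R^1
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def sample_dcov2(x, y):
    """Squared sample distance covariance:
    the arithmetic average of the products A_{jk} * B_{jk}."""
    A = double_centered_distances(x)
    B = double_centered_distances(y)
    return (A * B).mean()  # equals (1/n^2) * sum over j, k
</syntaxhighlight>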
The statistic ''T''<sub>''n''</sub> = ''n'' dCov<sup>2</sup><sub>''n''</sub>(''X'', ''Y'') determines a consistent multivariate test of independence of random vectors in arbitrary dimensions. For an implementation see the ''dcov.test'' function in the ''energy'' package for [[R (programming language)|R]].{{sfn|Rizzo|Székely|2021}}
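A permutation test based on ''T''<sub>''n''</sub> can be sketched as follows, reusing the <code>sample_dcov2</code> helper from the sketch above; the replicate count and the add-one p-value estimate are illustrative choices rather than fixed conventions.

<syntaxhighlight lang="python">
import numpy as np

def dcov_permutation_test(x, y, num_replicates=999, seed=0):
    """Permutation test of independence based on T_n = n * dCov_n^2(X, Y):
    compare the observed statistic with its values under random re-pairings."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    observed = n * sample_dcov2(x, y)
    exceed = sum(
        n * sample_dcov2(x, y[rng.permutation(n)]) >= observed
        for _ in range(num_replicates)
    )
    # add-one estimate of the permutation p-value
    return observed, (exceed + 1) / (num_replicates + 1)
</syntaxhighlight>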
The population value of '''distance covariance''' can be defined along the same lines. Let ''X'' be a random variable that takes values in a ''p''-dimensional Euclidean space with probability distribution {{math|&mu;}} and let ''Y'' be a random variable that takes values in a ''q''-dimensional Euclidean space with probability distribution {{math|&nu;}}, and suppose that ''X'' and ''Y'' have finite expectations. Write
 
:<math>a_\mu(x):= \operatorname{E}[\|X-x\|], \quad D(\mu) := \operatorname{E}[a_\mu(X)], \quad d_\mu(x, x') := \|x-x'\|-a_\mu(x)-a_\mu(x')+D(\mu).
</math>

where E denotes the expected value, and define <math>d_\nu</math> analogously for <math>Y</math>. Then the population value of the squared '''distance covariance''' is the expected product of the doubly centered distances:

:<math>\operatorname{dCov}^2(X,Y) := \operatorname{E}\big[d_\mu(X,X')\,d_\nu(Y,Y')\big].</math>

Equivalently,
:<math>
\begin{align}
\operatorname{dCov}^2(X,Y) & = \operatorname{E}[\|X-X'\|\,\|Y-Y'\|] + \operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|] \\
&\qquad - \operatorname{E}[\|X-X'\|\,\|Y-Y''\|] - \operatorname{E}[\|X-X''\|\,\|Y-Y'\|] \\[4pt]
& = \operatorname{E}[\|X-X'\|\,\|Y-Y'\|] + \operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|] \\
&\qquad - 2\operatorname{E}[\|X-X'\|\,\|Y-Y''\|],
\end{align}
</math>
 
where '''''E''''' denotes expected value, and <math>\textstyle (X, Y),</math> <math>\textstyle (X', Y'),</math> and <math>\textstyle (X'',Y'')</math> are independent and identically distributed. The primed random variables <math>\textstyle (X', Y')</math> and <math>\textstyle (X'',Y'')</math> denote independent and identically distributed (iid) copies of the variables <math>X</math> and <math>Y</math> and are similarly iid.{{sfn|Székely|Rizzo|2014|p=11}} Distance covariance can be expressed in terms of the classical Pearson's [[covariance]], '''cov''', as follows:
 
:<math>
\operatorname{dCov}^2(X,Y) = \operatorname{cov}(\|X-X'\|,\|Y-Y'\|) - 2\operatorname{cov}(\|X-X'\|,\|Y-Y''\|).
</math>
 
This identity shows that the distance covariance is not the same as the covariance of distances, {{nowrap|cov({{norm|''X'' − ''X''′}}, {{norm|''Y'' − ''Y''′}})}}. This can be zero even if ''X'' and ''Y'' are not independent.
 
Alternatively, the squared distance covariance can be defined as the weighted [[Norm (mathematics)#Euclidean norm|''L''<sup>2</sup> norm]] of the distance between the joint [[Characteristic function (probability theory)|characteristic function]] of the random variables and the product of their marginal characteristic functions:{{sfn|Székely|Rizzo|2009a|loc=Theorem 7, (3.7), p. 1249}}
 
: <math>
\operatorname{dCov}^2(X,Y)= \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{\left| \varphi_{X,Y}(s, t) - \varphi_X(s)\varphi_Y(t) \right|^2}{|s|_p^{1+p} |t|_q^{1+q}} \,dt\,ds
</math>
 
where <math>\varphi_{X,Y}(s, t)</math>, <math>\varphi_X(s)</math>, and <math>\varphi_Y(t)</math> are the [[Characteristic function (probability theory)|characteristic functions]] of {{nowrap|(''X'', ''Y''),}} ''X'', and ''Y'', respectively, ''p'', ''q'' denote the Euclidean dimension of ''X'' and ''Y'', and thus of ''s'' and ''t'', and ''c''<sub>''p''</sub>, ''c''<sub>''q''</sub> are constants. The weight function <math>({c_p c_q}{|s|_p^{1+p} |t|_q^{1+q}})^{-1}</math> is chosen to produce a scale equivariant and rotation [[invariant measure]] that doesn't go to zero for dependent variables.{{sfn|Székely|Rizzo|2009a|p=1249}}{{sfn|Székely|Rizzo|2012}} One interpretation<ref name=neustats2012>{{cite web|url=http://www.neustats.com/neu-da-documentation/how-distance-correlation-works/|title=How distance correlation works|accessdate=2012-12-13}}</ref> of the characteristic function definition is that the variables ''e<sup>isX</sup>'' and ''e<sup>itY</sup>'' are cyclic representations of ''X'' and ''Y'' with different periods given by ''s'' and ''t'', and the expression <math>\varphi_{X,Y}(s, t) - \varphi_X(s)\varphi_Y(t)</math> in the numerator of the characteristic function definition of distance covariance is simply the classical covariance of ''e<sup>isX</sup>'' and ''e<sup>itY</sup>''. The characteristic function definition clearly shows that dCov<sup>2</sup>(''X'', ''Y'') = 0 if and only if ''X'' and ''Y'' are independent.
 
===Distance variance and distance standard deviation===
 
The ''distance variance'' is a special case of distance covariance when the two variables are identical. The population value of distance variance is the square root of
 
:<math>
\operatorname{dVar}^2(X) := \operatorname{E}[\|X-X'\|^2] + \operatorname{E}^2[\|X-X'\|] - 2\operatorname{E}[\|X-X'\|\,\|X-X''\|],
</math>
 
where <math>X</math>, <math>X'</math>, and <math>X''</math> are [[independent and identically distributed random variables]], <math>\operatorname{E}</math> denotes the [[expected value]], and <math>f^2(\cdot)=(f(\cdot))^2</math> for function <math>f(\cdot)</math>, e.g., <math>\operatorname{E}^2[\cdot] = (\operatorname{E}[\cdot])^2</math>.
 
The ''sample distance variance'' is the square root of
 
:<math>
\operatorname{dVar}^2_n(X) := \operatorname{dCov}^2_n(X,X) = \tfrac{1}{n^2}\sum_{k,\ell}A_{k,\ell}^2,
</math>
which is a relative of [[Corrado Gini]]'s [[Mean absolute difference|mean difference]] introduced in 1912 (but Gini did not work with centered distances).{{sfn|Gini|1912}}
 
The ''distance standard deviation'' is the square root of the ''distance variance''.
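In code, both quantities reduce to the distance covariance of a sample with itself. A minimal sketch, again assuming the <code>sample_dcov2</code> helper defined earlier:

<syntaxhighlight lang="python">
import numpy as np

def sample_dvar(x):
    """Sample distance variance: the square root of dCov_n^2(X, X)."""
    return np.sqrt(sample_dcov2(x, x))

def sample_dstd(x):
    """Sample distance standard deviation:
    the square root of the distance variance."""
    return np.sqrt(sample_dvar(x))
</syntaxhighlight>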
===Distance correlation===
 
The ''distance correlation''{{sfn|Székely|Rizzo|Bakirov|2007}}{{sfn|Székely|Rizzo|2009a}} of two random variables is obtained by dividing their ''distance covariance'' by the product of their ''distance standard deviations''. The distance correlation is the square root of
 
:<math>
\operatorname{dCor}^2(X,Y) = \frac{\operatorname{dCov}^2(X,Y)}{\sqrt{\operatorname{dVar}^2(X)\,\operatorname{dVar}^2(Y)}},
</math>
 
and the ''sample distance correlation'' is defined by substituting the sample distance covariance and distance variances for the population coefficients above.
 
For easy computation of sample distance correlation see the ''dcor'' function in the ''energy'' package for [[R (programming language)|R]].{{sfn|Rizzo|Székely|2021}}
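A hypothetical Python analogue of that computation, built on the <code>sample_dcov2</code> helper above; the zero-denominator guard reflects the common convention that the distance correlation is set to zero when a distance variance vanishes.

<syntaxhighlight lang="python">
import numpy as np

def sample_dcor(x, y):
    """Sample distance correlation: the square root of
    dCov_n^2(X, Y) / sqrt(dVar_n^2(X) * dVar_n^2(Y))."""
    dcov2 = sample_dcov2(x, y)
    denom = np.sqrt(sample_dcov2(x, x) * sample_dcov2(y, y))
    return 0.0 if denom == 0 else np.sqrt(dcov2 / denom)

# A purely nonlinear relationship: the Pearson correlation is near zero,
# while the distance correlation is clearly positive.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = x ** 2
print(np.corrcoef(x, y)[0, 1])  # approximately 0
print(sample_dcor(x, y))        # noticeably greater than 0
</syntaxhighlight>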
 
==Properties==
 
===Distance correlation===
{{Ordered list |list_style_type=lower-roman
|<math>0\leq\operatorname{dCor}_n(X,Y)\leq1</math> and <math>0\leq\operatorname{dCor}(X,Y)\leq1</math>; this is in contrast to Pearson's correlation, which can be negative.
|<math>\operatorname{dCor}(X,Y) = 0</math> if and only if {{mvar|X}} and {{mvar|Y}} are independent.
|<math>\operatorname{dCor}_n(X,Y) = 1</math> implies that the dimensions of the linear subspaces spanned by the {{mvar|X}} and {{mvar|Y}} samples respectively are almost surely equal, and if we assume that these subspaces are equal, then in this subspace <math>Y = A + b\,\mathbf{C}X</math> for some vector {{mvar|A}}, scalar {{mvar|b}}, and [[orthonormal matrix]] <math>\mathbf{C}</math>.
}}
===Distance covariance===
{{Ordered list |list_style_type=lower-roman
|<math>\operatorname{dCov}^2(X,Y)\geq0</math> and <math>\operatorname{dCov}^2_n(X,Y)\geq0</math>;
|<math>\operatorname{dCov}^2(a_1 + b_1\,\mathbf{C}_1\,X, a_2 + b_2\,\mathbf{C}_2\,Y) = |b_1\,b_2|\operatorname{dCov}^2(X,Y)</math> for all constant vectors <math>a_1, a_2</math>, scalars <math>b_1, b_2</math>, and orthonormal matrices <math>\mathbf{C}_1, \mathbf{C}_2</math>.
|If the random vectors <math>(X_1, Y_1)</math> and <math>(X_2, Y_2)</math> are independent then
:<math>
\operatorname{dCov}(X_1 + X_2, Y_1 + Y_2) \leq \operatorname{dCov}(X_1, Y_1) + \operatorname{dCov}(X_2, Y_2).
</math>
Equality holds if and only if <math>X_1</math> and <math>Y_1</math> are both constants, or <math>X_2</math> and <math>Y_2</math> are both constants, or <math>X_1, X_2, Y_1, Y_2</math> are mutually independent.
|<math>\operatorname{dCov}(X,Y) = 0</math> if and only if {{mvar|X}} and {{mvar|Y}} are independent.
}}
 
This last property is the most important effect of working with centered distances.
 
The statistic <math>\operatorname{dCov}^2_n(X,Y)</math> is a biased estimator of <math>\operatorname{dCov}^2(X,Y)</math>. Under independence of ''X'' and ''Y''{{sfn|Székely|Rizzo|2009b}}
 
:<math>
\begin{align}
\operatorname{E}[\operatorname{dCov}^2_n(X,Y)] & = \frac{n-1}{n^2} \left\{(n-2) \operatorname{dCov}^2(X,Y) + \operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|] \right\} \\[6pt]
& = \frac{n-1}{n^2}\operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|].
\end{align}
</math>
 
An [[Bias of an estimator|unbiased estimator]] of <math>\operatorname{dCov}^2(X,Y)</math> is given by Székely and Rizzo.{{sfn|Székely|Rizzo|2014}}
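A sketch of that estimator, following the ''U''-centered distance matrices introduced in the same paper; the function names are illustrative, and the computation requires sample size ''n''&nbsp;>&nbsp;3.

<syntaxhighlight lang="python">
import numpy as np

def distance_matrix(x):
    """Plain (uncentered) pairwise Euclidean distance matrix."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def u_centered(d):
    """U-centering of a distance matrix: off-diagonal entries are
    recentered with n-2 and (n-1)(n-2) denominators, diagonal set to 0."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    grand = d.sum() / ((n - 1) * (n - 2))
    u = d - row - col + grand
    np.fill_diagonal(u, 0.0)
    return u

def dcov2_unbiased(x, y):
    """Unbiased estimator of dCov^2(X, Y); requires n > 3."""
    A = u_centered(distance_matrix(x))
    B = u_centered(distance_matrix(y))
    n = A.shape[0]
    return (A * B).sum() / (n * (n - 3))
</syntaxhighlight>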
 
===Distance variance===
{{Ordered list |list_style_type=lower-roman
|<math>\operatorname{dVar}(X) = 0</math> if and only if <math>X = \operatorname{E}[X]</math> almost surely.
|<math>\operatorname{dVar}_n(X) = 0</math> if and only if every sample observation is identical.
|<math>\operatorname{dVar}(A + b\,\mathbf{C}\,X) = |b|\operatorname{dVar}(X)</math> for all constant vectors {{mvar|A}}, scalars {{mvar|b}}, and orthonormal matrices <math>\mathbf{C}</math>.
|If {{mvar|X}} and {{mvar|Y}} are independent then <math>\operatorname{dVar}(X + Y) \leq \operatorname{dVar}(X) + \operatorname{dVar}(Y)</math>.
}}

Equality holds in (iv) if and only if one of the random variables {{mvar|X}} or {{mvar|Y}} is a constant.
 
==Generalization==
Distance covariance can be generalized to include powers of Euclidean distance. Define
:<math>
\begin{align}
\operatorname{dCov}^2(X, Y; \alpha) & := \operatorname{E}[\|X-X'\|^\alpha\,\|Y-Y'\|^\alpha] + \operatorname{E}[\|X-X'\|^\alpha]\,\operatorname{E}[\|Y-Y'\|^\alpha] \\
&\qquad - 2\operatorname{E}[\|X-X'\|^\alpha\,\|Y-Y''\|^\alpha].
\end{align}
</math>
 
Then for every <math>0<\alpha<2</math>, <math>X</math> and <math>Y</math> are independent if and only if <math>\operatorname{dCov}^2(X, Y; \alpha) = 0</math>. It is important to note that this characterization does not hold for exponent <math>\alpha=2</math>; in this case for bivariate <math>(X, Y)</math>, <math>\operatorname{dCor}(X, Y; \alpha=2)</math> is a deterministic function of the Pearson correlation.{{sfn|Székely|Rizzo|Bakirov|2007|loc=Theorem 7, p. 2785}} If <math>a_{k,\ell}</math> and <math>b_{k,\ell}</math> are <math>\alpha</math> powers of the corresponding distances, <math>0<\alpha\leq2</math>, then <math>\alpha</math> sample distance covariance can be defined as the nonnegative number for which
:<math>
\operatorname{dCov}^2_n(X, Y; \alpha):= \frac{1}{n^2}\sum_{k,\ell}A_{k,\ell}\,B_{k,\ell}.
</math>

One can extend <math>\operatorname{dCov}</math> to [[metric space]]-valued [[random variable]]s <math>X</math> and <math>Y</math>: if <math>X</math> has law <math>\mu</math> in a metric space with metric <math>d</math>, define <math>a_\mu(x) := \operatorname{E}[d(X, x)]</math>, <math>D(\mu) := \operatorname{E}[a_\mu(X)]</math>, and (provided <math>a_\mu</math> is finite, i.e., <math>X</math> has finite first moment) <math>d_\mu(x, x') := d(x, x') - a_\mu(x) - a_\mu(x') + D(\mu)</math>. Then if <math>Y</math> has law <math>\nu</math> (in a possibly different metric space with finite first moment), define

:<math>
\operatorname{dCov}^2(X, Y) := \operatorname{E}\big[d_\mu(X,X')d_\nu(Y,Y')\big].
</math>
This is non-negative for all such <math>X, Y</math> iff both metric spaces have negative type.{{sfn|Lyons|2014}} Here, a metric space <math>(M, d)</math> has negative type if <math>(M, d^{1/2})</math> is [[isometry|isometric]] to a subset of a [[Hilbert space]].{{sfn|Klebanov|2005|p={{pn|date=October 2021}}}} If both metric spaces have strong negative type, then <math>\operatorname{dCov}^2(X, Y)= 0</math> iff <math>X, Y</math> are independent.{{sfn|Lyons|2014}}
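For the Euclidean <math>\alpha</math>-generalization defined at the start of this section, the sample statistic changes only in that the pairwise distances are raised elementwise to the power <math>\alpha</math> before double centering. A minimal sketch (the function name is illustrative):

<syntaxhighlight lang="python">
import numpy as np

def sample_dcov2_alpha(x, y, alpha=1.0):
    """Sample distance covariance with exponent 0 < alpha < 2:
    double-center the alpha-th powers of the pairwise distances."""
    def centered(v):
        v = np.asarray(v, dtype=float)
        if v.ndim == 1:
            v = v[:, None]
        d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1) ** alpha
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A, B = centered(x), centered(y)
    return (A * B).mean()
</syntaxhighlight>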
 
==Alternative definition of distance covariance==
 
The original [[Distance correlation#Distance covariance|distance covariance]] has been defined as the square root of <math>\operatorname{dCov}^2(X,Y)</math>, rather than the squared coefficient itself. <math>\operatorname{dCov}(X,Y)</math> has the property that it is the [[energy distance]] between the joint distribution of <math>X, Y</math> and the product of its marginals. Under this definition, however, the distance variance, rather than the distance standard deviation, is measured in the same units as the <math>X</math> distances.
 
Alternatively, one could define '''''distance covariance''''' to be the square of the energy distance: <math>\operatorname{dCov}^2(X,Y)</math>. In this case, the distance standard deviation of <math>X</math> is measured in the same units as <math>X</math> distance, and there exists an unbiased estimator for the population distance covariance.{{sfn|Székely|Rizzo|2014}}
 
Under these alternate definitions, the distance correlation is also defined as the square <math>\operatorname{dCor}^2(X,Y)</math>, rather than the square root.
==Alternative formulation: Brownian covariance==

Brownian covariance is motivated by generalization of the notion of covariance to stochastic processes. The square of the covariance of random variables ''X'' and ''Y'' can be written in the following form:

:<math>
\operatorname{cov}(X,Y)^2 = \operatorname{E}\left[ (X - \operatorname{E}(X))(X' - \operatorname{E}(X'))(Y - \operatorname{E}(Y))(Y' - \operatorname{E}(Y')) \right]
</math>

where E denotes the expected value and the prime denotes independent and identically distributed copies. We need the following generalization of this formula. If ''U''(''s''), ''V''(''t'') are arbitrary random processes defined for all real ''s'' and ''t'', then define the ''U''-centered version of ''X'' by

:<math>
X_U := U(X) - \operatorname{E}_X\left[ U(X) \mid \left \{ U(t) \right \} \right]
</math>
 
whenever the subtracted conditional expected value exists, and denote by ''Y''<sub>''V''</sub> the ''V''-centered version of ''Y''.{{sfn|Székely|Rizzo|2009a}}{{sfn|Bickel|Xu|2009}}{{sfn|Kosorok|2009}} The (''U'',''V'') covariance of (''X'',''Y'') is defined as the nonnegative number whose square is
:<math>
\operatorname{cov}_{U,V}^2(X,Y) := \operatorname{E}\left[X_U X_U^\mathrm{'} Y_V Y_V^\mathrm{'}\right]
</math>
 
whenever the right-hand side is nonnegative and finite. The most important example is when ''U'' and ''V'' are two-sided independent [[Brownian motion]]s/[[Wiener process]]es with expectation zero and covariance {{nowrap|1={{abs|''s''}} + {{abs|''t''}} − {{abs|''s'' − ''t''}} = 2 min(''s'',''t'')}} (for nonnegative ''s'', ''t'' only). (This is twice the covariance of the standard Wiener process; here the factor 2 simplifies the computations.) In this case the (''U'',''V'') covariance is called '''Brownian covariance''' and is denoted by
:<math>
\operatorname{cov}_W(X,Y).
</math>

There is a surprising coincidence: the Brownian covariance is the same as the distance covariance:

:<math>
\operatorname{cov}_W(X,Y) = \operatorname{dCov}(X,Y),
</math>
and thus '''Brownian correlation''' is the same as distance correlation.
 
On the other hand, if we replace the Brownian motion with the deterministic identity function ''id'' then Cov<sub>id</sub>(''X'',''Y'') is simply the absolute value of the classical Pearson [[covariance]],
:<math>
\operatorname{cov}_{\mathrm{id}}(X,Y) = \left\vert\operatorname{cov}(X,Y)\right\vert.
</math>
 
==Related metrics==
 
Other correlational metrics, including kernel-based correlational metrics (such as the Hilbert-Schmidt Independence Criterion or HSIC) can also detect linear and nonlinear interactions. Both distance correlation and kernel-based metrics can be used in methods such as [[canonical correlation analysis]] and [[independent component analysis]] to yield stronger [[statistical power]].
 
==See also==
* [[RV coefficient]]
* For a related third-order statistic, see [[Skewness#Distance skewness|Distance skewness]].
 
==Notes==
{{reflist|20em}}
 
==References==
*{{cite journal |last1=Bickel |first1=Peter J. |last2=Xu |first2=Ying |year=2009 |title=Discussion of: Brownian distance covariance |journal=[[The Annals of Applied Statistics]] |volume=3 |issue=4 |pages=1266–1269 |doi=10.1214/09-AOAS312A |doi-access=free |arxiv=0912.3295 |url=http://projecteuclid.org/download/pdfview_1/euclid.aoas/1267453934}}
*{{cite book |last=Gini |first=C. |year=1912 |title=Variabilità e Mutabilità |___location=Bologna |publisher=Tipografia di Paolo Cuppini |bibcode=1912vamu.book.....G}}
*{{cite book |last=Klebanov |first=L. B. |year=2005 |title=''N''-distances and their applications |publisher=[[Karolinum Press]], Charles University |place=Prague |isbn=9788024611525}}
*{{cite journal |doi=10.1214/09-AOAS312B |arxiv=1010.0822 |title=Discussion of: Brownian distance covariance |year=2009 |last1=Kosorok |first1=Michael R. |journal=[[The Annals of Applied Statistics]] |volume=3 |issue=4 |pages=1270–1278 |s2cid=88518490 }}
*{{Cite journal |last1=Lyons |first1=Russell |year=2014 |title=Distance covariance in metric spaces |journal=The Annals of Probability |volume=41 |issue=5 |pages=3284–3305 |arxiv=1106.5758 |doi=10.1214/12-AOP803 |s2cid=73677891}}
*{{cite journal |last=Pearson |first=K. |year=1895a |title=Note on regression and inheritance in the case of two parents |journal=[[Proceedings of the Royal Society]] |volume=58 |pages=240–242 |bibcode=1895RSPS...58..240P}}
*{{cite journal |last=Pearson |first=K. |year=1895b |title=Notes on the history of correlation |journal=[[Biometrika]] |volume=13 |pages=25–45 |doi=10.1093/biomet/13.1.25 |url=https://zenodo.org/record/1431597 }}
*{{cite web |last1=Rizzo |first1=Maria |last2=Székely |first2=Gábor |date=2021-02-22 |title=energy: E-Statistics: Multivariate Inference via the Energy of Data |version=Version: 1.7-8 |url=https://cran.r-project.org/web/packages/energy/index.html |access-date=2021-10-31}}
*{{cite journal |last1=Székely |first1=Gábor J. |last2=Rizzo |first2=Maria L. |last3=Bakirov |first3=Nail K. |year=2007 |title=Measuring and testing independence by correlation of distances |journal=[[The Annals of Statistics]] |volume=35 |issue=6 |pages=2769–2794 |doi=10.1214/009053607000000505 |arxiv=0803.4101 |s2cid=5661488}}
*{{cite journal |doi=10.1214/09-AOAS312 |pmid=20574547 |pmc=2889501 |url=http://projecteuclid.org/download/pdfview_1/euclid.aoas/1267453933 |title=Brownian distance covariance |year=2009a |last1=Székely |first1=Gábor J. |last2=Rizzo |first2=Maria L. |journal=[[The Annals of Applied Statistics]] |volume=3 |issue=4 |pages=1236–1265 }}
*{{cite journal |doi=10.1214/09-AOAS312REJ |title=Rejoinder: Brownian distance covariance |year=2009b |last1=Székely |first1=Gábor J. |last2=Rizzo |first2=Maria L. |journal=[[The Annals of Applied Statistics]] |volume=3 |issue=4 |pages=1303–1308 |doi-access=free |arxiv=1010.0844 }}
*{{cite journal |last1=Székely |first1=Gábor J. |last2=Rizzo |first2=Maria L. |title=On the uniqueness of distance covariance |journal=[[Statistics & Probability Letters]] |year=2012 |volume=82 |issue=12 |pages=2278–2282 |doi=10.1016/j.spl.2012.08.007}}
*{{cite journal |arxiv=1310.2926 |last1=Székely |first1=Gabor J. |last2=Rizzo |first2=Maria L. |title=Partial Distance Correlation with Methods for Dissimilarities |journal=[[The Annals of Statistics]] |volume=42 |issue=6 |pages=2382–2412 |year=2014 |doi=10.1214/14-AOS1255 |bibcode=2014arXiv1310.2926S |s2cid=55801702 }}
 
==External links==
*[http://personal.bgsu.edu/~mrizzo/energy.htm E-statistics (energy statistics)] {{Webarchive|url=https://web.archive.org/web/20190913232038/http://personal.bgsu.edu/~mrizzo/energy.htm |date=2019-09-13 }}
 
{{DEFAULTSORT:Distance Correlation}}
[[Category:Statistical distance]]
[[Category:Theory of probability distributions]]
[[Category:Multivariate statistics]]
[[Category:Covariance and correlation]]