Coefficient of multiple correlation
{{Short description|Statistical concept}}
 
In [[statistics]], the '''coefficient of multiple correlation''' is a measure of how well a given variable can be predicted using a [[linear function]] of a set of other variables. It is the [[Pearson correlation|correlation]] between the variable's values and the best predictions that can be computed [[linear equation|linearly]] from the predictive variables.<ref>[http://onlinestatbook.com/2/regression/multiple_regression.html Introduction to Multiple Regression] </ref>
 
The coefficient of multiple correlation takes values between 0 and 1. Higher values indicate higher predictability of the [[dependent and independent variables|dependent variable]] from the [[dependent and independent variables|independent variables]], with a value of 1 indicating that the predictions are exactly correct and a value of 0 indicating that no linear combination of the independent variables is a better predictor than is the fixed [[mean]] of the dependent variable.<ref>[http://mtweb.mtsu.edu/stats/regression/level3/multicorrel/multicorrcoef.htm Multiple correlation coefficient]</ref>
==Definition==

The coefficient of multiple correlation, denoted ''R'', is a [[scalar (mathematics)|scalar]] that is defined as the [[Pearson correlation coefficient]] between the predicted and the actual values of the dependent variable in a linear regression model that includes an [[Y-intercept|intercept]].

The coefficient of multiple correlation equals the square root of the [[coefficient of determination]] under the particular assumptions that an intercept is included and that the best possible linear predictors are used; the coefficient of determination is defined for more general cases, including nonlinear prediction and cases in which the predicted values have not been derived from a model-fitting procedure.
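As an informal illustration of this definition (the data and variable names below are hypothetical, and NumPy is assumed), the following sketch fits an ordinary least-squares regression with an intercept and then takes ''R'' as the Pearson correlation between the observed and fitted values of the dependent variable.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical data: 100 observations of two predictors and a dependent variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Least-squares fit that includes an intercept column.
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

# Coefficient of multiple correlation: the Pearson correlation between
# the actual and the predicted values of the dependent variable.
R = np.corrcoef(y, y_hat)[0, 1]
print(R)
</syntaxhighlight>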
==Computation==
 
The square of the coefficient of multiple correlation can be computed using the [[Euclidean space|vector]] <math>\mathbf{c} = {(r_{x_1 y}, r_{x_2 y},\dots,r_{x_N y})}^\top</math> of [[correlation]]s <math>r_{x_n y}</math> between the predictor variables <math>x_n</math> (independent variables) and the target variable <math>y</math> (dependent variable), and the [[correlation matrix]] <math>R_{xx}</math> of correlations between predictor variables. It is given by
 
::<math>R^2 = \mathbf{c}^\top R_{xx}^{-1}\, \mathbf{c},</math>
 
where <math>\mathbf{c}^\top</math> is the [[transpose]] of <math>\mathbf{c}</math>, and <math>R_{xx}^{-1}</math> is the [[Matrix inversion|inverse]] of the matrix
 
::<math>R_{xx} = \left(\begin{array}{cccc}
r_{x_1 x_1} & r_{x_1 x_2} & \dots & r_{x_1 x_N} \\
r_{x_2 x_1} & \ddots & & \vdots \\
\vdots & & \ddots & \\
r_{x_N x_1} & \dots & & r_{x_N x_N}
\end{array}\right).</math>
 
If all the predictor variables are uncorrelated, the matrix <math>R_{xx}</math> is the identity matrix and <math>R^2</math> simply equals <math>\mathbf{c}^\top\, \mathbf{c}</math>, the sum of the squared correlations with the dependent variable. If the predictor variables are correlated among themselves, the inverse of the correlation matrix <math>R_{xx}</math> accounts for this.
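For example, with two predictor variables whose correlations with the dependent variable are both 0.5 and whose mutual correlation is also 0.5 (illustrative values, not taken from any particular data set), the formula gives

::<math>R^2 = \begin{pmatrix} 0.5 & 0.5 \end{pmatrix}
\begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}
= \frac{0.5^2 + 0.5^2 - 2(0.5)(0.5)(0.5)}{1 - 0.5^2} = \tfrac{1}{3},</math>

whereas the same cross-correlations with uncorrelated predictors would give <math>R^2 = 0.5^2 + 0.5^2 = 0.5</math>: the overlap between correlated predictors reduces the variance they jointly explain.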
 
The squared coefficient of multiple correlation can also be computed as the fraction of variance of the dependent variable that is explained by the independent variables, which in turn is 1 minus the unexplained fraction. The unexplained fraction can be computed as the [[sum of squares of residuals]] (that is, the sum of the squares of the prediction errors) divided by the [[Total sum of squares|sum of squares of deviations of the values of the dependent variable]] from their [[mean]].
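A minimal sketch of both computations, assuming NumPy and using hypothetical data: it evaluates <math>\mathbf{c}^\top R_{xx}^{-1}\, \mathbf{c}</math> from sample correlations and compares the result with one minus the residual fraction of variance from a least-squares fit that includes an intercept.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical data with correlated predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] += 0.5 * X[:, 0]
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)

# Sample correlation matrix of (x_1, ..., x_N, y); split it into the
# predictor block R_xx and the cross-correlation vector c.
full = np.corrcoef(np.column_stack([X, y]), rowvar=False)
Rxx = full[:-1, :-1]
c = full[:-1, -1]
R2_from_correlations = c @ np.linalg.solve(Rxx, c)

# Equivalent computation: explained fraction of variance from a
# least-squares fit that includes an intercept.
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ beta
R2_from_variance = 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

print(R2_from_correlations, R2_from_variance)  # the two values agree up to rounding
</syntaxhighlight>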
 
==Properties==
 
With more than two variables being related to each other, the value of the coefficient of multiple correlation depends on the choice of dependent variable: a regression of <math>y</math> on <math>x</math> and <math>z</math> will in general have a different <math>R</math> than will a regression of <math>z</math> on <math>x</math> and <math>y</math>. For example, suppose that in a particular sample the variable <math>z</math> is [[Correlation and dependence|uncorrelated]] with both <math>x</math> and <math>y</math>, while <math>x</math> and <math>y</math> are linearly related to each other. Then a regression of <math>z</math> on <math>y</math> and <math>x</math> will yield an <math>R</math> of zero, while a regression of <math>y</math> on <math>x</math> and <math>z</math> will yield a strictly positive <math>R</math>. This follows because the correlation of <math>y</math> with its best predictor based on <math>x</math> and <math>z</math> is always at least as large as the correlation of <math>y</math> with its best predictor based on <math>x</math> alone, and since <math>z</math> provides no additional explanatory power here, the two correlations are exactly equal.
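This asymmetry can be seen in a small simulation. The sketch below (hypothetical data, NumPy assumed) generates <math>y</math> as a linear function of <math>x</math> plus noise while drawing <math>z</math> independently, so the two regressions give very different values of <math>R</math>.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)   # y is linearly related to x
z = rng.normal(size=n)           # z is (nearly) uncorrelated with both x and y

def multiple_R(dep, predictors):
    """Coefficient of multiple correlation of dep on the given predictors."""
    A = np.column_stack([np.ones(len(dep))] + list(predictors))
    beta, *_ = np.linalg.lstsq(A, dep, rcond=None)
    return np.corrcoef(dep, A @ beta)[0, 1]

print(multiple_R(z, [x, y]))  # near zero: x and y carry no information about z
print(multiple_R(y, [x, z]))  # clearly positive: x predicts y; z adds nothing
</syntaxhighlight>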
 
==References==
{{Reflist}}
==Further reading==
* Allison, Paul D. (1998). ''Multiple Regression: A Primer''. London: Sage Publications. {{ISBN|9780761985334}}
* Cohen, Jacob, et al. (2002). ''Applied Multiple Regression: Correlation Analysis for the Behavioral Sciences''. {{ISBN|0805822232}}
* Crown, William H. (1998). ''Statistical Models for the Social and Behavioral Sciences: Multiple Regression and Limited-Dependent Variable Models''. {{ISBN|0275953165}}
* Edwards, Allen Louis (1985). ''Multiple Regression and the Analysis of Variance and Covariance''. {{ISBN|0716710811}}
* Keith, Timothy (2006). ''Multiple Regression and Beyond''. Boston: Pearson Education.
* Kerlinger, Fred N.; Pedhazur, Elazar J. (1973). ''Multiple Regression in Behavioral Research''. New York: Holt Rinehart Winston. {{ISBN|9780030862113}}
* Stanton, Jeffrey M. (2001). [http://www.amstat.org/publications/jse/v9n3/stanton.html "Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors"], ''Journal of Statistics Education'', 9 (3).
 
==External links==
* [http://www.amstat.org/publications/jse/v9n3/stanton.html A Brief History of Linear Regression Analysis]
* [http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/multiple_regression_analysis.htm A Guide To Computing <math>R^2</math> For Multiple Correlation]
* [http://www.docstoc.com/docs/3530187/A-Derivation-of-the-Sample-Multiple-Corelation-Formula-for-Standard-Scores "Derivations"]

{{DEFAULTSORT:Multiple Correlation}}
[[Category:Correlation indicators]]
[[Category:Regression analysis]]