Coefficient of multiple correlation


In statistics, multiple correlation is a linear relationship among more than two variables. It is measured by the coefficient of multiple determination, denoted R², which is a measure of the fit of a linear regression. A regression's R² lies between zero and one (assuming a constant term has been included in the regression); a higher value indicates a stronger linear relationship among the variables, with a value of one indicating that all data points fall exactly on the fitted regression surface in multidimensional space and a value of zero indicating no linear relationship at all between the independent variables collectively and the dependent variable.
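
One standard textbook way of making this notion of "fit" precise (not spelled out in this revision) uses the observed values y_i, the fitted values ŷ_i from the regression, and the sample mean ȳ:

    R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},

which equals one when every observation lies exactly on the fitted surface, and zero when the regression (with a constant term) predicts no better than the sample mean.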

Unlike the coefficient of determination in a regression involving just two variables, the coefficient of multiple determination is not commutative: a regression of y on x and z will in general have a different R² than a regression of z on x and y. For example, suppose that in a particular sample the variable z is uncorrelated with both x and y, while x and y are linearly related to each other. Then a regression of z on y and x will yield an R² of zero, while a regression of y on x and z will yield a positive R².
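
A minimal numerical sketch of this example (not part of the original article; the variable names, sample size, and random seed are arbitrary choices for illustration, and only NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.1, size=n)   # y strongly related to x
z = rng.normal(size=n)                        # z unrelated to x and y

def r_squared(dep, predictors):
    # OLS fit with an intercept; R^2 = 1 - SS_res / SS_tot
    X = np.column_stack([np.ones_like(dep)] + predictors)
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return 1.0 - resid @ resid / np.sum((dep - dep.mean()) ** 2)

print(r_squared(y, [x, z]))  # close to 1: y is well explained by x (and z)
print(r_squared(z, [x, y]))  # close to 0: x and y carry almost no information about z
```

Swapping which variable plays the role of the dependent variable changes R² drastically, even though the same three variables appear in both regressions.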

Fundamental equation of multiple regression analysis

The coefficient of multiple determination R² (a scalar) can be computed from the vector c of cross-correlations between the predictor variables and the criterion variable, its transpose c′, and the matrix Rxx of inter-correlations between the predictor variables. The "fundamental equation of multiple regression analysis"[1] is

    R^2 = \mathbf{c}' \, R_{xx}^{-1} \, \mathbf{c}.

The expression on the left side is the coefficient of multiple determination. The terms on the right side are the transposed vector c′ of cross-correlations, the inverse of the matrix Rxx of inter-correlations, and the vector c of cross-correlations. Note that if all the predictor variables are mutually uncorrelated, the matrix Rxx is the identity matrix and R² simply equals c′c, the sum of the squared cross-correlations. Otherwise, multiplying by the inverse of the inter-correlation matrix adjusts for the redundant variance that arises from the inter-correlations among the predictor variables.
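
A minimal sketch of this computation (not from the original article; the data, variable names, and seed are invented, and only NumPy is assumed). It checks that c′Rxx⁻¹c, computed from sample correlations, matches the R² of an ordinary least-squares regression of the criterion on the predictors over the same sample:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)        # predictors are inter-correlated
y = x1 + 2.0 * x2 + rng.normal(size=n)    # criterion variable

X = np.column_stack([x1, x2])
corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
Rxx = corr[:2, :2]          # inter-correlations of the predictors
c = corr[:2, 2]             # cross-correlations with the criterion

r2_formula = c @ np.linalg.inv(Rxx) @ c   # R^2 = c' Rxx^{-1} c

# Compare with R^2 from an OLS regression of y on x1 and x2 (with intercept).
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
r2_ols = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

print(r2_formula, r2_ols)   # the two values agree up to floating-point error
```

Because x1 and x2 are correlated here, Rxx is not the identity matrix, and simply summing the squared cross-correlations (c′c) would overstate R²; the inverse of Rxx supplies the adjustment described above.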

References

  • Allison, Paul D. Multiple Regression: A Primer (1998)
  • Cohen, Jacob, et al. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2002) (ISBN 0805822232)
  • Crown, William H. Statistical Models for the Social and Behavioral Sciences: Multiple Regression and Limited-Dependent Variable Models (1998) (ISBN 0275953165)
  • Edwards, Allen Louis. Multiple Regression and the Analysis of Variance and Covariance (1985) (ISBN 0716710811)
  • Keith, Timothy Z. Multiple Regression and Beyond (2005)
  • Kerlinger, Fred N., and Elazar J. Pedhazur. Multiple Regression in Behavioral Research (1973)