In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictor variables. Its square equals the coefficient of determination under the particular assumptions that an intercept is included and that the best possible linear predictors are used, whereas the coefficient of determination is defined for more general cases, including those of nonlinear prediction and those in which the predicted values have not been derived from a model-fitting procedure. The coefficient of multiple correlation takes values between zero and one; a higher value indicates better predictability of the dependent variable from the independent variables, with a value of one indicating that the predictions are exactly correct and a value of zero indicating that no linear combination of the independent variables is a better predictor than the fixed mean of the dependent variable.
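As an illustration of this definition, the following minimal sketch (not part of the article; it assumes NumPy and uses simulated data with hypothetical coefficients) fits an intercept-included least-squares regression and computes the coefficient of multiple correlation as the correlation between the actual and the fitted values of the dependent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)   # hypothetical simulated data

# Least-squares fit of y on x1 and x2, with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

R = np.corrcoef(y, y_hat)[0, 1]   # coefficient of multiple correlation
print(R, R**2)                    # R**2 is the regression's coefficient of determination
```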
Definition
The coefficient of multiple correlation, denoted $R$, is a scalar defined as the Pearson correlation between the predicted and the actual values of the dependent variable in a linear regression model that includes an intercept. Its square, $R^2$, can be computed using the vector $\mathbf{c}$ of correlations between the predictor variables (independent variables) and the target variable (dependent variable), and the matrix $R_{xx}$ of inter-correlations between the predictor variables. It is given by
$$R^2 = \mathbf{c}^\top R_{xx}^{-1}\,\mathbf{c},$$
where $\mathbf{c}^\top$ is the transpose of $\mathbf{c}$, and $R_{xx}^{-1}$ is the inverse of the correlation matrix $R_{xx}$.
If all the predictor variables are uncorrelated, the matrix $R_{xx}$ is the identity matrix and $R^2$ simply equals $\mathbf{c}^\top \mathbf{c}$, the sum of the squared correlations with the dependent variable. If the predictor variables are correlated among themselves, the inverse of the correlation matrix $R_{xx}$ accounts for this.
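The following self-contained sketch (an illustration using NumPy and simulated data, not taken from the article) evaluates the formula $R^2 = \mathbf{c}^\top R_{xx}^{-1}\,\mathbf{c}$ from the sample correlations and checks that it matches the $R^2$ of an intercept-included least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)        # predictors correlated with each other
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

# c: correlations of each predictor with y; Rxx: inter-correlations of the predictors.
c = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
Rxx = np.corrcoef(X, rowvar=False)
R2_from_correlations = c @ np.linalg.inv(Rxx) @ c

# R^2 from an intercept-included least-squares fit, for comparison.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
R2_from_regression = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

print(R2_from_correlations, R2_from_regression)   # the two values agree up to rounding
```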
Properties
Unlike the coefficient of determination in a regression involving just two variables, the coefficient of multiple determination is not symmetric with respect to the choice of dependent variable: a regression of $y$ on $x$ and $z$ will in general have a different $R^2$ than will a regression of $z$ on $x$ and $y$. For example, suppose that in a particular sample the variable $z$ is uncorrelated with both $x$ and $y$, while $x$ and $y$ are linearly related to each other. Then a regression of $z$ on $y$ and $x$ will yield an $R^2$ of zero, while a regression of $y$ on $x$ and $z$ will yield a strictly positive $R^2$, since the correlation of $y$ with its best predictor based on $x$ and $z$ is at least as large as the correlation of $y$ with its best predictor based on $x$ alone.
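The following sketch (again an illustration with NumPy and simulated data; the variable names mirror the example above) reproduces this asymmetry: with $z$ unrelated to $x$ and $y$, regressing $y$ on $x$ and $z$ gives a clearly positive $R^2$, while regressing $z$ on $y$ and $x$ gives an $R^2$ near zero (exactly zero only if the sample correlations with $z$ are exactly zero).

```python
import numpy as np

def r_squared(predictors, target):
    """R^2 of an intercept-included least-squares regression of target on predictors."""
    A = np.column_stack([np.ones(len(target)), predictors])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ beta
    return 1 - (resid @ resid) / ((target - target.mean()) @ (target - target.mean()))

rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)   # y is linearly related to x
z = rng.normal(size=n)             # z is unrelated to both x and y

print(r_squared(np.column_stack([x, z]), y))   # clearly positive
print(r_squared(np.column_stack([y, x]), z))   # near zero
```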