In statistics, regression analysis is a method for explaining phenomena and predicting future events. In regression analysis, the coefficient of correlation r between variables X and Y is a quantitative index of the co-movement of the two variables. Its square, the coefficient of determination r², gives the fraction of the variance in the criterion variable Y that is accounted for by variation in the predictor variable X. In multiple regression analysis, a set of predictor variables (also called independent variables or explanatory variables) X1, X2, ... is used to explain variability in the criterion variable (also called the dependent variable) Y. The multivariate counterpart of the coefficient of determination r² is the coefficient of multiple determination, R², which is frequently called simply the coefficient of determination. Its square root is the coefficient of multiple correlation, R. By convention R is taken as the nonnegative square root, so both R and R² lie between zero and one; for either coefficient, a larger value indicates a stronger relationship between the predictor variable(s) and the criterion variable.
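As a concrete sketch of the bivariate case, the correlation r and its square r² can be computed as follows (the data here are hypothetical, and NumPy is assumed purely for illustration):

```python
import numpy as np

# Hypothetical sample data: predictor X and criterion Y.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Coefficient of correlation r: index of co-movement between X and Y.
r = np.corrcoef(X, Y)[0, 1]

# Coefficient of determination r^2: fraction of the variance in Y
# accounted for by variation in X.
r_squared = r ** 2

print(round(r, 4), round(r_squared, 4))
```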
Conceptualization of multiple correlation
R² is simply the square of the sample correlation coefficient between the actual and predicted values of the criterion variable.
An intuitive first approach to multiple regression analysis is to sum the squared correlations between each predictor variable and the criterion variable, in order to obtain an index of the overall strength of the relationship between the predictors and the criterion. However, this sum often exceeds one, which shows that simple summation of the squared correlation coefficients cannot be a correct procedure. In fact, summing the squared correlations between the predictor variables and the criterion variable is correct if and only if the predictor variables are mutually uncorrelated. When the predictors are correlated, their inter-correlations must be removed so that only the unique contribution of each predictor toward explaining the criterion remains.
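The failure of naive summation can be demonstrated numerically. In this sketch (synthetic data, NumPy assumed), two nearly duplicate predictors each correlate strongly with Y, so the sum of their squared correlations exceeds one, while the R² from an ordinary least-squares fit stays below one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two highly correlated predictors (synthetic data for illustration).
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # x2 is nearly a copy of x1
y = x1 + x2 + rng.normal(size=n)

# Naive index: sum of squared predictor-criterion correlations.
r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]
naive_sum = r1**2 + r2**2

# The actual R^2 from an ordinary least-squares fit of y on x1 and x2.
A = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
R2 = 1 - resid.var() / y.var()

# naive_sum exceeds 1 (impossible for a variance fraction); R2 does not.
print(round(naive_sum, 3), round(R2, 3))
```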
Fundamental equation of multiple regression analysis
The coefficient of multiple determination R² (a scalar) is computed from the vector c of cross-correlations between the predictor variables and the criterion variable, its transpose c′, and the matrix Rxx of inter-correlations among the predictor variables. The fundamental equation of multiple regression analysis is
R² = c′ Rxx⁻¹ c.
The expression on the left side is the coefficient of multiple determination (the squared coefficient of multiple correlation). The terms on the right side are the transposed vector c′ of cross-correlations, the inverse of the matrix Rxx of inter-correlations, and the vector c of cross-correlations. Inverting the matrix of inter-correlations removes the redundant variance that results from the inter-correlations of the predictor variables. The square root of the resulting coefficient of multiple determination is the coefficient of multiple correlation R. Note that if all the predictor variables are uncorrelated, Rxx is the identity matrix and R² simply equals c′c, the sum of the squared cross-correlations.
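A minimal numerical check of the fundamental equation, again assuming NumPy and synthetic data: R² computed as c′ Rxx⁻¹ c agrees with the R² obtained from an ordinary least-squares fit of Y on the predictors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic correlated predictors and criterion (for illustration only).
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)
y = x1 + 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

# c: vector of cross-correlations between each predictor and y.
c = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Rxx: matrix of inter-correlations among the predictors.
Rxx = np.corrcoef(X, rowvar=False)

# Fundamental equation: R^2 = c' Rxx^{-1} c
# (solve() applies the inverse without forming it explicitly).
R2 = c @ np.linalg.solve(Rxx, c)

# Cross-check: R^2 from an ordinary least-squares fit with intercept.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
R2_ols = 1 - (y - A @ beta).var() / y.var()

print(round(R2, 4), round(R2_ols, 4))
```

If the predictors were uncorrelated, Rxx would be the identity matrix and the same computation would reduce to c′c, the simple sum of squared cross-correlations.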