Discriminant function analysis: Difference between revisions

added references, more info on interpretation and use.
Remove stub tag. Wikify for capitals, punctuation of cites, solve some maths formatting
Line 7:
==Assumptions==
 
The assumptions of discriminant analysis are the same as those for MANOVA. The analysis is quite sensitive to outliers, and the ''n'' in the smallest group must be larger than the number of predictor variables.<ref name="buy"/>
 
*[[Normality|Multivariate normality]]: Independent variables are normal for each level of the grouping variable.<ref name="buy"/><ref name="green"/>
 
*Homogeneity of variance/covariance ([[homoscedasticity]]): Variances and covariances of the predictor variables are the same across levels of the grouping variable. This can be tested with Box's ''M'' statistic.<ref name="green"/> It has been suggested, however, that [[linear discriminant analysis]] be used when covariances are equal, and that [[quadratic classifier#quadratic discriminant analysis|quadratic discriminant analysis]] may be used when covariances are not equal.<ref name="buy"/>
 
*[[Multicollinearity]]: Predictive power can decrease with an increased correlation between predictor variables, since variance is being accounted for twice.<ref name="buy"/>
 
*[[statistical independence|Independence]]: Participants are assumed to be randomly sampled, and a participant’s score on one variable is assumed to be independent of scores on that variable for all other participants.<ref name="buy"/><ref name="green"/>
 
It has been suggested that discriminant analysis is relatively robust to slight violations of these assumptions,<ref>Lachenbruch, P. A. (1975). ''Discriminant analysis''. NY: Hafner</ref> and it has also been shown that discriminant analysis may still be reliable when using dichotomous variables (where multivariate normality is often violated).<ref>Klecka, William R. (1980). ''Discriminant analysis''. Quantitative Applications in the Social Sciences Series, No. 19. Thousand Oaks, CA: Sage Publications.</ref>
 
==Discriminant functions==
 
Discriminant analysis works by creating one or more linear combinations of predictors, creating a new [[latent variable]] for each function. These functions are called discriminant functions. The number of functions possible is either ''N<sub>g</sub>'' − 1, where ''N<sub>g</sub>'' is the number of groups, or ''p'' (the number of predictors), whichever is smaller. The first function created maximizes the differences between groups on that function. The second function maximizes differences on that function, but also must not be correlated with the previous function. This continues with subsequent functions, with the requirement that each new function not be correlated with any of the previous functions.<ref name="green"/>
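
The limit on the number of discriminant functions described above can be written as a one-line calculation. The following is only a minimal sketch; the function name <code>max_discriminant_functions</code> and its argument names are illustrative assumptions, not taken from the cited sources.

```python
def max_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Return min(N_g - 1, p), the largest possible number of discriminant functions.

    n_groups is N_g (number of groups) and n_predictors is p (number of predictors).
    Both names are hypothetical and used only for this illustration.
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed to discriminate")
    if n_predictors < 1:
        raise ValueError("at least one predictor is needed")
    return min(n_groups - 1, n_predictors)


# Three groups measured on four predictors allow at most two functions.
print(max_discriminant_functions(3, 4))
```

With five groups but only two predictors, the same call is limited by ''p'' rather than by the number of groups.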
 
Given group <math>j</math>, with <math>\mathbb{R}_j</math> sets of sample space, there is a discriminant rule such that if <math>x \in \mathbb{R}_j</math>, then <math>x \in j</math>. Discriminant analysis then finds “good” regions of <math>\mathbb{R}_j</math> to minimize classification error, therefore leading to a high percent correct classified in the classification table.<ref name="har">Hardle, W., Simar, L. (2007). ''Applied Multivariate Statistical Analysis''. Springer Berlin Heidelberg. pp. 289-303.</ref>
 
Each function is given a discriminant score to determine how well it predicts group placement.
*Structure correlation coefficients: The correlation between each predictor and the discriminant score of each function. This is a whole{{clarify}} correlation.<ref name="buy"/><ref name="garson">Garson, G. D. (2008). Discriminant function analysis. http://www2.chass.ncsu.edu/garson/pa765/discrim.htm .</ref>
*Standardized coefficients: Each predictor’s unique contribution to each function; this is therefore a [[partial correlation]]. Indicates the relative importance of each predictor in predicting group assignment from each function.<ref name="garson"/><ref name="buy"/>
*Functions at group centroids: Mean discriminant scores for each grouping variable are given for each function. The farther apart the means are, the less error there will be in classification.<ref name="garson"/><ref name="buy"/>
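
For the simplest case of two groups and two predictors, a single discriminant function and its group centroids can be sketched as follows. This is an illustrative sketch only: it assumes Fisher's two-group solution, in which the unstandardized weights are the inverse of the pooled within-group scatter matrix multiplied by the difference of the group mean vectors. The helper names and the toy data are hypothetical.

```python
def mean_vector(rows):
    """Mean of each of the two predictors over a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def within_scatter(groups):
    """Pooled within-group sums of squares and cross-products (a 2x2 matrix)."""
    w = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    w[a][b] += d[a] * d[b]
    return w


def fisher_weights(group_a, group_b):
    """Unstandardized weights W^-1 (mean_a - mean_b) for two groups, two predictors."""
    w = within_scatter([group_a, group_b])
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    inv = [[w[1][1] / det, -w[0][1] / det],
           [-w[1][0] / det, w[0][0] / det]]
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def centroid_scores(weights, groups):
    """Mean discriminant score of each group (the function at group centroids)."""
    return [sum(wk * mk for wk, mk in zip(weights, mean_vector(g))) for g in groups]


# Toy data (hypothetical): two groups of four observations each.
group_a = [(0, 0), (2, 0), (0, 2), (2, 2)]
group_b = [(3, 1), (5, 1), (3, 3), (5, 3)]
weights = fisher_weights(group_a, group_b)
print(weights, centroid_scores(weights, [group_a, group_b]))
```

The gap between the two centroid scores is what the list above describes: the farther apart the group means on the function, the easier the groups are to separate.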
 
==Discrimination rules==
 
*[[Maximum likelihood]]: Assigns <math>x</math> to the group that maximizes population (group) density.<ref name="har"/>
*Bayes discriminant rule: Assigns <math>x</math> to the group that maximizes <math>\pi_i f_i(x)</math>, where <math>\pi_i</math> represents the [[prior probability]] of that classification, and <math>f_i(x)</math> represents the population density.<ref name="har"/>
*[[Linear Discriminant Analysis|Fisher’s linear discriminant rule]]: Maximizes the ratio between ''SS''<sub>between</sub> and ''SS''<sub>within</sub>, and finds a linear combination of the predictors to predict group.<ref name="har"/>
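
The difference between the maximum likelihood rule and the Bayes rule can be illustrated with a single predictor. The sketch below assumes, purely for illustration, that each group has a univariate normal density, and it takes π<sub>i</sub> as the prior probability of group ''i'' and ''f<sub>i</sub>''(''x'') as the group density. The group names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def ml_rule(x, groups):
    """Assign x to the group with the largest density f_i(x).

    groups maps a group name to a (prior, mean, sd) tuple; the prior is ignored here.
    """
    return max(groups, key=lambda g: normal_pdf(x, groups[g][1], groups[g][2]))


def bayes_rule(x, groups):
    """Assign x to the group with the largest prior-weighted density pi_i * f_i(x)."""
    return max(groups,
               key=lambda g: groups[g][0] * normal_pdf(x, groups[g][1], groups[g][2]))


# Hypothetical groups: "common" is far more frequent than "rare".
groups = {"common": (0.9, 0.0, 1.0), "rare": (0.1, 2.0, 1.0)}
print(ml_rule(1.2, groups), bayes_rule(1.2, groups))
```

In this toy setting the observation lies closer to the mean of the rare group, so the maximum likelihood rule picks that group, while the Bayes rule lets the much larger prior of the common group outweigh the difference in density.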
 
==Eigenvalues==
 
An [[eigenvalues and eigenvectors|eigenvalue]] in discriminant analysis is the characteristic root of each function.{{clarify}} It indicates how well that function differentiates the groups: the larger the eigenvalue, the better the function differentiates.<ref name="buy"/> This, however, should be interpreted with caution, as eigenvalues have no upper limit.<ref name="buy"/><ref name="green"/>
The eigenvalue can be viewed as the ratio of ''SS''<sub>between</sub> to ''SS''<sub>within</sub>, as in ANOVA when the dependent variable is the discriminant function and the groups are the levels of the IV.{{clarify}}<ref name="green"/> This means that the largest eigenvalue is associated with the first function, the second largest with the second, etc.
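
The ANOVA-style reading of the eigenvalue can be sketched directly from discriminant scores. The example below assumes the scores on one function have already been computed and are supplied per group; the function name and the toy scores are hypothetical.

```python
def eigenvalue_from_scores(scores_by_group):
    """Ratio SS_between / SS_within of discriminant scores on one function.

    scores_by_group is a list of lists, one inner list of scores per group.
    """
    all_scores = [s for group in scores_by_group for s in group]
    grand_mean = sum(all_scores) / len(all_scores)

    ss_between = 0.0
    ss_within = 0.0
    for group in scores_by_group:
        group_mean = sum(group) / len(group)
        # Between-group part: group size times squared distance to the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Within-group part: squared distances of scores to their own group mean.
        ss_within += sum((s - group_mean) ** 2 for s in group)

    if ss_within == 0:
        raise ValueError("no within-group variation; the ratio is undefined")
    return ss_between / ss_within


# Hypothetical scores for two groups of three cases each.
print(eigenvalue_from_scores([[1, 2, 3], [4, 5, 6]]))
```

Because the denominator can be arbitrarily small, the ratio has no upper limit, which is the reason for the caution noted above.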
 
==Effect size==
 
Some suggest the use of eigenvalues as [[effect size]] measures; however, this is generally not supported.<ref name="green"/> Instead, the [[canonical correlation]] is the preferred measure of effect size. It is similar to the eigenvalue, but is the square root of the ratio of ''SS''<sub>between</sub> to ''SS''<sub>total</sub>. It is the correlation between groups and the function.<ref name="green"/>
Another popular measure of effect size is the percent of variance{{clarify}} for each function. This is calculated by (''λ<sub>x</sub>''/Σ''λ<sub>i</sub>'') × 100, where ''λ<sub>x</sub>'' is the eigenvalue for the function and Σ''λ<sub>i</sub>'' is the sum of all eigenvalues. This tells us how strong the prediction is for that particular function compared to the others.<ref name="green"/>
Percent correctly classified can also be analyzed as an effect size. The kappa value{{clarify}} can describe this while correcting for chance agreement.<ref name="green"/>
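
The three effect size measures above can be sketched in a few lines. Two assumptions are made here and are not stated in the cited sources: the canonical correlation is computed from the eigenvalue as the square root of ''λ''/(1 + ''λ''), which follows from ''SS''<sub>total</sub> = ''SS''<sub>between</sub> + ''SS''<sub>within</sub>, and the kappa value is taken to be Cohen's kappa computed from the classification table. All function names are hypothetical.

```python
import math


def canonical_correlation(eigenvalue):
    """sqrt(SS_between / SS_total), written in terms of lambda = SS_between / SS_within."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def percent_of_variance(eigenvalues):
    """(lambda_x / sum of all lambda_i) * 100 for every function."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def cohen_kappa(table):
    """Chance-corrected agreement for a square classification table.

    table[i][j] counts cases from actual group i classified into group j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the row and column totals.
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(k)
    ) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical values: two functions and a 2x2 classification table.
print(canonical_correlation(3.0))
print(percent_of_variance([3.0, 1.0]))
print(cohen_kappa([[40, 10], [10, 40]]))
```

Unlike the eigenvalue, the canonical correlation computed this way always lies between 0 and 1, which makes it easier to compare across analyses.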
 
==Variations==
 
*[[Linear Discriminant Analysis#Multiclass LDA|Multiple discriminant analysis (MDA)]]: Related to MANOVA. Has more than two groups and uses multiple dummy variables.<ref name="garson"/>
*Sequential discriminant analysis: Assesses the importance of a set of IVs over and above a set of controls. In this case, the controls are entered first, and then the IVs.<ref name="garson"/>
*Stepwise discriminant analysis: Selects the most correlated predictor first, removes that variance in the grouping variable, then adds the next most correlated predictor, and continues until the change in canonical correlation is not significant. Both forward and backward stepwise procedures may be performed.<ref name="garson"/>
 
==Comparison to logistic regression==
 
Discriminant function analysis is very similar to [[logistic regression]], and both can be used to answer the same research questions.<ref name="green"/> Logistic regression does not have as many assumptions and restrictions as discriminant analysis. However, when discriminant analysis’ assumptions are met, it is more powerful than logistic regression.{{cn}} Unlike logistic regression, discriminant analysis can be used with small sample sizes. It has been shown that when sample sizes are equal, and homogeneity of variance/covariance holds, discriminant analysis is more accurate.<ref name="buy"/> With all this being considered, logistic regression is the common choice nowadays, since the assumptions of discriminant analysis are rarely met.<ref name="buy"/><ref name="cohen"/>
 
==See also==
Line 71:
* [http://www.psychstat.missouristate.edu/multibook/mlt03m.html Course notes, Discriminant function analysis by David W. Stockburger, Missouri State University]
* [http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discrim.pdf Discriminant function analysis (DA) by John Poulsen and Aaron French, San Francisco State University]
 
[[Category:Multivariate statistics]]
[[Category:Statistical classification]]
 
 
[[es:Análisis discriminante]]