Chi-square automatic interaction detection

This is an old revision of this page, as edited by Delmonde at 20:59, 12 March 2010, which may differ significantly from the current revision.

CHAID is a decision tree technique based on adjusted significance testing (Bonferroni testing). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on the topic. CHAID can be used for prediction (in a fashion similar to regression analysis; this version of CHAID was originally known as XAID) as well as for classification and for the detection of interaction between variables. CHAID stands for CHi-squared Automatic Interaction Detector. It is based on a formal extension of the US AID (Automatic Interaction Detector) and THAID (THeta Automatic Interaction Detector) procedures of the 1960s and 1970s, which in turn were extensions of an algorithm developed in the UK in the 1950s.
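The adjusted significance test behind each CHAID split can be sketched as follows. This is a minimal illustration in Python, not Kass's original procedure. It computes Pearson's chi-square statistic for a table of counts (rows are groups of predictor categories, columns are categories of the dependent variable), converts the statistic to a p-value, and multiplies that p-value by a Bonferroni multiplier. The multiplier here assumes a nominal ("free") predictor whose c original categories have been merged into r groups, using the Stirling number of the second kind S(c, r) as commonly attributed to Kass (1980). All function names and the example table are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0 or df <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-y} * sum_{i=0}^{df/2 - 1} y^i / i!
        term = math.exp(-y)
        total = term
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return total
    # Odd df: start from erfc(sqrt(y)) and add y^a e^{-y} / Gamma(a + 1)
    # for a = 1/2, 3/2, ..., df/2 - 1.
    total = math.erfc(math.sqrt(y))
    term = math.exp(-y) * math.sqrt(y) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (i + 0.5)
    return total


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    # Empty rows or columns contribute no degrees of freedom.
    df = (sum(t > 0 for t in row_tot) - 1) * (sum(t > 0 for t in col_tot) - 1)
    return stat, df


def stirling2(c: int, r: int) -> int:
    """Number of ways to partition c categories into r non-empty groups."""
    total = sum((-1) ** i * math.comb(r, i) * (r - i) ** c for i in range(r + 1))
    return total // math.factorial(r)


def adjusted_p(table, original_categories: int) -> float:
    """Bonferroni-adjusted p-value, assuming a nominal predictor whose
    original categories were merged into len(table) groups."""
    stat, df = pearson_chi2(table)
    raw_p = chi2_sf(stat, df)
    multiplier = stirling2(original_categories, len(table))
    return min(1.0, raw_p * multiplier)


# Hypothetical example: 4 original predictor categories merged into 2 groups,
# cross-tabulated against readers / non-readers.
table = [[30, 70], [55, 45]]
print(pearson_chi2(table))
print(adjusted_p(table, original_categories=4))
```

The multiplier penalises a predictor for the number of ways its categories could have been grouped, so a predictor with many categories does not win a split merely because it was given more chances to look significant.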

In practice, CHAID is often used in direct marketing to select groups of consumers and to predict how their responses to some variables affect other variables, although some of its early applications were in medical and psychiatric research.

Like other decision trees, CHAID has the advantage that its output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.
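The effect of multiway splits on group size can be illustrated with a small sketch. It assumes, for simplicity, that every category of a split variable becomes its own branch (ignoring any merging of categories), and counts how many respondents end up in each group after one or two splits. The survey data, the variable names `region` and `age_band`, and the `min_node_size` threshold are hypothetical.

```python
from collections import Counter


def group_sizes(records, split_vars):
    """Size of each respondent group after one multiway split per variable
    in split_vars, with every category assumed to become its own branch."""
    return Counter(tuple(rec[v] for v in split_vars) for rec in records)


def undersized_groups(records, split_vars, min_node_size):
    """Groups whose size falls below a (hypothetical) minimum node size."""
    return {g: n for g, n in group_sizes(records, split_vars).items()
            if n < min_node_size}


# Hypothetical survey: 200 respondents, region has 5 categories, age band 4.
respondents = [{"region": i % 5, "age_band": (i // 5) % 4} for i in range(200)]

one_level = group_sizes(respondents, ["region"])
two_levels = group_sizes(respondents, ["region", "age_band"])
print(len(one_level), min(one_level.values()))    # 5 40
print(len(two_levels), min(two_levels.values()))  # 20 10
print(len(undersized_groups(respondents, ["region", "age_band"], 30)))  # 20
```

Two successive multiway splits already divide 200 respondents into 20 groups of 10, whereas a binary tree of the same depth would produce at most 4 groups.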

CHAID detects interaction between variables in the data set. With this technique it is possible to establish relationships between a ‘dependent variable’ (for example, readership of a certain newspaper) and explanatory variables such as price, size and supplements. CHAID does this by identifying discrete groups of respondents and, from their responses to the explanatory variables, predicting the impact on the dependent variable.
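How a group of respondents yields a prediction can be sketched with a one-level tree. In this minimal sketch the split variable is assumed to be already chosen (in a full CHAID analysis it would be selected by the adjusted significance test and the procedure would be repeated within each group). Each group's prediction is simply its observed readership rate, and a respondent from a category not seen in the data falls back to the overall rate. The field names `price_band` and `reads` and the survey counts are hypothetical.

```python
from collections import defaultdict


def group_rates(records, split_var, target):
    """Share of respondents with a true target value in each category of split_var."""
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for rec in records:
        c = counts[rec[split_var]]
        c[0] += bool(rec[target])
        c[1] += 1
    return {cat: pos / total for cat, (pos, total) in counts.items()}


def predict(rates, overall_rate, respondent, split_var):
    """Predicted probability: the rate of the respondent's group, or the
    overall rate for a category not present in the data."""
    return rates.get(respondent[split_var], overall_rate)


# Hypothetical survey of newspaper readership by preferred price band.
survey = (
    [{"price_band": "low", "reads": True}] * 30
    + [{"price_band": "low", "reads": False}] * 10
    + [{"price_band": "high", "reads": True}] * 8
    + [{"price_band": "high", "reads": False}] * 32
)
rates = group_rates(survey, "price_band", "reads")
overall = sum(r["reads"] for r in survey) / len(survey)
print(rates)  # {'low': 0.75, 'high': 0.2}
print(predict(rates, overall, {"price_band": "mid"}, "price_band"))  # 0.475
```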

CHAID is often used as an exploratory technique and as an alternative to multiple linear regression and logistic regression, especially when the data set is not well suited to regression analysis.


References

  • W. A. Belson. Matching and prediction on the principle of biological classification. Applied Statistics, Vol. 8 (1959), pp. 65–75.
  • G. V. Kass. An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, Vol. 29, No. 2 (1980), pp. 119–127.
  • D. M. Hawkins & G. V. Kass. Automatic Interaction Detection. In D. M. Hawkins (ed.), Topics in Applied Multivariate Analysis. Cambridge University Press, Cambridge, 1982, pp. 269–302.
  • T. M. Hooton, R. W. Haley, D. K. Culver, J. W. White, W. B. Morgan & R. J. Carroll. The Joint Associations of Multiple Risk Factors with the Occurrence of Nosocomial Infections. American Journal of Medicine, Vol. 70 (1981), pp. 960–970.
  • S. Brink & D. J. Van Schalkwyk. Serum ferritin and mean corpuscular volume as predictors of bone marrow iron stores. South African Medical Journal, Vol. 61 (1982), pp. 432–434.
  • D. P. McKenzie, P. D. McGorry, C. S. Wallace, L. H. Low, D. L. Copolov & B. S. Singh. Constructing a Minimal Diagnostic Decision Tree. Methods of Information in Medicine, Vol. 32 (1993), pp. 161–166.
  • D. M. Hawkins, S. S. Young & A. Rusinko. Analysis of a large structure-activity dataset using recursive partitioning. Quantitative Structure-Activity Relationships, Vol. 16 (1997), pp. 296–302.