CHAID is a decision tree technique based on adjusted significance testing (Bonferroni correction). It was published in 1980 by Gordon V. Kass. It can be used for prediction, in a similar fashion to regression analysis (this version of CHAID was originally known as XAID), or for detecting interaction between variables. CHAID stands for CHi-squared Automatic Interaction Detector; it is a formal extension of the earlier AID (Automatic Interaction Detection) and THAID (THeta Automatic Interaction Detection) procedures of the 1960s and 1970s.
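As a sketch of what "adjusted significance testing" means here, the test for a single candidate predictor can be written as a Pearson chi-squared test of independence followed by a Bonferroni adjustment. The symbols are assumptions introduced for illustration: O_ij is the observed count in cell (i, j) of the r-by-c table crossing the predictor with the dependent variable, R_i and C_j are the row and column totals, n is the sample size, and B is a multiplier counting the comparisons made (in CHAID it reflects the number of ways the predictor's categories could have been grouped).

```latex
% Pearson chi-squared statistic for one predictor against the dependent variable
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{n},
\qquad \mathrm{df} = (r - 1)(c - 1)

% Bonferroni-adjusted p-value, where p is the upper-tail probability of \chi^2
p_{\mathrm{adj}} = \min(1,\; B \, p)
```

The predictor with the smallest adjusted p-value is the candidate for the next split, provided that value falls below the chosen significance threshold.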
In practice, it is often used in the context of direct marketing to select groups of consumers and predict how their responses to some variables affect other variables.
Like other decision trees, CHAID produces output that is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis. For example, three successive four-way splits can divide a sample into as many as 64 groups.
CHAID detects interaction between variables in the data set. Using this technique, it is possible to establish relationships between a ‘dependent variable’ (for example, readership of a certain newspaper) and explanatory variables such as price, size and supplements. CHAID does this by identifying discrete groups of respondents and using their responses to the explanatory variables to predict the impact on the dependent variable.
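The idea of picking an explanatory variable and dividing respondents into discrete groups can be sketched as a single split step. This is a minimal illustration, not Kass's full procedure: it omits CHAID's category-merging stage, and it uses a plain Bonferroni multiplier equal to the number of candidate predictors, which is a simplifying assumption. The function names and the newspaper data below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from Q(1/2, h) = erfc(sqrt(h)) for odd df, Q(1, h) = exp(-h) for even df.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return min(q, 1.0)


def chi2_test(xs, ys):
    """Pearson chi-squared test of independence between two categorical lists."""
    n = len(xs)
    joint, row, col = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    df = (len(row) - 1) * (len(col) - 1)
    if df == 0:  # a variable with a single category carries no information
        return 0.0, 1.0
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    return stat, chi2_sf(stat, df)


def best_split(rows, target, predictors):
    """Pick the predictor with the smallest adjusted p-value and group rows by it."""
    ys = [row[target] for row in rows]
    best = None
    for name in predictors:
        _, p = chi2_test([row[name] for row in rows], ys)
        # Simplified Bonferroni adjustment (assumption): one comparison per predictor.
        adj = min(1.0, p * len(predictors))
        if best is None or adj < best[1]:
            best = (name, adj)
    name, adj = best
    groups = defaultdict(list)
    for row in rows:
        groups[row[name]].append(row)  # multiway split: one group per category
    return name, adj, dict(groups)


def make_rows():
    """Hypothetical survey: readership depends on price but not on supplements."""
    counts = {"low": (40, 10), "mid": (25, 25), "high": (10, 40)}
    rows = []
    for price, (readers, non_readers) in counts.items():
        for i in range(readers + non_readers):
            rows.append({
                "price": price,
                "supplements": "yes" if i % 2 == 0 else "no",
                "reader": "yes" if i < readers else "no",
            })
    return rows


rows = make_rows()
name, adj_p, groups = best_split(rows, "reader", ["price", "supplements"])
print(name, adj_p, {k: len(v) for k, v in groups.items()})
```

On this made-up data the split falls on price, producing one respondent group per price level. A full CHAID implementation would first merge categories that do not differ significantly and would then repeat the step within each group.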
CHAID is often used as an exploratory technique and is an alternative to multiple regression, especially when the data set is not well-suited to regression analysis.
References
- G. V. Kass. An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, Vol. 29, No. 2 (1980), pp. 119-127.
External links
- Statsoft - CHAID Analysis
- JMP Partition Platform
- SPSS - How decision tree results are different in AnswerTree
- SmartDrill - Analytic Techniques: CHAID
- ADAPA - Batch and real-time scoring of data mining models, including decision trees - CHAID
- R-Forge CHAID - CHAID package download for the free R statistical software