Chi-square automatic interaction detection: Difference between revisions

US AID is a different thing and may cause confusion
Fitzroy14 (talk | contribs)
m Added recent reference to paper with CHAID application in medical imaging.
Line 2:
'''Chi-square automatic interaction detection''' ('''CHAID''') is a [[Decision tree learning|decision tree]] technique based on adjusted significance testing ([[Bonferroni testing]]). The technique was developed in South Africa and published in 1980 by Gordon V. Kass, who had completed a PhD thesis on the topic. CHAID can be used for prediction (in a similar fashion to [[regression analysis]]; this version of CHAID was originally known as XAID), for classification, and for detecting interaction between variables. CHAID is a formal extension of the AID (Automatic Interaction Detection) and THAID (THeta Automatic Interaction Detection) procedures developed in the United States in the 1960s and 1970s, which in turn extended earlier research, including work performed in the UK in the 1950s.
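
The adjusted significance testing that CHAID relies on can be illustrated with a minimal sketch. It is not Kass's published procedure: it assumes a plain Bonferroni correction (multiplying the p-value by the number of tests compared), whereas the published algorithm uses its own multipliers, which are not reproduced here. The function names <code>chi_square_statistic</code>, <code>chi_square_sf</code> and <code>bonferroni_adjust</code> are hypothetical and chosen only for this example.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts (predictor categories by
    outcome categories). Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_sf(x, dof):
    """Upper-tail probability of the chi-square distribution (integer dof >= 1)."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd dof: normal tail plus a finite series in chi = sqrt(x)
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(chi / math.sqrt(2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def bonferroni_adjust(p_value, n_tests):
    """Plain Bonferroni adjustment (an assumption; not Kass's multipliers)."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: two predictor categories by two response categories.
table = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(table)
p_raw = chi_square_sf(stat, dof)
p_adj = bonferroni_adjust(p_raw, n_tests=3)  # e.g. three candidate predictors
```

In a tree-growing loop, the predictor with the smallest adjusted p-value would be the candidate for the next split, and the node would be left unsplit if no adjusted p-value fell below the chosen significance level.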
 
In practice, CHAID is often used in [[direct marketing]] to select groups of consumers and predict how their responses to some variables affect other variables, although other early applications were in the fields of medical and psychiatric research. More recently in medical research, CHAID has been applied to clinical and medical imaging data to identify important diagnostic variables and improve clinical decision making.<ref>{{Cite journal|last=Reddan|first=Tristan|last2=Corness|first2=Jonathan|last3=Harden|first3=Fiona|last4=Mengersen|first4=Kerrie|date=2018|title=Analysis of the predictive value of clinical and sonographic variables in children with suspected acute appendicitis using decision tree algorithms|url=https://onlinelibrary.wiley.com/doi/abs/10.1002/sono.12156|journal=Sonography|language=en|volume=5|issue=4|pages=157–163|doi=10.1002/sono.12156|issn=2054-6750}}</ref>
 
As with other decision trees, CHAID's advantages are that its output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.