Chi-square automatic interaction detection: Difference between revisions

Yensaa (talk | contribs)
added ref
Yensaa (talk | contribs)
Sources: renamed section title as "Software"
Line 1:
'''Chi-square automatic interaction detection''' ('''CHAID''')<ref>{{Cite journal |last=Kass |first=G. V. |date=1980 |title=An Exploratory Technique for Investigating Large Quantities of Categorical Data |url=https://www.jstor.org/stable/10.2307/2986296?origin=crossref |journal=Applied Statistics |volume=29 |issue=2 |pages=119 |doi=10.2307/2986296}}</ref><ref name=":0">{{Cite journal |last=Biggs |first=David |last2=De Ville |first2=Barry |last3=Suen |first3=Ed |date=1991 |title=A method of choosing multiway partitions for classification and decision trees |url=https://www.tandfonline.com/doi/full/10.1080/02664769100000005 |journal=Journal of Applied Statistics |language=en |volume=18 |issue=1 |pages=49–62 |doi=10.1080/02664769100000005 |issn=0266-4763}}</ref><ref name=":1" /> is a [[Decision tree learning|decision tree]] technique, based on adjusted significance testing ([[Bonferroni testing]]). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to [[regression analysis]], this version of CHAID being originally known as XAID) as well as classification, and for detection of interaction between variables. CHAID is based on a formal extension of AID (Automatic Interaction Detection)<ref>{{Cite journal |last=Morgan |first=James N. |last2=Sonquist |first2=John A. 
|date=1963 |title=Problems in the Analysis of Survey Data, and a Proposal |url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1963.10500855 |journal=Journal of the American Statistical Association |language=en |volume=58 |issue=302 |pages=415–434 |doi=10.1080/01621459.1963.10500855 |issn=0162-1459}}</ref> and THAID (THeta Automatic Interaction Detection)<ref>{{Cite journal |last=Messenger |first=Robert |last2=Mandell |first2=Lewis |date=1972 |title=A Modal Search Technique for Predictive Nominal Scale Multivariate Analysis |url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1972.10481290 |journal=Journal of the American Statistical Association |language=en |volume=67 |issue=340 |pages=768–772 |doi=10.1080/01621459.1972.10481290 |issn=0162-1459}}</ref><ref>{{Cite book |last=Morgan |first=James N. |url=https://www.worldcat.org/oclc/666930 |title=THAID, a sequential analysis program for the analysis of nominal scale dependent variables |date=1973 |others=Robert C. Messenger |isbn=0-87944-137-2 |location=Ann Arbor, Mich. |oclc=666930}}</ref> procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including that performed in the UK in the 1950s.<ref>{{Cite journal |last=Belson |first=William A. 
|date=1959 |title=Matching and Prediction on the Principle of Biological Classification |url=https://www.jstor.org/stable/10.2307/2985543?origin=crossref |journal=Applied Statistics |volume=8 |issue=2 |pages=65 |doi=10.2307/2985543}}</ref> A history of earlier supervised tree methods, together with a detailed description of the original CHAID algorithm and the exhaustive CHAID extension by Biggs, De Ville, and Suen,<ref name=":0" /> can be found in Ritschard (2013).<ref name=":1">{{Cite journal |last=Ritschard |first=Gilbert |title=CHAID and Earlier Supervised Tree Methods |url=https://www.researchgate.net/publication/315476407_CHAID_and_Earlier_Supervised_Tree_Methods |journal=Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences, McArdle, J.J. and G. Ritschard (eds) |location=New York |publisher=Routledge |publication-date=2013 |pages=48–74}}</ref>
 
In practice, CHAID is often used in [[direct marketing]] to select groups of consumers and to predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.
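
The adjusted significance testing on which CHAID is based can be illustrated with a minimal Python sketch. It is not Kass's full algorithm: it assumes a binary response and binary candidate predictors (so every test has one degree of freedom), and it uses a plain Bonferroni multiplier equal to the number of candidate predictors, which is a simplification. The predictor names and counts are hypothetical, and the helper names are illustrative only.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts.

    Assumes every row and column total is positive and applies no
    continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail chi-square probability for one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni(p, m):
    """Multiply p by the number of tests m, capped at 1 (simplified adjustment)."""
    return min(1.0, p * m)


def best_split(tables, alpha=0.05):
    """Return (predictor, adjusted p) for the most significant predictor.

    The predictor is None when no adjusted p-value reaches alpha,
    in which case the node would not be split.
    """
    m = len(tables)
    adjusted = {
        name: bonferroni(p_value_df1(chi2_2x2(t)), m)
        for name, t in tables.items()
    }
    name = min(adjusted, key=adjusted.get)
    if adjusted[name] <= alpha:
        return name, adjusted[name]
    return None, adjusted[name]


# Hypothetical counts: rows are predictor levels, columns are
# (responded, did not respond).
tables = {
    "owns_home": [[30, 70], [10, 90]],
    "urban": [[21, 79], [19, 81]],
}
print(best_split(tables))
```

With these hypothetical counts, the first predictor has a chi-square statistic of 12.5 and is selected, while the second has an adjusted p-value capped at 1 and would not justify a split.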
Line 28:
* Hawkins, Douglas M.; Young, S. S.; & Rosinko, A.; ''Analysis of a large structure-activity dataset using recursive partitioning'', Quantitative Structure-Activity Relationships, Vol. 16, (1997), pp.&nbsp;296–302
 
==Software==
* Luchman, J.N.; ''CHAID: Stata module to conduct chi-square automated interaction detection'', available for free [https://ideas.repec.org/c/boc/bocode/s457752.html download], or by typing within Stata: ssc install chaid.
* Luchman, J.N.; ''CHAIDFOREST: Stata module to conduct random forest ensemble classification based on chi-square automated interaction detection (CHAID) as base learner'', available for free [https://ideas.repec.org/c/boc/bocode/s457932.html download], or by typing within Stata: ssc install chaidforest.
* [https://www.ibm.com/downloads/cas/Z6XD69WQ IBM SPSS Decision Trees] grows exhaustive CHAID trees as well as a few other types of trees such as CART.