Chi-square automatic interaction detection: Difference between revisions

Yensaa (talk | contribs)
m capitalized CHAID
Citation bot (talk | contribs)
Alter: url, pages, journal. URLs might have been anonymized. Add: jstor, authors 1-1. Removed parameters. Formatted dashes. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by AManWithNoPlan | Linked from User:AManWithNoPlan/sandbox4 | #UCB_webform_linked 214/1024
Line 1:
'''Chi-square automatic interaction detection''' ('''CHAID''')<ref>{{Cite journal |last=Kass |first=G. V. |date=1980 |title=An Exploratory Technique for Investigating Large Quantities of Categorical Data |url=https://www.jstor.org/stable/10.2307/2986296?origin=crossref |journal=Applied Statistics |volume=29 |issue=2 |pages=119–127 |doi=10.2307/2986296 |jstor=2986296}}</ref><ref name=":0">{{Cite journal |last1=Biggs |first1=David |last2=De Ville |first2=Barry |last3=Suen |first3=Ed |date=1991 |title=A method of choosing multiway partitions for classification and decision trees |url=https://www.tandfonline.com/doi/full/10.1080/02664769100000005 |journal=Journal of Applied Statistics |language=en |volume=18 |issue=1 |pages=49–62 |doi=10.1080/02664769100000005 |issn=0266-4763}}</ref><ref name=":1" /> is a [[Decision tree learning|decision tree]] technique based on adjusted significance testing ([[Bonferroni testing]]). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to [[regression analysis]]; this version of CHAID was originally known as XAID), for classification, and for detecting interaction between variables. CHAID is based on a formal extension of the AID (Automatic Interaction Detection)<ref>{{Cite journal |last1=Morgan |first1=James N. |last2=Sonquist |first2=John A. |date=1963 |title=Problems in the Analysis of Survey Data, and a Proposal |url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1963.10500855 |journal=Journal of the American Statistical Association |language=en |volume=58 |issue=302 |pages=415–434 |doi=10.1080/01621459.1963.10500855 |issn=0162-1459}}</ref> and THAID (THeta Automatic Interaction Detection)<ref>{{Cite journal |last1=Messenger |first1=Robert |last2=Mandell |first2=Lewis |date=1972 |title=A Modal Search Technique for Predictive Nominal Scale Multivariate Analysis |url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1972.10481290 |journal=Journal of the American Statistical Association |language=en |volume=67 |issue=340 |pages=768–772 |doi=10.1080/01621459.1972.10481290 |issn=0162-1459}}</ref><ref>{{Cite book |last=Morgan |first=James N. |url=https://www.worldcat.org/oclc/666930 |title=THAID, a sequential analysis program for the analysis of nominal scale dependent variables |date=1973 |others=Robert C. Messenger |isbn=0-87944-137-2 |location=Ann Arbor, Mich. |oclc=666930}}</ref> procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including that performed in the UK in the 1950s.<ref>{{Cite journal |last=Belson |first=William A. |date=1959 |title=Matching and Prediction on the Principle of Biological Classification |url=https://www.jstor.org/stable/10.2307/2985543?origin=crossref |journal=Applied Statistics |volume=8 |issue=2 |pages=65–75 |doi=10.2307/2985543 |jstor=2985543}}</ref> A history of earlier supervised tree methods, together with a detailed description of the original CHAID algorithm and of the exhaustive CHAID extension by Biggs, De Ville, and Suen,<ref name=":0" /> is given by Ritschard.<ref name=":1">{{Cite journal |last=Ritschard |first=Gilbert |title=CHAID and Earlier Supervised Tree Methods |url=https://www.researchgate.net/publication/315476407 |journal=Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences, McArdle, J.J. and G. Ritschard (eds) |location=New York |publisher=Routledge |publication-date=2013 |pages=48–74}}</ref>
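
The idea of choosing a split by adjusted significance testing can be illustrated with a minimal sketch. It is not the published CHAID algorithm: it omits the category-merging step, and it uses a plain Bonferroni multiplier equal to the number of candidate predictors, whereas the published multipliers depend on how categories are merged. The function names, the example tables, and the 0.05 threshold are all hypothetical and chosen only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z), then apply
    # the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    Rows are predictor categories, columns are outcome categories.
    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def select_split(tables, alpha=0.05):
    """Return (predictor, adjusted p-value) for the most significant predictor,
    or None if no adjusted p-value falls below alpha.

    Simplifying assumption: the Bonferroni multiplier is the number of
    candidate predictors, not the multiplier used in the published algorithm.
    """
    multiplier = len(tables)
    best = None
    for name, table in tables.items():
        stat, df = pearson_chi2(table)
        adjusted = min(1.0, chi2_sf(stat, df) * multiplier)
        if best is None or adjusted < best[1]:
            best = (name, adjusted)
    if best is not None and best[1] < alpha:
        return best
    return None


# Hypothetical response counts (responded, did not respond) per category.
example_tables = {
    "age_band": [[30, 70], [50, 50], [70, 30]],
    "region": [[48, 52], [52, 48]],
}
print(select_split(example_tables))
```

With these made-up counts the sketch picks <code>age_band</code>, because its response rates differ strongly across categories while those for <code>region</code> do not. A full implementation would first merge categories that are not significantly different and would repeat the selection recursively within each resulting node.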
 
In practice, CHAID is often used in [[direct marketing]] to select groups of consumers and to predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.