{{Short description|Decision tree learning technique}}
'''Chi-square automatic interaction detection''' ('''CHAID''')<ref name=":1" /> is a [[Decision tree learning|decision tree]] technique based on adjusted significance testing ([[Bonferroni correction]], [[Holm-Bonferroni method|Holm-Bonferroni testing]]).<ref name="kass1980">{{Cite journal |last=Kass |first=G. V. |date=1980 |title=An Exploratory Technique for Investigating Large Quantities of Categorical Data |url=https://www.jstor.org/stable/2986296 |journal=Applied Statistics |volume=29 |issue=2 |pages=119–127 |doi=10.2307/2986296|jstor=2986296 |url-access=subscription }}</ref><ref name=":0">{{Cite journal |last1=Biggs |first1=David |last2=De Ville |first2=Barry |last3=Suen |first3=Ed |date=1991 |title=A method of choosing multiway partitions for classification and decision trees |url=https://www.tandfonline.com/doi/full/10.1080/02664769100000005 |journal=Journal of Applied Statistics |language=en |volume=18 |issue=1 |pages=49–62 |doi=10.1080/02664769100000005 |bibcode=1991JApSt..18...49B |issn=0266-4763|url-access=subscription }}</ref>
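The following is a minimal Python sketch of how adjusted significance testing can drive the choice of a split variable. It assumes each predictor's categories have already been merged into groups. For each predictor, it runs a Pearson chi-square test on the predictor-by-target table, multiplies the p-value by a Bonferroni multiplier (capped at 1), and picks the predictor with the smallest adjusted p-value if that value falls below a significance level. The multipliers are an assumption of this sketch, following the form commonly attributed to Kass: the number of ways to group ''c'' original categories into ''r'' groups, which is a Stirling number of the second kind for nominal predictors and comb(c - 1, r - 1) for ordinal ones. The function names, the example data and the 0.05 threshold are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * r + 1)
    return min(p, 1.0)


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    rows = [row for row in table if sum(row) > 0]  # drop empty rows
    keep = [j for j, col in enumerate(zip(*rows)) if sum(col) > 0]  # drop empty columns
    if len(rows) < 2 or len(keep) < 2:
        return 0.0, 0  # nothing to test
    rows = [[row[j] for j in keep] for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(col_totals) - 1)


def p_value(table):
    stat, df = pearson_chi2(table)
    return chi2_sf(stat, df) if df > 0 else 1.0


def stirling2(n, k):
    """Ways to partition n labelled categories into k non-empty groups."""
    return sum((-1) ** i * math.comb(k, i) * (k - i) ** n for i in range(k + 1)) // math.factorial(k)


def bonferroni_multiplier(c, r, ordinal=False):
    # Assumed Kass-style multiplier: ordinal predictors may only merge adjacent categories.
    return math.comb(c - 1, r - 1) if ordinal else stirling2(c, r)


def select_split(predictors, alpha=0.05):
    """Hypothetical helper: return (name, adjusted p) of the best predictor, or None.

    predictors maps a name to (merged_table, original_category_count, is_ordinal).
    """
    best = None
    for name, (table, c, ordinal) in predictors.items():
        adjusted = min(1.0, p_value(table) * bonferroni_multiplier(c, len(table), ordinal))
        if best is None or adjusted < best[1]:
            best = (name, adjusted)
    return best if best is not None and best[1] < alpha else None


# Hypothetical data: rows are merged predictor groups, columns are target classes.
predictors = {
    "region": ([[30, 10], [10, 30]], 4, False),   # 4 regions merged into 2 groups
    "age_band": ([[20, 20], [21, 19]], 3, True),  # 3 ordered bands merged into 2
}
print(select_split(predictors))  # "region" is selected; "age_band" is not significant
```

The multiplier penalizes predictors with many original categories, since they have more possible groupings and so more chances to produce a small p-value by chance.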
==History==
CHAID is based on a formal extension of AID (Automatic Interaction Detection)<ref name="morgan1963">{{Cite journal |last1=Morgan |first1=James N. |last2=Sonquist |first2=John A. |date=1963 |title=Problems in the Analysis of Survey Data, and a Proposal |url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1963.10500855 |journal=Journal of the American Statistical Association |language=en |volume=58 |issue=302 |pages=415–434 |doi=10.1080/01621459.1963.10500855 |issn=0162-1459|url-access=subscription }}</ref> and THAID (THeta Automatic Interaction Detection)<ref name="messenger1972">{{Cite journal |last1=Messenger |first1=Robert |last2=Mandell |first2=Lewis |date=1972 |title=A Modal Search Technique for Predictive Nominal Scale Multivariate Analysis |url=http://www.tandfonline.com/doi/abs/10.1080/01621459.1972.10481290 |journal=Journal of the American Statistical Association |language=en |volume=67 |issue=340 |pages=768–772 |doi=10.1080/01621459.1972.10481290 |issn=0162-1459|url-access=subscription }}</ref><ref name="morgan1973">{{Cite book |last=Morgan |first=James N. |title=THAID, a sequential analysis program for the analysis of nominal scale dependent variables |date=1973 |others=Robert C. Messenger |isbn=0-87944-137-2 |___location=Ann Arbor, Mich. |oclc=666930}}</ref> procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including that performed by Belson in the UK in the 1950s.<ref>{{Cite journal |last=Belson |first=William A. |date=1959 |title=Matching and Prediction on the Principle of Biological Classification |url=https://www.jstor.org/stable/2985543 |journal=Applied Statistics |volume=8 |issue=2 |pages=65–75 |doi=10.2307/2985543|jstor=2985543 |url-access=subscription }}</ref>
In 1975, the CHAID technique itself was developed in South Africa. It was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on the topic.<ref name="kass1980"/>
A history of earlier supervised tree methods can be found in [[Gilbert Ritschard|Ritschard]], including a detailed description of the original CHAID algorithm and the exhaustive CHAID extension by Biggs, De Ville, and Suen.<ref name=":0" /><ref name=":1">{{Cite journal |last=Ritschard |first=Gilbert |title=CHAID and Earlier Supervised Tree Methods |url=https://www.researchgate.net/publication/315476407 |journal=Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences, McArdle, J.J. And G. Ritschard (Eds) |___location=New York |publisher=Routledge |publication-date=2013 |pages=48–74}}</ref>
CHAID needs large sample sizes to work effectively; with small samples, respondent groups can quickly become too small for reliable analysis.
CHAID has been used as a data mining technique in applied studies, for example in acquiring insurance customers and in forecasting the yield of cereal industry companies. It relies on multiway splitting to form discrete groups and assess their effect on the dependent variable. Reasons given for preferring CHAID include five main criteria:
# A large proportion of the input data was categorical;
# It is efficient on large datasets;
# It is highly visual and easy to interpret;
# Business rules generated by CHAID are easy to implement and integrate into business processes; and
# Data-quality issues in the input can be handled efficiently.<ref>{{Cite web |last=Behera, Desik |date=November 2012 |title=Acquiring Insurance Customer: The CHAID Way |url=https://www.researchgate.net/publication/256038754_Acquiring_Insurance_Customer_The_CHAID_Way |access-date=7 August 2025 |website=ResearchGate}}</ref><ref>{{Cite web |last=Kotane |first=Inta |date=September 2024 |title=Application of CHAID Decision Trees and Neural Networks Methods in Forecasting the Yield of Cereal Industry Companies |url=https://www.researchgate.net/publication/383956028_APPLICATION_OF_CHAID_DECISION_TREES_AND_NEURAL_NETWORKS_METHODS_IN_FORECASTING_THE_YIELD_OF_CEREAL_INDUSTRY_COMPANIES |url-status=live |access-date=7 August 2025 |website=ResearchGate |doi=10.17770/het2024.28.8264}}</ref>
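The multiway splitting described above comes from merging predictor categories whose target distributions do not differ significantly. The following is a minimal Python sketch of that merging step, loosely following Kass's description. It assumes a nominal predictor, so any pair of categories may be merged; for an ordinal predictor only adjacent categories would be allowed. Kass's additional check for re-splitting compound categories is omitted. The function name `merge_categories`, the merge threshold of 0.05 and the example counts are hypothetical.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * r + 1)
    return min(p, 1.0)


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    rows = [row for row in table if sum(row) > 0]
    keep = [j for j, col in enumerate(zip(*rows)) if sum(col) > 0]
    if len(rows) < 2 or len(keep) < 2:
        return 0.0, 0
    rows = [[row[j] for j in keep] for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(col_totals) - 1)


def merge_categories(counts, alpha_merge=0.05):
    """Hypothetical helper: merge categories of a nominal predictor into split groups.

    counts maps each category to its list of target-class counts.
    """
    # Each group is a tuple of original category labels with pooled class counts.
    groups = {(cat,): list(row) for cat, row in counts.items()}
    while len(groups) > 1:
        # Find the pair whose 2 x k sub-table is least significantly different.
        best_pair, best_p = None, -1.0
        for a, b in combinations(groups, 2):
            stat, df = pearson_chi2([groups[a], groups[b]])
            p = chi2_sf(stat, df) if df > 0 else 1.0
            if p > best_p:
                best_pair, best_p = (a, b), p
        if best_p <= alpha_merge:
            break  # every remaining pair differs significantly
        a, b = best_pair
        groups[a + b] = [x + y for x, y in zip(groups.pop(a), groups.pop(b))]
    return sorted(groups)


# Hypothetical counts of [responded, did not respond] per region.
counts = {"north": [40, 10], "south": [38, 12], "east": [12, 38], "west": [10, 40]}
print(merge_categories(counts))  # [('east', 'west'), ('north', 'south')]
```

Each surviving group becomes one branch of the multiway split, so this predictor would split into two branches rather than four.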
==Properties==
CHAID can be used for prediction (in a similar fashion to [[regression analysis]]; this version of CHAID was originally known as XAID) as well as for classification and for detecting interactions between variables.<ref name="morgan1963"/><ref name="messenger1972"/><ref name="morgan1973"/>
In practice, CHAID is often used in [[direct marketing]] to select groups of consumers and to predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.{{fact|date=December 2024}}
One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric.{{fact|date=December 2024}}
==See also==
*[[Decision tree learning]]
*[[Latent class model]]
*[[Market segment]]
*[[Multiple comparisons]]
*[[Structural equation modeling]]
==References==
{{reflist|1}}
==Bibliography==
* Press, Laurence I.; Rogers, Miles S.; & Shure, Gerald H.; ''An interactive technique for the analysis of multivariate data'', Behavioral Science, Vol. 14 (1969), pp. 364–370
* Hawkins, Douglas M.; and Kass, Gordon V.; ''Automatic Interaction Detection'', in Hawkins, Douglas M. (ed), ''Topics in Applied Multivariate Analysis'', Cambridge University Press, Cambridge, 1982, pp. 269–302
* Hooton, Thomas M.; Haley, Robert W.; Culver, David H.; White, John W.; Morgan, W. Meade; & Carroll, Raymond J.; ''The Joint Associations of Multiple Risk Factors with the Occurrence of Nosocomial Infections'', American Journal of Medicine, Vol. 70, (1981), pp. 960–970
* Brink, Susanne; & Van Schalkwyk, Dirk J.; ''Serum ferritin and mean corpuscular volume as predictors of bone marrow iron stores'', South African Medical Journal, Vol. 61, (1982), pp. 432–434
* McKenzie, Dean P.; McGorry, Patrick D.; Wallace, Chris S.; Low, Lee H.; Copolov, David L.; & Singh, Bruce S.; ''Constructing a Minimal Diagnostic Decision Tree'', Methods of Information in Medicine, Vol. 32 (1993), pp. 161–166
* Magidson, Jay; ''The CHAID approach to segmentation modeling: chi-squared automatic interaction detection'', in Bagozzi, Richard P. (ed); ''Advanced Methods of Marketing Research'', Blackwell, Oxford, GB, 1994, pp. 118–159
* Hawkins, Douglas M.; Young, S. S.; & Rosinko, A.; ''Analysis of a large structure-activity dataset using recursive partitioning'', Quantitative Structure-Activity Relationships, Vol. 16, (1997), pp. 296–302
==External links==
* Luchman, J.N.; ''CHAID: Stata module to conduct chi-square automated interaction detection'', available for free [https://ideas.repec.org/c/boc/bocode/s457752.html download], or install from within Stata with <code>ssc install chaid</code>.
* Luchman, J.N.; ''CHAIDFOREST: Stata module to conduct random forest ensemble classification based on chi-square automated interaction detection (CHAID) as base learner'', available for free [https://ideas.repec.org/c/boc/bocode/s457932.html download], or install from within Stata with <code>ssc install chaidforest</code>.
* [https://www.ibm.com/downloads/cas/Z6XD69WQ IBM SPSS Decision Trees] grows exhaustive CHAID trees as well as a few other types of trees such as CART.
* An R package ''[https://r-forge.r-project.org/R/?group_id=343 CHAID]'' is available on R-Forge.
[[Category:Market research]]
[[Category:Market segmentation]]
[[Category:Statistical algorithms]]
[[Category:Statistical classification]]
[[Category:Decision trees]]
[[Category:Classification algorithms]]