Conditional logistic regression: Difference between revisions

Revision as of 17:35, 25 March 2023 by Peterfrazier (→Motivation: Clarifying the context about logistic regression and adding an example to help make the article more clear. Also adding that a conditional logit implementation is available in statsmodels. Tag: Visual edit)
Latest revision as of 19:34, 17 July 2025 by Citation bot (Removed URL that duplicated identifier. | Use this bot. Report bugs. | #UCB_CommandLine)
(4 intermediate revisions by 4 users not shown)
Line 1:

{{Short description|Statistical technique}}

'''Conditional logistic regression''' is an extension of [[logistic regression]] that allows one to account for [[stratification (clinical trials)|stratification]] and [[Matching (statistics)|matching]]. Its main field of application is [[observational studies]] and in particular [[epidemiology]]. It was devised in 1978 by [[Norman Breslow]], [[Nick Day (statistician)|Nicholas Day]], [[Katherine Halvorsen]], [[Ross L. Prentice]] and C. Sabai.<ref name="pmid727199">{{cite journal|vauthors=Breslow NE, Day NE, Halvorsen KT, Prentice RL, Sabai C| title=Estimation of multiple relative risk functions in matched case-control studies. | journal=Am J Epidemiol | year= 1978 | volume= 108 | issue= 4 | pages= 299–307 | pmid=727199 | doi= 10.1093/oxfordjournals.aje.a112623~~| url=https://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=727199~~ }} </ref> It is the most flexible and general procedure for matched data.

==Background==

Line 15 ⟶ 16:

Logistic regression as described above works satisfactorily when the number of strata is small relative to the amount of data. If we hold the number of strata fixed and increase the amount of data, estimates of the model parameters (<math>\alpha_i</math> for each stratum and the vector <math>\boldsymbol\beta</math>) converge to their true values.

Pathological behavior, however, occurs when we have many small strata, because the number of parameters grows with the amount of data. For example, if each stratum contains two datapoints, then the number of parameters in a model with <math>N</math> datapoints is <math> N/2 + p</math>, so the number of parameters is of the same order as the number of datapoints. In these settings, as we increase the amount of data, the asymptotic results on which maximum likelihood estimation is based are not valid and the resulting estimates are biased.
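The two-datapoints-per-stratum case can be checked numerically. The following is a minimal sketch, not taken from the cited sources. It assumes 1:1 matched case-control pairs with a single binary exposure; the pair counts and all function names (<code>log_expit</code>, <code>argmax_concave</code>, <code>profile_loglik</code>, <code>conditional_loglik</code>) are hypothetical and chosen only for illustration. It compares the unconditional maximum likelihood estimate, which fits one intercept per pair, with the estimate obtained from the conditional likelihood.

```python
import math


def log_expit(z):
    """Numerically stable log of the logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def argmax_concave(f, lo=-20.0, hi=20.0, iters=100):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


# Hypothetical data: (exposure of case, exposure of control) -> number of pairs.
PAIR_COUNTS = {(1, 0): 30, (0, 1): 10, (1, 1): 25, (0, 0): 35}


def pair_loglik(alpha, beta, x_case, x_control):
    """Unconditional log-likelihood of one pair: the case has y=1, the control y=0."""
    return log_expit(alpha + beta * x_case) + log_expit(-(alpha + beta * x_control))


def profile_loglik(beta):
    """Unconditional log-likelihood with every pair's intercept maximized out.

    Pairs with the same exposure pattern share the same optimal intercept,
    so it is enough to optimize once per pattern and weight by the count.
    """
    total = 0.0
    for (x_case, x_control), n in PAIR_COUNTS.items():
        alpha_hat = argmax_concave(lambda a: pair_loglik(a, beta, x_case, x_control))
        total += n * pair_loglik(alpha_hat, beta, x_case, x_control)
    return total


def conditional_loglik(beta):
    """Log-likelihood conditional on each pair containing exactly one case."""
    return sum(
        n * log_expit(beta * (x_case - x_control))
        for (x_case, x_control), n in PAIR_COUNTS.items()
    )


or_unconditional = math.exp(argmax_concave(profile_loglik))
or_conditional = math.exp(argmax_concave(conditional_loglik))
print(f"unconditional odds ratio estimate: {or_unconditional:.3f}")
print(f"conditional odds ratio estimate:   {or_conditional:.3f}")
```

With these assumed counts (30 and 10 discordant pairs), the conditional estimate is close to 30/10 = 3, while the unconditional estimate is close to 9. Concordant pairs do not affect either estimate; the gap comes entirely from estimating one intercept per pair.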
Conditional logistic regression fixes this issue. In fact, it can be shown that the unconditional analysis of matched pair data results in an estimate of the [[odds ratio]] which is the square of the correct, conditional one.<ref>{{cite book |last1=Breslow |first1=N.E. |last2=Day |first2=N.E. |date=1980 |title=Statistical Methods in Cancer Research. Volume 1-The Analysis of Case-Control Studies |url=http://www.iarc.fr/en/publications/pdfs-online/stat/sp32/ |location=Lyon, France |publisher=IARC |pages=249–251 |access-date=2016-11-04 |archive-url=https://web.archive.org/web/20161226114802/http://www.iarc.fr/en/publications/pdfs-online/stat/sp32/ |archive-date=2016-12-26 |url-status=dead }}</ref>

In addition to tests based on logistic regression, several other tests for matched data existed before conditional logistic regression, as listed in [[#Related tests|related tests]]. However, they did not allow for the analysis of continuous predictors with arbitrary stratum size. All of those procedures also lack the flexibility of conditional logistic regression and in particular the possibility to control for covariates.

Line 46 ⟶ 47:

==Related tests==

* A [[~~Paired~~paired difference test]] ~~allows to~~can test the association between a binary outcome and a continuous predictor while taking into account pairing.
* A [[Cochran-Mantel-Haenszel test]] ~~allows to~~can test the association between a binary outcome and a binary predictor while taking into account stratification with arbitrary strata size. When its conditions of application are met, it is identical to the conditional logistic regression [[score test]].<ref>{{cite journal | author = Day, N. E., Byar, D. P.| title = Testing hypotheses in case-control studies-equivalence of Mantel-Haenszel statistics and logit score tests. | journal = Biometrics | date = 1979 | volume = 35 | issue = 3 | pages = 623–630 | doi=10.2307/2530253| jstor = 2530253 | pmid = 497345 }}</ref>

==Notes==