Peterfrazier (talk | contribs) →Motivation: Clarifying the context about logistic regression and adding an example to help make the article clearer. Also adding that a conditional logit implementation is available in statsmodels. |
Citation bot (talk | contribs) Removed URL that duplicated identifier. | Use this bot. Report bugs. | #UCB_CommandLine |
(4 intermediate revisions by 4 users not shown) | |||
Line 1:
{{Short description|Statistical technique}}
'''Conditional logistic regression''' is an extension of [[logistic regression]] that allows one to account for [[stratification (clinical trials)|stratification]] and [[Matching (statistics)|matching]]. Its main field of application is [[observational studies]] and in particular [[epidemiology]]. It was devised in 1978 by [[Norman Breslow]], [[Nick Day (statistician)|Nicholas Day]], [[Katherine Halvorsen]], [[Ross L. Prentice]] and C. Sabai.<ref name="pmid727199">{{cite journal|vauthors=Breslow NE, Day NE, Halvorsen KT, Prentice RL, Sabai C| title=Estimation of multiple relative risk functions in matched case-control studies. | journal=Am J Epidemiol | year= 1978 | volume= 108 | issue= 4 | pages= 299–307 | pmid=727199 | doi= 10.1093/oxfordjournals.aje.a112623}}</ref>
==Background==
Line 15 ⟶ 16:
Logistic regression as described above works satisfactorily when the number of strata is small relative to the amount of data. If we hold the number of strata fixed and increase the amount of data, estimates of the model parameters (<math>\alpha_i</math> for each stratum and the vector <math>\boldsymbol\beta</math>) converge to their true values.
Pathological behavior, however, occurs when we have many small strata, because the number of parameters grows with the amount of data. For example, if each stratum contains two datapoints, then the number of parameters in a model with <math>N</math> datapoints is <math> N/2 + p</math>, so the number of parameters is of the same order as the number of datapoints. In these settings, the asymptotic results on which maximum likelihood estimation is based are not valid as we increase the amount of data, and the resulting estimates are biased. In fact, it can be shown that the unconditional analysis of matched pair data results in an estimate of the [[odds ratio]] which is the square of the correct, conditional one.<ref>{{cite book |last1=Breslow |first1=N.E. |last2=Day |first2=N.E. |date=1980 |title=Statistical Methods in Cancer Research. Volume 1-The Analysis of Case-Control Studies |url=http://www.iarc.fr/en/publications/pdfs-online/stat/sp32/ |location=Lyon, France |publisher=IARC |pages=249–251 |access-date=2016-11-04 |archive-url=https://web.archive.org/web/20161226114802/http://www.iarc.fr/en/publications/pdfs-online/stat/sp32/ |archive-date=2016-12-26 |url-status=dead }}</ref> Conditional logistic regression fixes this issue.
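The squared odds ratio for matched pairs can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes 1:1 matched case-control pairs with a single binary exposure, and the counts <code>n10</code> and <code>n01</code> (pairs in which only the case, or only the control, is exposed) are made-up illustrative values. All function names are hypothetical. Pairs in which case and control share the same exposure contribute only a constant to either likelihood, so they are left out.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def maximize(f, lo=-20.0, hi=20.0, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


def conditional_loglik(beta, n10, n01):
    """Conditional log-likelihood: the pair intercepts cancel out."""
    return n10 * log_sigmoid(beta) + n01 * log_sigmoid(-beta)


def pair_loglik(alpha, beta, x_case, x_control):
    """Ordinary logistic log-likelihood of one pair (case y=1, control y=0)."""
    return (log_sigmoid(alpha + beta * x_case)
            + log_sigmoid(-(alpha + beta * x_control)))


def unconditional_profile_loglik(beta, n10, n01):
    """Unconditional log-likelihood with each pair's intercept maximized out."""
    def best(x_case, x_control):
        a = maximize(lambda alpha: pair_loglik(alpha, beta, x_case, x_control))
        return pair_loglik(a, beta, x_case, x_control)
    return n10 * best(1, 0) + n01 * best(0, 1)


# Illustrative (made-up) discordant pair counts.
n10, n01 = 30, 10

beta_cond = maximize(lambda b: conditional_loglik(b, n10, n01))
beta_uncond = maximize(lambda b: unconditional_profile_loglik(b, n10, n01))

or_cond = math.exp(beta_cond)
or_uncond = math.exp(beta_uncond)
print("conditional odds ratio:  ", or_cond)
print("unconditional odds ratio:", or_uncond)
```

Under these assumptions the conditional estimate should agree with the ratio of discordant pairs, <code>n10 / n01</code>, while the unconditional estimate with one intercept per pair should be close to its square. For real analyses a tested library implementation is preferable to this hand-rolled optimizer.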
In addition to tests based on logistic regression, several other tests for matched data existed before conditional logistic regression was introduced (see [[#Related tests|related tests]]). However, they did not allow for the analysis of continuous predictors with arbitrary stratum size. All of those procedures also lack the flexibility of conditional logistic regression, in particular the ability to control for covariates.
Line 46 ⟶ 47:
==Related tests==
* A [[
* A [[Cochran-Mantel-Haenszel test]]
==Notes==