Logistic regression: Difference between revisions

m Maximum likelihood estimation (MLE): Article is lengthy at this point and expansion seems to have been accomplished.
"Rule of ten": removed "rule of thumb" phrase (carries distasteful historical reference)
Line 638:
{{main|One in ten rule}}
 
A widely used rule of thumb, the "[[one in ten rule]]", states that logistic regression models give stable coefficient estimates for the explanatory variables if they are based on a minimum of about 10 events per explanatory variable (EPV), where ''event'' denotes the cases belonging to the less frequent category of the dependent variable. Thus a study designed to use <math>k</math> explanatory variables for an event (e.g. [[myocardial infarction]]) expected to occur in a proportion <math>p</math> of participants in the study will require a total of <math>10k/p</math> participants. However, there is considerable debate about the reliability of this rule, which is based on simulation studies and lacks a secure theoretical underpinning.<ref>{{cite journal|pmid=27881078|pmc=5122171|year=2016|last1=Van Smeden|first1=M.|title=No rationale for 1 variable per 10 events criterion for binary logistic regression analysis|journal=BMC Medical Research Methodology|volume=16|issue=1|page=163|last2=De Groot|first2=J. A.|last3=Moons|first3=K. G.|last4=Collins|first4=G. S.|last5=Altman|first5=D. G.|last6=Eijkemans|first6=M. J.|last7=Reitsma|first7=J. B.|doi=10.1186/s12874-016-0267-3 |doi-access=free }}</ref> According to some authors,<ref>{{cite journal|last=Peduzzi|first=P|author2=Concato, J |author3=Kemper, E |author4=Holford, TR |author5=Feinstein, AR |title=A simulation study of the number of events per variable in logistic regression analysis|journal=[[Journal of Clinical Epidemiology]]|date=December 1996|volume=49|issue=12|pages=1373–9|pmid=8970487|doi=10.1016/s0895-4356(96)00236-3|doi-access=free}}</ref> the rule is overly conservative in some circumstances, with the authors stating, "If we (somewhat subjectively) regard confidence interval coverage less than 93 percent, type I error greater than 7 percent, or relative bias greater than 15 percent as problematic, our results indicate that problems are fairly frequent with 2–4 EPV, uncommon with 5–9 EPV, and still observed with 10–16 EPV. The worst instances of each problem were not severe with 5–9 EPV and usually comparable to those with 10–16 EPV".<ref>{{cite journal|last1=Vittinghoff|first1=E.|last2=McCulloch|first2=C. E.|title=Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression|journal=American Journal of Epidemiology|date=12 January 2007|volume=165|issue=6|pages=710–718|doi=10.1093/aje/kwk052|pmid=17182981|doi-access=free}}</ref>
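
As a rough illustration of the <math>10k/p</math> arithmetic above, the following Python sketch computes the total sample size implied by an events-per-variable target. The function name <code>required_sample_size</code> and its parameters are hypothetical, chosen only for this example. The <code>epv</code> argument defaults to 10 to match the rule and can be changed to explore stricter or looser targets.

```python
import math

def required_sample_size(k, p, epv=10):
    """Total participants needed so the expected number of events is epv * k.

    k   -- number of explanatory variables
    p   -- expected proportion of participants experiencing the event
    epv -- target events per variable (10 under the one in ten rule)
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Events needed = epv * k; dividing by p converts events to participants.
    return math.ceil(epv * k / p)

# Example: 4 explanatory variables, event expected in 25% of participants.
print(required_sample_size(k=4, p=0.25))  # 10 * 4 / 0.25 = 160
```

The result is rounded up because a fractional participant cannot be recruited. The calculation only reproduces the rule's arithmetic and says nothing about whether the rule itself is adequate for a given study.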
 
Others have found results that are not consistent with the above, using different criteria. A useful criterion is whether the fitted model can be expected to achieve the same predictive discrimination in a new sample as it appeared to achieve in the model development sample. For that criterion, 20 events per candidate variable may be required.<ref name=plo14mod/> Also, one can argue that 96 observations are needed just to estimate the model's intercept precisely enough that the margin of error in predicted probabilities is ±0.1 at a 0.95 confidence level.<ref name=rms/>
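
One way to arrive at a figure close to 96 is the usual normal-approximation margin of error for a single proportion, <math>n = (z/\delta)^2\, p(1-p)</math>, evaluated at the worst case <math>p = 0.5</math>. The sketch below assumes that reading; it is an illustration, not necessarily the derivation used in the cited source, and the function name <code>n_for_margin</code> is hypothetical.

```python
from statistics import NormalDist

def n_for_margin(margin, confidence=0.95, p=0.5):
    """Observations needed so a proportion's margin of error is `margin`.

    Assumes the normal (Wald) approximation; p=0.5 is the worst case
    because it maximises the variance term p * (1 - p).
    """
    # Two-sided critical value, about 1.96 for a 0.95 confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z / margin) ** 2 * p * (1 - p)

# Margin of error of 0.1 at a 0.95 confidence level.
print(round(n_for_margin(0.1)))  # 96
```

Under these assumptions the unrounded value is slightly above 96, which agrees with the figure quoted above.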