Nested case–control study

A '''nested case–control (NCC) study''' is a variation of a case–control study in which cases and controls are drawn from the population in a fully enumerated cohort.
In this variation of a [[case-control study]], only a subset of controls from the cohort is compared to the incident cases. In a case–cohort study, all incident cases in the cohort are compared to a random subset of the cohort. In contrast, in a nested case–control study, some number of controls are selected for each case from that case's matched risk set. By matching on factors such as age and selecting controls from relevant risk sets, the nested case–control model is generally more efficient than a case–cohort design with the same number of selected controls.
 
Usually, the exposure of interest is only measured among the cases and the selected controls. Thus the nested case–control study is more cost-efficient than the full cohort design. The nested case–control study can be analyzed using methods for missing covariates.<ref name=Cai/>
 
The NCC design is often used when the exposure of interest is difficult or expensive to obtain and when the outcome is rare. By utilizing data previously collected from a large cohort study, the time and cost of beginning a new case–control study are avoided. By only measuring the covariate in as many participants as necessary, the cost and effort of exposure assessment are reduced. This benefit is pronounced when the covariate of interest is biological, since assessments such as [[gene expression profiling]] are expensive, and because the quantity of blood available for such analysis is often limited, making it a valuable resource that should not be used unnecessarily.
 
==Example==
As an example, of the 91,523 women in the [[Nurses' Health Study]] who did not have cancer at baseline and who were followed for 14 years, 2,341 women had developed breast cancer by 1993. Several studies have used standard cohort analyses to study precursors to breast cancer, e.g. use of hormonal contraceptives,<ref name="pmid9051324">{{cite journal|author1=Hankinson SE |author2=Colditz GA |author3=Manson JE |author4=Willett WC |author5=Hunter DJ |author6=Stampfer MJ |display-authors=etal | title=A prospective study of oral contraceptive use and risk of breast cancer (Nurses' Health Study, United States). | journal=Cancer Causes and Control | year= 1997 | volume= 8 | issue= 1 | pages= 65–72 | pmid=9051324 | doi= 10.1023/a:1018435205695|s2cid=24873830 }}</ref> which is a covariate easily measured on all of the women in the cohort. However, note that in comparison to the cases, there are so many controls that each particular control contributes relatively little information to the analysis.
 
 
If, on the other hand, one is interested in the association between [[gene expression]] and breast cancer incidence, it would be very expensive and possibly wasteful of precious blood specimens to assay all of the roughly 89,000 women without breast cancer. In this situation, one may choose to assay all of the cases, and also, for each case, select a certain number of women to assay from the risk set of participants who have not yet failed (i.e. those who have not developed breast cancer before the particular case in question has developed breast cancer). The risk set is often restricted to those participants who are matched to the case on variables such as age, which reduces the variability of effect estimates.
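
The risk-set sampling described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the record keys <code>id</code>, <code>time</code> and <code>event</code>, the function name, and the optional <code>is_match</code> callback are all assumptions made for the example, and ties in follow-up time are handled in the simplest possible way.

```python
import random


def sample_nested_case_control(cohort, controls_per_case, rng, is_match=None):
    """Draw controls for each case from that case's risk set.

    cohort: list of dicts with hypothetical keys
        'id'    - participant identifier
        'time'  - follow-up time at event or censoring
        'event' - True if the participant became a case
    is_match: optional function (case, candidate) -> bool that restricts
        the risk set, e.g. to participants of similar age.
    Returns a list of (case_id, [control_id, ...]) tuples.
    """
    sampled_sets = []
    cases = sorted((p for p in cohort if p["event"]), key=lambda p: p["time"])
    for case in cases:
        # Risk set: everyone else still under follow-up at the case's event
        # time. Participants who become cases later are eligible as controls.
        risk_set = [
            p for p in cohort
            if p["id"] != case["id"]
            and p["time"] >= case["time"]
            and (is_match is None or is_match(case, p))
        ]
        k = min(controls_per_case, len(risk_set))
        controls = rng.sample(risk_set, k)  # sampling without replacement
        sampled_sets.append((case["id"], [c["id"] for c in controls]))
    return sampled_sets
```

Only the participants named in the returned sets would need to be assayed. A participant may be drawn as a control for more than one case, and a later case may serve as a control for an earlier one.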
 
==Efficiency of the NCC model==
Commonly 1–4 controls are selected for each case. Since the covariate is not measured for all participants, the nested case–control model is both less expensive than a full cohort analysis and more efficient than taking a simple random sample from the full cohort. However, it has been shown that with 4 controls per case and/or stratified sampling of controls, relatively little efficiency may be lost relative to a full cohort analysis, depending on the method of estimation used.<ref name=Cai>{{cite journal |last1=Cai |first1=Tianxi |author1-link=Tianxi Cai |last2=Zheng |first2=Yingye |year=2012 |title=Evaluating prognostic accuracy of biomarkers in nested case–control studies |journal=Biostatistics |volume=13 |issue=1 |pages=89–100 |doi=10.1093/biostatistics/kxr021 |pmc=3276269 |pmid=21856652}}</ref><ref>{{cite journal |last1=Goldstein |first1=Larry |last2=Zhang |first2=Haimeng |year=2009 |title=Efficiency of the maximum partial likelihood estimator for nested case control sampling |journal=Bernoulli |volume=15 |issue=2 |pages=569–597 |jstor=20680165 |doi=10.3150/08-bej162 |arxiv=0809.0445 |s2cid=16589954 }}</ref>
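
As a rough illustration of this trade-off, the sketch below applies a widely quoted rule of thumb, namely that with ''m'' controls per case the efficiency relative to a full cohort analysis is about ''m''/(''m'' + 1). This approximation is an assumption brought in for the example: it strictly applies near the null hypothesis and may not hold for other estimation methods. The case count is taken from the Nurses' Health Study example, and the function name is hypothetical.

```python
def ncc_budget(cases, controls_per_case):
    """Return (upper bound on assays, approximate relative efficiency).

    The assay count is an upper bound because the same participant can be
    sampled as a control for more than one case.
    The efficiency uses the m / (m + 1) rule of thumb (an assumption that
    holds approximately near the null hypothesis).
    """
    max_assays = cases * (1 + controls_per_case)
    relative_efficiency = controls_per_case / (controls_per_case + 1)
    return max_assays, relative_efficiency


# 2,341 breast cancer cases, 1 to 4 controls per case
for m in range(1, 5):
    assays, efficiency = ncc_budget(2341, m)
    print(m, assays, round(efficiency, 2))
```

Under these assumptions, 4 controls per case require at most 11,705 assays instead of 91,523 while retaining roughly 80% of the full-cohort efficiency.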
 
==Analysis of nested case–control studies==
The analysis of a nested case–control model must take into account the way in which controls are sampled from the cohort. Failing to do so, such as by treating the cases and selected controls as the original cohort and performing a logistic regression, which is common, can result in biased estimates whose null distribution is different from what is assumed. Ways to account for the random sampling include [[conditional logistic regression]],<ref>{{cite journal |last1=Borgan |first1=O. |last2=Goldstein |first2=L. |last3=Langholz |first3=B. |year=1995 |title=Methods for the Analysis of Sampled Cohort Data in the Cox Proportional Hazards Model |journal=[[Annals of Statistics]] |volume=23 |number=5 |pages=1749–1778 |jstor=2242544 |doi=10.1214/aos/1176324322 |url=https://www.duo.uio.no/bitstream/10852/47861/1/1992-7.pdf |doi-access=free }}</ref> and using [[inverse probability weighting]] to adjust for missing covariates among those who are not selected into the study.<ref name=Cai/>
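
To make the conditional logistic regression approach concrete, the following sketch fits a single-covariate model by Newton–Raphson. Each sampled set contributes the term exp(''βx''<sub>case</sub>) divided by the sum of exp(''βx'') over the case and its controls. The function name, the input layout and the fixed iteration count are assumptions chosen for brevity. In practice an established statistical package would normally be used.

```python
import math


def conditional_logit_fit(sampled_sets, iterations=25):
    """Estimate beta for one covariate by maximizing the conditional likelihood.

    sampled_sets: list of (case_x, [control_x, ...]) pairs, one per case,
        where each value is the covariate of the case or of a sampled control.
    Returns the estimated log hazard ratio per unit of the covariate.
    """
    beta = 0.0
    for _ in range(iterations):
        score = 0.0        # first derivative of the log-likelihood
        information = 0.0  # negative second derivative
        for case_x, control_xs in sampled_sets:
            xs = [case_x] + list(control_xs)
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            second = sum(w * x * x for w, x in zip(weights, xs)) / total
            score += case_x - mean
            information += second - mean * mean
        if information == 0.0:
            break  # no set carries information about beta
        beta += score / information
    return beta
```

For 1:1 matched sets with a binary exposure, this estimate reduces to the logarithm of the ratio of discordant pairs, which provides a simple check of the code.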
 
== Case–cohort study ==
A case–cohort study is a design in which cases and controls are drawn from within a prospective study. All cases who developed the outcome of interest during the follow-up are selected and compared with a random sample of the cohort. This randomly selected control sample could, by chance, include some cases. Exposure is defined prior to disease development based on data collected at baseline or on assays conducted in biological samples collected at baseline.
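
The case–cohort selection just described can be sketched as follows. The record keys <code>id</code> and <code>event</code> and the function name are assumptions made for the example, and the sketch only covers who is selected, not how the resulting data are analyzed.

```python
import random


def sample_case_cohort(cohort, subcohort_size, rng):
    """Select a random subcohort plus all cases that fall outside it.

    cohort: list of dicts with hypothetical keys 'id' and 'event'
        ('event' is True if the participant developed the outcome).
    Returns (subcohort, cases_outside_subcohort).
    """
    # The subcohort is drawn from the whole cohort at baseline, so it can
    # contain participants who later become cases.
    subcohort = rng.sample(cohort, subcohort_size)
    subcohort_ids = {p["id"] for p in subcohort}
    cases_outside = [
        p for p in cohort if p["event"] and p["id"] not in subcohort_ids
    ]
    return subcohort, cases_outside
```

Unlike risk-set sampling, the subcohort is drawn once and does not depend on any particular case's event time.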
 
==References==
{{reflist}}
{{cite book |last=Porta |first=Miquel |date=2014 |title=A Dictionary of Epidemiology |location=Oxford |publisher=Oxford University Press }}
 
==Further reading==
*{{cite book |first1=Ruth H. |last1=Keogh |first2=D. R. |last2=Cox |author-link2=David Cox (statistician) |chapter=Nested case–control studies |title=Case–Control Studies |publisher=Cambridge University Press |year=2014 |pages=160–190 |isbn=978-1-107-01956-0 |chapter-url=https://books.google.com/books?id=GdXSAgAAQBAJ&pg=PA160 }}
{{Medical research studies}}
 
{{DEFAULTSORT:Nested case-control study}}
[[Category:Epidemiology]]
[[Category:Epidemiological study projects]]
[[Category:Design of experiments]]
[[Category:Cohort study methods]]