Anachronist (talk | contribs) Undid revision 1132806771 by 51.155.207.129 (talk) - unexplained removal of validly sourced content |
Expanded the interpretation and biases section. Removed most of the contrasts with twin studies and intelligence since these are unrelated to GCTA estimates and make this entry not self-contained. Tag: references removed |
Line 42:
== Interpretation ==
GCTA provides an unbiased estimate of the total variance in phenotype explained by all variants included in the relatedness matrix (and any variation correlated with those SNPs). This estimate can also be interpreted as the maximum prediction accuracy (R^2) that could be achieved from a linear predictor using all SNPs in the relatedness matrix. The latter interpretation is particularly relevant to the development of polygenic risk scores, as it defines their maximum accuracy. GCTA estimates are sometimes misinterpreted as estimates of total (or narrow-sense, i.e. additive) heritability, but the method does not guarantee this. GCTA estimates are likewise sometimes misinterpreted as "lower bounds" on the narrow-sense heritability, but this is also incorrect: first, because GCTA estimates can be biased (including biased upwards) if the model assumptions are violated; and second, because, by definition (and when model assumptions are met), GCTA provides an unbiased estimate of the narrow-sense heritability if all causal variants are included in the relatedness matrix. The interpretation of the GCTA estimate in relation to the narrow-sense heritability thus depends on the variants used to construct the relatedness matrix.
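The "variance explained by the variants in the relatedness matrix" interpretation can be illustrated with a small simulation. The sketch below is not the GCTA software and does not perform a REML fit: it builds a relatedness matrix from simulated, independent SNPs and uses Haseman-Elston regression as a simpler stand-in estimator. All names (<code>make_grm</code>, <code>he_regression</code>) and all simulation settings are assumptions chosen for illustration.

```python
import numpy as np

def make_grm(genotypes):
    """Relatedness matrix from an (individuals x SNPs) array of 0/1/2 allele counts."""
    # Standardize each SNP to mean 0 and variance 1 across individuals.
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return z @ z.T / genotypes.shape[1], z

def he_regression(grm, y):
    """Haseman-Elston regression: slope of y_i * y_j on A_ij over all pairs i < j."""
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)
    slope, _intercept = np.polyfit(grm[i, j], y[i] * y[j], 1)
    return slope

rng = np.random.default_rng(0)
n, m, true_h2 = 1000, 500, 0.5  # illustrative sample size, SNP count, variance explained

# Independent SNPs with allele frequencies between 0.1 and 0.5 (assumed).
freqs = rng.uniform(0.1, 0.5, size=m)
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
grm, z = make_grm(genotypes)

# Every SNP in the matrix is causal, with small additive effects.
beta = rng.normal(0.0, np.sqrt(true_h2 / m), size=m)
y = z @ beta + rng.normal(0.0, np.sqrt(1.0 - true_h2), size=n)

h2_hat = he_regression(grm, y)
print(f"simulated variance explained = {true_h2}, estimate = {h2_hat:.2f}")
```

Because every causal variant is in the matrix and the simulated data satisfy the model assumptions, the estimate should land near the simulated value, up to sampling noise.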
Most frequently, GCTA is run with a single relatedness matrix constructed from common SNPs and will not capture (or not fully capture) the contribution of the following factors:
# Any rare or low-frequency variants that are not directly genotyped/imputed.
# Any non-linear, dominance, or epistatic genetic effects. Note that GCTA can be extended to estimate the contribution of these effects through more complex relatedness matrices.
# The effects of gene-environment (GxE) interactions. Note that GCTA can be extended to estimate the contribution of GxE interactions when the environmental factor (E) is known, by including additional variance components.
# Structural variants, which are typically not genotyped or imputed.
# Measurement error: GCTA does not model any uncertainty or error in the measured trait.
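The dependence of the estimate on which variants enter the relatedness matrix can be sketched with a toy simulation. This is an illustration under stated assumptions, not the GCTA implementation: the SNPs are simulated as independent (so excluded variants are not tagged by included ones), Haseman-Elston regression stands in for REML, and the helper name <code>he_estimate</code> is hypothetical.

```python
import numpy as np

def he_estimate(z_used, y):
    """Haseman-Elston estimate using a relatedness matrix built from z_used only."""
    grm = z_used @ z_used.T / z_used.shape[1]
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)
    slope, _intercept = np.polyfit(grm[i, j], y[i] * y[j], 1)
    return slope

rng = np.random.default_rng(1)
n, m, true_h2 = 1000, 1000, 0.6  # illustrative settings

# Independent SNPs, all of them causal with small additive effects.
freqs = rng.uniform(0.1, 0.5, size=m)
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
beta = rng.normal(0.0, np.sqrt(true_h2 / m), size=m)
y = z @ beta + rng.normal(0.0, np.sqrt(1.0 - true_h2), size=n)

# Estimate with every causal SNP in the matrix, then with only half of them.
h2_all = he_estimate(z, y)
h2_half = he_estimate(z[:, : m // 2], y)
print(f"all SNPs in matrix: {h2_all:.2f}; half of SNPs in matrix: {h2_half:.2f}")
```

In this setup the second estimate should be roughly half the first, since the excluded causal SNPs are uncorrelated with the included ones and their contribution is absorbed into the residual.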
GCTA makes several model assumptions and may produce biased estimates under the following conditions:
# The distribution of causal variants is systematically different from the distribution of variants included in the relatedness matrix (even if all causal variants are included in the relatedness matrix). This occurs, for example, if causal variants are systematically at a higher or lower frequency, or in higher or lower correlation with other variants, than the genotyped variants as a whole. This can produce either an upward or a downward bias, depending on the relationship between the causal variants and the variants used. Various extensions to GCTA have been proposed (for example, GREML-LDMS) to account for these distributional shifts.
# Population stratification is not fully accounted for by covariates. GCTA (specifically GREML) accounts for stratification through the inclusion of fixed-effect covariates, typically principal components. If these covariates do not fully capture the stratification, the GCTA estimate will be biased, generally upwards. Accounting for recent population structure is particularly challenging for studies of rare variants.
# Residual genetic or environmental relatedness is present in the data. GCTA assumes a homogeneous population with an independent and identically distributed environmental term. This assumption is violated if related individuals and/or individuals with substantially shared environments are included in the data. In this case, the GCTA estimate will additionally capture the contribution of any variation correlated with the genetic relationship, whether direct genetic effects or a correlated environment.
# The presence of "indirect" genetic effects. When genetic variants present in the relatedness matrix are correlated with variants present in other individuals that influence the participant's environment, those effects will also be captured in the GCTA estimate. For example, if variants inherited by a participant from their mother influenced the participant's phenotype through the maternal environment, then the effect of those variants will be included in the GCTA estimate even though it is "indirect" (i.e. mediated by parental genetics). This may be interpreted as an upward bias, as such "indirect" effects are not strictly causal (altering them in the participant would not lead to a change in phenotype in expectation).
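The upward bias from uncorrected population stratification can be sketched with a toy simulation in which the phenotype has no genetic component at all. This is an illustration under stated assumptions, not the GCTA software: two equally sized subpopulations differ in allele frequencies and in mean phenotype, Haseman-Elston regression stands in for REML, and the leading principal component of the relatedness matrix is regressed out of the phenotype as a simple analogue of a fixed-effect covariate. All names and settings are hypothetical.

```python
import numpy as np

def he_estimate(grm, y):
    """Haseman-Elston regression: slope of y_i * y_j on A_ij over all pairs i < j."""
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)
    slope, _intercept = np.polyfit(grm[i, j], y[i] * y[j], 1)
    return slope

rng = np.random.default_rng(2)
n_per_pop, m = 500, 1000  # illustrative settings
n = 2 * n_per_pop
pop = np.repeat([0, 1], n_per_pop)

# Allele frequencies differ by +/- 0.08 between the two subpopulations (assumed).
base = rng.uniform(0.2, 0.5, size=m)
shift = 0.08 * rng.choice([-1.0, 1.0], size=m)
freqs = np.where(pop[:, None] == 0, base + shift, base - shift)
genotypes = rng.binomial(2, freqs).astype(float)
z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
grm = z @ z.T / m

# Phenotype with NO genetic effect: only a mean difference between subpopulations.
y = 0.2 * np.where(pop == 0, 1.0, -1.0) + rng.normal(size=n)

naive = he_estimate(grm, y)

# Regress the leading principal component of the matrix out of the phenotype.
_eigvals, eigvecs = np.linalg.eigh(grm)  # eigenvalues are returned in ascending order
design = np.column_stack([np.ones(n), eigvecs[:, -1]])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = he_estimate(grm, y - design @ coef)

print(f"no covariate: {naive:.2f}; leading PC regressed out: {adjusted:.2f}")
```

Although no variant affects the phenotype, the unadjusted estimate should be clearly positive, while the adjusted one should be close to zero. In this toy setup a single principal component captures the structure almost completely; real data may need more components, and residual stratification can remain.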
== Implementations ==