Genome-wide complex trait analysis: Difference between revisions

(Rescuing 2 sources and tagging 0 as dead.) #IABot (v2.0.9.5)
m Disadvantages: | Altered template type. Add: pmc, pmid, pages, issue, volume, date, journal, title, doi, authors 1-8. Removed URL that duplicated identifier. Changed bare reference to CS1/2. | Use this tool. Report bugs. | #UCB_Gadget
Line 42:
# Computational inefficiency: The original GCTA implementation scales poorly with increasing data size (<math>\mathcal{O}(\text{SNPs} \cdot n^2)</math>, where <math>n</math> is the number of subjects), so even when enough data is available for precise GCTA estimates, the computation may be infeasible. GCTA estimates can be combined using a standard precision-weighted fixed-effect meta-analysis,<ref>[http://gcta.freeforums.net/thread/213/analysis-greml-results-multiple-cohorts "Meta-analysis of GREML results from multiple cohorts"], Yang 2015</ref> so research groups sometimes analyze cohorts or subsets separately and then pool the estimates meta-analytically (at the cost of additional complexity and some loss of precision). This has motivated the creation of faster implementations and variant algorithms which make different assumptions, such as using [[Method of moments (statistics)|moment matching]].<ref>[http://biorxiv.org/content/early/2016/08/18/070177 "Phenome-wide Heritability Analysis of the UK Biobank"], Ge et al 2016</ref>
# Need for raw data: GCTA requires the genetic similarity of all subjects and thus their raw genetic information; due to privacy concerns, individual patient data is rarely shared. GCTA cannot be run on the summary statistics reported publicly by many GWAS projects, and multiple GCTA estimates can only be pooled by performing a [[meta-analysis]]. <br> In contrast, alternative techniques operate on the summaries reported by GWASes without requiring the raw data.<ref>Pasaniuc & Price 2016, [https://www.dropbox.com/s/4mgmun29xbund7z/2016-pasaniuc.pdf "Dissecting the genetics of complex traits using summary association statistics"]</ref> For example, "[[Linkage disequilibrium score regression|LD score regression]]"<ref>[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4495769/ "LD Score Regression Distinguishes Confounding from Polygenicity in Genome-Wide Association Studies"], Bulik-Sullivan et al 2015</ref> contrasts [[linkage disequilibrium]] statistics (available from public datasets like [[1000 Genomes]]) with the public summary effect-sizes to infer heritability and estimate genetic correlations/overlaps of multiple traits. The [[Broad Institute]] runs [http://ldsc.broadinstitute.org/about/ LD Hub] {{Webarchive|url=https://web.archive.org/web/20160511100955/http://ldsc.broadinstitute.org/about/ |date=2016-05-11 }}, which provides a public web interface to LD score regression results for ≥177 traits.<ref>[http://biorxiv.org/content/biorxiv/early/2016/05/03/051094.full.pdf "LD Hub: a centralized database and web interface to LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis"], Zheng et al 2016</ref> Another method using summary data is HESS.<ref>[http://biorxiv.org/content/early/2016/01/14/035907 "Contrasting the genetic architecture of 30 complex traits from summary association data"], Shi et al 2016</ref>
# Confidence intervals may be incorrect, may extend outside the 0–1 range of heritability, and may be highly imprecise because they rely on asymptotic approximations.<ref>{{cite journal | doi=10.1016/j.ajhg.2016.04.016 | title=Fast and Accurate Construction of Confidence Intervals for Heritability | journal=The American Journal of Human Genetics | date=2 June 2016 | volume=98 | issue=6 | pages=1181–1192 | last1=Schweiger | first1=Regev | last2=Kaufman | first2=Shachar | last3=Laaksonen | first3=Reijo | last4=Kleber | first4=Marcus E. | last5=März | first5=Winfried | last6=Eskin | first6=Eleazar | last7=Rosset | first7=Saharon | last8=Halperin | first8=Eran | pmid=27259052 | pmc=4908190 }}</ref>
# Underestimation of SNP heritability: GCTA implicitly assumes all classes of SNPs, rarer or commoner, newer or older, more or less in linkage disequilibrium, have the same effects on average; in humans, rarer and newer variants tend to have larger and more negative effects<ref>[https://www.dropbox.com/s/idh2vm1dkar3qho/2017-gazal.pdf "Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection"], Gazal et al 2017</ref> as they represent [[mutation load]] being purged by [[Negative selection (natural selection)|negative selection]]. As with measurement error, this will bias GCTA estimates towards underestimating heritability.
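
The precision-weighted fixed-effect pooling mentioned in the list above can be illustrated with a minimal sketch. It is not taken from the GCTA software: the function names <code>fixed_effect_meta</code> and <code>wald_interval</code>, and all cohort values, are hypothetical and chosen only for illustration. The sketch assumes that each cohort reports a heritability estimate with a standard error and that the cohorts are independent. It also shows how a simple normal-approximation (Wald) interval can extend below 0, one way in which asymptotic confidence intervals may leave the 0–1 range.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Pool independent estimates by inverse-variance (precision) weighting."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one positive standard error per estimate")
    # Each cohort's weight is its precision, 1 / SE^2.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * h for w, h in zip(weights, estimates)) / total
    # The pooled standard error is the inverse square root of total precision.
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


def wald_interval(estimate, se, z=1.96):
    """Normal-approximation interval; it is not constrained to [0, 1]."""
    return estimate - z * se, estimate + z * se


# Hypothetical per-cohort heritability estimates and standard errors.
cohort_h2 = [0.30, 0.20, 0.25]
cohort_se = [0.10, 0.05, 0.10]

pooled_h2, pooled_se = fixed_effect_meta(cohort_h2, cohort_se)
print(pooled_h2, pooled_se)

# A small estimate with a large standard error gives a negative lower bound.
lower, upper = wald_interval(0.05, 0.10)
print(lower, upper)
```

With these made-up inputs the most precise cohort (standard error 0.05) receives four times the weight of each of the others, so the pooled estimate is pulled towards its value of 0.20.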