Human genetic clustering: Difference between revisions

Millager (talk | contribs)
pushing forward
Millager (talk | contribs)
finished draft of first subsection
Line 7:
 
== Genetic clustering algorithms and methods ==
Since at least 2001, a wide range of methods have been developed to assess the structure of human populations using genetic data. Most commonly, genetic clusters are derived by analysis of [[Single-nucleotide polymorphism|single-nucleotide polymorphisms]] (SNPs), although other genetic data can be analyzed as well. Models for genetic clustering also vary by the algorithms and programs used to process the data. Most methods for determining clusters can be categorized as '''model-based clustering methods''' or '''multidimensional summaries'''.<ref>{{Cite journal|last=Novembre|first=John|last2=Ramachandran|first2=Sohini|date=2011-09-22|title=Perspectives on Human Population Structure at the Cusp of the Sequencing Era|url=http://dx.doi.org/10.1146/annurev-genom-090810-183123|journal=Annual Review of Genomics and Human Genetics|volume=12|issue=1|pages=245–274|doi=10.1146/annurev-genom-090810-183123|issn=1527-8204}}</ref><ref name=":1">{{Cite journal|last=Lawson|first=Daniel John|last2=Falush|first2=Daniel|date=2012-09-22|title=Population Identification Using Genetic Data|url=http://dx.doi.org/10.1146/annurev-genom-082410-101510|journal=Annual Review of Genomics and Human Genetics|volume=13|issue=1|pages=337–361|doi=10.1146/annurev-genom-082410-101510|issn=1527-8204}}</ref> Although they process large numbers of SNPs (or other genetic marker data) in different ways, both approaches tend to converge on similar patterns, because both identify similarities among individual SNPs or [[haplotype]] tracts that reflect shared ancestry.<ref name=":1" />
 
=== Model-based clustering ===
Line 13:
 
=== Multidimensional summary statistics ===
Where model-based clustering characterizes populations using proportions of discrete clusters, multidimensional summary statistics characterize populations on a continuous spectrum. The most common multidimensional statistical method used for genetic clustering is [[principal component analysis]] (PCA), which plots individuals along two or more axes (their "principal components"), each a weighted combination of genetic markers chosen to account for as much of the variance in the data as possible. Clusters can then be identified by assessing the distribution of the plotted individuals, which may fall into discrete groups or occupy admixed positions between groups.<ref name=":0" /><ref name=":1" />
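
The PCA procedure described above can be sketched in a few lines. The following is a minimal illustration rather than the method of any particular study: it assumes genotypes are coded as 0/1/2 allele counts, and the function name <code>genotype_pca</code>, the two simulated populations, and all parameter values are hypothetical choices made for the example.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: array of shape (n_individuals, n_snps) holding 0/1/2
    allele counts (an assumed coding for this sketch).
    Returns the PC scores and the fraction of variance each PC explains.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # remove the per-SNP mean
    # SVD of the centered matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Simulated data (hypothetical): two populations with slightly different
# allele frequencies, plus individuals drawn from a 50/50 mixture.
rng = np.random.default_rng(0)
n_snps = 1000
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(50, n_snps))
pop_b = rng.binomial(2, freq_b, size=(50, n_snps))
admixed = rng.binomial(2, 0.5 * (freq_a + freq_b), size=(20, n_snps))

genotypes = np.vstack([pop_a, pop_b, admixed])
scores, explained = genotype_pca(genotypes)

# Mean position of each group along the first principal component.
mean_a = scores[:50, 0].mean()
mean_b = scores[50:100, 0].mean()
mean_admixed = scores[100:, 0].mean()
print(mean_a, mean_b, mean_admixed)
```

In this simulation the two source populations separate along the first principal component and the admixed individuals fall between them, which is the pattern the text describes as an "admixed position between groups".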
 
=== Caveats and drawbacks ===
 
Genetic clustering methods of any type carry many caveats and drawbacks, given the degree of admixture and relative similarity within the human population. All genetic cluster findings are [[Sampling bias|biased]] by the sampling process used to gather data, and by the quality and quantity of that data. Many clustering studies use data derived from populations that are geographically distinct from one another, which may create the illusion of clearly discrete clusters.<ref name=":0" /> STRUCTURE in particular can be misleading because it requires the data to be sorted into a predetermined number of clusters, which may or may not reflect the actual population's distribution.<ref name=":2">{{Cite journal|last=Kalinowski|first=S T|date=2010-08-04|title=The computer program STRUCTURE does not reliably identify the main genetic clusters within species: simulations and implications for human population structure|url=http://dx.doi.org/10.1038/hdy.2010.95|journal=Heredity|volume=106|issue=4|pages=625–632|doi=10.1038/hdy.2010.95|issn=0018-067X}}</ref> Sample size also plays an important moderating role in cluster findings: different sample sizes can influence cluster assignment, and more subtle relationships between genotypes may only emerge with larger samples.<ref name=":0" /><ref name=":2" />
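
The sampling caveat can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of any cited analysis: it assumes a hypothetical population whose allele frequencies change smoothly along a single geographic axis, uses PCA (not STRUCTURE) as the summary, and introduces the helper names <code>simulate_cline</code>, <code>pc1_scores</code> and <code>largest_gap_fraction</code> purely for illustration.

```python
import numpy as np

def simulate_cline(positions, freq_west, freq_east, rng):
    """Draw 0/1/2 genotypes for individuals at positions in [0, 1].

    Allele frequencies are interpolated linearly between the two ends,
    so the underlying population is continuous, with no discrete groups.
    """
    x = np.asarray(positions, dtype=float)[:, None]
    freqs = (1 - x) * freq_west + x * freq_east
    return rng.binomial(2, freqs)

def pc1_scores(genotypes):
    """Scores of each individual on the first principal component."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def largest_gap_fraction(values):
    """Largest gap between neighbouring scores, as a share of the range."""
    v = np.sort(values)
    return np.diff(v).max() / (v[-1] - v[0])

rng = np.random.default_rng(1)
n_snps = 2000
freq_west = rng.uniform(0.1, 0.9, n_snps)
freq_east = np.clip(freq_west + rng.normal(0, 0.15, n_snps), 0.05, 0.95)

# Scheme 1: individuals sampled evenly along the whole gradient.
uniform_positions = np.linspace(0, 1, 100)
# Scheme 2: individuals sampled only from the two geographic extremes.
end_positions = np.concatenate([np.linspace(0, 0.1, 50),
                                np.linspace(0.9, 1, 50)])

gap_uniform = largest_gap_fraction(
    pc1_scores(simulate_cline(uniform_positions, freq_west, freq_east, rng)))
gap_ends = largest_gap_fraction(
    pc1_scores(simulate_cline(end_positions, freq_west, freq_east, rng)))
print(gap_uniform, gap_ends)
```

Both samples come from the same continuous population, but the sample drawn only from the extremes leaves a wide empty stretch along the first principal component, which can be read as two discrete clusters even though none exist in the simulated population.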
 
<translate>
== Applications to human genetic data == <!--T:11-->
</translate>Text of this section.