finished last section, need to do applications to human genetics next |
Finished first draft. Note that the "notable applications to human genetic data" section has a table and image that were wholesale copied from the older version of this article. All other text should be my own. |
Line 7:
== Genetic clustering algorithms and methods ==
A wide range of methods has been developed to assess the structure of human populations using genetic data. Early studies of within-group and between-group genetic variation relied on physical phenotypes and blood groups, whereas modern studies use genetic markers such as [[Alu element|Alu sequences]], [[Microsatellite|short tandem repeat polymorphisms]], and [[Single-nucleotide polymorphism|single nucleotide polymorphisms]] (SNPs), among others.<ref>{{Cite journal|last=Bamshad|first=Michael|last2=Wooding|first2=Stephen|last3=Salisbury|first3=Benjamin A.|last4=Stephens|first4=J. Claiborne|date=2004-08|title=Deconstructing the relationship between genetics and race|url=http://dx.doi.org/10.1038/nrg1401|journal=Nature Reviews Genetics|volume=5|issue=8|pages=598–609|doi=10.1038/nrg1401|issn=1471-0056}}</ref> Models for genetic clustering also vary in the algorithms and programs used to process the data. Most methods for determining clusters can be categorized as '''model-based clustering methods''' or '''multidimensional summaries'''.<ref name=":0" /><ref name=":1">{{Cite journal|last=Lawson|first=Daniel John|last2=Falush|first2=Daniel|date=2012-09-22|title=Population Identification Using Genetic Data|url=http://dx.doi.org/10.1146/annurev-genom-082410-101510|journal=Annual Review of Genomics and Human Genetics|volume=13|issue=1|pages=337–361|doi=10.1146/annurev-genom-082410-101510|issn=1527-8204}}</ref> Although the two approaches process large numbers of SNPs (or other genetic markers) in different ways, they tend to converge on similar patterns, because both identify similarities among SNPs and/or [[haplotype]] tracts that reflect shared ancestry.<ref name=":1" />
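As a rough illustration of the kind of input a multidimensional summary works from, the sketch below computes a pairwise allele-sharing distance between individuals. It is only a toy under stated assumptions: the genotype coding (0, 1, or 2 copies of a counted allele per locus), the distance measure, the function names, and the sample data are all hypothetical choices made for this example and are not taken from any particular program.

```python
def allele_sharing_distance(a, b):
    """Mean per-locus difference in allele counts, scaled to the range 0..1."""
    if len(a) != len(b):
        raise ValueError("genotypes must cover the same loci")
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def distance_matrix(genotypes):
    """Return {(name_i, name_j): distance} for every pair of individuals."""
    return {
        (i, j): allele_sharing_distance(genotypes[i], genotypes[j])
        for i in genotypes
        for j in genotypes
    }


# Hypothetical toy genotypes: allele counts (0, 1 or 2) at six loci.
genotypes = {
    "ind_a": [2, 2, 1, 0, 0, 2],
    "ind_b": [2, 1, 1, 0, 0, 2],
    "ind_c": [0, 0, 1, 2, 2, 0],
    "ind_d": [0, 1, 1, 2, 2, 0],
}
distances = distance_matrix(genotypes)
for pair in (("ind_a", "ind_b"), ("ind_a", "ind_c"), ("ind_c", "ind_d")):
    print(pair, round(distances[pair], 3))
```

In this toy data, ind_a and ind_b are much closer to each other than either is to ind_c or ind_d. A multidimensional summary would go one step further and project such pairwise relationships onto a small number of axes.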
=== Model-based clustering ===
[[File:Rosenberg 1048people 993markers.jpg|thumb|Human population structure has been inferred from multilocus DNA sequence data (Rosenberg et al. 2002, 2005). Individuals from 52 populations were examined at 993 DNA markers. These data were used to partition individuals into K = 2, 3, 4, 5, or 6 genetic clusters. In this figure, the average fractional membership of individuals from each population is represented by horizontal bars partitioned into K colored segments.]]
Common model-based clustering algorithms include STRUCTURE, ADMIXTURE, and HAPMIX. These algorithms fit the genetic data to a number of clusters that is either chosen arbitrarily or derived mathematically, such that differences within clusters are minimized and differences between clusters are maximized. This clustering method is also referred to as "[[Genetic admixture|admixture]] inference", as individual genomes (or individuals within populations) can be characterized by the proportions of [[Allele|alleles]] linked to each cluster.<ref name=":0" /> Of note, algorithms like STRUCTURE have required that the populations to be sampled are chosen before the cluster analysis is run.
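The idea of characterizing an individual by the proportion of alleles linked to each cluster can be sketched with a small expectation-maximization (EM) routine. This is a deliberately simplified illustration, not the algorithm used by STRUCTURE, ADMIXTURE, or HAPMIX: it assumes that the allele frequencies of each cluster are already known, that loci are independent, and that genotypes are coded as 0, 1, or 2 copies of a counted allele. The function name and the toy frequencies are hypothetical.

```python
def estimate_ancestry(genotype, cluster_freqs, iterations=200):
    """Estimate one individual's ancestry proportions by EM.

    genotype      -- allele counts (0, 1 or 2) at each locus
    cluster_freqs -- cluster_freqs[k][l] is the assumed frequency of the
                     counted allele at locus l in cluster k (strictly
                     between 0 and 1)
    """
    k = len(cluster_freqs)
    q = [1.0 / k] * k                  # start from equal proportions
    n_alleles = 2 * len(genotype)      # two allele copies per locus
    for _ in range(iterations):
        totals = [0.0] * k
        for locus, count in enumerate(genotype):
            # Handle the counted allele and the alternative allele in turn.
            for is_counted, copies in ((True, count), (False, 2 - count)):
                if copies == 0:
                    continue
                weights = []
                for c in range(k):
                    p = cluster_freqs[c][locus]
                    weights.append(q[c] * (p if is_counted else 1.0 - p))
                norm = sum(weights)
                # E-step: share each allele copy among the clusters.
                for c in range(k):
                    totals[c] += copies * weights[c] / norm
        # M-step: new proportions are the average cluster shares.
        q = [t / n_alleles for t in totals]
    return q


# Hypothetical toy data: two clusters with opposite allele frequencies.
freqs = [[0.9] * 6, [0.1] * 6]
print(estimate_ancestry([2] * 6, freqs))  # almost entirely the first cluster
print(estimate_ancestry([1] * 6, freqs))  # roughly equal shares
```

Real programs estimate the cluster allele frequencies and the individual proportions jointly, which is a considerably harder problem than the one sketched here.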
Line 18 ⟶ 19:
There are caveats and limitations to genetic clustering methods of any type, given the degree of admixture and relative similarity within the human population. All genetic cluster findings are [[Sampling bias|biased]] by the sampling process used to gather data, and by the quality and quantity of those data. For example, many clustering studies use data derived from populations that are geographically distinct and far apart from one another, which may present an illusion of discrete clusters; when intermediary groups are included, populations prove to be much more blended with one another.<ref name=":0" /> STRUCTURE in particular may be misleading because it requires the data to be sorted into a predetermined number of clusters, which may or may not reflect the actual population's distribution.<ref name=":2">{{Cite journal|last=Kalinowski|first=S T|date=2010-08-04|title=The computer program STRUCTURE does not reliably identify the main genetic clusters within species: simulations and implications for human population structure|url=http://dx.doi.org/10.1038/hdy.2010.95|journal=Heredity|volume=106|issue=4|pages=625–632|doi=10.1038/hdy.2010.95|issn=0018-067X}}</ref> Sample size also plays an important moderating role in cluster findings: different sample sizes can influence cluster assignment, and more subtle relationships between genotypes may only emerge with larger samples.<ref name=":0" /><ref name=":2" />
== Notable applications to human genetic data ==
A number of landmark genetic cluster studies have been conducted since 2002, including the following:
{| class="wikitable"
!Authors
!Year
!Title
!Sample size / number of populations sampled
!Sample
!Markers
|-
|Rosenberg et al.
|2002
|Genetic Structure of Human Populations<ref name=":8">{{Cite journal|last1=Rosenberg|first1=Noah A.|last2=Pritchard|first2=Jonathan K.|last3=Weber|first3=James L.|last4=Cann|first4=Howard M.|last5=Kidd|first5=Kenneth K.|last6=Zhivotovsky|first6=Lev A.|last7=Feldman|first7=Marcus W.|date=2002-12-20|title=Genetic Structure of Human Populations|journal=Science|volume=298|issue=5602|pages=2381–2385|bibcode=2002Sci...298.2381R|doi=10.1126/science.1078311|issn=0036-8075|pmid=12493913|s2cid=8127224}}</ref>
|1056 / 52
|[[Human Genome Diversity Project]] (HGDP-CEPH)
|377 STRs
|-
| rowspan="2" |Serre & Pääbo
| rowspan="2" |2004
| rowspan="2" |Evidence for Gradients of Human Genetic Diversity Within and Among Continents<ref>{{Cite journal|last1=Serre|first1=David|last2=Pääbo|first2=Svante|date=September 2004|title=Evidence for gradients of human genetic diversity within and among continents|journal=Genome Research|volume=14|issue=9|pages=1679–1685|doi=10.1101/gr.2529604|issn=1088-9051|pmc=515312|pmid=15342553}}</ref>
|89 / 15
|a: HGDP
| rowspan="2" |20 STRs
|-
|90 / geographically distributed individuals
|b: Jorde 1997
|-
|Rosenberg et al.
|2005
|Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure<ref name="rosenberg2005">{{cite journal|last1=Rosenberg|first1=NA|last2=Mahajan|first2=S|last3=Ramachandran|first3=S|last4=Zhao|first4=C|last5=Pritchard|first5=JK|display-authors=etal|year=2005|title=Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure|url=|journal=PLOS Genet|volume=1|issue=6|page=e70|doi=10.1371/journal.pgen.0010070|pmc=1310579|pmid=16355252|authorlink5=Jonathan K. Pritchard}}</ref>
|1056 / 52
|[[Human Genome Diversity Project]] (HGDP-CEPH)
|783 STRs + 210 indels
|-
|Li et al.
|2008
|Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation<ref>{{Cite journal|last1=Li|first1=Jun Z.|last2=Absher|first2=Devin M.|last3=Tang|first3=Hua|last4=Southwick|first4=Audrey M.|last5=Casto|first5=Amanda M.|last6=Ramachandran|first6=Sohini|last7=Cann|first7=Howard M.|last8=Barsh|first8=Gregory S.|last9=Feldman|first9=Marcus|last10=Cavalli-Sforza|first10=Luigi L.|last11=Myers|first11=Richard M.|date=2008-02-22|title=Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation|journal=Science|volume=319|issue=5866|pages=1100–1104|bibcode=2008Sci...319.1100L|doi=10.1126/science.1153717|issn=0036-8075|pmid=18292342|s2cid=53541133}}</ref>
|938 / 51
|[[Human Genome Diversity Project]] (HGDP-CEPH)
|650,000 SNPs
|-
|Tishkoff et al.
|2009
|The Genetic Structure and History of Africans and African Americans<ref name=":62">{{Cite journal|last1=Tishkoff|first1=Sarah A|last2=Reed|first2=Floyd A|last3=Friedlaender|first3=Françoise R|last4=Ehret|first4=Christopher|last5=Ranciaro|first5=Alessia|last6=Froment|first6=Alain|last7=Hirbo|first7=Jibril B|last8=Awomoyi|first8=Agnes A|last9=Bodo|first9=Jean-Marie|last10=Doumbo|first10=Ogobara|last11=Ibrahim|first11=Muntaser|date=2009-05-22|title=The Genetic Structure and History of Africans and African Americans|journal=Science|volume=324|issue=5930|pages=1035–1044|bibcode=2009Sci...324.1035T|doi=10.1126/science.1172257|issn=0036-8075|pmc=2947357|pmid=19407144|last20=Pretorius|first25=Scott M|last25=Williams|first24=James L|last24=Weber|first23=Charles|last23=Wambebe|first22=Mahamadou A|last22=Thera|first21=Michael W|last21=Smith|first20=Gideon S|last19=Powell|first19=Kweli|last12=Juma|first18=Sabah A|last18=Omar|first17=Thomas B|first16=Holly|last16=Mortensen|first15=Jason H|last15=Moore|first14=Godfrey|last14=Lema|first13=Maritha J|last13=Kotze|first12=Abdalla T|last17=Nyambo}}</ref>
|~3400 / 185
|HGDP-CEPH ''plus'' 133 additional African populations and Indian individuals
|1327 STRs + indels
|-
|Xing et al.
|2010
|Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping<ref name=":7">{{Cite journal|last1=Xing|first1=Jinchuan|last2=Watkins|first2=W. Scott|last3=Shlien|first3=Adam|last4=Walker|first4=Erin|last5=Huff|first5=Chad D.|last6=Witherspoon|first6=David J.|last7=Zhang|first7=Yuhua|last8=Simonson|first8=Tatum S.|last9=Weiss|first9=Robert B.|last10=Schiffman|first10=Joshua D.|last11=Malkin|first11=David|date=October 2010|title=Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping|journal=Genomics|volume=96|issue=4|pages=199–210|doi=10.1016/j.ygeno.2010.07.004|issn=0888-7543|pmc=2945611|pmid=20643205|last12=Woodward|first12=Scott R.|last13=Jorde|first13=Lynn B.}}</ref>
|850 / 40
|HapMap ''plus'' 296 individuals
|250,000 SNPs
|}
== Genetic clustering and race ==
A plurality of human genetic clustering studies have produced clusters of individuals with similar geographic origins or ancestry, and some have interpreted these findings as biological support for the concept of race. For example, clustering results have often shown a clear distinction between individuals with African and non-African ancestry, and at other levels of clustering nearly all individuals have been placed within their corresponding continental populations (e.g., Europeans clustered together, East Asians clustered together, etc.).<ref name=":4" /> Rosenberg et al. (2002) suggested
Many other scholars have challenged the idea that race can be inferred from genetic clusters, drawing distinctions between arbitrarily assigned genetic clusters, ancestry, and race. One recurring caution against thinking of human populations in terms of clusters is that genotypic variation and traits are distributed continuously across populations, along gradual [[Cline (biology)|clines]] rather than along discrete population boundaries; although genetic similarities are usually organized geographically, the underlying populations have never been completely separated from one another. Because of migration, gene flow, and baseline homogeneity, features of different groups overlap extensively and are intermixed.<ref name=":3" /><ref name=":4" /> Moreover, genetic clusters do not typically match socially defined racial groups: members of a commonly understood race may not all be sorted into the same genetic cluster, and many genetic clusters are made up of individuals who would have distinct racial identities.<ref name=":5" /> In general, clusters may most simply be understood as products of the methods used to sample and analyze genetic data: not without meaning for understanding ancestry and genetic characteristics, but inadequate to fully explain the concept of race, which is more often described in terms of social and cultural forces.