{{Distinguish|Gene cluster|Metabolic gene cluster|Cluster genealogy}}
= Human genetic clustering =
'''Human genetic clustering''' refers to patterns of relative genetic similarity among human individuals and populations, as well as the wide range of scientific and statistical methods used to study this aspect of [[human genetic variation]].
 
Clustering studies are thought to be valuable for characterizing the general structure of genetic variation among human populations and for contributing to the study of ancestral origins, evolutionary history, and precision medicine. Since the mapping of the human genome, and with the availability of increasingly powerful analytic tools, [[Cluster analysis|cluster analyses]] have revealed a range of ancestral and migratory trends among human populations and individuals.<ref name=":02">{{Cite journal|last1=Novembre|first1=John|last2=Ramachandran|first2=Sohini|date=2011-09-22|title=Perspectives on Human Population Structure at the Cusp of the Sequencing Era|url=http://dx.doi.org/10.1146/annurev-genom-090810-183123|journal=Annual Review of Genomics and Human Genetics|volume=12|issue=1|pages=245–274|doi=10.1146/annurev-genom-090810-183123|pmid=21801023|issn=1527-8204|url-access=subscription}}</ref> Human genetic clusters tend to be organized by geographic ancestry, with divisions between clusters aligning largely with geographic barriers such as oceans or mountain ranges.<ref name=":32">{{Cite journal|last1=Maglo|first1=Koffi N.|last2=Mersha|first2=Tesfaye B.|last3=Martin|first3=Lisa J.|date=2016-02-17|title=Population Genomics and the Statistical Values of Race: An Interdisciplinary Perspective on the Biological Classification of Human Populations and Implications for Clinical Genetic Epidemiological Research|journal=Frontiers in Genetics|volume=7|page=22|doi=10.3389/fgene.2016.00022|pmid=26925096|pmc=4756148|issn=1664-8021|doi-access=free}}</ref><ref name=":92">{{Cite book|date=2012-10-29|editor-last=Goodman|editor-first=Alan H.|editor2-last=Moses|editor2-first=Yolanda T.|editor3-last=Jones|editor3-first=Joseph L.|title=Race|url=http://dx.doi.org/10.1002/9781118233023|doi=10.1002/9781118233023|isbn=9781118233023}}</ref> Clustering studies have been applied to global populations,<ref name=":102">{{Cite journal|last=Rosenberg|first=N. A.|date=2002-12-20|title=Genetic Structure of Human Populations|url=http://dx.doi.org/10.1126/science.1078311|journal=Science|volume=298|issue=5602|pages=2381–2385|doi=10.1126/science.1078311|pmid=12493913|bibcode=2002Sci...298.2381R|s2cid=8127224|issn=0036-8075|url-access=subscription}}</ref> as well as to population subsets like post-colonial North America.<ref name=":112">{{Cite journal|last1=Han|first1=Eunjung|last2=Carbonetto|first2=Peter|last3=Curtis|first3=Ross E.|last4=Wang|first4=Yong|last5=Granka|first5=Julie M.|last6=Byrnes|first6=Jake|last7=Noto|first7=Keith|last8=Kermany|first8=Amir R.|last9=Myres|first9=Natalie M.|last10=Barber|first10=Mathew J.|last11=Rand|first11=Kristin A.|date=2017-02-07|title=Clustering of 770,000 genomes reveals post-colonial population structure of North America|journal=Nature Communications|language=en|volume=8|issue=1|pages=14238|doi=10.1038/ncomms14238|pmid=28169989|pmc=5309710|bibcode=2017NatCo...814238H|issn=2041-1723|doi-access=free}}</ref><ref name=":122">{{Cite journal|last1=Jordan|first1=I. King|last2=Rishishwar|first2=Lavanya|last3=Conley|first3=Andrew B.|date=September 2019|title=Native American admixture recapitulates population-specific migration and settlement of the continental United States|journal=PLOS Genetics|volume=15|issue=9|pages=e1008225|doi=10.1371/journal.pgen.1008225|issn=1553-7404|pmc=6756731|pmid=31545791|doi-access=free}}</ref> Notably, the practice of defining clusters among modern human populations is largely arbitrary and variable due to the continuous nature of human genotypes; although individual genetic markers can be used to produce smaller groups, there are no models that produce completely distinct subgroups when larger numbers of genetic markers are used.<ref name=":32" /><ref name=":52">{{Cite journal|last1=Bamshad|first1=Michael J.|last2=Olson|first2=Steve E.|date=December 2003|title=Does Race Exist?|url=http://dx.doi.org/10.1038/scientificamerican1203-78|journal=Scientific American|volume=289|issue=6|pages=78–85|doi=10.1038/scientificamerican1203-78|pmid=14631734|bibcode=2003SciAm.289f..78B|issn=0036-8733|url-access=subscription}}</ref><ref name=":22">{{Cite journal|last=Kalinowski|first=S T|date=2010-08-04|title=The computer program STRUCTURE does not reliably identify the main genetic clusters within species: simulations and implications for human population structure|journal=Heredity|volume=106|issue=4|pages=625–632|doi=10.1038/hdy.2010.95|pmid=20683484|pmc=3183908|issn=0018-067X|doi-access=free}}</ref>
 
Many studies of human genetic clustering have been implicated in discussions of [[Race (human categorization)|race]], [[Ethnic group|ethnicity]], and [[scientific racism]], as some have controversially suggested that genetically derived clusters may be understood as proof of genetically determined races.<ref name=":42">{{Cite journal|last1=Jorde|first1=Lynn B|last2=Wooding|first2=Stephen P|date=2004-10-26|title=Genetic variation, classification and 'race'|url=http://dx.doi.org/10.1038/ng1435|journal=Nature Genetics|volume=36|issue=S11|pages=S28–S33|doi=10.1038/ng1435|pmid=15508000|issn=1061-4036|doi-access=free}}</ref><ref>{{Cite book|last=Marks|first=Jonathan|url=https://worldcat.org/oclc/1037867598|title=Is science racist?|date=27 February 2017|publisher=John Wiley & Sons|isbn=978-0-7456-8925-8|oclc=1037867598}}</ref> Although cluster analyses invariably organize humans (or groups of humans) into subgroups, since the work of evolutionary biologists such as [[Richard Lewontin]], [[Luigi Luca Cavalli-Sforza|Luigi Cavalli-Sforza]], and [[Marcus Feldman]] in the 1970s there has been a near consensus within human genetics that none of these genetic clusters can be attributed to races, and that knowing an individual's skin tone or continent of origin does not constitute a meaningful prediction of specific alleles.<ref name=":0">{{Cite web |date=2021-11-19 |title=What Can Genetic Testing Tell You About ‘Race’? |url=https://magazine.scienceforthepeople.org/lewontin-special-issue/genetics-of-race-gswg/ |access-date=2025-02-26 |website=Science for the People Magazine |language=en-US}}</ref> Because only a small fraction of genetic variation distinguishes human genotypes from one another overall, genetic clustering approaches are highly dependent on the sampled data, genetic markers, and statistical methods applied to their construction. 
It has also been repeatedly demonstrated by various methodologies that the five races ([[caucasoid]], [[mongoloid]], [[negroid]], [[Red race|American]] or "red", and [[Malay race|Malay]]) historically purported by scientific racism do not comport with population substructures derivable from any modern genomic datasets.<ref>{{Cite journal |last=Auton |first=Adam |last2=Abecasis |first2=Gonçalo R. |last3=Altshuler |first3=David M. |last4=Durbin |first4=Richard M. |last5=Abecasis |first5=Gonçalo R. |last6=Bentley |first6=David R. |last7=Chakravarti |first7=Aravinda |last8=Clark |first8=Andrew G. |last9=Donnelly |first9=Peter |last10=Eichler |first10=Evan E. |last11=Flicek |first11=Paul |last12=Gabriel |first12=Stacey B. |last13=Gibbs |first13=Richard A. |last14=Green |first14=Eric D. |last15=Hurles |first15=Matthew E. |date=October 2015 |title=A global reference for human genetic variation |url=https://www.nature.com/articles/nature15393 |journal=Nature |language=en |volume=526 |issue=7571 |pages=68–74 |doi=10.1038/nature15393 |issn=1476-4687|hdl=11693/38161 |hdl-access=free |pmc=4750478 }}</ref> Rather, the evidence for [[Cline (biology)|clinal]] patterns of human genetic variation overwhelms that pointing towards distinct groups defined by [[Human skin color|skin pigmentation]] or [[Phrenology|skull shape]],<ref name=":0" /> and arbitrarily invoking five population clusters in an attempt to test the genomic validity of scientific racism instead yields three "races" within Africa, one encompassing most of Europe and mainland Asia, and one encompassing Australia, the Americas, and the Pacific Islands.<ref>{{Cite journal |last=Lala |first=Kevin N. |last2=Feldman |first2=Marcus W. |date=2024-11-26 |title=Genes, culture, and scientific racism |url=https://www.pnas.org/doi/10.1073/pnas.2322874121 |journal=Proceedings of the National Academy of Sciences |volume=121 |issue=48 |pages=e2322874121 |doi=10.1073/pnas.2322874121 |pmc=11621800 |pmid=39556747}}</ref>
 
== Genetic clustering algorithms and methods ==
A wide range of methods have been developed to assess the structure of human populations with the use of genetic data. Early studies of within- and between-group genetic variation used physical phenotypes and blood groups, whereas modern genetic studies use genetic markers such as [[Alu element|Alu sequences]], [[Microsatellite|short tandem repeat polymorphisms]], and [[Single-nucleotide polymorphism|single nucleotide polymorphisms]] (SNPs), among others.<ref>{{Cite journal|last1=Bamshad|first1=Michael|last2=Wooding|first2=Stephen|last3=Salisbury|first3=Benjamin A.|last4=Stephens|first4=J. Claiborne|date=August 2004|title=Deconstructing the relationship between genetics and race|url=http://dx.doi.org/10.1038/nrg1401|journal=Nature Reviews Genetics|volume=5|issue=8|pages=598–609|doi=10.1038/nrg1401|pmid=15266342|s2cid=12378279|issn=1471-0056|url-access=subscription}}</ref> Models for genetic clustering also vary by the algorithms and programs used to process the data. 
Most sophisticated methods for determining clusters can be categorized as '''model-based clustering methods''' (such as the algorithm STRUCTURE<ref name=":132">{{Cite journal|last1=Pritchard|first1=Jonathan K|last2=Stephens|first2=Matthew|last3=Donnelly|first3=Peter|date=2000-06-01|title=Inference of Population Structure Using Multilocus Genotype Data|journal=Genetics|volume=155|issue=2|pages=945–959|doi=10.1093/genetics/155.2.945|pmid=10835412|pmc=1461096|issn=1943-2631|doi-access=free}}</ref>) or '''multidimensional summaries''' (typically through principal component analysis).<ref name=":102" /><ref name=":14">{{Cite journal|last1=Lawson|first1=Daniel John|last2=Falush|first2=Daniel|date=2012-09-22|title=Population Identification Using Genetic Data|url=http://dx.doi.org/10.1146/annurev-genom-082410-101510|journal=Annual Review of Genomics and Human Genetics|volume=13|issue=1|pages=337–361|doi=10.1146/annurev-genom-082410-101510|pmid=22703172|issn=1527-8204|doi-access=free}}</ref> By processing a large number of SNPs (or other genetic marker data) in different ways, both approaches to genetic clustering tend to converge on similar patterns by identifying similarities among SNPs and/or [[haplotype]] tracts to reveal ancestral genetic similarities.<ref name=":14" />
 
=== [[Model-based clustering]] ===
Common [[model-based clustering]] algorithms include STRUCTURE, ADMIXTURE, and HAPMIX. These algorithms operate by finding the best fit for genetic data among an arbitrary or mathematically derived number of clusters, such that differences within clusters are minimized and differences between clusters are maximized. This clustering method is also referred to as "[[Genetic admixture|admixture]] inference", as individual genomes (or individuals within populations) can be characterized by the proportions of [[allele]]s linked to each cluster.<ref name=":02" /> In other words, algorithms like STRUCTURE generate results that assume the existence of discrete ancestral populations, operationalized through unique genetic markers, which have combined over time to form the admixed populations of the modern day.
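
The idea of characterizing an individual by proportions of alleles linked to each cluster can be illustrated with a minimal sketch. It assumes the commonly described admixture likelihood, in which the number of copies of an allele carried at a marker is binomially distributed around a weighted average of the clusters' allele frequencies. The function names, the two-cluster grid search, and the allele frequencies are hypothetical and chosen only for illustration; the code does not reproduce the implementation of STRUCTURE, ADMIXTURE, or HAPMIX.

```python
import math


def mixed_allele_frequency(q, cluster_freqs):
    """Weighted average of cluster allele frequencies for one marker."""
    return sum(qk * fk for qk, fk in zip(q, cluster_freqs))


def log_likelihood(genotype, q, freqs):
    """Binomial log-likelihood of one individual's genotypes.

    genotype[j] is the number of copies (0, 1 or 2) of a reference allele
    at marker j; freqs[j][k] is that allele's frequency in cluster k;
    q[k] is the individual's assumed ancestry proportion from cluster k.
    """
    total = 0.0
    for copies, cluster_freqs in zip(genotype, freqs):
        f = mixed_allele_frequency(q, cluster_freqs)
        total += (math.log(math.comb(2, copies))
                  + copies * math.log(f)
                  + (2 - copies) * math.log(1.0 - f))
    return total


def estimate_ancestry_two_clusters(genotype, freqs, steps=100):
    """Grid search for the best-fitting proportions (a, 1 - a) when K = 2."""
    candidates = [(i / steps, 1.0 - i / steps) for i in range(steps + 1)]
    return max(candidates, key=lambda q: log_likelihood(genotype, q, freqs))


# Hypothetical allele frequencies for four markers in two clusters
# (illustrative values only, not taken from any dataset).
FREQS = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]

if __name__ == "__main__":
    for genotype in ([2, 2, 0, 0], [1, 1, 1, 1], [0, 0, 2, 2]):
        print(genotype, estimate_ancestry_two_clusters(genotype, FREQS))
```

In this toy setting the number of clusters (two) and their allele frequencies are fixed in advance, which mirrors the point made above: the output proportions are only meaningful relative to the ancestral clusters the model presupposes.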
 
=== Multidimensional summary statistics ===
[[File:Whole-genome based PCA and clustering of worlds ethnic groups.png|thumb|upright=1.4|A 2023 [[Whole genome sequencing|whole-genome study]] of modern-day ethnic groups in the world identified 14 genomic clusters, which do not exactly align with current categorizations of race or ethnicity. The study also found that "99.8% of whole genome is identical between two individuals". Left image shows [[Cluster analysis|clustering]] following [[principal component analysis]] with [[Three-dimensional space|three dimensions]]. Top right image shows geographical locations where samples were collected.<ref name=Kim_Choi_Kim_2023>{{cite journal |doi=10.1038/s41598-023-32325-w |title=On whole-genome demography of world's ethnic groups and individual genomic identity |date=2023 |last1=Kim |first1=Byung-Ju |last2=Choi |first2=Jaejin |last3=Kim |first3=Sung-Hou |journal=Scientific Reports |volume=13 |issue=1 |page=6316 |pmid=37072456 |pmc=10113208 |bibcode=2023NatSR..13.6316K }}</ref>]]
Where model-based clustering characterizes populations using proportions of presupposed ancestral clusters, multidimensional summary statistics characterize populations on a continuous spectrum. The most common multidimensional statistical method used for genetic clustering is [[principal component analysis]] (PCA), which plots individuals by two or more axes (their "principal components") that represent aggregations of genetic markers that account for the highest variance. Clusters can then be identified by visually assessing the distribution of data; with larger samples of human genotypes, data tends to cluster in distinct groups as well as admixed positions between groups.<ref name=":02" /><ref name=":14" />
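
The following is a minimal sketch of how a first principal component can be obtained from a genotype matrix. It assumes a small hypothetical table of allele counts (rows are individuals, columns are markers) and uses power iteration in place of the full eigendecomposition that dedicated software would perform, purely to keep the example free of dependencies. All names and values are illustrative.

```python
import math


def center_columns(matrix):
    """Subtract each marker's mean so that every column has mean zero."""
    n = len(matrix)
    means = [sum(column) / n for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def first_principal_component(genotypes, iterations=200):
    """Return (loadings, scores) for the first principal component.

    Power iteration on X^T X stands in for a full eigendecomposition.
    """
    x = center_columns(genotypes)
    markers = len(x[0])
    # Deterministic, non-uniform start vector; a start that is exactly
    # orthogonal to the leading component would not converge to it.
    v = [float(j + 1) for j in range(markers)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        w = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(markers)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Hypothetical genotypes: allele copies (0, 1 or 2) at five markers.
GENOTYPES = [
    [2, 2, 0, 0, 1],
    [2, 1, 0, 0, 1],
    [0, 0, 2, 2, 1],
    [0, 0, 2, 1, 1],
    [1, 1, 1, 1, 1],
]

if __name__ == "__main__":
    _, pc1 = first_principal_component(GENOTYPES)
    for row, score in zip(GENOTYPES, pc1):
        print(row, round(score, 3))
```

In this toy table the first two and the next two individuals fall on opposite sides of the axis, while the last individual sits between them. The sign of the axis is arbitrary; only relative positions carry information.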
 
=== Caveats and limitations ===
There are caveats and limitations to genetic clustering methods of any type, given the degree of admixture and relative similarity within the human population. All genetic cluster findings are [[Sampling bias|biased]] by the sampling process used to gather data, and by the quality and quantity of that data. For example, many clustering studies use data derived from populations that are geographically distinct and far apart from one another, which may present an illusion of discrete clusters where, in reality, populations are much more blended with one another when intermediary groups are included.<ref name=":02" /> Sample size also plays an important moderating role on cluster findings, as different sample size inputs can influence cluster assignment, and more subtle relationships between genotypes may only emerge with larger sample sizes.<ref name=":02" /><ref name=":22" /> In particular, the use of STRUCTURE has been widely criticized as being potentially misleading through requiring data to be sorted into a predetermined number of clusters which may or may not reflect the actual population's distribution.<ref name=":22" /><ref>{{Cite journal|last1=Lawson|first1=Daniel J.|last2=van Dorp|first2=Lucy|last3=Falush|first3=Daniel|date=2018-08-14|title=A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots|journal=Nature Communications|volume=9|issue=1|page=3258|doi=10.1038/s41467-018-05257-7|issn=2041-1723|pmc=6092366|pmid=30108219|bibcode=2018NatCo...9.3258L}}</ref> The creators of STRUCTURE originally described the algorithm as an "[[Exploratory data analysis|exploratory]]" method to be interpreted with caution and not as a test with statistically significant power.<ref name=":132" /><ref>{{Cite journal|last=Novembre|first=John|date=2016-10-01|title=Pritchard, Stephens, and Donnelly on Population Structure|journal=Genetics|volume=204|issue=2|pages=391–393|doi=10.1534/genetics.116.195164|issn=1943-2631|pmc=5068833|pmid=27729489}}</ref>
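
The sampling caveat can be illustrated with a deliberately simplified sketch. It assumes a single marker whose allele frequency changes linearly along a hypothetical geographic cline, and compares the expected allele counts obtained when only the two ends of the cline are sampled with those obtained when sampling is spread along its length. The function names, frequencies, and sampling positions are all assumptions made for illustration.

```python
def expected_dosage(position, freq_start=0.1, freq_end=0.9):
    """Expected allele copies (0 to 2) at a point along a hypothetical cline.

    position runs from 0.0 to 1.0; the allele frequency is assumed to
    change linearly between the two ends.
    """
    frequency = freq_start + (freq_end - freq_start) * position
    return 2.0 * frequency


def largest_gap(values):
    """Largest difference between neighbouring values after sorting."""
    ordered = sorted(values)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


# Sampling only the two ends of the cline versus sampling along its length.
ENDS_ONLY = [expected_dosage(p) for p in (0.0, 0.1, 0.9, 1.0)]
EVENLY_SPACED = [expected_dosage(i / 10) for i in range(11)]

if __name__ == "__main__":
    print(largest_gap(ENDS_ONLY))
    print(largest_gap(EVENLY_SPACED))
```

With the ends-only sample a wide gap separates two apparent groups, whereas the evenly spaced sample shows no gap larger than the spacing between neighbouring locations, even though the underlying variation is identical in both cases.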
 
== Notable applications to human genetic data ==
Modern applications of genetic clustering methods to global-scale genetic data were first marked by studies associated with the [[Human Genome Diversity Project]] (HGDP) data.<ref name=":02" /> These early HGDP studies, such as those by Rosenberg et al. (2002),<ref name=":102" /><ref>{{Cite journal|last1=Rosenberg|first1=Noah A|last2=Mahajan|first2=Saurabh|last3=Ramachandran|first3=Sohini|author-link3=Sohini Ramachandran|last4=Zhao|first4=Chengfeng|last5=Pritchard|first5=Jonathan K|last6=Feldman|first6=Marcus W|date=2005-12-09|title=Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure|url=http://dx.doi.org/10.1371/journal.pgen.0010070|journal=PLOS Genetics|volume=1|issue=6|pages=e70|doi=10.1371/journal.pgen.0010070|pmid=16355252|pmc=1310579|issn=1553-7404|doi-access=free}}</ref> contributed to theories of the serial founder effect and early human migration out of Africa, and clustering methods have been notably applied to describe admixed continental populations.<ref name=":112" /><ref name=":122" /><ref>{{Cite journal|last1=Leslie|first1=Stephen|last2=Winney|first2=Bruce|last3=Hellenthal|first3=Garrett|last4=Davison|first4=Dan|last5=Boumertit|first5=Abdelhamid|last6=Day|first6=Tammy|last7=Hutnik|first7=Katarzyna|last8=Royrvik|first8=Ellen C.|last9=Cunliffe|first9=Barry|last10=Lawson|first10=Daniel J.|last11=Falush|first11=Daniel|date=March 2015|title=The fine-scale genetic structure of the British population|journal=Nature|language=en|volume=519|issue=7543|pages=309–314|doi=10.1038/nature14230|pmid=25788095|issn=1476-4687|pmc=4632200|bibcode=2015Natur.519..309.}}</ref> Genetic clustering and HGDP studies have also contributed to methods for, and criticisms of, the [[Genealogical DNA test|genetic ancestry consumer testing]] industry.<ref>{{Cite journal|last1=Royal|first1=Charmaine D.|last2=Novembre|first2=John|last3=Fullerton|first3=Stephanie M.|last4=Goldstein|first4=David B.|last5=Long|first5=Jeffrey C.|last6=Bamshad|first6=Michael J.|last7=Clark|first7=Andrew G.|date=2010-05-14|title=Inferring Genetic Ancestry: Opportunities, Challenges, and Implications|journal=American Journal of Human Genetics|volume=86|issue=5|pages=661–673|doi=10.1016/j.ajhg.2010.03.011|issn=0002-9297|pmc=2869013|pmid=20466090}}</ref>
 
A number of landmark genetic cluster studies have been conducted on global human populations since 2002, including the following:
{| class="wikitable"
!Authors
!Year
!Title
!Sample size / number of populations sampled
!Sample
!Markers
|-
|Rosenberg et al.
|2002
|Genetic Structure of Human Populations<ref name=":82">{{Cite journal|last1=Rosenberg|first1=Noah A.|last2=Pritchard|first2=Jonathan K.|last3=Weber|first3=James L.|last4=Cann|first4=Howard M.|last5=Kidd|first5=Kenneth K.|last6=Zhivotovsky|first6=Lev A.|last7=Feldman|first7=Marcus W.|date=2002-12-20|title=Genetic Structure of Human Populations|journal=Science|volume=298|issue=5602|pages=2381–2385|bibcode=2002Sci...298.2381R|doi=10.1126/science.1078311|issn=0036-8075|pmid=12493913|s2cid=8127224}}</ref>
|1056 / 52
|[[Human Genome Diversity Project]] (HGDP-CEPH)
|377 STRs
|-
| rowspan="2" |Serre & Pääbo
| rowspan="2" |2004
| rowspan="2" |Evidence for gradients of human genetic diversity within and among continents<ref>{{Cite journal|last1=Serre|first1=David|last2=Pääbo|first2=Svante|date=September 2004|title=Evidence for gradients of human genetic diversity within and among continents|journal=Genome Research|volume=14|issue=9|pages=1679–1685|doi=10.1101/gr.2529604|issn=1088-9051|pmc=515312|pmid=15342553}}</ref>
|89 / 15
|a: HGDP
| rowspan="2" |20 STRs
|-
|90 / geographically distributed individuals
|b: Jorde 1997
|-
|Rosenberg et al.
|2005
|Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure<ref name="rosenberg20052">{{cite journal|last1=Rosenberg|first1=NA|last2=Mahajan|first2=S|last3=Ramachandran|first3=S|last4=Zhao|first4=C|last5=Pritchard|first5=JK|display-authors=etal|year=2005|title=Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure|url=|journal=PLOS Genet|volume=1|issue=6|page=e70|doi=10.1371/journal.pgen.0010070|pmc=1310579|pmid=16355252|authorlink5=Jonathan K. Pritchard |doi-access=free }}</ref>
|1056 / 52
|[[Human Genome Diversity Project]] (HGDP-CEPH)
|783 STRs + 210 indels
|-
|Li et al.
|2008
|Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation<ref>{{Cite journal|last1=Li|first1=Jun Z.|last2=Absher|first2=Devin M.|last3=Tang|first3=Hua|last4=Southwick|first4=Audrey M.|last5=Casto|first5=Amanda M.|last6=Ramachandran|first6=Sohini|last7=Cann|first7=Howard M.|last8=Barsh|first8=Gregory S.|last9=Feldman|first9=Marcus|last10=Cavalli-Sforza|first10=Luigi L.|last11=Myers|first11=Richard M.|date=2008-02-22|title=Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation|journal=Science|volume=319|issue=5866|pages=1100–1104|bibcode=2008Sci...319.1100L|doi=10.1126/science.1153717|issn=0036-8075|pmid=18292342|s2cid=53541133}}</ref>
|938 / 51
|[[Human Genome Diversity Project]] (HGDP-CEPH)
|650,000 SNPs
|-
|Tishkoff et al.
|2009
|The Genetic Structure and History of Africans and African Americans<ref name=":622">{{Cite journal|last1=Tishkoff|first1=Sarah A|last2=Reed|first2=Floyd A|last3=Friedlaender|first3=Françoise R|last4=Ehret|first4=Christopher|last5=Ranciaro|first5=Alessia|last6=Froment|first6=Alain|last7=Hirbo|first7=Jibril B|last8=Awomoyi|first8=Agnes A|last9=Bodo|first9=Jean-Marie|last10=Doumbo|first10=Ogobara|last11=Ibrahim|first11=Muntaser|date=2009-05-22|title=The Genetic Structure and History of Africans and African Americans|journal=Science|volume=324|issue=5930|pages=1035–1044|bibcode=2009Sci...324.1035T|doi=10.1126/science.1172257|issn=0036-8075|pmc=2947357|pmid=19407144|first12=Abdalla T|last13=Kotze|first13=Maritha J|last14=Lema|first14=Godfrey|last15=Moore|first15=Jason H|last16=Mortensen|first16=Holly|first17=Thomas B|last18=Omar|first18=Sabah A|last12=Juma|first19=Kweli|last19=Powell|first20=Gideon S|last21=Smith|first21=Michael W|last22=Thera|first22=Mahamadou A|last23=Wambebe|first23=Charles|last24=Weber|first24=James L|last25=Williams|first25=Scott M|last20=Pretorius|last17=Nyambo}}</ref>
|~3400 / 185
|HGDP-CEPH ''plus'' 133 additional African populations and Indian individuals
|1327 STRs + indels
|-
|Xing et al.
|2010
|Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping<ref name=":72">{{Cite journal|last1=Xing|first1=Jinchuan|last2=Watkins|first2=W. Scott|last3=Shlien|first3=Adam|last4=Walker|first4=Erin|last5=Huff|first5=Chad D.|last6=Witherspoon|first6=David J.|last7=Zhang|first7=Yuhua|last8=Simonson|first8=Tatum S.|last9=Weiss|first9=Robert B.|last10=Schiffman|first10=Joshua D.|last11=Malkin|first11=David|date=October 2010|title=Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping|journal=Genomics|volume=96|issue=4|pages=199–210|doi=10.1016/j.ygeno.2010.07.004|issn=0888-7543|pmc=2945611|pmid=20643205|last12=Woodward|first12=Scott R.|last13=Jorde|first13=Lynn B.}}</ref>
|850 / 40
|HapMap ''plus'' 296 individuals
|250,000 SNPs
|}
 
== Genetic clustering and race ==
Clusters of individuals are often [[population structure (genetics)|geographically structured]]. For example, when clustering a population of East Asians and Europeans, each group will likely form its own respective cluster based on similar [[allele frequency|allele frequencies]].<ref>{{cite journal |last1=Spencer |first1=Quayshawn |title=A Radical Solution to the Race Problem |journal=Philosophy of Science |date=2014 |volume=81 |issue=5 |page=1029-30 |doi=10.1086/677694 |doi-access=free }}</ref> In this way, clusters can have a correlation with traditional concepts of race and self-identified ancestry; in some cases, such as medical questionnaires, the latter variables can be used as a proxy for genetic ancestry where genetic data is unavailable.<ref name=":42" /><ref name=":102" /> However, genetic variation is distributed in a complex, continuous, and overlapping manner, so this correlation is imperfect and the use of [[Race and health|racial categories in medicine]] can introduce additional hazards.<ref name=":42" />
 
Some scholars{{who|date=August 2021}} have challenged the idea that race can be inferred by genetic clusters, drawing distinctions between arbitrarily assigned genetic clusters, ancestry, and race. One recurring caution against thinking of human populations in terms of clusters is the notion that genotypic variation and traits are distributed evenly between populations, along gradual [[Cline (biology)|clines]] rather than along discrete population boundaries; so although genetic similarities are usually organized geographically, their underlying populations have never been completely separated from one another. Due to migration, gene flow, and baseline homogeneity, features between groups are extensively overlapping and intermixed.<ref name=":32" /><ref name=":42" /> Moreover, genetic clusters do not typically match socially defined racial groups; many commonly understood races may not be sorted into the same genetic cluster, and many genetic clusters are made up of individuals who would have distinct racial identities.<ref name=":52" /> In general, clusters may most simply be understood as products of the methods used to sample and analyze genetic data; they are not without meaning for understanding ancestry and genetic characteristics, but they are inadequate to fully explain the concept of race, which is more often described in terms of social and cultural forces.
 
In the related context of [[personalized medicine]], race is currently listed as a [[risk factor]] for a wide range of medical conditions with genetic and non-genetic causes. Questions have emerged regarding whether or not genetic clusters support the idea of race as a valid construct to apply to medical research and treatment of disease, because there are many diseases that correspond with specific genetic markers and/or with specific populations, as seen with [[Tay–Sachs disease|Tay-Sachs disease]] or [[sickle cell disease]].<ref name=":92" /><ref name=":63">{{Cite book|last1=Koenig|first1=Barbara A.|last2=Lee|first2=Sandra Soo-Jin|last3=Richardson|first3=Sarah S.|author-link3=Sarah S. Richardson|url=https://worldcat.org/oclc/468194495|title=Revisiting race in a genomic age|date=2008|publisher=Rutgers University Press|isbn=978-0-8135-4323-9|oclc=468194495}}</ref> Researchers are careful to emphasize that ancestry, revealed in part through cluster analyses, plays an important role in understanding risk of disease. However, racial or ethnic identity does not perfectly align with genetic ancestry, and so race and ethnicity do not reveal enough information to make a medical diagnosis.<ref name=":63" /> Race as a variable in medicine is more likely to reflect social factors, whereas ancestry information is more likely to be meaningful when considering genetic factors.<ref name=":32" /><ref name=":63" />

A plurality of human genetic clustering studies have produced clusters of individuals with similar geographic origins or ancestry, and these findings have been interpreted by some to suggest biological support for the concept of race. Clustering results have often shown, for example, a clear cluster distinction between individuals with African and non-African ancestry, and other levels of clustering have come close to placing all individuals within their corresponding continental populations (e.g., Europeans clustered together, East Asians clustered together).<ref name=":42" /> Rosenberg et al. (2002) suggested divisions of human populations into five clusters that can be seen to resemble major geographic divisions, and concluded that self-identified ancestry (taken by many to mean race) may be an adequate proxy for ancestry. The association between genetic clusters and race may be further confounded by false assumptions that racialized traits, such as skin color or temperament, have clear genetic roots.<ref name=":63" /> In these ways, aspects of genetic clusters may be seen to resemble the traditional notion of race, at least as understood in the United States.
 
== References ==
<references />
 
{{Human genetics}}
{{Population genetics}}
 
{{DEFAULTSORT:Human genetic clustering}}
[[Category:Human population genetics]]
[[Category:Biological anthropology]]