{{Distinguish|Gene cluster|Metabolic gene cluster|Cluster genealogy}}
{{Histmerge|User:Millager/Human genetic clustering|Copy-paste move}}
'''Human genetic clustering''' refers to patterns of relative genetic similarity among human individuals and populations, as well as the wide range of scientific and statistical methods used to study this aspect of [[human genetic variation]].
Clustering studies are thought to be valuable for characterizing the general structure of genetic variation among human populations and for contributing to the study of ancestral origins, evolutionary history, and precision medicine. Since the mapping of the human genome, and with the availability of increasingly powerful analytic tools, [[Cluster analysis|cluster analyses]] have revealed a range of ancestral and migratory trends among human populations and individuals.<ref name=":02">{{Cite journal|
Many studies of human genetic clustering have been implicated in discussions of [[Race (human categorization)|race]], [[Ethnic group|ethnicity]], and [[scientific racism]], as some have controversially suggested that genetically derived clusters may be understood as proof of genetically determined races.<ref name=":42">{{Cite journal|
== Genetic clustering algorithms and methods ==
A wide range of methods have been developed to assess the structure of human populations with the use of genetic data. Early studies of within-group and between-group genetic variation used physical phenotypes and blood groups, whereas modern genetic studies use genetic markers such as [[Alu element|Alu sequences]], [[Microsatellite|short tandem repeat polymorphisms]], and [[Single-nucleotide polymorphism|single nucleotide polymorphisms]] (SNPs), among others.<ref>{{Cite journal|
=== [[Model-based clustering]] ===
Common model-based clustering algorithms include STRUCTURE, ADMIXTURE, and HAPMIX. These algorithms operate by finding the best fit for genetic data among an arbitrary or mathematically derived number of clusters, such that differences within clusters are minimized and differences between clusters are maximized. This clustering method is also referred to as "[[Genetic admixture|admixture]] inference," as individual genomes (or individuals within populations) can be characterized by the proportions of [[allele]]s linked to each cluster.<ref name=":02" /> In other words, algorithms like STRUCTURE generate results that assume the existence of discrete ancestral populations, operationalized through unique genetic markers, which have combined over time to form the admixed populations of the modern day.
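The idea of characterizing an individual by proportions of ancestral clusters can be illustrated with a minimal sketch. It is not the implementation used by STRUCTURE, ADMIXTURE, or HAPMIX; it only assumes a simple binomial admixture model in which each individual's expected allele frequency at a locus is a weighted mix of cluster allele frequencies. The names <code>Q</code>, <code>P</code>, <code>G</code>, and both helper functions are hypothetical, and all data are simulated.

```python
import numpy as np

def expected_allele_freq(Q, P):
    """Mix cluster allele frequencies P (K x loci) by ancestry proportions Q (individuals x K)."""
    return Q @ P

def genotype_log_likelihood(G, Q, P):
    """Log-likelihood of genotypes G (0, 1 or 2 copies of an allele) under Binomial(2, Q @ P).

    The constant binomial coefficient is dropped because it does not depend on Q or P.
    """
    F = np.clip(expected_allele_freq(Q, P), 1e-9, 1 - 1e-9)
    return float(np.sum(G * np.log(F) + (2 - G) * np.log(1 - F)))

# Simulated data (assumption: K = 2 ancestral clusters, 20 individuals, 500 loci).
rng = np.random.default_rng(0)
n_ind, n_loci, K = 20, 500, 2
P = rng.uniform(0.05, 0.95, size=(K, n_loci))        # cluster allele frequencies
Q_true = rng.dirichlet(alpha=[0.3] * K, size=n_ind)  # ancestry proportions, rows sum to 1
G = rng.binomial(2, expected_allele_freq(Q_true, P)) # observed genotypes

# Compare the generating proportions with an uninformative "equal mix" guess.
Q_flat = np.full((n_ind, K), 1.0 / K)
ll_true = genotype_log_likelihood(G, Q_true, P)
ll_flat = genotype_log_likelihood(G, Q_flat, P)
print("log-likelihood, generating Q:", ll_true)
print("log-likelihood, equal-mix Q: ", ll_flat)
```

A model-based method would search for the <code>Q</code> and <code>P</code> that maximize a likelihood of this kind for a chosen number of clusters; the sketch only evaluates the likelihood for two candidate values of <code>Q</code>.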
=== Multidimensional summary statistics ===
[[File:Whole-genome based PCA and clustering of worlds ethnic groups.png|thumb|upright=1.4|A 2023 [[Whole genome sequencing|whole-genome study]] of modern-day ethnic groups in the world identified 14 genomic clusters, which do not exactly align with current categorizations of race or ethnicity. The study also found that "99.8% of whole genome is identical between two individuals". Left image shows [[Cluster analysis|clustering]] following [[principal component analysis]] with [[Three-dimensional space|three dimensions]]. Top right image shows geographical locations where samples were collected.<ref name=Kim_Choi_Kim_2023>{{cite journal |doi=10.1038/s41598-023-32325-w |title=On whole-genome demography of world's ethnic groups and individual genomic identity |date=2023 |last1=Kim |first1=Byung-Ju |last2=Choi |first2=Jaejin |last3=Kim |first3=Sung-Hou |journal=Scientific Reports |volume=13 |issue=1 |page=6316 |pmid=37072456 |pmc=10113208 |bibcode=2023NatSR..13.6316K }}</ref>]]
Whereas model-based clustering characterizes populations using proportions of presupposed ancestral clusters, multidimensional summary statistics characterize populations on a continuous spectrum. The most common multidimensional statistical method used for genetic clustering is [[principal component analysis]] (PCA), which plots individuals along two or more axes (their "principal components") that represent aggregations of genetic markers accounting for the highest variance. Clusters can then be identified by visually assessing the distribution of data; with larger samples of human genotypes, data tends to cluster in distinct groups as well as admixed positions between groups.<ref name=":02" /><ref name=":14" />
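A minimal sketch of PCA applied to a genotype matrix is given below. It assumes simulated data for two hypothetical groups whose allele frequencies differ modestly, and it computes the principal components with a [[singular value decomposition]] of the column-centered matrix. The function name <code>genotype_pca</code> and all parameter values are illustrative assumptions rather than part of any published pipeline.

```python
import numpy as np

def genotype_pca(G, n_components=2):
    """Project individuals (rows of G) onto the leading principal components."""
    X = G - G.mean(axis=0)                      # center each marker (column)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)       # share of variance per component
    return scores, explained[:n_components]

# Simulated genotypes (assumption: two groups, 100 individuals each, 500 markers).
rng = np.random.default_rng(1)
n_per_group, n_loci = 100, 500
freq_a = rng.uniform(0.1, 0.9, n_loci)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_loci), 0.05, 0.95)
G = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_group, n_loci)),
    rng.binomial(2, freq_b, size=(n_per_group, n_loci)),
])

scores, explained = genotype_pca(G)
print("variance share of PC1 and PC2:", explained)
print("mean PC1 score, group A:", scores[:n_per_group, 0].mean())
print("mean PC1 score, group B:", scores[n_per_group:, 0].mean())
```

Plotting the two columns of <code>scores</code> against each other gives the kind of scatter plot from which clusters are usually assessed by eye.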
=== Caveats and limitations ===
There are caveats and limitations to genetic clustering methods of any type, given the degree of admixture and relative similarity within the human population. All genetic cluster findings are [[Sampling bias|biased]] by the sampling process used to gather data, and by the quality and quantity of that data. For example, many clustering studies use data derived from populations that are geographically distinct and far apart from one another, which may present an illusion of discrete clusters where, in reality, populations are much more blended with one another when intermediary groups are included.<ref name=":02" /> Sample size also plays an important moderating role in cluster findings, as different sample size inputs can influence cluster assignment, and more subtle relationships between genotypes may only emerge with larger sample sizes.<ref name=":02" /><ref name=":22" /> In particular, the use of STRUCTURE has been widely criticized as potentially misleading because it requires data to be sorted into a predetermined number of clusters, which may or may not reflect the actual population's distribution.<ref name=":22" /><ref>{{Cite journal|
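The effect of sampling design on apparent clustering can be illustrated with a small simulation. The sketch assumes a hypothetical one-dimensional geographic cline in which allele frequencies change linearly between two endpoints, and it compares sampling only the two ends of the cline with sampling evenly along it. The helper names, the linear cline, and the "largest relative gap" summary are all assumptions made for illustration; they are not taken from any cited study.

```python
import numpy as np

def pc1_scores(G):
    """Scores of individuals on the first principal component of centered genotypes."""
    X = G - G.mean(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * s[0]

def largest_relative_gap(values):
    """Largest gap between neighbouring sorted values, as a fraction of the full range."""
    v = np.sort(values)
    return float(np.max(np.diff(v)) / (v[-1] - v[0]))

def sample_cline(positions, f_west, f_east, rng):
    """Draw genotypes for individuals at positions in [0, 1] along a linear cline."""
    x = np.asarray(positions)[:, None]
    freqs = (1 - x) * f_west + x * f_east
    return rng.binomial(2, freqs)

rng = np.random.default_rng(2)
n_loci = 500
f_west = rng.uniform(0.1, 0.9, n_loci)
f_east = np.clip(f_west + rng.normal(0, 0.2, n_loci), 0.05, 0.95)

ends_only = np.repeat([0.0, 1.0], 100)   # 100 individuals at each end of the cline
evenly = np.linspace(0.0, 1.0, 200)      # 200 individuals spread along the cline

gap_ends = largest_relative_gap(pc1_scores(sample_cline(ends_only, f_west, f_east, rng)))
gap_even = largest_relative_gap(pc1_scores(sample_cline(evenly, f_west, f_east, rng)))
print("largest PC1 gap, ends-only sampling:", gap_ends)
print("largest PC1 gap, even sampling:     ", gap_even)
```

Under these assumptions the same underlying gradient yields a wide empty stretch on the first principal component when only the endpoints are sampled, and a much more continuous spread when intermediate locations are included.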
== Notable applications to human genetic data ==
Modern applications of genetic clustering methods to global-scale genetic data were first marked by studies associated with the [[Human Genome Diversity Project]] (HGDP) data.<ref name=":02" /> These early HGDP studies, such as those by Rosenberg et al. (2002),<ref name=":102" /><ref>{{Cite journal|
A number of landmark genetic cluster studies have been conducted on global human populations since 2002, including the following:
{| class="wikitable"
|-
|Rosenberg et al.
|2005
|Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure<ref name="rosenberg20052">{{cite journal|last1=Rosenberg|first1=NA|last2=Mahajan|first2=S|last3=Ramachandran|first3=S|last4=Zhao|first4=C|last5=Pritchard|first5=JK|display-authors=etal|year=2005|title=Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure|url=|journal=PLOS Genet|volume=1|issue=6|page=e70|doi=10.1371/journal.pgen.0010070|pmc=1310579|pmid=16355252|authorlink5=Jonathan K. Pritchard |doi-access=free }}</ref>
|1056 / 52
|[[Human Genome Diversity Project]] (HGDP-CEPH)
|}
== Genetic clustering and race ==
Clusters of individuals are often [[population structure (genetics)|geographically structured]]. For example, when clustering a population of East Asians and Europeans, each group will likely form its own respective cluster based on similar [[allele frequency|allele frequencies]].<ref>{{cite journal |last1=Spencer |first1=Quayshawn |title=A Radical Solution to the Race Problem |journal=Philosophy of Science |date=2014 |volume=81 |issue=5 |pages=1029–1030 |doi=10.1086/677694 |doi-access=free }}</ref> In this way, clusters can have a correlation with traditional concepts of race and self-identified ancestry; in some cases, such as medical questionnaires, the latter variables can be used as a proxy for genetic ancestry where genetic data is unavailable.<ref name=":42" /><ref name=":102" /> However, genetic variation is distributed in a complex, continuous, and overlapping manner, so this correlation is imperfect and the use of [[Race and health|racial categories in medicine]] can introduce additional hazards.<ref name=":42" />
In the related context of [[personalized medicine]], race is currently listed as a [[risk factor]] for a wide range of medical conditions with genetic and non-genetic causes. Questions have emerged regarding whether or not genetic clusters support the idea of race as a valid construct to apply to medical research and treatment of disease, because there are many diseases that correspond with specific genetic markers and/or with specific populations, as seen with [[Tay–Sachs disease|Tay-Sachs disease]] or [[sickle cell disease]].<ref name=":92" /><ref name=":63">{{Cite book|last1=Koenig|first1=Barbara A.|last2=Lee|first2=Sandra Soo-Jin|last3=Richardson|first3=Sarah S.|author-link3=Sarah S. Richardson|url=https://worldcat.org/oclc/468194495|title=Revisiting race in a genomic age|date=2008|publisher=Rutgers University Press|isbn=978-0-8135-4323-9|oclc=468194495}}</ref> Researchers are careful to emphasize that ancestry, revealed in part through cluster analyses, plays an important role in understanding risk of disease. But racial or ethnic identity does not perfectly align with genetic ancestry, and so race and ethnicity do not reveal enough information to make a medical diagnosis.<ref name=":63" /> Race as a variable in medicine is more likely to reflect social factors, whereas ancestry information is more likely to be meaningful when considering genetic factors.<ref name=":32" /><ref name=":63" />
== References ==
<references />
{{Human genetics}}
{{Population genetics}}
[[Category:Human population genetics]]
[[Category:Biological anthropology]]