Revision as of 15:30, 23 May 2025 by OAbot (minor edit): Open access bot: url-access updated in citation with #oabot.
Revision as of 16:32, 30 May 2025 by Bogazicili: →Genetic clustering algorithms and methods: Removing image from a dated study (2005). Adding results of a 2023 study instead
Line 10:

=== [[Model-based clustering]] ===
[[File:Rosenberg_1048people_993markers.jpg|thumb|Human population structure has been inferred from multilocus DNA sequence data (Rosenberg et al. 2002, 2005). Individuals from 52 populations were examined at 993 DNA markers. This data was used to partition individuals into K = 2, 3, 4, 5, or 6 gene clusters. In this figure, the average fractional membership of individuals from each population is represented by horizontal bars partitioned into K colored segments.]]
Common [[model-based clustering]] algorithms include STRUCTURE, ADMIXTURE, and HAPMIX. These algorithms operate by finding the best fit for genetic data among an arbitrary or mathematically derived number of clusters, such that differences within clusters are minimized and differences between clusters are maximized. This clustering method is also referred to as "[[Genetic admixture|admixture]] inference," as individual genomes (or individuals within populations) can be characterized by the proportions of [[allele]]s linked to each cluster.<ref name=":02" /> In other words, algorithms like STRUCTURE generate results that assume the existence of discrete ancestral populations, operationalized through unique genetic markers, which have combined over time to form the admixed populations of the modern day.

=== Multidimensional summary statistics ===
[[File:Whole-genome based PCA and clustering of worlds ethnic groups.png|thumb|upright=1.4|A 2023 [[Whole genome sequencing|whole-genome study]] of modern-day ethnic groups in the world identified 14 genomic clusters, which do not exactly align with current categorizations of race or ethnicity. The study also found that "99.8% of whole genome is identical between two individuals". Left image shows [[Cluster analysis|clustering]] following [[principal component analysis]] with [[Three-dimensional space|three dimensions]]. Top right image shows geographical locations where samples were collected.<ref name=Kim_Choi_Kim_2023>{{cite journal |doi=10.1038/s41598-023-32325-w |title=On whole-genome demography of world's ethnic groups and individual genomic identity |date=2023 |last1=Kim |first1=Byung-Ju |last2=Choi |first2=Jaejin |last3=Kim |first3=Sung-Hou |journal=Scientific Reports |volume=13 |issue=1 |page=6316 |pmid=37072456 |pmc=10113208 |bibcode=2023NatSR..13.6316K }}</ref>]]
Where model-based clustering characterizes populations using proportions of presupposed ancestral clusters, multidimensional summary statistics characterize populations on a continuous spectrum. The most common multidimensional statistical method used for genetic clustering is [[principal component analysis]] (PCA), which plots individuals by two or more axes (their "principal components") that represent aggregations of genetic markers that account for the highest variance. Clusters can then be identified by visually assessing the distribution of data; with larger samples of human genotypes, data tends to cluster in distinct groups as well as admixed positions between groups.<ref name=":02" /><ref name=":14" />
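As a rough illustration of how PCA places individuals on axes of highest variance, the following minimal sketch runs PCA on a small simulated genotype matrix. It is not taken from any of the cited studies. The function names (<code>genotype_pca</code>, <code>simulate_two_populations</code>), the allele-count coding (0, 1 or 2 copies per marker), and the simulated allele frequencies are all assumptions made for this example. Published analyses typically apply additional steps, such as marker filtering and per-marker scaling, that are omitted here.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: array of shape (n_individuals, n_markers) holding
    allele counts (0, 1 or 2); this coding is an assumption of the sketch.
    Returns (scores, explained_variance_ratio).
    """
    g = np.asarray(genotypes, dtype=float)
    # Center each marker so the components describe variation around the mean.
    centered = g - g.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes,
    # ordered by decreasing singular value (decreasing variance explained).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def simulate_two_populations(n_per_pop=50, n_markers=200, seed=0):
    """Hypothetical toy data: two populations with shifted allele frequencies."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_markers)
    freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_markers), 0.05, 0.95)
    # Each genotype is the number of copies of one allele out of two draws.
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_markers))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_markers))
    labels = np.array([0] * n_per_pop + [1] * n_per_pop)
    return np.vstack([pop_a, pop_b]), labels


if __name__ == "__main__":
    genotypes, labels = simulate_two_populations()
    scores, explained = genotype_pca(genotypes)
    for pop in (0, 1):
        mean_pc1 = scores[labels == pop, 0].mean()
        print(f"population {pop}: mean PC1 score = {mean_pc1:.2f}")
    print("variance explained by PC1, PC2:", np.round(explained, 3))
```

Plotting the two columns of <code>scores</code> against each other gives the kind of scatter plot that is then inspected visually for clusters. In this toy setup the two simulated populations are expected to separate along the first component because their allele frequencies were deliberately shifted.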

Human genetic clustering: Difference between revisions