Human genetic clustering: Difference between revisions

 
=== Caveats and limitations ===
Genetic clustering methods of any type have caveats and limitations, given the degree of admixture and relative genetic similarity within the human population. All genetic cluster findings are [[Sampling bias|biased]] by the sampling process used to gather data, and by the quality and quantity of that data. For example, many clustering studies use data derived from populations that are geographically distinct and far apart from one another, which may present an illusion of discrete clusters where, in reality, populations blend into one another much more when intermediary groups are included.<ref name=":02" /> Sample size also moderates cluster findings: different sample sizes can change cluster assignments, and subtler relationships between genotypes may emerge only with larger samples.<ref name=":02" /><ref name=":22" /> In particular, STRUCTURE has been widely criticized as potentially misleading because it requires data to be sorted into a predetermined number of clusters, which may or may not reflect the actual distribution of the population.<ref name=":22" /><ref>{{Cite journal|last1=Lawson|first1=Daniel J.|last2=van Dorp|first2=Lucy|last3=Falush|first3=Daniel|date=2018-08-14|title=A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots|journal=Nature Communications|volume=9|issue=1|page=3258|doi=10.1038/s41467-018-05257-7|issn=2041-1723|pmc=6092366|pmid=30108219|bibcode=2018NatCo...9.3258L}}</ref> The creators of STRUCTURE originally described the algorithm as an "[[Exploratory data analysis|exploratory]]" method to be interpreted with caution, rather than as a formal statistical test.<ref name=":132" /><ref>{{Cite journal|last=Novembre|first=John|date=2016-10-01|title=Pritchard, Stephens, and Donnelly on Population Structure|url=https://doi.org/10.1534/genetics.116.195164|journal=Genetics|volume=204|issue=2|pages=391–393|doi=10.1534/genetics.116.195164|issn=1943-2631|pmc=5068833|pmid=27729489}}</ref>
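
The sampling and fixed-cluster-number caveats can be illustrated with a toy simulation. The sketch below is not STRUCTURE and does not reproduce any cited study: it assumes a hypothetical one-dimensional "ancestry coordinate" that varies smoothly along a geographic cline, and it uses plain k-means as a stand-in for a model-based clustering method. All names (<code>kmeans_1d</code>, <code>gap_between_clusters</code>, <code>cline</code>, <code>ends_only</code>) are illustrative assumptions.

```python
import random


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means. The caller fixes k in advance, so the method
    returns k groups whether or not the data contain k real clusters."""
    ordered = sorted(values)
    n = len(ordered)
    # Start the centres at evenly spaced quantiles of the data.
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


def gap_between_clusters(groups):
    """Distance separating the two clusters (for k = 2 only)."""
    low, high = sorted(groups, key=min)
    return min(high) - max(low)


rng = random.Random(0)

# Hypothetical individuals spread evenly along a continuous cline (no real clusters).
cline = [rng.random() for _ in range(300)]

# The same population, but sampled only at its two geographically distant ends.
ends_only = [v for v in cline if v < 0.2 or v > 0.8]

_, groups_full = kmeans_1d(cline, k=2)
_, groups_ends = kmeans_1d(ends_only, k=2)

gap_full = gap_between_clusters(groups_full)
gap_ends = gap_between_clusters(groups_ends)

print(f"gap with intermediate groups sampled: {gap_full:.3f}")
print(f"gap with only the two ends sampled:   {gap_ends:.3f}")
```

In both runs the method returns exactly two groups because two were requested. With only the ends of the cline sampled, the groups are separated by a wide gap and look discrete; with intermediate individuals included, the boundary between the two groups falls at an arbitrary point in a continuous distribution.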
 
== Notable applications to human genetic data ==
The first modern applications of genetic clustering methods to global-scale genetic data were studies associated with the [[Human Genome Diversity Project]] (HGDP) data.<ref name=":02" /> These early HGDP studies, such as those by Rosenberg et al. (2002),<ref name=":102" /><ref>{{Cite journal|last1=Rosenberg|first1=Noah A|last2=Mahajan|first2=Saurabh|last3=Ramachandran|first3=Sohini|last4=Zhao|first4=Chengfeng|last5=Pritchard|first5=Jonathan K|last6=Feldman|first6=Marcus W|date=2005-12-09|title=Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure|journal=PLOS Genetics|volume=1|issue=6|pages=e70|doi=10.1371/journal.pgen.0010070|pmid=16355252|pmc=1310579|issn=1553-7404|doi-access=free}}</ref> contributed to theories of the serial founder effect and early human migration out of Africa. Clustering methods have also been applied to describe admixed continental populations.<ref name=":112" /><ref name=":122" /><ref>{{Cite journal|last1=Leslie|first1=Stephen|last2=Winney|first2=Bruce|last3=Hellenthal|first3=Garrett|last4=Davison|first4=Dan|last5=Boumertit|first5=Abdelhamid|last6=Day|first6=Tammy|last7=Hutnik|first7=Katarzyna|last8=Royrvik|first8=Ellen C.|last9=Cunliffe|first9=Barry|last10=Lawson|first10=Daniel J.|last11=Falush|first11=Daniel|date=March 2015|title=The fine-scale genetic structure of the British population|url=https://www.nature.com/articles/nature14230 |journal=Nature|language=en|volume=519|issue=7543|pages=309–314|doi=10.1038/nature14230|pmid=25788095|issn=1476-4687|pmc=4632200|bibcode=2015Natur.519..309.}}</ref> Genetic clustering and HGDP studies have also contributed to methods for, and criticisms of, the [[Genealogical DNA test|genetic ancestry consumer testing]] industry.<ref>{{Cite journal|last1=Royal|first1=Charmaine D.|last2=Novembre|first2=John|last3=Fullerton|first3=Stephanie M.|last4=Goldstein|first4=David B.|last5=Long|first5=Jeffrey C.|last6=Bamshad|first6=Michael J.|last7=Clark|first7=Andrew G.|date=2010-05-14|title=Inferring Genetic Ancestry: Opportunities, Challenges, and Implications|journal=American Journal of Human Genetics|volume=86|issue=5|pages=661–673|doi=10.1016/j.ajhg.2010.03.011|issn=0002-9297|pmc=2869013|pmid=20466090}}</ref>
 
Several landmark genetic cluster studies have been conducted on global human populations since 2002, including the following: