Non-coding DNA: Difference between revisions

{{Short description|DNA that does not code for proteins}}
 
'''Non-coding DNA''' ('''ncDNA''') sequences are components of an organism's [[DNA]] that do not [[genetic code|encode]] [[protein]] sequences. Some non-coding DNA is [[Transcription (genetics)|transcribed]] into functional [[non-coding RNA]] molecules (e.g. [[transfer RNA]], [[microRNA]], [[Piwi-interacting RNA|piRNA]], [[ribosomal RNA]], and [[RNA interference|regulatory RNAs]]). Other functional regions of the non-coding DNA fraction include [[regulatory sequence]]s that control [[gene expression]]; [[scaffold attachment region]]s; [[origin of replication|origins of DNA replication]]; [[centromere]]s; and [[telomere]]s. Some non-coding regions appear to be mostly nonfunctional, such as [[introns]], [[pseudogenes]], [[intergenic DNA]], and fragments of [[transposons]] and [[viruses]]. Regions that are completely nonfunctional are called [[junk DNA]].
 
== Fraction of non-coding genomic DNA ==
In [[bacteria]], the [[Coding region|coding regions]] typically take up 88% of the genome.<ref name=":0" /> The remaining 12% does not encode proteins, but much of it still has biological function through [[Gene|genes]] where the RNA transcript is functional (non-coding genes) and regulatory sequences, which means that almost all of the bacterial genome has a function.<ref name=":0">{{ cite journal | vauthors = Kirchberger PC, Schmidt ML, Ochman H | date = 2020 | title = The ingenuity of bacterial genomes | journal = Annual Review of Microbiology | volume = 74 | pages = 815–834 | doi = 10.1146/annurev-micro-020518-115822| pmid = 32692614 | s2cid = 220699395 }}</ref> The amount of coding DNA in [[Eukaryote|eukaryotes]] is usually a much smaller fraction of the genome because eukaryotic genomes contain large amounts of repetitive DNA not found in prokaryotes. The [[human genome]] contains somewhere between 1% and 2% coding DNA.<ref name = Piovesan/><ref>{{ cite journal | vauthors = Omenn GS | date = 2021 | title = Reflections on the HUPO Human Proteome Project, the Flagship Project of the Human Proteome Organization, at 10 Years | journal = Molecular & Cellular Proteomics | volume = 20 | pages = 100062 | doi = 10.1016/j.mcpro.2021.100062| pmid = 33640492 | pmc = 8058560 }}</ref> The exact number is not known because there are disputes over the number of functional coding [[Exon|exons]] and over the total size of the human genome. This means that 98–99% of the human genome consists of non-coding DNA and this includes many functional elements such as non-coding genes and regulatory sequences.
 
[[Genome size]] in eukaryotes can vary over a wide range, even between closely related species. This puzzling observation was originally known as the [[C-value | C-value paradox]] where "C" refers to the haploid genome size.<ref>{{cite journal | vauthors = Thomas CA | title = The genetic organization of chromosomes | journal = Annual Review of Genetics | volume = 5 | pages = 237–256 | date = 1971 | pmid = 16097657 | doi = 10.1146/annurev.ge.05.120171.001321 }}</ref> The paradox was resolved with the discovery that most of the differences were due to the expansion and contraction of repetitive DNA and not the number of genes. Some researchers speculated that this repetitive DNA was mostly [[junk DNA]]. The reasons for the changes in genome size are still being worked out and this problem is called the C-value Enigma.<ref>{{ cite journal | vauthors = Elliott TA, Gregory TR | date = 2015 | title = What's in a genome? The C-value enigma and the evolution of eukaryotic genome content | journal = Phil. Trans. R. Soc. B | volume = 370 | issue = 1678 | pages = 20140331 | doi = 10.1098/rstb.2014.0331| pmid = 26323762 | pmc = 4571570 | s2cid = 12095046 }}</ref>
 
This led to the observation that the number of genes does not seem to correlate with perceived notions of complexity because the number of genes seems to be relatively constant, an issue termed the [[G-value paradox|G-value Paradox]].<ref>{{ cite journal | vauthors = Hahn MW, Wray GA | date = 2002 | title = The g-value paradox | journal = Evolution and Development | volume = 4 | issue = 2 | pages = 73–75 | doi = 10.1046/j.1525-142X.2002.01069.x| pmid = 12004964 | s2cid = 2810069 }}</ref> For example, the genome of the unicellular ''[[Polychaos dubium]]'' (formerly known as ''Amoeba dubia'') has been reported to contain more than 200 times the amount of DNA in humans (i.e. more than 600 billion [[genome size|pairs of bases]] vs a bit more than 3 billion in humans).<ref name=Gregory>{{cite journal | vauthors = Gregory TR, Hebert PD | title = The modulation of DNA content: proximate causes and ultimate consequences | journal = Genome Research | volume = 9 | issue = 4 | pages = 317–324 | date = April 1999 | pmid = 10207154 | doi = 10.1101/gr.9.4.317 | s2cid = 16791399 | doi-access = free }}</ref> The [[pufferfish]] ''[[Takifugu]] rubripes'' genome is only about one eighth the size of the human genome, yet seems to have a comparable number of genes. Genes take up about 30% of the pufferfish genome and the coding DNA is about 10%. (Non-coding DNA = 90%.) 
The reduced size of the pufferfish genome is due to a reduction in the length of introns and less repetitive DNA.<ref>{{ cite journal | vauthors = Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A | date = 2002 | title = Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes | journal = Science | volume = 297 | issue = 5585 | pages = 1301–1310 | doi = 10.1126/science.1072104| pmid = 12142439 | bibcode = 2002Sci...297.1301A | s2cid = 10310355 }}</ref><ref name="Ohno">{{cite journal | vauthors = Ohno S | title = So much "junk" DNA in our genome | journal = Brookhaven Symposia in Biology | volume = 23 | pages = 366–370 | date = 1972 | pmid = 5065367 | oclc = 101819442 }}</ref>
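
The size comparisons above can be checked with a little arithmetic. The following is a minimal sketch in Python and is not taken from the cited sources. It uses only the rounded figures quoted in this section: about 3 billion base pairs for humans, 600 billion for ''Polychaos dubium'', one eighth of the human genome for the pufferfish, and coding fractions of 10% (pufferfish) and 1–2% (human). All variable and function names are illustrative assumptions.

```python
# Rounded figures quoted in the text; all values are approximations.
HUMAN_GENOME_BP = 3.0e9       # "a bit more than 3 billion" base pairs
POLYCHAOS_GENOME_BP = 600e9   # "more than 600 billion" base pairs


def coding_bp(genome_bp: float, coding_fraction: float) -> float:
    """Return the absolute amount of coding DNA, in base pairs."""
    return genome_bp * coding_fraction


# How many times larger the amoeba genome is than the human genome.
size_ratio = POLYCHAOS_GENOME_BP / HUMAN_GENOME_BP

# Pufferfish genome: about one eighth of the human genome, 10% coding.
pufferfish_genome_bp = HUMAN_GENOME_BP / 8
pufferfish_coding_bp = coding_bp(pufferfish_genome_bp, 0.10)

# Human coding DNA: somewhere between 1% and 2% of the genome.
human_coding_low_bp = coding_bp(HUMAN_GENOME_BP, 0.01)
human_coding_high_bp = coding_bp(HUMAN_GENOME_BP, 0.02)

print(f"Polychaos / human genome size: {size_ratio:.0f}x")
print(f"Pufferfish coding DNA: {pufferfish_coding_bp / 1e6:.1f} Mb")
print(
    "Human coding DNA: "
    f"{human_coding_low_bp / 1e6:.0f}-{human_coding_high_bp / 1e6:.0f} Mb"
)
```

With these rounded inputs the pufferfish carries roughly 37.5 Mb of coding DNA, which falls inside the 30–60 Mb range implied for humans. That is consistent with the two species having a comparable number of genes despite an eightfold difference in genome size.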
 
''[[Utricularia gibba]]'', a [[bladderwort]] plant, has a very small [[nuclear genome]] (100.7 Mb) compared to most plants.<ref name = Ibarra-Laclette>{{ cite journal | vauthors = Ibarra-Laclette E, Lyons E, Hernández-Guzmán G, Pérez-Torres CA, Carretero-Paulet L, Chang TH, Lan T, Welch AJ, Juárez MJ, Simpson J, etal | date = 2013 | title = Architecture and evolution of a minute plant genome | journal = Nature | volume = 498 | issue = 7452 | pages = 94–98 | doi = 10.1038/nature12132| pmid = 23665961 | pmc = 4972453 | bibcode = 2013Natur.498...94I | s2cid = 18219754 }}</ref><ref name = Lan>{{ cite journal | vauthors = Lan T, Renner T, Ibarra-Laclette E, Farr KM, Chang TH, Cervantes-Pérez SA, Zheng C, Sankoff D, Tang H, and Purbojati RW | date = 2017 | title = Long-read sequencing uncovers the adaptive topography of a carnivorous plant genome | journal = Proceedings of the National Academy of Sciences | volume = 114 | issue = 22 | pages = E4435–E4441 | doi = 10.1073/pnas.1702072114| pmid = 28507139 | pmc = 5465930 | bibcode = 2017PNAS..114E4435L | doi-access = free }}</ref> It likely evolved from an ancestral genome that was 1,500 Mb in size.<ref name = Lan/> The bladderwort genome has roughly the same number of genes as other plants but the total amount of coding DNA comes to about 30% of the genome.<ref name = Ibarra-Laclette/><ref name="Lan"/>
 
The remainder of the genome (70% non-coding DNA) consists of [[Promoter (genetics)|promoters]] and regulatory sequences that are shorter than those in other plant species.<ref name = Ibarra-Laclette/> The genes contain introns but there are fewer of them and they are smaller than the introns in other plant genomes.<ref name = Ibarra-Laclette/> There are noncoding genes, including many copies of ribosomal RNA genes.<ref name = Lan/> The genome also contains telomere sequences and centromeres as expected.<ref name = Lan/> Much of the repetitive DNA seen in other eukaryotes has been deleted from the bladderwort genome since that lineage split from those of other plants. About 59% of the bladderwort genome consists of transposon-related sequences but since the genome is so much smaller than other genomes, this represents a considerable reduction in the amount of this DNA.<ref name = Lan/> The authors of the original 2013 article note that claims of additional functional elements in the non-coding DNA of animals do not seem to apply to plant genomes.<ref name = Ibarra-Laclette/>
 
According to a New York Times article, during the evolution of this species, "... genetic junk that didn't serve a purpose was expunged, and the necessary stuff was kept."<ref>{{cite news | vauthors = Klein J | title = Genetic Tidying Up Made Humped Bladderworts Into Carnivorous Plants | url = https://www.nytimes.com/2017/05/19/science/humped-bladderwort-carnivorous-plant-genome.html | work = New York Times | date = 19 May 2017 | access-date = May 30, 2022}}</ref> According to Victor Albert of the University of Buffalo, the plant is able to expunge its so-called junk DNA and "have a perfectly good multicellular plant with lots of different cells, organs, tissue types and flowers, and you can do it without the junk. Junk is not needed."<ref>{{ cite press release | vauthors = Hsu C, Stolte D | date = May 13, 2013 | title = Carnivorous Plant Throws Out 'Junk' DNA | url = https://news.arizona.edu/story/carnivorous-plant-throws-out-junk-dna | location = Tucson, AZ, USA | publisher = University of Arizona | access-date = May 29, 2022}}</ref>
 
==Types of non-coding DNA sequences==
{{further|Conserved non-coding sequence}}
 
===Noncoding genes===
{{See also|Non-coding RNA}}
 
There are [[Gene|two types of genes]]: protein coding genes and [[Non-coding RNA|noncoding genes]].<ref>{{cite book | vauthors = Kampourakis K | date = 2017 | title = Making sense of genes | publisher = Cambridge University Press | place = Cambridge UK | isbn = 978-1-107-12813-2}}{{page needed|date=June 2022}}</ref> Noncoding genes are an important part of non-coding DNA and they include genes for [[transfer RNA]] and [[ribosomal RNA]]. These genes were discovered in the 1960s. [[Prokaryote|Prokaryotic]] genomes contain genes for a number of other noncoding RNAs but noncoding RNA genes are much more common in eukaryotes.
 
Typical classes of noncoding genes in eukaryotes include genes for [[small nuclear RNA]]s (snRNAs), [[small nucleolar RNA]]s (snoRNAs), [[microRNA]]s (miRNAs), [[Small interfering RNA|short interfering RNAs]] (siRNAs), [[Piwi-interacting RNA|PIWI-interacting RNAs]] (piRNAs), and [[Long non-coding RNA|long noncoding RNAs]] (lncRNAs). In addition, there are a number of unique RNA genes that produce [[Catalytic RNA|catalytic RNAs]].<ref>{{cite journal | vauthors=Cech TR, Steitz JA | title=The Noncoding RNA Revolution - Trashing Old Rules to Forge New Ones | journal=Cell|volume=157|pages=77–94|date=2014| issue=1 | doi=10.1016/j.cell.2014.03.008 | pmid=24679528 | s2cid=14852160 | doi-access=free }}</ref>
 
Noncoding genes account for only a few percent of prokaryotic genomes<ref>{{cite journal | vauthors = Rogozin IB, Makarova KS, Natale DA, Spiridonov AN, Tatusov RL, Wolf YI, Yin J, Koonin EV | display-authors = 6 | title = Congruent evolution of different classes of non-coding DNA in prokaryotic genomes | journal = Nucleic Acids Research | volume = 30 | issue = 19 | pages = 4264–4271 | date = October 2002 | pmid = 12364605 | pmc = 140549 | doi = 10.1093/nar/gkf549 }}</ref> but they can represent a vastly higher fraction in eukaryotic genomes.<ref>{{cite book |doi=10.1016/B978-0-12-800049-6.00171-2 |chapter=Adaptive Molecular Evolution: Detection Methods |title=Encyclopedia of Evolutionary Biology |year=2016 | vauthors = Bielawski JP, Jones C |pages=16–25 |isbn=978-0-12-800426-5 }}</ref> In humans, the noncoding genes take up at least 6% of the genome, largely because there are hundreds of copies of ribosomal RNA genes.{{citation needed|date=May 2022}} Protein-coding genes occupy about 38% of the genome, a fraction that is much higher than the coding region because genes contain large introns.{{citation needed|date=May 2022}}
 
The total number of noncoding genes in the human genome is controversial. Some scientists think that there are only about 5,000 noncoding genes while others believe that there may be more than 100,000 (see the article on [[Non-coding RNA]]). The difference is largely due to debate over the number of lncRNA genes.<ref>{{ cite journal | vauthors = Ponting CP, and Haerty W | date = 2022 | title = Genome-Wide Analysis of Human Long Noncoding RNAs: A Provocative Review | journal = Annual Review of Genomics and Human Genetics | volume = 23 | pages = 153–172 | doi = 10.1146/annurev-genom-112921-123710| pmid = 35395170 | s2cid = 248049706 | doi-access = free | hdl = 20.500.11820/ede40d70-b99c-42b0-a378-3b9b7b256a1b | hdl-access = free }}</ref>
 
===Promoters and regulatory elements===
{{Main|Promoter (genetics)}}
 
Promoters are DNA segments near the 5' end of the gene where transcription begins. They are the sites where [[RNA polymerase]] binds to initiate RNA synthesis. Every gene has a noncoding promoter.
 
[[Cis-regulatory element|Regulatory elements]] are sites that control the [[Transcription (genetics)|transcription]] of a nearby gene. They are almost always sequences where [[transcription factor]]s bind to DNA and these transcription factors can either activate transcription (activators) or repress transcription (repressors). Regulatory elements were discovered in the 1960s and their general characteristics were worked out in the 1970s by studying specific transcription factors in bacteria and [[bacteriophage]].{{citation needed|date=June 2022}}
 
Promoters and regulatory sequences represent an abundant class of noncoding DNA but they mostly consist of a collection of relatively short sequences so they do not take up a very large fraction of the genome. The exact amount of regulatory DNA in mammalian genomes is unclear because it is difficult to distinguish between spurious transcription factor binding sites and those that are functional. The binding characteristics of typical [[DNA-binding protein]]s were characterized in the 1970s and the biochemical properties of transcription factors predict that in cells with large genomes, the majority of binding sites will be fortuitous and not biologically functional.{{citation needed|date=June 2022}}
 
Many regulatory sequences occur near promoters, usually upstream of the transcription start site of the gene. Some occur within a gene and a few are located downstream of the transcription termination site. In eukaryotes, there are some regulatory sequences that are located at a considerable distance from the promoter region. These distant regulatory sequences are often called [[Enhancer (genetics)|enhancers]] but there is no rigorous definition of enhancer that distinguishes it from other transcription factor binding sites.<ref>{{cite journal | vauthors = Compe E, Egly JM | title = The Long Road to Understanding RNAPII Transcription Initiation and Related Syndromes | journal = Annual Review of Biochemistry | volume = 90 | pages = 193–219 | date = 2021 | doi = 10.1146/annurev-biochem-090220-112253| pmid = 34153211 | s2cid = 235595550 }}</ref><ref>{{cite journal | vauthors = Visel A, Rubin EM, Pennacchio LA | title = Genomic views of distant-acting enhancers | journal = Nature | volume = 461 | issue = 7261 | pages = 199–205 | date = September 2009 | pmid = 19741700 | pmc = 2923221 | doi = 10.1038/nature08451 | author-link3 = Len A. Pennacchio | bibcode = 2009Natur.461..199V }}</ref>
 
===Introns===
{{Main|Intron}}
 
[[File:Pre-mRNA.svg|right|thumbnail|upright=1.35|Illustration of an unspliced pre-mRNA precursor, with five [[intron]]s and six [[exon]]s (top). After the introns have been removed via splicing, the mature mRNA sequence is ready for translation (bottom).]]
Introns are the parts of a gene that are transcribed into the [[precursor RNA]] sequence, but ultimately removed by [[RNA splicing]] during the processing to mature RNA. Introns are found in both types of genes: protein-coding genes and noncoding genes. They are present in prokaryotes but they are much more common in eukaryotic genomes.{{citation needed|date=June 2022}}
 
Group I and group II introns take up only a small percentage of the genome when they are present. Spliceosomal introns (see Figure) are only found in eukaryotes and they can represent a substantial proportion of the genome. In humans, for example, introns in protein-coding genes cover 37% of the genome. Combining that with about 1% coding sequences means that protein-coding genes occupy about 38% of the human genome. The calculations for noncoding genes are more complicated because there is considerable dispute over the total number of noncoding genes but taking only the well-defined examples means that noncoding genes occupy at least 6% of the genome.<ref>{{ cite journal | vauthors = Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S | date = 2012 | title = GENCODE: the reference human genome annotation for The ENCODE Project | journal = Genome Research | volume = 22 | issue = 9 | pages = 1760–1774 | doi = 10.1101/gr.135350.111| pmid = 22955987 | pmc = 3431492 }}</ref><ref name = Piovesan>{{ cite journal | vauthors = Piovesan A, Antonaros F, Vitale L, Strippoli P, Pelleri MC, Caracausi M | date = 2019 | title = Human protein-coding genes and gene feature statistics in 2019 | journal = BMC Research Notes | volume = 12 | issue = 1 | pages = 315 | doi = 10.1186/s13104-019-4343-8| pmid = 31164174 | pmc = 6549324 | doi-access = free }}</ref>
 
===Untranslated regions===
{{Main|Untranslated region}}
 
The standard biochemistry and molecular biology textbooks describe non-coding [[Nucleotide|nucleotides]] in mRNA located between the 5' end of the gene and the translation initiation codon. These regions are called 5'-untranslated regions or 5'-UTRs. Similar regions called 3'-untranslated regions (3'-UTRs) are found at the end of the gene. The 5'-UTRs and 3'-UTRs are very short in bacteria but they can be several hundred nucleotides in length in eukaryotes. They contain short elements that control the initiation of translation (5'-UTRs) and transcription termination (3'-UTRs) as well as regulatory elements that may control mRNA stability, processing, and targeting to different regions of the cell.<ref>{{cite book | vauthors = Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson JD | date = 1994 | title = Molecular Biology of the Cell, 3rd edition | publisher = Garland Publishing Inc. | place = London, UK}}{{page needed|date=June 2022}}</ref><ref>{{ cite book | vauthors = Lewin B | date = 2004 | title = Genes VIII | publisher = Pearson/Prentice Hall | place = Upper Saddle River, NJ, USA}}{{page needed|date=June 2022}}</ref><ref>{{ cite book | vauthors = Moran L, Horton HR, Scrimgeour KG, Perry MD | date = 2012 | title = Principles of Biochemistry Fifth Edition | publisher = Pearson | place = Upper Saddle River, NJ, USA}}{{page needed|date=June 2022}}</ref>
 
===Origins of replication===
{{Main|Origin of replication}}
 
DNA synthesis begins at specific sites called [[Origin of replication|origins of replication]]. These are regions of the genome where the DNA replication machinery is assembled and the DNA is unwound to begin DNA synthesis. In most cases, replication proceeds in both directions from the replication origin.
 
The main features of replication origins are sequences where specific initiation proteins are bound. A typical replication origin covers about 100-200 base pairs of DNA. Prokaryotes have one origin of replication per chromosome or plasmid but there are usually multiple origins in eukaryotic chromosomes. The human genome contains about 100,000 origins of replication representing about 0.3% of the genome.<ref>{{cite journal |vauthors=Leonard AC, Méchali M |title=DNA replication origins |journal=Cold Spring Harbor Perspectives in Biology |volume=5 |pages=a010116 |date=2013 |issue=10 |doi=10.1101/cshperspect.a010116|pmid=23838439 |pmc=3783049 }}</ref><ref>{{cite journal |vauthors=Urban JM, Foulk MS, Casella C, Gerbi SA |date=2015 |title=The hunt for origins of DNA replication in multicellular eukaryotes |journal=F1000Prime Reports |volume=7 |page=30 |doi=10.12703/P7-30|pmid=25926981 |pmc=4371235 |doi-access=free }}</ref><ref>{{cite journal |vauthors=Prioleau M, MacAlpine DM |date=2016 |title=DNA replication origins—where do we begin? |journal=Genes & Development |volume=30 |issue=15 |pages=1683–1697 |doi=10.1101/gad.285114.116|pmid=27542827 |pmc=5002974 }}</ref>
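
The genome fraction quoted above follows from multiplying the number of origins by their typical length and dividing by the genome size. The following is a minimal sketch in Python and is not taken from the cited sources. The genome size of 3.1 billion base pairs is an assumed round figure, and the function and constant names are hypothetical.

```python
# Assumed round figure for the total size of the human genome, in base pairs.
HUMAN_GENOME_BP = 3.1e9


def genome_fraction(
    n_elements: int,
    element_length_bp: float,
    genome_bp: float = HUMAN_GENOME_BP,
) -> float:
    """Fraction of the genome covered by n_elements non-overlapping
    elements that are each element_length_bp base pairs long."""
    return n_elements * element_length_bp / genome_bp


# About 100,000 origins, each covering about 100 to 200 base pairs.
low = genome_fraction(100_000, 100)
high = genome_fraction(100_000, 200)

print(f"Origins of replication: {low:.2%} to {high:.2%} of the genome")
```

Under these assumptions the lower bound comes to a little over 0.3% of the genome, in line with the figure given in the text, while 200 bp origins would give roughly twice that.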
 
===Centromeres===
{{Main|Centromere}}
[[File:Human karyotype with bands and sub-bands.png|thumb|Schematic [[karyotype|karyogram]] of a human, showing an overview of the [[human genome]] on [[G banding]], wherein non-coding DNA is present at the centromeres (shown as a narrow segment of each chromosome), and also occurs to a greater extent in darker ([[GC-content|GC poor]]) regions.<ref name=Romiguier2017>{{cite journal | vauthors = Romiguier J, Roux C | title = Analytical Biases Associated with GC-Content in Molecular Evolution | journal = Frontiers in Genetics | volume = 8 | pages = 16 | year = 2017 | pmid = 28261263 | pmc = 5309256 | doi = 10.3389/fgene.2017.00016 | doi-access = free }}</ref><br>{{further|Karyotype}}]]
 
Centromeres are the sites where spindle fibers attach to newly replicated chromosomes in order to segregate them into daughter cells when the cell divides. Each eukaryotic chromosome has a single functional centromere that is seen as a constricted region in a condensed metaphase chromosome. Centromeric DNA consists of a number of repetitive DNA sequences that often take up a significant fraction of the genome because each centromere can be millions of base pairs in length. In humans, for example, the sequences of all 24 centromeres have been determined<ref>{{ cite journal | vauthors = Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, et al. | title = Complete genomic and epigenetic maps of human centromeres | journal = Science | volume = 376 | pages = 56 | date = 2021 | issue = 6588 | doi = 10.1126/science.abl4178| pmid = 35357911 | pmc = 9233505 | s2cid = 247853627 }}</ref> and they account for about 6% of the genome. However, it is unlikely that all of this noncoding DNA is essential since there is considerable variation in the total amount of centromeric DNA in different individuals.<ref>{{cite journal | vauthors = Miga KH | title = Centromeric satellite DNAs: hidden sequence variation in the human population | journal = Genes | volume = 10 | pages = 353 | date = 2019 | issue = 5 | doi = 10.3390/genes10050352| pmid = 31072070 | pmc = 6562703 | doi-access = free }}</ref> Centromeres are another example of functional noncoding DNA sequences that have been known for almost half a century and it is likely that they are more abundant than coding DNA.
 
===Telomeres===
{{Main|Telomere}}
 
Telomeres are regions of repetitive DNA at the end of a [[chromosome]], which provide protection from chromosomal deterioration during [[DNA replication]]. Recent studies have shown that telomeres contribute to their own stability. [[Telomeric Repeat-Containing RNA (TERRA)|Telomeric repeat-containing RNAs (TERRA)]] are transcripts derived from telomeres. TERRA has been shown to maintain telomerase activity and lengthen the ends of chromosomes.<ref>{{cite journal | vauthors = Cusanelli E, Chartrand P | title = Telomeric noncoding RNA: telomeric repeat-containing RNA in telomere biology | journal = Wiley Interdisciplinary Reviews. RNA | volume = 5 | issue = 3 | pages = 407–419 | date = May 2014 | pmid = 24523222 | doi = 10.1002/wrna.1220 | s2cid = 36918311 }}</ref>
 
===Scaffold attachment regions===
{{Main|Scaffold/matrix attachment region}}
 
Both prokaryotic and eukaryotic genomes are organized into large loops of protein-bound DNA. In eukaryotes, the bases of the loops are called [[Scaffold/matrix attachment region|scaffold attachment regions]] (SARs) and they consist of stretches of DNA that bind an RNA/protein complex to stabilize the loop. There are about 100,000 loops in the human genome and each SAR consists of about 100 bp of DNA, so the total amount of DNA devoted to SARs accounts for about 0.3% of the human genome.<ref>{{cite journal | vauthors = Misteli T | date = 2020 | title = The self-organizing genome: Principles of genome architecture and function | journal = Cell | volume = 183 | issue = 1 | pages = 28–45 | doi = 10.1016/j.cell.2020.09.014 | pmid = 32976797 | pmc = 7541718 }}</ref>
 
===Pseudogenes===
{{Main|Pseudogene}}
 
Pseudogenes are mostly former genes that have become non-functional due to mutation, but the term also refers to inactive DNA sequences that are derived from RNAs produced by functional genes ([[Pseudogene|processed pseudogenes]]). Pseudogenes are only a small fraction of noncoding DNA in prokaryotic genomes because they are eliminated by negative selection. In some eukaryotes, however, pseudogenes can accumulate because selection is not powerful enough to eliminate them (see [[Nearly neutral theory of molecular evolution]]).
 
The human genome contains about 15,000 pseudogenes derived from protein-coding genes and an unknown number derived from noncoding genes.<ref>{{ cite web | url = https://useast.ensembl.org/Homo_sapiens/Info/Annotation | title = Ensembl Human reference genome GRCh38.p13}}</ref> They may cover a substantial fraction of the genome (~5%) since many of them contain former intron sequences.
 
Pseudogenes are junk DNA by definition and they evolve at the neutral rate as expected for junk DNA.<ref>{{ cite journal | vauthors = Xu J, Zhang J | date = 2015 | title = Are human translated pseudogenes functional? | journal = Molecular Biology and Evolution | volume = 33 | issue = 3 | pages = 755–760 | doi = 10.1093/molbev/msv268 | pmid = 26589994 | pmc = 5009996 }}</ref> Some former pseudogenes have secondarily acquired a function and this leads some scientists to speculate that most pseudogenes are not junk because they have a yet-to-be-discovered function.<ref>{{ cite journal | vauthors = Wen YZ, Zheng LL, Qu LH, Ayala FJ, Lun ZR | date = 2012 | title = Pseudogenes are not pseudo any more. | journal = RNA Biology | volume = 9 | issue = 1 | pages = 27–32 | doi = 10.4161/rna.9.1.18277 | pmid = 22258143 | s2cid = 13161678 | doi-access = free }}</ref>
 
===Repeat sequences, transposons and viral elements===
{{Main|Repeated sequence (DNA)}}
 
[[File:Bacterial mobile elements.svg|thumb|upright=1.35|[[Mobile genetic elements]] in the cell (left) and how they can be acquired (right)]]
 
[[Transposon]]s and [[retrotransposon]]s are [[mobile genetic elements]]. Retrotransposon [[Repeated sequence (DNA)|repeated sequences]], which include [[Retrotransposon#LINEs|long interspersed nuclear elements]] (LINEs) and [[Retrotransposon#SINEs|short interspersed nuclear elements]] (SINEs), account for a large proportion of the genomic sequences in many species. [[Alu sequence]]s, classified as a short interspersed nuclear element, are the most abundant mobile elements in the human genome. Some examples have been found of SINEs exerting transcriptional control of some protein-encoding genes.<ref>{{cite journal |vauthors=Ponicsan SL, Kugel JF, Goodrich JA |title=Genomic gems: SINE RNAs regulate mRNA production |journal=Current Opinion in Genetics & Development |volume=20 |issue=2 |pages=149–155 |date=April 2010 |pmid=20176473 |pmc=2859989 |doi=10.1016/j.gde.2010.01.004}}</ref><ref>{{cite journal |vauthors=Häsler J, Samuelsson T, Strub K |title=Useful 'junk': Alu RNAs in the human transcriptome |journal=Cellular and Molecular Life Sciences |volume=64 |issue=14 |pages=1793–1800 |date=July 2007 |pmid=17514354 |s2cid=5938630 |doi=10.1007/s00018-007-7084-0 |type=Submitted manuscript |url=https://archive-ouverte.unige.ch/unige:17489|pmc=11136058 }}</ref><ref>{{cite journal |vauthors=Walters RD, Kugel JF, Goodrich JA |title=InvAluable junk: the cellular impact and function of Alu and B2 RNAs |journal=IUBMB Life |volume=61 |issue=8 |pages=831–837 |date=August 2009 |pmid=19621349 |pmc=4049031 |doi=10.1002/iub.227}}</ref>
 
[[Endogenous retrovirus]] sequences are the product of [[reverse transcription]] of [[retrovirus]] genomes into the genomes of [[germ cell]]s. Mutation within these retro-transcribed sequences can inactivate the viral genome.<ref>{{cite journal | vauthors = Nelson PN, Hooley P, Roden D, Davari Ejtehadi H, Rylance P, Warren P, Martin J, Murray PG | display-authors = 6 | title = Human endogenous retroviruses: transposable elements with potential? | journal = Clinical and Experimental Immunology | volume = 138 | issue = 1 | pages = 1–9 | date = October 2004 | pmid = 15373898 | pmc = 1809191 | doi = 10.1111/j.1365-2249.2004.02592.x }}</ref>
 
Over 8% of the human genome is made up of (mostly decayed) endogenous retrovirus sequences, as part of the over 42% fraction that is recognizably derived from retrotransposons, while another 3% can be identified as the remains of [[Transposon#DNA transposons|DNA transposon]]s. Much of the remaining half of the genome that is currently without an explained origin is expected to have originated in transposable elements that were active so long ago (> 200 million years) that random mutations have rendered them unrecognizable.<ref name=humangenome>{{cite journal | vauthors = Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, 
Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J | display-authors = 6 | title = Initial sequencing and analysis of the human genome | journal = Nature | volume = 409 | issue = 6822 | pages = 860–921 | date = February 2001 | pmid = 11237011 | doi = 10.1038/35057062 | doi-access = free | bibcode = 2001Natur.409..860L | hdl = 2027.42/62798 | hdl-access = free }}</ref> Genome size variation in at least two kinds of plants is mostly the result of retrotransposon sequences.<ref>{{cite journal | vauthors = Piegu B, Guyot R, Picault N, Roulin A, Sanyal A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA, Panaud O | display-authors = 6 | title = Doubling genome size 
without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice | journal = Genome Research | volume = 16 | issue = 10 | pages = 1262–1269 | date = October 2006 | pmid = 16963705 | pmc = 1581435 | doi = 10.1101/gr.5290206 }}</ref><ref>{{cite journal | vauthors = Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF | title = Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium | journal = Genome Research | volume = 16 | issue = 10 | pages = 1252–1261 | date = October 2006 | pmid = 16954538 | pmc = 1581434 | doi = 10.1101/gr.5282906 }}</ref>
 
===Highly repetitive DNA===
Variations in the number of STR repeats can cause genetic diseases when they lie within a gene but most of these regions appear to be non-functional junk DNA where the number of repeats can vary considerably from individual to individual. This is why these length differences are used extensively in [[DNA profiling|DNA fingerprinting]].
 
===Junk DNA===
{{Main|Junk DNA}}
Although many non-coding regions have biological function,<ref name="Costa non-coding3">{{cite book |title=Non-coding RNAs and Epigenetic Regulation of Gene Expression: Drivers of Natural Selection |vauthors=Costa F |date=2012 |publisher=[[Caister Academic Press]] |isbn=978-1-904455-94-3 |veditors=Morris KV |chapter=7 Non-coding RNAs, Epigenomics, and Complexity in Human Cells}}{{page needed|date=June 2022}}</ref><ref name="Nessa3">{{cite book |title=Junk DNA: A Journey Through the Dark Matter of the Genome |vauthors=Carey M |date=2015 |publisher=Columbia University Press |isbn=978-0-231-17084-0 |author-link=Nessa Carey}}{{page needed|date=June 2022}}</ref> some portion of non-coding DNA has also been described as "Junk DNA". Though exact definitions differ, this refers broadly to "any DNA sequence that does not play a functional role in development, physiology, or some other organism-level capacity."<ref name="PalazzoGregory20142">{{cite journal |vauthors=Palazzo AF, Gregory TR |date=May 2014 |title=The case for junk DNA |journal=PLOS Genetics |volume=10 |issue=5 |pages=e1004351 |doi=10.1371/journal.pgen.1004351 |pmc=4014423 |pmid=24809441}}</ref> The term "junk DNA" was used in the 1960s.<ref name="PalazzoGregory20142" /><ref name="EhretdeHaller19632">{{cite journal |vauthors=Ehret CF, De Haller G |date=October 1963 |title=Origin, development, and maturation of organelles and organelle systems of the cell surface in Paramecium |journal=Journal of Ultrastructure Research |volume=23 |pages=SUPPL6:1–SUPPL642 |doi=10.1016/S0022-5320(63)80088-X |pmid=14073743}}</ref><ref name="Gregory Evolution Genome2">{{cite book |url=https://books.google.com/books?id=8HtPZP9VSiMC&dq=not+only+is+%22junk+dna%22+an+inappropriate+moniker&pg=PA30 |title=The Evolution of the Genome |date=2005 |publisher=Elsevier |isbn=978-0-12-301463-4 |veditors=TR |pages=29–31}}</ref> but it only became widely known in 1972 in a paper by [[Susumu Ohno]].<ref name="Ohno2">{{cite journal |last1=Ohno 
|first1=S |date=1972 |title=So much 'junk' DNA in our genome |journal=Brookhaven Symposia in Biology |volume=23 |pages=366–70 |oclc=101819442 |pmid=5065367}}</ref> Ohno noted that the [[mutational load]] from deleterious mutations placed an upper limit on the number of functional [[Locus (genetics)|loci]] that could be expected given a typical mutation rate. He hypothesized that mammalian genomes could not have more than 30,000 loci under selection before the "cost" from the mutational load would cause an inescapable decline in fitness, and eventually extinction.<ref name="Ohno2" /> Similar calculations focusing on nucleotides rather than gene loci come to the similar conclusion that the functional portion of the human genome (given mutation rates, genome size and population size) can only be maintained up to approximately 15%.<ref>{{cite journal |last1=Graur |first1=D |year=2017 |title=An Upper Limit on the Functional Fraction of the Human Genome |journal=Genome Biol. Evol. |volume=9 |pages=1880–1885 |doi=10.1093/gbe/evx121 |pmc=5570035 |pmid=28854598 |number=7}}</ref> The presence of junk DNA also explained the observation that even closely related species can have widely (orders-of-magnitude) different genome sizes ([[C-value|C-value paradox]]).<ref name="eddy2">{{cite journal |author-link=Sean Eddy |vauthors=Eddy SR |date=November 2012 |title=The C-value paradox, junk DNA and ENCODE |journal=Current Biology |volume=22 |issue=21 |pages=R898–R899 |doi=10.1016/j.cub.2012.10.002 |pmid=23137679 |doi-access=free |s2cid=28289437}}</ref>
Junk DNA is DNA that has no biologically relevant function such as pseudogenes and fragments of once active transposons. Bacteria and viral genomes have very little junk DNA<ref>{{cite journal | vauthors = Gil R, and Latorre A | date = 2012 | title = Factors behind junk DNA in bacteria | journal = Genes | volume = 3 | issue = 4 | pages = 634–650 | doi = 10.3390/genes3040634 | pmid = 24705080 | pmc = 3899985 | doi-access = free }}</ref><ref>{{Cite journal |last1=Brandes |first1=Nadav |last2=Linial |first2=Michal |date=2016 |title=Gene overlapping and size constraints in the viral world |journal=Biology Direct |language=en |volume=11 |issue=1 |pages=26 |doi=10.1186/s13062-016-0128-3 |pmid=27209091 |pmc=4875738 |issn=1745-6150 |doi-access=free }}</ref> but some eukaryotic genomes may have a substantial amount of junk DNA.<ref name="PalazzoGregory2014">{{cite journal | vauthors = Palazzo AF, Gregory TR | title = The case for junk DNA | journal = PLOS Genetics | volume = 10 | issue = 5 | pages = e1004351 | date = May 2014 | pmid = 24809441 | pmc = 4014423 | doi = 10.1371/journal.pgen.1004351 | doi-access = free }}</ref> The exact amount of nonfunctional DNA in humans and other species with large genomes has not been determined and there is considerable controversy in the scientific literature.<ref>{{cite journal | last = Morange | first = Michel | date = 2014 | title = Genome as a Multipurpose Structure Built by Evolution | journal = Perspectives in Biology and Medicine | volume = 57 | issue = 1 | pages = 162–171 | doi = 10.1353/pbm.2014.0008 | pmid = 25345709 | s2cid = 27613442 | url = https://hal.archives-ouvertes.fr/hal-01480552/file/ARTICLE%20ENCODE%20MM%2070114%20corrige%C2%A6%C3%BC.pdf }}</ref><ref>{{cite journal | vauthors = Haerty W, and Ponting CP | title = No Gene in the Genome Makes Sense Except in the Light of Evolution. 
| year = 2014 | journal = Annual Review of Genomics and Human Genetics | volume = 15 | pages = 71–92 | doi = 10.1146/annurev-genom-090413-025621| pmid = 24773316 | doi-access = free }}</ref>
 
Some authors assert that the term "junk DNA" occurs mainly in [[popular science]] and is no longer used in serious research articles.<ref name="SA2">{{cite journal |vauthors=Khajavinia A, Makalowski W |date=May 2007 |title=What is "junk" DNA, and what is it worth? |journal=Scientific American |volume=296 |issue=5 |pages=104 |bibcode= |doi=10.1038/scientificamerican0507-104 |pmid=17503549 |quote=The term "junk DNA" repelled mainstream researchers from studying noncoding genetic material for many years}}</ref> However, examination of ''Web of Science'' shows immediately that this is at best an oversimplification. For example, given the average deleterious mutation rate and population size, the functional portion of the human genome can only be maintained up to approximately 15%.<ref>{{cite journal
| last1= Graur | first1 = D
| title = An Upper Limit on the Functional Fraction of the Human Genome
| doi= 10.1093/gbe/evx121
| year = 2017
| journal = Genome Biol. Evol.
| volume = 9| number =7
| pages = 1880–1885
| pmid = 28854598
| pmc = 5570035
}}</ref> Likewise, in a recent review Palazzo and Kejiou<ref>{{cite journal
| title = Non-Darwinian Molecular Biology
| last1 = Palazzo|first1 = A F
|last2 = Kejiou| first2 = N S
| journal = Front. Genet.
| volume = 13
| pages = 831068
| year= 2022
| doi= 10.3389/fgene.2022.831068| pmid = 35251134| pmc = 8888898| doi-access = free}}</ref> noted the impossibility of maintaining a population with 100% functionality, and pointed out that "many researchers continue to state, erroneously, that all non-coding DNA was once thought to be junk."
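
The mutational-load reasoning behind such upper limits can be illustrated with a deliberately simplified calculation. The sketch below is not the method used in the cited papers. It assumes multiplicative fitness effects at mutation-selection balance, under which mean fitness relative to a mutation-free genotype is approximately e<sup>−U</sup>, where U is the number of new deleterious mutations per individual per generation. The function names and all numerical inputs (100 new mutations per generation, 10% of mutations in functional DNA being deleterious) are illustrative assumptions, not values taken from the sources.

```python
import math

def deleterious_rate(new_mutations, functional_fraction, deleterious_fraction):
    """U: expected new deleterious mutations per individual per generation."""
    return new_mutations * functional_fraction * deleterious_fraction

def mean_relative_fitness(u):
    """Mean fitness relative to a mutation-free genotype, assuming
    multiplicative effects at mutation-selection balance: exp(-U)."""
    return math.exp(-u)

def offspring_needed(u):
    """Average offspring per individual needed to offset the load (1 / fitness)."""
    return 1.0 / mean_relative_fitness(u)

if __name__ == "__main__":
    # Hypothetical inputs: 100 new mutations per generation, of which 10%
    # are deleterious when they fall in functional DNA.
    for fraction in (0.1, 0.5, 1.0):
        u = deleterious_rate(100, fraction, 0.1)
        print(f"functional fraction {fraction:.0%}: "
              f"U = {u:.1f}, offspring needed = {offspring_needed(u):.1f}")
```

Under these assumed inputs the required reproductive output grows exponentially with the functional fraction, which is the qualitative point of the argument; the specific numbers depend entirely on the chosen parameters.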
 
Since the late 1970s it has become apparent that most of the DNA in large genomes finds its origin in the [[selfish DNA|selfish]] amplification of [[transposable element]]s, of which [[Ford Doolittle|W. Ford Doolittle]] and Carmen Sapienza in 1980 wrote in the journal ''[[Nature (journal)|Nature]]'': "When a given DNA, or class of DNAs, of unproven phenotypic function can be shown to have evolved a strategy (such as transposition) which ensures its genomic survival, then no other explanation for its existence is necessary."<ref name="Doolittle1980">{{cite journal | vauthors = Doolittle WF, Sapienza C | title = Selfish genes, the phenotype paradigm and genome evolution | journal = Nature | volume = 284 | issue = 5757 | pages = 601–603 | date = April 1980 | pmid = 6245369 | doi = 10.1038/284601a0 | s2cid = 4311366 | bibcode = 1980Natur.284..601D }}</ref> The amount of junk DNA can be expected to depend on the rate of amplification of these elements and the rate at which non-functional DNA is lost.{{citation needed|date=June 2022}} Another source is [[Paleopolyploidy|genome duplication]] followed by a loss of function due to redundancy.{{citation needed|date=June 2022}} In the same issue of ''Nature'', [[Leslie Orgel]] and [[Francis Crick]] wrote that junk DNA has "little specificity and conveys little or no selective advantage to the organism".<ref>{{cite journal |vauthors=Orgel LE, Crick FH |title=Selfish DNA: the ultimate parasite |journal=Nature |volume=284 |issue=5757 |pages=604–607 |date=April 1980 |pmid=7366731 |doi=10.1038/284604a0 |s2cid=4233826 |bibcode=1980Natur.284..604O}}</ref>
 
The term "junk DNA" may provoke a strong reaction and some have recommended using more neutral terminology such as "nonfunctional DNA."<ref name=eddy/>
 
===ENCODE Project===
 
The Encyclopedia of DNA Elements ([[ENCODE]]) project uncovered, by direct biochemical approaches, that at least 80% of human genomic DNA has biochemical activity such as "transcription, transcription factor association, chromatin structure, and histone modification".<ref name=Nature489p57>{{cite journal | vauthors = Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, etal | collaboration = The ENCODE Project Consortium | title = An integrated encyclopedia of DNA elements in the human genome | journal = Nature | volume = 489 | issue = 7414 | pages = 57–74 | date = September 2012 | pmid = 22955616 | pmc = 3439153 | doi = 10.1038/nature11247 | bibcode = 2012Natur.489...57T }}.</ref> Though this was not necessarily unexpected due to previous decades of research discovering many functional non-coding regions,<ref name="Costa non-coding">{{cite book| vauthors = Costa F | veditors = Morris KV |title=Non-coding RNAs and Epigenetic Regulation of Gene Expression: Drivers of Natural Selection|date=2012|publisher=[[Caister Academic Press]]|isbn=978-1-904455-94-3|chapter=7 Non-coding RNAs, Epigenomics, and Complexity in Human Cells}}{{page needed|date=June 2022}}</ref><ref name=Nessa /> some scientists criticized the conclusion for conflating biochemical activity with [[biological function]].<ref name="observer">{{cite news |url=https://www.theguardian.com/science/2013/feb/24/scientists-attacked-over-junk-dna-claim |title=Scientists attacked over claim that 'junk DNA' is vital to life | vauthors = McKie R |work=The Observer|date=24 February 2013 }}</ref><ref name=eddy>{{cite journal | vauthors = Eddy SR | title = The C-value paradox, junk DNA and ENCODE | journal = Current Biology | volume = 22 | issue = 21 | pages = R898–R899 | date = November 2012 | pmid = 23137679 | doi = 10.1016/j.cub.2012.10.002 | s2cid = 28289437 | author-link = Sean Eddy | doi-access = free }}</ref><ref name=doolittle2013>{{cite journal | vauthors = Doolittle WF | title = Is junk DNA bunk? 
A critique of ENCODE | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 110 | issue = 14 | pages = 5294–5300 | date = April 2013 | pmid = 23479647 | pmc = 3619371 | doi = 10.1073/pnas.1221376110 | author-link = W. Ford Doolittle | bibcode = 2013PNAS..110.5294D | doi-access = free }}</ref><ref name="PalazzoGregory2014">{{cite journal | vauthors = Palazzo AF, Gregory TR | title = The case for junk DNA | journal = PLOS Genetics | volume = 10 | issue = 5 | pages = e1004351 | date = May 2014 | pmid = 24809441 | pmc = 4014423 | doi = 10.1371/journal.pgen.1004351 }}</ref><ref name="graur">{{cite journal | vauthors = Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, Elhaik E | title = On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE | journal = Genome Biology and Evolution | volume = 5 | issue = 3 | pages = 578–590 | year = 2013 | pmid = 23431001 | pmc = 3622293 | doi = 10.1093/gbe/evt028 }}</ref> Some have argued that neither accessibility of segments of the genome to transcription factors nor their transcription guarantees that those segments have biochemical function and that their transcription is [[natural selection|selectively advantageous]]. After all, non-functional sections of the genome can be transcribed, given that transcription factors typically bind to short sequences that are found (randomly) all over the whole genome.<ref>{{cite journal | vauthors = Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, Chen X, Taipale J, Hughes TR, Weirauch MT | display-authors = 6 | title = The Human Transcription Factors | journal = Cell | volume = 172 | issue = 4 | pages = 650–665 | date = February 2018 | pmid = 29425488 | doi = 10.1016/j.cell.2018.01.029 | s2cid = 3599827 | doi-access = free }}</ref>
However, others have argued against relying solely on estimates from comparative genomics due to its limited scope since non-coding DNA has been found to be involved in [[epigenetic]] activity and complex [[gene regulatory network|networks of genetic interactions]] and is explored in [[evolutionary developmental biology]].<ref name=Nessa>{{cite book| vauthors = Carey M | author-link =Nessa Carey|title=Junk DNA: A Journey Through the Dark Matter of the Genome|date=2015|publisher=Columbia University Press|isbn=978-0-231-17084-0}}{{page needed|date=June 2022}}</ref><ref name=kellis /><ref name="extent functionality">{{cite journal | vauthors = Liu G, Mattick JS, Taft RJ | title = A meta-analysis of the genomic and transcriptomic composition of complex life | journal = Cell Cycle | volume = 12 | issue = 13 | pages = 2061–2072 | date = July 2013 | pmc = 4685169 | doi = 10.1186/1877-6566-7-2 | pmid = 23759593 }}</ref><ref name="Morris Epigenetics">{{cite book | veditors = Morris K |title=Non-Coding RNAs and Epigenetic Regulation of Gene Expression: Drivers of Natural Selection |date=2012 |publisher=Caister Academic Press |location=Norfolk, UK |isbn=978-1-904455-94-3}}{{page needed|date=June 2022}}</ref> Prior to ENCODE, the much lower estimates of functionality were based on genomic conservation estimates across mammalian lineages.<ref name="eddy" /><ref name="doolittle2013" /><ref name="PalazzoGregory2014" /><ref name="graur" /> Estimates for the biologically functional fraction of the human genome based on [[comparative genomics]] range between 8 and 15%.<ref>{{cite journal | vauthors = Ponting CP, Hardison RC | title = What fraction of the human genome is functional? 
| journal = Genome Research | volume = 21 | issue = 11 | pages = 1769–1776 | date = November 2011 | pmid = 21875934 | pmc = 3205562 | doi = 10.1101/gr.116814.110 }}</ref><ref name=kellis>{{cite journal | vauthors = Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, Ward LD, Birney E, Crawford GE, Dekker J, Dunham I, Elnitski LL, Farnham PJ, Feingold EA, Gerstein M, Giddings MC, Gilbert DM, Gingeras TR, Green ED, Guigo R, Hubbard T, Kent J, Lieb JD, Myers RM, Pazin MJ, Ren B, Stamatoyannopoulos JA, Weng Z, White KP, Hardison RC | display-authors = 6 | title = Defining functional DNA elements in the human genome | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 111 | issue = 17 | pages = 6131–6138 | date = April 2014 | pmid = 24753594 | pmc = 4035993 | doi = 10.1073/pnas.1318948111 | doi-access = free | bibcode = 2014PNAS..111.6131K }}</ref><ref name="Rands">{{cite journal | vauthors = Rands CM, Meader S, Ponting CP, Lunter G | title = 8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage | journal = PLOS Genetics | volume = 10 | issue = 7 | pages = e1004525 | date = July 2014 | pmid = 25057982 | pmc = 4109858 | doi = 10.1371/journal.pgen.1004525 }}</ref> One consistent indication of biological functionality of a genomic region is if the sequence of that genomic region was maintained by purifying selection (or if mutating away the sequence is deleterious to the organism). Under this definition, 90% of the genome is 'junk'. 
However, some stress that 'junk' is not 'garbage'<ref>{{cite journal |last1=Brenner |first1=Sydney |title=Refuge of spandrels |journal=Current Biology |date=September 1998 |volume=8 |issue=19 |pages=R669 |doi=10.1016/s0960-9822(98)70427-0 |pmid=9776723 |s2cid=2918533 |doi-access=free }}</ref> and that the large body of nonfunctional transcripts produced by 'junk DNA' can evolve functional elements ''de novo''.<ref>{{cite journal | vauthors = Palazzo AF, Koonin EV | title = Functional Long Non-coding RNAs Evolve from Junk Transcripts | journal = Cell | volume = 183 | issue = 5 | pages = 1151–1161 | date = November 2020 | pmid = 33068526 | doi = 10.1016/j.cell.2020.09.047 | s2cid = 222815635 | doi-access = free }}</ref><ref>{{cite journal | vauthors = Graur D, Zheng Y, Azevedo RB | title = An evolutionary classification of genomic function | journal = Genome Biology and Evolution | volume = 7 | issue = 3 | pages = 642–645 | date = January 2015 | pmid = 25635041 | pmc = 5322545 | doi = 10.1093/gbe/evv021 }}</ref> On the other hand, widespread transcription and splicing in the human genome have been discussed as another indicator of genetic function, in addition to genomic conservation, which may miss poorly conserved functional sequences.<ref name="kellis" /> Moreover, much of the apparent junk DNA is involved in [[epigenetic]] regulation and appears to be necessary for the development of complex organisms.<ref name="Nessa" /><ref name="extent functionality" /><ref name="Morris Epigenetics" />
 
The nonfunctional DNA in bacterial genomes is mostly located in the intergenic fraction of non-coding DNA but in eukaryotic genomes it may also be found within [[introns]]. There are many examples of functional DNA elements in non-coding DNA, and it is erroneous to equate non-coding DNA with junk DNA.
Contributing to the debate is that there is no consensus on what constitutes a "functional" element in the genome since geneticists, evolutionary biologists, and molecular biologists employ different approaches and definitions of "function",<ref name="kellis" /> often with a lack of clarity of what they mean in the literature.<ref>{{cite journal |last1=Linquist |first1=Stefan |last2=Doolittle |first2=W. Ford |last3=Palazzo |first3=Alexander F. |title=Getting clear about the F-word in genomics |journal=PLOS Genetics |date=1 April 2020 |volume=16 |issue=4 |pages=e1008702 |doi=10.1371/journal.pgen.1008702|pmid=32236092 |pmc=7153884 }}</ref> Due to the ambiguity in the terminology, there are different schools of thought over this matter.<ref>{{cite journal |last1=Doolittle |first1=W. Ford |title=We simply cannot go on being so vague about 'function' |journal=Genome Biology |date=December 2018 |volume=19 |issue=1 |pages=223 |doi=10.1186/s13059-018-1600-4|pmid=30563541 |pmc=6299606 |doi-access=free }}</ref> Furthermore, the methods used have limitations: genetic approaches may miss functional elements that do not manifest physically on the organism; evolutionary approaches have difficulties using accurate multispecies sequence alignments, since genomes of even closely related species vary considerably; and biochemical approaches, though highly reproducible, yield biochemical signatures that do not always signify a function.<ref name="kellis" /> Kellis et al. noted that 70% of the transcription coverage was less than 1 transcript per cell (and may thus be based on spurious background transcription). On the other hand, they argued that a 12–15% fraction of human DNA may be under functional constraint, and that this may still be an underestimate when lineage-specific constraints are included. 
Ultimately genetic, evolutionary, and biochemical approaches can all be used in a complementary way to identify regions that may be functional in human biology and disease.<ref name="kellis" /> Some critics have argued that functionality can only be assessed in reference to an appropriate [[null hypothesis]]. In this case, the null hypothesis would be that these parts of the genome are non-functional and have properties, be it on the basis of conservation or biochemical activity, that would be expected of such regions based on our general understanding of [[molecular evolution]] and [[biochemistry]]. According to these critics, until a region in question has been shown to have additional features, beyond what is expected of the null hypothesis, it should provisionally be labelled as non-functional.<ref name="PalazzoLee2015">{{cite journal | vauthors = Palazzo AF, Lee ES | title = Non-coding RNA: what is functional and what is junk? | journal = Frontiers in Genetics | volume = 6 | pages = 2 | year = 2015 | pmid = 25674102 | pmc = 4306305 | doi = 10.3389/fgene.2015.00002 | doi-access = free }}</ref>
 
==Genome-wide association studies (GWAS) and non-coding DNA==
 
[[Genome-wide association studies]] (GWAS) identify linkages between alleles and observable traits such as phenotypes and diseases. Most of the associations are between [[single-nucleotide polymorphisms]] (SNPs) and the trait being examined and most of these SNPs are located in non-functional DNA. The association establishes a linkage that helps map the DNA region responsible for the trait but it does not necessarily identify the mutations causing the disease or phenotypic difference.<ref>{{ cite journal | vauthors = Korte A, Farlow A | date = 2013 | title = The advantages and limitations of trait analysis with GWAS: a review | journal = Plant Methods | volume = 9 | pages = 29 | doi = 10.1186/1746-4811-9-29| pmid = 23876160 | pmc = 3750305 | s2cid = 206976469 | doi-access = free }}</ref><ref name = Manolio>{{cite journal | vauthors = Manolio TA | title = Genomewide association studies and assessment of the risk of disease | journal = The New England Journal of Medicine | volume = 363 | issue = 2 | pages = 166–76 | date = July 2010 | pmid = 20647212 | doi = 10.1056/NEJMra0905980 | doi-access = free }}</ref><ref>{{cite journal | vauthors = Visscher PV, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J | date = 2017 | title = 10 Years of GWAS Discovery: Biology, Function, and Translation | journal = American Journal of Human Genetics | volume = 101 | issue = 1 | pages = 5–22 | doi = 10.1016/j.ajhg.2017.06.005| pmid = 28686856 | pmc = 5501872 }}</ref><ref>{{ cite journal | vauthors = Gallagher MD, Chen-Plotkin, AS | date = 2018 | title = The Post-GWAS Era: From Association to Function | journal = American Journal of Human Genetics | volume = 102 | issue = 5 | pages = 717–730 | doi = 10.1016/j.ajhg.2018.04.002| pmid = 29727686 | pmc = 5986732 }}</ref><ref>{{ cite journal | vauthors = Marigorta UM, Rodríguez JA, Gibson G, Navarro A | date = 2018 | title = Replicability and Prediction: Lessons and Challenges from GWAS | journal = Trends in Genetics | volume 
= 34 | issue = 7 | pages = 504–517 | doi = 10.1016/j.tig.2018.03.005| pmid = 29716745 | pmc = 6003860 }}</ref>
SNPs that are tightly linked to traits are the ones most likely to identify a causal mutation. (The association is referred to as tight [[linkage disequilibrium]].) About 12% of these polymorphisms are found in coding regions; about 40% are located in introns; and most of the rest are found in intergenic regions, including regulatory sequences.<ref name=Manolio/>
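
Linkage disequilibrium between two biallelic sites is commonly summarized by the coefficient D and the squared correlation r<sup>2</sup>. The following is a minimal sketch of those two standard quantities; the function name and the haplotype and allele frequencies used in the example are hypothetical and are not drawn from any GWAS dataset.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # deviation from independent association
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom

if __name__ == "__main__":
    # Hypothetical frequencies: the two alleles always occur together.
    print(linkage_disequilibrium(0.5, 0.5, 0.5))
    # Hypothetical frequencies: the two alleles associate at random.
    print(linkage_disequilibrium(0.25, 0.5, 0.5))
```

An r<sup>2</sup> near 1 corresponds to the tight linkage described above, while a value near 0 means the marker carries little information about the second site.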
 
== See also ==
*[[Conserved non-coding sequence]]
*[[Eukaryotic chromosome fine structure]]
*[[Gene-centered view of evolution]]
*[[Gene regulatory network]]
*[[Intergenic region]]
*[[Intragenomic conflict]]
*[[Phylogenetic footprinting]]
*[[Transcriptome]]
*[[Non-coding RNA]]
*[[Gene desert]]
*The [[Onion Test]]
 
== References ==
{{Refbegin|32em}}
* {{cite book | vauthors = Bennett MD, Leitch IJ | year = 2005 | chapter = Genome size evolution in plants |chapter-url=https://books.google.com/books?id=8HtPZP9VSiMC&pg=PA89 | title = The Evolution of the Genome | veditors = Gregory TR | publisher = Elsevier | location = San Diego | pages = 89–162 |isbn=978-0-08-047052-8}}
* {{cite book |doi=10.1016/B978-012301463-4/50003-6 |chapter=Genome Size Evolution in Animals |title=The Evolution of the Genome |year=2005 | vauthors = Gregory TR |pages=3–87 |isbn=978-0-12-301463-4 }}
* {{cite journal | vauthors = Shabalina SA, Spiridonov NA | title = The mammalian transcriptome and the function of non-coding DNA sequences | journal = Genome Biology | volume = 5 | issue = 4 | pages = 105 | year = 2004 | pmid = 15059247 | pmc = 395773 | doi = 10.1186/gb-2004-5-4-105 | doi-access = free }}
* {{cite journal | vauthors = Castillo-Davis CI | title = The evolution of noncoding DNA: how much junk, how much func? | journal = Trends in Genetics | volume = 21 | issue = 10 | pages = 533–536 | date = October 2005 | pmid = 16098630 | doi = 10.1016/j.tig.2005.08.001 }}
{{Refend}}