Non-coding DNA: Difference between revisions

lede summarizes body
 
(7 intermediate revisions by 6 users not shown)
Line 6:
In [[bacteria]], the [[Coding region|coding regions]] typically take up 88% of the genome.<ref name=":0" /> The remaining 12% does not encode proteins, but much of it still has biological function through regulatory sequences and [[Gene|genes]] whose RNA transcript is functional (non-coding genes), which means that almost all of the bacterial genome has a function.<ref name=":0">{{cite journal | vauthors = Kirchberger PC, Schmidt ML, Ochman H | date = 2020 | title = The ingenuity of bacterial genomes | journal = Annual Review of Microbiology | volume = 74 | pages = 815–834 | doi = 10.1146/annurev-micro-020518-115822| pmid = 32692614 | s2cid = 220699395 }}</ref> The amount of coding DNA in [[Eukaryote|eukaryotes]] is usually a much smaller fraction of the genome because eukaryotic genomes contain large amounts of repetitive DNA not found in prokaryotes. The [[human genome]] contains somewhere between 1% and 2% coding DNA.<ref name = Piovesan/><ref>{{ cite journal | vauthors = Omenn GS | date = 2021 | title = Reflections on the HUPO Human Proteome Project, the Flagship Project of the Human Proteome Organization, at 10 Years | journal = Molecular & Cellular Proteomics | volume = 20 | pages = 100062 | doi = 10.1016/j.mcpro.2021.100062| pmid = 33640492 | pmc = 8058560 }}</ref> The exact number is not known because there are disputes over the number of functional coding [[Exon|exons]] and over the total size of the human genome. The remaining 98–99% of the human genome is non-coding DNA, which includes many functional elements such as non-coding genes and regulatory sequences.
 
[[Genome size]] in eukaryotes can vary over a wide range, even between closely related species. This puzzling observation was originally known as the [[C-value|C-value paradox]], where "C" refers to the haploid genome size.<ref>{{cite journal | vauthors = Thomas CA | title = The genetic organization of chromosomes | journal = Annual Review of Genetics | volume = 5 | pages = 237–256 | date = 1971 | pmid = 16097657 | doi = 10.1146/annurev.ge.05.120171.001321 }}</ref> The paradox was resolved with the discovery that most of the differences were due to the expansion and contraction of repetitive DNA and not the number of genes. Some researchers speculated that this repetitive DNA was mostly [[junk DNA]]. The reasons for the changes in genome size are still being worked out and this problem is called the C-value Enigma.<ref>{{ cite journal | vauthors = Elliott TA, Gregory TR | date = 2015 | title = What's in a genome? The C-value enigma and the evolution of eukaryotic genome content | journal = Phil. Trans. R. Soc. B | volume = 370 | issue = 1678 | pages = 20140331 | doi = 10.1098/rstb.2014.0331| pmid = 26323762 | pmc = 4571570 | s2cid = 12095046 }}</ref>
 
This led to the observation that the number of genes does not seem to correlate with perceived notions of complexity because the number of genes seems to be relatively constant, an issue termed the [[G-value paradox|G-value Paradox]].<ref>{{ cite journal | vauthors = Hahn MW, Wray GA | date = 2002 | title = The g-value paradox | journal = Evolution and Development | volume = 4 | issue = 2 | pages = 73–75 | doi = 10.1046/j.1525-142X.2002.01069.x| pmid = 12004964 | s2cid = 2810069 }}</ref> For example, the genome of the unicellular ''[[Polychaos dubium]]'' (formerly known as ''Amoeba dubia'') has been reported to contain more than 200 times the amount of DNA in humans (i.e. more than 600 billion [[genome size|pairs of bases]] vs a bit more than 3 billion in humans).<ref name=Gregory>{{cite journal | vauthors = Gregory TR, Hebert PD | title = The modulation of DNA content: proximate causes and ultimate consequences | journal = Genome Research | volume = 9 | issue = 4 | pages = 317–324 | date = April 1999 | pmid = 10207154 | doi = 10.1101/gr.9.4.317 | s2cid = 16791399 | doi-access = free }}</ref> The [[pufferfish]] ''[[Takifugu]] rubripes'' genome is only about one eighth the size of the human genome, yet seems to have a comparable number of genes. Genes take up about 30% of the pufferfish genome and the coding DNA is about 10%. (Non-coding DNA = 90%.) 
The reduced size of the pufferfish genome is due to a reduction in the length of introns and less repetitive DNA.<ref>{{ cite journal | vauthors = Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A | date = 2002 | title = Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes | journal = Science | volume = 297 | issue = 5585 | pages = 1301–1310 | doi = 10.1126/science.1072104| pmid = 12142439 | bibcode = 2002Sci...297.1301A | s2cid = 10310355 }}</ref><ref name="Ohno">{{cite journal | vauthors = Ohno S | title = So much "junk" DNA in our genome | journal = Brookhaven Symposia in Biology | volume = 23 | pages = 366–370 | date = 1972 | pmid = 5065367 | oclc = 101819442 }}</ref>
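
As a rough illustration using the figures above, and assuming a human genome of about 3.1 billion bp (an assumed round value for "a bit more than 3 billion"), the absolute amounts of coding DNA in the two genomes come out similar even though the genome sizes differ about eightfold:

<math display="block">\text{pufferfish: } \tfrac{1}{8} \times 3.1 \times 10^{9}\ \text{bp} \approx 3.9 \times 10^{8}\ \text{bp}, \qquad 10\% \times 3.9 \times 10^{8}\ \text{bp} \approx 3.9 \times 10^{7}\ \text{bp}</math>

<math display="block">\text{human: } (1\%\text{ to }2\%) \times 3.1 \times 10^{9}\ \text{bp} \approx 3.1 \times 10^{7}\text{ to }6.2 \times 10^{7}\ \text{bp}</math>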
Line 17:
 
==Types of non-coding DNA sequences==
{{further|Conserved non-coding sequence}}
 
===Noncoding genes===
Line 76 ⟶ 75:
{{Main|Scaffold/matrix attachment region}}
 
Both prokaryotic and eukaryotic genomes are organized into large loops of protein-bound DNA. In eukaryotes, the bases of the loops are called [[Scaffold/matrix attachment region|scaffold attachment regions]] (SARs) and they consist of stretches of DNA that bind an RNA/protein complex to stabilize the loop. There are about 100,000 loops in the human genome and each SAR consists of about 100 bp of DNA, so the total amount of DNA devoted to SARs accounts for about 0.3% of the human genome.<ref>{{cite journal | vauthors = Misteli T | date = 2020 | title = The self-organizing genome: Principles of genome architecture and function | journal = Cell | volume = 183 | issue = 1 | pages = 28–45 | doi = 10.1016/j.cell.2020.09.014 | pmid = 32976797 | pmc = 7541718 }}</ref>
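
As a rough check of this estimate, assuming a human genome of about 3.1 billion bp (an assumed round figure, not taken from the cited source):

<math display="block">100{,}000 \times 100\ \text{bp} = 10^{7}\ \text{bp}, \qquad \frac{10^{7}\ \text{bp}}{3.1 \times 10^{9}\ \text{bp}} \approx 0.3\%</math>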
 
===Pseudogenes===
{{Main|Pseudogene}}
 
Pseudogenes are mostly former genes that have become non-functional due to mutation, but the term also refers to inactive DNA sequences that are derived from RNAs produced by functional genes ([[Pseudogene|processed pseudogenes]]). Pseudogenes are only a small fraction of noncoding DNA in prokaryotic genomes because they are eliminated by negative selection. In some eukaryotes, however, pseudogenes can accumulate because selection is not powerful enough to eliminate them (see [[Nearly neutral theory of molecular evolution]]).
 
The human genome contains about 15,000 pseudogenes derived from protein-coding genes and an unknown number derived from noncoding genes.<ref>{{ cite web | url = https://useast.ensembl.org/Homo_sapiens/Info/Annotation | title = Ensembl Human reference genome GRCh38.p13}}</ref> They may cover a substantial fraction of the genome (~5%) since many of them contain former intron sequences.
Line 92 ⟶ 91:
[[File:Bacterial mobile elements.svg|thumb|upright=1.35|[[Mobile genetic elements]] in the cell (left) and how they can be acquired (right)]]
 
[[Transposon]]s and [[retrotransposon]]s are [[mobile genetic elements]]. Retrotransposon [[Repeated sequence (DNA)|repeated sequences]], which include [[Retrotransposon#LINEs|long interspersed nuclear elements]] (LINEs) and [[Retrotransposon#SINEs|short interspersed nuclear elements]] (SINEs), account for a large proportion of the genomic sequences in many species. [[Alu sequence]]s, classified as a short interspersed nuclear element, are the most abundant mobile elements in the human genome. Some examples have been found of SINEs exerting transcriptional control of some protein-encoding genes.<ref>{{cite journal |vauthors=Ponicsan SL, Kugel JF, Goodrich JA |title=Genomic gems: SINE RNAs regulate mRNA production |journal=Current Opinion in Genetics & Development |volume=20 |issue=2 |pages=149–155 |date=April 2010 |pmid=20176473 |pmc=2859989 |doi=10.1016/j.gde.2010.01.004}}</ref><ref>{{cite journal |vauthors=Häsler J, Samuelsson T, Strub K |title=Useful 'junk': Alu RNAs in the human transcriptome |journal=Cellular and Molecular Life Sciences |volume=64 |issue=14 |pages=1793–1800 |date=July 2007 |pmid=17514354 |s2cid=5938630 |doi=10.1007/s00018-007-7084-0 |type=Submitted manuscript |url=https://archive-ouverte.unige.ch/unige:17489|pmc=11136058 }}</ref><ref>{{cite journal |vauthors=Walters RD, Kugel JF, Goodrich JA |title=InvAluable junk: the cellular impact and function of Alu and B2 RNAs |journal=IUBMB Life |volume=61 |issue=8 |pages=831–837 |date=August 2009 |pmid=19621349 |pmc=4049031 |doi=10.1002/iub.227}}</ref>
 
[[Endogenous retrovirus]] sequences are the product of [[reverse transcription]] of [[retrovirus]] genomes into the genomes of [[germ cell]]s. Mutation within these retro-transcribed sequences can inactivate the viral genome.<ref>{{cite journal | vauthors = Nelson PN, Hooley P, Roden D, Davari Ejtehadi H, Rylance P, Warren P, Martin J, Murray PG | display-authors = 6 | title = Human endogenous retroviruses: transposable elements with potential? | journal = Clinical and Experimental Immunology | volume = 138 | issue = 1 | pages = 1–9 | date = October 2004 | pmid = 15373898 | pmc = 1809191 | doi = 10.1111/j.1365-2249.2004.02592.x }}</ref>
Line 110 ⟶ 109:
Junk DNA is DNA that has no biologically relevant function, such as pseudogenes and fragments of once-active transposons. Bacterial and viral genomes have very little junk DNA<ref>{{cite journal | vauthors = Gil R, Latorre A | date = 2012 | title = Factors behind junk DNA in bacteria | journal = Genes | volume = 3 | issue = 4 | pages = 634–650 | doi = 10.3390/genes3040634 | pmid = 24705080 | pmc = 3899985 | doi-access = free }}</ref><ref>{{Cite journal |last1=Brandes |first1=Nadav |last2=Linial |first2=Michal |date=2016 |title=Gene overlapping and size constraints in the viral world |journal=Biology Direct |language=en |volume=11 |issue=1 |pages=26 |doi=10.1186/s13062-016-0128-3 |pmid=27209091 |pmc=4875738 |issn=1745-6150 |doi-access=free }}</ref> but some eukaryotic genomes may have a substantial amount of junk DNA.<ref name="PalazzoGregory2014">{{cite journal | vauthors = Palazzo AF, Gregory TR | title = The case for junk DNA | journal = PLOS Genetics | volume = 10 | issue = 5 | pages = e1004351 | date = May 2014 | pmid = 24809441 | pmc = 4014423 | doi = 10.1371/journal.pgen.1004351 | doi-access = free }}</ref> The exact amount of nonfunctional DNA in humans and other species with large genomes has not been determined and there is considerable controversy in the scientific literature.<ref>{{cite journal | last = Morange | first = Michel | date = 2014 | title = Genome as a Multipurpose Structure Built by Evolution | journal = Perspectives in Biology and Medicine | volume = 57 | issue = 1 | pages = 162–171 | doi = 10.1353/pbm.2014.0008 | pmid = 25345709 | s2cid = 27613442 | url = https://hal.archives-ouvertes.fr/hal-01480552/file/ARTICLE%20ENCODE%20MM%2070114%20corrige%C2%A6%C3%BC.pdf }}</ref><ref>{{cite journal | vauthors = Haerty W, Ponting CP | title = No Gene in the Genome Makes Sense Except in the Light of Evolution | year = 2014 | journal = Annual Review of Genomics and Human Genetics | volume = 15 | pages = 71–92 | doi = 10.1146/annurev-genom-090413-025621| pmid = 24773316 | doi-access = free }}</ref>
 
The nonfunctional DNA in bacterial genomes is mostly located in the intergenic fraction of non-coding DNA but in eukaryotic genomes it may also be found within [[introns]]. There are many examples of functional DNA elements in non-coding DNA, and it is erroneous to equate non-coding DNA with junk DNA.
 
==Genome-wide association studies (GWAS) and non-coding DNA==
Line 119 ⟶ 118:
 
== See also ==
*[[Conserved non-coding sequence]]
*[[Eukaryotic chromosome fine structure]]
*[[Gene-centered view of evolution]]
*[[Gene regulatory network]]
*[[Intergenic region]]
*[[Intragenomic conflict]]
*[[Phylogenetic footprinting]]
*[[Transcriptome]]
*[[Non-coding RNA]]
*[[Gene desert]]
*The [[Onion Test]]
 
== References ==