Non-coding DNA: Difference between revisions

short desc edit
Tags: Mobile edit Mobile web edit Advanced mobile edit
Melousina (talk | contribs)
m Added links
Line 1:
{{Short description|DNA that does not code for proteins}}
 
'''Non-coding DNA''' ('''ncDNA''') sequences are components of an organism's [[DNA]] that do not [[genetic code|encode]] [[protein]] sequences. Some non-coding DNA is [[Transcription (genetics)|transcribed]] into functional [[non-coding RNA]] molecules (e.g. [[transfer RNA]], [[microRNA]], [[Piwi-interacting RNA|piRNA]], [[ribosomal RNA]], and [[RNA interference|regulatory RNAs]]). Other functional regions of the non-coding DNA fraction include [[regulatory sequence]]s that control [[gene expression]]; [[scaffold attachment region]]s; [[origin of replication|origins of DNA replication]]; [[centromere]]s; and [[telomere]]s. Some non-coding regions appear to be mostly nonfunctional, such as [[introns]], [[pseudogenes]], [[intergenic DNA]], and fragments of [[transposons]] and [[viruses]].
 
== Fraction of non-coding genomic DNA ==
In [[bacteria]], the [[Coding region|coding regions]] typically take up 88% of the genome.<ref name=":0" /> The remaining 12% does not encode proteins, but much of it still has biological function through [[Gene|genes]] where the RNA transcript is functional (non-coding genes) and regulatory sequences, which means that almost all of the bacterial genome has a function.<ref name=":0">{{cite journal | vauthors = Kirchberger PC, Schmidt ML, Ochman H | date = 2020 | title = The ingenuity of bacterial genomes | journal = Annual Review of Microbiology | volume = 74 | pages = 815–834 | doi = 10.1146/annurev-micro-020518-115822| pmid = 32692614 | s2cid = 220699395 }}</ref> The amount of coding DNA in [[Eukaryote|eukaryotes]] is usually a much smaller fraction of the genome because eukaryotic genomes contain large amounts of repetitive DNA not found in prokaryotes. The [[human genome]] contains somewhere between 1% and 2% coding DNA.<ref name = Piovesan/><ref>{{ cite journal | vauthors = Omenn GS | date = 2021 | title = Reflections on the HUPO Human Proteome Project, the Flagship Project of the Human Proteome Organization, at 10 Years | journal = Molecular & Cellular Proteomics | volume = 20 | pages = 100062 | doi = 10.1016/j.mcpro.2021.100062| pmid = 33640492 | pmc = 8058560 }}</ref> The exact number is not known because there are disputes over the number of functional coding [[Exon|exons]] and over the total size of the human genome. This means that 98–99% of the human genome consists of non-coding DNA, which includes many functional elements such as non-coding genes and regulatory sequences.
 
[[Genome size]] in eukaryotes can vary over a wide range, even between closely related species. This puzzling observation was originally known as the [[C-value|C-value Paradox]], where "C" refers to the haploid genome size.<ref>{{cite journal | vauthors = Thomas CA | title = The genetic organization of chromosomes | journal = Annual Review of Genetics | volume = 5 | pages = 237–256 | date = 1971 | pmid = 16097657 | doi = 10.1146/annurev.ge.05.120171.001321 }}</ref> The paradox was resolved with the discovery that most of the differences were due to the expansion and contraction of repetitive DNA and not the number of genes. Some researchers speculated that this repetitive DNA was mostly [[junk DNA]]. The reasons for the changes in genome size are still being worked out and this problem is called the C-value Enigma.<ref>{{ cite journal | vauthors = Elliott TA, Gregory TR | date = 2015 | title = What's in a genome? The C-value enigma and the evolution of eukaryotic genome content | journal = Phil. Trans. R. Soc. B | volume = 370 | issue = 1678 | pages = 20140331 | doi = 10.1098/rstb.2014.0331| pmid = 26323762 | pmc = 4571570 | s2cid = 12095046 }}</ref>
 
This led to the observation that the number of genes does not seem to correlate with perceived notions of complexity because the number of genes seems to be relatively constant, an issue termed the [[G-value paradox|G-value Paradox]].<ref>{{ cite journal | vauthors = Hahn MW, Wray GA | date = 2002 | title = The g-value paradox | journal = Evolution and Development | volume = 4 | issue = 2 | pages = 73–75 | doi = 10.1046/j.1525-142X.2002.01069.x| pmid = 12004964 | s2cid = 2810069 }}</ref> For example, the genome of the unicellular ''[[Polychaos dubium]]'' (formerly known as ''Amoeba dubia'') has been reported to contain more than 200 times the amount of DNA in humans (i.e. more than 600 billion [[genome size|pairs of bases]] vs a bit more than 3 billion in humans).<ref name=Gregory>{{cite journal | vauthors = Gregory TR, Hebert PD | title = The modulation of DNA content: proximate causes and ultimate consequences | journal = Genome Research | volume = 9 | issue = 4 | pages = 317–324 | date = April 1999 | pmid = 10207154 | doi = 10.1101/gr.9.4.317 | s2cid = 16791399 | doi-access = free }}</ref> The [[pufferfish]] ''[[Takifugu]] rubripes'' genome is only about one eighth the size of the human genome, yet seems to have a comparable number of genes. Genes take up about 30% of the pufferfish genome and the coding DNA is about 10%. (Non-coding DNA = 90%.) 
The reduced size of the pufferfish genome is due to a reduction in the length of introns and less repetitive DNA.<ref>{{ cite journal | vauthors = Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A | date = 2002 | title = Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes | journal = Science | volume = 297 | issue = 5585 | pages = 1301–1310 | doi = 10.1126/science.1072104| pmid = 12142439 | bibcode = 2002Sci...297.1301A | s2cid = 10310355 }}</ref><ref name="Ohno">{{cite journal | vauthors = Ohno S | title = So much "junk" DNA in our genome | journal = Brookhaven Symposia in Biology | volume = 23 | pages = 366–370 | date = 1972 | pmid = 5065367 | oclc = 101819442 }}</ref>
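
The size comparisons above reduce to simple arithmetic. The following Python sketch is illustrative only: it treats the rounded figures quoted in the text as exact inputs (3 billion base pairs for humans, 600 billion for ''Polychaos dubium'', one eighth of the human genome for the pufferfish), and the function and variable names are hypothetical.

```python
# Rounded figures quoted in the text, treated as exact for illustration only.
HUMAN_BP = 3_000_000_000        # "a bit more than 3 billion" base pairs
POLYCHAOS_BP = 600_000_000_000  # "more than 600 billion" base pairs
PUFFERFISH_BP = HUMAN_BP // 8   # "about one eighth the size of the human genome"


def fold_difference(larger_bp: int, smaller_bp: int) -> float:
    """Return how many times larger one genome is than another."""
    return larger_bp / smaller_bp


def partition_genome(total_bp: int, genes_pct: int, coding_pct: int) -> dict:
    """Split a genome into coding DNA, non-coding DNA inside genes,
    and DNA outside genes, using whole-number percentages of the total."""
    coding = total_bp * coding_pct // 100
    genic = total_bp * genes_pct // 100
    return {
        "coding": coding,
        "noncoding_in_genes": genic - coding,
        "outside_genes": total_bp - genic,
    }


if __name__ == "__main__":
    print(fold_difference(POLYCHAOS_BP, HUMAN_BP))
    print(partition_genome(PUFFERFISH_BP, genes_pct=30, coding_pct=10))
```

Under these assumptions, the two non-coding categories together account for 90% of the pufferfish genome, which matches the figure given in the text.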
 
''[[Utricularia gibba]]'', a [[bladderwort]] plant, has a very small [[nuclear genome]] (100.7 Mb) compared to most plants.<ref name = Ibarra-Laclette>{{ cite journal | vauthors = Ibarra-Laclette E, Lyons E, Hernández-Guzmán G, Pérez-Torres CA, Carretero-Paulet L, Chang TH, Lan T, Welch AJ, Juárez MJ, Simpson J, etal | date = 2013 | title = Architecture and evolution of a minute plant genome | journal = Nature | volume = 498 | issue = 7452 | pages = 94–98 | doi = 10.1038/nature12132| pmid = 23665961 | pmc = 4972453 | bibcode = 2013Natur.498...94I | s2cid = 18219754 }}</ref><ref name = Lan>{{ cite journal | vauthors = Lan T, Renner T, Ibarra-Laclette E, Farr KM, Chang TH, Cervantes-Pérez SA, Zheng C, Sankoff D, Tang H, Purbojati RW | date = 2017 | title = Long-read sequencing uncovers the adaptive topography of a carnivorous plant genome | journal = Proceedings of the National Academy of Sciences | volume = 114 | issue = 22 | pages = E4435–E4441 | doi = 10.1073/pnas.1702072114| pmid = 28507139 | pmc = 5465930 | bibcode = 2017PNAS..114E4435L | doi-access = free }}</ref> It likely evolved from an ancestral genome that was 1,500 Mb in size.<ref name = Lan/> The bladderwort genome has roughly the same number of genes as other plants, but the total amount of coding DNA comes to about 30% of the genome.<ref name = Ibarra-Laclette/><ref name="Lan"/>
 
The remainder of the genome (70% non-coding DNA) consists of [[Promoter (genetics)|promoters]] and regulatory sequences that are shorter than those in other plant species.<ref name = Ibarra-Laclette/> The genes contain introns but there are fewer of them and they are smaller than the introns in other plant genomes.<ref name = Ibarra-Laclette/> There are noncoding genes, including many copies of ribosomal RNA genes.<ref name = Lan/> The genome also contains telomere sequences and centromeres as expected.<ref name = Lan/> Much of the repetitive DNA seen in other eukaryotes has been deleted from the bladderwort genome since that lineage split from those of other plants. About 59% of the bladderwort genome consists of transposon-related sequences but since the genome is so much smaller than other genomes, this represents a considerable reduction in the amount of this DNA.<ref name = Lan/> The authors of the original 2013 article note that claims of additional functional elements in the non-coding DNA of animals do not seem to apply to plant genomes.<ref name = Ibarra-Laclette/>
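
The bladderwort percentages given above can be converted into approximate absolute amounts of DNA. This is a minimal sketch that takes the figures from the text at face value (100.7 Mb genome, 1,500 Mb ancestral genome, about 30% coding DNA, about 59% transposon-related sequences); the helper names are hypothetical and the results are only as precise as those rounded inputs.

```python
# Figures quoted in the text; all are approximate.
GENOME_MB = 100.7           # current nuclear genome size in megabases
ANCESTRAL_MB = 1500.0       # estimated ancestral genome size in megabases
CODING_FRACTION = 0.30      # "about 30% of the genome"
TRANSPOSON_FRACTION = 0.59  # "about 59%" transposon-related sequences


def fraction_to_mb(genome_mb: float, fraction: float) -> float:
    """Convert a fraction of the genome into megabases."""
    return genome_mb * fraction


def shrink_factor(ancestral_mb: float, current_mb: float) -> float:
    """Return how many times smaller the current genome is than the ancestral one."""
    return ancestral_mb / current_mb


if __name__ == "__main__":
    print(f"coding DNA: {fraction_to_mb(GENOME_MB, CODING_FRACTION):.1f} Mb")
    print(f"non-coding DNA: {fraction_to_mb(GENOME_MB, 1 - CODING_FRACTION):.1f} Mb")
    print(f"transposon-related: {fraction_to_mb(GENOME_MB, TRANSPOSON_FRACTION):.1f} Mb")
    print(f"shrinkage: {shrink_factor(ANCESTRAL_MB, GENOME_MB):.1f}-fold")
```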
 
According to a New York Times piece, during the evolution of this species, "... genetic junk that didn't serve a purpose was expunged, and the necessary stuff was kept."<ref>{{cite news | vauthors = Klein J | title = Genetic Tidying Up Made Humped Bladderworts Into Carnivorous Plants | url = https://www.nytimes.com/2017/05/19/science/humped-bladderwort-carnivorous-plant-genome.html | work = New York Times | date = 19 May 2017 | access-date = May 30, 2022}}</ref> According to Victor Albert of the University of Buffalo, the plant is able to expunge its so-called junk DNA and "have a perfectly good multicellular plant with lots of different cells, organs, tissue types and flowers, and you can do it without the junk. Junk is not needed."<ref>{{ cite press release | vauthors = Hsu C, Stolte D | date = May 13, 2013 | title = Carnivorous Plant Throws Out 'Junk' DNA | url = https://news.arizona.edu/story/carnivorous-plant-throws-out-junk-dna | location = Tucson, AZ, USA | publisher = University of Arizona | access-date = May 29, 2022}}</ref>
Line 22:
{{See also|Non-coding RNA}}
 
There are [[Gene|two types of genes]]: protein coding genes and [[Non-coding RNA|noncoding genes]].<ref>{{cite book | vauthors = Kampourakis K | date = 2017 | title = Making sense of genes | publisher = Cambridge University Press | place = Cambridge UK | isbn = 978-1-107-12813-2}}{{page needed|date=June 2022}}</ref> Noncoding genes are an important part of non-coding DNA and they include genes for [[transfer RNA]] and [[ribosomal RNA]]. These genes were discovered in the 1960s. [[Prokaryote|Prokaryotic]] genomes contain genes for a number of other noncoding RNAs but noncoding RNA genes are much more common in eukaryotes.
 
Typical classes of noncoding genes in eukaryotes include genes for [[small nuclear RNA]]s (snRNAs), [[small nucleolar RNA]]s (snoRNAs), [[microRNA]]s (miRNAs), [[Small interfering RNA|short interfering RNAs]] (siRNAs), [[Piwi-interacting RNA|PIWI-interacting RNAs]] (piRNAs), and [[Long non-coding RNA|long noncoding RNAs]] (lncRNAs). In addition, there are a number of unique RNA genes that produce [[Catalytic RNA|catalytic RNAs]].<ref>{{cite journal | vauthors=Cech TR, Steitz JA | title=The Noncoding RNA Revolution - Trashing Old Rules to Forge New Ones | journal=Cell|volume=157|pages=77–94|date=2014| issue=1 | doi=10.1016/j.cell.2014.03.008 | pmid=24679528 | s2cid=14852160 | doi-access=free }}</ref>
 
Noncoding genes account for only a few percent of prokaryotic genomes<ref>{{cite journal | vauthors = Rogozin IB, Makarova KS, Natale DA, Spiridonov AN, Tatusov RL, Wolf YI, Yin J, Koonin EV | display-authors = 6 | title = Congruent evolution of different classes of non-coding DNA in prokaryotic genomes | journal = Nucleic Acids Research | volume = 30 | issue = 19 | pages = 4264–4271 | date = October 2002 | pmid = 12364605 | pmc = 140549 | doi = 10.1093/nar/gkf549 }}</ref> but they can represent a vastly higher fraction in eukaryotic genomes.<ref>{{cite book |doi=10.1016/B978-0-12-800049-6.00171-2 |chapter=Adaptive Molecular Evolution: Detection Methods |title=Encyclopedia of Evolutionary Biology |year=2016 | vauthors = Bielawski JP, Jones C |pages=16–25 |isbn=978-0-12-800426-5 }}</ref> In humans, the noncoding genes take up at least 6% of the genome, largely because there are hundreds of copies of ribosomal RNA genes.{{citation needed|date=May 2022}} Protein-coding genes occupy about 38% of the genome, a fraction that is much higher than the coding region because genes contain large introns.{{citation needed|date=May 2022}}
Line 33:
{{Main|Promoter (genetics)}}
 
[[promoter (biology)|Promoter]]s are DNA segments near the 5' end of the gene where transcription begins. They are the sites where [[RNA polymerase]] binds to initiate RNA synthesis. Every gene has a noncoding promoter.
 
[[Cis-regulatory element|Regulatory elements]] are sites that control the [[Transcription (genetics)|transcription]] of a nearby gene. They are almost always sequences where [[transcription factor]]s bind to DNA and these transcription factors can either activate transcription (activators) or repress transcription (repressors). Regulatory elements were discovered in the 1960s and their general characteristics were worked out in the 1970s by studying specific transcription factors in bacteria and [[bacteriophage]].{{citation needed|date=June 2022}}
 
Promoters and regulatory sequences represent an abundant class of noncoding DNA, but they mostly consist of a collection of relatively short sequences, so they do not take up a very large fraction of the genome. The exact amount of regulatory DNA in mammalian genomes is unclear because it is difficult to distinguish between spurious transcription factor binding sites and those that are functional. The binding characteristics of typical [[DNA-binding protein]]s were characterized in the 1970s, and the biochemical properties of transcription factors predict that in cells with large genomes the majority of binding sites will be fortuitous and not biologically functional.{{citation needed|date=June 2022}}
Line 53:
{{Main|Untranslated region}}
 
The standard biochemistry and molecular biology textbooks describe non-coding [[Nucleotide|nucleotides]] in mRNA located between the 5' end of the gene and the translation initiation codon. These regions are called 5'-untranslated regions or 5'-UTRs. Similar regions called 3'-untranslated regions (3'-UTRs) are found at the end of the gene. The 5'-UTRs and 3'-UTRs are very short in bacteria but they can be several hundred nucleotides in length in eukaryotes. They contain short elements that control the initiation of translation (5'-UTRs) and transcription termination (3'-UTRs) as well as regulatory elements that may control mRNA stability, processing, and targeting to different regions of the cell.<ref>{{cite book | vauthors = Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson JD | date = 1994 | title = Molecular Biology of the Cell, 3rd edition | publisher = Garland Publishing Inc. | place = London, UK}}{{page needed|date=June 2022}}</ref><ref>{{ cite book | vauthors = Lewin B | date = 2004 | title = Genes VIII | publisher = Pearson/Prentice Hall | place = Upper Saddle River, NJ, USA}}{{page needed|date=June 2022}}</ref><ref>{{ cite book | vauthors = Moran L, Horton HR, Scrimgeour KG, Perry MD | date = 2012 | title = Principles of Biochemistry Fifth Edition | publisher = Pearson | place = Upper Saddle River, NJ, USA}}{{page needed|date=June 2022}}</ref>
 
===Origins of replication===
Line 71:
{{Main|Telomere}}
 
Telomeres are regions of repetitive DNA at the end of a [[chromosome]], which provide protection from chromosomal deterioration during [[DNA replication]]. Recent studies have shown that telomeres function to aid in their own stability. [[Telomeric Repeat-Containing RNA (TERRA)|Telomeric repeat-containing RNAs (TERRA)]] are transcripts derived from telomeres. TERRA has been shown to maintain telomerase activity and lengthen the ends of chromosomes.<ref>{{cite journal | vauthors = Cusanelli E, Chartrand P | title = Telomeric noncoding RNA: telomeric repeat-containing RNA in telomere biology | journal = Wiley Interdisciplinary Reviews. RNA | volume = 5 | issue = 3 | pages = 407–419 | date = May 2014 | pmid = 24523222 | doi = 10.1002/wrna.1220 | s2cid = 36918311 }}</ref>
 
===Scaffold attachment regions===