{{Short description|DNA that does not code for proteins}}
'''Non-coding DNA''' ('''ncDNA''') sequences are components of an organism's [[DNA]] that do not [[genetic code|encode]] [[protein]] sequences. Some non-coding DNA is [[Transcription (genetics)|transcribed]] into functional [[non-coding RNA]] molecules (e.g. [[transfer RNA]], [[microRNA]], [[Piwi-interacting RNA|piRNA]], [[ribosomal RNA]], and [[RNA interference|regulatory RNAs]]). Other functional regions of the non-coding DNA fraction include [[regulatory sequence]]s that control [[gene expression]]; [[scaffold attachment region]]s; [[origin of replication|origins of DNA replication]]; [[centromere]]s; and [[telomere]]s. Some non-coding regions appear to be mostly nonfunctional, such as [[introns]], [[pseudogenes]], [[intergenic DNA]], and fragments of [[transposons]] and [[viruses]]. Regions that are completely nonfunctional are called [[junk DNA]].
== Fraction of non-coding genomic DNA ==
In [[bacteria]], the [[Coding region|coding regions]] typically take up 88% of the genome.<ref name=":0" /> The remaining 12% does not encode proteins, but much of it still has a biological function, either as [[Gene|genes]] whose RNA transcript is functional (non-coding genes) or as regulatory sequences. Almost all of the bacterial genome therefore has a function.<ref name=":0">{{cite journal | vauthors = Kirchberger PC, Schmidt ML, Ochman H | date = 2020 | title = The ingenuity of bacterial genomes | journal = Annual Review of Microbiology | volume = 74 | pages = 815–834 | doi = 10.1146/annurev-micro-020518-115822| pmid = 32692614 | s2cid = 220699395 }}</ref> The amount of coding DNA in [[Eukaryote|
[[Genome size]] in eukaryotes can vary over a wide range, even between closely related species. This puzzling observation was originally known as the [[C-value
This led to the observation that the number of genes does not seem to correlate with perceived notions of complexity, because the number of genes appears to be relatively constant, an issue termed the [[G-value paradox]].<ref>{{ cite journal | vauthors = Hahn MW, Wray GA | date = 2002 | title = The g-value paradox | journal = Evolution and Development | volume = 4 | issue = 2 | pages = 73–75 | doi = 10.1046/j.1525-142X.2002.01069.x| pmid = 12004964 | s2cid = 2810069 }}</ref> For example, the genome of the unicellular ''[[Polychaos dubium]]'' (formerly known as ''Amoeba dubia'') has been reported to contain more than 200 times the amount of DNA in humans (i.e. more than 600 billion [[genome size|base pairs]] versus slightly more than 3 billion in humans).<ref name=Gregory>{{cite journal | vauthors = Gregory TR, Hebert PD | title = The modulation of DNA content: proximate causes and ultimate consequences | journal = Genome Research | volume = 9 | issue = 4 | pages = 317–324 | date = April 1999 | pmid = 10207154 | doi = 10.1101/gr.9.4.317 | s2cid = 16791399 | doi-access = free }}</ref> The [[pufferfish]] ''[[Takifugu]] rubripes'' genome is only about one-eighth the size of the human genome, yet seems to have a comparable number of genes. Genes take up about 30% of the pufferfish genome and coding DNA accounts for about 10%, so about 90% of the genome is non-coding.
The reduced size of the pufferfish genome is due to a reduction in the length of introns and less repetitive DNA.<ref>{{ cite journal | vauthors = Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A | date = 2002 | title = Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes | journal = Science | volume = 297 | issue = 5585 | pages = 1301–1310 | doi = 10.1126/science.1072104| pmid = 12142439 | bibcode = 2002Sci...297.1301A | s2cid = 10310355 }}</ref><ref name="Ohno">{{cite journal | vauthors = Ohno S | title = So much "junk" DNA in our genome | journal = Brookhaven Symposia in Biology | volume = 23 | pages = 366–370 | date = 1972 | pmid = 5065367 | oclc = 101819442 }}</ref>
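The genome-size comparisons above can be checked with simple arithmetic. The following Python sketch uses only the rounded figures quoted in the text; the human genome size of 3 billion base pairs is a rounded assumption, the variable names are illustrative, and treating all coding DNA as lying inside genes is a simplification.

```python
# Rounded figures from the text; every value here is approximate.
HUMAN_GENOME_BP = 3_000_000_000  # assumption: "a bit more than 3 billion", rounded down

# Polychaos dubium: reported at more than 200 times the human DNA content.
amoeba_bp = 200 * HUMAN_GENOME_BP

# Takifugu rubripes: about one-eighth the size of the human genome.
pufferfish_bp = HUMAN_GENOME_BP / 8

# Fractions quoted for the pufferfish genome.
gene_fraction = 0.30    # genes as a whole
coding_fraction = 0.10  # protein-coding sequence

pufferfish_coding_bp = pufferfish_bp * coding_fraction
noncoding_fraction = 1 - coding_fraction
# Simplification: assumes all coding DNA lies within the 30% occupied by genes.
noncoding_within_genes = round(gene_fraction - coding_fraction, 2)

print(f"Amoeba genome (lower bound): {amoeba_bp:,} bp")
print(f"Pufferfish genome: about {pufferfish_bp:,.0f} bp")
print(f"Pufferfish coding DNA: about {pufferfish_coding_bp:,.0f} bp")
print(f"Pufferfish non-coding fraction: {noncoding_fraction:.0%}")
print(f"Non-coding DNA inside pufferfish genes: {noncoding_within_genes:.0%}")
```

Under these assumptions, roughly two thirds of the DNA inside pufferfish genes is itself non-coding, even though the genome is compact.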
==Types of non-coding DNA sequences==
===Noncoding genes===
[[Cis-regulatory element|Regulatory elements]] are sites that control the [[Transcription (genetics)|transcription]] of a nearby gene. They are almost always sequences where [[transcription factor]]s bind to DNA and these transcription factors can either activate transcription (activators) or repress transcription (repressors). Regulatory elements were discovered in the 1960s and their general characteristics were worked out in the 1970s by studying specific transcription factors in bacteria and [[bacteriophage]].{{citation needed|date=June 2022}}
Promoters and regulatory sequences represent an abundant class of noncoding DNA but they mostly consist of a collection of relatively short sequences so they
Many regulatory sequences occur near promoters, usually upstream of the transcription start site of the gene. Some occur within a gene and a few are located downstream of the transcription termination site. In eukaryotes, there are some regulatory sequences that are located at a considerable distance from the promoter region. These distant regulatory sequences are often called [[Enhancer (genetics)|enhancers]] but there is no rigorous definition of enhancer that distinguishes it from other transcription factor binding sites.<ref>{{cite journal | vauthors = Compe E, Egly JM | title = The Long Road to Understanding RNAPII Transcription Initiation and Related Syndromes | journal = Annual Review of Biochemistry | volume = 90 | pages = 193–219 | date = 2021 | doi = 10.1146/annurev-biochem-090220-112253| pmid = 34153211 | s2cid = 235595550 }}</ref><ref>{{cite journal | vauthors = Visel A, Rubin EM, Pennacchio LA | title = Genomic views of distant-acting enhancers | journal = Nature | volume = 461 | issue = 7261 | pages = 199–205 | date = September 2009 | pmid = 19741700 | pmc = 2923221 | doi = 10.1038/nature08451 | author-link3 = Len A. Pennacchio | bibcode = 2009Natur.461..199V }}</ref>
Introns are the parts of a gene that are transcribed into the [[precursor RNA]] sequence, but ultimately removed by [[RNA splicing]] during the processing to mature RNA. Introns are found in both types of genes: protein-coding genes and noncoding genes. They are present in prokaryotes but they are much more common in eukaryotic genomes.{{citation needed|date=June 2022}}
Group I and group II introns take up only a small percentage of the genome when they are present. Spliceosomal introns (see Figure) are only found in eukaryotes and they can represent a substantial proportion of the genome. In humans, for example, introns in protein-coding genes cover 37% of the genome. Combining that with about 1% coding sequences means that protein-coding genes occupy about 38% of the human genome. The calculations for noncoding genes are more complicated because there
===Untranslated regions===
DNA synthesis begins at specific sites called [[Origin of replication|origins of replication]]. These are regions of the genome where the DNA replication machinery is assembled and the DNA is unwound to begin DNA synthesis. In most cases, replication proceeds in both directions from the replication origin.
The main features of replication origins are sequences where specific initiation proteins are bound. A typical replication origin covers about 100-200 base pairs of DNA. Prokaryotes have one origin of replication per chromosome or plasmid but there are usually multiple origins in eukaryotic chromosomes. The human genome contains about 100,000 origins of replication representing about 0.3% of the genome.<ref>{{cite journal |vauthors=Leonard AC, Méchali M |title=DNA replication origins |journal=Cold Spring Harbor Perspectives in Biology |volume=5 |pages=a010116 |date=2013 |issue=10 |doi=10.1101/cshperspect.a010116|pmid=23838439 |pmc=3783049 }}</ref><ref>{{cite journal |vauthors=Urban JM, Foulk MS, Casella C, Gerbi SA |date=2015 |title=The hunt for origins of DNA replication in multicellular eukaryotes |journal=F1000Prime Reports |volume=7 |page=30 |doi=10.12703/P7-30|pmid=25926981 |pmc=4371235 |doi-access=free }}</ref><ref>{{cite journal |vauthors=Prioleau M, MacAlpine DM |date=2016 |title=DNA replication origins—where do we begin? |journal=Genes & Development |volume=30 |issue=15 |pages=1683–1697 |doi=10.1101/gad.285114.116|pmid=27542827 |pmc=5002974 }}</ref>
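The 0.3% figure can be related to the other numbers in this paragraph. The sketch below is a rough consistency check, not a calculation taken from the cited sources; the function name is hypothetical and the human genome size of 3 billion base pairs is a rounded assumption.

```python
def origin_footprint_fraction(n_origins, origin_length_bp, genome_size_bp):
    """Return the fraction of a genome covered by replication origins."""
    return n_origins * origin_length_bp / genome_size_bp


HUMAN_GENOME_BP = 3_000_000_000  # assumption: rounded human genome size
N_ORIGINS = 100_000              # approximate number of human origins (from the text)

# A typical origin is said to cover about 100 to 200 base pairs.
low = origin_footprint_fraction(N_ORIGINS, 100, HUMAN_GENOME_BP)
high = origin_footprint_fraction(N_ORIGINS, 200, HUMAN_GENOME_BP)

print(f"Origins cover roughly {low:.2%} to {high:.2%} of the genome")
```

With these rounded inputs, the lower end of the length range gives about 0.3% of the genome, in line with the figure quoted above, and the upper end gives about twice that.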
===Centromeres===
[[File:Human karyotype with bands and sub-bands.png|thumb|Schematic [[karyotype|karyogram]] of a human, showing an overview of the [[human genome]] with [[G banding]]. Non-coding DNA is present at the centromeres (shown as the narrow segment of each chromosome) and also occurs to a greater extent in the darker ([[GC-content|GC-poor]]) regions.<ref name=Romiguier2017>{{cite journal | vauthors = Romiguier J, Roux C | title = Analytical Biases Associated with GC-Content in Molecular Evolution | journal = Frontiers in Genetics | volume = 8 | pages = 16 | year = 2017 | pmid = 28261263 | pmc = 5309256 | doi = 10.3389/fgene.2017.00016 | doi-access = free }}</ref>]]
Centromeres are the sites where spindle fibers attach to newly replicated chromosomes in order to segregate them into daughter cells when the cell divides. Each eukaryotic chromosome has a single functional centromere that
===Telomeres===
{{Main|Scaffold/matrix attachment region}}
Both prokaryotic and eukaryotic genomes are organized into large loops of protein-bound DNA. In eukaryotes, the bases of the loops are called [[Scaffold/matrix attachment region|scaffold attachment regions]] (SARs) and they consist of stretches of DNA that bind an RNA/protein complex to stabilize the loop. There are about 100,000 loops in the human genome and each
===Pseudogenes===
{{Main|Pseudogene}}
Pseudogenes are mostly former genes that have become non-functional due to mutation, but the term also refers to inactive DNA sequences that are derived from RNAs produced by functional genes ([[Pseudogene|processed pseudogenes]]). Pseudogenes are only a small fraction of noncoding DNA in prokaryotic genomes because they are eliminated by negative selection. In some eukaryotes, however, pseudogenes can accumulate because selection
The human genome contains about 15,000 pseudogenes derived from protein-coding genes and an unknown number derived from noncoding genes.<ref>{{ cite web | url = https://useast.ensembl.org/Homo_sapiens/Info/Annotation | title = Ensembl Human reference genome GRCh38.p13}}</ref> They may cover a substantial fraction of the genome (~5%) since many of them contain former intron sequences.
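A back-of-the-envelope calculation illustrates why intron-derived sequence matters for this estimate. The sketch below is only indicative: it assumes a rounded human genome size of 3 billion base pairs and attributes the whole ~5% to the roughly 15,000 pseudogenes derived from protein-coding genes, which the text does not state explicitly.

```python
HUMAN_GENOME_BP = 3_000_000_000    # assumption: rounded human genome size
PSEUDOGENE_COUNT = 15_000          # pseudogenes derived from protein-coding genes (from the text)
PSEUDOGENE_GENOME_FRACTION = 0.05  # "~5%" of the genome (from the text)

# Total sequence attributed to pseudogenes under these assumptions.
total_pseudogene_bp = HUMAN_GENOME_BP * PSEUDOGENE_GENOME_FRACTION

# Implied average footprint of a single pseudogene.
mean_pseudogene_bp = total_pseudogene_bp / PSEUDOGENE_COUNT

print(f"Implied mean pseudogene length: about {mean_pseudogene_bp:,.0f} bp")
```

An average footprint of this size, on the order of ten kilobases, would be difficult to reach from former coding sequence alone, which is consistent with the remark that many pseudogenes retain former intron sequences.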
[[File:Bacterial mobile elements.svg|thumb|upright=1.35|[[Mobile genetic elements]] in the cell (left) and how they can be acquired (right)]]
[[Transposon]]s and [[retrotransposon]]s are [[mobile genetic elements]]. Retrotransposon [[Repeated sequence (DNA)|repeated sequences]], which include [[Retrotransposon#LINEs|long interspersed nuclear elements]] (LINEs) and [[Retrotransposon#SINEs|short interspersed nuclear elements]] (SINEs), account for a large proportion of the genomic sequences in many species. [[Alu sequence]]s, classified as a short interspersed nuclear element, are the most abundant mobile elements in the human genome. Some examples have been found of SINEs exerting transcriptional control of some protein-encoding genes.<ref>{{cite journal |vauthors=Ponicsan SL, Kugel JF, Goodrich JA |title=Genomic gems: SINE RNAs regulate mRNA production |journal=Current Opinion in Genetics & Development |volume=20 |issue=2 |pages=149–155 |date=April 2010 |pmid=20176473 |pmc=2859989 |doi=10.1016/j.gde.2010.01.004}}</ref><ref>{{cite journal |vauthors=Häsler J, Samuelsson T, Strub K |title=Useful 'junk': Alu RNAs in the human transcriptome |journal=Cellular and Molecular Life Sciences |volume=64 |issue=14 |pages=1793–1800 |date=July 2007 |pmid=17514354 |s2cid=5938630 |doi=10.1007/s00018-007-7084-0 |type=Submitted manuscript |url=https://archive-ouverte.unige.ch/unige:17489|pmc=11136058 }}</ref><ref>{{cite journal |vauthors=Walters RD, Kugel JF, Goodrich JA |title=InvAluable junk: the cellular impact and function of Alu and B2 RNAs |journal=IUBMB Life |volume=61 |issue=8 |pages=831–837 |date=August 2009 |pmid=19621349 |pmc=4049031 |doi=10.1002/iub.227}}</ref>
[[Endogenous retrovirus]] sequences are the product of [[reverse transcription]] of [[retrovirus]] genomes into the genomes of [[germ cell]]s. Mutation within these retro-transcribed sequences can inactivate the viral genome.<ref>{{cite journal | vauthors = Nelson PN, Hooley P, Roden D, Davari Ejtehadi H, Rylance P, Warren P, Martin J, Murray PG | display-authors = 6 | title = Human endogenous retroviruses: transposable elements with potential? | journal = Clinical and Experimental Immunology | volume = 138 | issue = 1 | pages = 1–9 | date = October 2004 | pmid = 15373898 | pmc = 1809191 | doi = 10.1111/j.1365-2249.2004.02592.x }}</ref>
Over 8% of the human genome consists of (mostly decayed) endogenous retrovirus sequences. These form part of the more than 42% of the genome that is recognizably derived from retrotransposons, while another 3% can be identified as the remains of [[Transposon#DNA transposons|DNA transposon]]s. Much of the remaining half of the genome, which currently has no explained origin, is thought to derive from transposable elements that were active so long ago (more than 200 million years) that random mutations have rendered them unrecognizable.<ref name=humangenome>{{cite journal | vauthors = Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, 
Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J | display-authors = 6 | title = Initial sequencing and analysis of the human genome | journal = Nature | volume = 409 | issue = 6822 | pages = 860–921 | date = February 2001 | pmid = 11237011 | doi = 10.1038/35057062 | doi-access = free | bibcode = 2001Natur.409..860L | hdl = 2027.42/62798 | hdl-access = free }}</ref> Genome size variation in at least two kinds of plants is mostly the result of retrotransposon sequences.<ref>{{cite journal | vauthors = Piegu B, Guyot R, Picault N, Roulin A, Sanyal A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA, Panaud O | display-authors = 6 | title = Doubling genome size 
without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice | journal = Genome Research | volume = 16 | issue = 10 | pages = 1262–1269 | date = October 2006 | pmid = 16963705 | pmc = 1581435 | doi = 10.1101/gr.5290206 }}</ref><ref>{{cite journal | vauthors = Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF | title = Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium | journal = Genome Research | volume = 16 | issue = 10 | pages = 1252–1261 | date = October 2006 | pmid = 16954538 | pmc = 1581434 | doi = 10.1101/gr.5282906 }}</ref>
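The percentages quoted for the human genome in this section can be combined as follows. This Python sketch treats the "over" figures as point values, so the results are approximate lower bounds; the variable names are illustrative.

```python
# Percentages of the human genome quoted in the text, treated as point values.
retrotransposon_pct = 42  # recognizably derived from retrotransposons
erv_pct = 8               # endogenous retroviruses, a subset of the 42%
dna_transposon_pct = 3    # remains of DNA transposons

# Everything recognizably derived from transposable elements.
recognizable_te_pct = retrotransposon_pct + dna_transposon_pct

# Retrotransposon-derived sequence other than endogenous retroviruses.
other_retrotransposon_pct = retrotransposon_pct - erv_pct

# Sequence with no recognizable transposable-element origin.
remaining_pct = 100 - recognizable_te_pct

print(f"Recognizable transposable elements: about {recognizable_te_pct}%")
print(f"Retrotransposons other than ERVs:   about {other_retrotransposon_pct}%")
print(f"Remaining genome:                   about {remaining_pct}%")
```

The remainder of a little over half the genome corresponds to the "remaining half" described above, although that remainder also includes coding sequence and other functional elements.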
===Highly repetitive DNA===
===Junk DNA===
{{Main|Junk DNA}}
Junk DNA is DNA that has no biologically relevant function, such as pseudogenes and fragments of once-active transposons. Bacterial and viral genomes have very little junk DNA,<ref>{{cite journal | vauthors = Gil R, Latorre A | date = 2012 | title = Factors behind junk DNA in bacteria | journal = Genes | volume = 3 | issue = 4 | pages = 634–650 | doi = 10.3390/genes3040634 | pmid = 24705080 | pmc = 3899985 | doi-access = free }}</ref><ref>{{Cite journal |last1=Brandes |first1=Nadav |last2=Linial |first2=Michal |date=2016 |title=Gene overlapping and size constraints in the viral world |journal=Biology Direct |language=en |volume=11 |issue=1 |pages=26 |doi=10.1186/s13062-016-0128-3 |pmid=27209091 |pmc=4875738 |issn=1745-6150 |doi-access=free }}</ref> but some eukaryotic genomes may have a substantial amount of junk DNA.<ref name="PalazzoGregory2014">{{cite journal | vauthors = Palazzo AF, Gregory TR | title = The case for junk DNA | journal = PLOS Genetics | volume = 10 | issue = 5 | pages = e1004351 | date = May 2014 | pmid = 24809441 | pmc = 4014423 | doi = 10.1371/journal.pgen.1004351 | doi-access = free }}</ref> The exact amount of nonfunctional DNA in humans and other species with large genomes has not been determined, and there is considerable controversy in the scientific literature.<ref>{{cite journal | last = Morange | first = Michel | date = 2014 | title = Genome as a Multipurpose Structure Built by Evolution | journal = Perspectives in Biology and Medicine | volume = 57 | issue = 1 | pages = 162–171 | doi = 10.1353/pbm.2014.0008 | pmid = 25345709 | s2cid = 27613442 | url = https://hal.archives-ouvertes.fr/hal-01480552/file/ARTICLE%20ENCODE%20MM%2070114%20corrige%C2%A6%C3%BC.pdf }}</ref><ref>{{cite journal | vauthors = Haerty W, Ponting CP | title = No Gene in the Genome Makes Sense Except in the Light of Evolution | year = 2014 | journal = Annual Review of Genomics and Human Genetics | volume = 15 | pages = 71–92 | doi = 10.1146/annurev-genom-090413-025621| pmid = 24773316 | doi-access = free }}</ref>
The nonfunctional DNA in bacterial genomes is mostly located in the intergenic fraction of non-coding DNA but in eukaryotic genomes it may also be found within [[introns]].
==Genome-wide association studies (GWAS) and non-coding DNA==
[[Genome-wide association studies]] (GWAS) identify linkages between alleles and observable traits such as phenotypes and diseases. Most of the associations are between [[single-nucleotide polymorphisms]] (SNPs) and the trait being examined and most of these SNPs are located in non-functional DNA. The association establishes a linkage that helps map the DNA region responsible for the trait but it
SNPs that are tightly linked to traits are the ones most likely to identify a causal mutation. (The association is referred to as tight [[linkage disequilibrium]].) About 12% of these polymorphisms are found in coding regions; about 40% are located in introns; and most of the rest are found in intergenic regions, including regulatory sequences.<ref name=Manolio/>
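Linkage disequilibrium between two biallelic sites is commonly summarized by the coefficient ''D'' and the squared correlation ''r''². The following Python sketch computes both from haplotype and allele frequencies using the standard definitions; the function name and the example frequencies are hypothetical and are not taken from any study cited here.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D measures how far the haplotype frequency departs from independence.
    d = p_ab - p_a * p_b
    # r^2 rescales D^2 by the allele-frequency variances, giving a value in [0, 1].
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical marker SNP that almost always travels with a causal allele.
d_tight, r2_tight = linkage_disequilibrium(p_ab=0.28, p_a=0.30, p_b=0.30)

# Hypothetical pair of sites that are inherited independently.
d_none, r2_none = linkage_disequilibrium(p_ab=0.09, p_a=0.30, p_b=0.30)

print(f"tight linkage: D = {d_tight:.3f}, r^2 = {r2_tight:.3f}")
print(f"no linkage:    D = {d_none:.3f}, r^2 = {r2_none:.3f}")
```

An ''r''² close to 1 means the genotyped SNP is a good proxy for a nearby variant, which is why a tightly linked SNP can point to a causal mutation without being that mutation itself.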
== See also ==
*[[Non-coding RNA]]
== References ==
* {{cite book | vauthors = Bennett MD, Leitch IJ | year = 2005 | chapter = Genome size evolution in plants |chapter-url=https://books.google.com/books?id=8HtPZP9VSiMC&pg=PA89 | title = The Evolution of the Genome | veditors = Gregory TR | publisher = Elsevier | location = San Diego | pages = 89–162 |isbn=978-0-08-047052-8}}
* {{cite book |doi=10.1016/B978-012301463-4/50003-6 |chapter=Genome Size Evolution in Animals |title=The Evolution of the Genome |year=2005 | vauthors = Gregory TR |pages=3–87 |isbn=978-0-12-301463-4 }}
* {{cite journal | vauthors = Shabalina SA, Spiridonov NA | title = The mammalian transcriptome and the function of non-coding DNA sequences | journal = Genome Biology | volume = 5 | issue = 4 | pages = 105 | year = 2004 | pmid = 15059247 | pmc = 395773 | doi = 10.1186/gb-2004-5-4-105 | doi-access = free }}
* {{cite journal | vauthors = Castillo-Davis CI | title = The evolution of noncoding DNA: how much junk, how much func? | journal = Trends in Genetics | volume = 21 | issue = 10 | pages = 533–536 | date = October 2005 | pmid = 16098630 | doi = 10.1016/j.tig.2005.08.001 }}
{{Refend}}