Coding region: Difference between revisions

Revision as of 19:37, 26 January 2020 by Rjwilmsi (→Regulation: Added 1 doi to a journal cite), compared with the latest revision as of 23:01, 22 July 2025 by Citation bot (Added article-number. Removed parameters. Some additions/deletions were parameter name changes.)
(46 intermediate revisions by 26 users not shown)
{{short description\|Portion of gene's sequence which codes for protein}}

The '''coding region''' of a [[gene]], also known as the '''coding DNA sequence''' ('''CDS'''), is the portion of a gene's [[DNA]] or [[RNA]] that codes for a [[protein]].<ref name=":12">{{cite web\|url=http://genome.wellcome.ac.uk/doc_WTD020755.html\|title=Gene Structure\|last=Twyman\|first=Richard\|date=1 August 2003\|publisher=The Wellcome Trust\|url-status=dead\|archive-url=https://web.archive.org/web/20070328214808/http://genome.wellcome.ac.uk/doc_WTD020755.html\|archive-date=28 March 2007\|access-date=6 April 2003}}</ref> Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions across different species and time periods can provide important information about gene organization and the evolution of [[prokaryote]]s and [[eukaryote]]s.<ref>{{Cite journal \| vauthors = Höglund M, Säll T, Röhme D \|date=February 1990\|title=On the origin of coding sequences from random open reading frames\|journal=Journal of Molecular Evolution\|volume=30\|issue=2\|pages=104–108\|doi=10.1007/bf02099936\|issn=0022-2844\|bibcode=1990JMolE..30..104H\|s2cid=5978109}}</ref> This can further assist in mapping the [[Human Genome Project\|human genome]] and developing gene therapy.<ref>{{cite journal \| vauthors = Sakharkar MK, Chow VT, Kangueane P \| title = Distributions of exons and introns in the human genome \| journal = In Silico Biology \| volume = 4 \| issue = 4 \| pages = 387–93 \| date = 2004 \| doi = 10.3233/ISB-00142 \| pmid = 15217358 }}</ref>

== Definition ==

Although this term is sometimes used interchangeably with [[exon]], the two are not the same: an [[exon]] can be composed of the coding region as well as the 3' and 5' [[untranslated region]]s of the RNA, and so an exon would be partially
made up of coding region. The 3' and 5' [[untranslated region]]s of the RNA, which do not code for protein, are termed [[Non-coding region\|non-coding]] regions and are not discussed on this page.<ref>{{Cite book\|last=Parnell\|first=Laurence D.\|chapter=Advances in Technologies and Study Design\|date=2012-01-01\|chapter-url=http://www.sciencedirect.com/science/article/pii/B9780123983978000022\|journal=Progress in Molecular Biology and Translational Science\|volume=108\|pages=17–50\|editor-last=Bouchard\|editor-first=C.\|publisher=Academic Press\|access-date=2019-11-07\|editor2-last=Ordovas\|editor2-first=J. M.\|doi=10.1016/B978-0-12-398397-8.00002-2\|pmid=22656372\|title=Recent Advances in Nutrigenetics and Nutrigenomics\|isbn=9780123983978}}</ref> Coding regions are also often confused with [[exome]]s, but the two terms are distinct. While the [[exome]] refers to all exons within a genome, the coding region refers to sections of the DNA (or [[primary transcript]]) or a singular section of processed mRNA which specifically codes for a certain kind of protein.

== History ==

In 1978, [[Walter Gilbert]] published "Why Genes in Pieces", which first began to explore the idea that the gene is a mosaic: each full [[nucleic acid]] strand is not coded continuously but is interrupted by "silent" non-coding regions. This was the first indication that there needed to be a distinction between the parts of the genome that code for protein, now called coding regions, and those that do not.<ref>{{cite journal \| vauthors = Gilbert W \| title = Why genes in pieces?
\| journal = Nature \| volume = 271 \| issue = 5645 \| pages = 501 \| date = February 1978 \| pmid = 622185 \| doi = 10.1038/271501a0 \| bibcode = 1978Natur.271..501G \| s2cid = 4216649 \| doi-access = free }}</ref>

== Composition ==

[[File:Transitions-transversions.png\|thumb\|286x286px\|'''Point mutation types:''' transitions (blue) are elevated compared to transversions (red) in GC-rich coding regions.]]

The evidence suggests that there is a general interdependence between base composition patterns and coding region availability.<ref>{{cite journal \| vauthors = Lercher MJ, Urrutia AO, Pavlícek A, Hurst LD \| title = A unification of mosaic structures in the human genome \| journal = Human Molecular Genetics \| volume = 12 \| issue = 19 \| pages = 2411–5 \| date = October 2003 \| pmid = 12915446 \| doi = 10.1093/hmg/ddg251 \| doi-access = free }}</ref> The coding region is thought to contain a higher [[GC-content]] than non-coding regions. Further research found that the longer the coding strand, the higher the GC-content.
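The GC-content comparison above can be made concrete with a minimal sketch. The function name `gc_content` and both example sequences are hypothetical illustrations, not data from the cited studies.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical example sequences, invented for illustration only.
coding_like = "ATGGCCGCGGGCTGA"
noncoding_like = "ATTATATTAAATATA"

if __name__ == "__main__":
    print(f"coding-like:     {gc_content(coding_like):.2f}")
    print(f"non-coding-like: {gc_content(noncoding_like):.2f}")
    # The three stop codons are GC-poor: each contains at most one G or C.
    for stop in ("TAG", "TAA", "TGA"):
        print(stop, f"{gc_content(stop):.2f}")
```

In practice GC-content is usually computed over many genes or sliding windows rather than single short strings; the sketch only shows the per-sequence arithmetic.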
Short coding strands are comparatively still GC-poor, similar to the low GC-content of the translational [[stop codon]]s TAG, TAA, and TGA.<ref>{{cite journal \| vauthors = Oliver JL, Marín A \| title = A relationship between GC content and coding-sequence length \| journal = Journal of Molecular Evolution \| volume = 43 \| issue = 3 \| pages = 216–23 \| date = September 1996 \| pmid = 8703087 \| doi = 10.1007/pl00006080 \| bibcode = 1996JMolE..43..216O }}</ref>

GC-rich areas are also where the ratio of [[point mutation]] types is altered slightly: there are more [[Transition (genetics)\|transitions]], which are changes from purine to purine or pyrimidine to pyrimidine, compared to [[transversion]]s, which are changes from purine to pyrimidine or pyrimidine to purine. Transitions are less likely to change the encoded amino acid and so tend to remain [[silent mutation]]s (especially if they occur in the third [[nucleotide]] of a codon), which is usually beneficial to the organism during translation and protein formation.<ref>{{Cite web\|url=http://rosalind.info/glossary/gene-coding-region/\|title=ROSALIND {{!}} Glossary {{!}} Gene coding region\|website=rosalind.info\|access-date=2019-10-31}}</ref> This indicates that essential coding regions (gene-rich) are higher in GC-content and more stable and resistant to [[mutation]] compared to accessory and non-essential regions (gene-poor).<ref>{{cite journal \| vauthors = Vinogradov AE \| title = DNA helix: the importance of being GC-rich \| journal = Nucleic Acids Research \| volume = 31 \| issue = 7 \| pages = 1838–44 \| date = April 2003 \| pmid = 12654999 \| pmc = 152811 \| doi = 10.1093/nar/gkg296 }}</ref> However, it is still unclear whether this came about through neutral and random mutation or through a pattern of [[Natural selection\|selection]].<ref>{{cite journal \| vauthors = Bohlin J, Eldholm V, Pettersson JH, Brynildsrud O, Snipen L \| title = The nucleotide composition of microbial genomes
indicates differential patterns of selection on core and accessory genomes \| journal = BMC Genomics \| volume = 18 \| issue = 1 \| article-number = 151 \| date = February 2017 \| pmid = 28187704 \| pmc = 5303225 \| doi = 10.1186/s12864-017-3543-7 \| doi-access = free }}</ref> There is also debate on whether the methods used to ascertain the relationship between GC-content and coding region, such as gene windows, are accurate and unbiased.<ref>{{cite journal \| vauthors = Sémon M, Mouchiroud D, Duret L \| title = Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance \| journal = Human Molecular Genetics \| volume = 14 \| issue = 3 \| pages = 421–7 \| date = February 2005 \| pmid = 15590696 \| doi = 10.1093/hmg/ddi038 \| doi-access = free }}</ref>

== Structure and function ==

[[File:Coding Region in DNA.png\|thumb\|398x398px\|'''Transcription''': RNA Polymerase (RNAP) uses a template DNA strand and begins coding at the promoter sequence (green) and ends at the terminator sequence (red) in order to encompass the entire coding region into the pre-mRNA (teal). The pre-mRNA is polymerised 5' to 3' and the template DNA read 3' to 5']]

[[File:Transcription label en.jpg\|thumb\|An electron-micrograph of DNA strands decorated by hundreds of RNAP molecules too small to be resolved. Each RNAP is transcribing an RNA strand, which can be seen branching off from the DNA. "Begin" indicates the 3' end of the DNA, where RNAP initiates transcription; "End" indicates the 5' end, where the longer RNA molecules are completely transcribed.]]

In [[DNA]], the coding region is flanked by the [[Promoter (genetics)\|promoter sequence]] on the 5' end of the [[template strand]] and the termination sequence on the 3' end.
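The strand directions given in the figure captions (template read 3' to 5', RNA built 5' to 3') can be illustrated with a minimal sketch. The function name `transcribe` and the nine-base template are hypothetical; promoter recognition, termination and RNA processing are deliberately left out.

```python
# Base-pairing rules for building RNA from a DNA template:
# adenine in the template pairs with uracil in the RNA (uracil replaces thymine).
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') for a template strand written 3'->5'.

    Because the template is supplied in the direction it is read, the RNA is
    simply the base-by-base complement, with no reversal needed.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5.upper())


if __name__ == "__main__":
    # Hypothetical template, written 3'->5'.
    template = "TACGGCATT"
    print(transcribe(template))  # AUGCCGUAA
```

If the template were supplied 5' to 3' instead, the result would additionally have to be reversed; fixing the input orientation in the function name keeps that assumption visible.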
During [[Transcription (biology)\|transcription]], the [[RNA Polymerase\|RNA Polymerase (RNAP)]] binds to the promoter sequence and moves along the template strand to the coding region. RNAP then adds RNA [[nucleotide]]s complementary to the coding region in order to form the [[mRNA]], substituting [[uracil]] in place of [[thymine]].<ref name=":2">Overview of transcription. (n.d.). Retrieved from https://www.khanacademy.org/science/biology/gene-expression-central-dogma/transcription-of-dna-into-rna/a/overview-of-transcription .</ref> This continues until the RNAP reaches the termination sequence.<ref name=":2" />

After transcription and maturation, the [[mature mRNA]] formed encompasses multiple parts important for its eventual translation into [[protein]]. The coding region in an mRNA is flanked by the [[Five prime untranslated region\|5' untranslated region]] (5'-UTR) and [[Three prime untranslated region\|3' untranslated region]] (3'-UTR),<ref name=":12"/> the [[Five-prime cap\|5' cap]], and the [[Poly a tail\|poly-A tail]]. During [[Translation (biology)\|translation]], the [[ribosome]] facilitates the attachment of the [[Transfer RNA\|tRNAs]] to the coding region, 3 nucleotides at a time ([[codons]]).<ref>{{Cite web\|url=https://www.nature.com/scitable/topicpage/translation-dna-to-mrna-to-protein-393/\|title=Translation: DNA to mRNA to Protein\|last=Clancy\|first=Suzanne\|date=2008\|website=Scitable: By Nature Education}}</ref> The tRNAs transfer their associated [[amino acid]]s to the growing [[polypeptide]] chain, eventually forming the protein defined in the initial DNA coding region.

[[File:Mature_mRNA.png\|thumb\|413x413px\|The coding region (teal) is flanked by untranslated regions, the 5' cap, and the poly(A) tail which together form the '''mature mRNA'''.<ref>{{Citation\|last=Plociam\|title=English: The structure of a mature eukaryotic mRNA.
A fully processed mRNA includes the 5' cap, 5' UTR, coding region, 3' UTR, and poly(A) tail.\|date=2005-08-08\|url=https://commons.wikimedia.org/wiki/File:Mature_mRNA.png\|access-date=2019-11-19}}</ref>]]

[[Alkylation]] is one form of regulation of the coding region.<ref>{{cite journal \| vauthors = Shinohara K, Sasaki S, Minoshima M, Bando T, Sugiyama H \| title = Alkylation of template strand of coding region causes effective gene silencing \| journal = Nucleic Acids Research \| volume = 34 \| issue = 4 \| pages = 1189–95 \| date = 2006-02-13 \| pmid = 16500890 \| pmc = 1383623 \| doi = 10.1093/nar/gkl005 }}</ref> The gene that would have been transcribed can be silenced by targeting a specific sequence. The bases in this sequence are blocked using [[Alkyl\|alkyl groups]], which create the [[Gene silencing\|silencing]] effect.<ref>{{Cite web\|url=http://www.informatics.jax.org/vocab/gene_ontology/GO:0006305\|title=DNA alkylation Gene Ontology Term (GO:0006305)\|website=www.informatics.jax.org\|access-date=2019-10-30}}</ref>

While the [[regulation of gene expression]] manages the abundance of RNA or protein made in a cell, the regulation of these mechanisms can be controlled by a [[regulatory sequence]] found before the [[open reading frame]] begins in a strand of DNA. The [[regulatory sequence]] then determines the location and time at which a protein coding region is expressed.<ref>{{Cite journal \|last1=Shafee\|first1=Thomas\|last2=Lowe\|first2=Rohan \| name-list-style = vanc \|date=2017 \|title=Eukaryotic and prokaryotic gene structure\|journal=WikiJournal of Medicine\|volume=4\|issue=1\|doi=10.15347/wjm/2017.002\|doi-access=free}}</ref>

[[RNA splicing]] ultimately determines what part of the sequence becomes translated and expressed, and this process involves cutting out introns and putting together exons.
Where the RNA [[spliceosome]] cuts, however, is guided by the recognition of [[splice site]]s, in particular the 5' splice site, which is one of the substrates for the first step in splicing.<ref>{{cite journal \| vauthors = Konarska MM \| title = Recognition of the 5' splice site by the spliceosome \| journal = Acta Biochimica Polonica \| volume = 45 \| issue = 4 \| pages = 869–81 \| date = 1998 \| pmid = 10397335 \| doi = 10.18388/abp.1998_4346 \| doi-access = free }}</ref> The coding regions are within the exons, which become covalently joined together to form the [[mature messenger RNA]].

== Mutations ==

[[Mutation]]s in the coding region can have very diverse effects on the phenotype of the organism. While some mutations in this region of DNA/RNA can result in advantageous changes, others can be harmful and sometimes even lethal to an organism's survival. In contrast, changes in the non-coding region may not always result in detectable changes in phenotype.

=== Mutation types ===

[[File:Different_Types_of_Mutations.png\|thumb\|381x381px\|Examples of the various forms of '''point mutations''' that may exist within coding regions. Such alterations may or may not have phenotypic changes, depending on whether or not they code for different amino acids during translation.<ref>{{Citation\|last=Jonsta247\|title=English: Example of silent mutation\|date=2013-05-10\|url=https://commons.wikimedia.org/wiki/File:Different_Types_of_Mutations.png\|access-date=2019-11-19}}</ref>]]

There are various forms of mutations that can occur in coding regions. One form is [[silent mutation]]s, in which a change in nucleotides does not result in any change in amino acid after transcription and translation.<ref name=":3">Yang, J. (2016, March 23). What are Genetic Mutation?
Retrieved from https://www.singerinstruments.com/resource/what-are-genetic-mutation/ .</ref> There also exist [[nonsense mutation]]s, where base alterations in the coding region code for a premature stop codon, producing a shorter final protein. [[Point mutation\|Point mutations]], or single base pair changes in the coding region, that code for different amino acids during translation are called [[missense mutation]]s. Other types of mutations include [[frameshift mutation]]s such as [[Insertion mutation\|insertions]] or [[Deletion (genetics)\|deletions]].<ref name=":3" />

=== Formation ===

Some forms of mutations are [[Heredity\|hereditary]] ([[germline mutation]]s), or passed on from a parent to its offspring.<ref name=":4">What is a gene mutation and how do mutations occur? - Genetics Home Reference - NIH. (n.d.). Retrieved from https://medlineplus.gov/genetics/understanding/mutationsanddisorders/genemutation/ .</ref> Such mutated coding regions are present in all cells within the organism. Other forms of mutations are acquired ([[somatic mutation]]s) during an organism's lifetime, and may not be constant cell-to-cell.<ref name=":4" /> These changes can be caused by [[mutagen]]s, [[carcinogen]]s, or other environmental agents (e.g. [[Ultraviolet\|UV]]). Acquired mutations can also be a result of copy-errors during [[DNA replication]] and are not passed down to offspring. Changes in the coding region can also be [[De novo mutation\|de novo]] (new); such changes are thought to occur shortly after [[Fertilisation\|fertilization]], resulting in a mutation present in the offspring's DNA while being absent in both the sperm and egg cells.<ref name=":4" />

=== Prevention ===

There exist multiple transcription and translation mechanisms to prevent lethality due to deleterious mutations in the coding region.
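The silent, missense and nonsense categories described under "Mutation types" can be told apart by translating the codon before and after a single-base change. The sketch below assumes the standard genetic code; the function name `classify_point_mutation` and the example codons are hypothetical, and frameshifts are not handled.

```python
from itertools import product

# Standard genetic code, DNA alphabet, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base substitution within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be exactly three bases long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one base")

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (or still a stop codon)
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    if ref_aa == "*":
        return "stop-loss"   # not one of the categories named in the text
    return "missense"        # different amino acid


if __name__ == "__main__":
    print(classify_point_mutation("GAA", "GAG"))  # silent   (Glu -> Glu)
    print(classify_point_mutation("GAA", "GAT"))  # missense (Glu -> Asp)
    print(classify_point_mutation("GAA", "TAA"))  # nonsense (Glu -> stop)
```

The silent example changes only the third base of the codon, which is the position the wobble hypothesis identifies as most tolerant of substitution.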
Such measures include [[Proofreading (biology)\|proofreading]] by some [[DNA polymerase\|DNA polymerases]] during replication, [[DNA mismatch repair\|mismatch repair]] following replication,<ref>{{Cite web \|title=DNA proofreading and repair (article) \|url=https://www.khanacademy.org/science/biology/dna-as-the-genetic-material/dna-replication/a/dna-proofreading-and-repair \|access-date=2023-05-22 \|website=Khan Academy \|language=en}}</ref> and the [[Wobble hypothesis\|wobble hypothesis]], which describes the [[Degeneracy (biology)\|degeneracy]] of the third base within an mRNA codon.<ref>Peretó J. (2011) Wobble Hypothesis (Genetics). In: Gargaud M. et al. (eds) Encyclopedia of Astrobiology. Springer, Berlin, Heidelberg</ref>

== Constrained coding regions (CCRs) ==

While it is well known that the genome of one individual can have extensive differences when compared to the genome of another, recent research has found that some coding regions are highly constrained, or resistant to mutation, between individuals of the same species. This is similar to the concept of interspecies constraint in [[Conserved sequence\|conserved sequences]]. Researchers termed these highly constrained sequences constrained coding regions (CCRs), and have also discovered that such regions may be involved in high [[purifying selection]]. On average, there is approximately 1 protein-altering mutation every 7 coding bases, but some CCRs can have over 100 bases in sequence with no observed protein-altering mutations, some without even synonymous mutations.<ref name=":0">Havrilla, J. M., Pedersen, B. S., Layer, R. M., & Quinlan, A. R. (2018). A map of constrained coding regions in the human genome. ''Nature Genetics'', 88–95.
{{doi\|10.1101/220814}}</ref> These patterns of constraint between genomes may provide clues to the sources of rare [[Developmental disorder\|developmental diseases]] or potentially even embryonic lethality. Clinically validated variants and [[de novo mutation]]s in CCRs have been previously linked to disorders such as [[infantile epileptic encephalopathy]], developmental delay and severe heart disease.<ref name=":0" />

== Coding sequence detection ==

[[File:Human karyotype with bands and sub-bands.png\|thumb\|Schematic [[karyotype\|karyogram]] of a human, showing an overview of the [[human genome]] on [[G banding]] (which includes [[Giemsa-stain]]ing), wherein coding DNA regions occur to a greater extent in lighter ([[GC-content\|GC rich]]) regions.<ref name=Romiguier2017>{{cite journal\| author=Romiguier J, Roux C\| title=Analytical Biases Associated with GC-Content in Molecular Evolution. \| journal=Front Genet \| year= 2017 \| volume= 8 \| issue= \| pages= 16 \| pmid=28261263 \| doi=10.3389/fgene.2017.00016 \| pmc=5309256 \| doi-access=free }} </ref><br>{{further\|Karyotype}}]]

While identification of [[open reading frames]] within a DNA sequence is straightforward, identifying coding sequences is not, because the cell translates only a subset of all open reading frames to proteins.<ref>{{cite journal \| vauthors = Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki H, Baldarelli R, Hayashizaki Y, Okazaki Y \| display-authors = 6 \| title = CDS annotation in full-length cDNA sequence \| journal = Genome Research \| volume = 13 \| issue = 6B \| pages = 1478–87 \| date = June 2003 \| pmid = 12819146 \| pmc = 403693 \| doi = 10.1101/gr.1060303 \| publisher = Cold Spring Harbor Laboratory Press }}</ref> Currently CDS prediction uses sampling and sequencing of mRNA from cells, although there is still the problem of determining which parts of a given mRNA are actually
translated to protein. CDS prediction is a subset of [[gene prediction]], the latter also including prediction of DNA sequences that code not only for protein but also for other functional elements such as RNA genes and regulatory sequences.

In both [[prokaryote]]s and [[eukaryote]]s, [[Overlapping gene\|gene overlapping]] occurs relatively often in both DNA and RNA viruses as an evolutionary advantage to reduce genome size while retaining the ability to produce various proteins from the available coding regions.<ref>{{cite journal \| vauthors = Rogozin IB, Spiridonov AN, Sorokin AV, Wolf YI, Jordan IK, Tatusov RL, Koonin EV \| title = Purifying and directional selection in overlapping prokaryotic genes \| language = en \| journal = Trends in Genetics \| volume = 18 \| issue = 5 \| pages = 228–32 \| date = May 2002 \| pmid = 12047938 \| doi = 10.1016/S0168-9525(02)02649-5 \| url = https://www.cell.com/trends/genetics/abstract/S0168-9525(02)02649-5 \| url-access = subscription }}</ref><ref>{{cite journal \| vauthors = Chirico N, Vianelli A, Belshaw R \| title = Why genes overlap in viruses \| journal = Proceedings.
Biological Sciences \| volume = 277 \| issue = 1701 \| pages = 3809–17 \| date = December 2010 \| pmid = 20610432 \| pmc = 2992710 \| doi = 10.1098/rspb.2010.1052 }}</ref> For both DNA and RNA, [[Sequence alignment#Pairwise alignment\|pairwise alignments]] can detect overlapping coding regions, including short [[open reading frame]]s in viruses, but require a known coding strand against which to compare the potential overlapping coding strand.<ref>{{cite journal \| vauthors = Firth AE, Brown CM \| title = Detecting overlapping coding sequences with pairwise alignments \| journal = Bioinformatics \| volume = 21 \| issue = 3 \| pages = 282–92 \| date = February 2005 \| pmid = 15347574 \| doi = 10.1093/bioinformatics/bti007 \| url = https://academic.oup.com/bioinformatics/article/21/3/282/237775 \| doi-access = free }}</ref> An alternative method using single genome sequences does not require multiple genome sequences to execute comparisons, but requires an overlap of at least 50 nucleotides in order to be sensitive.<ref>{{cite journal \| vauthors = Schlub TE, Buchmann JP, Holmes EC \| title = A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences \| journal = Molecular Biology and Evolution \| volume = 35 \| issue = 10 \| pages = 2572–2581 \| date = October 2018 \| pmid = 30099499 \| pmc = 6188560 \| doi = 10.1093/molbev/msy155 \| editor-first = Harmit \| editor-last = Malik }}</ref>

== See also ==

* [[Mature messenger RNA\|Mature mRNA]]: The portion of the mRNA transcription product that is translated
* [[Gene structure]]: The other elements that make up a gene
* [[Nested gene]]: Entire coding sequence lies within the bounds of a larger external gene
* [[Non-coding DNA]]: Parts of genomes that do not encode protein-coding genes
* [[Non-coding RNA]]: Molecules that do not encode proteins, so have no CDS
* [[Junk DNA\|Non-functional DNA]]: Parts of genomes with no relevant biological function

== References ==