Coding region: Difference between revisions

Revision as of 18:17, 18 November 2024 by 152.1.128.70 (no edit summary)
Latest revision as of 23:01, 22 July 2025 by Citation bot (Added article-number. Removed parameters. Some additions/deletions were parameter name changes. Suggested by Abductive; Category:Biochemistry; #UCB_Category 20/248)
(6 intermediate revisions by 4 users not shown)
Line 1:

{{short description|Portion of gene's sequence which codes for protein}}

The '''coding region''' of a [[gene]], also known as the '''coding DNA sequence''' ('''CDS'''), is the portion of a gene's [[DNA]] or [[RNA]] that codes for a [[protein]].<ref name=":12">{{cite web|url=http://genome.wellcome.ac.uk/doc_WTD020755.html|title=Gene Structure|last=Twyman|first=Richard|date=1 August 2003|publisher=The Wellcome Trust|url-status=dead|archive-url=https://web.archive.org/web/20070328214808/http://genome.wellcome.ac.uk/doc_WTD020755.html|archive-date=28 March 2007|access-date=6 April 2003}}</ref> Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide important information about the gene organization and evolution of [[prokaryote]]s and [[eukaryote]]s.<ref>{{Cite journal | vauthors = Höglund M, Säll T, Röhme D |date=February 1990|title=On the origin of coding sequences from random open reading frames|journal=Journal of Molecular Evolution|volume=30|issue=2|pages=104–108|doi=10.1007/bf02099936|issn=0022-2844|bibcode=1990JMolE..30..104H|s2cid=5978109}}</ref> This can further assist in mapping the [[Human Genome Project|human genome]] and developing gene therapy.<ref>{{cite journal | vauthors = Sakharkar MK, Chow VT, Kangueane P | title = Distributions of exons and introns in the human genome | journal = In Silico Biology | volume = 4 | issue = 4 | pages = 387–93 | date = 2004 | doi = 10.3233/ISB-00142 | pmid = 15217358 }}</ref>

== Definition ==

Although this term is sometimes used interchangeably with [[exon]], the two are not the same: an [[exon]] can be composed of the coding region as well as the 3' and 5' [[untranslated region]]s of the RNA, so an exon may be only partially made up of coding region.
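The distinction between an exon and the coding region can be illustrated with a small sketch. The coordinates, the half-open interval convention, the plus-strand orientation, and the function names <code>split_exon</code> and <code>coding_length</code> are all assumptions made for illustration; they are not taken from any annotation format or library.

```python
from typing import Dict, List, Tuple

# Half-open [start, end) coordinates; a plus-strand gene is assumed.
Interval = Tuple[int, int]


def split_exon(exon: Interval, cds: Interval) -> Dict[str, Interval]:
    """Split one exon into its 5' UTR, coding, and 3' UTR parts.

    `cds` spans from the first base of the start codon to the end of the
    stop codon. Parts that would be empty are left out of the result.
    """
    start, end = exon
    cds_start, cds_end = cds
    parts = {
        "five_prime_utr": (start, min(end, cds_start)),
        "coding": (max(start, cds_start), min(end, cds_end)),
        "three_prime_utr": (max(start, cds_end), end),
    }
    # Keep only intervals that contain at least one base.
    return {name: iv for name, iv in parts.items() if iv[0] < iv[1]}


def coding_length(exons: List[Interval], cds: Interval) -> int:
    """Total number of exonic bases that fall inside the coding region."""
    total = 0
    for exon in exons:
        part = split_exon(exon, cds).get("coding")
        if part is not None:
            total += part[1] - part[0]
    return total


if __name__ == "__main__":
    # Hypothetical three-exon transcript with UTRs in the first and last exon.
    exons = [(0, 100), (200, 300), (400, 500)]
    cds = (50, 450)
    for exon in exons:
        print(exon, split_exon(exon, cds))
    print("coding bases:", coding_length(exons, cds))
```

In this hypothetical transcript the first exon is part 5' UTR and part coding, the middle exon is entirely coding, and the last exon is part coding and part 3' UTR, so only 200 of the 300 exonic bases belong to the coding region.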
The 3' and 5' [[untranslated region]]s of the RNA, which do not code for protein, are termed [[Non-coding region|non-coding]] regions and are not discussed on this page.<ref>{{Cite book|last=Parnell|first=Laurence D.|chapter=Advances in Technologies and Study Design|date=2012-01-01|chapter-url=http://www.sciencedirect.com/science/article/pii/B9780123983978000022|journal=Progress in Molecular Biology and Translational Science|volume=108|pages=17–50|editor-last=Bouchard|editor-first=C.|publisher=Academic Press|access-date=2019-11-07|editor2-last=Ordovas|editor2-first=J. M.|doi=10.1016/B978-0-12-398397-8.00002-2|pmid=22656372|title=Recent Advances in Nutrigenetics and Nutrigenomics|isbn=9780123983978}}</ref> Coding regions are also often confused with [[exome]]s, but the two terms are distinct. While the [[exome]] refers to all exons within a genome, the coding region refers to sections of the DNA (or [[primary transcript]]) or a singular section of processed mRNA which specifically codes for a certain kind of protein.

== History ==

Line 17:

GC-rich areas are also where the ratio of [[point mutation]] types is altered slightly: there are more [[Transition (genetics)|transitions]], which are changes from purine to purine or pyrimidine to pyrimidine, compared to [[transversion]]s, which are changes from purine to pyrimidine or pyrimidine to purine.
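The purine/pyrimidine rule just described can be written as a short sketch. The function names <code>classify_substitution</code> and <code>ts_tv_counts</code> are hypothetical, and the code assumes an unambiguous DNA alphabet (A, C, G, T) and two pre-aligned sequences of equal length without gaps.

```python
from typing import Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected single bases from A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical: not a substitution")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_counts(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Count transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue  # identical positions are not mutations
        if classify_substitution(a, b) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    print(classify_substitution("A", "G"))   # purine to purine
    print(classify_substitution("A", "C"))   # purine to pyrimidine
    print(ts_tv_counts("AAGCT", "AGGTA"))
```

Dividing the first count by the second gives a transition/transversion ratio for the aligned pair; the sketch leaves that division to the caller because the second count can be zero.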
Transitions are less likely to change the encoded amino acid and more likely to remain [[silent mutation]]s (especially if they occur in the third [[nucleotide]] of a codon), which is usually beneficial to the organism during translation and protein formation.<ref>{{Cite web|url=http://rosalind.info/glossary/gene-coding-region/|title=ROSALIND {{!}} Glossary {{!}} Gene coding region|website=rosalind.info|access-date=2019-10-31}}</ref> This indicates that essential coding regions (gene-rich) are higher in GC-content and more stable and resistant to [[mutation]] compared to accessory and non-essential regions (gene-poor).<ref>{{cite journal | vauthors = Vinogradov AE | title = DNA helix: the importance of being GC-rich | journal = Nucleic Acids Research | volume = 31 | issue = 7 | pages = 1838–44 | date = April 2003 | pmid = 12654999 | pmc = 152811 | doi = 10.1093/nar/gkg296 }}</ref> However, it is still unclear whether this came about through neutral and random mutation or through a pattern of [[Natural selection|selection]].<ref>{{cite journal | vauthors = Bohlin J, Eldholm V, Pettersson JH, Brynildsrud O, Snipen L | title = The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes | journal = BMC Genomics | volume = 18 | issue = 1 | article-number = 151 | date = February 2017 | pmid = 28187704 | pmc = 5303225 | doi = 10.1186/s12864-017-3543-7 | doi-access = free }}</ref> There is also debate over whether the methods used to ascertain the relationship between GC-content and coding regions, such as gene windows, are accurate and unbiased.<ref>{{cite journal | vauthors = Sémon M, Mouchiroud D, Duret L | title = Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance | journal = Human Molecular Genetics | volume = 14 | issue = 3 | pages = 421–7 | date = February 2005 | pmid = 15590696 | doi = 10.1093/hmg/ddi038 | doi-access = free }}</ref>

== Structure and function ==

Line 43:

=== Formation ===

Some forms of mutations are [[Heredity|hereditary]] ([[germline mutation]]s), or passed on from a parent to its offspring.<ref name=":4">What is a gene mutation and how do mutations occur? - Genetics Home Reference - NIH. (n.d.). Retrieved from https://medlineplus.gov/genetics/understanding/mutationsanddisorders/genemutation/ .</ref> Such mutated coding regions are present in all cells within the organism. Other forms of mutations are acquired ([[somatic mutation]]s) during an organism's lifetime, and may not be constant from cell to cell.<ref name=":4" /> These changes can be caused by [[mutagen]]s, [[carcinogen]]s, or other environmental agents (e.g. [[Ultraviolet|UV]]). Acquired mutations can also be a result of copy-errors during [[DNA replication]] and are not passed down to offspring. Changes in the coding region can also be [[De novo mutation|de novo]] (new); such changes are thought to occur shortly after [[Fertilisation|fertilization]], resulting in a mutation present in the offspring's DNA while being absent in both the sperm and egg cells.<ref name=":4" />

=== Prevention ===

Line 55:

While identification of [[open reading frames]] within a DNA sequence is straightforward, identifying coding sequences is not, because the cell translates only a subset of all open reading frames to proteins.<ref>{{cite journal | vauthors = Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki H, Baldarelli R, Hayashizaki Y, Okazaki Y | display-authors = 6 | title = CDS annotation in full-length cDNA sequence | journal = Genome Research | volume = 13 | issue = 6B | pages = 1478–87 | date = June 2003 | pmid = 12819146 | pmc = 403693 | doi = 10.1101/gr.1060303 | publisher = Cold Spring Harbor Laboratory Press }}</ref> Currently CDS prediction uses sampling and sequencing of mRNA from cells, although there is still the problem of determining which parts of a given
mRNA are actually translated to protein. CDS prediction is a subset of [[gene prediction]], the latter also including prediction of DNA sequences that code not only for protein but also for other functional elements such as RNA genes and regulatory sequences.

[[Overlapping gene|Gene overlapping]] occurs in both [[prokaryote]]s and [[eukaryote]]s and is relatively common in both DNA and RNA viruses, where it offers an evolutionary advantage by reducing genome size while retaining the ability to produce various proteins from the available coding regions.<ref>{{cite journal | vauthors = Rogozin IB, Spiridonov AN, Sorokin AV, Wolf YI, Jordan IK, Tatusov RL, Koonin EV | title = Purifying and directional selection in overlapping prokaryotic genes | language = en | journal = Trends in Genetics | volume = 18 | issue = 5 | pages = 228–32 | date = May 2002 | pmid = 12047938 | doi = 10.1016/S0168-9525(02)02649-5 | url = https://www.cell.com/trends/genetics/abstract/S0168-9525(02)02649-5 | url-access = subscription }}</ref><ref>{{cite journal | vauthors = Chirico N, Vianelli A, Belshaw R | title = Why genes overlap in viruses | journal = Proceedings. Biological Sciences | volume = 277 | issue = 1701 | pages = 3809–17 | date = December 2010 | pmid = 20610432 | pmc = 2992710 | doi = 10.1098/rspb.2010.1052 }}</ref> For both DNA and RNA, [[Sequence alignment#Pairwise alignment|pairwise alignments]] can detect overlapping coding regions, including short [[open reading frame]]s in viruses, but they require a known coding strand against which the potential overlapping coding strand can be compared.<ref>{{cite journal | vauthors = Firth AE, Brown CM | title = Detecting overlapping coding sequences with pairwise alignments | journal = Bioinformatics | volume = 21 | issue = 3 | pages = 282–92 | date = February 2005 | pmid = 15347574 | doi = 10.1093/bioinformatics/bti007 | url = https://academic.oup.com/bioinformatics/article/21/3/282/237775 | doi-access = free }}</ref> An alternative method uses single genome sequences and so does not require multiple genome sequences for comparison, but it requires an overlap of at least 50 nucleotides in order to be sensitive.<ref>{{cite journal | vauthors = Schlub TE, Buchmann JP, Holmes EC | title = A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences | journal = Molecular Biology and Evolution | volume = 35 | issue = 10 | pages = 2572–2581 | date = October 2018 | pmid = 30099499 | pmc = 6188560 | doi = 10.1093/molbev/msy155 | editor-first = Harmit | editor-last = Malik }}</ref>

== See also ==