{{short description|Portion of gene's sequence which codes for protein}}
The '''coding region''' of a [[gene]], also known as the '''coding DNA sequence''' ('''CDS'''), is the portion of a gene's [[DNA]] or [[RNA]] that codes for a [[protein]].<ref name=":12">{{cite web|url=http://genome.wellcome.ac.uk/doc_WTD020755.html|title=Gene Structure|last=Twyman|first=Richard|date=1 August 2003|publisher=The Wellcome Trust|url-status=dead|archive-url=https://web.archive.org/web/20070328214808/http://genome.wellcome.ac.uk/doc_WTD020755.html|archive-date=28 March 2007|access-date=6 April 2003}}</ref> Comparing the length, composition, regulation, splicing, structure, and function of coding regions with those of non-coding regions, across species and over time, can reveal much about the gene organization and evolution of [[prokaryote]]s and [[eukaryote]]s.<ref>{{Cite journal | vauthors = Höglund M, Säll T, Röhme D |date=February 1990|title=On the origin of coding sequences from random open reading frames|journal=Journal of Molecular Evolution|volume=30|issue=2|pages=104–108|doi=10.1007/bf02099936|issn=0022-2844|bibcode=1990JMolE..30..104H|s2cid=5978109}}</ref> This can further assist in mapping the [[Human Genome Project|human genome]] and developing gene therapy.<ref>{{cite journal | vauthors = Sakharkar MK, Chow VT, Kangueane P | title = Distributions of exons and introns in the human genome | journal = In Silico Biology | volume = 4 | issue = 4 | pages = 387–93 | date = 2004 | doi = 10.3233/ISB-00142 | pmid = 15217358 }}</ref>
== Definition ==
Although this term is also sometimes used interchangeably with [[exon]], it is not the exact same thing: the [[exon]]
Coding regions are often confused with [[exome]]s, but the two terms are distinct. While the [[exome]] refers to all exons within a genome, the coding region refers to
== History ==
GC-rich areas are also where the ratio of [[point mutation]] types is altered slightly: there are more [[Transition (genetics)|transitions]], which are changes from purine to purine or pyrimidine to pyrimidine, than [[transversion]]s, which are changes from purine to pyrimidine or pyrimidine to purine. Transitions are less likely to change the encoded amino acid and more likely to remain [[silent mutation]]s (especially if they occur in the third [[nucleotide]] of a codon), which is usually beneficial to the organism during translation and protein formation.<ref>{{Cite web|url=http://rosalind.info/glossary/gene-coding-region/|title=ROSALIND {{!}} Glossary {{!}} Gene coding region|website=rosalind.info|access-date=2019-10-31}}</ref>
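The transition/transversion distinction described above can be illustrated with a minimal Python sketch. The function names <code>classify_substitution</code> and <code>is_silent</code> are hypothetical, chosen only for this illustration, and the codon table assumed here is the standard genetic code written with DNA bases; organisms or organelles that use a variant code would need a different table.

```python
# Illustrative sketch only: function names are hypothetical, and the
# standard genetic code (DNA alphabet) is assumed.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine)
    # means transition; a change of class means transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def is_silent(codon: str, position: int, alt: str) -> bool:
    """True if replacing codon[position] (0-based) with alt keeps the amino acid."""
    codon = codon.upper()
    mutated = codon[:position] + alt.upper() + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]
```

For example, <code>classify_substitution("A", "G")</code> returns <code>"transition"</code>, and <code>is_silent("GAA", 2, "G")</code> returns <code>True</code> because GAA and GAG both encode glutamate, whereas the third-position transversion to GAT changes the amino acid to aspartate.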
This indicates that essential coding regions (gene-rich) are higher in GC-content and more stable and resistant to [[mutation]] compared to accessory and non-essential regions (gene-poor).<ref>{{cite journal | vauthors = Vinogradov AE | title = DNA helix: the importance of being GC-rich | journal = Nucleic Acids Research | volume = 31 | issue = 7 | pages = 1838–44 | date = April 2003 | pmid = 12654999 | pmc = 152811 | doi = 10.1093/nar/gkg296 }}</ref> However, it is still unclear whether this came about through neutral and random mutation or through a pattern of [[Natural selection|selection]].<ref>{{cite journal | vauthors = Bohlin J, Eldholm V, Pettersson JH, Brynildsrud O, Snipen L | title = The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes | journal = BMC Genomics | volume = 18 | issue = 1 |
== Structure and function ==
=== Formation ===
Some forms of mutations are [[Heredity|hereditary]] ([[germline mutation]]s), or passed on from a parent to its offspring.<ref name=":4">What is a gene mutation and how do mutations occur? - Genetics Home Reference - NIH. (n.d.). Retrieved from https://
=== Prevention ===
While identification of [[open reading frame]]s within a DNA sequence is straightforward, identifying coding sequences is not, because the cell translates only a subset of all open reading frames to proteins.<ref>{{cite journal | vauthors = Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki H, Baldarelli R, Hayashizaki Y, Okazaki Y | display-authors = 6 | title = CDS annotation in full-length cDNA sequence | journal = Genome Research | volume = 13 | issue = 6B | pages = 1478–87 | date = June 2003 | pmid = 12819146 | pmc = 403693 | doi = 10.1101/gr.1060303 | publisher = Cold Spring Harbor Laboratory Press }}</ref> Currently, CDS prediction relies on sampling and sequencing mRNA from cells, although this leaves open the problem of determining which parts of a given mRNA are actually translated to protein. CDS prediction is a subset of [[gene prediction]], which also covers DNA sequences that code for other functional elements, such as RNA genes and regulatory sequences.
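To show why finding open reading frames is the easy part, the following is a minimal Python sketch of a six-frame ORF scan. It assumes the simplest definition of an ORF (an in-frame ATG followed by the first in-frame stop codon TAA, TAG, or TGA), and the function name <code>find_orfs</code> and its <code>min_codons</code> parameter are hypothetical. It reports candidates only and does not indicate which of them the cell actually translates.

```python
# Illustrative sketch only: a naive six-frame ORF scan under the
# assumption that an ORF runs from ATG to the first in-frame stop codon.

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2):
    """Return candidate ORFs as (strand, frame, start, end, sequence).

    start and end are 0-based, end-exclusive offsets on the scanned
    strand (the reverse complement for strand '-'). min_codons counts
    every codon in the ORF, including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # offset of the currently open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, end, s[start:end]))
                    start = None
    return orfs
```

Calling <code>find_orfs("CCATGAAATAGGG")</code> returns a single candidate, <code>("+", 2, 2, 11, "ATGAAATAG")</code>. Deciding whether such a candidate is a genuine coding sequence requires further evidence, such as the mRNA sequencing mentioned above.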
[[Overlapping gene|Gene overlapping]] occurs in both [[prokaryote]]s and [[eukaryote]]s, and relatively often in both DNA and RNA viruses, as an evolutionary advantage that reduces genome size while retaining the ability to produce various proteins from the available coding regions.<ref>{{cite journal | vauthors = Rogozin IB, Spiridonov AN, Sorokin AV, Wolf YI, Jordan IK, Tatusov RL, Koonin EV | title = Purifying and directional selection in overlapping prokaryotic genes | language = en | journal = Trends in Genetics | volume = 18 | issue = 5 | pages = 228–32 | date = May 2002 | pmid = 12047938 | doi = 10.1016/S0168-9525(02)02649-5 | url = https://www.cell.com/trends/genetics/abstract/S0168-9525(02)02649-5 | url-access = subscription }}</ref><ref>{{cite journal | vauthors = Chirico N, Vianelli A, Belshaw R | title = Why genes overlap in viruses | journal = Proceedings. Biological Sciences | volume = 277 | issue = 1701 | pages = 3809–17 | date = December 2010 | pmid = 20610432 | pmc = 2992710 | doi = 10.1098/rspb.2010.1052 }}</ref> For both DNA and RNA, [[Sequence alignment#Pairwise alignment|pairwise alignments]] can detect overlapping coding regions, including short [[open reading frame]]s in viruses, but they require a known coding strand against which to compare the potential overlapping coding strand.<ref>{{cite journal | vauthors = Firth AE, Brown CM | title = Detecting overlapping coding sequences with pairwise alignments | journal = Bioinformatics | volume = 21 | issue = 3 | pages = 282–92 | date = February 2005 | pmid = 15347574 | doi = 10.1093/bioinformatics/bti007 | url = https://academic.oup.com/bioinformatics/article/21/3/282/237775 | doi-access = free }}</ref> An alternative method works on a single genome sequence and so does not require multiple genomes for comparison, but it needs an overlap of at least 50 nucleotides to be sensitive.<ref>{{cite journal | vauthors = Schlub TE, Buchmann JP, Holmes EC | title = A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences | journal = Molecular Biology and Evolution | volume = 35 | issue = 10 | pages = 2572–2581 | date = October 2018 | pmid = 30099499 | pmc = 6188560 | doi = 10.1093/molbev/msy155 | editor-first = Harmit | editor-last = Malik }}</ref>
== See also ==