Coding region: Difference between revisions

titles
Line 19:
This indicates that essential coding regions (gene-rich) have higher GC-content and are more stable and more resistant to [[mutation]] than accessory and non-essential regions (gene-poor).<ref>{{cite journal | vauthors = Vinogradov AE | title = DNA helix: the importance of being GC-rich | journal = Nucleic Acids Research | volume = 31 | issue = 7 | pages = 1838–44 | date = April 2003 | pmid = 12654999 | pmc = 152811 | doi = 10.1093/nar/gkg296 }}</ref> However, it is still unclear whether this came about through neutral and random mutation or through a pattern of [[Natural selection|selection]].<ref>{{cite journal | vauthors = Bohlin J, Eldholm V, Pettersson JH, Brynildsrud O, Snipen L | title = The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes | journal = BMC Genomics | volume = 18 | issue = 1 | pages = 151 | date = February 2017 | pmid = 28187704 | pmc = 5303225 | doi = 10.1186/s12864-017-3543-7 }}</ref> There is also debate on whether the methods used to ascertain the relationship between GC-content and coding region, such as gene windows, are accurate and unbiased.<ref>{{cite journal | vauthors = Sémon M, Mouchiroud D, Duret L | title = Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance | journal = Human Molecular Genetics | volume = 14 | issue = 3 | pages = 421–7 | date = February 2005 | pmid = 15590696 | doi = 10.1093/hmg/ddi038 | doi-access = free }}</ref>
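
As a rough illustration of a window-based measurement, the sketch below computes GC-content over fixed-size windows of a sequence. The function names, window size, and step are hypothetical choices made for this example; it is not the method used in the cited studies.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start index, GC fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Comparing the values returned for windows inside and outside annotated coding regions is one simple way to look for the trend described above, although the choice of window size can itself bias the result.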
 
== Structure and function ==
[[File:Coding Region in DNA.png|thumb|398x398px|'''Transcription''': RNA polymerase (RNAP) uses a template DNA strand, begins transcribing at the promoter sequence (green) and ends at the terminator sequence (red), so that the entire coding region is included in the product mRNA (teal). ''[The 5' and 3' ends may be labeled incorrectly in this figure.]'']] [[File:Transcription label en.jpg|thumb|An electron-micrograph of DNA strands decorated by hundreds of RNAP molecules too small to be resolved. Each RNAP is transcribing an RNA strand, which can be seen branching off from the DNA. "Begin" indicates the 3' end of the DNA, where RNAP initiates transcription; "End" indicates the 5' end, where the longer RNA molecules are completely transcribed.]]
In [[DNA]], the coding region is flanked by the [[Promoter (genetics)|promoter sequence]] toward the 3' end of the [[template strand]] and the termination sequence toward the 5' end. During [[Transcription (biology)|transcription]], [[RNA Polymerase|RNA polymerase (RNAP)]] binds to the promoter sequence and moves along the template strand to the coding region. RNAP then adds RNA [[nucleotide]]s complementary to the coding region of the template strand to form the [[mRNA]], with [[uracil]] in place of [[thymine]].<ref name=":2">Overview of transcription. (n.d.). Retrieved from <nowiki>https://www.khanacademy.org/science/biology/gene-expression-central-dogma/transcription-of-dna-into-rna/a/overview-of-transcription</nowiki>.</ref> This continues until the RNAP reaches the termination sequence.<ref name=":2" />
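
The base-pairing step can be sketched in a few lines. This is a simplified illustration that ignores promoters and terminators; it assumes the template strand is supplied in the 3'→5' direction, so the returned mRNA reads 5'→3'. The name `transcribe` is a hypothetical helper for this example.

```python
# Complementary RNA base for each DNA template base (uracil pairs with adenine).
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Build the mRNA (5'->3') complementary to a template strand given 3'->5'."""
    return "".join(RNA_COMPLEMENT[base] for base in template_3to5.upper())
```

For instance, the template `TACGGT` (3'→5') gives the mRNA `AUGCCA` (5'→3'), which matches the coding strand `ATGCCA` with uracil substituted for thymine.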
Line 38:
[[Mutation]]s in the coding region can have very diverse effects on the phenotype of the organism. While some mutations in this region of DNA/RNA can result in advantageous changes, others can be harmful and sometimes even lethal to the organism. Still others do not result in any detectable change in phenotype.
 
=== Mutation types ===
[[File:Different_Types_of_Mutations.png|thumb|381x381px|Examples of the various forms of '''point mutations''' that may exist within coding regions. Such alterations may or may not have phenotypic changes, depending on whether or not they code for different amino acids during translation.<ref>{{Citation|last=Jonsta247|title=English: Example of silent mutation|date=2013-05-10|url=https://commons.wikimedia.org/wiki/File:Different_Types_of_Mutations.png|access-date=2019-11-19}}</ref>]]
There are various forms of mutations that can occur in coding regions. One form is [[silent mutation]]s, in which a change in nucleotides does not result in any change in amino acid after transcription and translation.<ref name=":3">Yang, J. (2016, March 23). What are Genetic Mutation? Retrieved from <nowiki>https://www.singerinstruments.com/resource/what-are-genetic-mutation/</nowiki>.</ref> There also exist [[nonsense mutation]]s, in which a base alteration in the coding region produces a premature stop codon, resulting in a shorter final protein. [[Point mutation|Point mutations]], or single base pair changes in the coding region, that code for a different amino acid during translation are called [[missense mutation]]s. Other types of mutations include [[frameshift mutation]]s, which are caused by [[Insertion mutation|insertions]] or [[Deletion (genetics)|deletions]] that shift the reading frame.<ref name=":3" />
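
The distinction between silent, missense, and nonsense point mutations can be illustrated with a short sketch that compares the original and mutated codon under the standard genetic code. The function name and its arguments are hypothetical, and the sketch assumes the input is an in-frame coding sequence written as DNA.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(cds: str, pos: int, new_base: str) -> str:
    """Classify a single-base substitution at 0-based index pos of an in-frame CDS."""
    cds, new_base = cds.upper(), new_base.upper()
    if cds[pos] == new_base:
        raise ValueError("new_base must differ from the original base")
    start = pos - pos % 3                      # first base of the affected codon
    old_codon = cds[start:start + 3]
    if len(old_codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = pos % 3
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "silent"
    if new_aa == "*":
        return "nonsense"
    if old_aa == "*":
        return "stop-loss"                     # not one of the three types in the text
    return "missense"
```

In the coding sequence `ATGGAAGGCTGGTAA`, changing the third base of the codon `GAA` to `G` is silent (both codons encode glutamate), changing its first base to `T` creates the stop codon `TAA`, and changing its second base to `T` yields valine, a missense change.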
Line 51:
Although the genomes of two individuals can differ extensively, recent research has found that some coding regions are highly constrained, or resistant to mutation, between individuals of the same species. This is similar to the concept of interspecies constraint in [[Conserved sequence|conserved sequences]]. Researchers termed these highly constrained sequences constrained coding regions (CCRs), and have also discovered that such regions may be under strong [[purifying selection]]. On average, there is approximately 1 protein-altering mutation every 7 coding bases, but some CCRs span over 100 consecutive bases with no observed protein-altering mutations, and some lack even synonymous mutations.<ref name=":0">Havrilla, J. M., Pedersen, B. S., Layer, R. M., & Quinlan, A. R. (2018). A map of constrained coding regions in the human genome. ''Nature Genetics'', 88–95. doi: 10.1101/220814</ref> These patterns of constraint between genomes may provide clues to the sources of rare [[Developmental disorder|developmental diseases]] or potentially even embryonic lethality. Clinically validated variants and [[de novo mutation]]s in CCRs have previously been linked to disorders such as [[infantile epileptic encephalopathy]], developmental delay and severe heart disease.<ref name=":0" />
 
== Coding sequence detection ==
While identification of [[open reading frames]] within a DNA sequence is straightforward, identifying coding sequences is not, because the cell translates only a subset of all open reading frames to proteins.<ref>{{cite journal | vauthors = Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki H, Baldarelli R, Hayashizaki Y, Okazaki Y | display-authors = 6 | title = CDS annotation in full-length cDNA sequence | journal = Genome Research | volume = 13 | issue = 6B | pages = 1478–87 | date = June 2003 | pmid = 12819146 | pmc = 403693 | doi = 10.1101/gr.1060303 | publisher = Cold Spring Harbor Laboratory Press }}</ref> Currently, CDS prediction relies on sampling and sequencing mRNA from cells, although this still leaves the problem of determining which parts of a given mRNA are actually translated into protein. CDS prediction is a subset of [[gene prediction]], which also covers DNA sequences that code for other functional elements, such as RNA genes and regulatory sequences.
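
To show why open reading frame identification is considered the straightforward part, the following minimal sketch scans the three forward reading frames of a DNA sequence for stretches that begin with `ATG` and end at the first in-frame stop codon. The function name and the `min_codons` parameter are hypothetical, the reverse strand is not searched, and nothing here decides whether an ORF is actually translated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) slices of forward-strand ORFs, stop codon included.

    min_codons is the minimum number of codons before the stop codon,
    counting the ATG start codon itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None                            # index of the current ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # look for the next ORF in this frame
    return sorted(orfs)
```

Every slice returned is only a candidate; distinguishing true coding sequences among such candidates is the harder problem described above.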