m Dated {{Clarify}}. (Build J3) |
Opening made bold, removed repetitiveness. |
{{Original research|date=March 2012}}
When a [[nucleotide]] sequence is studied as a text written in a four-letter alphabet, the repetitiveness of that text, that is, the repetition of its [[N-gram|N-grams (words)]], can be calculated and serves as a measure of sequence complexity. Thus, the more complex a [[DNA sequence]], the richer its [[oligonucleotide]] vocabulary, whereas repetitious sequences have relatively lower complexities. The original algorithm described by Trifonov (1990)<ref name=Trifonov1990/> has since been improved without changing the essence of the linguistic complexity approach.{{Or|date=March 2012}}<ref name=Gabrielian1999>{{cite doi|10.1016/S0097-8485(99)00007-8|noedit}}</ref><ref name=Orlov2004>{{cite doi|10.1093/nar/gkh466|noedit}}</ref><ref name=Janson2004>{{cite doi|10.1016/j.tcs.2004.06.023|noedit}}</ref>
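The idea of measuring vocabulary richness can be illustrated with a minimal sketch. It assumes one common formulation that the paragraph above does not spell out: for each word length ''k'', the "vocabulary usage" is the number of distinct ''k''-letter words observed divided by the maximum number possible for a sequence of that length, and the complexity is the product of these ratios over the chosen word lengths. The function names <code>vocabulary_usage</code> and <code>linguistic_complexity</code> and the parameter <code>max_word_len</code> are hypothetical, chosen only for this illustration; the cited algorithms may differ in detail.

```python
def vocabulary_usage(seq, k, alphabet_size=4):
    """Fraction of the possible k-letter words that actually occur in seq.

    Assumption: the maximum vocabulary is limited both by the alphabet
    (alphabet_size ** k) and by the number of windows (len(seq) - k + 1).
    """
    n_windows = len(seq) - k + 1
    if k < 1 or n_windows < 1:
        raise ValueError("k must be between 1 and len(seq)")
    # Collect every distinct overlapping word of length k.
    observed = len({seq[i:i + k] for i in range(n_windows)})
    max_possible = min(alphabet_size ** k, n_windows)
    return observed / max_possible


def linguistic_complexity(seq, max_word_len=None, alphabet_size=4):
    """Product of vocabulary usages for word lengths 1..max_word_len.

    Assumption: when max_word_len is omitted, all word lengths up to
    the full sequence length are used.
    """
    seq = seq.upper()
    if max_word_len is None:
        max_word_len = len(seq)
    complexity = 1.0
    for k in range(1, max_word_len + 1):
        complexity *= vocabulary_usage(seq, k, alphabet_size)
    return complexity


if __name__ == "__main__":
    # A repetitive sequence should score lower than a varied one.
    print(linguistic_complexity("AAAAAAAA", max_word_len=3))
    print(linguistic_complexity("ACGTTGCA", max_word_len=3))
```

Under these assumptions a value of 1 means that every word length uses its full possible vocabulary, while values near 0 indicate a highly repetitive sequence. In practice the measure is usually computed in a sliding window along the sequence, with <code>max_word_len</code> kept small.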