Revision as of 10:21, 14 March 2012 Helpful Pixie Bot (talk | contribs) Bots 571,497 edits m Dated {{Clarify}}. (Build J3)		Revision as of 11:17, 14 March 2012 Stigmatella aurantiaca (talk | contribs) Extended confirmed users 8,849 edits Opening made bold, removed repetitiveness.
Line 1: {{Original research|date=March 2012}} ~~The~~ '''Linguistic ~~linguistic~~ sequence complexity''' (LC) is a measure of the 'vocabulary richness' of a text.<ref name=Trifonov1990>{{cite book| author=[http://evolution.haifa.ac.il/index.php/people/item/40-edward-n-trifonov-phd Edward N. Trifonov] |year=1990| book=Structure & Methods| title=Structure and Methods| series= Human Genome Initiative and DNA Recombination| volume=1| pages=69–77|chapter=Making sense of the human genome|publisher=Adenine Press, New York}}</ref> ~~is a measure of the 'vocabulary richness' of a text.~~ When a [[nucleotide]] sequence is studied as a text written in the four-letter alphabet, the repetitiveness of such a text, that is, the repetition of its [[N-gram|N-grams (words)]], can be calculated and serves as a measure of sequence complexity. Thus, the more complex a [[DNA_sequence|DNA sequence]], the richer its [[oligonucleotide]] vocabulary, whereas repetitious sequences have relatively lower complexities. We have recently improved the original algorithm described in (Trifonov 1990)<ref name=Trifonov1990/> without changing the essence of the linguistic complexity approach.{{Or|date=March 2012}}<ref name=Gabrielian1999>{{cite doi|10.1016/S0097-8485(99)00007-8|noedit}}</ref><ref name=Orlov2004>{{cite doi|10.1093/nar/gkh466|noedit}}</ref><ref name=Janson2004>{{cite doi|10.1016/j.tcs.2004.06.023|noedit}}</ref>
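
The idea of measuring repetitiveness through N-gram vocabulary can be illustrated with a minimal Python sketch. It is not taken from the article or from the cited papers: the function names `vocabulary_usage` and `linguistic_complexity`, the choice of multiplying the per-length ratios, and the cap on the maximum vocabulary size are assumptions made for illustration, following one commonly described formulation in which the number of distinct words of each length is compared with the largest number that could occur.

```python
def vocabulary_usage(seq: str, k: int, alphabet_size: int = 4) -> float:
    """Fraction of the attainable k-letter vocabulary that occurs in seq.

    Hypothetical helper: the attainable vocabulary is capped both by the
    alphabet (alphabet_size ** k) and by the number of windows in seq.
    """
    n_windows = len(seq) - k + 1
    if k < 1 or n_windows <= 0:
        raise ValueError("k must be between 1 and the sequence length")
    # Collect every distinct word of length k found in the sequence.
    observed = {seq[i:i + k] for i in range(n_windows)}
    max_possible = min(alphabet_size ** k, n_windows)
    return len(observed) / max_possible


def linguistic_complexity(seq: str, max_k: int, alphabet_size: int = 4) -> float:
    """Product of vocabulary usages for word lengths 1..max_k (assumed form)."""
    score = 1.0
    for k in range(1, max_k + 1):
        score *= vocabulary_usage(seq, k, alphabet_size)
    return score


if __name__ == "__main__":
    for example in ("AAAAAAAA", "ACGTACGT", "ACGGTCAT"):
        print(example, linguistic_complexity(example, max_k=3))
```

Under these assumptions a homopolymer such as `AAAAAAAA` scores far lower than a sequence that uses all four nucleotides, which matches the statement that repetitious sequences have relatively lower complexities.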

Linguistic sequence complexity: Difference between revisions