Linguistic sequence complexity

The linguistic complexity (LC) measure<ref>{{cite book |author=[http://evolution.haifa.ac.il/index.php/people/item/40-edward-n-trifonov-phd Edward N. Trifonov] |year=1990 |title=Structure and Methods |series=Human Genome Initiative and DNA Recombination |volume=1 |pages=69–77 |chapter=Making sense of the human genome |publisher=Adenine Press |location=New York}}</ref> was introduced as a measure of the ‘vocabulary richness’ of a text.
When a [[nucleotide]] sequence is studied as a text written in the four-letter alphabet, the repetitiveness of that text, that is, the extensive repetition of some [[N-gram|N-grams (words)]], can be calculated and serves as a measure of sequence complexity. Thus, the more complex a [[DNA_sequence|DNA sequence]], the richer its oligonucleotide vocabulary, whereas repetitive sequences have relatively low complexity. We have recently improved the original algorithm described by Trifonov (1990) without changing the essence of the linguistic complexity approach.
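The following is a minimal sketch of how vocabulary richness can be turned into a number. It assumes one commonly used formulation, which is not necessarily the improved algorithm mentioned above. For each word length ''k'', the vocabulary usage ''U''<sub>''k''</sub> is the number of distinct ''k''-mers observed in a sequence of length ''N'', divided by the largest number that could possibly occur, min(4<sup>''k''</sup>, ''N'' − ''k'' + 1). LC is then taken as the product of ''U''<sub>''k''</sub> over ''k'' = 1 … ''max_k''. The function names and the <code>max_k</code> parameter are hypothetical, chosen for illustration.

```python
from typing import Optional

DNA_ALPHABET = set("ACGT")


def vocabulary_usage(seq: str, k: int) -> float:
    """U_k: distinct k-mers observed divided by the maximum possible number.

    Assumed definition: the maximum is min(4**k, N - k + 1), i.e. limited
    either by the alphabet size or by the number of k-length windows.
    """
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and the sequence length")
    observed = len({seq[i:i + k] for i in range(n - k + 1)})
    possible = min(4 ** k, n - k + 1)
    return observed / possible


def linguistic_complexity(seq: str, max_k: Optional[int] = None) -> float:
    """Hypothetical LC sketch: product of U_k for k = 1..max_k.

    max_k defaults to the full sequence length; for large k every window is
    usually unique, so those factors equal 1 and do not change the product.
    """
    seq = seq.upper()
    if not seq or set(seq) - DNA_ALPHABET:
        raise ValueError("expected a non-empty sequence over A, C, G, T")
    if max_k is None:
        max_k = len(seq)
    lc = 1.0
    for k in range(1, max_k + 1):
        lc *= vocabulary_usage(seq, k)
    return lc


if __name__ == "__main__":
    # A sequence with no repeated words scores 1; a homopolymer scores low.
    for s in ("ACGT", "ACGTACGT", "AAAAAAAA"):
        print(s, linguistic_complexity(s))
```

Under these assumptions, <code>ACGT</code> scores 1.0 because every word of every length is new, whereas the tandem repeat <code>ACGTACGT</code> and the homopolymer <code>AAAAAAAA</code> score progressively lower. Because the product shrinks quickly for long sequences, other formulations instead divide the sum of observed word counts by the sum of the maximal counts; the choice changes the scale of LC but not the idea of rewarding a rich oligonucleotide vocabulary.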