Linguistic sequence complexity: Difference between revisions

m +wikilinks to Edward Trifonov on Wikipedia
fix cite error "book=" as "work="
'''Linguistic sequence complexity''' (LC) is a measure of the 'vocabulary richness' of a text.<ref name=Trifonov1990>{{cite book| author=[[Edward N. Trifonov]] |year=1990| work=Structure & Methods| title=Structure and Methods| series= Human Genome Initiative and DNA Recombination| volume=1| pages=69–77|chapter=Making sense of the human genome|publisher=Adenine Press, New York}}</ref>
When a [[nucleotide]] sequence is written as text over the four-letter alphabet A, C, G, T, the repetitiveness of that text, that is, how often its [[N-gram|N-grams (words)]] recur, can be calculated and used as a measure of sequence complexity. The more complex a [[DNA sequence]], the richer its [[oligonucleotide]] vocabulary; repetitive sequences have comparatively low complexity. Subsequent work improved the original algorithm of [[Edward Trifonov|Trifonov]] (1990)<ref name=Trifonov1990/> without changing the essence of the linguistic complexity approach.<ref name=Gabrielian1999>{{cite doi|10.1016/S0097-8485(99)00007-8|noedit}}</ref><ref name=Orlov2004>{{cite doi|10.1093/nar/gkh466|noedit}}</ref><ref name=Janson2004>{{cite doi|10.1016/j.tcs.2004.06.023|noedit}}</ref>
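The word counting described above can be sketched in Python. This is a minimal illustration, not Trifonov's published algorithm. It assumes a commonly cited formulation: for each word length k, the vocabulary usage U_k is the number of distinct k-letter words observed divided by the maximum number possible (the smaller of 4^k and N - k + 1 for a sequence of length N), and LC is the product of U_k over k = 1 … max_k. The function names and the <code>max_k</code> and <code>alphabet_size</code> parameters are hypothetical.

```python
from typing import Optional


def vocabulary_usage(seq: str, k: int, alphabet_size: int = 4) -> float:
    """Share of the possible distinct k-letter words that occur in seq."""
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("word length must be between 1 and len(seq)")
    # Collect every overlapping word (k-mer) of length k.
    observed = len({seq[i:i + k] for i in range(n - k + 1)})
    # At most n - k + 1 words fit in the sequence, and at most
    # alphabet_size ** k distinct words exist over the alphabet.
    maximum = min(alphabet_size ** k, n - k + 1)
    return observed / maximum


def linguistic_complexity(seq: str, max_k: Optional[int] = None,
                          alphabet_size: int = 4) -> float:
    """Product of vocabulary usages for word lengths 1..max_k.

    Assumed definition (product form); other published variants differ.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    if max_k is None:
        max_k = len(seq)  # consider every word length up to the full sequence
    lc = 1.0
    for k in range(1, max_k + 1):
        lc *= vocabulary_usage(seq, k, alphabet_size)
    return lc


if __name__ == "__main__":
    # Compare a periodic sequence with a homopolymer, using words up to length 3.
    for s in ("ACGTACGT", "AAAAAAAA"):
        print(s, linguistic_complexity(s, max_k=3))
```

A sequence such as ACGT, in which every possible word of each length appears, scores 1, while a repetitive sequence such as AAAAAAAA scores far lower. The set-based counting favours clarity over speed and would be slow for long sequences.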