'''Linguistic sequence complexity''' (LC) is a measure of the 'vocabulary richness' of a genetic text in [[gene sequence]]s.<ref name=Trifonov1990>{{cite book| author=[[Edward N. Trifonov]] |year=1990| title=Structure and Methods, Vol. 1 | series=Human Genome Initiative and DNA Recombination; Proceedings of the Sixth Conversation in the Discipline Biomolecular Stereodynamics |pages=69–77 |chapter=Making sense of the human genome|publisher=Adenine Press |location=Albany, New York}}</ref> When a [[nucleotide]] sequence is written as text in a four-letter alphabet, the repetitiveness of that text, that is, the repetition of its [[N-gram]]s (words), can be calculated and serves as a measure of sequence complexity. Thus, the more complex a [[DNA sequence]], the richer its [[oligonucleotide]] vocabulary, whereas repetitive sequences have relatively low complexity. Subsequent work improved the original algorithm described in [[Edward Trifonov|Trifonov]] (1990)<ref name=Trifonov1990 /> without changing the essence of the linguistic complexity approach.<ref name=Gabrielian1999>{{cite doi|10.1016/S0097-8485(99)00007-8|noedit}}</ref><ref name=Orlov2004>{{cite doi|10.1093/nar/gkh466|noedit}}</ref><ref name=Janson2004>{{cite doi|10.1016/j.tcs.2004.06.023|noedit}}</ref>
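The counting of words described above can be illustrated with a minimal Python sketch. It assumes one common formulation of LC that is not spelled out in this text: for each word length ''k'', the vocabulary usage ''U''<sub>''k''</sub> is the number of distinct ''k''-letter words observed in a sequence of length ''N'', divided by the maximum possible, min(4<sup>''k''</sup>, ''N'' − ''k'' + 1), and LC is the product of the ''U''<sub>''k''</sub> values over the word lengths considered. The function names <code>vocabulary_usage</code> and <code>linguistic_complexity</code> and the <code>max_k</code> parameter are hypothetical and are not taken from the cited works.

```python
from typing import Optional


def vocabulary_usage(seq: str, k: int, alphabet_size: int = 4) -> float:
    """Return U_k: distinct k-letter words observed / maximum possible.

    Assumed definition (illustrative, not taken from the cited papers).
    """
    n = len(seq)
    if k < 1 or k > n:
        raise ValueError("word length k must satisfy 1 <= k <= len(seq)")
    # Every overlapping k-letter window of the sequence, de-duplicated.
    observed = {seq[i:i + k] for i in range(n - k + 1)}
    # A sequence of length n has only n - k + 1 windows, so its vocabulary can
    # exceed neither that count nor the alphabet_size**k words that exist.
    max_possible = min(alphabet_size ** k, n - k + 1)
    return len(observed) / max_possible


def linguistic_complexity(seq: str, max_k: Optional[int] = None) -> float:
    """Hypothetical helper: product of U_k for k = 1..max_k (default: all k)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    n = len(seq)
    # Word lengths longer than the sequence have no windows, so cap at n.
    max_k = n if max_k is None else min(max_k, n)
    lc = 1.0
    for k in range(1, max_k + 1):
        lc *= vocabulary_usage(seq, k)
    return lc


if __name__ == "__main__":
    print(f"{linguistic_complexity('ACGT'):.4f}")  # 1.0000: every possible word occurs
    print(f"{linguistic_complexity('AAAA'):.4f}")  # 0.0417 (= 1/4 * 1/3 * 1/2 * 1)
```

Under this assumed definition a sequence in which every possible word occurs scores 1, while a homopolymer such as AAAA scores far lower, in line with the statement that repetitive sequences have low complexity. The published variants may differ in detail, for example in which word lengths are included.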
