Sequence alignment: Difference between revisions

 
==Representations==
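Ref. : GTCGTAGAATA<br />
[[Read (biology)|Read]]: CACGTAG--TA<br />
CIGAR: 2S5M2D2M<br />

where:<br />
2S = 2 soft-clipped bases (present in the read but excluded from the alignment; they may be mismatches or overhang beyond the matched sequence)<br />
5M = 5 aligned positions (sequence matches or mismatches)<br />
2D = 2 deletions relative to the reference<br />
2M = 2 aligned positions (sequence matches or mismatches)<br />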
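
The example above can be reproduced mechanically by walking through the CIGAR operations. The following Python sketch is illustrative only: the function names <code>parse_cigar</code> and <code>render_alignment</code> are hypothetical, only the M, =, X, S, D and I operations are handled, and it assumes the reference string starts at the column where the first read base is drawn. For display it lays soft-clipped bases (S) against reference columns as in the example, although in the SAM format soft clipping does not consume reference positions.

```python
import re

# One CIGAR element: a run length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def render_alignment(ref, read, cigar):
    """Return (ref_row, read_row) with '-' marking gap positions.

    Assumption for display purposes: soft-clipped bases are drawn
    against reference columns, mirroring the example in the text.
    """
    ref_row, read_row = [], []
    i = j = 0  # i indexes the reference, j indexes the ungapped read
    for n, op in parse_cigar(cigar):
        if op in "M=XS":          # a column for both sequences
            ref_row.append(ref[i:i + n])
            read_row.append(read[j:j + n])
            i += n
            j += n
        elif op == "D":           # bases present only in the reference
            ref_row.append(ref[i:i + n])
            read_row.append("-" * n)
            i += n
        elif op == "I":           # bases present only in the read
            ref_row.append("-" * n)
            read_row.append(read[j:j + n])
            j += n
        else:
            raise ValueError(f"operation {op!r} not handled in this sketch")
    return "".join(ref_row), "".join(read_row)


ref_row, read_row = render_alignment("GTCGTAGAATA", "CACGTAGTA", "2S5M2D2M")
print(ref_row)
print(read_row)
```

Here the read is passed without gap characters (<code>CACGTAGTA</code>); the two deleted reference bases are what produce the <code>--</code> in the rendered read row.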
 
 
Alignments are commonly represented both graphically and in text format. In almost all sequence alignment representations, sequences are written in rows arranged so that aligned residues appear in successive columns. In text formats, aligned columns containing identical or similar characters are indicated with a system of conservation symbols. As in the image above, an asterisk or pipe symbol is used to show identity between two columns; other less common symbols include a colon for conservative substitutions and a period for semiconservative substitutions. Many sequence visualization programs also use color to display information about the properties of the individual sequence elements; in DNA and RNA sequences, this typically means assigning each nucleotide its own color. In protein alignments, such as the one in the image above, color is often used to indicate amino acid properties to aid in judging the [[conservation (genetics)|conservation]] of a given amino acid substitution. For multiple sequence alignments, the last row is often the [[consensus sequence]] determined by the alignment; the consensus sequence is also often represented in graphical format with a [[sequence logo]] in which the size of each nucleotide or amino acid letter corresponds to its degree of conservation.<ref name=Schneider>{{cite journal| journal=Nucleic Acids Res | volume=18 | pages=6097–6100 | year=1990 |author1=Schneider TD |author2=Stephens RM | title=Sequence logos: a new way to display consensus sequences |pmid=2172928 |pmc=332411 |url=http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=2172928 |doi=10.1093/nar/18.20.6097| issue=20}}</ref>
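
The identity symbols and the consensus row described above can be derived column by column. The Python sketch below is a simplified illustration under stated assumptions: the function names <code>identity_line</code> and <code>consensus</code> are hypothetical, only identity (asterisk) is marked rather than conservative or semiconservative substitutions, and the consensus is a plain majority vote per column that ignores gaps. Alignment programs generally apply more elaborate rules.

```python
from collections import Counter


def identity_line(row_a, row_b, symbol="*"):
    """Mark columns where two aligned rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        symbol if a == b and a != "-" else " "
        for a, b in zip(row_a, row_b)
    )


def consensus(rows):
    """Majority residue per column, ignoring gap characters.

    Simplifying assumption: a column consisting only of gaps yields
    '-', and ties are not given any special treatment.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    result = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != "-")
        result.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(result)


print("GTCGTAGAATA")
print("CACGTAG--TA")
print(identity_line("GTCGTAGAATA", "CACGTAG--TA"))
print(consensus(["ACGT", "ACGA", "TCGT"]))
```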