Multiple sequence alignment

[[Image:RPLP0_90_ClustalW_aln.gif|right|thumb|300px|First 90 positions of a protein multiple sequence alignment of instances of the acidic ribosomal protein P0 (L10E) from several organisms. Generated with [[ClustalW]].]]
 
The multiple alignment problem consists of inferring all [[Homology (biology)|homologous]] characters among three or more biological sequences. The characters may consist of single [[Nucleotide|nucleotides]], [[Amino acid|amino acids]], genes, or any sequence segments that may share an evolutionary origin.
Multiple alignments may be used to study which sequences have been conserved over time. This is the starting point of [[Comparative genomics|comparative genomics]] and [[Molecular phylogeny|molecular phylogenetics]] studies. The theoretical basis for multiple sequence alignments is that the sequences have evolved by [[point mutation]]s, [[deletion]]s or [[insertions]]. A point mutation leaves differing characters in the same column of the alignment, while a deletion or insertion leaves gaps in the columns it affects.
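
As an illustration of how these events show up in an alignment, the following minimal sketch labels each column of a small, made-up alignment. The function name <code>classify_columns</code>, the labels and the toy sequences are assumptions chosen for this example and are not taken from any particular program. A column containing a gap is attributed to an insertion or deletion, a gap-free column with differing characters to a point mutation, and a column of identical characters is treated as conserved.

```python
def classify_columns(alignment, gap="-"):
    """Label each alignment column as 'conserved', 'substitution' or 'gap'."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    labels = []
    # zip(*alignment) walks the alignment column by column.
    for column in zip(*alignment):
        if gap in column:
            labels.append("gap")           # insertion or deletion
        elif len(set(column)) == 1:
            labels.append("conserved")     # identical in every sequence
        else:
            labels.append("substitution")  # consistent with a point mutation
    return labels


# Hypothetical toy alignment of three short nucleotide sequences.
toy_alignment = [
    "ACG-TA",
    "ACGTTA",
    "ATG-TA",
]
print(classify_columns(toy_alignment))
```

Real analyses usually go further than this three-way labelling, for example by weighting substitutions with a scoring matrix.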
 
==Multiple sequence alignment programs and techniques==
 
A common approach for multiple sequence alignment is to progressively align sequences using a guide tree. Initially, each sequence at a leaf of the tree is represented as a trivial alignment of a single sequence. Then, at each internal node, the alignments at its children are merged into a single alignment of alignments. At the end, the root contains an alignment of all the sequences. This is called ''progressive alignment''. Usually, the alignments at the internal nodes are computed as [[pairwise alignment]]s in which each of the two child alignments is treated as a single sequence. Two sequences can be aligned using dynamic programming techniques such as [[Smith-Waterman]] together with a scoring matrix such as [[BLOSUM]] or [[PAM]].
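
The procedure can be sketched in a few lines of Python. This is a simplified illustration under several assumptions and not the method of any specific program: the guide tree is supplied by hand as nested tuples instead of being computed, the pairwise step uses global dynamic programming (in the style of Needleman-Wunsch) rather than Smith-Waterman, and a flat match/mismatch/gap score stands in for a matrix such as BLOSUM or PAM. The names <code>column_score</code>, <code>align_alignments</code> and <code>progressive_align</code> are hypothetical.

```python
GAP = "-"


def column_score(col_a, col_b, match=1, mismatch=-1, gap=-2):
    """Average pairwise score between two alignment columns."""
    total = 0
    for x in col_a:
        for y in col_b:
            if x == GAP and y == GAP:
                continue              # gap against gap scores 0
            if x == GAP or y == GAP:
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total / (len(col_a) * len(col_b))


def align_alignments(a, b, gap=-2):
    """Globally align two alignments, treating each column as one symbol."""
    cols_a, cols_b = list(zip(*a)), list(zip(*b))
    n, m = len(cols_a), len(cols_b)

    # Fill the dynamic programming table.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1], gap=gap),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner, building columns in reverse.
    merged = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + column_score(
            cols_a[i - 1], cols_b[j - 1], gap=gap
        ):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            merged.append(cols_a[i - 1] + (GAP,) * len(b))
            i -= 1
        else:
            merged.append((GAP,) * len(a) + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


def progressive_align(tree):
    """Align the sequences at the leaves of a guide tree given as nested tuples."""
    if isinstance(tree, str):
        return [tree]                 # leaf: trivial one-sequence alignment
    left, right = tree
    return align_alignments(progressive_align(left), progressive_align(right))


# Hypothetical guide tree: the first two sequences are merged first.
for row in progressive_align((("ACGT", "ACT"), "AGT")):
    print(row)
```

Because each merge is fixed once it has been made, gaps introduced low in the tree are carried up to the root, which is why the choice of guide tree affects the final alignment.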
 
Many computer programs produce multiple sequence alignments from a collection of unaligned sequences ([[Clustal|ClustalW]], [[DIALIGN]], [[MAVID]], [[Threaded Blockset Aligner|TBA]], [[T-Coffee]]), and some display those alignments graphically ([[Clustal|ClustalW]], [http://www.sanger.ac.uk/Software/Pfam/help/belvu_setup.shtml Belvu]).