Multiple sequence alignment

A multiple sequence alignment is a sequence alignment of three or more biological sequences.

The multiple alignment problem consists of inferring all homologous characters among these sequences. The characters may be single nucleotides, amino acids, genes, or any sequence segments that share an evolutionary origin. Multiple alignments can be used to identify which parts of the sequences have been conserved over time, and they are the starting point for comparative genomics and molecular phylogenetics studies. The theoretical basis for multiple sequence alignment is that the sequences have evolved by point mutations, insertions and deletions. A point mutation appears as differing characters within a single column of the alignment, whereas an insertion or deletion appears as gaps in the columns it affects.
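The column-wise reading of an alignment described above can be illustrated with a minimal Python sketch. The function name `classify_columns`, the three labels, and the toy sequences are assumptions made for this illustration only; they do not come from any particular alignment program.

```python
def classify_columns(alignment):
    """Label each column of an already aligned set of sequences.

    alignment: list of equal-length strings, with '-' marking a gap.
    Returns one label per column: 'conserved', 'substitution' or 'gap'.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    labels = []
    for column in zip(*alignment):
        residues = {c for c in column if c != "-"}
        if "-" in column:
            # At least one sequence has a gap here: consistent with
            # an insertion or a deletion.
            labels.append("gap")
        elif len(residues) == 1:
            # Every sequence carries the same character.
            labels.append("conserved")
        else:
            # Differing characters in one column: consistent with
            # a point mutation.
            labels.append("substitution")
    return labels


# Toy alignment, made up for illustration.
toy = [
    "ACGT-A",
    "ACCTGA",
    "ACGTGA",
]
print(classify_columns(toy))
# ['conserved', 'conserved', 'substitution', 'conserved', 'gap', 'conserved']
```

In this toy example the third column holds a substitution (G, C, G) and the fifth column holds a gap in the first sequence.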

Multiple sequence alignment programs and techniques

A common approach to multiple sequence alignment is progressive alignment, which aligns the sequences step by step along a guide tree. Initially, each leaf holds a trivial alignment consisting of a single sequence. Then, at each internal node, the alignments at its children are merged into a single alignment. At the end, the root holds an alignment of all the sequences. The merges at the internal nodes are usually performed as pairwise alignments in which each of the two child alignments is treated as a single sequence. Two sequences can be aligned with a dynamic programming technique such as the Needleman-Wunsch algorithm, scored with a substitution matrix such as BLOSUM or Margaret Dayhoff's PAM (Point Accepted Mutation).
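The pairwise step can be sketched as a small Needleman-Wunsch implementation in Python. This is an illustrative sketch only: it uses a flat match/mismatch score and a linear gap penalty (the parameters `match`, `mismatch` and `gap` are assumptions chosen for brevity) rather than a BLOSUM or PAM matrix, and it aligns two plain sequences, not two sub-alignments as a progressive aligner would.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not a real substitution matrix.
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(row_a)
print(row_b)
```

Several alignments can share the optimal score, so the traceback returns only one of them. A practical aligner would replace the flat `match`/`mismatch` values with lookups in a substitution matrix and would often use a more refined gap model.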

Many computer programs produce multiple sequence alignments from a collection of unaligned sequences (ClustalW, DIALIGN, MAVID, TBA, T-Coffee), and some display such alignments graphically (ClustalW, Belvu).

References

Survey articles

Duret, L. (2000). "Multiple alignment for structural, functional or phylogenetic analyses of homologous sequences". In D. Higgins and W. Taylor (eds.). Bioinformatics: sequence, structure and databanks. Oxford: Oxford University Press.

Notredame, C. (2002). "Recent progresses in multiple sequence alignment: a survey". Pharmacogenomics. 3 (1): 131-144.

Thompson, J. D. (1999). "A comprehensive comparison of multiple sequence alignment programs". Nucleic Acids Research. 27 (13): 2682-2690.
