Multiple sequence alignment

This is an old revision of this page, as edited by Wzhao553 at 18:53, 30 April 2006 (revise notes on progressive alignment). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In bioinformatics, multiple sequence alignment is used to study evolutionary relationships between protein or gene sequences. Because evolutionary changes between gene sequences accumulate incrementally, genes that share a common evolutionary origin (or their protein products) can be compared by aligning identical or similar residues.

First 90 positions of a protein multiple sequence alignment of instances of the acidic ribosomal protein P0 (L10E) from several organisms. Generated with ClustalW.

A multiple sequence alignment is an arrangement of several DNA or protein sequences, one above another, with gaps inserted where needed so that residues likely to play equivalent roles in the sequences occupy the same column.
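As a minimal sketch, assuming an alignment is stored as a list of equal-length strings in which `-` marks a gap, each column collects the residues placed at the same position, and deleting the gaps from a row recovers the original sequence. The helper names (`check_alignment`, `columns`, `ungap`) and the three toy sequences are hypothetical and made up for illustration; they are not real proteins.

```python
from typing import List

GAP = "-"  # gap character (assumption: the common '-' convention)

def check_alignment(rows: List[str]) -> int:
    """Return the alignment length after checking every row has the same length."""
    if not rows:
        raise ValueError("an alignment needs at least one sequence")
    length = len(rows[0])
    for row in rows:
        if len(row) != length:
            raise ValueError("all aligned rows must have the same length")
    return length

def columns(rows: List[str]) -> List[str]:
    """Return each column as a string of the residues (or gaps) stacked at that position."""
    check_alignment(rows)
    return ["".join(col) for col in zip(*rows)]

def ungap(row: str) -> str:
    """Recover the original, unaligned sequence by deleting gap characters."""
    return row.replace(GAP, "")

# Made-up toy sequences, for illustration only.
aligned = [
    "MKV-LAG",
    "MKVQLAG",
    "MRV-LSG",
]
print(columns(aligned)[:3])   # ['MMM', 'KKR', 'VVV']
print(ungap(aligned[0]))      # MKVLAG
```

Because every row has the same length, a column index refers to the same alignment position in every sequence, which is what lets later analyses compare residues column by column.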

The alignment can then be used to study which regions of the genes have been conserved over evolutionary time and which have accumulated mutations. This information is useful for designing experiments that test or modify the function of specific proteins, for predicting protein structure and function, and for identifying new members of protein families.
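One simple way to locate conserved regions is to score each column by the fraction of sequences that carry its most common residue. The sketch below assumes that identity-based score, with gaps counting against conservation; it is an illustrative choice, not the method of any particular program, and tools in practice often use entropy- or substitution-matrix-based scores instead. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from typing import List

GAP = "-"

def column_identity(column: str) -> float:
    """Fraction of sequences carrying the most frequent non-gap residue in this column.

    Gaps count against conservation: they are in the denominator
    but never in the numerator.
    """
    residues = [c for c in column if c != GAP]
    if not residues:
        return 0.0
    _, top_count = Counter(residues).most_common(1)[0]
    return top_count / len(column)

def conserved_positions(rows: List[str], threshold: float = 1.0) -> List[int]:
    """Return 0-based column indices whose identity score is at least `threshold`."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be aligned to the same length")
    return [j for j, col in enumerate(zip(*rows))
            if column_identity("".join(col)) >= threshold]

# Made-up toy alignment, for illustration only.
aligned = ["MKV-LAG", "MKVQLAG", "MRV-LSG"]
print(conserved_positions(aligned))        # [0, 2, 4, 6]
print(conserved_positions(aligned, 0.6))   # [0, 1, 2, 4, 5, 6]
```

Lowering the threshold admits columns where most, but not all, sequences agree; the column containing an insertion in only one sequence scores lowest.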

Producing a multiple alignment of the sequences in a protein family is also a necessary step in studying the family's phylogeny.
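Distance-based phylogenetic methods start from pairwise distances read off the alignment. A minimal sketch, assuming the uncorrected p-distance (the proportion of differing residues) and skipping any column where either sequence has a gap; phylogenetic software usually corrects such raw distances with an evolutionary model, so this only illustrates why the alignment must come first. The function names, sequence names, and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

GAP = "-"

def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue  # pairwise deletion: skip columns with a gap in either row
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared

def distance_matrix(names: List[str], rows: List[str]) -> Dict[Tuple[str, str], float]:
    """Symmetric p-distances for every pair of aligned sequences."""
    d: Dict[Tuple[str, str], float] = {}
    for (n1, r1), (n2, r2) in combinations(zip(names, rows), 2):
        d[(n1, n2)] = d[(n2, n1)] = p_distance(r1, r2)
    return d

# Made-up toy alignment, for illustration only.
names = ["seqA", "seqB", "seqC"]
aligned = ["MKV-LAG", "MKVQLAG", "MRV-LSG"]
dist = distance_matrix(names, aligned)
print(round(dist[("seqA", "seqC")], 3))   # 0.333
```

The distances are only meaningful because the alignment has already decided which residues correspond; computed on unaligned sequences, position-by-position comparison would be meaningless as soon as one sequence had an insertion.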

Multiple sequence alignment programs and techniques

A common approach to multiple sequence alignment is to align the sequences progressively, following a guide tree. Each leaf of the tree starts as a trivial alignment containing a single sequence. At each internal node, the alignments of its children are merged into one alignment (an alignment of alignments). When the root is reached, it holds an alignment of all the sequences. This approach is called progressive alignment.
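The procedure above can be sketched as follows. Assumptions, all labeled: the guide tree is written by hand as nested tuples of sequence names (programs such as ClustalW instead build it from pairwise distances); two alignments are merged with a Needleman-Wunsch dynamic program over columns using a sum-of-pairs column score; and the match, mismatch and linear gap scores are toy values, whereas real aligners use substitution matrices and affine gap penalties. All function names and sequences are hypothetical.

```python
from typing import Dict, List, Tuple, Union

GAP = "-"
MATCH, MISMATCH, GAP_SCORE = 1, -1, -2    # toy scores (assumption)
Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf = sequence name

def pair_score(x: str, y: str) -> int:
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return GAP_SCORE
    return MATCH if x == y else MISMATCH

def column_score(col_a: str, col_b: str) -> int:
    """Sum-of-pairs score for placing column col_a opposite column col_b."""
    return sum(pair_score(x, y) for x in col_a for y in col_b)

def align_profiles(a: List[str], b: List[str]) -> List[str]:
    """Merge two alignments; each keeps its own columns, gap columns may be added."""
    cols_a = ["".join(c) for c in zip(*a)]
    cols_b = ["".join(c) for c in zip(*b)]
    gap_a, gap_b = GAP * len(a), GAP * len(b)
    n, m = len(cols_a), len(cols_b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + column_score(cols_a[i - 1], gap_b)
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + column_score(gap_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(
                s[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                s[i - 1][j] + column_score(cols_a[i - 1], gap_b),
                s[i][j - 1] + column_score(gap_a, cols_b[j - 1]),
            )
    # Trace back from the bottom-right corner, collecting merged columns in reverse.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + column_score(cols_a[i - 1], gap_b):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    # Transpose merged columns back into rows: rows of a first, then rows of b.
    return ["".join(col[k] for col in merged) for k in range(len(a) + len(b))]

def progressive_align(tree: Tree, seqs: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (names, aligned rows) for a subtree, merging children bottom-up."""
    if isinstance(tree, str):  # leaf: a trivial one-sequence alignment
        return [tree], [seqs[tree]]
    left, right = tree
    names_l, rows_l = progressive_align(left, seqs)
    names_r, rows_r = progressive_align(right, seqs)
    return names_l + names_r, align_profiles(rows_l, rows_r)

# Made-up toy sequences and a hand-written guide tree, for illustration only.
seqs = {"seqA": "MKVLAG", "seqB": "MKVQLAG", "seqC": "MRVLSG"}
names, rows = progressive_align((("seqA", "seqB"), "seqC"), seqs)
for name, row in zip(names, rows):
    print(f"{name}  {row}")
# seqA  MKV-LAG
# seqB  MKVQLAG
# seqC  MRV-LSG
```

Because `align_profiles` only ever inserts whole gap columns into the two alignments it merges, gaps placed at a lower node are kept at every node above it. This makes the method fast, but an error made early in the tree cannot be corrected later.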

Many computer programs build multiple sequence alignments from a collection of unaligned sequences (e.g. ClustalW, T-Coffee), and programs such as ClustalW and Belvu display those alignments graphically.
