Multiple sequence alignment: Difference between revisions

A set of methods that produce MSAs while reducing the errors inherent in progressive methods is classified as "iterative": these methods work similarly to progressive methods but repeatedly realign the initial sequences as well as adding new sequences to the growing MSA. One reason progressive methods depend so strongly on a high-quality initial alignment is that these alignments are always incorporated into the final result; that is, once a sequence has been aligned into the MSA, its alignment is not considered further. This approximation improves efficiency at the cost of accuracy. By contrast, iterative methods can return to previously calculated pairwise alignments or sub-MSAs incorporating subsets of the query sequences as a means of optimizing a general [[objective function]] such as the overall alignment score.
 
A variety of subtly different iteration methods have been implemented and made available in software packages; reviews and comparisons have been useful but generally refrain from choosing a "best" technique.<ref name="hirosawa">Hirosawa M, Totoki Y, Hoshida M, Ishikawa M. (1995). Comprehensive study on iterative algorithms of multiple sequence alignment. ''Comput Appl Biosci'' 11:13-18.</ref> The software package [http://prrn.hgc.jp/ PRRN/PRRP] uses a [[hill-climbing algorithm]] to optimize its alignment score<ref name="gotoh">Gotoh O. (1996). Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. ''J Mol Biol'' 264(4):823-38.</ref> and iteratively corrects both alignment weights and locally divergent or "gappy" regions of the growing MSA.<ref name="mount">Mount DM. (2004). Bioinformatics: Sequence and Genome Analysis 2nd ed. Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY.</ref> PRRP performs best when refining an alignment previously constructed by a faster method.<ref name="mount" />
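
The idea of hill-climbing on an alignment score can be illustrated with a minimal sketch. This is not PRRN/PRRP's actual procedure: the sum-of-pairs scoring values, the single-gap-shift move and the function names <code>sp_score</code>, <code>neighbours</code> and <code>hill_climb</code> are all assumptions chosen for illustration.

```python
from itertools import combinations


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score over every column and every pair of rows.

    The scoring values are illustrative assumptions, not PRRN's.
    """
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


def neighbours(rows):
    """Yield alignments that differ by swapping one gap with an adjacent residue."""
    for r, row in enumerate(rows):
        for i in range(len(row) - 1):
            a, b = row[i], row[i + 1]
            if (a == "-") != (b == "-"):  # exactly one of the two is a gap
                swapped = row[:i] + b + a + row[i + 2:]
                yield rows[:r] + [swapped] + rows[r + 1:]


def hill_climb(rows):
    """Accept the first score-improving neighbour until none remains."""
    rows = list(rows)
    best = sp_score(rows)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(rows):
            score = sp_score(candidate)
            if score > best:
                rows, best, improved = candidate, score, True
                break
    return rows, best


start = ["ACGT-", "A-CGT", "ACGT-"]
refined, refined_score = hill_climb(start)
```

Because each accepted move strictly raises the score and the number of possible alignments of a fixed width is finite, the loop always terminates, although in general it may stop at a local rather than a global optimum.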
 
Another iterative program, DIALIGN, takes the unusual approach of focusing narrowly on local alignments between sub-segments or [[sequence motif]]s without introducing a gap penalty.<ref name="brudno">Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B. (2003). Fast and sensitive multiple alignment of large genomic sequences. ''BMC Bioinformatics'' 4:66.</ref> The alignment of individual motifs is then achieved with a matrix representation similar to a dot-matrix plot in a pairwise alignment. DIALIGN is also available as a web portal at [http://dialign.gobics.de/chaos-dialign-submission CHAOS/DIALIGN].
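
The notion of gap-free local segments on a dot-matrix can be sketched as follows. The example only enumerates maximal runs of identical residues along each diagonal of the comparison matrix; it does not weight the segments or select a mutually consistent set of them, and it should not be read as DIALIGN's actual procedure. The function name <code>gap_free_segments</code> and the <code>min_len</code> threshold are assumptions made for illustration.

```python
def gap_free_segments(x, y, min_len=3):
    """Return (start_x, start_y, length) for maximal exact-match runs.

    Each diagonal of the dot-matrix (fixed offset = i - j) is scanned once,
    so no gaps are ever introduced inside a reported segment.
    """
    segments = []
    for offset in range(-(len(y) - 1), len(x)):
        i, j = max(offset, 0), max(-offset, 0)
        run = 0
        while i < len(x) and j < len(y):
            if x[i] == y[j]:
                run += 1
            else:
                if run >= min_len:
                    segments.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:  # a run may extend to the end of the diagonal
            segments.append((i - run, j - run, run))
    return segments


segments = gap_free_segments("GGACGTCC", "TTACGTAA")
```

In this example the two sequences share only the segment <code>ACGT</code>, which starts at position 2 (counting from zero) in both.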
 
A third popular iteration-based method called MUSCLE (multiple sequence alignment by log-expectation) improves on progressive methods with a more accurate distance measure to assess the relatedness of two sequences.<ref name="edgar">Edgar RC. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. ''Nucleic Acids Research'' 32(5):1792-97.</ref> The distance measure is updated between iteration stages (although, in its original form, MUSCLE contained only 2-3 iterations depending on whether refinement was enabled). A web portal and download site are available at [http://www.drive5.com/muscle/ MUSCLE].
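
The difference between a fast distance computed on unaligned sequences and a distance recomputed from an existing alignment can be sketched as follows. The two measures shown, a shared ''k''-mer fraction and a Kimura-style correction of fractional identity, are loosely modeled on those described for MUSCLE but are simplified assumptions, as are the function names and the default <code>k=3</code>; the sketch is not the program's implementation.

```python
import math
from collections import Counter


def kmer_distance(x, y, k=3):
    """Distance from the fraction of k-mers shared by two unaligned sequences."""
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("sequences must be at least k residues long")
    count_x = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    count_y = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    shared = sum((count_x & count_y).values())  # multiset intersection
    return 1.0 - shared / denom


def kimura_distance(row_x, row_y):
    """Kimura-style distance from the fractional identity of two aligned rows."""
    pairs = [(a, b) for a, b in zip(row_x, row_y) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("rows share no aligned residue pairs")
    identity = sum(a == b for a, b in pairs) / len(pairs)
    d = 1.0 - identity
    arg = 1.0 - d - d * d / 5.0
    if arg <= 0:
        return math.inf  # correction is undefined for very divergent pairs
    return -math.log(arg)


draft = kmer_distance("ACGTACGT", "ACGTTCGT")    # before any alignment exists
updated = kimura_distance("ACGTACGT", "ACGTTCGT")  # after the rows are aligned
```

The first function needs no alignment and is therefore suited to building an initial guide tree, whereas the second can only be evaluated once a pairwise or multiple alignment is available.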
 
==Hidden Markov models==