Multiple sequence alignment: Difference between revisions

Content deleted Content added
Iterative methods: chaos/dialign ref
Hidden Markov models: refs to original papers on some methods
Line 35:
Typical HMM-based methods work by representing an MSA as a form of [[directed acyclic graph]] known as a partial-order graph, which consists of a series of nodes representing possible entries in the columns of an MSA. In this representation a column that is absolutely conserved (that is, that all the sequences in the MSA share a particular character at a particular position) is coded as a single node with as many outgoing connections as there are possible characters in the next column of the alignment. In the terms of a typical hidden Markov model, the observed states are the individual alignment columns and the "hidden" states represent the presumed ancestral sequence from which the sequences in the query set are hypothesized to have descended. A efficient search variant of the dynamic programming method, known as the [[Viterbi algorithm]], is generally used to successively align the growing MSA to the next sequence in the query set to produce a new MSA.<ref name="hughey">Hughey R, Krogh A. (1996). Hidden Markov models for sequence analysis: extension and analysis of the basic method. ''CABIOS'' 12(2):95-107.</ref> This is distinct from progressive alignment methods because the alignment of prior sequences is updated at each new sequence addition. However, like progressive methods, this technique can be influenced by the order in which the sequences in the query set are integrated into the alignment, especially when the sequences are distantly related.<ref name="mount" />
 
Several software programs are available in which variants of HMM-based methods have been implemented and which are noted for their scalability and efficiency, although properly using an HMM method is more complex than using more common progressive methods. The simplest is [http://sourceforge.net/project/showfiles.php?group_id=168080 POA] (Partial-Order Alignment)<!--this download link is temporary, temporarilyremember availableto forreplace downloadwhen viait's [[SourceForge]]fixed--><ref name="grasso">Grasso C, Lee C. (2004). Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. ''Bioinformatics'' 20(10):1546-56.</ref>; a similar but more generalized method is implemented in the package [http://www.cse.ucsc.edu/compbio/sam.html SAM] (Sequence Alignment and Modeling System).<ref name="hughey">Hughey R, whichKrogh isA. alsoSAM: availableSequence asalignment aand [http://wwwmodeling software system.cse.ucsc.edu/compbio/HMM Technical Report UCSC-apps/T02CRL-query.html96-22, webUniversity portal]of California, Santa Cruz, CA, September 1996.</ref> SAM has been used as a source of alignments for [[protein structure prediction]] to participate in the [[CASP]] structure prediction experiment and to develop a database of predicted proteins in the [[yeast]] species [[S. cerevisiae]]. A database-search technique based on HMM methods is available in the program [http://hmmer.wustl.edu/ HMMer].<ref name="durbin">Durbin R, Eddy S, Krogh A, Mitchison G. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press, 1998.</ref>
 
==Genetic algorithms and simulated annealing==