Multiple sequence alignment: Difference between revisions

Typical HMM-based methods work by representing an MSA as a form of [[directed acyclic graph]] known as a partial-order graph, which consists of a series of nodes representing possible entries in the columns of an MSA. In this representation, a column that is absolutely conserved (that is, one in which all the sequences in the MSA share the same character) is coded as a single node with as many outgoing connections as there are possible characters in the next column of the alignment. In terms of a typical hidden Markov model, the observed states are the individual alignment columns and the "hidden" states represent the presumed ancestral sequence from which the sequences in the query set are hypothesized to have descended. An efficient search variant of the dynamic programming method, known as the [[Viterbi algorithm]], is generally used to successively align the growing MSA to the next sequence in the query set to produce a new MSA.<ref name="hughey">{{cite journal |vauthors=Hughey R, Krogh A | year = 1996 | title = Hidden Markov models for sequence analysis: extension and analysis of the basic method | journal = CABIOS | volume = 12 | issue = 2| pages = 95–107 | pmid = 8744772 | doi=10.1093/bioinformatics/12.2.95| citeseerx = 10.1.1.44.3365 }}</ref> This is distinct from progressive alignment methods because the alignment of prior sequences is updated each time a new sequence is added. However, like progressive methods, this technique can be influenced by the order in which the sequences in the query set are integrated into the alignment, especially when the sequences are distantly related.<ref name="mount" />
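
The Viterbi step mentioned above can be illustrated with a minimal sketch. The example below is not taken from any of the cited packages: the two hidden states (<code>conserved</code> and <code>variable</code>), the nucleotide alphabet, and all probabilities are hypothetical values chosen only for illustration, and a real profile HMM would use position-specific match, insert, and delete states. It shows only the dynamic programming recurrence, computed in log space, that finds the most probable hidden-state path for one observed sequence.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (log-probability, path) of the most probable hidden-state path."""
    if not observations:
        raise ValueError("observations must not be empty")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of state s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Choose the predecessor that maximizes the score of reaching s
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = (best[t - 1][prev] + log_trans[prev][s]
                          + log_emit[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state to recover the full path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def to_log(table):
    """Convert a dict (or dict of dicts) of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model (all numbers are assumptions for illustration only)
STATES = ["conserved", "variable"]
LOG_START = to_log({"conserved": 0.5, "variable": 0.5})
LOG_TRANS = to_log({
    "conserved": {"conserved": 0.9, "variable": 0.1},
    "variable": {"conserved": 0.1, "variable": 0.9},
})
LOG_EMIT = to_log({
    "conserved": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "variable": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
})

if __name__ == "__main__":
    score, path = viterbi("AAACGT", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    print(score, path)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long sequence, which is why sums of logarithms replace products in the recurrence.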
 
Several software programs implement variants of HMM-based methods and are noted for their scalability and efficiency, although using an HMM method properly is more complex than using the more common progressive methods. The simplest is Partial-Order Alignment (POA);<ref name="grasso">{{cite journal | doi = 10.1093/bioinformatics/bth126 |vauthors=Grasso C, Lee C | year = 2004 | title = Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems | journal = Bioinformatics | volume = 20 | issue = 10| pages = 1546–56 | pmid = 14962922 | doi-access = free }}</ref> a similar but more general method is implemented in the Sequence Alignment and Modeling System (SAM) software package<ref name="hugheyT">Hughey R, Krogh A. SAM: Sequence alignment and modeling software system. Technical Report UCSC-CRL-96-22, University of California, Santa Cruz, CA, September 1996.</ref> and in [[HMMER]].<ref name="durbin">Durbin R, Eddy S, Krogh A, Mitchison G. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press, 1998.</ref>
SAM has been used as a source of alignments for [[protein structure prediction]] in the [[CASP]] structure prediction experiment and to develop a database of predicted proteins in the [[yeast]] species ''[[S. cerevisiae]]''. [[HHpred / HHsearch|HHsearch]]<ref>{{ cite journal| author = Söding J | title = Protein homology detection by HMM-HMM comparison| journal = Bioinformatics| year = 2005| volume = 21| issue = 7| pages = 951–960| pmid = 15531603| doi = 10.1093/bioinformatics/bti125| citeseerx = 10.1.1.519.1257}}</ref> is a software package for detecting remotely related protein sequences based on the pairwise comparison of HMMs. A server running HHsearch ([[HHpred / HHsearch|HHpred]]) was by far the fastest of the 10 best automatic structure prediction servers in the CASP7 and CASP8 structure prediction competitions.<ref>{{ cite journal|vauthors=Battey JN, Kopp J, Bordoli L, Read RJ, Clarke ND, Schwede T | title = Automated server predictions in CASP7| journal = Proteins | year = 2007 | volume = 69 | issue = Suppl 8 | pages = 68–82 | pmid = 17894354| doi = 10.1002/prot.21761 | s2cid = 29879391| doi-access = free }}</ref>
 
===Phylogeny-aware methods===