Multiple sequence alignment

A direct method for producing an MSA uses the [[dynamic programming]] technique to identify the globally optimal alignment solution. For proteins, this method usually involves two sets of parameters: a [[gap penalty]] and a [[substitution matrix]] that assigns scores or probabilities to the alignment of each possible pair of amino acids, based on the similarity of the amino acids' chemical properties and the evolutionary probability of the mutation. For nucleotide sequences, a similar gap penalty is used, but the substitution matrix is typically much simpler and distinguishes only identical matches from mismatches. For a global alignment, the scores in the substitution matrix may be either all positive or a mix of positive and negative; for a local alignment, they must include both positive and negative values.<ref>{{cite web|title=Help with matrices used in sequence comparison tools|url=http://www.ebi.ac.uk/help/matrix.html|url-status=dead|archive-url=https://web.archive.org/web/20100311140200/http://www.ebi.ac.uk/help/matrix.html|archive-date=March 11, 2010|access-date=March 3, 2010|publisher=European Bioinformatics Institute}}</ref>
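
The pairwise case of this dynamic programming scheme can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name and the parameter values (match +1, mismatch −1, linear gap penalty −2) are assumptions chosen for the example, and the match/mismatch rule stands in for the simple nucleotide substitution matrix described above.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Scoring is an assumption for illustration: a two-valued
    match/mismatch "substitution matrix" and a linear gap penalty.
    """
    # prev[j] is the best score for aligning the first i-1 characters
    # of a with the first j characters of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned to a gap
                curr[j - 1] + gap,  # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

The table has one dimension per sequence, which is why extending the same idea to ''n'' sequences requires an ''n''-dimensional table.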
 
For ''n'' individual sequences, the naive method requires constructing the ''n''-dimensional equivalent of the matrix formed in standard pairwise [[sequence alignment]]. The search space thus increases exponentially with increasing ''n'' and is also strongly dependent on sequence length. Expressed with the [[big O notation]] commonly used to measure [[Computational complexity theory|computational complexity]], a [[Naïve algorithm|naïve]] MSA takes ''O(Length<sup>Nseqs</sup>)'' time to produce, where ''Length'' is the sequence length and ''Nseqs'' is the number of sequences. Finding the global optimum for ''n'' sequences this way has been shown to be an [[NP-completeness|NP-complete]] problem.<ref name="wang">{{cite journal|vauthors=Wang L, Jiang T|year=1994|title=On the complexity of multiple sequence alignment|journal=J Comput Biol|volume=1|issue=4|pages=337–348|citeseerx=10.1.1.408.894|doi=10.1089/cmb.1994.1.337|pmid=8790475}}</ref><ref name="just">{{cite journal|author=Just W|year=2001|title=Computational complexity of multiple sequence alignment with SP-score|journal=J Comput Biol|volume=8|issue=6|pages=615–23|citeseerx=10.1.1.31.6382|doi=10.1089/106652701753307511|pmid=11747615}}</ref><ref name="elias">{{cite journal|author=Elias, Isaac|year=2006|title=Settling the intractability of multiple alignment|journal=J Comput Biol|volume=13|issue=7|pages=1323–1339|citeseerx=10.1.1.6.256|doi=10.1089/cmb.2006.13.1323|pmid=17037961}}</ref> In 1989, building on the Carrillo–Lipman algorithm,<ref name="carrillo">{{cite journal|vauthors=Carrillo H, Lipman DJ|year=1988|title=The Multiple Sequence Alignment Problem in Biology|url=https://zenodo.org/record/1236134|journal=SIAM Journal on Applied Mathematics|volume=48|issue=5|pages=1073–1082|doi=10.1137/0148063}}</ref> Altschul introduced a practical method that uses pairwise alignments to constrain the ''n''-dimensional search space.<ref name="altschul">{{cite journal|vauthors=Lipman DJ, Altschul SF, Kececioglu JD|year=1989|title=A tool for multiple sequence alignment|journal=Proc Natl Acad Sci U S A|volume=86|issue=12|pages=4412–4415|bibcode=1989PNAS...86.4412L|doi=10.1073/pnas.86.12.4412|pmc=287279|pmid=2734293|doi-access=free}}</ref> In this approach, pairwise dynamic programming alignments are performed on each pair of sequences in the query set, and only the space near the ''n''-dimensional intersection of these alignments is searched for the ''n''-way alignment. The method optimizes the sum of the scores of all pairs of characters at each position in the alignment (the so-called ''sum of pairs'' score) and has been implemented in the MSA software program for constructing multiple sequence alignments.<ref>{{cite web|title=Genetic analysis software|url=https://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html|archive-url=https://web.archive.org/web/20000119082433/http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html|url-status=dead|archive-date=January 19, 2000|access-date=March 3, 2010|publisher=National Center for Biotechnology Information}}</ref> In 2019, Hosseininasab and van Hoeve showed that by using decision diagrams, MSA may be modeled in polynomial space complexity.<ref name="hosseininasab"/>
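
The sum-of-pairs objective mentioned above can be illustrated with a short sketch that scores an already aligned set of sequences. The function name, the scoring values (match +1, mismatch −1, residue-versus-gap −2) and the choice to score a gap paired with a gap as zero are all assumptions made for this example; actual programs use a full substitution matrix and their own gap conventions.

```python
from itertools import combinations


def sum_of_pairs(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length rows.

    '-' denotes a gap. The scoring values are illustrative assumptions.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*rows):  # one alignment column at a time
        for x, y in combinations(column, 2):  # every pair of rows
            if x == "-" and y == "-":
                continue  # gap paired with gap: scored 0 here (assumed)
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


alignment = ["ACGT",
             "A-GT",
             "ACGA"]
print(sum_of_pairs(alignment))  # prints 2
```

Each column contributes one term per pair of sequences, so evaluating a given alignment is inexpensive; the difficulty lies in searching the ''n''-dimensional space for the alignment that maximizes this score.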
 
===Progressive alignment construction===