Multiple sequence alignment: Difference between revisions

Content deleted Content added
HMM, GA, SA methods
m Dynamic programming and computational complexity: not just NP-hard, it's NP-complete
Line 10:
The most direct method for producing an MSA uses the [[dynamic programming]] technique to identify the globally optimal alignment solution. For proteins, this method usually involves two sets of parameters: a [[gap penalty]] and a [[substitution matrix]] assigning scores or probabilities to the alignment of each possible pair of amino acids based on the similarity of the amino acids' chemical properties and the evolutionary probability of the mutation. For nucleotide sequences a substitution matrix can be used, but since there are only four possible standard characters per sequence and the individual nucleotides do not typically differ much in substitution probability, the parameters for DNA and RNA sequences usually consist of a gap penalty, a positive score for character matches, and a negative score for mismatches.
 
For ''n'' individual sequences, the method requires constructing the ''n''-dimensional equivalent of the matrix formed in standard pairwise dynamic programming. The search space thus increases exponentially with increasing ''n'' and is also strongly dependent on sequence length. To find the global optimum for ''n'' sequences this way has been shown to be an [[NP-hardcomplete]] problem{{ref|Wang}}{{ref|Just}}. Methods to reduce the search space by first performing pairwise dynamic programming on each pair of sequences in the query set and searching only the solution space near these results (effectively finding the intersection between local paths immediately surrounding each pairwise optimum solution) render the dynamic programming technique more efficient. The so-called "sum of pairs" method has been implemented in the software package [http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html MSA], but it is still impractical for many MSA applications that require the simultaneous alignment of dozens or even a few hundred sequences. Dynamic programming methods are now used only when an extremely high-quality alignment of a small number of sequences is needed, and as a [[benchmark]]ing standard in evaluating new or refined heuristic techniques.
 
==Progressive alignment construction==