Sequence alignment: Difference between revisions

Content deleted Content added
m remove redundant URL
Line 55:
 
==Pairwise alignment==
Pairwise sequence alignment methods are used to find the best-matching piecewise (local or global) alignments of two query sequences. Pairwise alignments can only be used between two sequences at a time, but they are efficient to calculate and are often used for methods that do not require extreme precision (such as searching a database for sequences with high similarity to a query). The three primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming, and word methods;<ref name="mount"/> however, multiple sequence alignment techniques can also align pairs of sequences. Although each method has its individual strengths and weaknesses, all three pairwise methods have difficulty with highly repetitive sequences of low [[information content]] - especially where the number of repetitions differ in the two sequences to be aligned. One way of quantifying the utility of a given pairwise alignment is the 'maximum

===Maximal unique match' (MUM), or the longest subsequence that occurs in both query sequences. Longer MUM sequences typically reflect closer relatedness.===
One way of quantifying the utility of a given pairwise alignment is the 'maximal unique match' (MUM), or the longest subsequence that occurs in both query sequences. Longer MUM sequences typically reflect closer relatedness.<ref name="Alignment of whole genomes">{{cite journal |last1=Delcher |first1=A. L. |last2=Kasif |first2=S. |last3=Fleishmann |first3=R.D. |last4=Peterson |first4=J. |last5=White |first5=O. |last6=Salzberg |first6=S.L. |title=Alignment of whole genomes |journal=Nucleic Acids Research |date=1999 |volume=27 |issue=11 |pages=2369-2376 |doi=10.1093/nar/30.11.2478 |pmid=10325427}}</ref> in the [[multiple sequence alignment]] of [[genomes]] in [[computational biology]]. Identification of MUMs and other potential anchors, is the first step in larger alignment systems such as [[MUMmer]]. Anchors are the areas between two genomes where they are highly similar. To understand what a MUM is we can break down each word in the acronym. Match implies that the substring occurs in both sequences to be aligned. Unique means that the substring occurs only once in each sequence. Finally, maximal states that the substring is not part of another larger string that fulfills both prior requirements. The idea behind this, is that long sequences that match exactly and occur only once in each genome are almost certainly part of the global alignment.
 
More precisely:
 
<blockquote>"Given two genomes A and B, Maximal Unique Match (MUM) substring is a common substring of A and B of length longer than a specified minimum length d (by default d= 20) such that
* it is maximal, that is, it cannot be extended on either end without incurring a mismatch; and
* it is unique in both sequences"<ref name="Algorithms in Bioinformatics">{{cite book |last1=Wing-Kin |first1=Sung |title=Algorithms in Bioinformatics: A Practical Introduction |date=2010 |publisher=Chapman & Hall/CRC Press |___location=Boca Raton |isbn=978-1420070330 |edition=First}}</ref></blockquote>
 
===Dot-matrix methods===