Content deleted Content added
Open access status updates in citations with OAbot #oabot |
→Maximal unique match: Deleted errant mid-sentence period. Tags: Mobile edit Mobile web edit |
||
(3 intermediate revisions by 3 users not shown) | |||
Line 97:
===Maximal unique match===
One way of quantifying the utility of a given pairwise alignment is the '[[maximal unique match]]' (MUM), or the longest subsequence that occurs in both query sequences. Longer MUM sequences typically reflect closer relatedness
More precisely:
Line 131:
Word methods, also known as ''k''-tuple methods, are [[heuristic]] methods that are not guaranteed to find an optimal alignment solution, but are significantly more efficient than dynamic programming. These methods are especially useful in large-scale database searches where it is understood that a large proportion of the candidate sequences will have essentially no significant match with the query sequence. Word methods are best known for their implementation in the database search tools [[FASTA]] and the [[BLAST (biotechnology)|BLAST]] family.<ref name=mount/> Word methods identify a series of short, nonoverlapping subsequences ("words") in the query sequence that are then matched to candidate database sequences. The relative positions of the word in the two sequences being compared are subtracted to obtain an offset; this will indicate a region of alignment if multiple distinct words produce the same offset. Only if this region is detected do these methods apply more sensitive alignment criteria; thus, many unnecessary comparisons with sequences of no appreciable similarity are eliminated.
In the FASTA method, the user defines a value ''k'' to use as the word length with which to search the database. The method is slower but more sensitive at lower values of ''k'', which are also preferred for searches involving a very short query sequence. The BLAST family of search methods provides a number of algorithms optimized for particular types of queries, such as searching for distantly related sequence matches. BLAST was developed to provide a faster alternative to FASTA without sacrificing much accuracy; like FASTA, BLAST uses a word search of length ''k'', but evaluates only the most significant word matches, rather than every word match as does FASTA. Most BLAST implementations use a fixed default word length that is optimized for the query and database type, and that is changed only under special circumstances, such as when searching with repetitive or very short query sequences. Implementations can be found via a number of web portals, such as [http://www.ebi.ac.uk/fasta33/ EMBL FASTA] and [https://web.archive.org/web/19970615060854/http://www.ncbi.nlm.nih.gov/BLAST/ NCBI BLAST].
==Multiple sequence alignment==
Line 139:
===Dynamic programming===
The technique of dynamic programming is theoretically applicable to any number of sequences; however, because it is computationally expensive in both time and [[computer memory|memory]], it is rarely used for more than three or four sequences in its most basic form. This method requires constructing the ''n''-dimensional equivalent of the sequence matrix formed from two sequences, where ''n'' is the number of sequences in the query. Standard dynamic programming is first used on all pairs of query sequences and then the "alignment space" is filled in by considering possible matches or gaps at intermediate positions, eventually constructing an alignment essentially between each two-sequence alignment. Although this technique is computationally expensive, its guarantee of a global optimum solution is useful in cases where only a few sequences need to be aligned accurately. One method for reducing the computational demands of dynamic programming, which relies on the "sum of pairs" [[objective function]], has been implemented in the [https://archive.today/20120805063524/http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html MSA] software package.<ref name=lipman>{{cite journal | journal=Proc Natl Acad Sci USA | volume=86 | pages=4412–5 | year=1989 |author1=Lipman DJ |author2=Altschul SF |author3=Kececioglu JD | title=A tool for multiple sequence alignment | pmid=2734293 | doi=10.1073/pnas.86.12.4412 | issue=12 | pmc=287279 | bibcode=1989PNAS...86.4412L | doi-access=free }}</ref>
===Progressive methods===
Line 172:
===Combinatorial extension===
The combinatorial extension method of structural alignment generates a pairwise structural alignment by using local geometry to align short fragments of the two proteins being analyzed and then assembles these fragments into a larger alignment.<ref name=shindyalov>{{cite journal | journal=Protein Eng | volume=11 | pages=739–47 | year=1998 |author1=Shindyalov IN |author2=Bourne PE. | title=Protein structure alignment by incremental combinatorial extension (CE) of the optimal path | pmid=9796821 | doi = 10.1093/protein/11.9.739 | issue=9 | doi-access=
==Phylogenetic analysis==
Line 200:
==Non-biological uses==
The methods used for biological sequence alignment have also found applications in other fields, most notably in [[natural language processing]] and in [[Sequence analysis in social sciences|social sciences]], where the [[Needleman-Wunsch algorithm]] is usually referred to as [[Optimal matching]].<ref>{{cite journal|author1=Abbott A. |author2=Tsay A. | year=2000 | title=Sequence Analysis and Optimal Matching Methods in Sociology, Review and Prospect | journal=Sociological Methods and Research | volume=29|issue=1 | pages=3–33 | doi=10.1177/0049124100029001001|s2cid=121097811 }}</ref> Techniques that generate the set of elements from which words will be selected in [[natural language generation|natural-language generation]] algorithms have borrowed multiple sequence alignment techniques from bioinformatics to produce linguistic versions of [[automated theorem proving|computer-generated mathematical proofs]].<ref name=Barzilay>{{cite book|author1=Barzilay R |author2=Lee L. |title=Proceedings of the ACL-02 conference on Empirical methods in natural language processing - EMNLP '02 |chapter=Bootstrapping lexical choice via multiple-sequence alignment |year=2002 | pages=164–171 | chapter-url=
==Software==
{{Main|Sequence alignment software}}
A more complete list of available software categorized by algorithm and alignment type is available at [[sequence alignment software]], but common software tools used for general sequence alignment tasks include ClustalW2<ref>{{cite web|url=http://www.ebi.ac.uk/Tools/msa/clustalw2/|title=ClustalW2 < Multiple Sequence Alignment < EMBL-EBI|last=EMBL-EBI|website=www.EBI.ac.uk|access-date=12 June 2017}}</ref> and T-coffee<ref>[https://web.archive.org/web/20080918022531/http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi T-coffee]</ref> for alignment, and BLAST<ref>{{cite web|url=
Alignment algorithms and software can be directly compared to one another using a standardized set of [[Benchmark (computing)|benchmark]] reference multiple sequence alignments known as BAliBASE.<ref name=thompson2>{{cite journal | journal=Bioinformatics | volume=15 | pages=87–8 | year=1999 |author1=Thompson JD |author2=Plewniak F |author3=Poch O | title=BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs | pmid=10068696 | doi = 10.1093/bioinformatics/15.1.87 | issue=1 | doi-access=free }}</ref> The data set consists of structural alignments, which can be considered a standard against which purely sequence-based methods are compared. The relative performance of many common alignment methods on frequently encountered alignment problems has been tabulated and selected results published online at BAliBASE.<ref>[https://web.archive.org/web/20121130084356/http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE/prog_scores.html BAliBASE]</ref><ref name=thompson3>{{cite journal | journal=Nucleic Acids Res | volume=27 | pages=2682–90 | year=1999 |author1=Thompson JD |author2=Plewniak F |author3=Poch O. | title=A comprehensive comparison of multiple sequence alignment programs | url= | pmid=10373585 | doi = 10.1093/nar/27.13.2682 | issue=13 | pmc=148477 }}</ref> A comprehensive list of BAliBASE scores for many (currently 12) different alignment tools can be computed within the protein workbench STRAP.<ref>{{cite web|url=http://3d-alignment.eu/|title=Multiple sequence alignment: Strap|website=3d-alignment.eu|access-date=12 June 2017}}</ref>
|