Content deleted Content added
m →Dynamic programming: Same spelling as on the linked page. |
m Open access bot: doi added to citation with #oabot. |
||
Line 38:
Global alignments, which attempt to align every residue in every sequence, are most useful when the sequences in the query set are similar and of roughly equal size. (This does not mean global alignments cannot start and/or end in gaps.) A general global alignment technique is the [[Needleman–Wunsch algorithm]], which is based on dynamic programming. Local alignments are more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The [[Smith–Waterman algorithm]] is a general local alignment method based on the same dynamic programming scheme but with additional choices to start and end at any place.<ref name="Polyanovsky2011"/>
Hybrid methods, known as semi-global or "glocal" (short for '''glo'''bal-lo'''cal''') methods, search for the best possible partial alignment of the two sequences (in other words, a combination of one or both starts and one or both ends is stated to be aligned). This can be especially useful when the downstream part of one sequence overlaps with the upstream part of the other sequence. In this case, neither global nor local alignment is entirely appropriate: a global alignment would attempt to force the alignment to extend beyond the region of overlap, while a local alignment might not fully cover the region of overlap.<ref name=brudno>{{cite journal|author1=Brudno M |author2=Malde S |author3=Poliakov A |author4=Do CB |author5=Couronne O |author6=Dubchak I |author7=Batzoglou S | year=2003 | title=Glocal alignment: finding rearrangements during alignment | journal= Bioinformatics | volume=Suppl 1| issue=90001| pages=i54–62| series=19 | pmid = 12855437| doi = 10.1093/bioinformatics/btg1005 | url=http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=12855437| doi-access=free }}</ref> Another case where semi-global alignment is useful is when one sequence is short (for example a gene sequence) and the other is very long (for example a chromosome sequence). In that case, the short sequence should be globally (fully) aligned but only a local (partial) alignment is desired for the long sequence.
Fast expansion of genetic data challenges speed of current DNA sequence alignment algorithms. Essential needs for an efficient and accurate method for DNA variant discovery demand innovative approaches for parallel processing in real time. [[Optical computing]] approaches have been suggested as promising alternatives to the current electrical implementations, yet their applicability remains to be tested [https://onlinelibrary.wiley.com/doi/abs/10.1002/jbio.201900227].
Line 94:
===Techniques inspired by computer science===
A variety of general [[Optimization (mathematics)|optimization]] algorithms commonly used in computer science have also been applied to the multiple sequence alignment problem. [[Hidden Markov model]]s have been used to produce probability scores for a family of possible multiple sequence alignments for a given query set; although early HMM-based methods produced underwhelming performance, later applications have found them especially effective in detecting remotely related sequences because they are less susceptible to noise created by conservative or semiconservative substitutions.<ref name=karplus>{{cite journal | journal=Bioinformatics | volume=14 | issue=10 | pages= 846–856| year=1998 |author1=Karplus K |author2=Barrett C |author3=Hughey R. | title=Hidden Markov models for detecting remote protein homologies | url=http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=9927713 | pmid=9927713 | doi = 10.1093/bioinformatics/14.10.846 | doi-access=free }}</ref> [[Genetic algorithm]]s and [[simulated annealing]] have also been used in optimizing multiple sequence alignment scores as judged by a scoring function like the sum-of-pairs method. More complete details and software packages can be found in the main article [[multiple sequence alignment]].
The [[Burrows–Wheeler transform]] has been successfully applied to fast short read alignment in popular tools such as [[Bowtie (sequence analysis)|Bowtie]] and BWA. See [[FM-index]].
Line 111:
===Combinatorial extension===
The combinatorial extension method of structural alignment generates a pairwise structural alignment by using local geometry to align short fragments of the two proteins being analyzed and then assembles these fragments into a larger alignment.<ref name=shindyalov>{{cite journal | journal=Protein Eng | volume=11 | pages=739–47 | year=1998 |author1=Shindyalov IN |author2=Bourne PE. | title=Protein structure alignment by incremental combinatorial extension (CE) of the optimal path | url=http://peds.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=9796821 | pmid=9796821 | doi = 10.1093/protein/11.9.739 | issue=9 | doi-access=free }}</ref> Based on measures such as rigid-body [[Root mean square deviation (bioinformatics)|root mean square distance]], residue distances, local secondary structure, and surrounding environmental features such as residue neighbor [[hydrophobic]]ity, local alignments called "aligned fragment pairs" are generated and used to build a similarity matrix representing all possible structural alignments within predefined cutoff criteria. A path from one protein structure state to the other is then traced through the matrix by extending the growing alignment one fragment at a time. The optimal such path defines the combinatorial-extension alignment. A web-based server implementing the method and providing a database of pairwise alignments of structures in the Protein Data Bank is located at the [https://web.archive.org/web/19981203071023/http://cl.sdsc.edu/ Combinatorial Extension] website.
==Phylogenetic analysis==
Line 125:
Methods of statistical significance estimation for gapped sequence alignments are available in the literature.<ref name="ortet"/><ref name=altschul>{{cite book|author1=Altschul SF |author2=Gish W | year=1996| title=Local Alignment Statistics| journal= Meth.Enz. | volume=266 | pages = 460–480|doi=10.1016/S0076-6879(96)66029-7|pmid=8743700 |series=Methods in Enzymology|isbn=9780121821678}}</ref><ref name=hartmann>{{cite journal| author=Hartmann AK| year=2002| title=Sampling rare events: statistics of local sequence alignments|
journal= Phys. Rev. E| volume=65| page=056102|doi=10.1103/PhysRevE.65.056102| pmid=12059642| issue=5|arxiv=cond-mat/0108201|bibcode=2002PhRvE..65e6102H| url=https://www.semanticscholar.org/paper/bedd73ed63f6f8ea1985360f0d725630fe0f3fc3}}</ref><ref name=newberg>{{cite journal| author=Newberg LA | year=2008 | title=Significance of gapped sequence alignments | journal= J Comput Biol| volume=15| pages=1187–1194 | pmid = 18973434 | doi=10.1089/cmb.2008.0125| nopp=true| issue=9| pmc=2737730}}</ref><ref name=eddy>{{cite journal| author=Eddy SR| year=2008 | title=A probabilistic model of local sequence alignment that simplifies statistical significance estimation | journal= PLOS Comput Biol | volume=4| editor1-first=Burkhard| pages=e1000069 | pmid = 18516236| editor1-last=Rost | doi=10.1371/journal.pcbi.1000069| issue=5| pmc=2396288| last2=Rost| first2=Burkhard| bibcode=2008PLSCB...4E0069E| url=https://www.semanticscholar.org/paper/0b66a33b74518b0f0e46d5157a3b571035aab40e }}</ref><ref name=bastien>{{cite journal|author1=Bastien O |author2=Aude JC |author3=Roy S |author4=Marechal E | year=2004 | title=Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics | journal= Bioinformatics | volume=20| issue=4| pages=534–537| pmid = 14990449| doi = 10.1093/bioinformatics/btg440 | url=http://bioinformatics.oxfordjournals.org/content/20/4/534.long| doi-access=free }}</ref><ref name=agrawal11>{{cite journal|author1=Agrawal A |author2=Huang X | year=2011| title=Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices|journal= IEEE/ACM Transactions on Computational Biology and Bioinformatics| volume=8| pages=194–205|doi=10.1109/TCBB.2009.69|pmid=21071807 | issue=1|url=https://www.semanticscholar.org/paper/765f9333d5af7274c0a44b39407f78c1dcdfab0f }}</ref><ref name=agrawal08>{{cite journal| author1=Agrawal A| author2=Brendel VP| author3=Huang X| year=2008| title=Pairwise statistical significance and empirical determination of effective gap opening penalties for protein local sequence alignment| journal=International Journal of Computational Biology and Drug Design| volume=1| pages=347–367| doi=10.1504/IJCBDD.2008.022207| pmid=20063463| url=http://inderscience.metapress.com/content/1558538106522500/| issue=4| url-status=dead| archiveurl=https://archive.is/20130128163812/http://inderscience.metapress.com/content/1558538106522500/| archivedate=28 January 2013| df=dmy-all}}</ref>
===Assessment of credibility===
Line 145:
A more complete list of available software categorized by algorithm and alignment type is available at [[sequence alignment software]], but common software tools used for general sequence alignment tasks include ClustalW2<ref>{{cite web|url=http://www.ebi.ac.uk/Tools/msa/clustalw2/|title=ClustalW2 < Multiple Sequence Alignment < EMBL-EBI|last=EMBL-EBI|date=|website=www.EBI.ac.uk|access-date=12 June 2017}}</ref> and T-coffee<ref>[https://web.archive.org/web/20080918022531/http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi T-coffee]</ref> for alignment, and BLAST<ref>{{cite web|url=http://blast.ncbi.nlm.nih.gov/Blast.cgi|title=BLAST: Basic Local Alignment Search Tool|author=|date=|website=blast.ncbi.nlm.NIH.gov|access-date=12 June 2017}}</ref> and FASTA3x<ref>{{cite web|url=http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml|title=UVA FASTA Server|author=|date=|website=fasta.bioch.Virginia.edu|access-date=12 June 2017}}</ref> for database searching. Commercial tools such as [[DNASTAR|DNASTAR Lasergene]], [[Geneious]], and [[PatternHunter]] are also available. Tools annotated as performing [http://edamontology.org/operation_0292 sequence alignment] are listed in the [https://bio.tools/?page=1&function=%22Sequence%20alignment%22&sort=score bio.tools] registry.
Alignment algorithms and software can be directly compared to one another using a standardized set of [[Benchmark (computing)|benchmark]] reference multiple sequence alignments known as BAliBASE.<ref name=thompson2>{{cite journal | journal=Bioinformatics | volume=15 | pages=87–8 | year=1999 |author1=Thompson JD |author2=Plewniak F |author3=Poch O | title=BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs | url=http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=10068696 | pmid=10068696 | doi = 10.1093/bioinformatics/15.1.87 | issue=1 | doi-access=free }}</ref> The data set consists of structural alignments, which can be considered a standard against which purely sequence-based methods are compared. The relative performance of many common alignment methods on frequently encountered alignment problems has been tabulated and selected results published online at BAliBASE.<ref>[https://web.archive.org/web/20121130084356/http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE/prog_scores.html BAliBASE]</ref><ref name=thompson3>{{cite journal | journal=Nucleic Acids Res | volume=27 | pages=2682–90 | year=1999 |author1=Thompson JD |author2=Plewniak F |author3=Poch O. | title=A comprehensive comparison of multiple sequence alignment programs | url=http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=10373585 | pmid=10373585 | doi = 10.1093/nar/27.13.2682 | issue=13 | pmc=148477 }}</ref> A comprehensive list of BAliBASE scores for many (currently 12) different alignment tools can be computed within the protein workbench STRAP.<ref>{{cite web|url=http://3d-alignment.eu/|title=Multiple sequence alignment: Strap|author=|date=|website=3d-alignment.eu|access-date=12 June 2017}}</ref>
==See also==
|