Global alignments, which attempt to align every residue in every sequence, are most useful when the sequences in the query set are similar and of roughly equal size. (This does not mean global alignments cannot start and/or end in gaps.) A general global alignment technique is the [[Needleman–Wunsch algorithm]], which is based on dynamic programming. Local alignments are more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The [[Smith–Waterman algorithm]] is a general local alignment method based on the same dynamic programming scheme, but with the additional freedom to start and end the alignment at any position in either sequence.<ref name="Polyanovsky2011"/>
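The difference between the two dynamic programming schemes can be illustrated with a minimal Python sketch. It assumes a simple scoring scheme (match +1, mismatch −1, linear gap penalty −1; hypothetical defaults, not the values of any particular tool) and returns only the optimal score; a traceback through the same matrix would recover the aligned residues. The function name align_score is illustrative.

```python
def align_score(a: str, b: str, local: bool = False,
                match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global (Needleman–Wunsch) or local (Smith–Waterman) alignment
    score of a and b under a linear gap penalty (illustrative sketch)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of an alignment ending at a[:i] and b[:j]
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the first row and column accumulate gap costs.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # Local: a negative running score is reset to 0, i.e. the alignment may start anywhere.
                score = max(score, 0)
                best = max(best, score)
            H[i][j] = score
    # A global alignment must end in the bottom-right cell; a local one may end in any cell.
    return best if local else H[-1][-1]
```

For example, under these assumptions align_score("AAAGGG", "TTTGGG") is 0, because the global alignment is forced to pair the mismatched prefixes, whereas align_score("AAAGGG", "TTTGGG", local=True) is 3, the score of the shared GGG segment alone.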
Hybrid methods, known as semi-global or "glocal" (short for '''glo'''bal-lo'''cal''') methods, search for the best possible partial alignment of the two sequences (in other words, the alignment is required to include the start of one or both sequences and the end of one or both sequences). This can be especially useful when the downstream part of one sequence overlaps with the upstream part of the other sequence. In this case, neither global nor local alignment is entirely appropriate: a global alignment would attempt to force the alignment to extend beyond the region of overlap, while a local alignment might not fully cover the region of overlap.<ref name=brudno>{{cite journal|author1=Brudno M |author2=Malde S |author3=Poliakov A |author4=Do CB |author5=Couronne O |author6=Dubchak I |author7=Batzoglou S | year=2003 | title=Glocal alignment: finding rearrangements during alignment | journal= Bioinformatics | volume=19 | issue=Suppl 1 | pages=i54–62 | pmid = 12855437| doi = 10.1093/bioinformatics/btg1005}}</ref>
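The overlap case can be handled by the usual alignment recurrence with different boundary conditions. The Python sketch below shows one semi-global variant, chosen here for illustration: it scores the best alignment of a suffix of a with a prefix of b, so skipping the start of a and the end of b costs nothing. The scoring defaults and the name overlap_score are hypothetical.

```python
def overlap_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score for aligning a suffix of a with a prefix of b (end-gap-free variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # H[i][0] stays 0: any leading stretch of a may be skipped at no cost.
    # H[0][j] is penalised: the alignment must begin at the first residue of b.
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
    # The alignment must reach the end of a but may stop anywhere in b.
    return max(H[-1])
```

With these defaults, overlap_score("CCCCAGTT", "AGTTGGGG") returns 4, the score of the overlapping AGTT segment, without penalising the unaligned CCCC and GGGG ends.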
The rapid growth of genetic data challenges the speed of current DNA sequence alignment algorithms. The need for efficient and accurate methods of DNA variant discovery calls for innovative approaches to real-time parallel processing. [[Optical computing]] approaches have been suggested as promising alternatives to the current electrical implementations, yet their applicability remains to be tested [https://onlinelibrary.wiley.com/doi/abs/10.1002/jbio.201900227].
===Iterative methods===
Iterative methods attempt to overcome the main weakness of progressive methods: their heavy dependence on the accuracy of the initial pairwise alignments. Iterative methods optimize an [[objective function]] based on a selected alignment scoring method by assigning an initial global alignment and then realigning sequence subsets. The realigned subsets are then themselves aligned to produce the next iteration's multiple sequence alignment. Various ways of selecting the sequence subgroups and the objective function are reviewed by Hirosawa et al.<ref name=hirosawa>{{cite journal | journal=Comput Appl Biosci | volume=11 | pages=13–8 | year=1995 |author1=Hirosawa M |author2=Totoki Y |author3=Hoshida M |author4=Ishikawa M. | title=Comprehensive study on iterative algorithms of multiple sequence alignment}}</ref>
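A common choice of objective function is the sum-of-pairs score, which adds up a pairwise substitution or gap score over every pair of rows in every column. The Python sketch below computes it for an alignment given as equal-length gapped strings. The match, mismatch and gap values are hypothetical, and it assumes a linear gap cost with gap–gap pairs scored as zero, which is only one of several conventions in use.

```python
from itertools import combinations


def sum_of_pairs(msa, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment given as equal-length gapped strings."""
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*msa):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs contribute nothing under this convention
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total
```

An iterative method could compute such a score for each candidate produced by realigning a subset and keep the candidate only if the score improves. For example, sum_of_pairs(["AC-", "ACG", "A-G"]) is −3 under the defaults above.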
===Motif finding===
[[File:A profile HMM modelling a multiple sequence alignment.png|thumb|A profile HMM modelling a multiple sequence alignment]]
A variety of general [[Optimization (mathematics)|optimization]] algorithms commonly used in computer science have also been applied to the multiple sequence alignment problem. [[Hidden Markov model]]s have been used to produce probability scores for a family of possible multiple sequence alignments for a given query set; although early HMM-based methods produced underwhelming performance, later applications have found them especially effective in detecting remotely related sequences because they are less susceptible to noise created by conservative or semiconservative substitutions.<ref name=karplus>{{cite journal | journal=Bioinformatics | volume=14 | issue=10 | pages= 846–856| year=1998 |author1=Karplus K |author2=Barrett C |author3=Hughey R. | title=Hidden Markov models for detecting remote protein homologies}}</ref>
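Probability scores of this kind can be computed with the forward algorithm, which sums the probability of a sequence over all hidden state paths. The Python sketch below applies it to a toy two-state HMM over DNA. It is only an illustration of the algorithm: a real profile HMM has match, insert and delete states for each alignment column, and the state names and parameter values here are hypothetical.

```python
def forward(seq, states, start, trans, emit):
    """Return P(seq | model), summed over every hidden state path (forward algorithm)."""
    if not seq:
        return 1.0  # the empty prefix has probability 1 in a model without an end state
    # f[s] = P(symbols seen so far, and the model is in state s at the current position)
    f = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        f = {s: emit[s][symbol] * sum(f[r] * trans[r][s] for r in states) for s in states}
    return sum(f.values())


# Hypothetical toy model: a "conserved" state favouring A and a uniform "variable" state.
STATES = ("conserved", "variable")
START = {"conserved": 0.5, "variable": 0.5}
TRANS = {"conserved": {"conserved": 0.9, "variable": 0.1},
         "variable": {"conserved": 0.2, "variable": 0.8}}
EMIT = {"conserved": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        "variable": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
```

Because the forward probabilities of all sequences of a given length sum to 1 (up to rounding), the function can be sanity-checked by enumerating every short sequence over the alphabet.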
The [[Burrows–Wheeler transform]] has been successfully applied to fast short read alignment in popular tools such as [[Bowtie (sequence analysis)|Bowtie]] and BWA. See [[FM-index]].
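The core idea can be sketched briefly. The BWT of a text is the column of last characters of its sorted cyclic rotations, and an FM-index counts the occurrences of a pattern by backward search, narrowing a range of sorted rotations one pattern character at a time using two tables: C (how many characters in the text are smaller than a given character) and Occ (how often a character occurs in a prefix of the BWT). The Python below is a deliberately naive sketch under those assumptions: it builds the BWT by sorting rotations and recounts Occ at every step, whereas aligners such as Bowtie and BWA use compact, sampled data structures and also handle inexact matches. The $ sentinel is assumed not to occur in the text.

```python
def bwt(text: str) -> str:
    """Burrows–Wheeler transform built by sorting all rotations (quadratic; demo only)."""
    text += "$"  # sentinel: assumed absent from text and smaller than every other character
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(pattern: str, bwt_str: str) -> int:
    """Count exact occurrences of pattern in the original text by FM-index backward search."""
    # C[c]: number of characters in the text that are lexicographically smaller than c.
    C, running = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = running
        running += bwt_str.count(c)

    def occ(c: str, i: int) -> int:
        # Occ(c, i): occurrences of c in bwt_str[:i]. Naive recount; real indexes sample this.
        return bwt_str[:i].count(c)

    # [lo, hi) is the range of sorted rotations that begin with the pattern suffix matched so far.
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ(c, lo), C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, bwt("banana") is "annb$aa", and count_occurrences("ana", bwt("banana")) returns 2, counting both overlapping occurrences.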
===Combinatorial extension===
The combinatorial extension method of structural alignment generates a pairwise structural alignment by using local geometry to align short fragments of the two proteins being analyzed and then assembles these fragments into a larger alignment.<ref name=shindyalov>{{cite journal | journal=Protein Eng | volume=11 | pages=739–47 | year=1998 |author1=Shindyalov IN |author2=Bourne PE. | title=Protein structure alignment by incremental combinatorial extension (CE) of the optimal path}}</ref>
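The fragment-based idea can be illustrated with a heavily simplified Python sketch. Each protein is represented as a list of hypothetical C-alpha coordinates; an aligned fragment pair (AFP) pairs a window of m residues in one protein with a window of m residues in the other, and AFPs are compared by how well their inter-residue distances agree. The sketch seeds a path with the best-matching AFP and greedily extends it with AFPs that follow after a short gap and are geometrically consistent with every AFP already chosen. The fragment length m, threshold d0 and max_gap values are illustrative rather than the published parameters, and the published method also evaluates and refines candidate paths using structural superposition, which this sketch omits.

```python
import math


def fragment_pair_distance(A, B, f1, f2, m):
    """Mean |d_A - d_B| between residues of two AFPs f1 = (i1, j1) and f2 = (i2, j2).

    An AFP (i, j) aligns A[i:i+m] with B[j:j+m]. With f1 == f2 this measures how
    similar the internal geometry of one AFP is; otherwise it checks that two
    AFPs are placed consistently relative to each other.
    """
    (i1, j1), (i2, j2) = f1, f2
    diffs = [abs(math.dist(A[i1 + k], A[i2 + l]) - math.dist(B[j1 + k], B[j2 + l]))
             for k in range(m) for l in range(m)]
    return sum(diffs) / len(diffs)


def ce_like_path(A, B, m=4, d0=1.0, max_gap=2):
    """Greedily chain AFPs into an alignment path (simplified, CE-inspired sketch)."""
    # 1. Seed: the AFP whose internal geometry agrees best between A and B.
    seeds = [(fragment_pair_distance(A, B, (i, j), (i, j), m), i, j)
             for i in range(len(A) - m + 1) for j in range(len(B) - m + 1)]
    seeds = [s for s in seeds if s[0] <= d0]
    if not seeds:
        return []
    _, i, j = min(seeds)
    path = [(i, j)]
    # 2. Extension: try AFPs starting up to max_gap residues after the last one in each
    #    protein, keeping only those similar internally and consistent with the whole path.
    while True:
        last_i, last_j = path[-1]
        candidates = []
        for gi in range(max_gap + 1):
            for gj in range(max_gap + 1):
                ni, nj = last_i + m + gi, last_j + m + gj
                if ni + m > len(A) or nj + m > len(B):
                    continue
                new = (ni, nj)
                intra = fragment_pair_distance(A, B, new, new, m)
                inter = max(fragment_pair_distance(A, B, new, f, m) for f in path)
                if intra <= d0 and inter <= d0:
                    candidates.append((intra + inter, ni, nj))
        if not candidates:
            return path
        _, ni, nj = min(candidates)
        path.append((ni, nj))
```

As a sanity check, comparing a set of irregular coordinates with a copy of itself that lacks its first two residues yields a path of AFPs offset by two positions, starting (2, 0), (6, 4) when m is 4.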
==Phylogenetic analysis==
Methods of statistical significance estimation for gapped sequence alignments are available in the literature.<ref name="ortet"/><ref name=altschul>{{cite book|author1=Altschul SF |author2=Gish W | year=1996| title=Local Alignment Statistics| journal= Meth.Enz. | volume=266 | pages = 460–480|doi=10.1016/S0076-6879(96)66029-7|pmid=8743700 |series=Methods in Enzymology|isbn=9780121821678}}</ref><ref name=hartmann>{{cite journal| author=Hartmann AK| year=2002| title=Sampling rare events: statistics of local sequence alignments|
journal= Phys. Rev. E| volume=65| page=056102|doi=10.1103/PhysRevE.65.056102| pmid=12059642| issue=5|arxiv=cond-mat/0108201|bibcode=2002PhRvE..65e6102H| s2cid=193085| url=https://www.semanticscholar.org/paper/bedd73ed63f6f8ea1985360f0d725630fe0f3fc3}}</ref><ref name=newberg>{{cite journal| author=Newberg LA | year=2008 | title=Significance of gapped sequence alignments | journal= J Comput Biol| volume=15| pages=1187–1194 | pmid = 18973434 | doi=10.1089/cmb.2008.0125| issue=9| pmc=2737730}}</ref><ref name=eddy>{{cite journal| author=Eddy SR| year=2008 | title=A probabilistic model of local sequence alignment that simplifies statistical significance estimation | journal= PLOS Comput Biol | volume=4| editor1-first=Burkhard| pages=e1000069 | pmid = 18516236| editor1-last=Rost | doi=10.1371/journal.pcbi.1000069| issue=5| pmc=2396288| bibcode=2008PLSCB...4E0069E| s2cid=15640896 }}</ref><ref name=bastien>{{cite journal|author1=Bastien O |author2=Aude JC |author3=Roy S |author4=Marechal E | year=2004 | title=Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics | journal= Bioinformatics | volume=20| issue=4| pages=534–537| pmid = 14990449| doi = 10.1093/bioinformatics/btg440}}</ref>
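A widely used significance measure for local alignments is the Karlin–Altschul E-value, E = K m n e^(−λS), the expected number of chance alignments scoring at least S between sequences of lengths m and n. For ungapped alignments λ and K can be derived from the scoring matrix and residue frequencies; for gapped alignments they generally have to be estimated empirically, which is the problem addressed by several of the works cited above. The Python sketch below only evaluates the formula and the corresponding P-value, assuming chance hits are Poisson-distributed; the function names are hypothetical and the parameter values in the example are made up.

```python
import math


def karlin_altschul_evalue(score, m, n, lam, K):
    """Expected number of chance local alignments scoring >= score: E = K * m * n * exp(-lam * score)."""
    return K * m * n * math.exp(-lam * score)


def evalue_to_pvalue(e):
    """Probability of at least one chance hit, assuming hits follow a Poisson distribution."""
    return 1.0 - math.exp(-e)
```

With the hypothetical values λ = 0.3, K = 0.1, m = n = 300 and S = 50, the E-value is about 0.0028, and the P-value is nearly the same, as expected when E is small.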
===Assessment of credibility===
A more complete list of available software categorized by algorithm and alignment type is available at [[sequence alignment software]]; common software tools for general sequence alignment tasks include ClustalW2<ref>{{cite web|url=http://www.ebi.ac.uk/Tools/msa/clustalw2/|title=ClustalW2 < Multiple Sequence Alignment < EMBL-EBI|last=EMBL-EBI|website=www.ebi.ac.uk|access-date=12 June 2017}}</ref> and T-coffee<ref>[https://web.archive.org/web/20080918022531/http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi T-coffee]</ref> for alignment, and BLAST<ref>{{cite web|url=http://blast.ncbi.nlm.nih.gov/Blast.cgi|title=BLAST: Basic Local Alignment Search Tool|website=blast.ncbi.nlm.nih.gov|access-date=12 June 2017}}</ref> and FASTA3x<ref>{{cite web|url=http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml|title=UVA FASTA Server|website=fasta.bioch.virginia.edu|access-date=12 June 2017}}</ref> for database searching. Commercial tools such as [[DNASTAR|DNASTAR Lasergene]], [[Geneious]], and [[PatternHunter]] are also available. Tools annotated as performing [http://edamontology.org/operation_0292 sequence alignment] are listed in the [https://bio.tools/?page=1&function=%22Sequence%20alignment%22&sort=score bio.tools] registry.
Alignment algorithms and software can be directly compared to one another using a standardized set of [[Benchmark (computing)|benchmark]] reference multiple sequence alignments known as BAliBASE.<ref name=thompson2>{{cite journal | journal=Bioinformatics | volume=15 | pages=87–8 | year=1999 |author1=Thompson JD |author2=Plewniak F |author3=Poch O | title=BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs}}</ref>
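Benchmark comparisons of this kind usually report how much of the reference alignment a program reproduces. The Python sketch below computes two such measures in simplified form: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment (a sum-of-pairs score), and the fraction of reference columns reproduced exactly (a total-column score). It assumes both alignments contain the same sequences in the same order as equal-length gapped strings, and it scores whole alignments, whereas BAliBASE's own scoring can be restricted to annotated core regions. The function names are hypothetical.

```python
from itertools import combinations


def residue_columns(msa):
    """Describe each column as a tuple giving, per sequence, the residue index or None for a gap."""
    counters = [0] * len(msa)
    columns = []
    for column in zip(*msa):
        key = []
        for s, ch in enumerate(column):
            if ch == "-":
                key.append(None)
            else:
                key.append(counters[s])
                counters[s] += 1
        columns.append(tuple(key))
    return columns


def aligned_pairs(msa):
    """All pairs of residues ((seq, index), (seq, index)) that share a column."""
    pairs = set()
    for column in residue_columns(msa):
        residues = [(s, idx) for s, idx in enumerate(column) if idx is not None]
        pairs.update(combinations(residues, 2))
    return pairs


def benchmark_scores(test, reference):
    """Return (SP, TC): fractions of reference residue pairs and reference columns reproduced."""
    ref_pairs = aligned_pairs(reference)
    sp = len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)
    test_columns = set(residue_columns(test))
    ref_columns = residue_columns(reference)
    tc = sum(col in test_columns for col in ref_columns) / len(ref_columns)
    return sp, tc
```

For the reference ["ACG-T", "A-GCT"] and the ungapped test alignment ["ACGT", "AGCT"], the sketch reports a sum-of-pairs score of 2/3 and a column score of 2/5.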
==See also==