Sequence alignment: Difference between revisions

m Edit: this is an unusual mistake in syntax.
Alter: author2. Add: pmid, author pars. 1-2. Removed URL that duplicated unique identifier. Removed parameters. Some additions/deletions were actually parameter name changes.
Line 14:
| pmid = 22032267
| year = 2011
| last1 = Polyanovsky
| first1 = V. O.
| title = Comparative analysis of the quality of a global algorithm and a local algorithm for alignment of two sequences
Line 82:
Progressive, hierarchical, or tree methods generate a multiple sequence alignment by first aligning the most similar sequences and then adding successively less related sequences or groups to the alignment until the entire query set has been incorporated into the solution. The initial tree describing the sequence relatedness is based on pairwise comparisons that may include heuristic pairwise alignment methods similar to [[FASTA]]. Progressive alignment results are dependent on the choice of "most related" sequences and thus can be sensitive to inaccuracies in the initial pairwise alignments. Most progressive multiple sequence alignment methods additionally weight the sequences in the query set according to their relatedness, which reduces the likelihood of making a poor choice of initial sequences and thus improves alignment accuracy.
 
Many variations of the [[Clustal]] progressive implementation<ref name=higgins>{{cite journal | journal=Gene | volume=73 | issue=1 | pages=237–44 | year=1988 | author=[[Desmond G. Higgins|Higgins DG]], Sharp PM | title=CLUSTAL: a package for performing multiple sequence alignment on a microcomputer | url=http://linkinghub.elsevier.com/retrieve/pii/0378-1119(88)90330-7 | pmid=3243435 | doi = 10.1016/0378-1119(88)90330-7 }}</ref><ref name=thompson>{{cite journal | journal=Nucleic Acids Res | volume=22 | pages=4673–80 | year=1994 | author1=Thompson JD | author2=Higgins DG | author2-link=Desmond G. Higgins | author3=Gibson TJ. | title=CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice | pmid=7984417 |pmc=308517 |url=http://nar.oxfordjournals.org/content/22/22/4673 |doi=10.1093/nar/22.22.4673 | issue=22 }}</ref><ref name=chenna>{{cite journal | journal=Nucleic Acids Res | volume=31 | pages=3497–500 | year=2003 |author1=Chenna R |author2=Sugawara H |author3=Koike T |author4=Lopez R |author5=Gibson TJ |author6=Higgins DG |author7=Thompson JD. | title=Multiple sequence alignment with the Clustal series of programs | url=http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=12824352 | pmid=12824352 | doi = 10.1093/nar/gkg500 | issue=13 | pmc=168907 }}</ref> are used for multiple sequence alignment, phylogenetic tree construction, and as input for [[protein structure prediction]]. A slower but more accurate variant of the progressive method is known as [[T-Coffee]].<ref name=notredame>{{cite journal | journal=J Mol Biol | volume=302 | issue=1 | pages=205–17 | year=2000 | author1=Notredame C | author2=Higgins DG | author2-link=Desmond G. Higgins | author3=Heringa J.
| title=T-Coffee: A novel method for fast and accurate multiple sequence alignment | url=http://linkinghub.elsevier.com/retrieve/pii/S0022-2836(00)94042-7 | pmid=10964570 | doi = 10.1006/jmbi.2000.4042 }}</ref>
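
The order in which a progressive method incorporates sequences can be illustrated with a minimal Python sketch. It is not the Clustal or T-Coffee procedure: the function names <code>similarity</code> and <code>progressive_order</code> are hypothetical, the pairwise score is a stand-in (the matching ratio from Python's <code>difflib</code>) rather than a real pairwise alignment score, and the greedy ordering is a simplification of a guide tree. Only the joining order is computed; no alignment is built.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in pairwise score (assumption): difflib matching ratio in [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def progressive_order(seqs: dict[str, str]) -> list[str]:
    """Return sequence names in a greedy 'most similar first' joining order."""
    names = list(seqs)
    if len(names) < 2:
        return names

    # Score every unordered pair once.
    score = {
        frozenset(pair): similarity(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(names, 2)
    }

    # Start from the most similar pair.
    first = max(combinations(names, 2), key=lambda p: score[frozenset(p)])
    order = list(first)
    remaining = [n for n in names if n not in order]

    # Repeatedly add the sequence closest to anything already included.
    while remaining:
        nxt = max(
            remaining,
            key=lambda n: max(score[frozenset((n, m))] for m in order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


example = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAA",
    "c": "ACGTTTTTAC",
    "d": "GGGGGGGGGG",
}
order = progressive_order(example)
print(order)
```

Because the first pair fixes the starting point for everything that follows, an inaccurate pairwise score at this stage propagates to later steps, which is the sensitivity described above.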
 
===Iterative methods===
Line 121:
In database searches such as BLAST, statistical methods can determine the likelihood of a particular alignment between sequences or sequence regions arising by chance given the size and composition of the database being searched. These values can vary significantly depending on the search space. In particular, the likelihood of finding a given alignment by chance increases if the database consists only of sequences from the same organism as the query sequence. Repetitive sequences in the database or query can also distort both the search results and the assessment of statistical significance; BLAST automatically filters such repetitive sequences in the query to avoid apparent hits that are statistical artifacts.
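
The dependence on search-space size can be made concrete with the Karlin-Altschul expectation value used for local alignment statistics, <math>E = K m n e^{-\lambda S}</math>, where <math>S</math> is the raw alignment score, <math>m</math> the query length and <math>n</math> the total database length. The following Python sketch is illustrative only: the function names are hypothetical, and the values of <code>lam</code> and <code>k</code> are placeholder assumptions, since the real parameters depend on the scoring matrix, gap penalties and sequence composition.

```python
import math


def e_value(score: float, query_len: int, db_len: int,
            lam: float = 0.267, k: float = 0.041) -> float:
    """Expected number of chance alignments scoring at least `score`.

    `lam` and `k` are placeholder values (assumption), not fitted parameters.
    """
    return k * query_len * db_len * math.exp(-lam * score)


def p_value(e: float) -> float:
    """Probability of at least one chance hit, assuming a Poisson hit count."""
    return -math.expm1(-e)


small_db = e_value(score=100, query_len=300, db_len=10**7)
large_db = e_value(score=100, query_len=300, db_len=10**8)

# The same score is ten times more likely to arise by chance
# in a database that is ten times larger.
print(small_db, large_db, p_value(large_db))
```

Under this model the expectation value grows linearly with database size, so the same raw score can be significant in a small search space and unremarkable in a large one.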
 
Methods of statistical significance estimation for gapped sequence alignments are available in the literature.<ref name="ortet"/><ref name=altschul>{{cite book|author1=Altschul SF |author2=Gish W | year=1996| title=Local Alignment Statistics| journal= Meth.Enz. | volume=266 | pages = 460–480|doi=10.1016/S0076-6879(96)66029-7|pmid=8743700 |series=Methods in Enzymology|isbn=9780121821678}}</ref><ref name=hartmann>{{cite journal| author=Hartmann AK| year=2002| title=Sampling rare events: statistics of local sequence alignments|
journal= Phys. Rev. E| volume=65| page=056102|doi=10.1103/PhysRevE.65.056102| pmid=12059642| issue=5|arxiv=cond-mat/0108201|bibcode=2002PhRvE..65e6102H}}</ref><ref name=newberg>{{cite journal| author=Newberg LA | year=2008 | title=Significance of gapped sequence alignments | journal= J Comput Biol| volume=15| pages=1187–1194 | pmid = 18973434 | doi=10.1089/cmb.2008.0125| nopp=true| issue=9| pmc=2737730}}</ref><ref name=eddy>{{cite journal| author=Eddy SR| year=2008 | title=A probabilistic model of local sequence alignment that simplifies statistical significance estimation | journal= PLoS Comput Biol | volume=4| editor1-first=Burkhard| pages=e1000069 | pmid = 18516236| editor1-last=Rost | doi=10.1371/journal.pcbi.1000069| issue=5| pmc=2396288| bibcode=2008PLSCB...4E0069E}}</ref><ref name=bastien>{{cite journal|author1=Bastien O |author2=Aude JC |author3=Roy S |author4=Marechal E | year=2004 | title=Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics | journal= Bioinformatics | volume=20| issue=4| pages=534–537| pmid = 14990449| doi = 10.1093/bioinformatics/btg440 | url=http://bioinformatics.oxfordjournals.org/content/20/4/534.long}}</ref><ref name=agrawal11>{{cite journal|author1=Agrawal A |author2=Huang X | year=2011| title=Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices|journal= IEEE/ACM Transactions on Computational Biology and Bioinformatics| volume=8| pages=194–205|doi=10.1109/TCBB.2009.69|pmid=21071807 | issue=1}}</ref><ref name=agrawal08>{{cite journal| author1=Agrawal A| author2=Brendel VP| author3=Huang X| year=2008| title=Pairwise statistical significance and empirical determination of effective gap opening penalties for protein local sequence alignment| journal=International Journal of Computational Biology and Drug Design| volume=1| pages=347–367|
doi=10.1504/IJCBDD.2008.022207| url=http://inderscience.metapress.com/content/1558538106522500/| issue=4| deadurl=yes| archiveurl=https://archive.is/20130128163812/http://inderscience.metapress.com/content/1558538106522500/| archivedate=28 January 2013| df=dmy-all}}</ref>