Sequence alignment: Difference between revisions

Content deleted Content added
m wordage
Citation bot (talk | contribs)
Removed proxy/dead URL that duplicated identifier. | Use this bot. Report bugs. | #UCB_CommandLine
Line 145:
Progressive, hierarchical, or tree methods generate a multiple sequence alignment by first aligning the most similar sequences and then adding successively less related sequences or groups to the alignment until the entire query set has been incorporated into the solution. The initial tree describing the sequence relatedness is based on pairwise comparisons that may include heuristic pairwise alignment methods similar to [[FASTA]]. Progressive alignment results are dependent on the choice of "most related" sequences and thus can be sensitive to inaccuracies in the initial pairwise alignments. Most progressive multiple sequence alignment methods additionally weight the sequences in the query set according to their relatedness, which reduces the likelihood of making a poor choice of initial sequences and thus improves alignment accuracy.
 
Many variations of the [[Clustal]] progressive implementation<ref name=higgins>{{cite journal | journal=Gene | volume=73 | issue=1 | pages=237–44 | year=1988 | author=[[Desmond G. Higgins|Higgins DG]], Sharp PM | title=CLUSTAL: a package for performing multiple sequence alignment on a microcomputer | pmid=3243435 | doi = 10.1016/0378-1119(88)90330-7 }}</ref><ref name=thompson>{{cite journal | journal=Nucleic Acids Res | volume=22 | pages=4673–80 | year=1994 | author1=Thompson JD| author2-link=Desmond G. Higgins |author2= Higgins DG|author3= Gibson TJ. | title=CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice | pmid=7984417 |pmc=308517 |url=|doi=10.1093/nar/22.22.4673 | issue=22 }}</ref><ref name=chenna>{{cite journal | journal=Nucleic Acids Res | volume=31 | pages=3497–500 | year=2003 |author1=Chenna R |author2=Sugawara H |author3=Koike T |author4=Lopez R |author5=Gibson TJ |author6=Higgins DG |author7=Thompson JD. | title=Multiple sequence alignment with the Clustal series of programs | url= | pmid=12824352 | doi = 10.1093/nar/gkg500 | issue=13 | pmc=168907 }}</ref> are used for multiple sequence alignment, phylogenetic tree construction, and as input for [[protein structure prediction]]. A slower but more accurate variant of the progressive method is known as [[T-Coffee]].<ref name=notredame>{{cite journal | journal=J Mol Biol | volume=302 | issue=1 | pages=205–17 | year=2000 | author1=Notredame C| author2-link=Desmond G. Higgins |author2= Higgins DG|author3= Heringa J. | title=T-Coffee: A novel method for fast and accurate multiple sequence alignment | pmid=10964570 | doi = 10.1006/jmbi.2000.4042 | s2cid=10189971 | url=https://semanticscholar.org/paper/701a93542d50c480699d7d0660f33875ae639dfd }}</ref>
 
===Iterative methods===
Line 167:
 
===DALI===
The DALI method, or [[distance matrix]] alignment, is a fragment-based method for constructing structural alignments based on contact similarity patterns between successive hexapeptides in the query sequences.<ref name=holm>{{cite journal | journal=Science | volume=273 | pages=595–603 | year=1996 |author1=Holm L |author2=Sander C | title=Mapping the protein universe | pmid=8662544 | doi = 10.1126/science.273.5275.595 | issue=5275 | bibcode=1996Sci...273..595H | s2cid=7509134 | url=https://semanticscholar.org/paper/26eebe60c05a6d2eefb687dc63e83e753546102a }}</ref> It can generate pairwise or multiple alignments and identify a query sequence's structural neighbors in the [[Protein Data Bank]] (PDB). It has been used to construct the [[Families of structurally similar proteins|FSSP]] structural alignment database (Fold classification based on Structure-Structure alignment of Proteins, or Families of Structurally Similar Proteins). A DALI webserver can be accessed at [https://web.archive.org/web/20090301064750/http://ekhidna.biocenter.helsinki.fi/dali_server/start DALI] and the FSSP is located at [https://web.archive.org/web/20051125045348/http://ekhidna.biocenter.helsinki.fi/dali/start The Dali Database].
 
===SSAP===
Line 187:
 
Methods of statistical significance estimation for gapped sequence alignments are available in the literature.<ref name="ortet"/><ref name=altschul>{{cite book|author1=Altschul SF |author2=Gish W |chapter=Local alignment statistics |title=Computer Methods for Macromolecular Sequence Analysis | year=1996| journal= Meth.Enz. | volume=266 | pages = 460–480|doi=10.1016/S0076-6879(96)66029-7|pmid=8743700 |series=Methods in Enzymology|isbn=9780121821678}}</ref><ref name=hartmann>{{cite journal| author=Hartmann AK| year=2002| title=Sampling rare events: statistics of local sequence alignments|
journal= Phys. Rev. E| volume=65| page=056102|doi=10.1103/PhysRevE.65.056102| pmid=12059642| issue=5|arxiv=cond-mat/0108201|bibcode=2002PhRvE..65e6102H| s2cid=193085| url=https://www.semanticscholar.org/paper/bedd73ed63f6f8ea1985360f0d725630fe0f3fc3}}</ref><ref name=newberg>{{cite journal| author=Newberg LA | year=2008 | title=Significance of gapped sequence alignments | journal= J Comput Biol| volume=15| pages=1187–1194 | pmid = 18973434 | doi=10.1089/cmb.2008.0125| issue=9| pmc=2737730}}</ref><ref name=eddy>{{cite journal| author=Eddy SR| year=2008 | title=A probabilistic model of local sequence alignment that simplifies statistical significance estimation | journal= PLOS Comput Biol | volume=4| editor1-first=Burkhard| pages=e1000069 | pmid = 18516236| editor1-last=Rost | doi=10.1371/journal.pcbi.1000069| issue=5| pmc=2396288| last2=Rost| first2=Burkhard| bibcode=2008PLSCB...4E0069E| s2cid=15640896 | doi-access=free }}</ref><ref name=bastien>{{cite journal|author1=Bastien O |author2=Aude JC |author3=Roy S |author4=Marechal E | year=2004 | title=Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics | journal= Bioinformatics | volume=20| issue=4| pages=534–537| pmid = 14990449| doi = 10.1093/bioinformatics/btg440 | doi-access=free }}</ref><ref name=agrawal11>{{cite journal|author1=Agrawal A |author2=Huang X | year=2011| title=Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices|journal= IEEE/ACM Transactions on Computational Biology and Bioinformatics| volume=8| pages=194–205|doi=10.1109/TCBB.2009.69|pmid=21071807 | issue=1|s2cid=6559731 |url=https://www.semanticscholar.org/paper/765f9333d5af7274c0a44b39407f78c1dcdfab0f }}</ref><ref name=agrawal08>{{cite journal| author1=Agrawal A| author2=Brendel VP| author3=Huang X| year=2008| title=Pairwise statistical significance and empirical determination of effective gap opening penalties for protein local sequence alignment| journal=International Journal of Computational Biology and Drug Design| volume=1| pages=347–367| doi=10.1504/IJCBDD.2008.022207| pmid=20063463| url=http://inderscience.metapress.com/content/1558538106522500/| issue=4| url-status=dead| archive-url=https://archive.today/20130128163812/http://inderscience.metapress.com/content/1558538106522500/| archive-date=28 January 2013| df=dmy-all| url-access=subscription}}</ref>
 
===Assessment of credibility===
Line 198:
 
==Other biological uses==
Sequenced RNA, such as [[expressed sequence tags]] and full-length mRNAs, can be aligned to a sequenced genome to find where there are genes and get information about [[alternative splicing]]<ref>{{cite book |author1=Kim N |author2=Lee C |title=Bioinformatics |chapter=Bioinformatics Detection of Alternative Splicing |volume=452 |pages=179–97 |year=2008 |pmid=18566765 |doi=10.1007/978-1-60327-159-2_9 |series=Methods in Molecular Biology |isbn=978-1-58829-707-5}}</ref> and [[RNA editing]].<ref>{{cite journal |vauthors=Li JB, Levanon EY, Yoon JK, etal |title=Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing |journal=Science |volume=324 |issue=5931 |pages=1210–3 |date=May 2009 |pmid=19478186 |doi=10.1126/science.1170995|bibcode=2009Sci...324.1210L |s2cid=31148824 |url=https://semanticscholar.org/paper/7e6aecb6226022e7471d6262f7ea5ef90b7a53f5 }}</ref> Sequence alignment is also a part of [[genome assembly]], where sequences are aligned to find overlap so that ''[[contig]]s'' (long stretches of sequence) can be formed.<ref>{{cite journal |vauthors=Blazewicz J, Bryja M, Figlerowicz M, etal |title=Whole genome assembly from 454 sequencing output via modified DNA graph concept |journal=Comput Biol Chem |volume=33 |issue=3 |pages=224–30 |date=June 2009 |pmid=19477687 |doi=10.1016/j.compbiolchem.2009.04.005}}</ref> Another use is [[single nucleotide polymorphism|SNP]] analysis, where sequences from different individuals are aligned to find single basepairs that are often different in a population.<ref>{{cite journal |author1=Duran C |author2=Appleby N |author3=Vardy M |author4=Imelfort M |author5=Edwards D |author6=Batley J |title=Single nucleotide polymorphism discovery in barley using autoSNPdb |journal=Plant Biotechnol. J. |volume=7 |issue=4 |pages=326–33 |date=May 2009 |pmid=19386041 |doi=10.1111/j.1467-7652.2009.00407.x |doi-access=free }}</ref>
 
==Non-biological uses==