Sequence alignment: Difference between revisions

Cewbot (talk | contribs)
m Fix broken anchor: 2010-07-04 #Table of standard amino acid abbreviations and side chain properties→Amino acid#Table of standard amino acid abbreviations and properties
Citation bot (talk | contribs)
Alter: series. Add: doi-access, pmc. Removed parameters. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 27/390
Line 96:
 
===Maximal unique match===
One way of quantifying the utility of a given pairwise alignment is the '[[maximal unique match]]' (MUM), or the longest subsequence that occurs in both query sequences. Longer MUM sequences typically reflect closer relatedness in the [[multiple sequence alignment]] of [[genomes]] in [[computational biology]].<ref name="Alignment of whole genomes">{{cite journal |last1=Delcher |first1=A. L. |last2=Kasif |first2=S. |last3=Fleischmann |first3=R.D. |last4=Peterson |first4=J. |last5=White |first5=O. |last6=Salzberg |first6=S.L. |title=Alignment of whole genomes |journal=Nucleic Acids Research |date=1999 |volume=27 |issue=11 |pages=2369–2376 |doi=10.1093/nar/27.11.2369 |pmid=10325427|pmc=148804 |doi-access=free }}</ref> Identification of MUMs and other potential anchors is the first step in larger alignment systems such as [[MUMmer]]. Anchors are the regions of two genomes that are highly similar to each other. To understand what a MUM is, we can break down each word in the acronym. Match implies that the substring occurs in both sequences to be aligned. Unique means that the substring occurs only once in each sequence. Finally, maximal states that the substring is not part of another larger string that fulfills both prior requirements. The idea behind this is that long sequences that match exactly and occur only once in each genome are almost certainly part of the global alignment.
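
The three conditions described above (match, unique, maximal) can be illustrated with a brute-force sketch. It is not how MUMmer or any other production tool finds MUMs; the function names and the <code>min_len</code> parameter are hypothetical and chosen only for this example, and the approach is practical only for very short strings.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=1):
    """Return substrings that occur exactly once in a and exactly once in b
    and are not contained in a longer substring with the same property.

    Illustrative brute force; min_len is an assumed cut-off, not part of
    the definition.
    """
    unique = set()
    for i in range(len(a)):
        for j in range(i + min_len, len(a) + 1):
            candidate = a[i:j]
            # "Match" and "unique": exactly one occurrence in each sequence.
            if count_occurrences(a, candidate) == 1 and count_occurrences(b, candidate) == 1:
                unique.add(candidate)
    # "Maximal": drop any candidate contained in a longer unique match.
    maximal = [s for s in unique if not any(s != t and s in t for t in unique)]
    # Longest first, ties broken alphabetically, for a stable result.
    return sorted(maximal, key=lambda s: (-len(s), s))


mums = maximal_unique_matches("TACGTG", "CACGTA", min_len=2)
```

In this toy input, <code>ACGT</code> absorbs the shorter unique matches it contains (such as <code>ACG</code> and <code>CGT</code>), while <code>TA</code> survives because it is not part of any longer unique match.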
 
More precisely:
Line 139:
 
===Dynamic programming===
The technique of dynamic programming is theoretically applicable to any number of sequences; however, because it is computationally expensive in both time and [[computer memory|memory]], it is rarely used for more than three or four sequences in its most basic form. This method requires constructing the ''n''-dimensional equivalent of the sequence matrix formed from two sequences, where ''n'' is the number of sequences in the query. Standard dynamic programming is first used on all pairs of query sequences and then the "alignment space" is filled in by considering possible matches or gaps at intermediate positions, eventually constructing an alignment essentially between each two-sequence alignment. Although this technique is computationally expensive, its guarantee of a global optimum solution is useful in cases where only a few sequences need to be aligned accurately. One method for reducing the computational demands of dynamic programming, which relies on the "sum of pairs" [[objective function]], has been implemented in the [https://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html MSA] software package.<ref name=lipman>{{cite journal | journal=Proc Natl Acad Sci USA | volume=86 | pages=4412–5 | year=1989 |author1=Lipman DJ |author2=Altschul SF |author3=Kececioglu JD | title=A tool for multiple sequence alignment | pmid=2734293 | doi=10.1073/pnas.86.12.4412 | issue=12 | pmc=287279 | bibcode=1989PNAS...86.4412L | doi-access=free }}</ref>
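
For the two-sequence case, the "sequence matrix" mentioned above can be sketched as follows. This is a minimal illustration of standard global-alignment dynamic programming (the Needleman–Wunsch recurrence), not the method used by the MSA package; the scoring values (match +1, mismatch −1, gap −1) and the function name are assumptions made for the example. Extending it to ''n'' sequences would replace the two-dimensional table with an ''n''-dimensional one, which is why the cost grows so quickly.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of a and b.

    Scoring values are illustrative assumptions, not taken from any
    particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty string costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap  # a[i-1] aligned to a gap
            gap_in_a = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)
    return score[-1][-1]


example_score = global_alignment_score("ACGT", "AGT")
```

The table has (len(a)+1) × (len(b)+1) cells, so time and memory both grow with the product of the sequence lengths.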
 
===Progressive methods===
Line 163:
Structural alignments, which are usually specific to protein and sometimes RNA sequences, use information about the [[secondary structure|secondary]] and [[tertiary structure]] of the protein or RNA molecule to aid in aligning the sequences. These methods can be used for two or more sequences and typically produce local alignments; however, because they depend on the availability of structural information, they can only be used for sequences whose corresponding structures are known (usually through [[X-ray crystallography]] or [[NMR spectroscopy]]). Because both protein and RNA structures are more evolutionarily conserved than their sequences,<ref name=chothia>{{cite journal | journal=EMBO J | volume=5 | issue=4 | pages=823–6 |date=April 1986 |author1=Chothia C |author2=Lesk AM. | title=The relation between the divergence of sequence and structure in proteins | pmid=3709526 |pmc=1166865 | doi=10.1002/j.1460-2075.1986.tb04288.x }}</ref> structural alignments can be more reliable between sequences that are very distantly related and that have diverged so extensively that sequence comparison cannot reliably detect their similarity.
 
Structural alignments are used as the "gold standard" in evaluating alignments for homology-based [[protein structure prediction]]<ref name=skolnick>{{cite journal | journal=Proc Natl Acad Sci USA | volume=102 | pages=1029–34 | year=2005 |author1=Zhang Y |author2=Skolnick J. | title=The protein structure prediction problem could be solved using the current PDB library | pmid=15653774 | doi = 10.1073/pnas.0407152101 | issue=4 | pmc=545829 | bibcode=2005PNAS..102.1029Z | doi-access=free }}</ref> because they explicitly align regions of the protein sequence that are structurally similar rather than relying exclusively on sequence information. However, structural alignments cannot themselves be used in structure prediction, because at least one sequence in the query set is the target to be modeled, for which the structure is not known. It has been shown that, given the structural alignment between a target and a template sequence, highly accurate models of the target protein sequence can be produced; a major stumbling block in homology-based structure prediction is the production of structurally accurate alignments given only sequence information.<ref name=skolnick/>
 
===DALI===
Line 197:
 
==Other biological uses==
Sequenced RNA, such as [[expressed sequence tags]] and full-length mRNAs, can be aligned to a sequenced genome to locate genes and to obtain information about [[alternative splicing]]<ref>{{cite book |author1=Kim N |author2=Lee C |title=Bioinformatics detection of alternative splicing |journal=Methods Mol. Biol. |volume=452 |pages=179–97 |year=2008 |pmid=18566765 |doi=10.1007/978-1-60327-159-2_9 |series=Methods in Molecular Biology |isbn=978-1-58829-707-5}}</ref> and [[RNA editing]].<ref>{{cite journal |vauthors=Li JB, Levanon EY, Yoon JK, etal |title=Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing |journal=Science |volume=324 |issue=5931 |pages=1210–3 |date=May 2009 |pmid=19478186 |doi=10.1126/science.1170995|bibcode=2009Sci...324.1210L |s2cid=31148824 |url=https://semanticscholar.org/paper/7e6aecb6226022e7471d6262f7ea5ef90b7a53f5 }}</ref> Sequence alignment is also a part of [[genome assembly]], where sequences are aligned to find overlaps so that ''[[contig]]s'' (long stretches of sequence) can be formed.<ref>{{cite journal |vauthors=Blazewicz J, Bryja M, Figlerowicz M, etal |title=Whole genome assembly from 454 sequencing output via modified DNA graph concept |journal=Comput Biol Chem |volume=33 |issue=3 |pages=224–30 |date=June 2009 |pmid=19477687 |doi=10.1016/j.compbiolchem.2009.04.005}}</ref> Another use is [[single nucleotide polymorphism|SNP]] analysis, where sequences from different individuals are aligned to find single base pairs that often differ within a population.<ref>{{cite journal |author1=Duran C |author2=Appleby N |author3=Vardy M |author4=Imelfort M |author5=Edwards D |author6=Batley J |title=Single nucleotide polymorphism discovery in barley using autoSNPdb |journal=Plant Biotechnol. J. |volume=7 |issue=4 |pages=326–33 |date=May 2009 |pmid=19386041 |doi=10.1111/j.1467-7652.2009.00407.x |doi-access=free }}</ref>
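
As a rough illustration of the SNP use case, the sketch below scans sequences that have already been aligned and reports the columns in which more than one base appears. It is not the method of autoSNPdb or any other tool; the function name, the <code>-</code> gap symbol, and the toy data are assumptions, and real SNP calling would also weigh read depth and sequencing error.

```python
def variable_columns(aligned, gap="-"):
    """Return (position, bases) for alignment columns with more than one base.

    Assumes the input rows are pre-aligned and of equal length; gap
    characters are ignored when comparing bases.
    """
    if len({len(row) for row in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    sites = []
    # zip(*aligned) walks the alignment column by column.
    for position, column in enumerate(zip(*aligned)):
        bases = set(column) - {gap}
        if len(bases) > 1:
            sites.append((position, sorted(bases)))
    return sites


# Hypothetical aligned sequences from four individuals.
candidates = variable_columns(["ACGTA", "ACCTA", "AC-TA", "ACGTT"])
```

Positions are zero-based column indices in the alignment, not coordinates in any reference genome.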
 
==Non-biological uses==