Sequence alignment: Difference between revisions


Revision as of 04:44, 15 July 2025 by GreenC bot (Bots, 3,062,661 edits). Edit summary: Rescued 2 archive links; Move 1 url. Wayback Medic 2.5 per WP:URLREQ#nih.gov

Latest revision as of 13:05, 25 August 2025 by Pibabel (26 edits). Edit summary: →Maximal unique match: Deleted errant mid-sentence period. Tags: Mobile edit, Mobile web edit
(One intermediate revision by one other user not shown)
===Maximal unique match===
One way of quantifying the utility of a given pairwise alignment is the '''[[maximal unique match]]''' (MUM), an exactly matching substring that occurs once in each of the query sequences. Longer MUMs typically reflect closer relatedness in the [[multiple sequence alignment]] of [[genomes]] in [[computational biology]].<ref name="Alignment of whole genomes">{{cite journal |last1=Delcher |first1=A. L. |last2=Kasif |first2=S. |last3=Fleischmann |first3=R. D. |last4=Peterson |first4=J. |last5=White |first5=O. |last6=Salzberg |first6=S. L. |title=Alignment of whole genomes |journal=Nucleic Acids Research |date=1999 |volume=27 |issue=11 |pages=2369–2376 |doi=10.1093/nar/27.11.2369 |pmid=10325427 |pmc=148804 |doi-access=free }}</ref> Identifying MUMs and other potential anchors is the first step in larger alignment systems such as [[MUMmer]]. Anchors are regions where two genomes are highly similar. To understand what a MUM is, each word in the acronym can be considered in turn. ''Match'' means that the substring occurs in both sequences to be aligned. ''Unique'' means that the substring occurs only once in each sequence. Finally, ''maximal'' means that the substring is not part of a longer string that fulfills both prior requirements. The idea is that long sequences that match exactly and occur only once in each genome are almost certainly part of the global alignment. More precisely:

==Non-biological uses==
The methods used for biological sequence alignment have also found applications in other fields, most notably in [[natural language processing]] and in [[Sequence analysis in social sciences|social sciences]], where the [[Needleman-Wunsch algorithm]] is usually referred to as [[Optimal matching]].<ref>{{cite journal|author1=Abbott A. |author2=Tsay A.
| year=2000 | title=Sequence Analysis and Optimal Matching Methods in Sociology, Review and Prospect | journal=Sociological Methods and Research | volume=29 |issue=1 | pages=3–33 | doi=10.1177/0049124100029001001 |s2cid=121097811 }}</ref> Techniques that generate the set of elements from which words will be selected in [[natural language generation|natural-language generation]] algorithms have borrowed multiple sequence alignment techniques from bioinformatics to produce linguistic versions of [[automated theorem proving|computer-generated mathematical proofs]].<ref name=Barzilay>{{cite book |author1=Barzilay R. |author2=Lee L. |title=Proceedings of the ACL-02 conference on Empirical methods in natural language processing - EMNLP '02 |chapter=Bootstrapping lexical choice via multiple-sequence alignment |year=2002 |pages=164–171 |chapter-url=https://www.cs.cornell.edu/home/llee/papers/gen-msa.pdf |volume=10 |doi=10.3115/1118693.1118715 |arxiv=cs/0205065 |bibcode=2002cs........5065B |s2cid=7521453 }}</ref> In the field of historical and comparative [[linguistics]], sequence alignment has been used to partially automate the [[comparative method (linguistics)|comparative method]] by which linguists traditionally reconstruct languages.<ref>{{cite thesis |author=Kondrak, Grzegorz |title=Algorithms for Language Reconstruction |publisher=University of Toronto |year=2002 |url=http://www.cs.ualberta.ca/~kondrak/papers/thesis.pdf |access-date=2007-01-21 |archive-url=https://web.archive.org/web/20081217043010/http://www.cs.ualberta.ca/~kondrak/papers/thesis.pdf |archive-date=17 December 2008 |url-status=dead }}</ref> Business and marketing research has also applied multiple sequence alignment techniques to analyzing series of purchases over time.<ref name=prinzie>{{cite journal |author1=Prinzie A. |author2=D. Van den Poel |year=2006 |url=http://econpapers.repec.org/paper/rugrugwps/05_2F292.htm |title=Incorporating sequential information into traditional classification models by using an element/position-sensitive SAM |journal=Decision Support Systems |volume=42 |issue=2 |pages=508–526 |doi=10.1016/j.dss.2005.02.004 |url-access=subscription }} See also Prinzie and Van den Poel's paper {{cite journal |url=http://econpapers.repec.org/paper/rugrugwps/07_2F442.htm |title=Predicting home-appliance acquisition sequences: Markov/Markov for Discrimination and survival analysis for modeling sequential information in NPTB models |year=2007 |journal=Decision Support Systems |volume=44 |issue=1 |pages=28–45 |doi=10.1016/j.dss.2007.02.008 |author=Prinzie, A |last2=Van den Poel |first2=D |url-access=subscription }}</ref>

==Software==