Ruzzo–Tompa algorithm: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 23:18, 16 February 2023 edit Veonm (talk \| contribs) 2 edits General improvements Tag: Visual edit: Switched ← Previous edit		Latest revision as of 08:42, 27 August 2025 edit undo Citation bot (talk \| contribs) Bots 5,863,398 edits Added bibcode. Removed URL that duplicated identifier. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 332/967
(7 intermediate revisions by 4 users not shown)
Line 1: The '''Ruzzo–Tompa algorithm''' or the '''RT algorithm'''<ref name=":0">{{Cite journal \|~~last~~last1=Spouge \|~~first~~first1=John L. \|last2=Ramírez \|first2=Leonardo Mariño \|last3=Sheetlin \|first3=Sergey L. \|date=2014 \|title=Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps ~~\|url=http://www.inderscience.com/link.php?id=62991~~ \|journal=International Journal of Bioinformatics Research and Applications \|language=en \|volume=10 \|issue=4/5 \|pages=~~384~~384–408 \|doi=10.1504/IJBRA.2014.062991 \|issn=1744-5485 \|pmc=~~PMC4135518~~4135518 \|pmid=24989859}}</ref> is a [[Time complexity#Linear time\|linear-time]] [[algorithm]] for finding all non-overlapping, contiguous, maximal scoring subsequences in a sequence of real numbers.<ref name="ruzzo_tompa">{{cite journal\|last1=Ruzzo\|first1=Walter L.\|last2=Martin\|first2=Tompa\|title=A ~~Linear~~linear ~~Time~~time ~~Algorithm~~algorithm for ~~Finding~~finding ~~All~~all ~~Maximal~~maximal ~~Scoring~~scoring ~~Subsequences~~subsequences\|journal=Proceedings. International Conference on Intelligent Systems for Molecular Biology\|date=1999\|pages=234–241\|pmid=10786306\|isbn=9781577350835\|url=https://dl.acm.org/citation.cfm?id=660812\|ref=ruzzo-tompa}}</ref> The Ruzzo–Tompa algorithm was proposed by Walter L. Ruzzo and Martin Tompa.<ref>{{Cite web \|title=A Linear Time Algorithm for Finding All Maximal Scoring Subsequences \|url=https://homes.cs.washington.edu/~ruzzo/papers/maxseq.pdf}}</ref> This algorithm is an improvement over previously known quadratic time algorithms.<ref name=":0" /> The maximum scoring subsequence from the set produced by the algorithm is also a solution to the [[maximum subarray problem]]. The Ruzzo–Tompa algorithm has applications in [[bioinformatics]],<ref name="karlin" /> [[web scraping]],<ref name="pasternack">{{cite book\|last1=Pasternack\|first1=Jeff\|last2=Roth\|first2=Dan\|title=~~Extracting~~Proceedings ~~Article~~of ~~Text~~the ~~from~~18th ~~the~~international ~~Web~~conference ~~with~~on ~~Maximum~~World wide ~~Subsequence~~web ~~Segmentation~~\|~~journal~~chapter=~~Proceedings~~Extracting ofarticle ~~the~~text ~~18th~~from ~~International~~the ~~Conference~~web onwith ~~World~~maximum subsequence ~~Wide~~segmentation ~~Web~~\|date=2009\|pages=971–980\|doi=10.1145/1526709.1526840\|isbn=9781605584874\|s2cid=346124}}</ref> and [[information retrieval]].<ref name="liang">{{cite book\|last1=Liang\|first1=Shangsong\|last2=Ren\|first2=Zhaochun\|last3=Weerkamp\|first3=Wouter\|last4=Meij\|first4=Edgar\|last5=de Rijke\|first5=Maarten\|title~~=Time-Aware Rank Aggregation for Microblog Search\|journal~~=Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management \|chapter=Time-Aware Rank Aggregation for Microblog Search \|date=2014\|pages=989–998\|doi=10.1145/2661829.2661905\|isbn=9781450325981\|citeseerx=10.1.1.681.6828\|s2cid=14287901}}</ref> ==Applications== Line 7: The Ruzzo–Tompa algorithm has been used in [[Bioinformatics]] tools to study biological data. The problem of finding disjoint maximal subsequences is of practical importance in the analysis of [[DNA]]. Maximal subsequences algorithms have been used in the identification of transmembrane segments and the evaluation of [[sequence homology]].<ref name="karlin">{{cite journal\|last1=Karlin\|first1=S\|last2=Altschul\|first2=SF\|title=Applications and statistics for multiple high-scoring segments in molecular sequences\|journal=Proceedings of the National Academy of Sciences of the United States of America\|date=Jun 15, 1993\|volume=90\|issue=12\|pages=5873–5877\|pmid=8390686\|pmc=46825\|doi=10.1073/pnas.90.12.5873\|bibcode=1993PNAS...90.5873K\|doi-access=free}}</ref> The algorithm is used in [[sequence alignment]] which is used as a method of identifying similar [[DNA]], [[RNA]], or [[protein]] sequences.<ref>{{Cite ~~journal~~book \|~~last~~last1=Spouge \|~~first~~first1=John L. \|last2=Mariño-Ramírez \|first2=Leonardo \|last3=Sheetlin \|first3=Sergey L. \|title=2012 IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences (ICCABS) \|chapter=The ruzzo-tompa algorithm can find the maximal paths in weighted, directed graphs on a one-dimensional lattice \|~~url=https://ieeexplore.ieee.org/document/6182645/ \|journal~~year=2012 ~~IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences (ICCABS)~~ \|pages=1–6 \|doi=10.1109/ICCABS.2012.6182645\|isbn=978-1-4673-1321-6 \|s2cid=14584619 }}</ref> Accounting for the ordering of pairs of high-scoring subsequences in two sequences creates better sequence alignments. This is because the biological model suggests that separate high-scoring subsequence pairs arise from insertions or deletions within a matching region. Requiring consistent ordering of high-scoring subsequence pairs increases their statistical significance.<ref name="karlin" /> ===Web scraping=== Line 17: ==Problem definition== The problem of finding all maximal subsequences is defined as follows: Given a list of real numbered scores <math>x_1,x_2,\ldots,x_n</math>, find the list of contiguous subsequences that gives the greatest total score, where the score of each subsequence <math>S_{i,j} = \sum_{i\leq k\leq j} x_k</math>. The subsequences must be disjoint (non-overlapping) and have a positive score.<ref>{{Cite ~~journal~~book \|~~last~~last1=Spouge \|~~first~~first1=John L. \|last2=Mariño-Ramírez \|first2=Leonardo \|last3=Sheetlin \|first3=Sergey L. \|title=2012 IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences (ICCABS) \|chapter=The ruzzo-tompa algorithm can find the maximal paths in weighted, directed graphs on a one-dimensional lattice \|~~url=https://ieeexplore.ieee.org/document/6182645/ \|journal~~year=2012 ~~IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences (ICCABS)~~ \|pages=1–6 \|doi=10.1109/ICCABS.2012.6182645\|isbn=978-1-4673-1321-6 \|s2cid=14584619 }}</ref> ==Other algorithms== Line 82: == References == {{reflist}} == Further reading == * {{cite journal \| last1=Ali \| first1=Syed Arslan \| last2=Raza \| first2=Basit \| last3=Malik \| first3=Ahmad Kamran \| last4=Shahid \| first4=Ahmad Raza \| last5=Faheem \| first5=Muhammad \| last6=Alquhayz \| first6=Hani \| last7=Kumar \| first7=Yogan Jaya \| title=An Optimally Configured and Improved Deep Belief Network (OCI-DBN) Approach for Heart Disease Prediction Based on Ruzzo–Tompa and Stacked Genetic Algorithm \| journal=IEEE Access \| publisher=Institute of Electrical and Electronics Engineers (IEEE) \| volume=8 \| year=2020 \| issn=2169-3536 \| doi=10.1109/access.2020.2985646 \| pages=65947–65958\| bibcode=2020IEEEA...865947A \| s2cid=215817246 \| doi-access=free }} {{DEFAULTSORT:Ruzzo-Tompa algorithm}}