Ruzzo–Tompa algorithm: Difference between revisions

Content deleted Content added
Veonm (talk | contribs)
General improvements
Citation bot (talk | contribs)
Added bibcode. Removed URL that duplicated identifier. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 332/967
 
(7 intermediate revisions by 4 users not shown)
Line 1:
The '''Ruzzo–Tompa algorithm''' or the '''RT algorithm'''<ref name=":0">{{Cite journal |lastlast1=Spouge |firstfirst1=John L. |last2=Ramírez |first2=Leonardo Mariño |last3=Sheetlin |first3=Sergey L. |date=2014 |title=Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps |url=http://www.inderscience.com/link.php?id=62991 |journal=International Journal of Bioinformatics Research and Applications |language=en |volume=10 |issue=4/5 |pages=384384–408 |doi=10.1504/IJBRA.2014.062991 |issn=1744-5485 |pmc=PMC41355184135518 |pmid=24989859}}</ref> is a [[Time complexity#Linear time|linear-time]] [[algorithm]] for finding all non-overlapping, contiguous, maximal scoring subsequences in a sequence of real numbers.<ref name="ruzzo_tompa">{{cite journal|last1=Ruzzo|first1=Walter L.|last2=Martin|first2=Tompa|title=A Linearlinear Timetime Algorithmalgorithm for Findingfinding Allall Maximalmaximal Scoringscoring Subsequencessubsequences|journal=Proceedings. International Conference on Intelligent Systems for Molecular Biology|date=1999|pages=234–241|pmid=10786306|isbn=9781577350835|url=https://dl.acm.org/citation.cfm?id=660812|ref=ruzzo-tompa}}</ref> The Ruzzo–Tompa algorithm was proposed by Walter L. Ruzzo and Martin Tompa.<ref>{{Cite web |title=A Linear Time Algorithm for Finding All Maximal Scoring Subsequences |url=https://homes.cs.washington.edu/~ruzzo/papers/maxseq.pdf}}</ref> This algorithm is an improvement over previously known quadratic time algorithms.<ref name=":0" /> The maximum scoring subsequence from the set produced by the algorithm is also a solution to the [[maximum subarray problem]].
 
The Ruzzo–Tompa algorithm has applications in [[bioinformatics]],<ref name="karlin" /> [[web scraping]],<ref name="pasternack">{{cite book|last1=Pasternack|first1=Jeff|last2=Roth|first2=Dan|title=ExtractingProceedings Articleof Textthe from18th theinternational Webconference withon MaximumWorld wide Subsequenceweb Segmentation|journalchapter=ProceedingsExtracting ofarticle thetext 18thfrom Internationalthe Conferenceweb onwith Worldmaximum subsequence Widesegmentation Web|date=2009|pages=971–980|doi=10.1145/1526709.1526840|isbn=9781605584874|s2cid=346124}}</ref> and [[information retrieval]].<ref name="liang">{{cite book|last1=Liang|first1=Shangsong|last2=Ren|first2=Zhaochun|last3=Weerkamp|first3=Wouter|last4=Meij|first4=Edgar|last5=de Rijke|first5=Maarten|title=Time-Aware Rank Aggregation for Microblog Search|journal=Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management |chapter=Time-Aware Rank Aggregation for Microblog Search |date=2014|pages=989–998|doi=10.1145/2661829.2661905|isbn=9781450325981|citeseerx=10.1.1.681.6828|s2cid=14287901}}</ref>
 
==Applications==
Line 7:
The Ruzzo–Tompa algorithm has been used in [[Bioinformatics]] tools to study biological data. The problem of finding disjoint maximal subsequences is of practical importance in the analysis of [[DNA]]. Maximal subsequences algorithms have been used in the identification of transmembrane segments and the evaluation of [[sequence homology]].<ref name="karlin">{{cite journal|last1=Karlin|first1=S|last2=Altschul|first2=SF|title=Applications and statistics for multiple high-scoring segments in molecular sequences|journal=Proceedings of the National Academy of Sciences of the United States of America|date=Jun 15, 1993|volume=90|issue=12|pages=5873–5877|pmid=8390686|pmc=46825|doi=10.1073/pnas.90.12.5873|bibcode=1993PNAS...90.5873K|doi-access=free}}</ref>
 
The algorithm is used in [[sequence alignment]] which is used as a method of identifying similar [[DNA]], [[RNA]], or [[protein]] sequences.<ref>{{Cite journalbook |lastlast1=Spouge |firstfirst1=John L. |last2=Mariño-Ramírez |first2=Leonardo |last3=Sheetlin |first3=Sergey L. |title=2012 IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences (ICCABS) |chapter=The ruzzo-tompa algorithm can find the maximal paths in weighted, directed graphs on a one-dimensional lattice |url=https://ieeexplore.ieee.org/document/6182645/ |journalyear=2012 IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences (ICCABS) |pages=1–6 |doi=10.1109/ICCABS.2012.6182645|isbn=978-1-4673-1321-6 |s2cid=14584619 }}</ref> Accounting for the ordering of pairs of high-scoring subsequences in two sequences creates better sequence alignments. This is because the biological model suggests that separate high-scoring subsequence pairs arise from insertions or deletions within a matching region. Requiring consistent ordering of high-scoring subsequence pairs increases their statistical significance.<ref name="karlin" />
 
===Web scraping===
Line 17:
==Problem definition==
 
The problem of finding all maximal subsequences is defined as follows: Given a list of real numbered scores <math>x_1,x_2,\ldots,x_n</math>, find the list of contiguous subsequences that gives the greatest total score, where the score of each subsequence <math>S_{i,j} = \sum_{i\leq k\leq j} x_k</math>. The subsequences must be disjoint (non-overlapping) and have a positive score.<ref>{{Cite journalbook |lastlast1=Spouge |firstfirst1=John L. |last2=Mariño-Ramírez |first2=Leonardo |last3=Sheetlin |first3=Sergey L. |title=2012 IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences (ICCABS) |chapter=The ruzzo-tompa algorithm can find the maximal paths in weighted, directed graphs on a one-dimensional lattice |url=https://ieeexplore.ieee.org/document/6182645/ |journalyear=2012 IEEE 2nd International Conference on Computational Advances in Bio and medical Sciences (ICCABS) |pages=1–6 |doi=10.1109/ICCABS.2012.6182645|isbn=978-1-4673-1321-6 |s2cid=14584619 }}</ref>
 
==Other algorithms==
Line 82:
== References ==
{{reflist}}
 
== Further reading ==
* {{cite journal | last1=Ali | first1=Syed Arslan | last2=Raza | first2=Basit | last3=Malik | first3=Ahmad Kamran | last4=Shahid | first4=Ahmad Raza | last5=Faheem | first5=Muhammad | last6=Alquhayz | first6=Hani | last7=Kumar | first7=Yogan Jaya | title=An Optimally Configured and Improved Deep Belief Network (OCI-DBN) Approach for Heart Disease Prediction Based on Ruzzo–Tompa and Stacked Genetic Algorithm | journal=IEEE Access | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=8 | year=2020 | issn=2169-3536 | doi=10.1109/access.2020.2985646 | pages=65947–65958| bibcode=2020IEEEA...865947A | s2cid=215817246 | doi-access=free }}
 
{{DEFAULTSORT:Ruzzo-Tompa algorithm}}