Ruzzo–Tompa algorithm: Difference between revisions

Content deleted Content added
No edit summary
Line 1:
The '''Ruzzo-Tompa algorithm''' is a linear time algorithm for finding all non-overlapping, contiguous, maximal scoring subsequences in a sequence of real numbers.<ref>{{cite journal|last1=Ruzzo|first1=Walter L.|last2=Martin|first2=Tompa|title=A Linear Time Algorithm for Finding All Maximal Scoring Subsequences|journal=Proceedings. International Conference on Intelligent Systems for Molecular Biology|date=1999|pages=234-241|pmid=10786306|url=https://dl.acm.org/citation.cfm?id=660812|ref=ruzzo-tompa}}</ref>. This algorithm is an improvement over previously known quadratic time algorithms. The maximum scoring subsequence from the set produced by the algorithm is also a solution to the [[Maximum subarray problem]].
 
The Ruzzo-Tompa algorithm has applications in [[Bioinformatics]], [[Web Scraping]], [[Information retrieval]]
The problem of find disjoint maximal subsequences is of practical importance in the analysis of [[DNA]]. Maximal subsequences algorithms have been used in the identification of transmembrane segments and the evaluation of [[sequence homology]]<ref name="ruzzo_tompa">{{cite journal|last1=Karlin|first1=S|last2=Altschul|first2=SF|title=Applications and statistics for multiple high-scoring segments in molecular sequences|journal=Proceedings of the National Academy of Sciences of the United States of America|date=Jun 15, 1993|volume=90|issue=12|pages=5873-5877|pmid=8390686}}</ref>
 
==Applications==
===Bioinformatics===
The Ruzzo-Tompa algorithm has been used in [[Bioinformatics]] tools to study biological data. The problem of findfinding disjoint maximal subsequences is of practical importance in the analysis of [[DNA]]. Maximal subsequences algorithms have been used in the identification of transmembrane segments and the evaluation of [[sequence homology]]<ref name="ruzzo_tompa">{{cite journal|last1=Karlin|first1=S|last2=Altschul|first2=SF|title=Applications and statistics for multiple high-scoring segments in molecular sequences|journal=Proceedings of the National Academy of Sciences of the United States of America|date=Jun 15, 1993|volume=90|issue=12|pages=5873-5877|pmid=8390686}}</ref>
 
==Problem Definition==
 
The problem of finding all maximal subsequences is defined as follows: Given a list of real numbered scores <math>x_1,x_2,...,x_n</math>, find the list of contiguous subsequences that gives the great total score, where the score of each subsequence <math>S_{i,j} = \sum_{i\leq k\leq j}^{} x_k</math>. The subsequences must be disjoint (non-overlapping) and have a positive score.
 
==Algorithm==