Content deleted Content added
Cleaning up accepted Articles for creation submission (AFCH 0.9) |
No edit summary |
||
Line 1:
The '''Ruzzo-Tompa algorithm''' is a linear time algorithm for finding all non-overlapping, contiguous, maximal scoring subsequences in a sequence of real numbers.<ref>{{cite journal|last1=Ruzzo|first1=Walter L.|last2=Martin|first2=Tompa|title=A Linear Time Algorithm for Finding All Maximal Scoring Subsequences|journal=Proceedings. International Conference on Intelligent Systems for Molecular Biology|date=1999|pages=234-241|pmid=10786306|url=https://dl.acm.org/citation.cfm?id=660812|ref=ruzzo-tompa}}</ref>. This algorithm is an improvement over previously known quadratic time algorithms. The maximum scoring subsequence from the set produced by the algorithm is a solution to the [[Maximum subarray problem]].
The problem of find disjoint maximal subsequences is of practical importance in the analysis of [[DNA]]. Maximal subsequences algorithms have been used in the identification of transmembrane segments and the evaluation of [[sequence homology]]<ref name="ruzzo_tompa">{{cite journal|last1=Karlin|first1=S|last2=Altschul|first2=SF|title=Applications and statistics for multiple high-scoring segments in molecular sequences|journal=Proceedings of the National Academy of Sciences of the United States of America|date=Jun 15, 1993|volume=90|issue=12|pages=5873-5877|pmid=8390686}}</ref>
==Problem Definition==
Line 20:
# Otherwise (i.e., there is such a j, but <math>R_j < R_k</math>), extend the subsequence <math>I_k</math> to the left to encompass everything up to and including the leftmost score in <math>I_j</math>. Delete subsequences <math>I_j,I_j+1,...,I_k-1</math> from the list, and append <math>I_k</math> to the end of the list. Reconsider the newly extended subsequence <math>I_k</math> (now renumbered <math>I_j</math>) as in step 1.
:Once the end of the input is reached, all subsequences remaining on the list <math>I</math> are maximal.<ref name="ruzzo_tompa"></ref>
As example of the algorithm running, consider the input score list <math>L=[4,-5,3,-2,1,2]</math>. On input <math>4</math> in step 1, no satisfying <math>j</math> is found so step 2 applies and <math>[4]</math> is appended to <math>I</math>, so <math>I=[[4]], R=[4], L=[0]</math>. The input <math>-5</math> is skipped and the input <math>3</math> is read. In step 1, no satisfying <math>j</math> is found so <math>[3]</math> is appended to <math>I</math>, so <math>I=[[4],[3]], R=[4,2], L=[0,-1]</math>. The input <math>-2</math> is skipped and the input <math>1</math> is read. In step 1, a value of <math>j=1</math> is found to satisfy <math>L[j]<L[k]</math>, so step 3 applies. In step 3, <math>R[j]\ngeq R[k]</math>, so <math>[1] is appended to <math>I</math>, and now <math>I=[[4],[3],[1]], R=[4,2,1], L=[0,-1,0]</math>.
The following [[Python (programming language)|Python]] code implements the Ruzzo-Tompa algorithm:
|