Ruzzo–Tompa algorithm: Difference between revisions

Content deleted Content added
Created page with '{{subst:AFC submission/draftnew}}<!-- Important, do not remove this line before article has been created. --> The Ruzzo-Tompa algorithm is a linear time alg...'
 
No edit summary
Line 1:
{{AFC submission|t||ts=20180416061329|u=John Douglas Huff|ns=118|demo=}}<!-- Important, do not remove this line before article has been created. -->
 
The [['''Ruzzo-Tompa algorithm]]''' is a linear time algorithm for finding all non-overlapping, contiguous, maximal scoring subsequences in a sequence of real numbers<ref>{{cite journal|last1=Ruzzo|first1=Walter L.|last2=Martin|first2=Tompa|title=A Linear Time Algorithm for Finding All Maximal Scoring Subsequences|journal=Proceedings. International Conference on Intelligent Systems for Molecular Biology|date=1999|pages=234-241|pmid=10786306|url=https://dl.acm.org/citation.cfm?id=660812|ref=ruzzo-tompa}}</ref>. ThereThis algorithm is an existingimprovement articleover onpreviously theknown [[Maximumquadratic subarraytime problem]]algorithms. The [[Ruzzo-Tompa]]maximum scoring subsequence from the set produced by the algorithm is ana improvementsolution overto previouslythe known[[Maximum quadraticsubarray time algorithmsproblem]].
 
The problem of find disjoint maximal subsequences is of practical importance in the analysis of [[DNA]]. Maximal subsequences algorithms have been used in the identification of transmembrane segments and the evaluation of [[sequence homology]]<ref>{{cite journal|last1=Karlin|first1=S|last2=Altschul|first2=SF|title=Applications and statistics for multiple high-scoring segments in molecular sequences|journal=Proceedings of the National Academy of Sciences of the United States of America|date=Jun 15, 1993|volume=90|issue=12|pages=5873-5877|pmid=8390686}}</ref>.
 
==Algorithm==
 
The standard implementation of the Ruzzo-Tompa algorithm runs in <math>O(n)</math> time and uses <math>O(n)</math> space, where <math>n</math> is the length of the list of scores. The description of the algorithm provided by Ruzzo and Tompa is as follows:
 
 
The following [[Python (programming language)|Python]] code implements the Ruzzo-Tompa algorithm:
 
<source lang="python" line="1">
def RuzzoTompa(scores):
k=0
total = 0;
# Allocating arrays of size n
I,L,R,Lidx = [[0]*len(scores) for _ in range(4)]
for i,s in enumerate(scores):
total += s
if s > 0:
I[k] = (i,i+1)
Lidx[k] = i
L[k] = total-s
R[k] = total
while(True):
maxj = None
for j in range(k-1,-1,-1):
if L[j] < L[k]:
maxj = j
break;
if maxj != None and R[maxj] < R[k]:
I[maxj] = (Lidx[maxj],i+1)
R[maxj] = total
k = maxj
else:
k+=1
break;
return [scores[I[l][0]:I[l][1]] for l in range(k)]
</source>