Content deleted Content added
←Created page with '{{subst:AFC submission/draftnew}}<!-- Important, do not remove this line before article has been created. --> The Ruzzo-Tompa algorithm is a linear time alg...' |
No edit summary |
||
Line 1:
{{AFC submission|t||ts=20180416061329|u=John Douglas Huff|ns=118|demo=}}<!-- Important, do not remove this line before article has been created. -->
The
The problem of find disjoint maximal subsequences is of practical importance in the analysis of [[DNA]]. Maximal subsequences algorithms have been used in the identification of transmembrane segments and the evaluation of [[sequence homology]]<ref>{{cite journal|last1=Karlin|first1=S|last2=Altschul|first2=SF|title=Applications and statistics for multiple high-scoring segments in molecular sequences|journal=Proceedings of the National Academy of Sciences of the United States of America|date=Jun 15, 1993|volume=90|issue=12|pages=5873-5877|pmid=8390686}}</ref>.
==Algorithm==
The standard implementation of the Ruzzo-Tompa algorithm runs in <math>O(n)</math> time and uses <math>O(n)</math> space, where <math>n</math> is the length of the list of scores. The description of the algorithm provided by Ruzzo and Tompa is as follows:
The following [[Python (programming language)|Python]] code implements the Ruzzo-Tompa algorithm:
<source lang="python" line="1">
def RuzzoTompa(scores):
k=0
total = 0;
# Allocating arrays of size n
I,L,R,Lidx = [[0]*len(scores) for _ in range(4)]
for i,s in enumerate(scores):
total += s
if s > 0:
I[k] = (i,i+1)
Lidx[k] = i
L[k] = total-s
R[k] = total
while(True):
maxj = None
for j in range(k-1,-1,-1):
if L[j] < L[k]:
maxj = j
break;
if maxj != None and R[maxj] < R[k]:
I[maxj] = (Lidx[maxj],i+1)
R[maxj] = total
k = maxj
else:
k+=1
break;
return [scores[I[l][0]:I[l][1]] for l in range(k)]
</source>
|