Revision as of 15:25, 17 April 2018 edit Legacypac (talk \| contribs) Extended confirmed users, Pending changes reviewers 158,031 edits Cleaning up accepted Articles for creation submission (AFCH 0.9) ← Previous edit		Revision as of 19:13, 17 April 2018 edit undo John Douglas Huff (talk \| contribs) 32 edits No edit summary Next edit →
Line 1: The '''Ruzzo-Tompa algorithm''' is a linear time algorithm for finding all non-overlapping, contiguous, maximal scoring subsequences in a sequence of real numbers.<ref>{{cite journal\|last1=Ruzzo\|first1=Walter L.\|last2=Martin\|first2=Tompa\|title=A Linear Time Algorithm for Finding All Maximal Scoring Subsequences\|journal=Proceedings. International Conference on Intelligent Systems for Molecular Biology\|date=1999\|pages=234-241\|pmid=10786306\|url=https://dl.acm.org/citation.cfm?id=660812\|ref=ruzzo-tompa}}</ref>. This algorithm is an improvement over previously known quadratic time algorithms. The maximum scoring subsequence from the set produced by the algorithm is a solution to the [[Maximum subarray problem]]. The problem of find disjoint maximal subsequences is of practical importance in the analysis of [[DNA]]. Maximal subsequences algorithms have been used in the identification of transmembrane segments and the evaluation of [[sequence homology]]<ref name="ruzzo_tompa">{{cite journal\|last1=Karlin\|first1=S\|last2=Altschul\|first2=SF\|title=Applications and statistics for multiple high-scoring segments in molecular sequences\|journal=Proceedings of the National Academy of Sciences of the United States of America\|date=Jun 15, 1993\|volume=90\|issue=12\|pages=5873-5877\|pmid=8390686}}</ref> ==Problem Definition== Line 20: # Otherwise (i.e., there is such a j, but <math>R_j < R_k</math>), extend the subsequence <math>I_k</math> to the left to encompass everything up to and including the leftmost score in <math>I_j</math>. Delete subsequences <math>I_j,I_j+1,...,I_k-1</math> from the list, and append <math>I_k</math> to the end of the list. Reconsider the newly extended subsequence <math>I_k</math> (now renumbered <math>I_j</math>) as in step 1. :Once the end of the input is reached, all subsequences remaining on the list <math>I</math> are maximal.<ref name="ruzzo_tompa"></ref> As example of the algorithm running, consider the input score list <math>L=[4,-5,3,-2,1,2]</math>. On input <math>4</math> in step 1, no satisfying <math>j</math> is found so step 2 applies and <math>[4]</math> is appended to <math>I</math>, so <math>I=[[4]], R=[4], L=[0]</math>. The input <math>-5</math> is skipped and the input <math>3</math> is read. In step 1, no satisfying <math>j</math> is found so <math>[3]</math> is appended to <math>I</math>, so <math>I=[[4],[3]], R=[4,2], L=[0,-1]</math>. The input <math>-2</math> is skipped and the input <math>1</math> is read. In step 1, a value of <math>j=1</math> is found to satisfy <math>L[j]<L[k]</math>, so step 3 applies. In step 3, <math>R[j]\ngeq R[k]</math>, so <math>[1] is appended to <math>I</math>, and now <math>I=[[4],[3],[1]], R=[4,2,1], L=[0,-1,0]</math>. The following [[Python (programming language)\|Python]] code implements the Ruzzo-Tompa algorithm:

Ruzzo–Tompa algorithm: Difference between revisions