Maximum subarray problem: Difference between revisions

Content deleted Content added
top: mentioning complexity once in the lead is sufficient
Empty subarrays admitted: fix misunderstanding: described changes obtain the algorithm variant
 
(49 intermediate revisions by 18 users not shown)
Line 2:
[[File:Maximum Subarray Visualization.svg|thumbnail|Visualization of how sub-arrays change based on start and end positions of a sample. Each possible contiguous sub-array is represented by a point on a colored line. That point's y-coordinate represents the sum of the sample. Its x-coordinate represents the end of the sample, and the leftmost point on that colored line represents the start of the sample. In this case, the array from which samples are taken is [2, 3, -1, -20, 5, 10]. ]]
 
In [[computer science]], the '''maximum sum subarray problem''', also known as the '''maximum segment sum problem''', is the task of finding a contiguous subarray with the largest sum, within a given one-dimensional [[array data structure|array]] A[1...n] of numbers. It can be solved in <math>O(n)</math> time and <math>O(1)</math> space.
 
Formally, the task is to find indices <math>i</math> and <math>j</math> with <math>1 \leq i \leq j \leq n </math>, such that the sum
Line 15:
# Several different sub-arrays may have the same maximum sum.
 
Although this problem can be solved using several different algorithmic techniques, including brute force,{{sfn|Bentley|1989|p=70}} divide and conquer,{{sfn|Bentley|1989|p=73}} dynamic programming,{{sfn|Bentley|1989|p=74}} and reduction to shortest paths, a simple single-pass algorithm known as Kadane's algorithm solves it in <math>O(n)</math> time and <math>O(1)</math> spaceefficiently.
 
== History ==
The maximum subarray problem was proposed by [[Ulf Grenander]] in 1977 as a simplified model for [[maximum likelihood estimation]] of patterns in digitized images.{{sfn|Bentley|1984|p=868-869}}
 
Grenander was looking to find a rectangular subarray with maximum sum, in a two-dimensional array of real numbers. A brute-force algorithm for the two-dimensional problem runs in ''O''(''n''<sup>6</sup>) time; because this was prohibitively slow, Grenander proposed the one-dimensional problem to gain insight into its structure. Grenander derived an algorithm that solves the one-dimensional problem in ''O''(''n''<sup>2</sup>) time, using [[prefix sum]]{{NoteTag
|By using a precomputed table of cumulative sums <math>S[k] = \sum_{x=1}^k A[x]</math> to compute the subarray sum <math>\sum_{x=i}^j A[x] = S[j] - S[i-1]</math> in constant time<!--sentence fragment-->
}}, improving the brute force running time of ''O''(''n''<sup>3</sup>). When [[Michael Shamos]] heard about the problem, he overnight devised an ''O''(''n'' log ''n'') [[divide-and-conquer algorithm]] for it.
}}
improving the brute force running time of ''O''(''n''<sup>3</sup>). When [[Michael Shamos]] heard about the problem, he overnight devised an ''O''(''n'' log ''n'') [[divide-and-conquer algorithm]] for it.
Soon after, Shamos described the one-dimensional problem and its history at a [[Carnegie Mellon University]] seminar attended by [[Jay Kadane]], who designed within a minute an ''O''(''n'')-time algorithm,{{sfn|Bentley|1984|p=868-869}}{{sfn|Bentley|1989|p=76-77}}{{sfn|Gries|1982|p=211}} which is as fast as possible.<ref group=note>since every algorithm must at least scan the array once which already takes ''O''(''n'') time</ref> In 1982, [[David Gries]] obtained the same ''O''(''n'')-time algorithm by applying [[Edsger W. Dijkstra |Dijkstra]]'s "standard strategy";{{sfn|Gries|1982|p=209-211}} in 1989, [[Richard Bird (computer scientist)|Richard Bird]] derived it by purely algebraic manipulation of the brute-force algorithm using the [[Bird–Meertens formalism]].{{sfn|Bird|1989|loc=Sect.8, p.126}}
 
Line 29 ⟶ 28:
 
== Applications ==
{{expert needed|Computational biology|section|reason=fix inline tags|date=September 2019}}
Maximum subarray problems arise in many fields, such as genomic [[sequence analysis]] and [[computer vision]].
 
Genomic sequence analysis employs maximum subarray algorithms to identify important biological segments of protein sequences.{{citation needed|date=Marchthat 2020}}have unusual properties, by assigning scores to points within the sequence that are positive when a motif to be recognized is present, and negative when it is not, and then seeking the maximum subarray among these scores. These problems include conserved segments, GC-rich regions, tandem repeats, low-complexity filter, DNA binding domains, and regions of high charge.<ref>{{citation neededharvtxt|date=OctoberRuzzo|Tompa|1999}}; 2017{{harvtxt|Alves|Cáceres|Song|2004}}</ref>
 
In [[computer vision]], bitmap images generally consist only of positive values, for which the maximum subarray problem is trivial: the result is always the whole array. However, after subtracting a threshold value (such as the average pixel value) from each pixel, so that above-average pixels will be positive and below-average pixels will be negative, the maximum subarray problem can be applied to the modified image to detect bright areas within it.<ref>{{harvtxt|Bae|Takaoka|2006}}; {{harvtxt|Weddell|Read|Thaher|Takaoka|2013}}</ref>
In [[computer vision]], maximum-subarray algorithms are used on bitmap images to detect the brightest area in an image.
 
==Kadane's algorithm==
===EmptyNo empty subarrays admitted===
 
{| align="right" class="wikitable collapsible collapsed"
[[Joseph Born Kadane|Kadane's]] algorithm scans the given array <math>A[1\ldots n]</math> from left to right.
! Example run
|-
| [[File:Kadane run −2,1,−3,4,−1,2,1,−5,4.gif|thumb|500px|Execution of Kadane's algorithm on the [[#top|above]] example array. ''{{color|#0000c0|Blue}}:'' subarray with largest sum ending at ''i''; ''{{color|#00c000|green}}:'' subarray with largest sum encountered so far; a lower case letter indicates an empty array; variable ''i'' is left implicit in Python code.]]
|}
[[Joseph Born Kadane|Kadane's]] original algorithm solves the problem version when empty subarrays are admitted. It scans the given array <math>A[1\ldots n]</math> from left to right.
In the <math>j</math>th step, it computes the subarray with the largest sum ending at <math>j</math>; this sum is maintained in variable <code>current_sum</code>.{{NoteTag
|named <code>MaxEndingHere</code> in {{harvtxt|Bentley|1989}}, and <code>c</code> in {{harvtxt|Gries|1982}}
Line 52 ⟶ 46:
and easily obtained as the maximum of all values of <code>current_sum</code> seen so far, cf. line 7 of the algorithm.
 
As a [[loop invariant]], in the <math>j</math>th step, the old value of <code>current_sum</code> holds the maximum over all <math>i \in \{ 1,\ldots, j-1 \}</math> of the sum <math>A[i]+\cdots+A[j-1]</math>.{{NoteTag
|This sum is <math>0</math> when <math>i=j</math>, corresponding to the empty subarray <math>A[j\ldots j-1]</math>.
}}
Therefore, <code>current_sum</code><math>+A[j]</math>{{NoteTag|
In the Python code below, <math>A[j]</math> is expressed as <code>x</code>, with the index <math>j</math> left implicit.
}}
is the maximum over all <math>i \in \{ 1,\ldots, j-1 \}</math> of the sum <math>A[i]+\cdots+A[j]</math>. To extend the latter maximum to cover also the case <math>i=j+1</math>, it is sufficient to consider also the emptysingleton subarray <math>A[j+1 \; \ldots \; j]</math>. This is done in line 6 by assigning <math>\max(0A[j],</math><code>current_sum</code><math>+A[j])</math> as the new value of <code>current_sum</code>, which after that holds the maximum over all <math>i \in \{ 1, \ldots, j+1 \}</math> of the sum <math>A[i]+\cdots+A[j]</math>.
 
Thus, the problem can be solved with the following code,{{sfn|Bentley|1989|p=74}}{{sfn78,171|ps=. Bentley, like Gries, first introduces the variant admitting empty subarrays, see [[#Empty subarrays admitted|1982|p=211below]], and describes only the changes.}} expressed here in [[Python (programming language)|Python]]:.
 
<syntaxhighlight lang="python" line="1">
def max_subarray(numbers):
"""Find the largest sum of any contiguous subarray."""
best_sum = 0
current_sum = 0
for x in numbers:
current_sum = max(0, current_sum + x)
best_sum = max(best_sum, current_sum)
return best_sum
</syntaxhighlight>
 
This version of the algorithm will return 0 if the input contains no positive elements (including when the input is empty).
 
===No empty subarrays admitted===
For the variant of the problem which disallows empty subarrays, <code>best_sum</code> should be initialized to negative infinity instead{{sfn|Bentley|1989|p=78,171}} and also in the for loop <code>current_sum</code> should be updated as <code>max(x, current_sum + x)</code>.{{NoteTag
|While the latter modification is not mentioned by {{harvtxt|Bentley|1989}}, it achieves maintaining the modified loop invariant <code>current_sum</code><math>=\max_{i \in \{ 1, ..., j \}} A[i]+...+A[j]</math> at the beginning of the <math>j</math>th step.
}}
In that case, if the input contains no positive element, the returned value is that of the largest element (i.e., the value closest to 0), or negative infinity if the input was empty. For correctness, an exception should be raised when the input array is empty, since an empty array has no maximum subarray:
 
<syntaxhighlight lang="python" line="1">
def max_subarray(numbers):
"""Find the largest sum of any contiguous subarray."""
if numbers == []:
raise ValueError('Empty array has no nonempty subarrays')
 
best_sum = float('-inf')
current_sum = 0
Line 95 ⟶ 65:
</syntaxhighlight>
 
If the input contains no positive element, the returned value is that of the largest element (i.e., the value closest to 0), or negative infinity if the input was empty. For correctness, an exception should be raised when the input array is empty, since an empty array has no maximum nonempty subarray. If the array is nonempty, its first element could be used in place of negative infinity, if needed to avoid mixing numeric and non-numeric values.
===Conditionally admitting empty subarrays===
The only case when it matters if empty subarrays are admitted, is if all numbers in the input array are negative. In this case, the maximum subarray will either be empty (when empty subarrays are allowed), or contain the largest number in the input array (when empty subarrays are not allowed).
 
The algorithm can be adapted to the case which allows empty subarrays or to keep track of the starting and ending indices of the maximum subarray.
An alternative algorithm that admits empty subarrays is easily developed from the algorithm given above which does not admit empty subarrays: The only change that is needed is to return <code>max(best_sum, 0)</code> instead of <code>best_sum</code>. It can be seen that this version is correct:
* For an empty input array the previous algorithm will return minus infinity, so this algorithm will return zero, which corresponds to the sum of elements of an empty subarray.
* For an input array with only negative numbers, the previous algorithm will return the largest of the integers, which is negative. So this algorithm will return zero, which corresponds to the sum of elements of an empty subarray.
* For all other cases, there is at least one nonnegative integer in the output, so there is a nonempty subarray for which the sum of the elements is at least 0. Since the sum of the elements is always zero for empty subarrays, it doesn't matter if empty subarrays are admitted or not, so this algorithm correctly returns the same answer as the previous algorithm gives.
 
This algorithm calculates the maximum subarray ending at each position from the maximum subarray ending at the previous position, so it can be viewed as a case of [[dynamic programming]].
This algorithm can also be converted to a version that conditionally admits empty subarrays, based on a parameter: If empty subarrays are admitted, return <code>max(0, best_sum)</code>, otherwise, return <code>best_sum</code>. An exception should be raised the input array is empty but empty subarrays are not admitted:
<syntaxhighlight lang="python" line="1">
def max_subarray(numbers, admit_empty_subarrays=True):
"""Find the largest sum of any contiguous subarray."""
if not(admit_empty_subarrays) and numbers == []:
raise ValueError('Empty array has no nonempty subarrays')
 
===Empty subarrays admitted===
best_sum = float('-inf')
{| align="right" class="wikitable collapsible collapsed"
current_sum = 0
! Example run
for x in numbers:
|-
current_sum = max(x, current_sum + x)
| [[File:Kadane run −2,1,−3,4,−1,2,1,−5,4.gif|thumb|500px|Execution of Kadane's algorithm on the [[#top|above]] example array. ''{{color|#0000c0|Blue}}:'' subarray with largest sum ending at ''i''; ''{{color|#00c000|green}}:'' subarray with largest sum encountered so far; a lower case letter indicates an empty array; variable ''i'' is left implicit in Python code.]]
best_sum = max(best_sum, current_sum)
|}
Kadane's algorithm, as originally published, is for solving the problem variant which allows empty subarrays.{{sfn|Bentley|1989|p=74}}{{sfn|Gries|1982|p=211}}
In such variant, the answer is 0 when the input contains no positive elements (including when the input is empty).
The variant is obtained with two changes in code: in line 3, <code>best_sum</code> should be initialized to 0 to account for the empty subarray <math>A[0 \ldots -1]</math>
<syntaxhighlight lang="python" line start="3">
best_sum = 0;
</syntaxhighlight>
and line 6 in the for loop <code>current_sum</code> should be updated to <code>max(0, current_sum + x)</code>.{{NoteTag
|While {{harvtxt|Bentley|1989}} does not mention this difference, using <code>x</code> instead of <code>0</code> in the [[#No empty subarrays admitted|above]] version without empty subarrays achieves maintaining its loop invariant <code>current_sum</code><math>=\max_{i \in \{ 1, ..., j-1 \}} A[i]+...+A[j-1]</math> at the beginning of the <math>j</math>th step.
}}
<syntaxhighlight lang="python" line start="6">
current_sum = max(0, current_sum + x)
</syntaxhighlight>
 
As a [[loop invariant]], in the <math>j</math>th step, the old value of <code>current_sum</code> holds the maximum over all <math>i \in \{ 1,\ldots, j \}</math> of the sum <math>A[i]+\cdots+A[j-1]</math>.{{NoteTag
if admit_empty_subarrays:
|This sum is <math>0</math> when <math>i=j</math>, corresponding to the empty subarray <math>A[j\ldots j-1]</math>.
best_sum = max(0, best_sum)
}}
 
Therefore, <code>current_sum</code><math>+A[j]</math>
return best_sum
is the maximum over all <math>i \in \{ 1,\ldots, j \}</math> of the sum <math>A[i]+\cdots+A[j]</math>. To extend the latter maximum to cover also the case <math>i=j+1</math>, it is sufficient to consider also the empty subarray <math>A[j+1 \; \ldots \; j]</math>. This is done in line 6 by assigning <math>\max(0,</math><code>current_sum</code><math>+A[j])</math> as the new value of <code>current_sum</code>, which after that holds the maximum over all <math>i \in \{ 1, \ldots, j+1 \}</math> of the sum <math>A[i]+\cdots+A[j]</math>. Machine-verified [[C (programming language)|C]] / [[Frama-C]] code of both variants can be found [[:commons:File:Kadane run −2,1,−3,4,−1,2,1,−5,4.gif#Source code|here]].
</syntaxhighlight>
 
===Computing the best subarray's position===
The algorithm can be modified to keep track of the starting and ending indices of the maximum subarray as well:.
 
<syntaxhighlight lang="python" line="1">
def max_subarray(numbers):
"""Find a contiguous subarray with the largest sum."""
best_sum = 0 # or: float('-inf')
best_start = best_end = 0 # or: None
current_sum = 0
for current_end, x in enumerate(numbers):
if current_sum <= 0:
# Start a new sequence at the current element
current_start = current_end
current_sum = x
else:
# Extend the existing sequence with the current element
current_sum += x
 
if current_sum > best_sum:
best_sum = current_sum
best_start = current_start
best_end = current_end + 1 # the +1 is to make 'best_end' match Python's slice convention (endpoint excluded)
 
return best_sum, best_start, best_end
</syntaxhighlight>
 
In Python, arrays are indexed starting from 0, and slices exclude the endpoint, so that the subarray [22, 33] in the array a=[-11, 22, 33, -44] would be expressed as a[1:3].
 
Because of the way this algorithm uses optimal substructures (the maximum subarray ending at each position is calculated in a simple way from a related but smaller and overlapping subproblem: the maximum subarray ending at the previous position) this algorithm can be viewed as a simple/trivial example of [[dynamic programming]].
===Complexity===
The runtime complexity of Kadane's algorithm is <math>O(n)</math> and its space complexity is <math>O(1)</math>.{{sfn|Bentley|1989|p=74}}{{sfn|Gries|1982|p=211}}
Because of the way this algorithm uses optimal substructures (the maximum subarray ending at each position is calculated in a simple way from a related but smaller and overlapping subproblem: the maximum subarray ending at the previous position) this algorithm can be viewed as a simple/trivial example of [[dynamic programming]].
 
The runtime complexity of Kadane's algorithm is <math>O(n)</math>.{{sfn|Bentley|1989|p=74}}{{sfn|Gries|1982|p=211}}
 
== Generalizations ==
Line 167 ⟶ 115:
{{NoteFoot|30em}}
 
==ReferencesNotes==
{{Reflist}}
 
==References==
*{{citation
| last1 = Alves | first1 = Carlos E. R.
| last2 = Cáceres | first2 = Edson
| last3 = Song | first3 = Siang W.
| editor1-last = Kranzlmüller | editor1-first = Dieter
| editor2-last = Kacsuk | editor2-first = Péter
| editor3-last = Dongarra | editor3-first = Jack J.
| contribution = BSP/CGM Algorithms for Maximum Subsequence and Maximum Subarray
| doi = 10.1007/978-3-540-30218-6_24
| pages = 139–146
| publisher = Springer
| series = Lecture Notes in Computer Science
| title = Recent Advances in Parallel Virtual Machine and Message Passing Interface, 11th European PVM/MPI Users' Group Meeting, Budapest, Hungary, September 19-22, 2004, Proceedings
| volume = 3241
| year = 2004| isbn = 978-3-540-23163-9
}}
*{{citation
| last1 = Backurs
Line 185 ⟶ 150:
| pages = 81:1–81:13
| doi = 10.4230/LIPIcs.ICALP.2016.81
| doi-access = free
| s2cid = 12720136
}}
Line 200 ⟶ 166:
| s2cid=2681670
}}.
*{{citation
| last1 = Bae | first1 = Sung Eun
| last2 = Takaoka | first2 = Tadao
| doi = 10.1093/COMJNL/BXL007
| issue = 3
| journal = The Computer Journal
| pages = 358–374
| title = Improved Algorithms for the \emph{K}-Maximum Subarray Problem
| volume = 49
| year = 2006}}
*{{citation
| last1=Bengtsson
Line 238 ⟶ 214:
}}
*{{citation
| url=http://comjnl.oxfordjournals.org/content/32/2/122.full.pdf
| first=Richard S.
| last= Bird
Line 249 ⟶ 224:
| year=1989
| doi=10.1093/comjnl/32.2.122
}}
}}{{dead link|date=May 2021|bot=medic}}{{cbignore|bot=medic}}
*{{citation
| first1 = Gerth Stølting | last1 = Brodal
| first2 = Allan Grønlund | last2 = Jørgensen
| title = Mathematical Foundations of Computer Science 2007
| contribution = A linear time algorithm for the ''k'' maximal sums problem
| title = Mathematical Foundations of Computer Science
| publisher = Springer-Verlag
| series = Lecture Notes in Computer Science
| volume = 4708 | year = 2007 | pages = 442–453 | doi = 10.1007/978-3-540-74456-6_40
| isbn = 978-3-540-74455-9
}}.
}}.
*{{citation
| url=https://core.ac.uk/download/pdf/82596333.pdf
Line 272 ⟶ 248:
| hdl=1813/6370
}}
*{{citation
| last1 = Ruzzo | first1 = Walter L.
| last2 = Tompa | first2 = Martin
| editor1-last = Lengauer | editor1-first = Thomas
| editor2-last = Schneider | editor2-first = Reinhard
| editor3-last = Bork | editor3-first = Peer
| editor4-last = Brutlag | editor4-first = Douglas L.
| editor5-last = Glasgow | editor5-first = Janice I.
| editor6-last = Mewes | editor6-first = Hans-Werner
| editor7-last = Zimmer | editor7-first = Ralf
| contribution = A Linear Time Algorithm for Finding All Maximal Scoring Subsequences
| contribution-url = https://www.aaai.org/Library/ISMB/1999/ismb99-027.php
| pages = 234–241
| publisher = AAAI
| title = Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, August 6–10, 1999, Heidelberg, Germany
| year = 1999}}
*{{citation
| first = Tadao | last = Takaoka
Line 293 ⟶ 285:
| pages = 446–452
| doi =
| isbn = 978-0-89871-410-4
| access-date = November 17, 2018
}}
*{{citation
| last1 = Weddell | first1 = Stephen John
| last2 = Read | first2 = Tristan
| last3 = Thaher | first3 = Mohammed
| last4 = Takaoka | first4 = Tadao
| doi = 10.1117/1.JEI.22.4.043011
| issue = 4
| journal = Journal of Electronic Imaging
| page = 043011
| title = Maximum subarray algorithms for use in astronomical imaging
| volume = 22
| year = 2013| bibcode = 2013JEI....22d3011W
}}
 
== External links ==
Line 307 ⟶ 313:
[[Category:Optimization algorithms and methods]]
[[Category:Dynamic programming]]
[[Category:Polynomial-time problems]]
[[Category:Articles with example Python (programming language) code]]