Subset sum problem: Difference between revisions

Content deleted Content added
BrainStack (talk | contribs)
Link suggestions feature: 3 links added.
Line 31:
*The input integers are positive, and ''T'' = sum(''S'')/2. This can also be proved by reduction from the general variant; see [[partition problem]].
 
The analogous counting problem #SSP, which asks to enumerate the number of subsets summing to the target, is [[Sharp P complete|#P-complete]].<ref>[[Yuval Filmus|Filmus, Yuval]] (30 January 2016). [https://cs.stackexchange.com/a/52466 Answer] to: "[https://cs.stackexchange.com/q/52453 Is there a known, fast algorithm for counting all subsets that sum to below a certain number?]". ''Theoretical Computer Science [[Stack Exchange]].'' Note that Filmus' citation in support of the claim (Faliszewski, Piotr; Hemaspaandra, Lane (2009). "The complexity of power-index comparison". ''Theoretical Computer Science''. Elsevier. '''410''': 101-107. DOI&nbsp;[http://dx.doi.org/10.1016/j.tcs.2008.09.034 10.1016/j.tcs.2008.09.034]) does not in fact prove the claim, instead directing readers to another citation ([[Christos Papadimitriou|Papadimitriou, Christos]] (1994). [https[iarchive://archive.org/details/computationalcom0000papa/ |''Computational Complexity'']]. Addison-Wesley: Reading, MA. Chapter&nbsp;9. {{ISBN|0-201-53082-1}}&nbsp;&mdash#x2014; via the [[Internet Archive]]), which does not explicitly prove the claim either. Papadimitriou's proof that SSP is NP-complete via reduction of [[3SAT]] does, however, generalize to a reduction from [[Sharp-SAT|#3SAT]] to #SSP.</ref>
 
== Exponential time algorithms ==
Line 40:
The most [[naive solution|naïve algorithm]] would be to cycle through all subsets of ''n'' numbers and, for every one of them, check if the subset sums to the right number. The running time is of order <math>O(2^n \cdot n)</math>, since there are <math>2^n</math> subsets and, to check each subset, we need to sum at most ''n'' elements.
 
The algorithm can be implemented by [[depth-first search]] of a [[binary tree]]: each level in the tree corresponds to an input number; the left branch corresponds to excluding the number from the set, and the right branch corresponds to including the number (hence the name Inclusion-Exclusion). The memory required is <math>O(n)</math>. The run-time can be improved by several heuristics:<ref name=":0" />
 
* Process the input numbers in descending order.
Line 47:
 
=== Horowitz and Sahni ===
In 1974, Horowitz and [[Sartaj Sahni|Sahni]]<ref>{{cite journal|last1=Horowitz|first1=Ellis|title=Computing partitions with applications to the knapsack problem|url=https://ecommons.cornell.edu/bitstream/1813/5989/1/72-134.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://ecommons.cornell.edu/bitstream/1813/5989/1/72-134.pdf |archive-date=2022-10-09 |url-status=live|journal=[[Journal of the Association for Computing Machinery]]|volume=21|issue=2|pages=277–292|year=1974|doi=10.1145/321812.321823|mr=0354006|last2=Sahni|first2=Sartaj|author2-link=Sartaj Sahni|hdl=1813/5989|s2cid=16866858|hdl-access=free}}</ref> published a faster exponential-time algorithm, which runs in time <math>O( 2^{n/2}\cdot (n/2))</math>, but requires much more space - <math>O( 2^{n/2})</math>. The algorithm splits arbitrarily the ''n'' elements into two sets of <math>n/2</math> each. For each of these two sets, it stores a list of the sums of all <math>2^{n/2}</math> possible subsets of its elements. Each of these two lists is then sorted. Using even the fastest comparison [[sorting algorithm]], Mergesort for this step would take time <math>O(2^{n/2}n)</math>. However, given a sorted list of sums for <math>k</math> elements, the list can be expanded to two sorted lists with the introduction of a (<math>k+1</math>)th element, and these two sorted lists can be merged in time <math>O(2^k)</math>. Thus, each list can be generated in sorted form in time <math>O(2^{n/2})</math>. Given the two sorted lists, the algorithm can check if an element of the first array and an element of the second array sum up to ''T'' in time <math>O(2^{n/2})</math>. To do that, the algorithm passes through the first array in decreasing order (starting at the largest element) and the second array in increasing order (starting at the smallest element). Whenever the sum of the current element in the first array and the current element in the second array is more than ''T'', the algorithm moves to the next element in the first array. If it is less than ''T'', the algorithm moves to the next element in the second array. If two elements that sum to ''T'' are found, it stops. (The sub-problem for two elements sum is known as two-sum.<ref>{{Cite web |title=The Two-Sum Problem |url=https://www3.cs.uic.edu/pub/CS211/LabsS18/Two-Sum.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://www3.cs.uic.edu/pub/CS211/LabsS18/Two-Sum.pdf |archive-date=2022-10-09 |url-status=live}}</ref>)
 
=== Schroeppel and Shamir ===
Line 74:
* (''i''+1, ''s''+<math>x_{i+1}</math>), implying that <math>x_{i+1}</math> is included in the subset.
 
Starting from the initial state (0, 0), it is possible to use any graph [[search algorithm]] (e.g. [[Breadth-first search|BFS]]) to search the state (''N'', ''T''). If the state is found, then by backtracking we can find a subset with a sum of exactly ''T''.
 
The run-time of this algorithm is at most linear in the number of states. The number of states is at most ''N'' times the number of different possible sums. Let {{mvar|A}} be the sum of the negative values and {{mvar|B}} the sum of the positive values; the number of different possible sums is at most ''B''-''A'', so the total runtime is in <math>O(N \cdot (B-A))</math>. For example, if all input values are positive and bounded by some constant ''C'', then ''B'' is at most ''N C'', so the time required is <math>O(N^{2}C)</math>.