 
=== Solving difficult problems ===
Divide and conquer is a powerful tool for solving conceptually difficult problems: all it requires is a way of breaking the problem into sub-problems, of solving the trivial cases, and of combining the solutions of the sub-problems into a solution of the original problem. Similarly, decrease and conquer only requires reducing the problem to a single smaller problem, such as the classic [[Tower of Hanoi]] puzzle, which reduces moving a tower of height <math>n</math> to moving a tower of height <math>n-1</math>.
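
The standard recursive solution makes this reduction explicit, as in the following minimal Python sketch (the function name and output format are illustrative only): moving <math>n</math> disks means moving the top <math>n-1</math> disks aside, moving the largest disk, and moving the <math>n-1</math> disks back on top.

<syntaxhighlight lang="python">
def hanoi(n, source, target, spare):
    """Move a tower of n disks from peg source to peg target."""
    if n == 0:
        return                               # trivial case: nothing to move
    hanoi(n - 1, source, spare, target)      # clear the top n-1 disks away
    print(f"move disk {n} from {source} to {target}")
    hanoi(n - 1, spare, target, source)      # rebuild them on the largest disk
</syntaxhighlight>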
 
=== Algorithm efficiency ===
The divide-and-conquer paradigm often helps in the discovery of efficient algorithms. It was the key, for example, to Karatsuba's fast multiplication method, the quicksort and mergesort algorithms, the [[Strassen algorithm]] for matrix multiplication, and fast Fourier transforms.
 
In all these examples, the D&C approach led to an improvement in the [[asymptotic complexity|asymptotic cost]] of the solution. For example, if (a) the [[Recursion (computer science)|base cases]] have constant-bounded size, the work of splitting the problem and combining the partial solutions is proportional to the problem's size <math>n</math>, and (b) there is a bounded number <math>p</math> of sub-problems of size <math>\sim n/p</math> at each stage, then the cost of the divide-and-conquer algorithm will be <math>O(n\log_{p}n)</math>.
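
Concretely, these assumptions give the standard divide-and-conquer recurrence
:<math>T(n) = p\,T(n/p) + O(n),</math>
which unrolls into <math>\log_p n</math> levels of recursion with <math>O(n)</math> total work per level, for a total of <math>O(n\log_{p}n)</math>. Merge sort is an instance with <math>p = 2</math>: splitting the list and merging the sorted halves each take <math>O(n)</math> time, giving the familiar <math>O(n\log_{2}n)</math> bound.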
 
=== Parallelism ===
 
=== Memory access ===
Divide-and-conquer algorithms naturally tend to make efficient use of [[memory cache]]s. The reason is that once a sub-problem is small enough, it and all its sub-problems can, in principle, be solved within the cache, without accessing the slower main memory. An algorithm designed to exploit the cache in this way is called ''[[cache-oblivious algorithm|cache-oblivious]]'', because it does not contain the cache size as an explicit parameter.<ref name="cahob">{{cite journal | author = M. Frigo |author2=C. E. Leiserson |author3=H. Prokop | title = Cache-oblivious algorithms | journal = Proc. 40th Symp. On the Foundations of Computer Science |pages=285–297 | year = 1999|url=https://dspace.mit.edu/bitstream/handle/1721.1/80568/43558192-MIT.pdf;sequence=2|doi=10.1109/SFFCS.1999.814600 |isbn=0-7695-0409-4 |s2cid=62758836 }}</ref> Moreover, D&C algorithms can be designed for important algorithms (e.g., sorting, FFTs, and matrix multiplication) to be ''optimal'' cache-oblivious algorithms: they use the cache in a provably optimal way, in an asymptotic sense, regardless of the cache size. In contrast, the traditional approach to exploiting the cache is ''blocking'', as in [[loop nest optimization]], where the problem is explicitly divided into chunks of the appropriate size; this can also use the cache optimally, but only when the algorithm is tuned for the specific cache sizes of a particular machine.
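
A classic small illustration is a recursive matrix transpose. The following Python sketch (the cutoff of 16 elements is an arbitrary choice for this example) contains no cache-size parameter, yet its recursion eventually produces blocks small enough to fit in any level of cache:

<syntaxhighlight lang="python">
def transpose(A, B, r0, r1, c0, c1):
    """Write the transpose of the block A[r0:r1, c0:c1] into B."""
    if (r1 - r0) * (c1 - c0) <= 16:            # small base case: copy directly
        for i in range(r0, r1):
            for j in range(c0, c1):
                B[j][i] = A[i][j]
    elif r1 - r0 >= c1 - c0:                   # split the taller dimension ...
        mid = (r0 + r1) // 2
        transpose(A, B, r0, mid, c0, c1)
        transpose(A, B, mid, r1, c0, c1)
    else:                                      # ... or else the wider one
        mid = (c0 + c1) // 2
        transpose(A, B, r0, r1, c0, mid)
        transpose(A, B, r0, r1, mid, c1)
</syntaxhighlight>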
 
The same advantage exists with regard to other hierarchical storage systems, such as [[Non-uniform memory access|NUMA]] or [[virtual memory]], as well as for multiple levels of cache: once a sub-problem is small enough, it can be solved within a given level of the hierarchy, without accessing the higher (slower) levels.
 
=== Stack size ===
In recursive implementations of D&C algorithms, one must make sure that there is sufficient memory allocated for the recursion stack, otherwise the execution may fail because of [[stack overflow]]. D&C algorithms that are time-efficient often have relatively small recursion depth. For example, the quicksort algorithm can be implemented so that it never requires more than <math>\log_2 n</math> nested recursive calls to sort <math>n</math> items.
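
One well-known way to achieve this bound (a sketch, not the only possible implementation) is to recurse only into the smaller partition and handle the larger one iteratively, so that each nested call covers at most half of its parent's range:

<syntaxhighlight lang="python">
def quicksort(a, lo=0, hi=None):
    """In-place quicksort with at most log2(n) nested recursive calls."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        if p - lo < hi - p:              # left side is smaller:
            quicksort(a, lo, p - 1)      # recurse on it ...
            lo = p + 1                   # ... and loop on the right side
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1

def partition(a, lo, hi):
    """Lomuto partition around a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i
</syntaxhighlight>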
 
Stack overflow may be difficult to avoid when using recursive procedures, since many compilers assume that the recursion stack is a contiguous area of memory, and some allocate a fixed amount of space for it. Compilers may also save more information in the recursion stack than is strictly necessary, such as the return address, unchanging parameters, and the internal variables of the procedure. Thus, the risk of stack overflow can be reduced by minimizing the parameters and internal variables of the recursive procedure or by using an explicit stack structure.
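
As an illustration of the explicit-stack alternative (a sketch reusing the <code>partition</code> helper from the previous example), the same quicksort can be driven from a heap-allocated list of pending ranges, storing only two integers per sub-problem instead of a full stack frame:

<syntaxhighlight lang="python">
def quicksort_explicit(a):
    """Quicksort driven by an explicit stack of (lo, hi) ranges."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p = partition(a, lo, hi)
        if p - lo < hi - p:              # push the larger range first,
            stack.append((p + 1, hi))    # so the smaller is processed next
            stack.append((lo, p - 1))    # and the stack stays logarithmic
        else:
            stack.append((lo, p - 1))
            stack.append((p + 1, hi))
</syntaxhighlight>
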
=== Choosing the base cases ===
In any recursive algorithm, there is considerable freedom in the choice of the ''base cases'', the small subproblems that are solved directly in order to terminate the recursion.
 
Choosing the smallest or simplest possible base cases is more elegant and usually leads to simpler programs, because there are fewer cases to consider and they are easier to solve. For example, an FFT algorithm could stop the recursion when the input is a single sample, and the quicksort list-sorting algorithm could stop when the input is the empty list; in both examples there is only one base case to consider, and it requires no processing.
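
For illustration, a textbook list quicksort can bottom out at the empty list alone (a sketch; it copies sublists and is not how library sorts are implemented):

<syntaxhighlight lang="python">
def quicksort_simple(items):
    """List quicksort whose only base case is the empty list."""
    if not items:                 # single, trivial base case: nothing to sort
        return []
    pivot, rest = items[0], items[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort_simple(left) + [pivot] + quicksort_simple(right)
</syntaxhighlight>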
 
On the other hand, efficiency often improves if the recursion is stopped at relatively large base cases, and these are solved non-recursively, resulting in a [[hybrid algorithm]]. This strategy avoids the overhead of recursive calls that do little or no work, and may also allow the use of specialized non-recursive algorithms that, for those base cases, are more efficient than explicit recursion. A general procedure for a simple hybrid recursive algorithm is ''short-circuiting the base case'', also known as ''[[arm's-length recursion]]''. In this case, whether the next step will result in the base case is checked before the function call, avoiding an unnecessary function call. For example, in a tree, rather than recursing to a child node and then checking whether it is null, one checks for null before recursing; this avoids half the function calls in some algorithms on binary trees. Since a D&C algorithm eventually reduces each problem or sub-problem instance to a large number of base instances, these often dominate the overall cost of the algorithm, especially when the splitting/joining overhead is low. Note that these considerations do not depend on whether recursion is implemented by the compiler or by an explicit stack.
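
A Python sketch of this short-circuiting on a binary tree (the minimal <code>Node</code> class is hypothetical, defined only for this example):

<syntaxhighlight lang="python">
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def tree_sum(node):
    """Sum a non-empty binary tree, short-circuiting the base case.

    Null children are checked before recursing, so no call is made just
    to discover an empty subtree; about half of all child pointers in a
    binary tree are null, so this saves roughly half the calls.
    """
    total = node.value
    if node.left is not None:         # arm's-length check ...
        total += tree_sum(node.left)  # ... before the recursive call
    if node.right is not None:
        total += tree_sum(node.right)
    return total
</syntaxhighlight>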
 
Thus, for example, many library implementations of quicksort will switch to a simple loop-based [[insertion sort]] (or similar) algorithm once the number of items to be sorted is sufficiently small. Note that, if the empty list were the only base case, sorting a list with <math>n</math> entries would entail up to <math>n</math> quicksort calls that do nothing but return immediately. Increasing the base cases to lists of size 2 or less will eliminate most of those do-nothing calls, and more generally a base case larger than 2 is typically used to reduce the fraction of time spent in function-call overhead or stack manipulation.
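
A sketch of this hybrid in Python (the cutoff of 16 is a typical but arbitrary choice; libraries tune it empirically), again reusing the <code>partition</code> helper from the stack-size sketches above:

<syntaxhighlight lang="python">
CUTOFF = 16   # assumed threshold; real libraries tune this per platform

def insertion_sort(a, lo, hi):
    """Simple loop-based sort, efficient on very small ranges."""
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that hands sufficiently small sub-arrays to insertion sort."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:        # large base case: no tiny recursive calls
        insertion_sort(a, lo, hi)
        return
    p = partition(a, lo, hi)
    hybrid_quicksort(a, lo, p - 1)
    hybrid_quicksort(a, p + 1, hi)
</syntaxhighlight>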
 
Alternatively, one can employ large base cases that still use a divide-and-conquer algorithm, but implement the algorithm for a predetermined set of fixed sizes where the algorithm can be completely [[loop unwinding|unrolled]] into code that has no recursion, loops, or [[Conditional (programming)|conditionals]] (related to the technique of [[partial evaluation]]). For example, this approach is used in some efficient FFT implementations, where the base cases are unrolled implementations of divide-and-conquer FFT algorithms for a set of fixed sizes.<ref name="fftw">{{cite journal | author = Frigo, M. |author2=Johnson, S. G. | url = http://www.fftw.org/fftw-paper-ieee.pdf | title = The design and implementation of FFTW3 | journal = Proceedings of the IEEE | volume = 93 | issue = 2 |date=February 2005 | pages = 216–231 | doi = 10.1109/JPROC.2004.840301|citeseerx=10.1.1.66.3097 |s2cid=6644892 }}</ref> [[Source-code generation]] methods may be used to produce the large number of separate base cases desirable to implement this strategy efficiently.<ref name="fftw"/>
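
The flavor of this technique can be conveyed with a hand-unrolled 4-point base case for a radix-2 FFT (a toy Python sketch in the spirit of FFTW's generated "codelets"; it assumes the input length is a power of two and at least 4):

<syntaxhighlight lang="python">
import cmath

def fft4(x):
    """Fully unrolled 4-point DFT: straight-line code, no loops or branches."""
    t0 = x[0] + x[2]
    t1 = x[0] - x[2]
    t2 = x[1] + x[3]
    t3 = (x[1] - x[3]) * -1j         # twiddle factor exp(-2*pi*i/4) = -i
    return [t0 + t2, t1 + t3, t0 - t2, t1 - t3]

def fft(x):
    """Radix-2 divide-and-conquer FFT that bottoms out in the unrolled fft4."""
    n = len(x)
    if n == 4:
        return fft4(x)               # large, non-recursive base case
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])
</syntaxhighlight>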