Schlemiel the Painter's algorithm

This is an old revision of this page, as edited by Wookie2u (talk | contribs) at 14:32, 12 May 2008 (The Solution). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Template:Unencyclopedic

In computer science Schlemiel The Painter's Algorithm denotes an algorithm which repeatedly reprocesses content. This reprocessing is inefficient, which becomes problematic with repetition. The problem most commonly appears in string and buffer processing.

The algorithm is named after an anecdote about a person who is painting a fence. His progress keeps diminishing because he keeps a bucket of paint at one end of the fence, instead of moving the bucket to the part of the fence he is painting.

An example

Repeated calls to ANSI-C's strcat function is an example of Schlemiel's algorithm.

An implementation of strcat might look something like this:

char *strcat(char *dest, const char *src) {
  // remember the start of the destination string
  char *startOfDest = dest;
  // make dest point at the null terminator at the end of the destination string
  while(*dest) {                                         // <-- EXPENSIVE
    dest++; 
  }
  // append the source string to the destination string
  for( ; *dest=*src; dest++,src++);
  // null terminate the destination string
  dest = '\0';
  // return the start of the destination string
  return startOfDest;
}

The strcat function is innocuous, unless it is called repeatedly.

Here is an example of how not to use strcat:

// builds a 1 Megabyte string of "word, word, word, ....";
void testPerformanceOfStrcat() {
  final int BUFFER_SIZE = 1024*1024;
  char word[] = "word"; 
  char buffer[BUFFER_SIZE] = "";
  while( strlen(buffer) < BUFFER_SIZE-strlen(word) ) {   // <-- EXPENSIVE
    if(*buffer) strcat(buffer, ", ");                    // <-- EXPENSIVE
    strcat(buffer, word);                                // <-- EXPENSIVE
  }
}

The estimated cost of testPerformanceOfStrcat is:

  • NOTE: "Ancillary costs" like the cost of making a function call are ignored.
  • The string "word, " is 6 characters in length.
  • The number of times through the while loop is int(1024*1024/6) = 174,762
  • The cost of strlen(buffer) in the while condition is fib(174,762) = 15,270,965,703 cycles.
  • The cost of strlen(word) in the while condition is 4 cycles per loop = 174,762 * 4 = 699,048
  • The cost of finding the-end-dest in strcat(buffer, ", ") is 15,270,965,703 cycles.
  • The cost of finding the-end-dest in strcat(buffer, word) is 15,270,965,703 cycles.
  • The cost of copying the string is 6 cycles per loop = 174,762 * 6 = 1,048,572
  • The TOTAL COST of testPerformanceOfStrcat is 15,270,965,710 * 3 + 699,048 + 1,048,572 = 45,814,644,750 or about 46 billion cycles.
  • 46 billion cycles on a 2GHz CPU is about 23 seconds.

The Solution

The solution to the performance degradation is to remember the-end-of-the-string. The following reimplimentation the traditional strcat function returns this value, which may then be passed to the next invocation of stringcat, eliminating the overhead of repeatedly seeking the concatenation point.

char *stringcat(char *dest, const char *src) {
  // make dest point at the null terminator at the end of the destination string
  while(*dest) {                                         // <-- EXPENSIVE, SO AVOID UYSING IT REPEATEDLY.
    dest++; 
  }
  // append the source string to the destination string
  for( ; *dest=*src; dest++,src++);
  // null terminate the destination string
  dest = '\0';
  // return a pointer to the null terminator at the end of the destination string
  return dest;
}

The following test harness excersises the enhanced stringcat function.

// builds a 1Meg string of "word, word, word, ...."
char *testPerformanceOfStringcat() {
  final int BUFFER_SIZE = 1024*1024;
  char word[] = "word"; 
  char nextWord[] = ", word"; 
  char buffer[BUFFER_SIZE] = "";
  char *last = buffer+BUFFER_SIZE-strlen(word)-strlen(", "); // <-- Do this once!
  char *endOfBuffer = stringcat(buffer, word);               // <-- Do this once!
  while( endOfBuffer < last ) {                              // <-- 15 billion times cheaper
    endOfBuffer = strcat(endOfBuffer, nextWord);             // <-- 30 billion times cheaper
  }
}

The estimated cost of testPerformanceOfStringcat is:

  • The number of times through the loop is still int(1024*1024/6) = 174,762
  • The overhead of finding the-end-of-dest in the while condition has been eliminated.
  • The overhead of finding the-end-of-dest in strcat(endOfBuffer, word) has been eliminated
  • The cost of copying the string, which is 6 cycles per loop = 174,762 * 6 = 1,048,572
  • The TOTAL COST of testPerformanceOfStringcat is down to 174,762 + 1,048,572 = 1,223,334
  • 1.2 million cycles on a 2GHz CPU is 0.6 milliseconds.

References

  • [1] Joel Spolsky's article "Back to Basics"