This article is written like a personal reflection, personal essay, or argumentative essay that states a Wikipedia editor's personal feelings or presents an original argument about a topic. |
This article or section is in a state of significant expansion or restructuring. You are welcome to assist in its construction by editing it as well. If this article or section has not been edited in several days, please remove this template. If you are the editor who added this template and you are actively editing, please be sure to replace this template with {{in use}} during the active editing session. Click on the link for template parameters to use.
This redirect was last edited by Wookie2u (talk | contribs) 17 years ago. (Update timer) |
Introduction
In computer science "Schlemiel The Painter's Algorithm" denotes an algorithm which repeatedly reprocesses content. This reprocessing is extremely inefficient, which becomes problematic with repetition. The problem most commonly appears in string and buffer processing.
Who is Schlemiel?
Schlemiel got a job painting the white-lines down the middle of the road. On his first day Schlemiel painted 300 yards of road, which is good. On his second day Schlemiel painted 150 yards, which isn't too bad. On the third day Schlemiel painted only 30 yards, which is grounds for dismissal.
The boss called Schlemiel into the office for some counselling. The boss said "Schlemiel, I think you are good worker, so why is that the longer you work the less you get done?". "Well" Schlemiel replied "the further I get from my can of paint the further I have to run to the end of the line, and all the way back again to replenish my brush." The boss replied "Schlemiel you need to take your paint with you!"
The principle is childishly simple, but it is easy too easy to miss this scenario when implementing a complex software product, especially when the scenario spans modules.
The Problem
Repeated calls to ANSI-C's strcat function is a classic example of Schlemiel's algorithm.
An implementation of strcat might look something like this:
char *strcat(char *dest, const char *src) {
// remember the start of the destination string
char *startOfDest = dest;
// make dest point at the null terminator at the end of the destination string
while(*dest) { // <-- EXPENSIVE
dest++;
}
// append the source string to the destination string
while(*src) {
dest++ = src++;
}
// null terminate the destination string
dest = '\0';
// return the start of the destination string
return startOfDest;
}
The strcat function is innocious, unless it is called repeatedly.
Here is an example of how not to use strcat:
// builds a 1 Megabyte string of "word, word, word, ....";
void testPerformanceOfStrcat() {
final int BUFFER_SIZE = 1024*1024;
char word[] = "word";
char buffer[BUFFER_SIZE] = "";
while( strlen(buffer) < BUFFER_SIZE-strlen(word) ) { // <-- EXPENSIVE
if(*buffer) strcat(buffer, ", "); // <-- EXPENSIVE
strcat(buffer, word); // <-- EXPENSIVE
}
}
The estimated cost of testPerformanceOfStrcat is:
- NOTE: "Ancillary costs" like the cost of making a function call are ignored.
- The string "word, " is 6 characters in length.
- The number of times through the while loop is int(1024*1024/6) = 174,762
- The cost of strlen(buffer) in the while condition is fib(174,762) = 15,270,965,703 cycles.
- The cost of strlen(word) in the while condition is 4 cycles per loop = 174,762 * 4 = 699,048
- The cost of finding the-end-dest in strcat(buffer, ", ") is 15,270,965,703 cycles.
- The cost of finding the-end-dest in strcat(buffer, word) is 15,270,965,703 cycles.
- The cost of copying the string is 6 cycles per loop = 174,762 * 6 = 1,048,572
- The TOTAL COST of testPerformanceOfStrcat is 15,270,965,710 * 3 + 699,048 + 1,048,572 = 45,814,644,750 or about 46 billion cycles.
- 46 billion cycles on a 2GHz CPU is about 23 seconds, which is totally unaceptable.
The Solution
The solution to Schlemiel's performance degradation is obvious: "Schlemiel, take your paint with you". The same principle can be applied to strcat. The program needs to remember the-___location-of-end-of-the-string, to avoid the expense of repeatedlt finding it.
The below stringcat method returns the-new-end-of-the-destination-string.
char *stringcat(char *dest, const char *src) {
// make dest point at the null terminator at the end of the destination string
while(*dest) { // <-- EXPENSIVE, SO AVOID UYSING IT REPEATEDLY.
dest++;
}
// append the source string to the destination string
while(*src) {
dest++ = src++;
}
// null terminate the destination string
dest = '\0';
// return a pointer to the null terminator at the end of the destination string
return dest;
}
The stringcat function is the same as the traditional strcat, except that it returns the-new-end-of-the-destination-string which may then be passed to the next invocation of stringcat, eliminating the overhead of repeatedly finding the-end-of-the-destination-string.
The following test harness excersises the new stringcat function.
// builds a 1Meg string of "word, word, word, ...."
char *testPerformanceOfStringcat() {
final int BUFFER_SIZE = 1024*1024;
char word[] = "word";
char nextWord[] = ", word";
char buffer[BUFFER_SIZE] = "";
char *last = buffer+BUFFER_SIZE-strlen(word)-strlen(", "); // <-- Do this once!
char *endOfBuffer = stringcat(buffer, word); // <-- Do this once!
while( endOfBuffer < last ) { // <-- 15 billion times cheaper
endOfBuffer = strcat(endOfBuffer, nextWord); // <-- 30 billion times cheaper
}
}
The estimated cost of testPerformanceOfStringcat is:
- The number of times through the loop is still int(1024*1024/6) = 174,762
- The overhead of finding the-end-of-dest in the while condition has been eliminated.
- The overhead of finding the-end-of-dest in strcat(endOfBuffer, word) has been eliminated
- The cost of copying the string, which is 6 cycles per loop = 174,762 * 6 = 1,048,572
- The TOTAL COST of testPerformanceOfStringcat is down to 174,762 + 1,048,572 = 1,223,334
- 1.2 million cycles on a 2GHz CPU is 0.6 milliseconds, which is acceptable.
Conclusion
Schlemiel's algorithm is a common cause of nonperformant, unscalable software implementations. The solution is to eliminate content reprocessing.
References
- [1] Joel Spolsky's article "Back to Basics"