#REDIRECT [[Joel Spolsky#Schlemiel the Painter's algorithm]]
{{essay-like}}{{unencyclopedic}}
{{underconstruction}}
{{Redirect category shell|1=
== Introduction ==
{{R from Merge}}
{{R to Section}}
In computer science "Schlemiel The Painter's Algorithm" denotes an algorithm which repeatedly reprocesses content. This '''re'''processing is extremely inefficient, which becomes problematic with repetition. The problem most commonly appears in string and buffer processing.
}}
== So who was Schlemiel? ==
So who was Schlemiel The Painter? Well, Schlemiel got a job painting the white line down the middle of the road. On his first day Schlemiel painted 300 yards of road, and his boss rejoiced. On his second day, Schlemiel painted 150 yards. By the end of the third day Schlemiel was going nowhere, and the boss called Schlemiel into the office for some "counselling".
The boss said "Schlemiel, I think you are good worker. You are bright and you work hard. So why is that the longer you work the less you get done?". "Well" Schlemiel replied "yesterday afternoon it took me two hours to run from my paint tin to the end of the line, and two hours to run back again to replenish my brush, so I gave up and went home." The boss thought for a few seconds, then said (quit calmly) "Schlemiel you need to '''take your paint with you!'''"
== The Problem ==
Repeated calls to [[ANSI_C|ANSI-C]]'s [[strcat]] function is a classic example of Schlemiel's algorithm.
An implementation of ''strcat'' might look something like this:
<source lang="c">
char *strcat(char *dest, const char *src) {
// remember the start of the destination string
char *startOfDest = dest;
// make dest point at the null terminator at the end of the destination string
while(*dest) { // <-- EXPENSIVE
dest++;
}
// append the source string to the destination string
while(*src) {
dest++ = src++;
}
// null terminate the destination string
dest = '\0';
// return the start of the destination string
return startOfDest;
}
</source>
The ''strcat'' function is innocious, unless it is called '''repeatedly'''.
Here is an example of how '''not''' to use ''strcat'':
<source lang="c">
// builds a 1 Megabyte string of "word, word, word, ....";
void testPerformanceOfStrcat() {
final int BUFFER_SIZE = 1024*1024;
char word[] = "word";
char buffer[BUFFER_SIZE] = "";
while( strlen(buffer) < BUFFER_SIZE-strlen(word) ) { // <-- EXPENSIVE
if(*buffer) strcat(buffer, ", "); // <-- EXPENSIVE
strcat(buffer, word); // <-- EXPENSIVE
}
}
</source>
The estimated cost of ''testPerformanceOfStrcat'' is:
* NOTE: "Ancillary costs" like the cost of making a function call are ignored.
* The string "word, " is 6 characters in length.
* The number of times through the ''while'' loop is int(1024*1024/6) = 174,762
* The cost of ''strlen(buffer)'' in the ''while'' condition is [[Fibonacci_number|fib]](174,762) = 15,270,965,703 cycles.
* The cost of ''strlen(word)'' in the ''while'' condition is 4 cycles per loop = 174,762 * 4 = 699,048
* The cost of finding the-end-dest in ''strcat(buffer, ", ")'' is 15,270,965,703 cycles.
* The cost of finding the-end-dest in ''strcat(buffer, word)'' is 15,270,965,703 cycles.
* The cost of copying the string is 6 cycles per loop = 174,762 * 6 = 1,048,572
* The TOTAL COST of ''testPerformanceOfStrcat'' is 15,270,965,710 * 3 + 699,048 + 1,048,572 = 45,814,644,750 or about 46 billion cycles.
* 46 billion cycles on a 2GHz CPU is about 23 seconds, which is totally unaceptable.
== The Solution ==
The solution to Schlemiel's performance degradation is obvious: "Schlemiel, take your paint with you". The same principle can be applied to ''strcat''. The program needs to remember the-___location-of-end-of-the-string, to avoid the expense of repeatedlt finding it.
The below ''stringcat'' method returns the-new-end-of-the-destination-string.
<source lang="c">
char *stringcat(char *dest, const char *src) {
// make dest point at the null terminator at the end of the destination string
while(*dest) { // <-- EXPENSIVE, SO AVOID UYSING IT REPEATEDLY.
dest++;
}
// append the source string to the destination string
while(*src) {
dest++ = src++;
}
// null terminate the destination string
dest = '\0';
// return a pointer to the null terminator at the end of the destination string
return dest;
}
</source>
Our new ''stringcat'' function is basically the same, it just returns the-new-end-of-the-destination-string instead of the-start-of-the-destination-string... So what? Well so the '''next time''' we call ''stringcat'' we can pass it the-end-of-the-destination-string, so ''stringcat'' doesn't have to waste time finding it. Pretty clever huh!
Now lets write a test harness to examine ''stringcat'''s (hopefully improved) performance.
<source lang="c">
// builds a 1Meg string of "word, word, word, ...."
char *testPerformanceOfStringcat() {
final int BUFFER_SIZE = 1024*1024;
char word[] = "word";
char buffer[BUFFER_SIZE] = "";
char *last = buffer+BUFFER_SIZE-strlen(word)-strlen(", "); // <-- Do this once!
char *endOfBuffer = stringcat(buffer, word); // <-- Do this once!
while( endOfBuffer < last ) { // <-- 15 billion times cheaper
endOfBuffer = strcat(endOfBuffer, word); // <-- 30 billion times cheaper
}
}
</source>
The estimated cost of ''testPerformanceOfStringcat'' is:
* The number of times through the loop is still int(1024*1024/6) = 174,762
* The overhead of finding the-end-of-dest in the ''while'' condition has been eliminated.
* The overhead of finding the-end-of-dest in ''strcat(endOfBuffer, word)'' has been eliminated
* The cost of copying the string, which is 6 cycles per loop = 174,762 * 6 = 1,048,572
* The TOTAL COST of ''testPerformanceOfStringcat'' is down to 174,762 + 1,048,572 = 1,223,334
* 1.2 million cycles on a 2GHz CPU is 0.6 milliseconds, which is acceptable.
== Conclusion ==
"Schlemiel The Painter" algorithms are a common cause of slow unscalable software implementations. "Take your paint with you" is up there in the panthion of things every budding computer scientist needs to know, along with [[KISS_principle|the KISS principle]], and character encoding [http://www.joelonsoftware.com/articles/Unicode.html].
== References ==
* [http://www.joelonsoftware.com/articles/fog0000000319.html] Joel Spolsky's article "Back to Basics"
|