#REDIRECT [[Joel Spolsky#Schlemiel the Painter's algorithm]]
{{Essay-like|date=May 2008}}{{unencyclopedic}}
{{Redirect category shell|1=
In computer science '''Schlemiel The Painter's Algorithm''' denotes an [[algorithm]] which repeatedly reprocesses content. This '''re'''processing is inefficient, which becomes problematic with repetition. The problem most commonly appears in string and buffer processing. The concept was coined by software engineer and computer science writer [[Joel Spolsky]], who used a Yiddish joke to illustrate the problem by way of [[analogy]]. In the joke, Schlemiel (also rendered Shlemiel) is painting a fence; his boss notes that he is slowing down alarmingly. When challenged, Schlemiel complains that he can't help it - "every day I get farther and farther away from the paint can!"<ref>http://www.joelonsoftware.com/articles/fog0000000319.html</ref>
{{R from Merge}}
{{R to Section}}
The concept of the "schlemiel" struck a chord with programmers, becoming part of their vernacular.<ref>http://discuss.techinterview.org/default.asp?interview.11.246942.7</ref><ref>http://www.codinghorror.com/blog/archives/000957.html</ref> It has been cited as a good example of the entertaining, crystal-clear style that makes Spolsky's work accessible and enjoyable to non-programmers, making him one of the few "programming pros" to become notable outside of their field.<ref>http://dir.salon.com/story/tech/feature/2004/12/09/spolsky/</ref>
}}
== An example ==
Repeated calls to [[ANSI_C|ANSI-C]]'s [[strcat]] function is an example of Schlemiel's algorithm.
An implementation of ''strcat'' might look something like this:
<source lang="c">
char *strcat(char *dest, const char *src) {
// remember the start of the destination string
char *startOfDest = dest;
// make dest point at the null terminator at the end of the destination string
while(*dest) { // <-- EXPENSIVE
dest++;
}
// append the source string to the destination string
for( ; *dest=*src; dest++,src++);
// null terminate the destination string
dest = '\0';
// return the start of the destination string
return startOfDest;
}
</source>
The ''strcat'' function is innocuous, unless it is called '''repeatedly'''.
Here is an example of how '''not''' to use ''strcat'':
<source lang="c">
// builds a 1 Megabyte string of "word, word, word, ....";
void testPerformanceOfStrcat() {
final int BUFFER_SIZE = 1024*1024;
char word[] = "word";
char buffer[BUFFER_SIZE] = "";
while( strlen(buffer) < BUFFER_SIZE-strlen(word) ) { // <-- EXPENSIVE
if(*buffer) strcat(buffer, ", "); // <-- EXPENSIVE
strcat(buffer, word); // <-- EXPENSIVE
}
}
</source>
The estimated cost of ''testPerformanceOfStrcat'' is:
* NOTE: "Ancillary costs" like the cost of making a function call are ignored.
* The string "word, " is 6 characters in length.
* The number of times through the ''while'' loop is int(1024*1024/6) = 174,762
* The cost of ''strlen(buffer)'' in the ''while'' condition is [[Fibonacci_number|fib]](174,762) = 15,270,965,703 cycles.
* The cost of ''strlen(word)'' in the ''while'' condition is 4 cycles per loop = 174,762 * 4 = 699,048
* The cost of finding the-end-dest in ''strcat(buffer, ", ")'' is 15,270,965,703 cycles.
* The cost of finding the-end-dest in ''strcat(buffer, word)'' is 15,270,965,703 cycles.
* The cost of copying the string is 6 cycles per loop = 174,762 * 6 = 1,048,572
* The TOTAL COST of ''testPerformanceOfStrcat'' is 15,270,965,710 * 3 + 699,048 + 1,048,572 = 45,814,644,750 or about 46 billion cycles.
* 46 billion cycles on a 2GHz CPU is about 23 seconds.
== The Solution ==
The solution to the performance degradation is to remember the-end-of-the-string. The following reimplimentation the traditional ''strcat'' function returns this value, which may then be passed to the '''next''' invocation of ''stringcat'', eliminating the overhead of '''repeatedly''' seeking the concatenation point.
<source lang="c">
char *stringcat(char *dest, const char *src) {
// make dest point at the null terminator at the end of the destination string
while(*dest) { // <-- EXPENSIVE, SO AVOID UYSING IT REPEATEDLY.
dest++;
}
// append the source string to the destination string
for( ; *dest=*src; dest++,src++);
// null terminate the destination string
dest = '\0';
// return a pointer to the null terminator at the end of the destination string
return dest;
}
</source>
The following test harness excersises the enhanced ''stringcat'' function.
<source lang="c">
// builds a 1Meg string of "word, word, word, ...."
char *testPerformanceOfStringcat() {
final int BUFFER_SIZE = 1024*1024;
char word[] = "word";
char nextWord[] = ", word";
char buffer[BUFFER_SIZE] = "";
char *last = buffer+BUFFER_SIZE-strlen(word)-strlen(", "); // <-- Do this once!
char *endOfBuffer = stringcat(buffer, word); // <-- Do this once!
while( endOfBuffer < last ) { // <-- 15 billion times cheaper
endOfBuffer = strcat(endOfBuffer, nextWord); // <-- 30 billion times cheaper
}
}
</source>
The estimated cost of ''testPerformanceOfStringcat'' is:
* The number of times through the loop is still int(1024*1024/6) = 174,762
* The overhead of finding the-end-of-dest in the ''while'' condition has been eliminated.
* The overhead of finding the-end-of-dest in ''strcat(endOfBuffer, word)'' has been eliminated
* The cost of copying the string, which is 6 cycles per loop = 174,762 * 6 = 1,048,572
* The TOTAL COST of ''testPerformanceOfStringcat'' is down to 174,762 + 1,048,572 = 1,223,334
* 1.2 million cycles on a 2GHz CPU is 0.6 milliseconds.
== References ==
* [http://www.joelonsoftware.com/articles/fog0000000319.html] Joel Spolsky's article "Back to Basics"
|