Schlemiel the Painter's algorithm: Difference between revisions

Content deleted Content added
Wookie2u (talk | contribs)
m +{{Redirect category shell}} for multiple-{{R}} #Rs using AWB
 
(53 intermediate revisions by 38 users not shown)
Line 1:
#REDIRECT [[Joel Spolsky#Schlemiel the Painter's algorithm]]
{{Essay-like|date=May 2008}}{{unencyclopedic}}
 
{{Redirect category shell|1=
In computer science '''Schlemiel The Painter's Algorithm''' denotes an [[algorithm]] which repeatedly reprocesses content. This '''re'''processing is inefficient, which becomes problematic with repetition. The problem most commonly appears in string and buffer processing.
{{R from Merge}}
 
{{R to Section}}
The algorithm is named after an anecdote about a person who is painting a fence. His progress keeps diminishing because he keeps a bucket of paint at one end of the fence, instead of moving the bucket to the part of the fence he is painting.
}}
 
== An example ==
 
Repeated calls to [[ANSI_C|ANSI-C]]'s [[strcat]] function is an example of Schlemiel's algorithm.
 
An implementation of ''strcat'' might look something like this:
<source lang="c">
char *strcat(char *dest, const char *src) {
// remember the start of the destination string
char *startOfDest = dest;
// make dest point at the null terminator at the end of the destination string
while(*dest) { // <-- EXPENSIVE
dest++;
}
// append the source string to the destination string
for( ; *dest=*src; dest++,src++);
// null terminate the destination string
dest = '\0';
// return the start of the destination string
return startOfDest;
}
</source>
 
The ''strcat'' function is innocuous, unless it is called '''repeatedly'''.
 
Here is an example of how '''not''' to use ''strcat'':
<source lang="c">
// builds a 1 Megabyte string of "word, word, word, ....";
void testPerformanceOfStrcat() {
final int BUFFER_SIZE = 1024*1024;
char word[] = "word";
char buffer[BUFFER_SIZE] = "";
while( strlen(buffer) < BUFFER_SIZE-strlen(word) ) { // <-- EXPENSIVE
if(*buffer) strcat(buffer, ", "); // <-- EXPENSIVE
strcat(buffer, word); // <-- EXPENSIVE
}
}
</source>
 
The estimated cost of ''testPerformanceOfStrcat'' is:
* NOTE: "Ancillary costs" like the cost of making a function call are ignored.
* The string "word, " is 6 characters in length.
* The number of times through the ''while'' loop is int(1024*1024/6) = 174,762
* The cost of ''strlen(buffer)'' in the ''while'' condition is [[Fibonacci_number|fib]](174,762) = 15,270,965,703 cycles.
* The cost of ''strlen(word)'' in the ''while'' condition is 4 cycles per loop = 174,762 * 4 = 699,048
* The cost of finding the-end-dest in ''strcat(buffer, ", ")'' is 15,270,965,703 cycles.
* The cost of finding the-end-dest in ''strcat(buffer, word)'' is 15,270,965,703 cycles.
* The cost of copying the string is 6 cycles per loop = 174,762 * 6 = 1,048,572
* The TOTAL COST of ''testPerformanceOfStrcat'' is 15,270,965,710 * 3 + 699,048 + 1,048,572 = 45,814,644,750 or about 46 billion cycles.
* 46 billion cycles on a 2GHz CPU is about 23 seconds.
 
== The Solution ==
 
The solution to the performance degradation is to remember the-end-of-the-string. The following reimplimentation the traditional ''strcat'' function returns this value, which may then be passed to the '''next''' invocation of ''stringcat'', eliminating the overhead of '''repeatedly''' seeking the concatenation point.
 
<source lang="c">
char *stringcat(char *dest, const char *src) {
// make dest point at the null terminator at the end of the destination string
while(*dest) { // <-- EXPENSIVE, SO AVOID UYSING IT REPEATEDLY.
dest++;
}
// append the source string to the destination string
for( ; *dest=*src; dest++,src++);
// null terminate the destination string
dest = '\0';
// return a pointer to the null terminator at the end of the destination string
return dest;
}
</source>
 
The following test harness excersises the enhanced ''stringcat'' function.
 
<source lang="c">
// builds a 1Meg string of "word, word, word, ...."
char *testPerformanceOfStringcat() {
final int BUFFER_SIZE = 1024*1024;
char word[] = "word";
char nextWord[] = ", word";
char buffer[BUFFER_SIZE] = "";
char *last = buffer+BUFFER_SIZE-strlen(word)-strlen(", "); // <-- Do this once!
char *endOfBuffer = stringcat(buffer, word); // <-- Do this once!
while( endOfBuffer < last ) { // <-- 15 billion times cheaper
endOfBuffer = strcat(endOfBuffer, nextWord); // <-- 30 billion times cheaper
}
}
</source>
 
The estimated cost of ''testPerformanceOfStringcat'' is:
* The number of times through the loop is still int(1024*1024/6) = 174,762
* The overhead of finding the-end-of-dest in the ''while'' condition has been eliminated.
* The overhead of finding the-end-of-dest in ''strcat(endOfBuffer, word)'' has been eliminated
* The cost of copying the string, which is 6 cycles per loop = 174,762 * 6 = 1,048,572
* The TOTAL COST of ''testPerformanceOfStringcat'' is down to 174,762 + 1,048,572 = 1,223,334
* 1.2 million cycles on a 2GHz CPU is 0.6 milliseconds.
 
== References ==
* [http://www.joelonsoftware.com/articles/fog0000000319.html] Joel Spolsky's article "Back to Basics"