Schlemiel the Painter's algorithm: Difference between revisions

Content deleted Content added
$wgUser (talk | contribs)
farther for physical distance, not further
Rewrote most of the sentences to be clearer
Line 1:
{{Orphan|date=February 2009}}
In software development, a '''Schlemiel the Painter'''['s]<!-- Spolsky uses it both with and without the possessive 's -->''' algorithm''' denotes any methodology that is inefficient because the programmer has overlooked some fundamental issues at the very [[High and low level|lowest levels]] of [[software design]]. The term was coined by software engineer and essayist [[Joel Spolsky]].
 
__TOC__
==Spolsky's analogy==
Spolsky used a [[Yiddish]] joke to illustrate a certain poor programming practice. In the joke, Schlemiel (also rendered Shlemiel) has a job painting the dotted lines down the middle of a road. Each day, Schlemiel paints less than he painted the day before. When askedhe howis that could possiblyasked bewhy, Schlemiel complains that it is because each day he gets farther away from the paint can.<ref name="basics">{{citation|last=Spolsky|first=Joel|title=Back to Basics|date=December 11, 2001|series=Joel on Software|url=http://www.joelonsoftware.com/articles/fog0000000319.html|publisher=joelonsoftware.com}}.</ref>
 
The inefficiency Spolsky was drawing an analogy to wasrefers to the poor programming practice of repeated [[concatenation]] of [[C (programming language)|C]]-style null[[Null character|null]]-terminated character arrays (in general computing parlance, these are known as "[[String (computer science)|strings]]")&mdash; in which the lengthposition of the destination string has to be recomputed from the beginning of the string each time because it is not carried over from a previous concatenation.
 
Spolsky condemned such inefficiencies as typical for programmers who had not been taught basic programming techniques before they began programming using higher level languages: "Generations of graduates are descending on us and creating ''Shlemiel The Painter algorithms'' right and left and they don't even realize it, since they fundamentally have no idea that strings are, at a very deep level, difficult."<ref name="basics" />
Line 14:
 
==Spolsky's example==
The programming practice that Spolsky used to make his point was repeated [[concatenation]] of [[C (programming language)|C]]-style null-terminated character arrays ("strings").<ref name="basics" />
 
The first step in every implementation of the [[C standard library|standard C library]] function for [[concatenation|concatenating]] strings is a determination ofdetermining the length of the string being appended to by checking each character in the array, starting from the beginning, to see if it is the terminating [[Null character|null character]]. In subsequent steps, another string is then copied to the end of the first string, so effectively concatenating the two. At the end of the concatenation, the combined length willof bethe known,combined butstring will beis discarded upon return to the calling code.
 
In Spolsky's example, the "Schlemiels" occur when multiple strings are being concatenated together:
# <code>[[strcat]]( buffer, "John" ); &nbsp; </code>/* Here, the string "John" is appended to the buffer */
# <code>strcat( buffer, "Paul" ); &nbsp; </code>/* Now the string "Paul" is appended to ''that'' */
# <code>strcat( buffer, "George" ); </code>/* ... and the string "George" is appended to ''that'' */
# <code>strcat( buffer, "Ringo" ); &nbsp;</code>/* ... and the string "Ringo" is appended to ''that'' */
ByAfter Paul is finished appending to John, the endlength of "JohnPaul" (or, more precisely, the firstposition invocationof the terminating null character) is known within the [[Scope (programming)|scope]] of <code>strcat()</code> but is discarded upon its return to the lengthpoint ofafter Paul and before George. Afterwards, when <code>bufferstrcat()</code> willis betold knownto butappend notGeorge preservedto upon"JohnPaul", return<code>strcat()</code> tostarts at the pointvery justfirst aftercharacter stepof 1the andarray (which is 'J') all over again just beforeto stepfind 2the terminating null character. SubsequentEach subsequent callscall to <code>strcat()</code> havehas to compute thatthe length again before concatenating another name to the <code>buffer</code>.
 
As such, and analogousAnalogous to Schlemiel's not carrying the paint-bucket (or the string's length) with him, all the subsequent <code>strcat()</code>s have to again "walk" the length of the string to determine where the second string should be copied. As more data is added to it, the string in <code>buffer</code>, that terminating null character also gets longerfarther away from the beginning with each call to <code>strcat()</code>, andmeaning withmore increasingchecks length,must thebe determinationtaken ofto find that lengthcharacter takes longer, which means thatand subsequent calls are increasingly slower. Just&mdash;just as "Schlemiel's" path to his bucket keeps getting longer.
 
The problems illustrated by Spolsky's example remainare hiddennot noticed toby a programmer withwho littleis orusing noa knowledgehigh of the underlying principles and functions, which every higher-level language and libraryhas willlittle stillor beno usingknowledge even whenof the low-levelunderlying manipulationprinciples isand not immediately obvious at the higher levelfunctions. "Some of the biggest mistakes people make even at the highest architectural levels come from having a weak or broken understanding of a few simple things at the very lowest levels."<ref name="basics" />
 
== References ==