Criticism of the C programming language: Difference between revisions

m moved C programming language, criticism to Criticism of C: and the most-awkward title award goes to...
cleanup lead paragraph and section headers. please find some way to avoid using a <ref> as part of a section title, I'll explain why if it's not obvious.
Line 1:
The '''[[C (programming language)|C programming language]]''' is a very widely used [[programming language]], minimalistic and [[High and low level (description)|low-level]] by design. Despite its popularity, C's characteristics have led to much '''criticism''' of the language. Some criticisms have arisen from misconceptions or misinterpretations of the C standard, while others have some degree of validity. This article is concerned with the latter.
 
Many beginning programmers have difficulty learning C's syntax and peculiarities, and even many expert programmers find C programs difficult to maintain and debug. A popular saying, repeated by such notable language designers as [[Bjarne Stroustrup]], is that "C makes it easy to shoot yourself in the foot." [http://www.research.att.com/~bs/bs_faq.html#really-say-that] In other words, C permits many operations that are generally not desirable, and thus many simple programming errors are not detected by the compiler and may not even be readily apparent at runtime. This potentially leads to programs with unpredictable behavior and security holes, if sufficient care and discipline are not used in programming and maintenance.
Line 7:
Kernighan and Ritchie made reference to the basic [[design philosophy]] of C in their response to criticism of C not being a strongly-typed language<ref>Brian W. Kernighan and Dennis M. Ritchie: ''The C Programming Language,'' 2<sup>nd</sup> ed., [[Prentice Hall]], 1988, p. 3.</ref>: "Nevertheless, C retains the basic philosophy that programmers know what they are doing; it only requires that they state their intentions explicitly."<ref>{{cite web | author=Dennis Ritchie | url=http://cm.bell-labs.com/cm/cs/who/dmr/chist.html | title=The Development of the C Language | accessdate=2006-07-26}}</ref>
 
===Memory allocation===
One issue to be aware of when using C is that automatically and dynamically allocated objects are not necessarily initialized (depending on what facility is used to allocate memory); they initially have an indeterminate value (typically whatever values are present in the [[Computer storage|memory space]] they occupy, which might not even be a legal [[Bit|bit pattern]] for that type). This value is highly unpredictable and can vary between two machines, two program runs, or even two calls to the same function. If the program attempts to use such an uninitialized value, the results are undefined. Many modern compilers try to detect and warn about this problem, but both [[Type I and type II errors|false positives]] and false negatives occur.
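
The difference between allocation facilities can be sketched as follows. This is a minimal illustration, not a recommended library interface: the helper names <code>make_zeroed</code> and <code>make_filled</code> are hypothetical. It relies only on the standard guarantees that <code>calloc</code> zero-fills its block while <code>malloc</code> leaves the contents indeterminate.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns n ints that are guaranteed to be zero,
   because calloc zero-fills the block it returns. Caller must free(). */
int *make_zeroed(size_t n)
{
    assert(n > 0);
    return calloc(n, sizeof(int));
}

/* Hypothetical helper: malloc leaves the block indeterminate, so every
   element is written before the pointer is handed back. Caller must free(). */
int *make_filled(size_t n, int value)
{
    assert(n > 0);
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;            /* allocation failed; nothing to initialize */
    for (size_t i = 0; i < n; i++)
        p[i] = value;
    return p;
}
```

In both helpers no element is ever read before it has been given a value, which is the discipline the language leaves to the programmer.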
 
Another common problem is that heap memory must be managed manually: each allocation has to be released explicitly once it is no longer in use so that the memory can be reused. For example, if an automatic pointer variable goes out of scope or has its value overwritten while it still holds the only reference to an allocation that has not been freed via a call to <code>[[malloc|free()]]</code>, then that memory cannot be recovered for later reuse and is essentially lost to the program, a phenomenon known as a ''[[memory leak]]''. Conversely, it is possible to release memory too soon and, in some cases, to continue using it; since the allocation system can re-allocate the memory at any time for unrelated reasons, this results in unpredictable behavior, typically manifested in portions of the program far removed from the erroneous code. Such issues are ameliorated in languages with [[garbage collection (computer science)|automatic garbage collection]] or [[resource acquisition is initialization|RAII]].
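
One common defensive convention for manual ownership is sketched below. The helper names <code>copy_text</code> and <code>release_text</code> are hypothetical, and the pattern is only one of several in use; it does not protect other copies of the same pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src. The caller owns the
   result and must release it exactly once, or the memory is leaked. */
char *copy_text(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src) + 1;   /* include the terminating '\0' */
    char *dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}

/* Hypothetical helper: frees *slot and resets it to NULL, so that a later
   accidental use can be caught by a NULL check instead of touching
   memory that has already been released. */
void release_text(char **slot)
{
    assert(slot != NULL);
    free(*slot);                    /* free(NULL) is a no-op */
    *slot = NULL;
}
```

Because <code>free(NULL)</code> does nothing, calling <code>release_text</code> twice on the same variable is harmless, whereas calling <code>free</code> twice on the same non-null pointer is undefined.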
 
===Pointers===
Pointers are a primary source of potential danger. Because they are typically unchecked, a pointer can be made to point to any arbitrary location (even within code), causing unpredictable effects. Although properly used pointers point to safe places, they can be moved to unsafe places using [[Data pointer|pointer arithmetic]]; the memory they point to may be deallocated and reused ([[dangling pointer]]s); they may be uninitialized ([[wild pointer]]s); or they may be directly assigned a value using a cast, union, or through another corrupt pointer. In general, C is permissive in allowing manipulation of and conversion between pointer types, although compilers typically provide options for various levels of checking. Other languages attempt to address these problems by using more restrictive [[reference (computer science)|reference]] types.
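
A short sketch of pointer arithmetic that stays within safe limits is given below. The function name <code>find_int</code> is hypothetical; the point is that the bounds are supplied and respected by the programmer, since the language itself does not enforce them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: searches the half-open range [begin, end).
   `end` may point one past the last element; C allows such a pointer
   to be formed and compared, but not dereferenced. */
const int *find_int(const int *begin, const int *end, int key)
{
    assert(begin != NULL && end != NULL);
    for (const int *p = begin; p < end; p++) {
        if (*p == key)
            return p;       /* points at the matching element */
    }
    return NULL;            /* not found; callers must check before use */
}
```

Nothing prevents a caller from passing an <code>end</code> pointer that lies beyond the array, in which case the loop would read memory it does not own.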
 
===Arrays===
Although C supports static arrays, it is not required that array indexes be validated ([[bounds checking]]). For example, one can write to the sixth element of an array with five elements, yielding generally undesirable results. This type of bug, called a ''[[buffer overflow]],'' has been notorious as the source of a number of security problems. On the other hand, since [[bounds checking elimination]] technology was largely nonexistent when C was defined, bounds checking came with a severe performance penalty, particularly in numerical computation. By comparison, a few years earlier some [[Fortran]] compilers had a switch to toggle bounds checking on or off; however, this would have been much less useful for C, where array arguments are passed as simple pointers.
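
Since the language performs no bounds checking, any validation has to be written by hand. A minimal sketch, assuming a hypothetical helper named <code>checked_store</code>:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores value at arr[idx] only if idx is a valid
   index for an array of len elements. Returns false, and writes nothing,
   when the index is out of range. */
bool checked_store(int *arr, size_t len, size_t idx, int value)
{
    assert(arr != NULL);
    if (idx >= len)
        return false;       /* e.g. index 5 of a five-element array */
    arr[idx] = value;
    return true;
}
```

The length must be passed separately because an array argument arrives as a plain pointer that carries no size information.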
 
Multidimensional arrays are commonly used in numerical algorithms (mainly from applied [[linear algebra]]) to store matrices. The structure of the C array is well suited to this task, provided one remembers to count indices starting from 0 instead of 1. This issue is discussed in the book ''[[Numerical Recipes|Numerical Recipes in C]]'', chapter 1.2, page 20''ff'' ([http://www.library.cornell.edu/nr/bookcpdf/c1-2.pdf read online]). That book also presents a solution based on negative indexing, which introduces other dangers. Starting indices at 0 has been assimilated into the computing culture, and is no longer as alien a notion as it seemed when C was first introduced.
 
===Variadic functions===
Another source of bugs is [[variadic function]]s, which take a variable number of arguments. Unlike other prototyped C functions, checking the types of arguments to variadic functions at [[compile-time]] is, in general, impossible without additional information. If the wrong type of data is passed, the effect is unpredictable, and often fatal. Variadic functions also handle null pointer constants in a way which is often surprising to those unfamiliar with the language semantics. For example, NULL must be cast to the desired pointer type when passed to a variadic function. The [[printf]] family of functions supplied by the standard library, used to generate [[formatted text]] output, has been noted for its error-prone variadic interface, which relies on a format string to specify the number and types of trailing arguments.
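
The null pointer issue can be illustrated with a small variadic function. This is a sketch only, and <code>count_strings</code> is a hypothetical name. The argument list is terminated by a null pointer, which the caller must write as <code>(char *)NULL</code> because the compiler has no parameter type to convert a bare <code>NULL</code> to.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: counts the strings passed before the terminating
   null pointer. Every variadic argument is read as a char *, so the
   caller is responsible for passing exactly that type. */
size_t count_strings(const char *first, ...)
{
    size_t n = 0;
    va_list ap;

    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

A call such as <code>count_strings("a", "b", (char *)NULL)</code> is well defined; passing an <code>int</code> among the arguments, or omitting the terminator, compiles without any required diagnostic and has undefined behavior.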
 
However, [[Type system|type-checking]] of variadic functions from the standard library is a quality-of-implementation issue; many modern compilers do type-check <code>printf</code> calls, producing warnings if the argument list is inconsistent with the format string. Even so, not all <code>printf</code> calls can be checked statically since the format string can be built at runtime, and other variadic functions typically remain unchecked.
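
One way to keep calls checkable is to use a literal format string and pass any run-time text as data. The sketch below assumes a hypothetical helper named <code>format_entry</code>; whether a given compiler actually diagnoses mismatches is a quality-of-implementation matter, as noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "label: count" into buf. The format string
   is a literal, so a compiler that checks printf-style calls can compare
   it against the argument types. The label is passed as data through %s
   and is never interpreted as a format string itself.
   Returns the value returned by snprintf. */
int format_entry(char *buf, size_t size, const char *label, long count)
{
    assert(buf != NULL && size > 0 && label != NULL);
    return snprintf(buf, size, "%s: %ld", label, count);
}
```

Even a label that contains conversion characters, such as <code>"100%d"</code>, is copied through unchanged, which would not be the case if the label were used as the format argument.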
 
===Syntax===
Although mimicked by many languages because of its widespread familiarity, C's syntax has often been targeted as one of its weakest points. For example, Kernighan and Ritchie say in the second edition of ''The C Programming Language'', "C, like any other language, has its blemishes. Some of the operators have the wrong precedence; some parts of the syntax could be better."
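
A frequently cited instance of surprising precedence is that <code>==</code> binds more tightly than the bitwise <code>&</code>. The sketch below spells out both readings with explicit parentheses; the flag value and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define FLAG_READY 0x4u     /* hypothetical flag bit */

/* This is how the unparenthesized expression `flags & FLAG_READY == 0`
   is parsed: the comparison happens first and yields 0, so the result
   is flags & 0, which is false for every input. */
bool is_clear_as_parsed(unsigned flags)
{
    return flags & (FLAG_READY == 0);
}

/* What was almost certainly intended: test the bit, then compare. */
bool is_clear(unsigned flags)
{
    return (flags & FLAG_READY) == 0;
}
```

Many compilers can warn about the unparenthesized form, but the language itself accepts it silently.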
<!-- Note that these Stroustrup quotes do talk about C++, not C -->
Line 42:
</blockquote>
 
==Economy of expression==
===Economy of expression<ref>The heading of this section is borrowed from the first sentence of the preface to the first edition of Brian W. Kernighan and Dennis M. Ritchie: ''The C Programming Language,'' reprinted in 2<sup>nd</sup> ed., p. xi.</ref>===
One occasional criticism of C is that it can be concise to the point of being cryptic. A [[classic example]] that appears in K&R<ref>Brian W. Kernighan and Dennis M. Ritchie: ''The C Programming Language,'' 2<sup>nd</sup> ed., p. 106. Note that this example fails if the array <code>t</code> is larger than <code>s</code>, a complication that is handled by the safer library function <code>strncpy</code>.</ref> is the following function to copy the contents of string <code>t</code> to string <code>s</code>:
 
Line 73:
In a modern optimising compiler, these two pieces of code produce identical [[Assembly language|assembly code]], so the smaller code does not produce smaller output. In more verbose languages such as [[Pascal programming language|Pascal]], a similar iteration would require several statements. For C programmers, the economy of style is idiomatic and leads to shorter expressions; for critics, being able to do too much with a single line of C code can lead to problems in comprehension.
 
===Internal consistency===
Some features of C, its preprocessor, or its implementations are inconsistent. For example, C has three distinct classes of non-wide string literals: one for run-time data, one for include files with quotation marks around the filename, and one for include filenames in angle brackets. The set of allowed characters, and their interpretation, is not consistent among the three. To some extent this arose from the need to accommodate a wide variety of file naming conventions, such as [[MS-DOS]]'s use of backslash as a path separator.
 
Line 86:
even though spaces around the minus sign would not otherwise be required.
 
===Standardization===
The C programming language was standardized by [[ANSI]] in 1989 and adopted as an [[ISO]] standard in 1990; the standard has subsequently been extended twice. Some features of the C standard, such as trigraphs and [[Complex number|complex arithmetic]], have been challenged on the grounds of questionable user demand. Some major C compilers have not yet become fully conformant to later versions of the C standard.
 
Line 93:
In addition, the C standard leaves the behavior of some code fragments unspecified or undefined, such as the order of evaluation of arguments to a function (which is unspecified), to allow compilers to compile them in whatever way they believe will be optimal. However, this can result in code fragments which behave differently when compiled by different compilers, by different versions of the same compiler, or on different architectures.
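
The argument-ordering issue can be sketched with two calls that have side effects. All names below are hypothetical. Because the standard does not fix the order in which function arguments are evaluated, the portable approach is to sequence the calls in separate statements.

```c
#include <assert.h>

static int counter = 0;

/* Hypothetical helper with a side effect: returns 1, 2, 3, ... */
int next_id(void)
{
    return ++counter;
}

/* Hypothetical helper: packs two small numbers into one value. */
int combine(int first, int second)
{
    return first * 10 + second;
}

/* Writing combine(next_id(), next_id()) directly could yield 12 or 21,
   depending on which argument the compiler evaluates first. Separate
   statements make the order explicit. */
int combine_in_order(void)
{
    int first = next_id();
    int second = next_id();
    return combine(first, second);
}
```

With the calls sequenced this way, the first invocation of <code>combine_in_order</code> returns 12 on every conforming implementation.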
 
===Maintenance===
There are other problems in C that do not directly result in bugs or errors, but make it harder for programmers to build a robust, maintainable, large-scale system. Examples of these include:
* A fragile system for importing definitions (<code>#include</code>) that relies on literal text inclusion, requires prototypes and function definitions to be kept in sync by hand, and increases build times.
Line 99:
* A weak type system that lets many clearly erroneous programs compile without errors.
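
One illustration of the weak type system (an example chosen here, not taken from the list above) is the implicit conversion applied when a signed and an unsigned integer are compared. The function names are hypothetical, and the cast in the first function only spells out the conversion that the language performs silently for <code>a < b</code>.

```c
#include <assert.h>
#include <stdbool.h>

/* Equivalent to `a < b` for int a and unsigned b: a is converted to
   unsigned first, so a negative a becomes a very large value. */
bool less_as_converted(int a, unsigned b)
{
    return (unsigned)a < b;
}

/* A comparison that gives the mathematically expected answer. */
bool less_checked(int a, unsigned b)
{
    return a < 0 || (unsigned)a < b;
}
```

Under the implicit rule, <code>-1</code> is not less than <code>1u</code>; the language requires no diagnostic for such a comparison, although many compilers can warn about it on request.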
 
===Tools for mitigating issues with C===
Tools have been created to help C programmers avoid these problems in many cases.
 
Line 114:
Not every programmer needs a tool to deal with every concern. There are many C programmers who aren't prone to memory leaks, know when to check for buffer overflow, have no uncertainty about <code>=</code> vs. <code>==</code>, know the precedence and associativity of operators by heart, never resort to redundant parentheses, use <code>strtok()</code> within its limits, ''etc.''
 
==See also==
*[[Comparison of programming languages]]
*[[Programming tool]]s: [[Lint programming tool|lint]], [[Splint (programming tool)|splint]]
*[[Pascal and C]]
 
==Footnotes==
<div class="references-small">
<references/>
</div>
 
 
 
[[Category:C programming language]]