In computer programming, a naming convention is a set of rules for choosing the character sequence to be used for an identifier.
Reasons for using a naming convention (as opposed to allowing people, e.g., programmers, to choose any character sequence) include the following:
- to provide useful information to a reader, e.g., an identifier's type (see: Hungarian notation) or its intended use;
- to enhance clarity, for example by disallowing overly long names or obscure abbreviations.
The choice of naming conventions can be an enormously controversial issue, with partisans of each convention holding theirs to be the best and all others inferior.
Business Value of Naming Conventions
While largely hidden from the view of most business users, well-chosen names make it significantly easier for subsequent generations of analysts and developers to understand what a system is doing, and to fix or extend its code for new business needs. There is no single best naming convention that works across all software environments.
Example:
a = b * c, while programmatically correct, is entirely opaque as to intent or meaning.
weeklyPay = hoursWorked * payRate is easy to read and understand (at least for programmers accustomed to reading such compound "words").
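The contrast can be made concrete in a short, hypothetical snippet (the function and variable names below are illustrative, not drawn from any real system):

```python
# Opaque version: without context, the reader must guess what
# a, b, and c represent.
def f(b, c):
    return b * c

# Self-documenting version: the same computation, now readable.
def compute_weekly_pay(hours_worked, pay_rate):
    return hours_worked * pay_rate

print(compute_weekly_pay(40, 15.0))  # 600.0
```

Both functions compute the same value; only the second tells the reader why.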
Multiple-word identifiers
A common recommendation is "Use meaningful identifiers." A single word may not be as meaningful, or as specific, as multiple words. Since most programming languages do not allow whitespace in identifiers, a method of delimiting each word is needed so that subsequent readers can tell which characters belong to which word. Several such methods are in widespread use, each with a significant following.
One approach is to delimit separate words with a nonalphanumeric character. The two characters commonly used for this purpose are the hyphen ('-') and the underscore ('_'); e.g., the two-word name two words would be represented as two-words or two_words. The hyphen is used by nearly all programmers writing COBOL and Lisp. Many other languages (e.g., those in the C and Pascal families) reserve the hyphen for use as the subtraction operator, so it is not available for use in identifiers.
An alternative approach is to indicate word boundaries with capitalization, rendering two words as either twoWords or TwoWords. The term CamelCase is sometimes used to describe this technique.
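A small sketch illustrates both delimiting styles for the same two-word name, and shows that capitalization genuinely encodes word boundaries that a program can recover (the splitting function is a simplified illustration, not a standard library routine):

```python
import re

# The same two-word name in the delimiter styles discussed above.
# Hyphenated names are shown as strings because Python, like most
# C-family languages, reserves '-' for subtraction.
hyphenated = "two-words"   # COBOL, Lisp style
underscored = "two_words"  # underscore-delimited
camel = "twoWords"         # lower CamelCase
pascal = "TwoWords"        # upper CamelCase

def words_from_camel(name):
    """Recover the word boundaries that capitalization encodes."""
    return [w.lower() for w in re.findall(r'[A-Z]?[a-z]+', name)]

print(words_from_camel("twoWords"))  # ['two', 'words']
print(words_from_camel("TwoWords"))  # ['two', 'words']
```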
Information in identifiers
There is significant disagreement over whether it is permissible to use short (i.e., containing few characters) identifiers. The argument against them is that little information, if any, can be encoded in a short sequence of characters. Whether programmers prefer short identifiers because they are too lazy to type, or to think up, longer ones, or because in many situations a longer identifier simply clutters the visible code without providing any worthwhile additional benefit, is an open research issue.
Often forgotten in the highly personal debates over which naming practice is "best" is the fact that a typical business application will actually be written in several software languages. Every language has its own idiosyncrasies regarding characters with special meaning (perhaps indicating local vs. global variables), separators, and identifier length.
In early software languages, written on punch cards, variable names were often restricted to a maximum of six characters. It was not unusual for such cryptically short names to be migrated from Fortran to COBOL.
There are several well-known systems for codifying specific technical aspects of an identifier in its name. Perhaps the best known is Hungarian notation, which encodes the type of a variable in its name. Several minor conventions are also widespread; one example is the convention of writing identifiers that represent macros in C and C++ entirely in uppercase.
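A sketch of both ideas, transposed into Python for illustration (the Hungarian prefixes below follow common usage, but the exact prefix set varies from shop to shop, and these variable names are hypothetical):

```python
# Hungarian-style prefixes encode each variable's type in its name.
szCustomerName = "Alice"  # sz: zero-terminated string (C heritage)
nRetryCount = 3           # n:  integer count
fUnitPrice = 9.99         # f:  floating point
bIsActive = True          # b:  boolean

# By analogy, the all-uppercase convention marks compile-time
# constants (macros in C/C++, module-level constants in Python),
# so a reader knows at a glance the value should never change.
MAX_RETRIES = 5
```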
OF Language
One of the earliest published convention systems was IBM's "OF Language" documented in a 1980s IMS manual. [reference to be found/supplied]
While largely unheard of today, the PRIME-MODIFIER-CLASS word scheme produced highly readable (at least to COBOL programmers) names such as CUST-ACT-NO (customer account number).
In good practice, the CLASS words would be a very short list of data types relevant to a particular application. Common CLASS words might be: NO (number), ID (identifier), TXT (text), AMT (amount), QTY (quantity), FL (flag), CD (code), and so forth.
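The scheme can be sketched as a small helper that assembles OF-Language-style names from PRIME, MODIFIER, and CLASS words (the abbreviation table below is an illustrative subset drawn from the list above, not IBM's full standard):

```python
# Illustrative CLASS-word abbreviations, as listed in the text.
CLASS_WORDS = {
    "number": "NO", "identifier": "ID", "text": "TXT",
    "amount": "AMT", "quantity": "QTY", "flag": "FL", "code": "CD",
}

def of_language_name(prime, modifier, class_word):
    """Build a PRIME-MODIFIER-CLASS name, e.g. CUST-ACT-NO."""
    return "-".join([prime.upper(), modifier.upper(),
                     CLASS_WORDS[class_word]])

print(of_language_name("cust", "act", "number"))  # CUST-ACT-NO
```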
(see also Hungarian_notation)
Positional Notation
A style used for very short identifiers (eight characters or fewer) could be: LCCIIL01, where LC is the application (Letters of Credit), C indicates the language (COBOL), IIL identifies the particular process subset, and 01 is a sequence number.
This sort of convention is still in active use on mainframes dependent upon JCL, and is also seen in the 8.3 MS-DOS style of file name (a maximum of eight characters, a period separator, and a three-character file type).
External links
- A 100-page PDF that uses linguistics and psychology to attempt a cost/benefit analysis of identifier naming issues