Naming convention (programming)

This is an old revision of this page, as edited by DEddy (talk | contribs) at 12:42, 20 April 2006 (OF Language). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer programming, a naming convention is a set of rules for choosing the character sequence to be used for an identifier.

Reasons for using a naming convention (as opposed to allowing programmers to choose any character sequence) include the following:

  • to provide useful information to a reader, e.g., an identifier's type (see: Hungarian notation) or its intended use;
  • to enhance clarity (for example, by disallowing overly long names or abbreviations).

The choice of naming conventions can be an enormously controversial issue, with partisans of each holding theirs to be the best and others to be inferior.

Business value of naming conventions

Although largely hidden from most business users, well-chosen names make it significantly easier for subsequent generations of analysts and developers to understand what a system is doing and how to fix or extend its code to meet new business needs. There is no single best naming convention that works across all software environments.

Example:

a = b * c is programmatically correct but entirely opaque as to intent or meaning.

weeklyPay = hoursWorked * payRate is easy to read and understand (at least for programmers accustomed to such cryptic "words").
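The contrast can be sketched in Python. This is only an illustration: the function names and the sample figures (40 hours at a rate of 25.0) are hypothetical, and the underscore style is used because it is customary in Python. Both functions perform the same multiplication; only the names differ.

```python
def f(b, c):
    # Opaque: nothing says what b and c represent.
    a = b * c
    return a


def weekly_pay(hours_worked, pay_rate):
    # Same arithmetic, but the names state the intent.
    return hours_worked * pay_rate


# Hypothetical sample values, chosen only for illustration.
print(f(40, 25.0))
print(weekly_pay(40, 25.0))
```

A reader meeting the second function for the first time can guess what it computes without tracing its callers.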


Multiple-word identifiers

A common recommendation is "Use meaningful identifiers." A single word may not be as meaningful, or as specific, as multiple words. As most programming languages do not allow whitespace in identifiers, a method of delimiting each word is needed, so that subsequent readers can tell which characters belong to which word. Several methods are in widespread use, each with a significant following.

One approach is to delimit separate words with a nonalphanumeric character. The two characters commonly used for this purpose are the hyphen ('-') and the underscore ('_'); e.g., the two-word name "two words" would be represented as two-words or two_words. The hyphen is used by nearly all programmers writing COBOL and Lisp. Many other languages (e.g., languages in the C and Pascal families) reserve the hyphen for use as the subtraction operator, and so it is not available for use in identifiers.

An alternative approach is to indicate word boundaries using capitalization, thus rendering "two words" as either twoWords or TwoWords. The term CamelCase (or camelCase) is sometimes used to describe this technique.
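The delimiting styles described above can be sketched as a small Python converter. The function names are hypothetical, and the splitting rule (break at hyphens, underscores, and lower-to-upper case transitions) is a simplifying assumption that does not handle runs of capitals such as acronyms.

```python
import re


def split_words(identifier):
    """Split an identifier into lowercase words (simplified heuristic)."""
    # Mark capitalization boundaries: twoWords -> "two Words".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier)
    # Hyphens and underscores are treated as delimiters as well.
    parts = re.split(r"[-_\s]+", spaced)
    return [part.lower() for part in parts if part]


def to_hyphenated(words):
    return "-".join(words)


def to_underscored(words):
    return "_".join(words)


def to_lower_camel(words):
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])


def to_upper_camel(words):
    return "".join(word.capitalize() for word in words)


words = split_words("twoWords")
print(to_hyphenated(words), to_underscored(words),
      to_lower_camel(words), to_upper_camel(words))
```

Splitting into a neutral list of words first keeps each output style a one-line join, which is why the sketch separates parsing from rendering.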

Information in identifiers

There is significant disagreement over whether it is permissible to use short identifiers (i.e., ones containing few characters). The argument against them is that little information, if any, can be encoded in a short sequence of characters. Whether programmers prefer short identifiers because they are too lazy to type, or think up, longer ones, or because in many situations a longer identifier simply clutters the visible code and provides no worthwhile additional benefit, is an open research issue.

Often forgotten in the highly personal debates over which naming practice is "best" is the fact that a typical business application is written in several programming languages. Every language has its own idiosyncrasies regarding characters with special meaning (perhaps to indicate a local versus a global variable), separators, and name length.

In early programming languages, when programs were written on punch cards, variable names could be restricted to a maximum of 6 characters. It was not unusual for such cryptically short names (in Fortran) to be carried over when code was migrated to COBOL.

Extra information in identifiers

There are several well known systems for codifying specific technical aspects of a particular identifier in the name; some are listed below. Individual companies, projects and teams sometimes also devise their own such conventions.

  • Perhaps the best known is Hungarian notation, which encodes either the purpose ("Apps Hungarian") or the type ("Systems Hungarian") of a variable in its name[1].
  • In Java, very strong conventions established from the beginning by the language's originators require classes and variables to be capitalised differently. Thus, to a Java programmer, widget.expand() and Widget.expand() imply significantly different behaviour, even without prior knowledge of the Widget class and despite the fact that the compiler enforces no such rules.
  • Identifiers representing macros in C and C++ are, by convention, written using only upper case letters.
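The capitalization conventions in the list above can be sketched as a rough classifier in Python. This is a heuristic for illustration only: the function name and category labels are hypothetical, MAX_SIZE is an invented sample name, and, as the text notes for Java, no compiler enforces these rules.

```python
import re


def classify_identifier(name):
    """Guess the conventional role of an identifier from its capitalization."""
    # Only upper case letters, digits and underscores: C/C++ macro style.
    if re.fullmatch(r"[A-Z][A-Z0-9_]*", name):
        return "macro-style"
    # Leading capital followed by mixed case: Java class style.
    if re.fullmatch(r"[A-Z][A-Za-z0-9]*", name):
        return "class-style"
    # Leading lower case letter: Java variable or method style.
    if re.fullmatch(r"[a-z][A-Za-z0-9]*", name):
        return "variable-style"
    return "unclassified"


for sample in ("widget", "Widget", "MAX_SIZE"):
    print(sample, "->", classify_identifier(sample))
```

The order of the checks matters: an all-capitals name would also match the class pattern, so the macro pattern is tested first.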

OF Language

One of the earliest published convention systems was IBM's "OF Language", documented in a 1980s IMS (Information Management System) manual. [reference to be found/supplied]

Although largely unheard of today, the PRIME-MODIFIER-CLASS word scheme produced names that are highly readable, at least to programmers accustomed to cryptic abbreviated labels, such as CUST-ACT-NO (customer account number).

PRIME words were meant to indicate major "entities" of interest to a system.

MODIFIER words were used for additional refinement, qualification and readability.

CLASS words (in good practice) would be a very short list of data types relevant to a particular application. Common CLASS words might be: NO (number), ID (identifier), TXT (text), AMT (amount), QTY (quantity), FL (flag), CD (code), and so forth. In practice the available CLASS words would be a list of fewer than two dozen terms.

CLASS words, typically positioned on the right (suffix), serve much the same purpose as Hungarian notation prefixes.

Besides promoting consistency, CLASS words tell the programmer what type of data a field holds, which provides clues as to what can or should be done with that field. Before BOOLEAN (two values only) fields became widely accepted, FL (flag) would indicate a field with only two possible values.
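As a rough sketch of how a PRIME-MODIFIER-CLASS name could be taken apart, the Python below treats the first word as the PRIME word, the last as the CLASS word, and anything between as MODIFIER words. The function names and the small abbreviation glossary are hypothetical; only the CLASS words and the CUST-ACT-NO example come from the text, and nothing here is taken from the IBM manual itself.

```python
# CLASS words listed in the text, with their expansions.
CLASS_WORDS = {
    "NO": "number", "ID": "identifier", "TXT": "text", "AMT": "amount",
    "QTY": "quantity", "FL": "flag", "CD": "code",
}

# Hypothetical glossary; a real system would maintain its own.
ABBREVIATIONS = {"CUST": "customer", "ACT": "account"}


def parse_of_name(name):
    """Split a hyphenated name into PRIME, MODIFIER and CLASS words."""
    words = name.split("-")
    if len(words) < 2:
        raise ValueError("expected at least a PRIME and a CLASS word")
    prime, *modifiers, class_word = words
    if class_word not in CLASS_WORDS:
        raise ValueError(f"unknown CLASS word: {class_word}")
    return {"prime": prime, "modifiers": modifiers, "class": class_word}


def expand_of_name(name):
    """Spell out a name using the glossary and the CLASS word list."""
    parts = parse_of_name(name)
    words = [parts["prime"], *parts["modifiers"]]
    expanded = [ABBREVIATIONS.get(word, word.lower()) for word in words]
    expanded.append(CLASS_WORDS[parts["class"]])
    return " ".join(expanded)


print(expand_of_name("CUST-ACT-NO"))
```

Rejecting names whose final word is not a known CLASS word reflects the idea that the CLASS list is kept deliberately short.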

Positional notation

A style used for very short names (8 characters or fewer) could be LCCIIL01, where LC is the application (Letters of Credit), the following C stands for COBOL, IIL is the particular process subset, and 01 is a sequence number.
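Decoding such a name amounts to slicing it at fixed positions. The Python sketch below assumes a 2 + 1 + 3 + 2 character layout, inferred solely from the single LCCIIL01 example; the function and field names are hypothetical, and a real installation would define its own layout.

```python
def parse_positional_name(name):
    """Slice an 8-character name into its positional fields."""
    if len(name) != 8:
        raise ValueError("expected exactly 8 characters")
    return {
        "application": name[0:2],    # e.g. LC = Letters of Credit
        "language": name[2],         # e.g. C = COBOL
        "process": name[3:6],        # particular process subset
        "sequence": int(name[6:8]),  # two-digit sequence number
    }


print(parse_positional_name("LCCIIL01"))
```

Because every field's meaning depends on its position, a reader needs the layout definition to interpret the name, which is the main cost of this style.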

This sort of convention is still in active use on mainframes dependent upon JCL and is also seen in the MS-DOS 8.3 style (a name of at most 8 characters, a period separator, and a 3-character file type).

  • 100 page pdf that uses linguistics and psychology to attempt a cost/benefit analysis of identifier naming issues