Data modeling

This is an old revision of this page, as edited by 24.60.169.238 (talk) at 04:34, 26 February 2006. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The term data modeling refers to two very different things. In the first sense, a data model is a description of the structure of an organization's data, and by implication of the underlying structure of the organization itself. It represents classes of things of significance about which a company wishes to hold information (entity classes), the nature of that information (attributes), and the relationships among those things. A model of this kind describes the organization only; it is not concerned with how the data might be represented in a computer system.

The entity classes represented can be the tangible things seen by the people in the business, but these tend to be very concrete and subject to change over time. A more robust approach is "conceptual": identifying the more fundamental things of significance, of which the things the business sees are examples. For example, an entity class that should appear in every model is PERSON, representing all the people that the organization is concerned with. Entity classes like VENDOR and EMPLOYEE are not appropriate, because each of these describes a role played by a PERSON, not the person himself or herself.
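The distinction between a person and the roles that person plays can be sketched in code. The following is a minimal illustration, not a prescribed design: the class names `Person` and `Role`, the field `played_by`, and the helper `roles_of` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """The fundamental entity class: someone the organization cares about."""
    name: str


@dataclass(frozen=True)
class Role:
    """A role (e.g. EMPLOYEE, VENDOR) played by a Person."""
    kind: str
    played_by: Person


def roles_of(person: Person, roles: list[Role]) -> list[str]:
    """Return the kinds of all roles played by the given person."""
    return [role.kind for role in roles if role.played_by == person]


# One person can be both an employee and a vendor without being
# recorded twice as two unrelated entities.
alice = Person("Alice")
bob = Person("Bob")
roles = [Role("EMPLOYEE", alice), Role("VENDOR", alice), Role("VENDOR", bob)]
alice_roles = roles_of(alice, roles)
```

Modeled this way, a person who changes or gains a role stays the same PERSON; only the set of roles changes.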

Properly done, a conceptual data model describes the organization's semantics. It is a collection of assertions about the nature of the business. This requires the entity class names to be in English (or French or Polish or whatever), not techno-babble. It also requires discipline in naming relationships so that sentences can be formed from them that represent concrete assertions about the business. One such discipline makes the relationship names prepositions (not verbs) so that they can appear in the sentence "Each <<entity 1>> {must be|may be} <<relationship name>> {one and only one|one or more} <<entity 2>>." For example, "Each ORDER must be composed of one or more LINE ITEMS."
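The sentence pattern above is regular enough to be generated mechanically. The sketch below assumes a hypothetical `Relationship` class whose fields mirror the slots of the pattern; it illustrates the naming discipline and is not part of any particular modeling tool.

```python
from dataclasses import dataclass

OPTIONALITIES = ("must be", "may be")
CARDINALITIES = ("one and only one", "one or more")


@dataclass(frozen=True)
class Relationship:
    entity_1: str
    optionality: str   # "must be" or "may be"
    name: str          # prepositional phrase, e.g. "composed of"
    cardinality: str   # "one and only one" or "one or more"
    entity_2: str

    def __post_init__(self) -> None:
        # Reject wording that falls outside the sentence pattern.
        if self.optionality not in OPTIONALITIES:
            raise ValueError(f"unknown optionality: {self.optionality!r}")
        if self.cardinality not in CARDINALITIES:
            raise ValueError(f"unknown cardinality: {self.cardinality!r}")

    def sentence(self) -> str:
        """Render the relationship as an assertion about the business."""
        return (
            f"Each {self.entity_1} {self.optionality} {self.name} "
            f"{self.cardinality} {self.entity_2}."
        )


order_lines = Relationship(
    "ORDER", "must be", "composed of", "one or more", "LINE ITEMS"
)
assertion = order_lines.sentence()
```

If a relationship name does not yield a sensible sentence when rendered this way, that is usually a sign the name needs rethinking.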

The second kind of data model describes the way data would be organized using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. This is sometimes referred to as the "physical" model, but in the original ANSI three-schema architecture it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model will be derived from the more conceptual one just described, if it is to be the basis for a system that will truly serve the organization. It may differ for good and valid reasons, however, since the system designer must now account for things like processing capacity and usage patterns.
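As an illustration of this second kind of model, the ORDER and LINE ITEM entity classes might be rendered as relational tables. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table names, column names, and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    ordered_on TEXT NOT NULL
);
CREATE TABLE line_items (
    line_item_id INTEGER PRIMARY KEY,
    -- Each line item must be part of one and only one order.
    order_id     INTEGER NOT NULL REFERENCES orders(order_id),
    product_name TEXT NOT NULL,
    quantity     INTEGER NOT NULL CHECK (quantity > 0)
);
""")

conn.execute(
    "INSERT INTO orders (order_id, ordered_on) VALUES (1, '2006-02-26')"
)
conn.executemany(
    "INSERT INTO line_items (order_id, product_name, quantity) VALUES (?, ?, ?)",
    [(1, "widget", 3), (1, "gasket", 10)],
)

line_count = conn.execute(
    "SELECT COUNT(*) FROM line_items WHERE order_id = 1"
).fetchone()[0]
```

The foreign key captures only one direction of the conceptual assertion: a line item cannot exist without its order. The other direction, that each order must have at least one line item, is not enforced by this schema and would need application logic or a trigger, which is one example of the logical model differing from the conceptual one.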

While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis than it does with taking things apart (the original meaning of analysis). Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by connecting the data structures through relationships.

A different approach is through the use of adaptive systems such as artificial neural networks that can autonomously create implicit models of data.

Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often come up with very different results. Most notable are:


See also