Data modeling

Managing large quantities of data is a primary function of information systems. Data is seen as falling into two general categories: structured and unstructured. Unstructured data (something of a misnomer, as all data being managed by computers requires some base structure in order to be translated into digital format) refers to such things as word processing documents, emails, pictures, digital audio and video, and the like. Structured data is that stored by data management systems such as relational databases. A spreadsheet is a very simple example of structured data.

Structured data is described by data models. In the early phases of a software development project, the emphasis is on designing a conceptual data model. This can then be detailed into a logical data model, sometimes called a functional data model. In later stages, the logical model may be translated into a physical data model.

The term data modeling actually refers to two very different things. In the first sense, a data model is a description of the structure of the data within a given domain of concern, and by implication of the underlying structure of that domain itself. For example, a model may represent classes of things of significance about which a company wishes to hold information (entity classes), the nature of that information (attributes), and relationships among those things. A model of this kind describes the organization itself and is not concerned with how data might be represented in a computer system.

The entity classes represented can be the tangible things seen by the people in the business, but these tend to be very concrete and subject to change over time. A more robust, "conceptual" approach identifies more fundamental things of significance, of which the things the business sees are examples. For example, an entity class that might appear in a given model is PERSON, representing all the people that an organization is concerned with. [NOTE: Except for the word "NOTE", words in all capital letters refer to entity class names.] Entity classes like VENDOR and EMPLOYEE are not appropriate, because each of these describes a role played by a PERSON, not the person themselves.
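
The distinction between a fundamental entity class and the roles it plays can be sketched in code. The following is a minimal illustration only: the class names Person and Role, their attributes, and the sample values are assumptions made for this example and do not come from any particular methodology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Role:
    """A role a person plays for the organization (hypothetical structure)."""
    kind: str   # e.g. "EMPLOYEE" or "VENDOR"
    since: str  # start date as ISO text, kept simple for this sketch


@dataclass
class Person:
    """The fundamental entity class; roles are attached to it, not modeled as separate classes."""
    name: str
    roles: List[Role] = field(default_factory=list)

    def plays(self, kind: str) -> bool:
        """Return True if this person currently holds a role of the given kind."""
        return any(role.kind == kind for role in self.roles)


# One person can be both an employee and a vendor without being recorded twice.
pat = Person("Pat Example")
pat.roles.append(Role("EMPLOYEE", "2001-03-01"))
pat.roles.append(Role("VENDOR", "2004-07-15"))
```

If VENDOR and EMPLOYEE were separate entity classes, the same individual would have to be recorded once in each, and the two records could drift apart.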

Properly done, a conceptual data model describes the organization's semantics. It is a collection of assertions about the nature of the business. This requires the entity class names to be in English (or French or Polish or whatever), not technical terms. It also requires discipline in naming relationships so that sentences can be formed from them that represent concrete assertions about the business. One such discipline makes the relationship names prepositions (not verbs) so that they can appear in the sentence Each <<entity 1>> {must be|may be} <<relationship name>> {one and only one|one or more} <<entity 2>>. For example, "Each ORDER must be composed of one or more LINE ITEMS."
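
The sentence pattern above is mechanical enough to generate from the model. The sketch below assumes a hypothetical helper named assertion; the inverse relationship name "part of" is likewise an assumption chosen for illustration.

```python
OPTIONALITY = ("must be", "may be")
CARDINALITY = ("one and only one", "one or more")


def assertion(entity1, optionality, relationship, cardinality, entity2):
    """Build a sentence of the form
    'Each E1 {must be|may be} REL {one and only one|one or more} E2.'
    """
    if optionality not in OPTIONALITY:
        raise ValueError(f"optionality must be one of {OPTIONALITY}")
    if cardinality not in CARDINALITY:
        raise ValueError(f"cardinality must be one of {CARDINALITY}")
    return f"Each {entity1} {optionality} {relationship} {cardinality} {entity2}."


# The example from the text, plus a hypothetical reading in the other direction.
sentence = assertion("ORDER", "must be", "composed of", "one or more", "LINE ITEMS")
inverse = assertion("LINE ITEM", "must be", "part of", "one and only one", "ORDER")
print(sentence)
print(inverse)
```

Reading each generated sentence back to someone in the business is a quick check that a relationship has been named and constrained correctly.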

The second kind of data model describes the way data would be organized using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. This is sometimes referred to as the "physical" model, but in the original ANSI three-schema architecture it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model will be derived from the more conceptual one just described, if it is to be the basis for a system that will truly serve the organization. It may differ for good and valid reasons, however, since the system designer must now account for things like processing capacity and usage patterns.
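
As a rough illustration of this second kind of model, the ORDER and LINE ITEM entity classes might be expressed as relational tables. The sketch below uses Python's standard sqlite3 module with an in-memory database; the table names, column names, and sample rows are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE customer_order (
    order_id   INTEGER PRIMARY KEY,
    ordered_on TEXT NOT NULL
);
CREATE TABLE line_item (
    line_item_id INTEGER PRIMARY KEY,
    -- Each LINE ITEM must be part of one and only one ORDER.
    order_id     INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_name TEXT NOT NULL,
    quantity     INTEGER NOT NULL CHECK (quantity > 0)
);
""")

conn.execute(
    "INSERT INTO customer_order (order_id, ordered_on) VALUES (1, '2024-01-15')"
)
conn.executemany(
    "INSERT INTO line_item (order_id, product_name, quantity) VALUES (?, ?, ?)",
    [(1, "widget", 3), (1, "gasket", 10)],
)

line_count = conn.execute(
    "SELECT COUNT(*) FROM line_item WHERE order_id = 1"
).fetchone()[0]
print(line_count)
```

The foreign key captures only one direction of the relationship. The assertion that each ORDER must be composed of one or more LINE ITEMS is not enforced by this schema, which is one example of how the model at this level can differ from the conceptual one.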

While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with analysis (identifying component concepts from more general ones). (Presumably we call ourselves systems analysts because no one can say systems synthesists.) Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by connecting data structures through relationships.
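
One small instance of eliminating redundancy and replacing it with a relationship is sketched below. The record layout and all values are made up for the illustration; the point is only that the customer's name is stored once and each order refers to it.

```python
# Flat records: the customer's name is repeated on every order.
flat_rows = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Acme", "total": 120},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Acme", "total": 75},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Birch", "total": 40},
]

customers = {}  # customer_id -> customer attributes, stored once
orders = []     # each order keeps only a reference to its customer

for row in flat_rows:
    customers[row["customer_id"]] = {"name": row["customer_name"]}
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "total": row["total"],
        }
    )


def customer_name_for(order):
    """Follow the relationship from an order to its customer."""
    return customers[order["customer_id"]]["name"]


print(len(customers), customer_name_for(orders[1]))
```

After the split, correcting a customer's name means changing one entry rather than every order that mentions it.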

A different approach is through the use of adaptive systems such as artificial neural networks that can autonomously create implicit models of data.

Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often come up with very different results. Most notable are:

Entity-relationship diagrams
Object Role Modeling (ORM) or Nijssen's Information Analysis Method (NIAM)
Business rules or business rules approach
RM/T
Bachman diagrams
Object-relationship modeling
Artificial neural networks

External links

Data Modelling Tools from DatabaseAnswers.com
SILVERRUN - tools for conceptual, logical and physical data modeling
Article Database Modelling in UML from Methods & Tools
Data Modelling Dictionary
