Data modeling: Difference between revisions

Data modeling techniques and methodologies are used to model data in a standard, consistent, predictable manner in order to manage it as a resource. The use of data modeling standards is strongly recommended for all projects requiring a standard means of defining and analyzing data within an organization, e.g., using data modeling:
 
* to assist business analysts, programmers, testers, manual writers, IT package selectors, engineers, managers, related organizations and clients to understand and use an agreed-upon semi-formal model that encompasses the concepts of the organization and how they relate to one another
* to manage data as a resource
* to integrate information systems
* to design databases/[[data warehouse]]s (aka data repositories)
 
Data modeling may be performed during various types of projects and in multiple phases of projects. Data models are progressive; there is no such thing as the final data model for a business or application. Instead, a data model should be considered a living document that will change in response to a changing business. The data models should ideally be stored in a repository so that they can be retrieved, expanded, and edited over time. [[Jeffrey L. Whitten|Whitten]] et al. (2004) identified two types of data modeling:<ref name="WBD04"/>
* Strategic data modeling: This is part of the creation of an information systems strategy, which defines an overall vision and architecture for information systems. [[Information technology engineering]] is a methodology that embraces this approach.
* Data modeling during systems analysis: In [[systems analysis]], logical data models are created as part of the development of new databases.
 
Data modeling is also used as a technique for detailing business [[requirement]]s for specific [[database]]s. It is sometimes called ''database modeling'' because a [[data model]] is eventually implemented in a database.<ref name="WBD04">[[Whitten, Jeffrey L.]]; [[Lonnie D. Bentley]], [[Kevin C. Dittman]]. (2005). ''Systems Analysis and Design Methods''. 6th edition. {{ISBN|0-256-19906-X}}.</ref>
 
== Topics ==
 
=== Conceptual, logical and physical schemas ===
[[File:4-2 ANSI-SPARC three level architecture.svg|thumb|320px|The ANSI/SPARC three-level architecture. This shows that a data model can be an external model (or view), a conceptual model, or a physical model. This is not the only way to look at data models, but it is a useful way, particularly when comparing models.<ref name="MW99"/>]]
 
In 1975 [[American National Standards Institute|ANSI]] described three kinds of data-model ''instance'':<ref>American National Standards Institute. 1975. ''ANSI/X3/SPARC Study Group on Data Base Management Systems; Interim Report''. FDT (Bulletin of ACM SIGMOD) 7:2.</ref>
 
* [[Conceptual schema]]: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is limited by the scope of the model. Simply described, a conceptual schema is the first step in organizing the data requirements.
* [[Logical schema]]: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags. The logical schema and conceptual schema are sometimes implemented as one and the same.<ref name="RS001"/>
* [[Physical schema]]: describes the physical means used to store data. This is concerned with partitions, CPUs, [[tablespace]]s, and the like.
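
As a minimal sketch of the first two levels, the following Python snippet represents a conceptual schema as entity classes plus relationship assertions between pairs of entity classes, and derives a simple logical schema (tables and columns) from it. The names <code>EntityClass</code>, <code>RelationshipAssertion</code>, and <code>to_logical_schema</code>, as well as the customer/order example, are hypothetical illustrations and do not come from the ANSI report. The mapping rule used here (one table per entity class, one foreign-key column per relationship) is an assumption and only one of many possible choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityClass:
    """A kind of thing of significance in the domain (conceptual level)."""
    name: str
    attributes: tuple[str, ...] = ()


@dataclass(frozen=True)
class RelationshipAssertion:
    """An assertion about an association between a pair of entity classes."""
    subject: EntityClass
    verb: str
    obj: EntityClass


def to_logical_schema(entities, relationships):
    """Map a conceptual schema to tables and columns (logical level).

    Assumed rule: one table per entity class with an 'id' column, plus one
    '<object>_id' column on the subject's table per relationship assertion.
    """
    tables = {e.name: ["id", *e.attributes] for e in entities}
    for r in relationships:
        tables[r.subject.name].append(f"{r.obj.name.lower()}_id")
    return tables


# Hypothetical conceptual schema: "Order is placed by Customer".
customer = EntityClass("Customer", ("name",))
order = EntityClass("Order", ("order_date",))
placed_by = RelationshipAssertion(order, "is placed by", customer)

logical = to_logical_schema([customer, order], [placed_by])
print(logical)
```

The physical schema is deliberately left out of the sketch, since storage details such as partitions and tablespaces depend on the specific database product.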
* Bottom-up models or View Integration models are often the result of a [[reengineering (software)|reengineering]] effort. They usually start with existing data structures, forms, fields on application screens, or reports. These models are usually physical, application-specific, and incomplete from an [[enterprise architecture|enterprise perspective]]. They may not promote data sharing, especially if they are built without reference to other parts of the organization.<ref name="SIG97"/>
* Top-down [[logical data model]]s, on the other hand, are created in an abstract way by getting information from people who know the subject area. A system may not implement all the entities in a logical model, but the model serves as a reference point or template.<ref name="SIG97"/>
Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and by consistently referencing a subject-area model. In many environments, the distinction between a logical data model and a physical data model is blurred. In addition, some [[Computer-aided software engineering|CASE]] tools do not make a distinction between logical and [[physical data model]]s.<ref name="SIG97"/>
 
=== Entity–relationship diagrams ===
There are several notations for data modeling. The actual model is frequently called an "entity–relationship model", because it depicts data in terms of the entities and relationships described in the [[data]].<ref name="WBD04"/> An entity–relationship model (ERM) is an abstract conceptual representation of structured data. Entity–relationship modeling is a relational schema [[database model]]ing method, used in [[software engineering]] to produce a type of [[conceptual schema|conceptual data model]] (or [[semantic data model]]) of a system, often a [[relational database]], and its requirements in a [[Top-down and bottom-up design|top-down]] fashion.

These models are used in the first stage of [[information system]] design, during [[requirements analysis]], to describe information needs or the type of [[information]] that is to be stored in a [[database]]. The [[data model]]ing technique can be used to describe any [[Ontology (computer science)|ontology]] (i.e., an overview and classification of the terms used and their relationships) for a certain [[Domain of discourse|universe of discourse]], i.e., the area of interest.
 
Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often come up with very different results. Most notable are:
[[File:HL7 Reference Information Model.jpg|thumb|320px|Example of a Generic data model.<ref>Amnon Shabo (2006). [http://healthit.hhs.gov/portal/server.pt?open=512&objID=1263&mode=2 Clinical genomics data standards for pharmacogenetics and pharmacogenomics] {{Webarchive|url=https://web.archive.org/web/20090722232240/http://healthit.hhs.gov/portal/server.pt?open=512&objID=1263&mode=2 |date=July 22, 2009 }}.</ref>]]
Generic data models are generalizations of conventional [[data model]]s. They define standardized general relation types, together with the kinds of things that may be related by such a relation type.
The definition of the generic data model is similar to the definition of a natural language. For example, a generic data model may define relation types such as a 'classification relation', being a [[binary relation]] between an individual thing and a kind of thing (a class) and a 'part-whole relation', being a binary relation between two things, one with the role of part, the other with the role of whole, regardless of the kind of things that are related.

Given an extensible list of classes, this allows the classification of any individual thing and the specification of part-whole relations for any individual object. By standardization of an extensible list of relation types, a generic data model enables the expression of an unlimited number of kinds of facts and will approach the capabilities of natural languages. Conventional data models, on the other hand, have a fixed and limited domain scope, because the instantiation (usage) of such a model only allows expressions of kinds of facts that are predefined in the model.
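
The idea can be sketched in a few lines of Python. In this hypothetical example, every fact is stored as a (left, relation type, right) triple, and only the two relation types named above are present at the start. The class <code>GenericModel</code>, its methods, and the pump/impeller individuals are invented for illustration and are not taken from any standard. New kinds of facts are expressed by extending the lists of classes and relation types rather than by changing the structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One binary relation instance: left --relation_type--> right."""
    left: str
    relation_type: str
    right: str


class GenericModel:
    def __init__(self):
        # Both lists are extensible at run time.
        self.relation_types = {"classification", "part-whole"}
        self.classes = set()
        self.facts = []

    def add_class(self, name):
        self.classes.add(name)

    def add_relation_type(self, name):
        self.relation_types.add(name)

    def relate(self, left, relation_type, right):
        """Record a fact, provided its relation type is known."""
        if relation_type not in self.relation_types:
            raise ValueError(f"unknown relation type: {relation_type}")
        self.facts.append(Fact(left, relation_type, right))

    def classify(self, individual, cls):
        """Relate an individual thing to a kind of thing (a class)."""
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.relate(individual, "classification", cls)

    def parts_of(self, whole):
        """Return everything recorded as a part of the given whole."""
        return [f.left for f in self.facts
                if f.relation_type == "part-whole" and f.right == whole]


model = GenericModel()
model.add_class("pump")
model.add_class("impeller")
model.classify("P-101", "pump")
model.classify("I-7", "impeller")
model.relate("I-7", "part-whole", "P-101")

# A new kind of fact needs only a new relation type, not a new structure.
model.add_relation_type("connected-to")
model.relate("P-101", "connected-to", "V-20")

print(model.parts_of("P-101"))
```

A conventional data model would instead fix, for example, a pump table with an impeller column, so a fact such as the connection above could not be expressed without changing the schema.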
 
=== Semantic data modeling ===
The logical data structure of a DBMS, whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. The exception is when the semantic data model is deliberately implemented in the database, a choice that may slightly reduce performance but generally greatly improves productivity.
[[File:A2 4 Semantic Data Models.jpg|thumb|320px|Semantic data models.<ref name="FIPS184"/>]]
Therefore, the need to define data from a conceptual view has led to the development of [[semantic data model]]ing techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined by its description within physical data stores. A semantic data model is an [[Abstraction (computer science)|abstraction]] which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.<ref name="FIPS184"/>
 
The purpose of semantic data modeling is to create a structural model of a piece of the real world, called the "universe of discourse". For this, three fundamental structural relations are considered:
* Classification/instantiation: Objects with some structural similarity are described as instances of classes
* Aggregation/decomposition: Composed objects are obtained by joining their parts
* Generalization/specialization: Distinct classes with some common properties are reconsidered in a more generic class with the common attributes
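
The three relations map loosely onto familiar object-oriented constructs, as the following Python sketch suggests. The classes <code>Vehicle</code>, <code>Car</code>, <code>Engine</code>, and <code>Wheel</code> are hypothetical examples chosen for illustration; semantic data models themselves are not tied to any programming language.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    power_kw: int


@dataclass
class Wheel:
    position: str


@dataclass
class Vehicle:
    # Generalization: the attribute common to all specialized classes.
    registration: str


@dataclass
class Car(Vehicle):
    # Specialization: Car inherits 'registration' from Vehicle.
    # Aggregation: a Car is composed by joining its parts.
    engine: Engine
    wheels: list[Wheel] = field(default_factory=list)


# Classification/instantiation: my_car is an instance of the class Car.
my_car = Car(
    registration="AB-123",
    engine=Engine(power_kw=90),
    wheels=[Wheel(p) for p in ("FL", "FR", "RL", "RR")],
)

print(isinstance(my_car, Car))      # classification
print(isinstance(my_car, Vehicle))  # generalization
print(len(my_car.wheels))           # aggregation
```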
 
* Integration of existing databases
 
The overall goal of semantic data models is to capture more meaning of data by integrating relational concepts with more powerful [[Abstraction (computer science)|abstraction]] concepts known from the [[artificial intelligence]] field. The idea is to provide high-level modeling primitives as integral parts of a data model in order to facilitate the representation of real-world situations.<ref>"Semantic data modeling" In: ''Metaclasses and Their Application''. Book Series Lecture Notes in Computer Science. Publisher Springer Berlin / Heidelberg. Volume 943/1995.</ref>
 
== See also ==
 
== Further reading ==
* {{cite thesis
|first = Johannes Hendrikus
|last = ter Bekke
|date = 1991-06-04
|title = Semantic Data Modeling in Relational Environments
|degree = PhD
|location = Technische Universiteit Delft
|url = https://scispace.com/pdf/semantic-data-modeling-in-relational-environments-k59h4x8kip.pdf
|url-status = live
|archive-url = https://web.archive.org/web/20250402180443/https://scispace.com/pdf/semantic-data-modeling-in-relational-environments-k59h4x8kip.pdf
|archive-date= 2025-04-02
|access-date = 2025-04-02
}}
* John Vincent Carlis, Joseph D. Maguire (2001). ''Mastering Data Modeling: A User-driven Approach''.
* Alan Chmura, J. Mark Heumann (2005). ''Logical Data Modeling: What it is and how to Do it''.