Data model

{{Short description|Model that organizes elements of data and how they relate to one another and to real-world entities.}}
 
[[File:Data modeling context.svg|thumb|upright=1.6| Overview of a data-modeling context: Data model is based on Data, Data relationship, Data semantic and Data constraint. A data model provides the details of [[information]] to be stored, and is of primary use when the final product is the generation of computer [[software code]] for an application or the preparation of a [[functional specification]] to aid a [[computer software]] make-or-buy decision. The figure is an example of the interaction between [[business process modeling|process]] and data models.<ref name="SS93">Paul R. Smith & Richard Sarfaty Publications, LLC 2009</ref>]]
 
A '''data model'''<ref>{{cite web |url=https://cedar.princeton.edu/understanding-data/what-data-model |title=What is a Data Model? |website=princeton.edu |access-date=29 May 2024}}</ref><ref>{{cite web |title=UML Domain Modeling - Stack Overflow |url=https://stackoverflow.com/a/3835214 |website=Stack Overflow |publisher=Stack Exchange Inc. |access-date=4 February 2017}}</ref> is an [[abstract model]] that organizes elements of [[data]] and [[Standardization|standardizes]] how they relate to one another and to the properties of real-world [[Entity|entities]]. For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner.
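The car example above can be sketched in code. This is a minimal illustration, not part of the article: the class and field names (<code>Car</code>, <code>Owner</code>, <code>color</code>, <code>size</code>) are hypothetical, chosen only to mirror the elements the sentence describes.

```python
from dataclasses import dataclass

# Hypothetical sketch of the article's car example: the element
# representing a car is composed of further elements for its color
# and size, plus a reference defining its owner.
@dataclass
class Owner:
    name: str

@dataclass
class Car:
    color: str    # element representing the car's color
    size: str     # element representing the car's size
    owner: Owner  # relationship to a real-world entity

car = Car(color="red", size="compact", owner=Owner(name="Alice"))
print(car.owner.name)  # -> Alice
```

The point is only that a data model fixes which elements exist and how they relate, before any particular storage technology is chosen.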
 
The corresponding professional activity is generally called ''[[data modeling]]'' or, more specifically, ''[[database design]]''.
* "Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces".<ref name="MW99"/>
* "Entity types are often not identified, or incorrectly identified. This can lead to replication of data, data structure, and functionality, together with the attendant costs of that duplication in development and maintenance".<ref name="MW99"/>
* "Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25–70% of the cost of current systems".<ref name="MW99"/>
* "Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data has not been standardized. For example, engineering design data and drawings for process plant are still sometimes exchanged on paper".<ref name="MW99"/>
The reason for these problems is a lack of standards that will ensure that data models will both meet business needs and be consistent.<ref name="MW99"/>
 
== History ==
One of the earliest pioneering works in modeling information systems was done by Young and Kent (1958),<ref>Young, J. W., and Kent, H. K. (1958). "Abstract Formulation of Data Processing Problems". In: ''Journal of Industrial Engineering''. Nov–Dec 1958. 9(6), pp. 471–479</ref><ref name="JAB07">[[Janis A. Bubenko jr]] (2007) "From Information Algebra to Enterprise Modelling and Ontologies - a Historical Perspective on Modelling for Information Systems". In: ''Conceptual Modelling in Information Systems Engineering''. [[:w:John Krogstie|John Krogstie]] et al. eds. pp 1–18</ref> who argued for "a precise and abstract way of specifying the informational and time characteristics of a [[data processing]] problem". They wanted to create "a notation that should enable the [[Systems analyst|analyst]] to organize the problem around any piece of [[computer hardware|hardware]]". Their work was the first effort to create an abstract specification and invariant basis for designing different alternative implementations using different hardware components. The next step in IS modeling was taken by [[CODASYL]], an IT industry consortium formed in 1959, who essentially aimed at the same thing as Young and Kent: the development of "a proper structure for machine-independent problem definition language, at the system level of data processing". This led to the development of a specific IS [[information algebra]].<ref name="JAB07"/>
 
In the 1960s data modeling gained more significance with the initiation of the [[management information system]] (MIS) concept. According to Leondes (2002), "during that time, the information system provided the data and information for management purposes. The first generation [[database system]], called [[Integrated Data Store]] (IDS), was designed by [[Charles Bachman]] at General Electric. Two famous database models, the [[network data model]] and the [[hierarchical data model]], were proposed during this period of time".<ref>Cornelius T. Leondes (2002). ''Database and Data Communication Network Systems: Techniques and Applications''. Page 7</ref> Towards the end of the 1960s, [[Edgar F. Codd]] worked out his theories of data arrangement, and proposed the [[relational model]] for database management based on [[first-order logic|first-order predicate logic]].<ref>''"Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks"'', E.F. Codd, IBM Research Report, 1969</ref>
Bill Kent, in his 1978 book ''Data and Reality,''<ref>{{citation|title=Data and Reality |url=http://www.bkent.net/Doc/darxrp.htm}}</ref> compared a data model to a map of a territory, emphasizing that in the real world, "highways are not painted red, rivers don't have county lines running down the middle, and you can't see contour lines on a mountain". In contrast to other researchers who tried to create models that were mathematically clean and elegant, Kent emphasized the essential messiness of the real world, and the task of the data modeler to create order out of chaos without excessively distorting the truth.
 
In the 1980s, according to Jan L. Harrington (2000), "the development of the [[Object-oriented programming|object-oriented]] paradigm brought about a fundamental change in the way we look at data and the procedures that operate on data. Traditionally, data and procedures have been stored separately: the data and their relationship in a database, the procedures in an application program. Object orientation, however, combined an entity's procedure with its data."<ref name="JLH00">Jan L. Harrington (2000). ''Object-oriented Database Design Clearly Explained''. p.4</ref>
 
During the early 1990s, three Dutch mathematicians, Guido Bakema, Harm van der Lek, and JanPieter Zwart, continued the development of the work of [[G.M. Nijssen]]. They focused more on the communication aspect of the semantics. In 1997 they formalized the method Fully Communication Oriented Information Modeling [[FCO-IM]].
: The hierarchical model is similar to the network model except that links in the hierarchical model form a tree structure, while the network model allows an arbitrary graph.
; [[Network model]]
: The [[network model]], also known as the [[graph model]], organizes data using two fundamental constructs, called records and sets. Records (or nodes) contain fields (i.e. attributes), and sets (or edges) define one-to-many, many-to-many, and many-to-one relationships between records: one owner, many members. The network data model is an abstraction of the design concept used in the implementation of databases. Network models emphasise interconnectedness, making them well suited to applications where relationships are central, such as social networks or recommendation systems. This structure allows relationships to be queried efficiently without expensive joins.
; [[Relational model]]
: The relational model is a database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The power of the relational data model lies in its mathematical foundations and a simple user-level paradigm.
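The network model's two constructs described above, records with fields and sets linking one owner to many members, can be sketched directly. This is an illustrative toy, not an actual database implementation; the names (<code>Record</code>, <code>connect</code>, the "employs" set) are hypothetical.

```python
# Minimal sketch of the network model's two constructs:
# records (nodes with fields) and sets (owner -> member links).
class Record:
    def __init__(self, **fields):
        self.fields = fields
        self.sets = {}  # set name -> list of member records

    def connect(self, set_name, member):
        # One-to-many: this record is the single owner,
        # the connected records are the many members.
        self.sets.setdefault(set_name, []).append(member)

dept = Record(name="Engineering")
alice = Record(name="Alice")
bob = Record(name="Bob")
dept.connect("employs", alice)
dept.connect("employs", bob)

# Following the set pointers answers "who works in Engineering?"
# directly, without a join over shared key values.
members = [m.fields["name"] for m in dept.sets["employs"]]
print(members)  # -> ['Alice', 'Bob']
```

The contrast with the relational model is visible here: relationships are explicit links traversed by navigation, rather than predicates evaluated over tuples.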
[[File:Data Flow Diagram Example.jpg|thumb|240px|Data-Flow Diagram example<ref>John Azzolini (2000). [http://ses.gsfc.nasa.gov/ses_data_2000/000712_Azzolini.ppt Introduction to Systems Engineering Practices]. July 2000.</ref>]]
A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an [[information system]]. It differs from the [[flowchart]] as it shows the ''data'' flow instead of the ''control'' flow of the program. A data-flow diagram can also be used for the [[Data visualization|visualization]] of [[data processing]] (structured design). Data-flow diagrams were invented by [[Larry Constantine]], the original developer of structured design,<ref>W. Stevens, G. Myers, L. Constantine, "Structured Design", IBM Systems Journal, 13 (2), 115–139, 1974.</ref> based on Martin and Estrin's "data-flow graph" model of computation.
 
It is common practice to draw a [[System context diagram|context-level data-flow diagram]] first which shows the interaction between the system and outside entities. The '''DFD''' is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts. This context-level data-flow diagram is then "exploded" to show more detail of the system being modeled.
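The distinction between data flow and control flow can be made concrete by representing a DFD as a set of labelled edges. This is a hypothetical sketch: the entity and flow names below are invented for illustration and do not come from the article.

```python
# A context-level DFD as edges labelled with the data that flows,
# not the order in which steps execute (data flow, not control flow).
flows = [
    ("Customer", "Order System", "order details"),
    ("Order System", "Warehouse", "pick list"),
    ("Warehouse", "Customer", "shipment"),
]

def flows_touching(process):
    # "Exploding" one process: keep only the flows into or out of it.
    return [(src, dst, data) for (src, dst, data) in flows
            if process in (src, dst)]

print(flows_touching("Warehouse"))
# -> [('Order System', 'Warehouse', 'pick list'),
#     ('Warehouse', 'Customer', 'shipment')]
```

Nothing in the structure says when each flow happens, only what data moves between which parts, which is exactly what distinguishes a DFD from a flowchart.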
* Len Silverston (2001). ''The Data Model Resource Book'' Volume 1/2. John Wiley & Sons.
* Len Silverston & Paul Agnew (2008). ''The Data Model Resource Book: Universal Patterns for data Modeling'' Volume 3. John Wiley & Sons.
* Matthew West (2011) ''[http://store.elsevier.com/product.jsp?isbn=9780123751065 Developing High Quality Data Models]'' Morgan Kaufmann
* Matthew West and Julian Fowler (1999). ''[https://sites.google.com/site/drmatthewwest/publications/princ03.pdf?attredirects=0&d=1 Developing High Quality Data Models]{{Dead link|date=October 2023 |bot=InternetArchiveBot |fix-attempted=yes }}.'' The European Process Industries STEP Technical Liaison Executive (EPISTLE).
 
{{Data model}}