Data engineering: Difference between revisions

Content deleted Content added
Tags: Mobile edit Mobile app edit Android app edit App section source
m Expand “Data modeling”: clarify definition; add conceptual/logical/physical levels; cite Simsion & Witt, Date, Chen (1976), Kimball, OMG UML; replace Simplilearn with reliable sources.
Line 50:
 
=== Data modeling ===
{{ main article | Data modelling }}
Data modeling is the analysis and representation of data requirements for an organisation. It produces a data model—an abstract representation that organises business concepts and the relationships and constraints between them. The resulting artefacts guide communication between business and technical stakeholders and inform database design.<ref name="simsionwitt">{{cite book |last1=Simsion |first1=Graeme |last2=Witt |first2=Graham |title=Data Modeling Essentials |edition=4th |publisher=Morgan Kaufmann |year=2015 |isbn=9780128002025}}</ref><ref name="date">{{cite book |last=Date |first=C. J. |title=An Introduction to Database Systems |edition=8th |publisher=Addison-Wesley |year=2004 |isbn=9780321197849}}</ref>
This is the process of producing a [[data model]], an [[abstract model]] to describe the data and relationships between different parts of the data.<ref name="model1">{{cite web |title=What is Data Modelling? Overview, Basic Concepts, and Types in Detail |url=https://www.simplilearn.com/what-is-data-modeling-article |website=Simplilearn.com |access-date=31 July 2022 |date=15 June 2021}}</ref>
 
A common convention distinguishes three levels of models:<ref name="simsionwitt" />
* '''Conceptual model''' – a technology-independent view of the key business concepts and rules.
* '''Logical model''' – a detailed representation in a chosen paradigm (most commonly the relational model) specifying entities, attributes, keys, and integrity constraints.<ref name="date" />
* '''Physical model''' – an implementation-oriented design describing tables, indexes, partitioning, and other operational considerations.<ref name="date" />
 
Approaches include entity–relationship (ER) modeling for operational systems,<ref name="chen1976">{{cite journal |last=Chen |first=Peter P. |title=The Entity–Relationship Model—Toward a Unified View of Data |journal=ACM Transactions on Database Systems |volume=1 |issue=1 |year=1976 |pages=9–36 |doi=10.1145/320434.320440}}</ref> dimensional modeling for analytics and data warehousing,<ref name="kimball">{{cite book |last1=Kimball |first1=Ralph |last2=Ross |first2=Margy |title=The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling |edition=3rd |publisher=Wiley |year=2013 |isbn=9781118530801}}</ref> and the use of UML class diagrams to express conceptual or logical models in general-purpose modeling tools.<ref name="uml">{{cite report |title=Unified Modeling Language (UML) Version 2.5.1 |publisher=Object Management Group |date=2017 |url=https://www.omg.org/spec/UML/2.5.1/}}</ref>
 
Well-formed data models aim to improve data quality and interoperability by applying clear naming standards, normalisation, and integrity constraints.<ref name="date" /><ref name="simsionwitt" />
 
== Roles ==