Mindmatrix (talk | contribs) m Reverted edits by 103.214.118.113 (talk) to last version by MuffinHunter0 |
replace pov templates with how-to template |
Line 1:
{{Short description|Data modeling concept}}
{{Multiple issues|
{{Page numbers improve|date=June 2018}}
{{How-to|date=April 2025}}}}
{{Use dmy dates|date=July 2018}}
'''Dimensional modeling''' ('''DM''') is part of the ''[[The Kimball Lifecycle|Business Dimensional Lifecycle]]'' methodology developed by [[Ralph Kimball]], which includes a set of methods, techniques and concepts for use in [[data warehouse]] design.<ref name="ConBegg9"/>{{rp|1258–1260}}<ref name="MoodyKokink-1">{{cite web|url=http://neumann.hec.ca/sites/cours/6-060-00/MK_entreprise.pdf|title=From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design|id=Dimensional Modelling|access-date=3 July 2018|first1=Daniel L.|last1=Moody|first2=Mark A.R.|last2=Kortink|url-status=live|archive-url=https://web.archive.org/web/20170517164505/http://neumann.hec.ca/sites/cours/6-060-00/MK_entreprise.pdf|archive-date=17 May 2017|df=dmy-all}}</ref> The approach focuses on identifying the key [[business process]]es within a business and modeling and implementing these first before adding further business processes, as a [[Top-down and bottom-up design|bottom-up approach]].<ref name="ConBegg9"/>{{rp|1258–1260}} An alternative approach from [[Bill Inmon|Inmon]] advocates a top-down design of a model of all the enterprise data using tools such as [[entity-relationship model]]ing (ER).<ref name="ConBegg9">{{cite book|title=Database Systems - A Practical Approach to Design, Implementation and Management|first1=Thomas|last1=Connolly|first2=Carolyn|last2=Begg|publisher=Pearson|isbn=978-1-292-06118-4|edition=6th|at=Part 9 Business Intelligence|date=26 September 2014|df=dmy-all}}</ref>{{rp|1258–1260}}
Line 36 ⟶ 38:
=== Dimension normalization ===
Dimension normalization, or snowflaking, removes the redundant attributes that are present in ordinary flattened, de-normalized dimensions. The dimensions are instead split into sub-dimensions that are strictly joined to one another.
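As a rough illustration of snowflaking, the following Python sketch splits a repeated attribute out of a flat dimension into a sub-dimension. The table contents, the column names and the helper <code>snowflake</code> are hypothetical and chosen only for this example; they do not come from the cited sources.

```python
def snowflake(flat_rows, sub_key, sub_attrs):
    """Move sub_attrs out of a flat dimension into a sub-dimension keyed by sub_key.

    Hypothetical helper for illustration only.
    """
    sub_dim = {}
    main_dim = []
    for row in flat_rows:
        # Each distinct sub_key value is stored once in the sub-dimension.
        sub_dim[row[sub_key]] = {attr: row[attr] for attr in sub_attrs}
        # The main dimension keeps sub_key as a foreign key and drops the
        # redundant attributes.
        main_dim.append({k: v for k, v in row.items() if k not in sub_attrs})
    return main_dim, sub_dim


# A flat, de-normalized product dimension: category_name repeats per product.
flat_product_dim = [
    {"product_id": 1, "name": "Espresso", "category_id": 10, "category_name": "Coffee"},
    {"product_id": 2, "name": "Latte", "category_id": 10, "category_name": "Coffee"},
    {"product_id": 3, "name": "Sencha", "category_id": 20, "category_name": "Tea"},
]

product_dim, category_dim = snowflake(flat_product_dim, "category_id", ["category_name"])
```

In this sketch the category name is stored once per category rather than once per product, at the cost of an extra join when the name is needed.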
Line 53 ⟶ 54:
== Benefits of dimensional modeling ==
Benefits of the dimensional model are the following:<ref name="refname5"/>
* Understandability. Compared to the normalized model, the dimensional model is easier to understand and more intuitive. In dimensional models, information is grouped into coherent business categories or dimensions, making it easier to read and interpret. Simplicity also allows software to navigate databases efficiently. In normalized models, data is divided into many discrete entities and even a simple business process might result in dozens of tables joined together in a complex way.
Line 60:
== Dimensional models, Hadoop, and big data ==
The benefits of dimensional models still apply on [[Apache Hadoop|Hadoop]] and similar [[big data]] frameworks. However, some features of Hadoop require slight adaptations to the standard approach to dimensional modeling.{{cn|date=May 2019}}
* The [[Apache Hadoop#HDFS|Hadoop File System]] is [[Immutable object|immutable]]: data can be added but not updated. As a result, records can only be appended to dimension tables, and [[Slowly changing dimension|slowly changing dimensions]] become the default behavior on Hadoop. There are three options for obtaining the latest, most up-to-date record in a dimension table. First, a [[View (SQL)|view]] can retrieve the latest record using [[Select (SQL)#Window function|windowing functions]]. Second, a compaction service running in the background can recreate the latest state. Third, dimension tables can be stored in mutable storage, e.g. HBase, with queries federated across the two types of storage.
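The first option, a view over an append-only dimension table, can be sketched as follows. The example uses Python's built-in <code>sqlite3</code> module as a stand-in for a SQL engine on Hadoop, which is an assumption made only to keep the sketch self-contained; the table <code>dim_customer</code>, its columns and the view name are hypothetical. It requires an SQLite build with window function support (version 3.25 or later).

```python
import sqlite3


def build_demo():
    """Create an append-only dimension table and a 'latest record' view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (
            customer_id INTEGER,
            city        TEXT,
            loaded_at   TEXT
        );

        -- Changes are appended as new rows; existing rows are never updated.
        INSERT INTO dim_customer VALUES
            (1, 'Oslo',   '2024-01-01'),
            (2, 'Lima',   '2024-01-01'),
            (1, 'Bergen', '2024-03-15');

        -- The view keeps only the most recently loaded row per customer.
        CREATE VIEW dim_customer_current AS
        SELECT customer_id, city, loaded_at
        FROM (
            SELECT customer_id, city, loaded_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id
                       ORDER BY loaded_at DESC
                   ) AS rn
            FROM dim_customer
        ) AS ranked
        WHERE rn = 1;
        """
    )
    return conn


def current_rows(conn):
    """Return the latest (customer_id, city) pairs, ordered by customer_id."""
    query = "SELECT customer_id, city FROM dim_customer_current ORDER BY customer_id"
    return conn.execute(query).fetchall()


conn = build_demo()
print(current_rows(conn))
```

Because the view is evaluated at query time, newly appended rows are reflected without rewriting any existing data. A compaction service, the second option, would apply the same selection logic but materialize the result periodically instead.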