Content deleted Content added
→Methodology: ce Tags: Mobile edit Mobile web edit Advanced mobile edit |
m →Basic notions: heading & description list markup; Fixed incorrect references formatting – You can help! |
||
(8 intermediate revisions by 7 users not shown) | |||
Line 12:
== History and philosophy ==
{{Original research
In its early days, Dan Linstedt referred to the modeling technique which was to become data vault as ''common foundational warehouse architecture''<ref>Building a scalable datawarehouse with data vault 2.0, p. 11</ref> or ''common foundational modeling architecture''.<ref>Building a scalable datawarehouse with data vault 2.0, p. xv</ref> In [[data warehouse]] modeling there are two well-known competing options for modeling the layer where the data are stored. Either you model according to [[Ralph Kimball]], with conformed dimensions and an [[Enterprise bus matrix|enterprise data bus]], or you model according to [[Bill Inmon]] with the database [[normal forms|normalized]].<ref>{{
Dan Linstedt, the creator of the method, describes the resulting database as follows:
Line 23:
Another issue to which data vault is a response is that more and more there is a need for complete auditability and traceability of all the data in the data warehouse. Due to [[Sarbanes-Oxley]] requirements in the USA and similar measures in Europe this is a relevant topic for many business intelligence implementations, hence the focus of any data vault implementation is complete traceability and auditability of all information.
''Data Vault 2.0'' is the new specification. It is an [[open standard]].<ref>[[#dvos2|A short intro to#datavault 2.0]]</ref> The new specification consists of three pillars: methodology ([[Software Engineering Institute|SEI]]/[[Capability Maturity Model|CMMI]], [[Six Sigma]], [[Systems development life cycle|SDLC]], etc..), the architecture (amongst others an input layer (data stage, called [[persistent staging area]] in Data Vault 2.0) and a presentation layer (data mart), and handling of data quality services and master data services), and the model. Within the methodology, the implementation of best practices is defined. Data Vault 2.0 has a focus on including new components such as [[big data]], [[NoSQL]] - and also focuses on the performance of the existing model. The old specification (documented here for the most part) is highly focused on data vault modeling. It is documented in the book: Building a Scalable Data Warehouse with Data Vault 2.0.<ref>{{Cite web |title=Building a Scalable Data Warehouse with Data Vault 2.0[Book] |url=https://www.oreilly.com/library/view/building-a-scalable/9780128026489/ |access-date=2024-10-02 |website=www.oreilly.com |language=en}}</ref>
It is necessary to evolve the specification to include the new components, along with the best practices in order to keep the EDW and BI systems current with the needs and desires of today's businesses.
=== History ===
Data vault modeling was originally conceived by Dan Linstedt in the 1990s and was released in 2000 as a public ___domain modeling method. In a series of five articles in The Data Administration Newsletter the basic rules of the Data Vault method are expanded and explained. These contain a general overview,<ref>
An alternative (and seldom used) name for the method is "Common Foundational Integration Modelling Architecture."<ref>[[#dwdummy|Data Warehousing for Dummies]], page 83</ref>
Line 42:
== Basic notions ==
Data Vault 2.0 organizes data into three core components that separate stable identifiers from changing descriptive attributes:<ref name="linstedt2015">{{cite book |last1=Linstedt |first1=Daniel |last2=Olschimke |first2=Michael |title=Building a Scalable Data Warehouse with Data Vault 2.0 |publisher=Morgan Kaufmann |year=2015 |isbn=9780128025109 |url=https://books.google.com/books/about/Building_a_Scalable_Data_Warehouse_with.html?id=lgDJBAAAQBAJ}}</ref>
* '''Hub''' – stores a unique business key for a core business concept together with minimal metadata for lineage/audit; it acts as an integration point across sources.<ref name="linstedt2015" />
* '''Link''' – captures the relationship (often many-to-many) between hubs; the participating hub keys define the grain of the relationship.<ref name="linstedt2015" />
* '''Satellite''' – contains descriptive attributes and their history associated with a hub or link; satellites are append-only so every change is preserved (similar in effect to Type-II history in dimensional models).<ref name="linstedt2015" />
Specialized satellites support temporal semantics. For example, an '''effectivity satellite''' on a link records begin/end dates representing when the relationship is considered effective by the business.<ref name="linstedt2015" />
=== Layers ===
*'''Raw Vault''' – a source-driven integration layer that retains granular, auditable history with minimal transformations.<ref name="linstedt2015" />
*'''Business Vault''' – a derived layer that applies business rules and query-assistance structures (e.g., PIT and bridge tables) to facilitate downstream consumption.<ref name="linstedt2015" />
=== Use with dimensional models ===
In practice, Data Vault commonly serves as the historical integration layer, while star-schema information marts are projected from the Raw/Business Vault for performant analytics and simpler user access.<ref name="hultgren">{{cite book |last=Hultgren |first=Hans |title=Modeling the Agile Data Warehouse with Data Vault |year=2012 |publisher=Brighton Hamilton |isbn=9780615723082 |url=https://searchworks.stanford.edu/view/11487017}}</ref><ref name="kimball2013">{{cite web |title=The Data Warehouse Toolkit, 3rd Edition |website=Wiley |url=https://www.wiley.com/en-us/The%2BData%2BWarehous}}</ref>
=== Hubs ===
Line 191 ⟶ 202:
== Tools ==
Some examples of tools are:{{clarify|date=April 2022}}
* [https://www.datavault4dbt.com DataVault4dbt]
* [https://datavault-builder.com 2150 Datavault Builder]
* [https://www.astera.com/products/data-warehouse-builder/ Astera DW Builder]
|