Data vault modeling: Difference between revisions

Content deleted Content added
Adding additional Data Vault specific Tool
m Basic notions: heading & description list markup; Fixed incorrect references formatting – You can help!
 
(3 intermediate revisions by 3 users not shown)
Line 42:
 
== Basic notions ==
Data Vault 2.0 organizes data into three core components that separate stable identifiers from changing descriptive attributes:<ref name="linstedt2015">{{cite book |last1=Linstedt |first1=Daniel |last2=Olschimke |first2=Michael |title=Building a Scalable Data Warehouse with Data Vault 2.0 |publisher=Morgan Kaufmann |year=2015 |isbn=9780128025109 |url=https://books.google.com/books/about/Building_a_Scalable_Data_Warehouse_with.html?id=lgDJBAAAQBAJ}}</ref>
Data vault attempts to solve the problem of dealing with change in the environment by separating the business keys (that do not mutate as often, because they uniquely identify a business entity) and the associations between those business keys, from the descriptive attributes of those keys.
 
* '''Hub''' – stores a unique business key for a core business concept together with minimal metadata for lineage/audit; it acts as an integration point across sources.<ref name="linstedt2015" />
The business keys and their associations are structural attributes, forming the skeleton of the data model. The data vault method has as one of its main axioms that real business keys only change when the business changes and are therefore the most stable elements from which to derive the structure of a historical database. If you use these keys as the backbone of a data warehouse, you can organize the rest of the data around them. This means that choosing the correct keys for the hubs is of prime importance for the stability of your model.<ref>[[#dvsuper|Super Charge your data warehouse]], page 61, why are business keys important</ref> The keys are stored in tables with a few constraints on the structure. These key-tables are called hubs.
* '''Link''' – captures the relationship (often many-to-many) between hubs; the participating hub keys define the grain of the relationship.<ref name="linstedt2015" />
* '''Satellite''' – contains descriptive attributes and their history associated with a hub or link; satellites are append-only so every change is preserved (similar in effect to Type-II history in dimensional models).<ref name="linstedt2015" />
 
Specialized satellites support temporal semantics. For example, an '''effectivity satellite''' on a link records begin/end dates representing when the relationship is considered effective by the business.<ref name="linstedt2015" />
 
=== Layers ===
*'''Raw Vault''' – a source-driven integration layer that retains granular, auditable history with minimal transformations.<ref name="linstedt2015" />
*'''Business Vault''' – a derived layer that applies business rules and query-assistance structures (e.g., PIT and bridge tables) to facilitate downstream consumption.<ref name="linstedt2015" />
 
=== Use with dimensional models ===
In practice, Data Vault commonly serves as the historical integration layer, while star-schema information marts are projected from the Raw/Business Vault for performant analytics and simpler user access.<ref name="hultgren">{{cite book |last=Hultgren |first=Hans |title=Modeling the Agile Data Warehouse with Data Vault |year=2012 |publisher=Brighton Hamilton |isbn=9780615723082 |url=https://searchworks.stanford.edu/view/11487017}}</ref><ref name="kimball2013">{{cite web |title=The Data Warehouse Toolkit, 3rd Edition |website=Wiley |url=https://www.wiley.com/en-us/The%2BData%2BWarehous}}</ref>
 
=== Hubs ===