Data vault modeling

'''Data vault modeling''', also known as '''common foundational warehouse architecture'''<ref>Building a Scalable Data Warehouse with Data Vault 2.0, p. 11</ref> or '''common foundational modeling architecture''',<ref>Building a Scalable Data Warehouse with Data Vault 2.0, p. xv</ref> is a [[database]] modeling method designed to provide long-term historical storage of [[data]] coming in from multiple operational systems. It is also a method of looking at historical data that addresses issues such as auditing, tracing of data, loading speed, and [[Resilience (organizational)|resilience]] to change, while emphasizing the need to [[Audit trail|trace]] where all the data in the database [[Data lineage|came from]]. This means that every [[Row (database)|row]] in a data vault must be accompanied by record source and load date attributes, enabling an auditor to trace values back to the source. The concept was published in 2000 by [[Dan Linstedt]].
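The record source and load date requirement described above can be sketched in a few lines of code. This is an illustrative Python sketch only, not part of any data vault specification; the attribute names `record_source` and `load_date` follow common convention but vary between implementations:

```python
from datetime import datetime, timezone

def stamp_row(row, record_source):
    """Attach the two audit attributes that data vault modeling requires
    on every row: the originating system and the load timestamp.
    Column names here are illustrative conventions, not a standard."""
    stamped = dict(row)
    stamped["record_source"] = record_source
    stamped["load_date"] = datetime.now(timezone.utc)
    return stamped

# A row arriving from a hypothetical CRM source system:
row = stamp_row({"customer_id": 42, "name": "Acme"}, record_source="CRM")
```

Because every row carries these attributes, an auditor can trace any stored value back to the system and point in time it was loaded from.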
 
Data vault modeling makes no distinction between good and bad data ("bad" meaning not conforming to business rules).<ref>[[#dvsuper|Super Charge your data warehouse]], page 74</ref> This is summarized in the statement that a data vault stores "[[Single source of truth|a single version of the facts]]" (also expressed by [[Dan Linstedt]] as "all the data, all of the time"), as opposed to the practice in other data warehouse methods of storing "a [[single version of the truth]]",<ref>[[#rdamhof1|The next generation EDW]]</ref> where data that does not conform to the definitions is removed or "cleansed". A data vault enterprise data warehouse provides both: a single version of the facts and a single source of truth.<ref>Building a Scalable Data Warehouse with Data Vault 2.0, p. 6</ref>
 
The modeling method is designed to be resilient to change in the business environment from which the stored data originates, by explicitly separating [[Data structure|structural information]] from descriptive [[Attribute (computing)|attributes]].<ref>[[#dvsuper|Super Charge your data warehouse]], page 21</ref> Data vault is designed to enable [[Parallel computing|parallel]] loading as much as possible,<ref>[[#dvsuper|Super Charge your data warehouse]], page 76</ref> so that very large implementations can scale out without the need for major redesign.
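The separation of structural information from descriptive attributes can be sketched as follows. This is a minimal illustrative sketch in Python, assuming a hypothetical customer entity; real data vault designs typically hash the business key and store these structures as database tables rather than in-memory records:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CustomerHub:
    """Structural information: only the business key plus audit attributes."""
    customer_key: str     # surrogate/hash key (assumed naming)
    customer_number: str  # business key from the source system
    record_source: str
    load_date: datetime

@dataclass(frozen=True)
class CustomerSatellite:
    """Descriptive attributes, kept separately so they can change
    over time without altering the structural hub record."""
    customer_key: str     # references the hub's key
    name: str
    address: str
    record_source: str
    load_date: datetime

now = datetime.now(timezone.utc)
hub = CustomerHub("h-001", "C-1001", "CRM", now)
sat = CustomerSatellite("h-001", "Acme Corp", "1 Main St", "CRM", now)
```

Because descriptive records only reference the hub's key, a new source system or a changed set of attributes is absorbed by adding another satellite rather than redesigning the hub, and the independent structures can be loaded in parallel.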