What is Data Vault modelling?

The standard architecture for building enterprise data warehouses over the past 20 years has been a
combination of third normal form (3NF) (Inmon) and Dimensional Modelling (Kimball). When data
warehouses were first built, businesses opted for a big bang approach and tried to build the entire
warehouse prior to producing reports. This took a long time, was very expensive and didn’t offer much
immediate value.

Over time, companies and organizations have generally adopted a more incremental approach, building
parts of the warehouse as business areas demanded them. The data explosion of recent years has created
a demand for information to be collected and crunched at a faster pace, leaving the traditional methods
unable to keep up with the changing data landscape.

Data vault modelling is a more recent hybrid approach, which gathers data from multiple sources and is
specifically designed to be resilient to environmental changes. Dan Linstedt, the creator of this methodology,
describes it as:

“A detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more
functional areas of business. The design is flexible, scalable, consistent and adaptable to the needs of the
enterprise.”

Data Vault is very prescriptive in its use of tables and attributes, but it is built from simple building
blocks that every member of the team can understand, discuss, adapt and conform to. This reduces the
dependence on a single arbiter of modelling standards, who is often a bottleneck, and decomposes the
problem into manageable pieces.

Most important is that the building blocks themselves are simple enough to easily be modified and
recombined in place to deal with inevitable change.

What does it look like?


Data Vault defines three classes of tables: hubs, links and satellites.

► Hubs are entities which hold unique lists of business keys.
► Links are unique lists of associations or relationships between business keys.
► Satellites hold the descriptive historical data about the business keys or associations.
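
As a rough illustration of the three table classes, the sketch below creates two hubs, one link and one satellite in an in-memory SQLite database. The table names, the column names (including load_date and record_source) and the choice of SQLite are assumptions made for the example; they are not prescribed by the definitions above.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative assumption.
SCHEMA = """
CREATE TABLE hub_customer (
    customer_key  TEXT PRIMARY KEY,   -- business key, listed once
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_key     TEXT PRIMARY KEY,   -- business key, listed once
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_order (
    customer_key  TEXT NOT NULL REFERENCES hub_customer (customer_key),
    order_key     TEXT NOT NULL REFERENCES hub_order (order_key),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_key, order_key)   -- each association listed once
);
CREATE TABLE sat_customer_details (
    customer_key  TEXT NOT NULL REFERENCES hub_customer (customer_key),
    load_date     TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_key, load_date)   -- one row per recorded change
);
"""


def create_vault(conn: sqlite3.Connection) -> list[str]:
    """Create the example tables and return their names in sorted order."""
    conn.executescript(SCHEMA)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
tables = create_vault(conn)
print(tables)
```

The hubs carry only keys, the link carries only key pairs, and all descriptive attributes sit in the satellite, keyed by business key plus load date.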

One example of the power of the Data Vault approach is that these concepts are defined so that all the hubs
can be populated in parallel, then all the links in parallel, then all the satellites in parallel, which makes a
simple and rapid batch load easy to build.
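
The staged batch described above might be orchestrated along the following lines. This is a minimal sketch: load_table is a hypothetical placeholder for a real loader, the table names are invented, and a thread pool is only one of several ways to run each stage in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def load_table(name: str) -> str:
    """Hypothetical loader: a real one would insert new rows into the named table."""
    return name


# Stages run one after another; tables inside a stage are loaded in parallel.
STAGES = [
    ["hub_customer", "hub_order"],                    # all hubs
    ["link_customer_order"],                          # then all links
    ["sat_customer_details", "sat_order_details"],    # then all satellites
]


def run_batch(stages: list[list[str]]) -> list[list[str]]:
    """Load every stage in order, loading the tables within a stage concurrently."""
    completed = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for stage in stages:
            # Consuming pool.map waits for the whole stage and keeps input order.
            completed.append(list(pool.map(load_table, stage)))
    return completed


result = run_batch(STAGES)
print(result)
```

Because a link only needs the keys that the hubs already hold, no stage has to wait on anything other than the stage before it.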

A member firm of Ernst & Young Global Limited
Liability limited by a scheme approved under Professional Standards Legislation

Why should it be used?

Flexibility
The separation of the business keys in the Hubs and the business relationships in the Links makes the Data
Vault model robust over time. When the business model changes from a one-to-many to a many-to-many
relationship (e.g. more than one sales representative can now get commission on an order), this is not a
problem for Data Vault, since the Link tables model every relationship as a potential many-to-many structure.
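
To make the sales representative example concrete, the following sketch shows the rule change being absorbed by one extra row in a link table, with no schema change. The table names and key values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_order     (order_key TEXT PRIMARY KEY);
CREATE TABLE hub_sales_rep (rep_key   TEXT PRIMARY KEY);
CREATE TABLE link_order_sales_rep (
    order_key TEXT NOT NULL REFERENCES hub_order (order_key),
    rep_key   TEXT NOT NULL REFERENCES hub_sales_rep (rep_key),
    PRIMARY KEY (order_key, rep_key)
);
""")
conn.execute("INSERT INTO hub_order VALUES ('ORD-1')")
conn.executemany("INSERT INTO hub_sales_rep VALUES (?)", [("REP-A",), ("REP-B",)])

# Old rule: one representative per order.
conn.execute("INSERT INTO link_order_sales_rep VALUES ('ORD-1', 'REP-A')")

# New rule: a second representative shares the commission.
# The link table already allows this, so it is just one more row.
conn.execute("INSERT INTO link_order_sales_rep VALUES ('ORD-1', 'REP-B')")

reps = [
    rep
    for (rep,) in conn.execute(
        "SELECT rep_key FROM link_order_sales_rep "
        "WHERE order_key = 'ORD-1' ORDER BY rep_key"
    )
]
print(reps)
```

In a 3NF design with a sales representative column on the order table, the same change would normally require a new table and a data migration.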

System of record
Data Vault separates the Hubs and Links from the descriptive content, which is held in Satellite tables. Each
attribute change in the operational systems is written as a new time-stamped row to a Satellite table. This
means all changes in the operational systems are captured in the Data Vault model and data for any point in
time can be extracted.
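
A point-in-time extract from a satellite could look like the sketch below. The satellite layout, the sample rows and the helper city_as_of are assumptions made for illustration; load dates are stored as ISO 8601 text so that string order matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sat_customer_details (
    customer_key TEXT NOT NULL,
    load_date    TEXT NOT NULL,   -- ISO 8601 text, so string order is time order
    city         TEXT,
    PRIMARY KEY (customer_key, load_date)
)
""")
# Each change in the source system arrived as a new time-stamped row.
conn.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?)",
    [
        ("CUST-1", "2015-01-10", "Sydney"),
        ("CUST-1", "2015-06-01", "Melbourne"),
        ("CUST-1", "2016-02-15", "Perth"),
    ],
)


def city_as_of(conn, customer_key, as_of):
    """Return the city recorded for the customer on the given date, or None."""
    row = conn.execute(
        """
        SELECT city FROM sat_customer_details
        WHERE customer_key = ? AND load_date <= ?
        ORDER BY load_date DESC
        LIMIT 1
        """,
        (customer_key, as_of),
    ).fetchone()
    return row[0] if row else None


print(city_as_of(conn, "CUST-1", "2015-12-31"))
```

The query simply picks the latest row loaded on or before the requested date, which is why no history is lost when an attribute changes.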

Performance
The vertical partitioning of data into Hubs, Links and Satellites is fundamental to the Data Vault architecture
and enables parallel processing of data and therefore shorter load cycles. This makes the Data Vault
architecture well suited to near real-time loading. New Hub, Link and Satellite records are only ever
inserted and existing records are not updated, which is more efficient and therefore faster.
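
An insert-only satellite load might be sketched as follows. The comparison against the latest stored row is an assumption about how changes are detected; the point of the example is only that existing rows are never updated. All names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sat_customer_details (
    customer_key TEXT NOT NULL,
    load_date    TEXT NOT NULL,
    city         TEXT,
    PRIMARY KEY (customer_key, load_date)
)
""")


def load_satellite_row(conn, customer_key, load_date, city):
    """Insert a new row only if the city differs from the latest stored one.

    Returns True when a row was inserted. Existing rows are never updated.
    """
    latest = conn.execute(
        "SELECT city FROM sat_customer_details "
        "WHERE customer_key = ? ORDER BY load_date DESC LIMIT 1",
        (customer_key,),
    ).fetchone()
    if latest is not None and latest[0] == city:
        return False  # nothing changed, so nothing is written
    conn.execute(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?)",
        (customer_key, load_date, city),
    )
    return True


inserted = [
    load_satellite_row(conn, "CUST-1", "2016-01-01", "Sydney"),
    load_satellite_row(conn, "CUST-1", "2016-01-02", "Sydney"),     # unchanged
    load_satellite_row(conn, "CUST-1", "2016-01-03", "Melbourne"),  # changed
]
row_count = conn.execute("SELECT COUNT(*) FROM sat_customer_details").fetchone()[0]
print(inserted, row_count)
```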

Focuses on business value
The bulk of the development work focuses on the reporting end of the data cycle rather than on the upfront
work of getting data into the actual data warehouse. This means you can decide what you wish to report on
and then focus your efforts there delivering immediate, high business value.

Works effectively with an Agile delivery methodology
Data Vault focuses your effort on the activity (or report) that delivers the most business value right now and
therefore works seamlessly with an Agile delivery approach. This keeps your development teams focused on
the highest-value tasks at all times.

How do I test this inside my organization?


Data Vault is easy to introduce, and you can experiment using a corner of your existing data warehouse. It
conforms to relational norms and will easily join to legacy entities and modelling styles.
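
As a small illustration of joining to legacy entities, the sketch below places a hypothetical legacy dimension table next to an experimental hub and satellite and joins them with ordinary SQL. All table names, columns and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Existing warehouse table (hypothetical legacy dimension).
CREATE TABLE dim_customer (customer_number TEXT PRIMARY KEY, segment TEXT);
INSERT INTO dim_customer VALUES ('CUST-1', 'Retail');

-- Experimental Data Vault tables in the same database.
CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY);
CREATE TABLE sat_customer_details (
    customer_key TEXT NOT NULL,
    load_date    TEXT NOT NULL,
    city         TEXT,
    PRIMARY KEY (customer_key, load_date)
);
INSERT INTO hub_customer VALUES ('CUST-1');
INSERT INTO sat_customer_details VALUES ('CUST-1', '2016-01-01', 'Sydney');
""")

# The hub's business key is the join point between the old and new structures.
rows = conn.execute("""
    SELECT h.customer_key, d.segment, s.city
    FROM hub_customer AS h
    JOIN dim_customer AS d ON d.customer_number = h.customer_key
    JOIN sat_customer_details AS s ON s.customer_key = h.customer_key
""").fetchall()
print(rows)
```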

Our experience is that modellers, architects and ETL developers all appreciate the opportunity to explore
modelling and development techniques that could improve productivity and, if managed well, the experiment
itself will build knowledge and insights that will improve your designs under other modelling regimes.

Formal training and certification paths are available in Australia (through C3 and other specialist training
providers). This is valuable for introducing your team to the initial concepts. The modelling method is very
well documented in text and videos on http://learndatavault.com.

There are also technical advice forums and support from the community of Data Vault modellers through
LinkedIn and other technical-social platforms.

Next steps
Data Vault focuses effort on delivering business value rather than on the technical architecture, and it adapts
rapidly to a changing business environment. It shifts the bulk of the work to the reporting end of the cycle
and away from the upfront work of getting data into the data warehouse, so data load time is dramatically
reduced.

The concept of ‘late binding’, where data isn’t integrated until it’s needed, is gathering momentum. Indeed,
Data Vault is a good middle ground between 3NF rigidity and extreme flexibility (Hyper Generalization).

Using the Data Vault methodology enables you to work on projects with ‘high business value’ rather than
doing all of the work and then determining what you want to use. Finally, Data Vault works well with an Agile
approach because it slices the data into smaller parts and allows for new data sources to be added without
impacting the existing design.
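
One way to picture adding a new data source without touching the existing design is sketched below: a hypothetical billing system is brought in as an additional satellite on an existing hub, and the stored definitions of the original tables are compared before and after. All names and values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY);
CREATE TABLE sat_customer_crm (
    customer_key TEXT NOT NULL REFERENCES hub_customer (customer_key),
    load_date    TEXT NOT NULL,
    name         TEXT,
    PRIMARY KEY (customer_key, load_date)
);
INSERT INTO hub_customer VALUES ('CUST-1');
INSERT INTO sat_customer_crm VALUES ('CUST-1', '2016-01-01', 'Ann Lee');
""")


def table_sql(conn, name):
    """Return the stored CREATE statement for a table."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()[0]


existing = ("hub_customer", "sat_customer_crm")
before = {t: table_sql(conn, t) for t in existing}

# A new source (a hypothetical billing system) arrives: add one satellite.
conn.executescript("""
CREATE TABLE sat_customer_billing (
    customer_key  TEXT NOT NULL REFERENCES hub_customer (customer_key),
    load_date     TEXT NOT NULL,
    payment_terms TEXT,
    PRIMARY KEY (customer_key, load_date)
);
INSERT INTO sat_customer_billing VALUES ('CUST-1', '2016-03-01', '30 days');
""")

after = {t: table_sql(conn, t) for t in existing}
unchanged = before == after
print(unchanged)
```

Existing loads and reports keep reading the original hub and satellite, while new reports can start using the billing attributes.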

So if you’re considering what modelling methods to use in your organization to extract the relevant
information from your data warehouse in less time than traditional methods, contact us to find out more about
Data Vault and how it can help you.

EY | Assurance | Tax | Transactions | Advisory

About EY
EY is a global leader in assurance, tax, transaction and
advisory services. The insights and quality services we
deliver help build trust and confidence in the capital markets
and in economies the world over. We develop outstanding
leaders who team to deliver on our promises to all of our
stakeholders. In so doing, we play a critical role in building a
better working world for our people, for our clients and for
our communities.
EY refers to the global organization, and may refer to one or
more, of the member firms of Ernst & Young Global Limited,
each of which is a separate legal entity. Ernst & Young
Global Limited, a UK company limited by guarantee, does
not provide services to clients. For more information about
our organization, please visit ey.com.

© 2016 Ernst & Young Australia.
All Rights Reserved.

This communication provides general information which is current at
the time of production. The information contained in this
communication does not constitute advice and should not be relied
on as such. Professional advice should be sought prior to any action
being taken in reliance on any of the information. Ernst & Young
disclaims all responsibility and liability (including, without limitation,
for any direct or indirect or consequential costs, loss or damage or
loss of profits) arising from anything done or omitted to be done by
any party in reliance, whether wholly or partially, on any of the
information. Any party that relies on the information does so at its
own risk. Liability limited by a scheme approved under Professional
Standards Legislation.

eyc3.com
ey.com/analytics

Contact details:
analytics@eyc3.com

EYC3 creates intelligent client
organizations using data & advanced
analytics.
Our team of data scientists, analysts,
developers, business consultants and
industry experts work with clients at all
stages of their information evolution.
