
EXCLUSIVELY FOR

TDWI PREMIUM MEMBERS

Fourth Quarter 2011

Ten Mistakes to Avoid in Dimensional Modeling

By Christopher Adamson

tdwi.org

Foreword
A dimensional model transforms data into information—the
fundamental objective of every business intelligence (BI) program.
Although the dimensional model has become the de facto standard for data mart
design, common mistakes disrupt this crucial function.
The dimensional model is more capable than is generally
understood. Often pigeonholed as a data model, it is not
exploited as a presentation model in a federated environment, or
as a requirements model. Entire subject areas are closed off by
the common misconception that some things can only be modeled
using entity-relationship techniques.
To attain the full potential of your dimensional model, it is
necessary to master a broad range of principles, understand
how and what to model, and avoid lapsing into habits from other
modeling disciplines.
A dimensional model of your business is an important asset
of your BI program. Maximize its value by avoiding these
10 mistakes.

About the Author


Christopher Adamson develops data warehousing strategies
for his customers, helps them define and prioritize projects,
and designs data architectures. He has taught dimensional
modeling to thousands of students worldwide. His latest book
is Star Schema: The Complete Reference (McGraw-Hill, 2010)
and he blogs at www.starschemacentral.com. Contact him at
cadamson@oaktonsoftware.com.

© 2011 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc.
All rights reserved. Reproductions in whole or in part are prohibited except by written
permission. E-mail requests or feedback to info@tdwi.org.
Product and company names mentioned herein may be trademarks and/or registered
trademarks of their respective companies.

Mistake One:
Thinking of the Dimensional Model as a Data Model

A dimensional model can deliver business value long before a
database is designed, and even when no data is ever stored
dimensionally. When it is relegated to use as a data model, these
benefits are lost.
The dimensional model represents a business process in
business terms. It describes activities in the same way human
beings do: as a collection of measurements and associated
context. Because the dimensional model is inherently
understandable, it is ideal as the design basis for a user-facing
analytic database. It is best known for its use in this capacity,
yielding the star schemas and cubes with which we are familiar.
The same business focus that makes it an ideal data model
also makes it an ideal requirements model. In specific terms, a
dimensional model specifies how a business process is measured,
describes the metrics themselves, and captures the attendant
dimensional detail. A single requirement stated in dimensional
terms can address scores of business questions, including many
that are yet to be stated.
A dimensional representation is also ideal as a presentation
model for end users, regardless of how the information is actually
stored. It represents how people think, rather than how data
is organized. Early business intelligence tools were built on
this concept, allowing a semantic layer to sit between the user
and OLTP database tables. Today, the concept of dimensional
presentation is once again becoming relevant, as federated
solutions promise the construction of virtual solutions rather
than physical ones.

Mistake Two:
Modeling Answers

A dimensional model can answer questions that were not
anticipated during its development. This is possible because
a dimensional model is not designed as the answer to specific
questions. Instead, it describes the business processes about
which people ask questions. When designers lose sight of this,
the long-term value of the model is compromised.
Successful dimensional modeling does start with business
questions, usually developed through a series of interviews.
These questions are parsed to identify measurements and
their context (facts and dimensions). These are sorted into
measurement groups for discrete processes, which ultimately
translate into stars or cubes.
If modeling stops here, success will be short-lived. The next
steps are essential for creating a solution that will stand the test
of time. Each process must be further scrutinized and compared
to available operational data. Do other measurements describe
the same processes? Is additional dimensional detail available?
Business interviews about order management, for example, may
indicate that measurements of order dollars and quantity are
studied in the context of products, customers, and time. Looking
beyond the questions, you may learn that each time an order is
taken, a calculation of cost and pocket margin is available, and
each order can be associated with a channel and affiliate.
By looking beyond questions, you are able to develop a model
that fully describes the business process. This resulting solution
is better positioned to support changes in the nature and
sophistication of business questions over time.

Mistake Three:
Assuming Some Things Cannot Be Modeled Dimensionally

A common misconception holds that there are things a
dimensional model simply cannot handle. In fact, the dimensional
model can represent the same real-world complexities that
entity-relationship models can. Here are some of the most
commonly overlooked capabilities of the dimensional model:
Many-to-many relationships: Dimensional models do not
require that each measurement be linked to a single value in
each dimension. Sales transactions, for example, may be linked
to multiple salespeople through a dimension bridge structure.
No metrics: Situations with no apparent measurement can be
represented dimensionally. Factless fact tables model events
such as phone calls, customer contacts, and Web clicks.
Relationships between dimensions: Although dimensions are
not joined to one another directly, the dimensional model can
capture every important relationship. This is done through fact
tables that describe coverage, conditions, or other associations.
Examples include the relationship between primary care
providers and patients, the assignment of projects to managers,
and weather conditions by location.
Repeating attributes: A dimension attribute can take on multiple values.
For example, a single company may have multiple standard industry
classifications. The dimensional model supports this through
bridge tables that link the base dimension (e.g., company) with
the repeating attribute (e.g., industry).
Recursive relationships: Hierarchy bridge tables support the
use of recursive relationships in the analysis of business processes.
Examples include parts-breakdown structures and corporate
organization charts.
Subtyping: Variation in the characteristics of a dimension or
associated measurement is handled through core and custom
models. A core model of sales by customer, for example,
summarizes separate custom models for business and
consumer customers.
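The bridge structure for many-to-many relationships is the capability readers most often doubt. The Python sketch below is illustrative only: the table shapes, column names, and the use of allocation factors in the bridge are assumptions, not details from this article. Each sale points to a salesperson group, and a bridge row per group member carries that member's share, so an allocated roll-up preserves the fact total while a full-credit roll-up deliberately does not.

```python
from collections import defaultdict

# Hypothetical fact rows: each sale references a salesperson *group*
# key rather than a single salesperson.
sales_facts = [
    {"sale_id": 1, "group_key": 100, "amount": 1000.0},
    {"sale_id": 2, "group_key": 200, "amount": 500.0},
]

# Hypothetical bridge table: one row per (group, salesperson) with an
# allocation factor; factors within a group are assumed to sum to 1.0.
salesperson_bridge = [
    {"group_key": 100, "salesperson": "Ann", "allocation": 0.5},
    {"group_key": 100, "salesperson": "Bob", "allocation": 0.5},
    {"group_key": 200, "salesperson": "Ann", "allocation": 1.0},
]

def sales_by_salesperson(facts, bridge, allocate=True):
    """Roll sales up to salesperson through the bridge table.

    allocate=True credits each salesperson with their share, so the
    grand total equals the fact total. allocate=False gives every
    member full credit (an "impact" view), which overstates the grand
    total by design.
    """
    members = defaultdict(list)
    for row in bridge:
        members[row["group_key"]].append(row)
    totals = defaultdict(float)
    for fact in facts:
        for member in members[fact["group_key"]]:
            factor = member["allocation"] if allocate else 1.0
            totals[member["salesperson"]] += fact["amount"] * factor
    return dict(totals)

allocated = sales_by_salesperson(sales_facts, salesperson_bridge)
impact = sales_by_salesperson(sales_facts, salesperson_bridge, allocate=False)
print(allocated)  # {'Ann': 1000.0, 'Bob': 500.0}
print(impact)     # {'Ann': 1500.0, 'Bob': 1000.0}
```

Which of the two views a report uses is a business decision; the bridge itself supports both.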

Mistake Four:
Omitting A Conformance Plan

The concept of conformance is central to the discipline of
dimensional modeling. Although closely linked with Ralph
Kimball’s dimensional bus architecture, conformance is also
essential for data marts in W.H. Inmon’s hub-and-spoke
architecture, and even in standalone data marts. When
conformed dimensions are not planned, incompatibilities emerge
and many important business metrics become inaccessible.
The most powerful business metrics cross functional boundaries
within the enterprise. These compound business metrics
summarize and combine data from multiple processes. To study
this kind of business metric, it is necessary to collect component
measurements from multiple processes and combine them based
on common, conformed dimensions.
For example, a business that performs direct sales activities has
star schemas that track sales calls, proposals, orders, shipments,
and returns. Spanning this value chain are several compound
metrics. Yield is the ratio of orders to sales calls; return
rate is the ratio of returns to shipments.
Compound metrics are studied by fetching each process-
specific metric and combining the results at a common level
of dimensional detail. This may be done at query time (many BI
tools can do this automatically) or as part of an ETL process that
stores the result in a separate star or cube (sometimes called a
second-line data mart).
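The query-time approach can be sketched in a few lines of Python. This is a minimal illustration, assuming two hypothetical fact tables that share a conformed `customer_key`, with yield computed as orders divided by sales calls. Each process is summarized on its own first, and only then are the results merged on the shared key.

```python
from collections import defaultdict

# Hypothetical fact rows from two stars that share a conformed customer
# dimension: the same customer_key means the same customer in both.
sales_call_facts = [
    {"customer_key": "C1", "calls": 3},
    {"customer_key": "C1", "calls": 1},
    {"customer_key": "C2", "calls": 2},
]
order_facts = [
    {"customer_key": "C1", "orders": 1},
    {"customer_key": "C2", "orders": 1},
    {"customer_key": "C2", "orders": 1},
    {"customer_key": "C3", "orders": 1},
]

def aggregate(facts, key, measure):
    """Step 1: summarize one process to the shared level of detail."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[key]] += row[measure]
    return totals

def drill_across(calls, orders):
    """Step 2: merge the per-process results on the conformed key.

    Each fact table is summarized separately before the results are
    combined; joining the two fact tables row to row would repeat
    measurements and inflate the totals.
    """
    call_totals = aggregate(calls, "customer_key", "calls")
    order_totals = aggregate(orders, "customer_key", "orders")
    result = {}
    for customer in sorted(set(call_totals) | set(order_totals)):
        c = call_totals.get(customer, 0)
        o = order_totals.get(customer, 0)
        # Yield as orders per sales call; None when there were no calls.
        result[customer] = {"calls": c, "orders": o,
                            "yield": o / c if c else None}
    return result

for customer, metrics in drill_across(sales_call_facts, order_facts).items():
    print(customer, metrics)
```

The merge keeps customers that appear in only one process (C3 here), which is usually what a compound-metric report needs.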
This is only possible when the dimensional models for each process
share common dimensions. It won’t be possible to study yield by
customer, industry, or geography if the models for sales calls and
orders do not share a common definition for customer. Stovepipes
like this are avoided by planning conformed dimensions.
Conformance is a central feature of your data architecture. It
can only be assured when planned in advance. Many businesses
labor under the false assumption that they can safely design
and build data marts one subject area at a time and “conform
on the fly.” Invariably, this leads to significant rework of
previously implemented data marts, reports, dashboards, and
ETL processes.
The conformance imperative holds true regardless of your data
warehousing architecture. It is necessary both within a subject
area and across data marts. Conformance plans are commonly
illustrated through matrix diagrams that cross-reference
dimensions with major processes or fact tables. This plan is
backed by a fully attributed dimensional design, and mapped to
real data sources.
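A matrix diagram of this kind is simple enough to keep as data alongside the design. The following sketch assumes hypothetical process and dimension names based on the direct-sales example: it renders the matrix as text and reports which conformed dimensions two processes share, which are the levels at which their metrics can be compared.

```python
# Hypothetical conformance matrix: each process lists the conformed
# dimensions its star or cube will use.
CONFORMANCE_MATRIX = {
    "sales_calls": {"date", "customer", "salesperson"},
    "proposals":   {"date", "customer", "salesperson", "product"},
    "orders":      {"date", "customer", "salesperson", "product"},
    "shipments":   {"date", "customer", "product", "warehouse"},
    "returns":     {"date", "customer", "product", "warehouse"},
}

def render(matrix):
    """Return the matrix as text: one row per process, X per dimension."""
    dims = sorted(set().union(*matrix.values()))
    width = max(len(p) for p in matrix)
    lines = [" " * width + " | " + " | ".join(dims)]
    for process, used in matrix.items():
        cells = [("X" if d in used else "").center(len(d)) for d in dims]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)

def drill_across_dimensions(matrix, *processes):
    """Dimensions shared by every listed process."""
    return set.intersection(*(matrix[p] for p in processes))

print(render(CONFORMANCE_MATRIX))
print(sorted(drill_across_dimensions(CONFORMANCE_MATRIX, "sales_calls", "orders")))
# ['customer', 'date', 'salesperson']
```

In practice each cell would also be backed by the attribute-level design and source mapping the text describes; the set intersection only checks that a shared dimension was planned.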
When dimensional data architecture is designed around
conformed dimensions, the model serves as a blueprint.
Implementation can proceed one subject area at a time,
each snapping into the overall framework without the risk of
incompatibility.

Mistake Five:
Only Modeling Events

Product demos and introductions to dimensional modeling usually
feature a transaction model. This is the most common kind of
dimensional model, but it is not the only one. Some business
processes benefit from additional perspectives, and others
cannot be modeled as transactions at all. Complete solutions
often feature three types of stars or cubes:
Transaction model: A transaction model records measurements
each time an event occurs. Examples include solutions that
record metrics for each account transaction, invoice line, phone
call, or change of status to a document or application.
Periodic snapshot: This model records measurements at
predefined intervals, providing a different perspective on a
business process. A daily snapshot of balances by account, for
example, facilitates study of deposit levels over time, something
that is hard to do with a transaction model. Some processes can
only be studied in snapshot form; levels in reservoirs or power
consumption do not lend themselves to the transaction model,
and it is often impractical to track inventory transactionally.
Accumulating snapshot: This model ties together information
about disparate activities, allowing the business to study elapsed
time between key events. It can be used to study the average
time between placement of an order, initial shipment, complete
fulfillment, and payment. With a transaction model, this would
require difficult and time-consuming queries that correlate
data across processes. Other examples include the time spent
processing applications, claims, or support tickets.
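The accumulating snapshot for the order example can be sketched as follows. This is an illustration under stated assumptions: the milestone names, the event tuple shape, and the rule that each milestone keeps the first date seen are all hypothetical choices. The key difference from a transaction model is that one row per order is revisited and updated as milestones occur.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class OrderPipelineRow:
    """One accumulating-snapshot row per order (hypothetical columns)."""
    order_id: str
    placed: Optional[date] = None
    first_shipped: Optional[date] = None
    fulfilled: Optional[date] = None
    paid: Optional[date] = None

    def days_between(self, start: str, end: str) -> Optional[int]:
        """Elapsed days between two milestones, or None if either is missing."""
        a, b = getattr(self, start), getattr(self, end)
        if a is None or b is None:
            return None
        return (b - a).days

MILESTONES = ("placed", "first_shipped", "fulfilled", "paid")

def apply_events(events):
    """Build or update one row per order as milestone events arrive.

    Each milestone keeps the first date seen for it.
    """
    rows = {}
    for order_id, milestone, when in events:
        if milestone not in MILESTONES:
            raise ValueError(f"unknown milestone: {milestone}")
        row = rows.setdefault(order_id, OrderPipelineRow(order_id))
        if getattr(row, milestone) is None:
            setattr(row, milestone, when)
    return rows

def average_days(rows, start, end):
    """Average elapsed days across orders that reached both milestones."""
    spans = [r.days_between(start, end) for r in rows.values()]
    spans = [d for d in spans if d is not None]
    return sum(spans) / len(spans) if spans else None

events = [
    ("A1", "placed", date(2011, 10, 3)),
    ("A1", "first_shipped", date(2011, 10, 5)),
    ("A2", "placed", date(2011, 10, 4)),
    ("A1", "first_shipped", date(2011, 10, 7)),  # later shipment: ignored
    ("A1", "fulfilled", date(2011, 10, 10)),
    ("A1", "paid", date(2011, 10, 24)),
]
rows = apply_events(events)
print(rows["A1"].days_between("placed", "paid"))  # 21
print(rows["A2"].days_between("placed", "paid"))  # None (order still open)
print(average_days(rows, "placed", "fulfilled"))  # 7.0
```

Once milestones sit side by side on one row, elapsed-time questions become simple column arithmetic rather than correlated queries across processes.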

Mistake Six:
Compromising Dimensions

Richly appointed dimensions are a central factor in the long-term
success of a dimensional model. Left to rookie designers,
dimensions often lack features that can provide powerful
advantages. When the following options are overlooked, the lost
opportunities may be large:
Behavioral dimensions: Past activity can provide a valuable
context for the study of key metrics. Behavioral dimensions
recast information originally recorded as facts for use as
dimensions. Identifying categories of customers based on sales,
for example, provides a powerful way to group and analyze
returns, complaints, or satisfaction ratings.
Advanced slow-change techniques: Novice modelers are
familiar with the basic techniques for coping with changes
to operational data—the famous Type 1/2/3 responses. In
many scenarios, these options are not sufficient. Transactional
dimensions apply effective and expiration timestamps,
facilitating point-in-time analysis and easing the loading of
historic data. Hybrid responses allow the solution to support use
of both current and changed values as required. Where supported
by source systems, a single attribute may be subjected to
different response techniques in accordance with the reason for
the change.
Mini-dimensions: When changes cause undesirable growth in
dimension tables, designers may be tempted to scale back
the amount of historic data that is tracked, compromising
analytic capability. A mini-dimension avoids this necessity by
relocating a group of attributes to a different table. Growth in
the original dimension is stemmed, while preserving historic
context of measurements.
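The timestamped variant of a Type 2 response can be sketched in Python. This is a minimal, in-memory illustration, not a loading design. It assumes a hypothetical customer dimension with one tracked attribute (`segment`), sequential surrogate keys, a 9999-12-31 "high date" for the current row, and half-open validity intervals. A changed value closes the current version and adds a new one, and the dates support point-in-time lookups.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

HIGH_DATE = date(9999, 12, 31)  # assumed convention meaning "still current"

@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str        # natural key from the source system
    segment: str            # attribute handled with a Type 2 response
    effective: date
    expiration: date = HIGH_DATE

class CustomerDimension:
    """Type 2 versions stamped with effective/expiration dates (a sketch).

    Each version is valid for the half-open interval [effective, expiration).
    """

    def __init__(self) -> None:
        self.rows: List[CustomerRow] = []

    def current(self, customer_id: str) -> Optional[CustomerRow]:
        for row in self.rows:
            if row.customer_id == customer_id and row.expiration == HIGH_DATE:
                return row
        return None

    def apply(self, customer_id: str, segment: str, change_date: date) -> int:
        """Record the source value; add a new version only if it changed."""
        row = self.current(customer_id)
        if row is not None and row.segment == segment:
            return row.surrogate_key        # unchanged: facts reuse this key
        if row is not None:
            row.expiration = change_date    # close out the prior version
        new_row = CustomerRow(len(self.rows) + 1, customer_id, segment, change_date)
        self.rows.append(new_row)
        return new_row.surrogate_key

    def version_on(self, customer_id: str, when: date) -> Optional[CustomerRow]:
        """Point-in-time lookup: the version in effect on a given date."""
        for row in self.rows:
            if row.customer_id == customer_id and row.effective <= when < row.expiration:
                return row
        return None

dim = CustomerDimension()
k1 = dim.apply("C-42", "Small Business", date(2010, 1, 1))
k2 = dim.apply("C-42", "Small Business", date(2010, 6, 1))   # no change
k3 = dim.apply("C-42", "Enterprise", date(2011, 3, 15))      # Type 2 change
print(k1, k2, k3)                                            # 1 1 2
print(dim.version_on("C-42", date(2010, 12, 31)).segment)    # Small Business
print(dim.version_on("C-42", date(2011, 4, 1)).segment)      # Enterprise
```

Hybrid responses and mini-dimensions build on the same idea but are not shown here.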

Mistake Seven:
Stopping With the Base Schema

Following the best practices of dimensional modeling, architects
work to ensure that information about each business process is
modeled at the most granular, detailed level possible. Captured
in this manner, the measurements can be summarized across
various dimensions as desired. If the modeling stops with
this base schema, however, several kinds of analysis may be
hampered.
Derived schemas take information from the base schema and
reorganize it for specific purposes. A derived schema can expose
metrics that are hard to assemble from the base schema. It may
be employed to facilitate ad hoc analysis, make reports easier to
develop, or improve performance.
Many kinds of derived schemas may appear in a dimensional
model. Examples include:
• Periodic snapshot models, as already seen, are derived from
transaction models to study the effect of transactions
• Accumulating snapshot models correlate events recorded
at different times around a defining dimension, allowing the
study of elapsed time
• Merged models pre-compute the drill-across comparison of
two or more discrete processes, exposing compound metrics in
a summary level star or cube
• Pivoted models transpose the organization of measurements
between rows and columns, simplifying some forms of
reporting
• Sliced models expose subsets of larger data sets to support
security requirements, facilitate data distribution, or address
scaling issues
• Set comparison models store useful comparisons of two or
more base models, such as events and conditions or eligibility
and participation
• Core models bring together data about similar sub-processes,
such as direct sales and corporate sales, at a common level
of detail
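The first derived schema in the list, a periodic snapshot derived from a transaction model, can be sketched in Python. The account identifiers, the tuple layout of transactions, and the optional opening balances are assumptions made for illustration. The snapshot has one row per account per day, including days with no activity, which is what makes balance levels easy to study over time.

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_balance_snapshot(transactions, start, end, opening=None):
    """Derive a daily periodic snapshot from transaction-level facts.

    transactions: iterable of (account, posting_date, amount) tuples;
        only those dated from start through end are applied.
    opening: optional {account: balance} as of the day before start.
    Returns (snapshot_date, account, balance) rows, one per account
    per day, including days with no transactions.
    """
    opening = opening or {}
    net = defaultdict(float)  # (account, day) -> net change on that day
    accounts = set(opening)
    for account, day, amount in transactions:
        net[(account, day)] += amount
        accounts.add(account)
    balance = {a: opening.get(a, 0.0) for a in accounts}
    rows = []
    day = start
    while day <= end:
        for account in sorted(accounts):
            balance[account] += net.get((account, day), 0.0)
            rows.append((day, account, balance[account]))
        day += timedelta(days=1)
    return rows

txns = [
    ("CHK-1", date(2011, 10, 1), 100.0),
    ("SAV-9", date(2011, 10, 2), 250.0),
    ("CHK-1", date(2011, 10, 3), -40.0),
]
for row in daily_balance_snapshot(txns, date(2011, 10, 1), date(2011, 10, 3)):
    print(row)
```

A production load would typically persist each day's rows incrementally rather than recomputing the whole range.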

Mistake Eight:
Not Considering Source Data

A dimensional model is not complete if it is based solely on
user requirements. At best, the model will omit useful details
not gathered from business interviews. At worst, it will contain
features that are simply not supported by the operational
systems.
Business interviews are a useful starting point for the modeling
effort, but they cannot reveal every fact and dimension of
significance. Someone may reference “customer data” as
important for slicing and dicing yield data, calling out customer
geography and industry classification explicitly. As you learned
from Mistake Two, it will be necessary to look more closely at
available customer data to ensure the model will stand the test
of time. A strong customer dimension may have as many as 200
attributes.
A study of the source data is also essential in distinguishing
a feasible dimensional model from wishful thinking. There are
often key measurement requirements that are not supported by
operational systems. This may not be evident during interviews,
particularly if a workaround process is in place.
It is also important to map each column in the dimensional
design to source data elements, and enumerate the business
rules that must be applied during transformation. Teams focused
on agile development may delay this level of scrutiny until
implementation, but must not postpone steps to identify source
tables, natural key values, and other salient aspects of the
source data.
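Mapping every design column to a source element lends itself to a simple automated check. The sketch below uses entirely hypothetical table, column, and source-element names. It flags design columns that have no mapping, and mappings that point at elements the operational systems do not actually provide, which is the "wishful thinking" case described above.

```python
# Hypothetical fragments of a detailed design and a source inventory.
DESIGN = {
    "customer_dim": ["customer_key", "customer_id", "customer_name",
                     "industry", "region"],
    "order_facts": ["date_key", "customer_key", "order_dollars",
                    "pocket_margin"],
}

SOURCE_ELEMENTS = {
    "crm.cust.cust_no", "crm.cust.name", "crm.cust.sic_code",
    "oe.order_line.ext_price", "oe.order_line.order_date",
}

# (table, column) -> (source element, transformation rule)
MAPPING = {
    ("customer_dim", "customer_id"): ("crm.cust.cust_no", "copy natural key"),
    ("customer_dim", "customer_name"): ("crm.cust.name", "trim; title case"),
    ("customer_dim", "industry"): ("crm.cust.sic_code", "decode SIC to description"),
    ("order_facts", "order_dollars"): ("oe.order_line.ext_price", "sum by order line"),
    ("order_facts", "pocket_margin"): ("oe.order_line.margin", "as recorded"),
}

# Surrogate keys assigned or looked up by the ETL process itself.
ETL_ASSIGNED = {("customer_dim", "customer_key"),
                ("order_facts", "customer_key"),
                ("order_facts", "date_key")}

def audit(design, mapping, sources, etl_assigned):
    """Return (unmapped columns, mappings to elements no source provides)."""
    unmapped = [
        (table, column)
        for table, columns in design.items()
        for column in columns
        if (table, column) not in mapping and (table, column) not in etl_assigned
    ]
    unsupported = [
        (table, column, source)
        for (table, column), (source, _rule) in mapping.items()
        if source not in sources
    ]
    return unmapped, unsupported

unmapped, unsupported = audit(DESIGN, MAPPING, SOURCE_ELEMENTS, ETL_ASSIGNED)
print(unmapped)     # [('customer_dim', 'region')]
print(unsupported)  # [('order_facts', 'pocket_margin', 'oe.order_line.margin')]
```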

Mistake Nine:
Applying Entity-Relationship Techniques

For experienced data modelers, the hardest thing about
dimensional modeling is resisting the urge to apply the techniques
of entity-relationship modeling. Useful in the design of solutions
for transaction processing, these techniques hamper solutions for
analytic processing. The most common mistakes include:
Normalizing dimensions: The normalization of an entity-
relationship model helps the DBMS support transaction processing
while maintaining referential integrity. Brand information, for
example, is removed from a product table so that it does not
repeat for each product. A dimensional model supports analytic
processing, with integrity managed through the ETL process.
Normalization harms understandability, complicates access, and
increases the change processing burden on the ETL process.
Abstraction: Entity-relationship modelers look for opportunities
to collapse similar entities into a single concept. Party, for
example, is a common stand-in for various individuals and
organizations. A dimensional model focuses on usability, calling
out important actors in explicit terms to provide context for
process measurement. Commonalities are addressed through
conformance.
Eliminating redundancy: An entity-relationship modeler strives
to eliminate redundancy from the model. Names are broken
down and stored as constituent parts; categories and types are
replaced with lookup codes; binary characteristics are stored as
Boolean flags. A dimensional model stores information as it will
be used. Names are stored as their components and as common
concatenations; codes are stored along with their decoded
values; Boolean values are transformed into descriptive text. This
optimizes these attributes for usability—filtering queries, sorting
or grouping results, and driving master-detail relationships.
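The contrast with entity-relationship habits can be sketched by flattening normalized source data into dimension rows. The product, brand, and category structures below are hypothetical, and assigning surrogate keys by position is a simplification. Brand attributes repeat on every product row, codes are kept alongside their descriptions, a Boolean flag becomes descriptive text, and a common concatenation is stored next to its components.

```python
# Hypothetical normalized (entity-relationship style) source tables.
brands = {10: {"brand_name": "Acme", "category_code": "HW"}}
category_codes = {"HW": "Hardware", "SW": "Software"}
products = [
    {"product_id": 501, "sku": "A-100", "name": "Anvil",
     "brand_id": 10, "is_discontinued": False},
    {"product_id": 502, "sku": "A-200", "name": "Rocket Skates",
     "brand_id": 10, "is_discontinued": True},
]

def build_product_dimension(products, brands, category_codes):
    """Flatten and decode normalized product data into dimension rows."""
    rows = []
    for surrogate_key, p in enumerate(products, start=1):
        brand = brands[p["brand_id"]]          # brand folded in, not snowflaked
        code = brand["category_code"]
        rows.append({
            "product_key": surrogate_key,
            "product_id": p["product_id"],     # natural key kept for the ETL
            "sku": p["sku"],
            "product_name": p["name"],
            "product_label": f'{p["sku"]} {p["name"]}',  # common concatenation
            "brand_name": brand["brand_name"],
            "category_code": code,
            "category_description": category_codes.get(code, "Unknown"),
            "discontinued_status": ("Discontinued" if p["is_discontinued"]
                                    else "Active"),
        })
    return rows

for row in build_product_dimension(products, brands, category_codes):
    print(row)
```

The resulting rows are redundant by design; integrity is maintained by the ETL process that builds them rather than by normalization.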

Mistake Ten:
Documenting The Wrong Things

The discipline of dimensional modeling has its own unique
principles and vocabulary. Yet when it comes time to document
the model, designers often produce the same artifacts they use
for entity-relationship models. To have long-term value to the
BI program, documentation of a dimensional model must be
organized differently.
A dimensional model should be presented at three levels of
increasing detail. Like zooming in on an interactive street map,
each reveals additional information about the model:
Business requirements: Measurement objectives are grouped
by subject area, linked to business process, stated in terms of
facts and dimensions, and cross-referenced across common
dimensions. These business requirements convey scope in
business terms and link directly to the next level of detail in the
model, where individual tables are introduced.
High-level design: This level rigorously defines important design
elements such as grain, additivity, surrogate vs. natural keys,
and slow change properties. It includes table diagrams, but
does not record every column of every table. These diagrams are
supplemented with conformance matrices, attribute hierarchy
definitions, and diagrams that illustrate the intended use of each
schema. This level is useful for design reviews, educating users
and developers, and describing project activities.
Detailed design: This level exposes every column of every table,
defines data types, provides definitions and sample data, maps
everything back to source data, and documents transformation
rules. This level of detail is useful for database administrators
and ETL architects. It also contains metadata that will be useful
for BI developers and end users.

About TDWI
TDWI, a division of 1105 Media, Inc., is the
premier provider of in-depth, high-quality education
in the business intelligence and data warehousing
industry. TDWI is dedicated to educating business
and information technology professionals about
the best practices, strategies, techniques, and tools
required to successfully design, build, maintain, and
enhance business intelligence and data warehousing
solutions. TDWI also fosters the advancement of
business intelligence and data warehousing research
and contributes to knowledge transfer and the
professional development of its members. TDWI
offers a worldwide membership program, five
major educational conferences, topical educational
seminars, role-based training, onsite courses,
certification, solution provider partnerships, an
awards program for best practices, live Webinars,
resourceful publications, an in-depth research
program, and a comprehensive Web site, tdwi.org.

1201 Monster Road SW
Suite 250
Renton, WA 98057-2996
T 425.277.9126
F 425.687.2842
E info@tdwi.org

tdwi.org
