EXCLUSIVELY FOR TDWI PREMIUM MEMBERS

Fourth Quarter 2011

Ten Mistakes to Avoid

In Dimensional Modeling

By Christopher Adamson

tdwi.org

Foreword

A dimensional model transforms data into information, the fundamental objective of every business intelligence (BI) program. Although it has become the de facto standard for data mart design, common mistakes disrupt this crucial function.

The dimensional model is more capable than is generally understood. Often pigeonholed as a data model, it is not exploited as a presentation model in a federated environment, or as a requirements model. Entire subject areas are closed off by the common misconception that some things can only be modeled using entity-relationship techniques.

To attain the full potential of your dimensional model, it is necessary to master a broad range of principles, understand how and what to model, and avoid lapsing into habits from other modeling disciplines.

A dimensional model of your business is an important asset of your BI program. Maximize its value by avoiding these 10 mistakes.

About the Author

Christopher Adamson develops data warehousing strategies for his customers, helps them define and prioritize projects, and designs data architectures. He has taught dimensional modeling to thousands of students worldwide. His latest book is Star Schema: The Complete Reference (McGraw-Hill, 2010), and he blogs at www.starschemacentral.com. Contact him at cadamson@oaktonsoftware.com.

© 2011 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. E-mail requests or feedback to info@tdwi.org.

Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

Mistake One:

Thinking of the Dimensional Model as a Data Model

A dimensional model can deliver business value long before a database is designed, and even when no data is ever stored dimensionally. When it is relegated to use as a data model, these benefits are lost.

The dimensional model represents a business process in business terms. It describes activities in the same way human beings do: as a collection of measurements and associated context. Because the dimensional model is inherently understandable, it is ideal as the design basis for a user-facing analytic database. It is best known for its use in this capacity, yielding the star schemas and cubes with which we are familiar.

The same business focus that makes it an ideal data model also makes it an ideal requirements model. In specific terms, a dimensional model specifies how a business process is measured, describes the metrics themselves, and captures the attendant dimensional detail. A single requirement stated in dimensional terms can address scores of business questions, including many that are yet to be stated.

A dimensional representation is also ideal as a presentation model for end users, regardless of how the information is actually stored. It represents how people think, rather than how data is organized. Early business intelligence tools were built on this concept, allowing a semantic layer to sit between the user and OLTP database tables. Today, the concept of dimensional presentation is once again becoming relevant, as federated solutions promise the construction of virtual solutions rather than physical ones.

Mistake Two:

Modeling Answers

A dimensional model can answer questions that were not anticipated during its development. This is possible because a dimensional model is not designed as the answer to specific questions. Instead, it describes the business processes about which people ask questions. When designers lose sight of this, the long-term value of the model is compromised.

Successful dimensional modeling does start with business questions, usually developed through a series of interviews. These questions are parsed to identify measurements and their context (facts and dimensions). These are sorted into measurement groups for discrete processes, which ultimately translate into stars or cubes.

If modeling stops here, success will be short-lived. The next steps are essential for creating a solution that will stand the test of time. Each process must be further scrutinized and compared to available operational data. Do other measurements describe the same processes? Is additional dimensional detail available?

Business interviews about order management, for example, may indicate that measurements of order dollars and quantity are studied in the context of products, customers, and time. Looking beyond the questions, you may learn that each time an order is taken, a calculation of cost and pocket margin is available, and each order can be associated with a channel and affiliate.

By looking beyond questions, you are able to develop a model that fully describes the business process. This resulting solution is better positioned to support changes in the nature and sophistication of business questions over time.

Mistake Three:

Assuming Some Things Cannot Be Modeled Dimensionally

A common misconception holds that there are things a dimensional model simply cannot handle. In fact, the dimensional model can represent the same real-world complexities that entity-relationship models can. Here are some of the most commonly overlooked capabilities of the dimensional model:

Many-to-many relationships: Dimensional models do not require that each measurement be linked to a single value in each dimension. Sales transactions, for example, may be linked to multiple salespeople through a dimension bridge structure.

No metrics: Situations with no apparent measurement can be represented dimensionally. Factless fact tables model events such as phone calls, customer contacts, and Web clicks.

Relationships between dimensions: Although dimensions are not joined to one another directly, the dimensional model can capture every important relationship. This is done through fact tables that describe coverage, conditions, or other associations. Examples include the relationship between primary care providers and patients, the assignment of projects to managers, and weather conditions by location.

Repeating attributes: Dimensions can take on multiple values. For example, every company may have multiple standard industry classifications. The dimensional model supports this through bridge tables that link the base dimension (e.g., company) with the repeating attribute (e.g., industry).

Recursive relationships: Hierarchy bridge tables support the use of recursive relationships in the analysis of business process. Examples include parts-breakdown structures and corporate organization charts.

Subtyping: Variation in the characteristics of a dimension or associated measurement is handled through core and custom models. A core model of sales by customer, for example, summarizes separate custom models for business and consumer customers.
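The many-to-many case above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module; every table name, column name, and data value is hypothetical, and the allocation column (each salesperson's share of credit) is an assumption added for illustration and does not come from the article. It is one way a bridge could link a sale to several salespeople, not a definitive design.

```python
import sqlite3

# Hypothetical schema: all names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesperson (salesperson_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_group_bridge (
    sales_group_key INTEGER,
    salesperson_key INTEGER,
    allocation REAL          -- assumed share of credit; sums to 1.0 per group
);
CREATE TABLE sales_fact (
    order_id INTEGER,
    sales_group_key INTEGER, -- points at a group of salespeople, not one person
    order_dollars REAL
);
INSERT INTO salesperson VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO sales_group_bridge VALUES (10, 1, 0.5), (10, 2, 0.5), (11, 1, 1.0);
INSERT INTO sales_fact VALUES (100, 10, 200.0), (101, 11, 50.0);
""")

# Join fact -> bridge -> dimension; weighting by allocation keeps the
# grand total equal to the total of the fact table (no double counting).
rows = conn.execute("""
SELECT s.name, SUM(f.order_dollars * b.allocation) AS credited_dollars
FROM sales_fact f
JOIN sales_group_bridge b ON b.sales_group_key = f.sales_group_key
JOIN salesperson s ON s.salesperson_key = b.salesperson_key
GROUP BY s.name
ORDER BY s.name
""").fetchall()

credited = dict(rows)
print(credited)
```

Without the allocation weight, the shared order would be counted once per salesperson, so a bridge of this kind has to be queried with care.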

Mistake Four:

Omitting a Conformance Plan

The concept of conformance is central to the discipline of dimensional modeling. Although closely linked with Ralph Kimball’s dimensional bus architecture, conformance is also essential for data marts in W.H. Inmon’s hub-and-spoke architecture, and even in standalone data marts. When conformed dimensions are not planned, incompatibilities emerge and many important business metrics become inaccessible.

The most powerful business metrics cross functional boundaries within the enterprise. These compound business metrics summarize and combine data from multiple processes. To study this kind of business metric, it is necessary to collect component measurements from multiple processes and combine them based on common, conformed dimensions.

For example, a business that performs direct sales activities has star schemas that track sales calls, proposals, orders, shipments, and returns. Spanning this value chain are several compound metrics. Yield is the ratio between sales calls and orders; return rate is the ratio between shipments and returns.

Compound metrics are studied by fetching each process-specific metric and combining the results at a common level of dimensional detail. This may be done at query time (many BI tools can do this automatically) or as part of an ETL process that stores the result in a separate star or cube (sometimes called a second-line data mart).

This is only possible when the dimensional models for each process share common dimensions. It won’t be possible to study yield by customer, industry, or geography if the models for sales calls and orders do not share a common definition for customer. Stovepipes like this are avoided by planning conformed dimensions.
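The drill-across approach described above can be sketched in a few lines of Python. The data values and function names are hypothetical. The sketch also assumes that yield is computed as orders divided by sales calls, which the article does not spell out. Each process is summarized separately, and the results are then merged on the shared customer value.

```python
from collections import defaultdict

# Hypothetical rows from two separate stars that share a conformed customer
# dimension. Each tuple is (customer, measurement).
sales_call_facts = [("Acme", 4), ("Acme", 6), ("Globex", 5)]
order_facts = [("Acme", 2), ("Globex", 1), ("Acme", 3)]

def summarize(rows):
    """Aggregate one process to the common level of detail (customer)."""
    totals = defaultdict(int)
    for customer, value in rows:
        totals[customer] += value
    return dict(totals)

def drill_across(calls, orders):
    """Query each process separately, then merge on the conformed dimension."""
    calls_by_customer = summarize(calls)
    orders_by_customer = summarize(orders)
    report = {}
    for customer in sorted(set(calls_by_customer) | set(orders_by_customer)):
        c = calls_by_customer.get(customer, 0)
        o = orders_by_customer.get(customer, 0)
        # Assumed definition: yield = orders / sales calls.
        report[customer] = {
            "sales_calls": c,
            "orders": o,
            "yield": o / c if c else None,
        }
    return report

report = drill_across(sales_call_facts, order_facts)
print(report)
```

The merge step works only because both fact sets identify customers the same way. If each star used its own customer definition, there would be no reliable key to combine on.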

Conformance is a central feature of your data architecture. It can only be assured when planned in advance. Many businesses labor under the false assumption that they can safely design and build data marts one subject area at a time and “conform on the fly.” Invariably, this leads to significant rework of previously implemented data marts, reports, dashboards, and ETL processes.

The conformance imperative holds true regardless of your data warehousing architecture. It is necessary both within a subject area and across data marts. Conformance plans are commonly illustrated through matrix diagrams that cross-reference dimensions with major processes or fact tables. This plan is backed by a fully attributed dimensional design, and mapped to real data sources.
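A conformance matrix of the kind mentioned above can be kept as simple structured data. In the sketch below, the process names follow the article's direct-sales example, while the dimension names and the helper function are illustrative assumptions.

```python
# Hypothetical conformance matrix: each process maps to the dimensions it
# references. Dimension names are assumptions made for illustration.
CONFORMANCE_MATRIX = {
    "sales_calls": {"day", "customer", "salesperson"},
    "proposals": {"day", "customer", "salesperson", "product"},
    "orders": {"day", "customer", "salesperson", "product"},
    "shipments": {"day", "customer", "product", "shipper"},
    "returns": {"day", "customer", "product"},
}

def common_dimensions(*processes):
    """Return the dimensions shared by every named process."""
    return set.intersection(*(CONFORMANCE_MATRIX[p] for p in processes))

# Dimensions available for studying yield (sales calls vs. orders).
yield_dims = common_dimensions("sales_calls", "orders")
# Dimensions available for studying return rate (shipments vs. returns).
return_rate_dims = common_dimensions("shipments", "returns")
print(sorted(yield_dims), sorted(return_rate_dims))
```

A compound metric can be studied only by the dimensions in the intersection, which is why the matrix is worth drawing up before any single subject area is built.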

When dimensional data architecture is designed around conformed dimensions, the model serves as a blueprint. Implementation can proceed one subject area at a time, each snapping into the overall framework without the risk of incompatibility.

Mistake Five:

Only Modeling Events

Product demos and introductions to dimensional modeling usually feature a transaction model. This is the most common kind of dimensional model, but it is not the only one. Some business processes benefit from additional perspectives, and others cannot be modeled as transactions at all. Complete solutions often feature three types of stars or cubes:

Transaction model: A transaction model records measurements each time an event occurs. Examples include solutions that record metrics for each account transaction, invoice line, phone call, or change of status to a document or application.

Periodic snapshot: This model records measurements at predefined intervals, providing a different perspective on a business process. A daily snapshot of balances by account, for example, facilitates study of deposit levels over time, something that is hard to do with a transaction model. Some processes can only be studied in snapshot form; levels in reservoirs or power consumption do not lend themselves to the transaction model, and it is often impractical to track inventory transactionally.

Accumulating snapshot: This model ties together information about disparate activities, allowing the business to study elapsed time between key events. It can be used to study the average time between placement of an order, initial shipment, complete fulfillment, and payment. With a transaction model, this would require difficult and time-consuming queries that correlate data across processes. Other examples include the time spent processing applications, claims, or support tickets.
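To illustrate the difference between the first two types, the following sketch derives a daily balance snapshot from account transactions. The data, the function name, and the assumption of a zero opening balance are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical transaction-model rows for one account: (day, amount).
transactions = [
    (date(2011, 10, 1), 100.0),
    (date(2011, 10, 1), -30.0),
    (date(2011, 10, 3), 50.0),
]

def daily_balance_snapshot(transactions, start, end):
    """Build one snapshot row per day, including days with no activity.

    Assumes an opening balance of zero and that every transaction falls
    between start and end.
    """
    net_by_day = {}
    for day, amount in transactions:
        net_by_day[day] = net_by_day.get(day, 0.0) + amount

    snapshot = []
    balance = 0.0
    day = start
    while day <= end:
        balance += net_by_day.get(day, 0.0)
        snapshot.append((day, balance))
        day += timedelta(days=1)
    return snapshot

snapshot = daily_balance_snapshot(transactions, date(2011, 10, 1), date(2011, 10, 4))
for day, balance in snapshot:
    print(day.isoformat(), balance)
```

The snapshot has a row for October 2 even though nothing happened that day. That density is what makes questions about levels over time easy to answer, and what a pure transaction model lacks.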

Mistake Six:

Compromising Dimensions

Richly appointed dimensions are a central factor in the long-term success of a dimensional model. Left to rookie designers, dimensions often lack features that can provide powerful advantages. When the following options are overlooked, the lost opportunities may be large:

Behavioral dimensions: Past activity can provide a valuable context for the study of key metrics. Behavioral dimensions recast information originally recorded as facts for use as dimensions. Identifying categories of customers based on sales, for example, provides a powerful way to group and analyze returns, complaints, or satisfaction ratings.

Advanced slow-change techniques: Novice modelers are familiar with the basic techniques for coping with changes to operational data—the famous Type 1/2/3 responses. In many scenarios, these options are not sufficient. Transactional dimensions apply effective and expiration timestamps, facilitating point-in-time analysis and easing the loading of historic data. Hybrid responses allow the solution to support use of both current and changed values as required. Where supported by source systems, a single attribute may be subjected to different response techniques in accordance with the reason for the change.

Mini-dimensions: When changes cause undesirable growth in dimension tables, designers may be tempted to scale back the amount of historic data that is tracked, compromising analytic capability. A mini-dimension avoids this necessity by relocating a group of attributes to a different table. Growth in the original dimension is stemmed, while preserving historic context of measurements.
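The effective and expiration timestamps mentioned under advanced slow-change techniques can be sketched as follows. Column names, the far-future "high date" convention, and the half-open interval rule are assumptions made for illustration, not details taken from the article.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still current"

def apply_change(dimension_rows, customer_id, new_attrs, change_date):
    """Expire the current row for customer_id and append a new version."""
    next_key = max((r["customer_key"] for r in dimension_rows), default=0) + 1
    for row in dimension_rows:
        if row["customer_id"] == customer_id and row["expiration_date"] == HIGH_DATE:
            row["expiration_date"] = change_date
    dimension_rows.append({
        "customer_key": next_key,        # new surrogate key
        "customer_id": customer_id,      # natural key is unchanged
        **new_attrs,
        "effective_date": change_date,
        "expiration_date": HIGH_DATE,
    })

def as_of(dimension_rows, customer_id, day):
    """Point-in-time lookup: effective_date <= day < expiration_date."""
    for row in dimension_rows:
        if (row["customer_id"] == customer_id
                and row["effective_date"] <= day < row["expiration_date"]):
            return row
    return None

customer_dim = [{
    "customer_key": 1, "customer_id": "C-42", "state": "NY",
    "effective_date": date(2010, 1, 1), "expiration_date": HIGH_DATE,
}]
apply_change(customer_dim, "C-42", {"state": "CA"}, date(2011, 7, 1))

before = as_of(customer_dim, "C-42", date(2011, 6, 30))
after = as_of(customer_dim, "C-42", date(2011, 7, 1))
print(before["state"], after["state"])
```

Because each version carries its own date range, the dimension can answer "what was true on this date?" without consulting the fact table.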

Mistake Seven:

Stopping with the Base Schema

Following the best practices of dimensional modeling, architects work to ensure that information about each business process is modeled at the most granular, detailed level possible. Captured in this manner, the measurements can be summarized across various dimensions as desired. If the modeling stops with this base schema, however, several kinds of analysis may be hampered.

Derived schemas take information from the base schema and reorganize it for specific purposes. A derived schema can expose metrics that are hard to assemble from the base schema. It may be employed to facilitate ad hoc analysis, make reports easier to develop, or improve performance.

Many kinds of derived schemas may appear in a dimensional model. Examples include:

Periodic snapshot models, as already seen, are derived from transaction models to study the effect of transactions

Accumulating snapshot models correlate events recorded at different times around a defining dimension, allowing the study of elapsed time

Merged models pre-compute the drill-across comparison of two or more discrete processes, exposing compound metrics in a summary level star or cube

Pivoted models transpose the organization of measurements between rows and columns, simplifying some forms of reporting

Sliced models expose subsets of larger data sets to support security requirements, facilitate data distribution, or address scaling issues

Set comparison models store useful comparisons of two or more base models, such as events and conditions or eligibility and participation

Core models bring together data about similar sub-processes, such as direct sales and corporate sales, at a common level of detail
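As one example from the list above, the sketch below derives an accumulating snapshot from a stream of order events. The milestone names, sample data, and function names are hypothetical, and the order is assumed to be the defining dimension around which events are correlated.

```python
from datetime import date

# Hypothetical base-schema events: (order_id, milestone, day).
events = [
    (1001, "ordered", date(2011, 10, 1)),
    (1001, "shipped", date(2011, 10, 4)),
    (1001, "paid", date(2011, 10, 20)),
    (1002, "ordered", date(2011, 10, 2)),
    (1002, "shipped", date(2011, 10, 3)),
]

MILESTONES = ("ordered", "shipped", "paid")

def lag_in_days(start, end):
    """Elapsed days between two milestones, or None if either is missing."""
    if start is None or end is None:
        return None
    return (end - start).days

def build_accumulating_snapshot(events):
    """Collapse events into one row per order with a date per milestone."""
    rows = {}
    for order_id, milestone, day in events:
        row = rows.setdefault(order_id, {m: None for m in MILESTONES})
        row[milestone] = day
    for row in rows.values():
        row["days_to_ship"] = lag_in_days(row["ordered"], row["shipped"])
        row["days_to_pay"] = lag_in_days(row["ordered"], row["paid"])
    return rows

orders = build_accumulating_snapshot(events)
print(orders[1001]["days_to_ship"], orders[1001]["days_to_pay"])
```

Order 1002 has not been paid yet, so its payment lag stays empty until a later event fills it in. Rows in this kind of derived schema are revisited as the process moves forward.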

Mistake Eight:

Not Considering Source Data

A dimensional model is not complete if it is based solely on user requirements. At best, the model will omit useful details not gathered from business interviews. At worst, it will contain features that are simply not supported by the operational systems.

Business interviews are a useful starting point for the modeling effort, but they cannot reveal every fact and dimension of significance. Someone may reference “customer data” as important for slicing and dicing yield data, calling out customer geography and industry classification explicitly. As you learned from Mistake Two, it will be necessary to look more closely at available customer data to ensure the model will stand the test of time. A strong customer dimension may have as many as 200 attributes.

A study of the source data is also essential in distinguishing a feasible dimensional model from wishful thinking. There are often key measurement requirements that are not supported by operational systems. This may not be evident during interviews, particularly if a workaround process is in place.

It is also important to map each column in the dimensional design to source data elements, and enumerate the business rules that must be applied during transformation. Teams focused on agile development may delay this level of scrutiny until implementation, but must not postpone steps to identify source tables, natural key values, and other salient aspects of the source data.

Mistake Nine:

Applying Entity-Relationship Techniques

For experienced data modelers, the hardest thing about dimensional modeling is resisting the urge to apply the techniques of entity-relationship modeling. Useful in the design of solutions for transaction processing, these techniques hamper solutions for analytic processing. The most common mistakes include:

Normalizing dimensions: The normalization of an entity-relationship model helps the DBMS support transaction processing while maintaining referential integrity. Brand information, for example, is removed from a product table so that it does not repeat for each product. A dimensional model supports analytic processing, with integrity managed through the ETL process. Normalization harms understandability, complicates access, and increases the change processing burden on the ETL process.

Abstraction: Entity-relationship modelers look for opportunities to collapse similar entities into a single concept. Party, for example, is a common stand-in for various individuals and organizations. A dimensional model focuses on usability, calling out important actors in explicit terms to provide context for process measurement. Commonalities are addressed through conformance.

Eliminating redundancy: An entity-relationship modeler strives to eliminate redundancy from the model. Names are broken down and stored as constituent parts; categories and types are replaced with lookup codes; binary characteristics are stored as Boolean flags. A dimensional model stores information as it will be used. Names are stored as their components and as common concatenations; codes are stored along with their decoded values; Boolean values are transformed into descriptive text. This optimizes these attributes for usability: filtering queries, sorting or grouping results, and driving master-detail relationships.
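The contrast drawn under "Eliminating redundancy" can be sketched as a small ETL-style transformation. The source record layout, the lookup table, and the output column names below are all hypothetical and stand in for whatever the operational system actually provides.

```python
# Hypothetical lookup table that the source system stores separately.
CUSTOMER_TYPE_NAMES = {"B": "Business", "C": "Consumer"}

def to_dimension_row(source):
    """Turn a normalized source record into a usability-oriented dimension row."""
    first, last = source["first_name"], source["last_name"]
    return {
        # Names: keep the components and add common concatenations.
        "first_name": first,
        "last_name": last,
        "full_name": f"{first} {last}",
        "name_last_first": f"{last}, {first}",
        # Codes: keep the code and store its decoded value alongside it.
        "customer_type_code": source["type_code"],
        "customer_type": CUSTOMER_TYPE_NAMES.get(source["type_code"], "Unknown"),
        # Booleans: replace the flag with descriptive text.
        "tax_status": "Tax Exempt" if source["is_tax_exempt"] else "Not Tax Exempt",
    }

source_record = {
    "first_name": "Maria",
    "last_name": "Lopez",
    "type_code": "B",
    "is_tax_exempt": False,
}
dimension_row = to_dimension_row(source_record)
print(dimension_row)
```

The redundancy is deliberate: a report can filter on "Business" or sort by "Lopez, Maria" directly, with no lookups or string handling at query time.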

Mistake Ten:

Documenting the Wrong Things

The discipline of dimensional modeling has its own unique principles and vocabulary. Yet when it comes time to document the model, designers often produce the same artifacts they use for entity-relationship models. To have long-term value to the BI program, documentation of a dimensional model must be organized differently.

A dimensional model should be presented at three levels of increasing detail. Like zooming in on an interactive street map, each reveals additional information about the model:

Business requirements: Measurement objectives are grouped by subject area, linked to business process, stated in terms of facts and dimensions, and cross-referenced across common dimensions. These business requirements convey scope in business terms and link directly to the next level of detail in the model, which exposes the concept of table.

High-level design: This level rigorously defines important design elements such as grain, additivity, surrogate vs. natural keys, and slow change properties. It includes table diagrams, but does not record every column of every table. These diagrams are supplemented with conformance matrices, attribute hierarchy definitions, and diagrams that illustrate the intended use of each schema. This level is useful for design reviews, educating users and developers, and describing project activities.

Detailed design: This level exposes every column of every table, defines data types, provides definitions and sample data, maps everything back to source data, and documents transformation rules. This level of detail is useful for database administrators and ETL architects. It also contains metadata that will be useful for BI developers and end users.

About TDWI

TDWI, a division of 1105 Media, Inc., is the premier provider of in-depth, high-quality education in the business intelligence and data warehousing industry. TDWI is dedicated to educating business and information technology professionals about the best practices, strategies, techniques, and tools required to successfully design, build, maintain, and enhance business intelligence and data warehousing solutions. TDWI also fosters the advancement of business intelligence and data warehousing research and contributes to knowledge transfer and the professional development of its members. TDWI offers a worldwide membership program, five major educational conferences, topical educational seminars, role-based training, onsite courses, certification, solution provider partnerships, an awards program for best practices, live Webinars, resourceful publications, an in-depth research program, and a comprehensive Web site, tdwi.org.

1201 Monster Road SW, Suite 250
Renton, WA 98057-2996
T 425.277.9126
F 425.687.2842
E info@tdwi.org
tdwi.org