A Relentless Book

A relentless collection of more than 130 tips for designing and implementing Data Warehouse projects successfully.
13 March, 2011
Copyright © 2011 Kimball Group
Data Warehouse Design Tips
The Kimball Group delivers practical techniques that are:
Vendor independent
Reliable, real-world guidance, not theory
Written by Kimball Group members, the only practitioners certified by Ralph, these Data Warehouse design tips and best practices can be used for your Data Warehouse projects.
TABLE OF CONTENTS
Design Tips 2010
Design Tip #130 Accumulating Snapshots for Complex Workflows
Design Tip #129 Are IT Procedures Beneficial to DW/BI Projects?
Design Tip #128 Selecting Default Values for Nulls
Design Tip #127 Creating and Managing Mini-Dimensions
Design Tip #126 Disruptive ETL Changes
Design Tip #125 Balancing Requirements and Realities
Design Tip #124 Alternatives for Multi-valued Dimensions
Design Tip #123 Using the Dimensional Model to Validate Business Requirements
Design Tip #122 Call to Action for ETL Tool Providers
Design Tip #121 Columnar Databases: Game Changers for DW/BI Deployment?
Design Tip #120 Design Review Dos and Don’ts
Design Tips 2009
Design Tip #119 Updating the Date Dimension
Design Tip #118 Managing Backlogs Dimensionally
Design Tip #117 Dealing with Data Quality: Don’t Just Sit There, Do Something!
Design Tip #116 Add Uncertainty to Your Fact Table
Design Tip #115 Kimball Lifecycle in a Nutshell
Design Tip #114 Avoiding Alternate Organization Hierarchies
Design Tip #113 Creating, Using, and Maintaining Junk Dimensions
Design Tip #112 Creating Historical Dimension Rows
Design Tip #111 Is Agile Enterprise Data Warehousing an Oxymoron?
Design Tip #110 Business Requirements Gathering Dos and Don’ts
Design Tip #109 Dos and Don’ts on the Kimball Forum
Design Tips 2008
Design Tip #108 When is the Dimensional Model Design Done?
Design Tip #107 Using the SQL MERGE Statement for Slowly Changing Dimensions
Design Tip #106 Can the Data Warehouse Benefit from SOA?
Design Tip #105 Snowflakes, Outriggers, and Bridges
Design Tip #104 Upgrading your BI Architecture
Design Tip #103 Staffing the Dimensional Modeling Team
Design Tip #102 Server Configuration Considerations
Design Tip #101 Slowly Changing Vocabulary
Design Tip #100 Keep Your Keys Simple
Design Tip #99 Staging Areas and ETL Tools
Design Tip #98 Focus on Data Stewardship
Design Tips 2007
Design Tip #97 Modeling Data as Both a Fact and Dimension Attribute
Design Tip #96 Think Like A Software Development Manager
Design Tip #95 Patterns to Avoid when Modeling Header/Line Item Transactions
Design Tip #94 Building Custom Tools for the DW/BI System
Design Tip #93 Transactions Create Time Spans
Design Tip #92 Dimension Manager and Fact Provider
Design Tip #91 Marketing the DW/BI System
Design Tip #90 Slowly Changing Entities
Design Tip #89 The Real Time Triage
Design Tip #88 Dashboards Done Right
Design Tip #87 Combining SCD Techniques Having It Both Ways
Design Tips 2006
Design Tip #86 Reference Dimensions for Infrequently-Accessed Degenerates
Design Tip #85 Smart Date Keys to Partition Fact Tables
Design Tip #84 Readers’ Suggestions on Fact Table Surrogate Keys
Design Tip #83 Resist Abstract Generic Dimensions
Design Tip #82 Pivoting the Fact Table with a Fact Dimension
Design Tip #81 Fact Table Surrogate Keys
Design Tip #80 Dimension Row Change Reason Attributes
Design Tip #79 Dangerously Large Dimension Tables
Design Tip #78 Late Arriving Dimension Rows
Design Tip #77 Warning: Summary Data may be Hazardous
Design Tip #76 Advantages of a 64-bit Server
Design Tip #75 Creating the Metadata Strategy
Design Tips 2005
Design Tip #74 Compliance-Enabled Data Warehouses
Design Tip #73 Relating to Agile Methodologies
Design Tip #72 Business Process Decoder Ring
Design Tip #71 Naming Conventions
Design Tip #70 Architecting Data for MS SQL Server 2005
Design Tip #69 Identifying Business Processes
Design Tip #68 Simple Drill-Across in SQL
Design Tip #67 Maintaining Back Pointers to Operational Sources
Design Tip #66 Implementation Analysis Paralysis
Design Tip #65 Document the ETL System
Design Tip #64 Avoid Isolating the DW and BI Teams
Design Tip #63 Building a Change Data Capture System
Design Tips 2004
Design Tip #62 Alternate Hierarchies
Design Tip #61 Handling all the Dates
Design Tip #60 Big Shifts in Business Intelligence
Design Tip #59 Surprising Value of Data Profiling
Design Tip #58 BI Portal
Design Tip #57 Early Arriving Facts
Design Tip #56 Dimensional Modeling for Microsoft Analysis Services
Design Tip #55 Exploring Text Facts
Design Tip #54 Delivering Historical and Current Perspectives
Design Tip #53 Dimension Embellishments
Design Tip #52 Improving Operating Procedures
Design Tip #51 Latest Thinking on Time Dimension Tables
Design Tips 2003
Design Tip #50 Factless Fact Tables
Design Tip #49 Off the Bench about the Bottoms Up Misnomer
Design Tip #48 De-clutter with Junk Dimensions
Design Tip #47 Business Initiatives versus Business Processes
Design Tip #46 Another Look at Degenerate Dimensions
Design Tip #45 Techniques for Modeling Intellectual Capital
Design Tip #44 Reliance on the BI Tool’s Metadata
Design Tip #43 Dealing With Nulls in a Dimensional Model
Design Tip #42 Combining Periodic and Accumulating Snapshots
Design Tips 2002
Design Tip #41 Drill Down into a Detailed Bus Matrix
Design Tip #40 Structure of an Analytic Application
Design Tip #39 Bus Architecture Foundation for Analytic Applications
Design Tip #38 Analytic Application - What's That?
Design Tip #37 Modeling a Pipeline with Accumulating Snapshots
Design Tip #36 To Be or Not To Be Centralized
Design Tip #35 Modeling Time Spans
Design Tip #34 You Don't Need an EDW
Design Tip #33 Using CRM Measures as Behavior Tags
Design Tip #32 Doing the Work at Extract Time
Design Tips 2001
Design Tip #31 Designing a Real Time Partition
Design Tip #30 Put your Fact Tables on a Diet
Design Tip #29 Graceful Modifications to Existing Fact and Dimension Tables
Design Tip #28 Avoiding Catastrophic Failure of the Data Warehouse
Design Tip #27 Being Off-line as Little as Possible
Design Tip #26 Audit Dimensions to Track Lineage and Confidence
Design Tip #25 Dimensional Models for Parent-Child Applications
Design Tip #24 Multinational Dimensional Data Warehouse Considerations
Design Tip #23 Rolling Prediction of the Future
Design Tip #22 Variable Depth Customer Dimensions
Design Tip #21 Declaring the Grain
Design Tip #20 Sparse Facts and Facts with Short Lifetimes
Design Tip #19 Replicating Dimensions Correctly
Design Tip #18 Taking the Publishing Metaphor Seriously
Design Tip #17 Populating Hierarchy Helper Tables
Design Tips 2000
Design Tip #16 Hot Swappable Dimensions
Design Tip #15 Combining SCD Techniques
Design Tip #14 Arbitrary Balance Reporting with Transaction Facts
Design Tip #13 When Fact Tables can be used as Dimensions
Design Tip #12 Accurate Counting with a Dimensional Supplement
Design Tip #11 Accurate Counts within a Dimension
Design Tip #10 Is your Data Correct
Design Tip #9 Processing Slowly Changing Dimensions during Initial Load
Design Tip #8 Perfectly Partitioning History with Type 2 SCD
Design Tip #7 Getting your Data Warehouse back on Track
Design Tip #6 Showing the Correlation between Dimensions
Design Tip #5 Surrogate Keys for the Time Dimension
Design Tip #4 Fast Changing Complex Customer Dimensions
Design Tip #3 Focus on Business Process, not Business Departments
Design Tip #2 Multiple Time Stamps
Design Tip #1 Guidelines for an Expressive Clickstream Data Mart
www.kimballgroup.com Number 130, December 1, 2010
Design Tip #130 Accumulating Snapshots for Complex Workflows
By Margy Ross
As Ralph described in Design Tip #37 Modeling a Pipeline with an Accumulating Snapshot, accumulating snapshots are one of the three fundamental types of fact tables. We often state that accumulating snapshot fact tables are appropriate for predictable workflows with well-established milestones. They typically have five to ten key milestone dates representing the workflow/pipeline start, completion, and the key event dates in between.
Our students and clients sometimes ask for guidance about monitoring cycle performance for a less predictable workflow process. These more complex workflows have a definite start and end date, but the milestones in between are often numerous and less stable. Some occurrences may skip over some intermediate milestones, but there’s no reliable pattern.
Be forewarned that the design for tackling these less predictable workflows is not for the faint of heart!
The first task is to identify the key dates that will link to role-playing date dimensions. These dates represent key milestones; the start and end dates for the process would certainly qualify. In addition, you’d want to consider other commonly occurring, critical milestones. These dates (and their associated dimensions) will be used for report and analysis filtering. For example, if you want to see cycle activity for all workflows where a milestone date fell in a given work week, calendar month, fiscal period, or other standard date dimension attribute, then it should be identified as a key date with a corresponding date dimension table. The same holds true if you want to create a time series trend based on the milestone date. While selecting specific milestones as the critical ones in a complex process may be challenging for IT, business users can typically identify these key milestones fairly readily. But they’re often interested in a slew of additional lags, which is where things get thorny.
For example, let’s assume there are six critical milestone dates, plus an additional 20 less critical event dates associated with a given process/workflow. If we labeled each of these dates alphabetically, you could imagine analysts being interested in any of the following date lags:
A-to-B, A-to-C, …, A-to-Z (total of 25 possible lags from event A)
B-to-C, …, B-to-Z (total of 24 possible lags from event B)
C-to-D, …, C-to-Z (total of 23 possible lags from event C)
…
Y-to-Z
Using this example, there would be 325 (25+24+23+…+1) possible lag calculations between milestone A and milestone Z. That’s an unrealistic number of facts for a single fact table! Instead of physically storing all 325 date lags, you could get away with just storing 25 of them, and then calculate the others. Since every cycle occurrence starts by passing through milestone A (workflow begin date), you could store all 25 lags from the anchor event A, then calculate the other 300 variations.
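The arithmetic above can be checked with a short sketch. It is only an illustration of the counting argument; the variable names are hypothetical and not part of the original tip.

```python
from itertools import combinations
from string import ascii_uppercase

# 26 milestones labeled A..Z, as in the example (6 critical + 20 less critical).
milestones = list(ascii_uppercase)

# Every pair taken in alphabetical order (A-to-B, A-to-C, ..., Y-to-Z) is a possible lag.
all_lags = list(combinations(milestones, 2))

# Only the lags measured from the anchor event A would be physically stored.
stored_lags = [pair for pair in all_lags if pair[0] == "A"]

print(len(all_lags))                     # 325
print(len(stored_lags))                  # 25
print(len(all_lags) - len(stored_lags))  # 300
```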
Let’s take a simpler example with actual dates to work through the calculations:
Event A (process begin date) - Occurred on November 1
Event B - Occurred on November 2
Event C - Occurred on November 5
Event D - Occurred on November 11
Event E - Didn’t happen
Event F (process end date) - Occurred on November 16
In the corresponding accumulating snapshot fact table row for this example, you’d physically store the following facts and their values:
A-to-B days lag - 1
A-to-C days lag - 4
A-to-D days lag - 10
A-to-E days lag - null
A-to-F days lag - 15
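As a minimal sketch, the stored facts listed above could be computed from the event dates as follows. The year 2010 and the function name `anchor_lags` are assumptions made for illustration; the tip only gives the month and day.

```python
from datetime import date

# Event dates from the example. The year 2010 is an assumption.
# None marks an event that never happened.
event_dates = {
    "A": date(2010, 11, 1),   # process begin date
    "B": date(2010, 11, 2),
    "C": date(2010, 11, 5),
    "D": date(2010, 11, 11),
    "E": None,
    "F": date(2010, 11, 16),  # process end date
}

def anchor_lags(dates, anchor="A"):
    """Return the days lag from the anchor event to every other event."""
    start = dates[anchor]
    return {
        f"{anchor}-to-{event}": None if when is None else (when - start).days
        for event, when in dates.items()
        if event != anchor
    }

lags = anchor_lags(event_dates)
print(lags)
```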
To calculate the days lag from B-to-C, you’d take the A-to-C lag value (4) and subtract the A-to-B lag value (1) to arrive at 3 days. To calculate the days lag from C-to-F, you’d take the A-to-F value (15) and subtract the A-to-C value (4) to arrive at 11 days. Things get a little trickier when an event doesn’t occur, like E in our example. When there’s a null involved in the calculation, like the lag from B-to-E or E-to-F, the result needs to also be null because one of the events never happened.
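The subtraction rule, including the null handling, might be sketched like this. The dictionary layout and the function name `derived_lag` are hypothetical; treating the anchor's own lag as 0 is an assumption added so that lags starting at A fall out of the same formula.

```python
from typing import Optional

# Stored facts from the example row: days lag from anchor event A.
# The anchor's own lag is taken as 0 (an assumption); None means the
# event never happened.
lag_from_a = {"A": 0, "B": 1, "C": 4, "D": 10, "E": None, "F": 15}

def derived_lag(start: str, end: str) -> Optional[int]:
    """Derive the start-to-end lag from the two stored anchor lags."""
    a_to_start = lag_from_a[start]
    a_to_end = lag_from_a[end]
    # If either event never happened, the derived lag is also null.
    if a_to_start is None or a_to_end is None:
        return None
    return a_to_end - a_to_start

print(derived_lag("B", "C"))  # 3
print(derived_lag("C", "F"))  # 11
print(derived_lag("B", "E"))  # None
print(derived_lag("E", "F"))  # None
```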
This technique works even if the interim dates are not in sequential order. In our example, let’s assume the dates for events C and D were swapped: event C occurred on November 11 and D occurred on November 5. In this case, the A-to-C days lag is 10 and the A-to-D lag is 4. To calculate the C-to-D lag, you’d take the A-to-D lag (4) and subtract the A-to-C lag (10) to arrive at -6 days.
In our simplified example, storing all the possible lags would have resulted in 15 total facts (5 lags from event A, plus 4 lags from event B, plus 3 lags from event C, plus 2 lags from event D, plus 1 lag from event E). That’s not an unreasonable number of facts to just physically store. This tip makes more sense when there are dozens of potential event milestones in a cycle. Of course, you’d want to hide the complexity of these lag calculations under the covers from your users, like in a view declaration.
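One way to picture such a view is the sketch below, which uses Python's built-in `sqlite3` module with an in-memory database. The table name `workflow_fact`, the view name `workflow_lags`, and all column names are hypothetical; a real accumulating snapshot would also carry its date dimension keys and other facts. The sketch relies on standard SQL behavior where arithmetic on NULL yields NULL, so a missing event propagates automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact table holding only the lags from anchor event A.
    CREATE TABLE workflow_fact (
        workflow_key    INTEGER PRIMARY KEY,
        a_to_b_days_lag INTEGER,
        a_to_c_days_lag INTEGER,
        a_to_d_days_lag INTEGER,
        a_to_e_days_lag INTEGER,
        a_to_f_days_lag INTEGER
    );
    -- The example row: event E never happened, so its lag is NULL.
    INSERT INTO workflow_fact VALUES (1, 1, 4, 10, NULL, 15);

    -- The view exposes derived lags so users never write the subtraction.
    CREATE VIEW workflow_lags AS
    SELECT
        workflow_key,
        a_to_b_days_lag,
        a_to_c_days_lag,
        a_to_f_days_lag,
        a_to_c_days_lag - a_to_b_days_lag AS b_to_c_days_lag,
        a_to_f_days_lag - a_to_c_days_lag AS c_to_f_days_lag,
        a_to_e_days_lag - a_to_b_days_lag AS b_to_e_days_lag
    FROM workflow_fact;
""")

row = conn.execute(
    "SELECT b_to_c_days_lag, c_to_f_days_lag, b_to_e_days_lag "
    "FROM workflow_lags WHERE workflow_key = 1"
).fetchone()
print(row)  # (3, 11, None)
```

Only a few derived columns are shown; in practice the view would list whichever lags the business users actually ask for.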
As I warned earlier, this design pattern is not simplistic; however, it’s a viable approach for addressing a really tricky problem.
