
Causal (Not Casual) Dimensions

Consider causal dimensions when you want your data warehouse to be more informative.
One of the most interesting and valuable dimensions in a data warehouse is one that explains why a
fact table record exists. In most data warehouses, you build a fact table record when something
happens. For example:
When the cash register rings in a retail store, a fact table record is created for each line item on
the sales ticket. The obvious dimensions of this fact table record are product, store, customer,
sales ticket, and time, as shown in Figure 1.
At a bank ATM, a fact table record is created for every customer transaction. The dimensions
of this fact table record are financial service, ATM location, customer, transaction type, and
time.
When the telephone rings, the phone company creates a fact table record for each "hook
event." A complete call-tracking data warehouse in a telephone company records each
completed call, busy signal, wrong number, and partially dialed call.
In all three of these cases, a physical event takes place, and the data warehouse responds by storing a
fact table record. However, the physical events and the corresponding fact table records are more
interesting than simply storing a small piece of revenue. Each event represents a conscious decision
by the customer to use the product or the service. A good marketing person is fascinated by these
events. Why did the customer choose to buy the product or use the service at that exact moment? If
we only had a dimension called "Why Did The Customer Buy My Product Just Now?" our data
warehouses could answer almost any marketing question. We call a dimension like this a "causal"
dimension, because it explains what caused the event.
Surprisingly, in many cases the available data can build a good approximation of a causal dimension.
This data masquerades under headings such as Promotion, Store Condition, Deal, Contract, Rate
Card, or Reason. For instance, in a retail environment, a number of management decisions are in
effect at any time for a product, including Temporary Price Reduction, On Ad, or On Display. Each
of these management decisions arguably affects the volume of sales. Most of these decisions are
viewed as retail promotions.
At a bank ATM, there may be a New Account Drive, a Promotional Mailing, or a Branch Teller
Surcharge. Again, each of these management decisions affects the volume and the patterns of ATM
usage. There may also be exogenous effects on ATM usage, such as a national holiday or bad
weather, that are not the result of a human management decision.
The telephone company hook events are similarly "explained" by causal dimensions such as Reduced
Rate Dialing Specials, Lifeline Rates, and Off-Peak Usage Incentives. Some of these descriptors can
be found in legacy data in the form of Contracts, Deals, or Rate Cards.
One of the best things a data warehouse designer can do is search for and build causal dimensions.
Data for causal conditions, such as promotions, store conditions, or contracts, is often available
somewhere in the corporate environment but is rarely linked in a clean way to the primary transaction
data feed. Retail transaction systems are the most likely to have a link to causal data, largely because
retail transaction systems must keep track of price reductions and markdowns. Less commonly, the
retail transaction system also keeps track of whether an item is on display or is being advertised. In
some cases, the data warehouse team can ask the production point-of-sale programming staff to add
an advisory data field to the legacy data. The store manager can fill in the field on a regular basis to
record whether an item is being promoted through displays or advertisements. This kind of business
re-engineering greatly simplifies data extraction for the data warehouse team and improves the power
of the data warehouse.
ATM transaction and telephone switch usage data almost never contain links to causal data. In these
cases, causal data needs to be merged into the transaction data from an entirely separate source, such
as a marketing promotion system.
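As a rough illustration of merging causal data from a separate source, the Python sketch below tags
ATM transactions with whatever condition a promotion calendar says was in effect at that location
on that day. The calendar layout, the field names, the sample rows, and the find_condition helper
are all assumptions made for this example; a real marketing promotion system will look different.

```python
from datetime import date

# Hypothetical extract from a marketing promotion system: each row says
# which causal condition was in effect at which ATM location, and when.
promotion_calendar = [
    {"location": "Main St", "start": date(1996, 11, 4), "end": date(1996, 11, 10),
     "condition": "New Account Drive"},
    {"location": "Airport", "start": date(1996, 11, 6), "end": date(1996, 11, 6),
     "condition": "Branch Teller Surcharge"},
]

# Hypothetical ATM transactions; they carry no causal information of their own.
transactions = [
    {"txn_id": 1, "location": "Main St", "date": date(1996, 11, 5)},
    {"txn_id": 2, "location": "Airport", "date": date(1996, 11, 5)},
    {"txn_id": 3, "location": "Airport", "date": date(1996, 11, 6)},
]


def find_condition(txn, calendar, default="No Promotion"):
    """Return the first calendar condition that covers this transaction."""
    for promo in calendar:
        same_place = promo["location"] == txn["location"]
        in_effect = promo["start"] <= txn["date"] <= promo["end"]
        if same_place and in_effect:
            return promo["condition"]
    return default  # nothing in the calendar applies


# One output row per input row: the merge adds a description, not records.
tagged = [dict(txn, condition=find_condition(txn, promotion_calendar))
          for txn in transactions]

for row in tagged:
    print(row["txn_id"], row["location"], row["condition"])
```

The sketch takes the first matching calendar row; overlapping conditions would need a rule for
combining them into a single description.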
" useful causal dimension need not describe every minor variation in a promotion or the store
condition. It may be most useful to build a causal dimension at a reasonably high level, gradually
building up a few hundred types of promotion descriptions or store conditions. Figure 4 shows a
useful causal dimension for a retail store point%of%sale fact table. In this case, the relevant causal
conditions being measured include price treatment, ad description, and display description. " ny
given promotion for a product in a store on a given day will consist of some combination of these
factors. For example, orange 5uice may be discounted today in all of the stores, but only some of the
stores may accompany the discount with a special in %store display. .otice that one of the most
important records in this promotion dimension is the record describing $no promotion.$ #ost of the
products in a store on a given day are probably sold under the $no promotion$ causal condition.
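A minimal Python sketch of such a dimension follows. It assumes one row per distinct combination
of price treatment, ad description, and display description, with a surrogate key handed out the
first time a combination is seen. The class name, attribute values, and key numbering are
illustrative assumptions, not the layout of Figure 2.

```python
class CausalDimension:
    """A small in-memory causal dimension keyed by surrogate integers."""

    def __init__(self):
        self.rows = {}        # causal_key -> descriptive attributes
        self._by_combo = {}   # (price, ad, display) -> causal_key
        # Create the "no promotion" row first; most sales will point to it.
        self.no_promotion_key = self.key_for(
            "No Price Treatment", "No Ad", "No Display")

    def key_for(self, price_treatment, ad_description, display_description):
        """Return the key for a combination, adding a row if it is new."""
        combo = (price_treatment, ad_description, display_description)
        if combo not in self._by_combo:
            key = len(self.rows) + 1
            self._by_combo[combo] = key
            self.rows[key] = {
                "price_treatment": price_treatment,
                "ad_description": ad_description,
                "display_description": display_description,
            }
        return self._by_combo[combo]


dim = CausalDimension()

# Orange juice is discounted everywhere, but only some stores add a display.
discount_only = dim.key_for("Temporary Price Reduction", "No Ad", "No Display")
discount_with_display = dim.key_for(
    "Temporary Price Reduction", "No Ad", "In-Store Display")

# Asking again for a known combination reuses the existing row.
repeat = dim.key_for("Temporary Price Reduction", "No Ad", "No Display")

print(dim.no_promotion_key, discount_only, discount_with_display, repeat)
print(len(dim.rows), "rows in the dimension")
```

Because rows are added only when a new combination appears, the dimension grows gradually and
stays at the level of detail the descriptions themselves carry.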
" causal dimension is a kind of advisory dimension that should not change the fundamental grain of a
fact table. 1ecall that the grain of a fact table identifies the meaning of a single fact table record. In
Figure !, the grain of the fact table is the i ndividual line item on a particular customer6s sales ticket. I
stated earlier in this article that the natural dimensions of this fact table are product, store, customer,
sales ticket, and time. If you decide that you can describe each sale more specific ally by a set of
promotion conditions, store conditions, and exogenous conditions, then you can add a special key into
the fact table that points to the relevant combined causal description for each sales record. he
addition of such a key does not chang e the number of fact table records. "ll of the old applications
continue to work, continue to produce exactly the same results, and do not re/uire recoding. his
setup is an example of the robustness of the star 5oin database organi7ation described in my "ugust
!889 DBMS article )angerous ,reconceptions$. In this case, the dangerous preconception is that you
cannot add additional information such as a causal dimension to the design after the data warehouse
becomes opera tional. "s this example illustrates, you can add new dimensions at any time, as long as
you are careful to preserve the original grain. "s I pointed out in "ugust, it is easy to preserve the
original grain if you start with the lowest%level transactions in the business, because in a very
fundamental sense it is not possible to create a more granular view of the business. " sales transaction
is a sales transaction, whether or not you accompany it with fancy causal descriptors.
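The claim that a new causal key leaves the grain and the old applications untouched can be checked
on a toy schema. The sketch below uses Python's built-in sqlite3 module; the table names, column
names, and sample rows are assumptions made for illustration and are far simpler than a production
star join schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table at line-item grain with the five natural dimensions.
conn.executescript("""
CREATE TABLE sales_fact (
    product_key INTEGER, store_key INTEGER, customer_key INTEGER,
    ticket_key INTEGER, time_key INTEGER, dollars REAL
);
INSERT INTO sales_fact VALUES
    (1, 10, 100, 1000, 1, 2.50),
    (2, 10, 100, 1000, 1, 1.25),
    (1, 11, 101, 1001, 1, 2.00);
""")

# An "old application": a query written before the causal dimension existed.
OLD_QUERY = """
SELECT store_key, SUM(dollars) FROM sales_fact
GROUP BY store_key ORDER BY store_key
"""
count_before = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
result_before = conn.execute(OLD_QUERY).fetchall()

# Add the causal dimension and a sixth key; every row defaults to "no promotion".
conn.executescript("""
CREATE TABLE causal_dim (causal_key INTEGER PRIMARY KEY, description TEXT);
INSERT INTO causal_dim VALUES
    (1, 'No Promotion'),
    (2, 'Temporary Price Reduction with In-Store Display');
ALTER TABLE sales_fact ADD COLUMN causal_key INTEGER NOT NULL DEFAULT 1;
UPDATE sales_fact SET causal_key = 2 WHERE product_key = 1 AND store_key = 11;
""")

count_after = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
result_after = conn.execute(OLD_QUERY).fetchall()

# A new query can now group the same rows by causal description.
by_cause = conn.execute("""
SELECT c.description, COUNT(*) FROM sales_fact f
JOIN causal_dim c ON c.causal_key = f.causal_key
GROUP BY c.description ORDER BY c.description
""").fetchall()

print(count_before, count_after)
print(result_before == result_after)
print(by_cause)
```

The old query never mentions the new column, so it runs unchanged and returns the same totals
over the same number of rows.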
Some of you may be bothered by the assumption that the causal dimension "explains" why the
customer bought the product. Obviously, you never know for sure why anyone buys anything. In
some cases, you can't even be sure whether the presumed stimulus (the ad or the display) was even
noticed by the customer. For these reasons, causal factors are usually classified as "absolute" or
"suspected." An absolute causal factor, such as a price reduction, is a factor you know affected some
aspect of the sale, such as the price. A suspected causal factor, such as a newspaper ad or bad
weather, is simply a causal factor that existed at the same time as the sale but may not have been
visible to or even noticed by the customer at the time of purchase. In the long run, it is up to advanced
techniques such as data mining to determine if a correlation exists between these suspected causal
factors and any change in sales.
The link between causal factors and business performance leads to the most important business
question surrounding a causal dimension, namely, "Was my promotion profitable?" Equivalently, we
ask, "Did the promotion (or other causal factor) make any difference?" There are at least three
increasingly sophisticated ways to ask this question. The most basic form of the question is: Was I
profitable while the promotion or other causal factor was happening? The intermediate form of the
question is: What was the lift of the promotion compared to the baseline sales? And the most
advanced form of the question is: What were the patterns of cannibalization and time shifting as a
result of the promotion? Which other products were affected, and which other products showed no
effect? In a forthcoming feature article on data mining, I will explain these promotion measures and
describe how a data warehouse is used to answer such questions.
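The first two forms of the question reduce to simple arithmetic once promoted and baseline figures
are available. The Python sketch below is only one plausible reading: it assumes lift is the
fractional increase in units over a baseline and that profit is margin minus a fixed promotion
cost. Those definitions, the function names, and the numbers are assumptions for illustration and
are not taken from the article.

```python
def promotion_profit(units, unit_price, unit_cost, fixed_promotion_cost=0.0):
    """Basic form: profit earned while the causal factor was in effect."""
    return units * (unit_price - unit_cost) - fixed_promotion_cost


def lift(promoted_units, baseline_units):
    """Intermediate form: fractional increase over baseline sales."""
    if baseline_units <= 0:
        raise ValueError("baseline sales must be positive")
    return (promoted_units - baseline_units) / baseline_units


# Hypothetical week of orange juice sales, with and without a price cut.
baseline_units = 100          # units expected with no promotion
promoted_units = 130          # units actually sold during the promotion

week_lift = lift(promoted_units, baseline_units)
profit_with_promo = promotion_profit(promoted_units, 2.00, 1.50,
                                     fixed_promotion_cost=40.0)
profit_at_baseline = promotion_profit(baseline_units, 2.50, 1.50)

print(f"lift: {week_lift:.0%}")
print(f"profit during promotion: {profit_with_promo:.2f}")
print(f"profit expected at baseline: {profit_at_baseline:.2f}")
```

In this made-up case the promotion shows positive lift yet earns less than the baseline week,
which is why the basic and intermediate questions are worth asking separately.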
The existence of a causal dimension often provokes the "what didn't happen?" question. For example,
what was on promotion that did not sell? Even with a causal dimension, you cannot answer these
questions with a fact table that records only what did happen. A companion fact table, called a coverage
table, is needed in this case. The set difference between the coverage fact table and the primary fact
table provides the answer. In my September DBMS article "Factless Fact Tables," I described the
structure of coverage tables that help us show where causal factors did not produce the results we had
hoped for.
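The set difference can be sketched in a few lines of Python. The coverage rows, the sales rows,
and the choice of product, store, and day as the comparison key are assumptions made for this
example; the actual coverage table structure is the one described in the "Factless Fact Tables"
article.

```python
# Hypothetical coverage fact table: one row for every product that was on
# promotion in a store on a day, whether or not it sold. It holds no measures.
coverage_fact = {
    ("orange juice", "S1", "1996-11-04"),
    ("orange juice", "S2", "1996-11-04"),
    ("paper towels", "S1", "1996-11-04"),
}

# Hypothetical primary fact table: one row per line item actually sold.
sales_fact = [
    {"product": "orange juice", "store": "S1", "day": "1996-11-04", "dollars": 2.00},
    {"product": "orange juice", "store": "S1", "day": "1996-11-04", "dollars": 2.00},
    {"product": "milk", "store": "S2", "day": "1996-11-04", "dollars": 1.75},
]

# Collapse the line items to the coverage grain before comparing.
sold = {(row["product"], row["store"], row["day"]) for row in sales_fact}

# On promotion but never sold: in the coverage table, absent from sales.
promoted_but_unsold = sorted(coverage_fact - sold)

for product, store, day in promoted_but_unsold:
    print(f"{product} was on promotion in {store} on {day} but did not sell")
```

In a relational database the same comparison would typically be written as an outer join or a
NOT EXISTS subquery between the two fact tables.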
Ralph Kimball was co-inventor of the Xerox Star workstation, the first commercial product to use
mice, icons, and windows. He was vice president of applications at Metaphor Computer Systems, and
is the founder and former CEO of Red Brick Systems. He now works as an independent consultant
designing large data warehouses. You can reach Ralph through his Internet web page at
http://www.rkimball.com.
Figure 1
--A typical star join schema for a set of retail sales transactions. Each fact table record is a line item
on a particular customer ticket. In this article, I show how to add a sixth "causal" dimension that
explains why the sale took place.
Figure 2
--A useful causal dimension in a retail environment. The new causal key is simply inserted into the
existing fact table without violating the grain of the fact table or changing any existing applications.
