
SIEBEL SYSTEMS, INC.

Siebel Analytics 7/7.5 MetaData Construction Guidelines
Version 1.0
DRAFT

Revision History:

Date               Author        Description
October 28, 2002   Jeff McQuigg  Original creation, incomplete draft
November 11, 2002  Kurt Wolff    Comments
November 12        Jeff McQuigg  Revision 1, draft
November 15        Jeff McQuigg  Revision 2, draft
November 25        Jeff McQuigg  Revision 3, draft, input from Paul Benedict and Kurt Wolff
February 17, 2003  Jeff McQuigg  Version 1

5/11/2008 Siebel Systems, Inc. Confidential and subject to change Page i



Table of Contents

REVISION HISTORY

TABLE OF CONTENTS

EXECUTIVE SUMMARY

ANALYTICS METADATA BEST PRACTICES
    Siebel Analytics 7.x MetaData Modeling Best Practices
    Data Warehousing Best Practices for Siebel Analytics 7.x

SIEBEL ANALYTICS META DATA MODELING SCENARIOS
    Circular Joins
    Multi-Fact table Metrics and Reports
    Dimensional Extension Tables
    Fact Extension Tables
    Combo Tables
    No direct physical link between a base Dimension and a Fact table
    Fragmentation
    Review Presentation Layer Aliases
    Many-to-Many Solutions

DATA WAREHOUSE ARCHITECTURAL RECOMMENDATIONS
    Simple Dimensional Attribute Denormalization
    Use of ROW_WIDs for LOVs
    Ensure Full Referential Integrity in the Physical Data Model
    Convert Non-Indexable Common Filters to Indexable Filters via ETL


Executive Summary
This document provides Siebel-recommended Best Practices and detailed solutions to
several specific scenarios regarding Physical Data Modeling and MetaData
Construction in Siebel Analytics 7.x. It is intended as a reference guide
for any Analytics project that requires development of the Analytics Repository,
both for Siebel OOB Applications and Stand Alone. The recommendations presented
in this document are limited to the three layers in the MetaData (Physical, Business
and Presentation) and do not address other Analytics areas such as report
creation, server tuning, security, or installation. Additionally,
several key Physical Data Modeling recommendations are reviewed which, if
implemented, may reduce the complexity of the Analytics MetaData.

The document contains the following sections:

Analytics MetaData Best Practices, which gives a brief overview of some best practices
regarding overall Data Warehouse design and Siebel Analytics Meta Data
construction.

Siebel Analytics Meta Data Modeling Scenarios, which presents a series of
problems and issues along with detailed solutions.

Data Warehouse Architectural Recommendations, which reviews some design
considerations when modifying or building the base data model.


Analytics MetaData Best Practices


This section contains several best practices for Data Warehouses as they relate to
the Siebel Analytics 7.x MetaData. Many of the specifics behind the implementation
of some of these Best Practices are discussed in greater detail in the following
section.

Siebel Analytics 7.x MetaData Modeling Best Practices

• Create the Business Model with 1:M complex joins between Logical Dimension
Tables and the Facts. The Business Model should ideally resemble a simple star
schema: facts surrounded by several dimensions that link directly into them.
Modeling a snowflake allows more flexibility, but may create more columns
in the Presentation layer.
• For Analytics version 7.0.x, map all Physical Fact sources to one Logical Fact
table. Version 7.5.2+ can support multiple logical Fact tables, but it is
recommended to do so only when entirely new Fact sources are added to an
existing model. Doing so makes major additions to the MetaData easier to
identify. Modifications to existing Fact tables or aggregates should be
added into the same Facts Logical Table.
• Aggregate sources should be created as separate Sources within the single
Logical Fact Table. Their Aggregation Content in the Content tab should describe
the dimensions and levels to which they correspond.
• Combine all like dimensional attributes into one logical dimension table. Where
needed, include data from other dimensions in the main dimension source via
the use of aliases in the Physical Layer. Ideally this should occur during the ETL
for optimal performance.
• Every logical dimension table should have a dimensional hierarchy associated
with it. Ensure that all appropriate Fact sources link to the proper level in the
hierarchy via Aggregation Content in the Content tab.
• Eliminate all physical joins that cross dimensions (Inter-Dimensional circular
joins) through the use of aliases.
• Eliminate all Circular Joins within a dimension in the Physical Model (Intra-Dimensional
Circular Joins) through the creation of physical table aliases.
• To aid in reducing lurking physical joins, import the Physical Data Model without
FKs, and create them as needed. As an added management practice, use aliases
for all tables that are used in the Logical layer. Doing so allows easy
identification of which physical tables are used and which are not.
• Physically model Fact Extension tables to their base tables via 1:M FK joins, and
include them in the existing source for the Logical Table. In certain cases the
Extension table may be joined directly to the fact table to eliminate an additional
join, improving performance.
• Physically model Dimension Extension tables to their base tables via 1:M FK
joins, and include them in the existing source for the logical table. Additionally,
create a source for just the Dimension _DX table, and create a 1:M physical join
between it and the Fact tables it applies to. Note that the PKs of the
Dimension Base table and the Dimension Extension table are identical, and the
relationship is required to be 1:1. Thus, although a circular join will occur in
certain instances, it does not alter the record set or negatively impact
performance.

• Save backups of the online repository before and after every completed unit of
work. If needed, use File | Copy As to make an offline copy with the changes you
have made.
• Ensure the MetaData is generating the correct record set first, then focus on
performance tuning activities.
• Ensure that aliases for Presentation layer columns and tables are not used unless
necessary, and verify that reports do not use the aliases. Be mindful that renaming
an element at either the Presentation or Logical layer creates such an alias automatically.
• When creating table aliases in the Physical Layer, keep the original table name,
followed by the alias name, for example W_DAY_D Hire Date. This keeps all
like tables together in the Physical layer window when tables are displayed
alphabetically. Note that this is the opposite of how the MetaData was developed.
• Opaque Views (Physical Layer tables that consist of a Select statement) should
be used only as a last resort. Ideally a physical table should be created
or, alternatively, a database view.
• In general, push as much processing to the database as possible. This includes
tasks such as filtering, string manipulation and additive measures.
• Ensure that all levels of a hierarchy contain an appropriate value for the "Number
of elements at this level" field. Fact sources are selected on a combination of the
fields selected as well as the levels in the dimensions that they link into. By
adjusting these values, you can alter the fact source that Analytics will select.

Data Warehousing Best Practices for Siebel Analytics 7.x

• Denormalize data into _DX tables via the ETL process to reduce runtime joins to
other tables.
• Join to _WID values instead of codes or names.
• Create new fact tables to support requirements when existing fact tables do not
adequately meet the dimensional needs.
• Create new fact tables or use the _FX to physically store links to other
dimensions when not in the existing data model.
• Move as much of the query logic to the ETL as possible to improve system
response time. Pre-calculation of additive metrics and attributes will reduce
query complexity and therefore response time.
• The Physical Data Model should more closely resemble the Analytics Meta Data
(Star/Snowflake) than an OLTP system (approximately 3NF). As the
Physical model becomes more like the underlying transactional model,
performance problems will most likely arise.
• Avoid Coded Records, where the meaning of a record or field changes depending
upon the value of a field. An example of this would be if joins to the W_LOV_D
table were done on Code and Type, not with the ROW_WID as is currently done in
the Siebel Applications.


Siebel Analytics Meta Data Modeling Scenarios


The following scenarios represent common situations that may arise in an Analytics
project. Along with a description of each, solutions are provided that detail the
techniques used to handle the issue.

Circular Joins
Issue Description
Ensure that all circular joins are removed from the physical model. There are two
types of circular joins, each having a different effect on queries:

Intra-Dimensional

Description: Two join paths exist between two tables in a single source for a Logical
Dimension table.

Effects: The wrong join path is chosen in the SQL, resulting in the wrong
record set.

Example: Frequently seen when small decode tables are linked into two or more
main tables in a dimensional source for the purposes of
denormalization. For example, a Country Lookup Table may be linked
to the Account table and to the Customer table, which is the parent of
Account:

Solution: Alias the lookup table so that one alias links to one base table and
the other alias links to the other. Ex.: Alias the Dim_Country table, call it
Dim_Customer_Country, link it to the Dim_Customer table, and remove
its link to Account. For the original Dim_Country table, break its join
to Dim_Customer. Thus one Dim_Country table links to one base
table, and the other country table links to the other.

Determine if the new aliased Dim_Country table should be mapped to
the same logical columns as the original Country table (in the case of
pure denormalization, where the Customer Country will always equal
the Account Country) or mapped to new columns (in the case of
different context, where Customer Country may not always
equal Account Country).

If there is a pure denormalization occurring (Account_Country =
Customer_Country), then an additional source for the logical table will be
needed. As a source can have only one mapping per logical column,
two sources will be needed, with one source having the original
Dim_Country and another one having the new, aliased Dim_Country.
In this case, map the Dim_Country and Dim_Customer_Country
columns to the same logical columns.

The exception to the rule of no Intra-Dimensional circular joins is when dealing
with _DX or _DH tables. As long as these tables are 1:1 with the base
_D table, the circular join will not reduce the record set.
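
The effect of the two join paths, and of the alias fix, can be sketched outside of Analytics with plain SQL. The following is an illustration only, not SQL generated by Analytics: it uses Python's built-in sqlite3 module, and the key columns (ROW_WID, CUSTOMER_WID, COUNTRY_WID) and sample rows are assumptions invented for the example. With a single Dim_Country, the country returned for an account depends on which path is taken; with two aliases, each context has its own unambiguous path.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
SCHEMA = """
CREATE TABLE Dim_Country  (ROW_WID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE Dim_Customer (ROW_WID INTEGER PRIMARY KEY, NAME TEXT, COUNTRY_WID INTEGER);
CREATE TABLE Dim_Account  (ROW_WID INTEGER PRIMARY KEY, NAME TEXT,
                           CUSTOMER_WID INTEGER, COUNTRY_WID INTEGER);
INSERT INTO Dim_Country  VALUES (1, 'USA'), (2, 'Canada');
INSERT INTO Dim_Customer VALUES (10, 'Acme', 1);
INSERT INTO Dim_Account  VALUES (100, 'Acme East', 10, 1), (101, 'Acme North', 10, 2);
"""

# Path 1: Account -> Country (the account's own country).
VIA_ACCOUNT = """
SELECT a.NAME, k.NAME
FROM Dim_Account a, Dim_Country k
WHERE a.COUNTRY_WID = k.ROW_WID
ORDER BY a.ROW_WID
"""

# Path 2: Account -> Customer -> Country (the parent customer's country).
VIA_CUSTOMER = """
SELECT a.NAME, k.NAME
FROM Dim_Account a, Dim_Customer c, Dim_Country k
WHERE a.CUSTOMER_WID = c.ROW_WID
  AND c.COUNTRY_WID = k.ROW_WID
ORDER BY a.ROW_WID
"""

# Fix: two aliases of Dim_Country, one per base table.
ALIASED = """
SELECT a.NAME, acct_country.NAME, cust_country.NAME
FROM Dim_Account a, Dim_Customer c,
     Dim_Country acct_country, Dim_Country cust_country
WHERE a.CUSTOMER_WID = c.ROW_WID
  AND a.COUNTRY_WID = acct_country.ROW_WID
  AND c.COUNTRY_WID = cust_country.ROW_WID
ORDER BY a.ROW_WID
"""


def run(sql):
    """Run one query against a fresh in-memory copy of the sample schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

In this sample, run(VIA_ACCOUNT) and run(VIA_CUSTOMER) disagree on the country of the second account, which is the ambiguity the aliases remove; run(ALIASED) returns both countries side by side.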

Inter-Dimensional

Description: Two Logical Dimension Tables are physically connected by a join
“behind the scenes”.

Effects: An extra join is issued between the two dimensions when used in
conjunction with facts. In some cases, there may be no impact, but
more frequently the resulting record set may be incorrect.

Example: A common example of this is shown in the diagram below. Here, there
are two Logical Dimension Tables, Time and Customer, and one Fact
Logical Table. The Time Logical Table includes Dim_Day, and the
Customer Logical Table includes both Dim_Customer and Dim_Day.
The relationship between Dim_Customer and Dim_Day indicates when
the customer was acquired.

In this example, the Dim_Day physical table is used in two different
logical dimension tables, which will cause an invalid result. When
Dim_Day, Dim_Customer, and Fact_Sales are used together, Analytics
will join Dim_Day (indicating the day the revenue was booked) with
both the Fact_Sales table and with the Customer_Acquire_Date in
Dim_Customer. This will almost certainly produce incorrect results.


Solution: Alias each of the physical dimension tables and eliminate the join
between them. Join the aliased table into one dimension, and the
original, non-aliased one into the other dimension. In the example,
Dim_Day is aliased, renamed, and joined into the Customer Dimension
via a different join.
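
As a rough illustration of the Dim_Day example, the sketch below reproduces the extra join in plain SQL using Python's built-in sqlite3 module. It is not Analytics output; the column names (ROW_WID, DAY_WID, CUSTOMER_WID, ACQUIRE_DT_WID, REVENUE) and the rows are assumptions made up for the example. With a single Dim_Day, only revenue booked on the customer's acquisition day survives; with an alias for the acquisition date, all revenue is returned.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
SCHEMA = """
CREATE TABLE Dim_Day      (ROW_WID INTEGER PRIMARY KEY, DAY_DT TEXT);
CREATE TABLE Dim_Customer (ROW_WID INTEGER PRIMARY KEY, NAME TEXT, ACQUIRE_DT_WID INTEGER);
CREATE TABLE Fact_Sales   (DAY_WID INTEGER, CUSTOMER_WID INTEGER, REVENUE INTEGER);
INSERT INTO Dim_Day      VALUES (1, '2002-01-01'), (2, '2002-01-02');
INSERT INTO Dim_Customer VALUES (10, 'Acme', 1);
INSERT INTO Fact_Sales   VALUES (1, 10, 100), (2, 10, 250);
"""

# One Dim_Day instance shared by the Time and Customer dimensions:
# the same row must match both the booking day and the acquisition day.
SHARED_DIM_DAY = """
SELECT c.NAME, SUM(f.REVENUE)
FROM Fact_Sales f, Dim_Customer c, Dim_Day d
WHERE f.CUSTOMER_WID = c.ROW_WID
  AND f.DAY_WID = d.ROW_WID
  AND c.ACQUIRE_DT_WID = d.ROW_WID
GROUP BY c.NAME
"""

# Dim_Day aliased for the Customer dimension: the two contexts no longer collide.
ALIASED_DIM_DAY = """
SELECT c.NAME, SUM(f.REVENUE)
FROM Fact_Sales f, Dim_Customer c, Dim_Day booked, Dim_Day acquired
WHERE f.CUSTOMER_WID = c.ROW_WID
  AND f.DAY_WID = booked.ROW_WID
  AND c.ACQUIRE_DT_WID = acquired.ROW_WID
GROUP BY c.NAME
"""


def revenue_by_customer(sql):
    """Run one query against a fresh in-memory copy of the sample schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

In this sample, the shared-table query reports 100 for Acme while the aliased query reports the full 350.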

Comments
Out of the box Siebel Analytics does not have any such circular join issues. An
indication of this can be seen in the numerous aliases for common tables such as
W_ORG_D, W_GEO_D, and W_PERSON_D. For these tables, several aliases have
been made, each indicating its specific type (as defined by the dimensions in the
Business Model). For example, W_ORG_D has aliases for Created by Org,
Competitor, Owner Org, Shipped Account Org, etc. Each of these aliases is used in
one and only one logical table/dimension.

Siebel Analytics 7.x functions best when its underlying data model is of the
Star/Snowflake schema variety. In such a modeling schema, there are no cross
dimensional links – all instances of a physical table are replicated to align with their
context (i.e. Dimension). There is no concept of simple Geography for example;
there are concepts of Sales Geography, Customer Geography, Originating
Geography, Billing Geography, etc. The context of how the specific table is used is
critical; by first identifying the multiple contexts in which it will be used, a
determination of which Logical Tables/Dimensions (and therefore aliases) will be
needed can be more readily made. Continuing with the example, there would most
likely be logical tables in the Business Model for each of the Geographies: Sales
Geography, Customer Geography, Originating Geography, Billing Geography. In
each of these Logical Tables, there would be a corresponding aliased version of the
W_GEO_D table.

If it is determined that one of the joins is not needed then the join can simply be
deleted and the circular join will be solved.

Multi-Fact table Metrics and Reports


Issue Description
It is frequently desired to have a report with metrics from two fact tables or a single
metric derived from two fact tables.

Solution
Ensure that there are no Fact-to-Fact joins in the Physical Layer. Siebel Analytics will
determine that two fact sources (and therefore two fact tables) are required to
retrieve the desired results, issue parallel SQL to the database, and finally join
the results on the Analytics server.

Comments
A fact to fact join is frequently a very poor performing join, given the size of the
record sets involved. A query which requires a metric from two fact sources (2 fact
tables) will be handled via the parallel execution of SQL. A data set is retrieved from
the 1st fact table, and another data set is retrieved from the 2nd fact table. The
Analytics server then merges the two data sets on the Server, ideally on reduced
record sets due to the prior application of filters in the database.

Fact Extensions do not fit into this category; as they are not standalone fact tables,
they require a FK to the base fact table to derive Dimensional keys.

In most cases, this scenario would not occur, as metrics that are used together
would be modeled into the same fact table.
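
The retrieve-then-merge behavior described above can be sketched as follows. This is a simplified illustration using Python's built-in sqlite3 module, not a description of the Analytics server internals: the two fact tables (W_ORDER_F, W_SRVREQ_F), their columns and rows are hypothetical, the two queries are issued one after the other rather than in parallel, and the merge is assumed to be a full outer match on the shared dimension key.

```python
import sqlite3

# Hypothetical fact tables and rows, invented for illustration only.
SCHEMA = """
CREATE TABLE W_ORDER_F  (ACCT_WID INTEGER, ORDER_AMT INTEGER);
CREATE TABLE W_SRVREQ_F (ACCT_WID INTEGER, SR_COUNT INTEGER);
INSERT INTO W_ORDER_F  VALUES (1, 100), (1, 50), (2, 70);
INSERT INTO W_SRVREQ_F VALUES (1, 2), (3, 1);
"""


def fetch_measure(con, sql):
    """Run one aggregate query and return {dimension key: measure}."""
    return dict(con.execute(sql).fetchall())


def stitch(left, right):
    """Merge two per-key result sets; a key missing on one side yields None."""
    keys = sorted(set(left) | set(right))
    return [(key, left.get(key), right.get(key)) for key in keys]


def report():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Each fact table is aggregated separately in the database ...
    orders = fetch_measure(
        con, "SELECT ACCT_WID, SUM(ORDER_AMT) FROM W_ORDER_F GROUP BY ACCT_WID")
    requests = fetch_measure(
        con, "SELECT ACCT_WID, SUM(SR_COUNT) FROM W_SRVREQ_F GROUP BY ACCT_WID")
    con.close()
    # ... and the two reduced result sets are merged outside the database.
    return stitch(orders, requests)
```

Because each fact table is aggregated before the merge, no fact row is ever matched against rows of the other fact table, which is what a direct fact-to-fact join would do.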

Dimensional Extension Tables


Issue Description
How to properly include Dimension Extension tables into Siebel Analytics.

Solution
Physical Layer:

When modeling a Dimension Extension (_DX), join it to the base Dimension table
(_D) as a 1:M FK join on the ROW_WID. By mimicking a parent-child relationship between
_DX and _D, the _DX will not be included in queries which do not require any of its fields,
reducing overjoins. Additionally, join it to the facts in the same manner as the _D
table, using a 1:M between the _DX and the Facts. By doing so, queries that need
values from the _DX and the facts but not the _D will bypass the _D table, improving
run time performance. Queries that use the Facts and values from both the _DX and
the _D will produce an extra join; however, this join is redundant and may be
ignored. This assumes that there is a 1:1 between the _DX and the _D; if this is not
the case then such a join path may not return valid results.

_DH tables should be modeled in an identical manner: 1:M _DH to _D and 1:M _DH
to Facts as shown below. Note both join to the base dimension on the ROW_WID.


Business Layer:

Add the new extension table to the existing source for the Logical Dimension Table.
A separate source for just the base dimension table should also be created for
performance reasons. To support a new source with just the _DX or _DH, the
physical joins between it and the facts must exist as shown above. Thus, a fully
described Logical Table for this sample dimension is as follows:

Source: W_ORG_D includes tables W_ORG_D, W_ORG_DX and W_ORG_DH, required

Source: W_ORG_DX includes table W_ORG_DX, optional for performance

Source: W_ORG_DH includes table W_ORG_DH, optional for performance

Comments
By modeling the _DX as a parent of the _D, Analytics simply thinks of it as another
parent table. As Analytics is primarily designed to support Star/Snowflake schemas,
it will not include parent tables when not necessary.

Note that the Extension table should be modeled in this manner even if it contains
FKs to other dimensional tables.

Fact Extension Tables


Issue Description
How to properly include Fact Extension tables into Siebel Analytics.

Solution
Physical Layer:

Model the Extension table (_FX) to the base Fact table (_F) as a parent-child 1:M.
By mimicking a parent-child relationship between the _FX and the _F, the _FX will not be
included in queries which do not require any of its fields.


Business Layer:

Add the new extension table to the existing source for the Facts. A separate source
for just the base fact table will not be necessary. If the _FX table contains additional
FKs to new dimensions, then the following will be necessary:

• A physical join to the Dimension table

• An adjustment to the Aggregation Content Filter to include the new dimension

Comments
If there is a case where a query will be generated that only contains metrics from the
_FX and dimensions linked directly to the _FX, then create a new source for this _FX.
However this scenario is unlikely, and therefore an additional source just for the _FX
table is not needed. If this scenario is identified, a new fact table should be
considered.

Combo Tables
Issue Description
In some cases, a table may be needed to support both Attributes
and Metrics. It is therefore needed as both a Fact table and a Dimension table.
Refer to the W_ACTIVITY_F table in the Siebel 7.x Core Meta Data model as a real-life
example: it serves as both the dimension and the fact.

Solution
Create the physical model as would normally be done; no aliases are necessary in
most cases. Create a logical Dimension table with a source containing all of the
necessary tables to support the dimension. If a table has both facts and attributes,
this may include a Fact table.

Do the same for the facts by creating a new source with all of the necessary tables to
support the metrics. Note that this may include several dimensions, as there may be
counts off of these dimension tables.

By not aliasing the fact table, an additional join will be eliminated when the two are
used together. If one of the tables uses an alias, then a self-join will be used.

Comments
Care must be taken if the Dimensional version of the physical table is to be used in
queries where it joins to another fact table.

Analytics will mix the direct joins that are desired for the table as a fact and as a
dimension, and will over-join when the table is a dimension.


For example, assume Physical Table Activities is used (without an alias) in both a
Fact Source and a Dimension Source. When Activities serves as the fact table in a
query, it joins to other dimensions such as the Dim_Acct table in a normal manner as
shown above.

This scenario supports Activities as a dimension as well, shown below joining to a
fact table:

A problem will occur when a query uses the Activities table as a dimension, and
includes other dimensions that join to both the Fact table and the Combo table. The
diagram below shows a query that wishes to see facts from W_FACTS_F by Activity
and Account. The query that is generated will include an Inter-Dimensional join (see
section above) between Dim_Acct and Activities, which will most likely alter the
record set:

In this case, alias the Activities table, and have one of the Activities physical tables
be the source for the dimension and the other be the source for the fact. Thus,
Activities Dimension and Activities Fact will have different physical tables, and the
Inter-Dimensional join will not occur:


No direct physical link between a base Dimension and a Fact table
Issue Description
There may be a need to link a particular dimension table with a particular fact table,
but no FK to the dimension exists on the fact table. When another table exists which
contains this relationship, it can be modeled in Siebel Analytics to create the link.

For example, a link is needed from W_PROGRAM_D to a fact table W_INVITEM_F,
but there is no direct join possible between the two tables:

However, there are other tables in the Physical model that can be used to create this
join, as shown below:

Through the use of the tables W_PERSON_D and W_CAMP_HIST_F, a link between
W_PROGRAM_D and W_INVITEM_F can be established via the following steps. Note
that this involves a Many-to-Many relationship between W_PROGRAM_D and
W_INVITEM_F, and as such additional techniques discussed in a later section may be
applicable.


Solutions
There are two alternative solutions to this problem. Solution (A) involves modifying
the Facts, and Solution (B) involves modifying the Dimension. Solution A is simpler
than Solution B, and is therefore recommended.

Solution A

This solution involves the creation of a new fact source for the Logical Fact table,
with the added linkage tables included. By doing this, you are effectively adding a
new FK to the logical fact source. Then, add the new dimension to the Aggregation
Content filter for the source. This is in effect identical to Fact Extension (_FX) tables
that have additional FKs to other dimensions. Thus, the Facts Logical Table will have
the following two sources:

Source1: W_INVITEM_F, Aggregation Content at current levels

Source2: W_INVITEM_F, W_PERSON_D, W_CAMP_HIST_F, Aggregation Content
at same levels as Source1, plus the Program Dimension

Solution B

Solution B involves putting the two tables used to join (W_PERSON_D and
W_CAMP_HIST_F) into a new, lower level in the existing Program dimension. This
requires three main steps: first the creation of a new logical table source for the
dimension with all three tables (W_PROGRAM_D, W_CAMP_HIST_F and
W_PERSON_D). Second, a new lower level in the Dimensional Hierarchy needs to be
created. Finally, the existing fact source needs to be adjusted to include the new
Dimension by adding to its Aggregation Content filters.


Comments
This is a common need, and does represent a Many-to-Many scenario. As such,
metrics will be over-counted.
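
The over-counting can be seen in a small sketch of the join path W_INVITEM_F, W_PERSON_D, W_CAMP_HIST_F, W_PROGRAM_D. The example uses Python's built-in sqlite3 module; the table names come from the scenario above, but the columns (PERSON_WID, PROGRAM_WID, AMT, NAME) and the rows are assumptions invented for illustration and do not reflect the actual warehouse schema.

```python
import sqlite3

# Table names follow the scenario; columns and rows are hypothetical.
SCHEMA = """
CREATE TABLE W_PROGRAM_D   (ROW_WID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE W_PERSON_D    (ROW_WID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE W_CAMP_HIST_F (PERSON_WID INTEGER, PROGRAM_WID INTEGER);
CREATE TABLE W_INVITEM_F   (PERSON_WID INTEGER, AMT INTEGER);
INSERT INTO W_PROGRAM_D   VALUES (1, 'Fall Promo'), (2, 'Spring Promo');
INSERT INTO W_PERSON_D    VALUES (7, 'Pat');
INSERT INTO W_CAMP_HIST_F VALUES (7, 1), (7, 2);   -- one person, two programs
INSERT INTO W_INVITEM_F   VALUES (7, 100);         -- one invoice item
"""

# Invoice amount by program, reached through the person and campaign history.
AMT_BY_PROGRAM = """
SELECT p.NAME, SUM(f.AMT)
FROM W_INVITEM_F f, W_PERSON_D c, W_CAMP_HIST_F h, W_PROGRAM_D p
WHERE f.PERSON_WID = c.ROW_WID
  AND h.PERSON_WID = c.ROW_WID
  AND h.PROGRAM_WID = p.ROW_WID
GROUP BY p.NAME
ORDER BY p.NAME
"""

TRUE_TOTAL = "SELECT SUM(AMT) FROM W_INVITEM_F"


def query(sql):
    """Run one query against a fresh in-memory copy of the sample schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

In this sample, each program is credited with the full 100, so the per-program amounts add up to 200 although only 100 was invoiced.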

Fragmentation
Issue Description
How to properly implement Fragmentation on a fact source.

Solution
In this example, the filter will be applied to the W_REVN_F_CURR table, which holds
data for 2002 and beyond, and the W_REVN_F_HIST table, which holds data prior to
2002.

• Map both fact tables to separate sources in the Facts. Everything about
them should be identical except for the table name.

• For each fragmented fact table source, enter its filter in the Fragmentation
content section as follows, using a filter on the Dimension:

TheSystem.Time."Day" < DATE '2002-01-01' (for historical fact source)

TheSystem.Time."Day" >= DATE '2002-01-01' (for current fact source)

• All other variations and derivatives from the same hierarchy must be
addressed as well, for example Month, Week, Quarter, Year, etc.

• Be sure to check the check box labeled “This source should be combined with
other sources at this level” if the fact source is a sub-set of the entire data
set. If the source has data that overlaps another table, leave this check box
unchecked.
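
The intent of the two fragmentation filters can be sketched as a simple range check. The function below is a hypothetical illustration in Python, not the selection logic of the Analytics server: it only shows which of the two fragments a Day range needs to touch, given the 2002-01-01 boundary used in this example.

```python
from datetime import date

CUTOFF = date(2002, 1, 1)

# Each fragment covers [low, high); None means unbounded on that side.
FRAGMENTS = {
    "W_REVN_F_HIST": (None, CUTOFF),   # Day <  2002-01-01
    "W_REVN_F_CURR": (CUTOFF, None),   # Day >= 2002-01-01
}


def fragments_for(start, end):
    """Return the fragments that overlap the inclusive Day range [start, end]."""
    needed = []
    for name, (low, high) in FRAGMENTS.items():
        starts_before_high = high is None or start < high
        ends_at_or_after_low = low is None or end >= low
        if starts_before_high and ends_at_or_after_low:
            needed.append(name)
    return needed
```

For instance, fragments_for(date(2001, 12, 25), date(2001, 12, 31)) needs only the historical table, while a range ending on date(2002, 1, 5) needs both.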

Differing queries will display the following types of behavior:

Query by Day, with a range covering one of the fragments: Only one fragment
will be used, with Analytics performing the filter.

SELECT
W_REVN_F.REVN,
W_DAY_D."Dim Date"
FROM TheSystem
WHERE
W_DAY_D."Dim Date" BETWEEN date '2001-12-25' AND date '2001-12-31'

select T1030."REVN" as c4,
T294."DAY_DT" as c5
from
"W_DAY_D" T294,
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"

Query by Day, with a range covering both fragments: Unfiltered parallel SQL will
be issued, one for each fragment. Analytics will then perform the filter:

SELECT
W_REVN_F.REVN,
W_DAY_D."Dim Date"
FROM TheSystem
WHERE
W_DAY_D."Dim Date" BETWEEN date '2001-12-25' AND date '2002-01-05'

-------------------- Sending query to database named OLAP (id: <<8680457>>):

select T1030."REVN" as c1,
T294."DAY_DT" as c2
from
"W_DAY_D" T294,
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"

-------------------- Sending query to database named OLAP (id: <<8680532>>):

select T1030."REVN" as c1,
T294."DAY_DT" as c2
from
"W_DAY_D" T294,
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"

For each differing criterion that may be needed, a similar process must be
undertaken. Continuing with the sample above, the tables will be described to
Analytics so that if a user runs a report by Year, Analytics will know how to break up
the query.

• Add CAL_YEAR into the Logical Table for Facts, map it to each of the
Fragments, and add it to the Logical Fact Table key.

• Add additional Fragment filters as follows:

TheSystem.Time."CAL_YEAR" < 2002 (for historical fact source)

TheSystem.Time."CAL_YEAR" >= 2002 (for current fact source)

Note that the query results are different this time:


For a query on a single year, one fragment is used with the filter in the query:

SELECT
W_REVN_F.REVN,
W_DAY_D.CAL_YEAR
FROM TheSystem
WHERE
W_DAY_D.CAL_YEAR = 2001

select sum(T1030."REVN") as c1,
T294."CAL_YEAR" as c2
from
"W_DAY_D" T294,
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID" and T294."CAL_YEAR" = 2001
group by T294."CAL_YEAR"

For a query that hits multiple years, a single select containing two sub-selects and a
union all is used. Each sub-select has the combined filter on it (e.g. for the fact
table with 2002+, its select has a where clause of CAL_YEAR = 2002 or
CAL_YEAR = 2001):

SELECT
W_REVN_F.REVN,
W_DAY_D.CAL_YEAR
FROM TheSystem
WHERE
W_DAY_D.CAL_YEAR IN (2001, 2002)

select sum(D3.c3) as c1,
D3.c2 as c2
from
(select T294."CAL_YEAR" as c2,
T1030."REVN" as c3
from
"W_DAY_D" T294,
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID" and (T294."CAL_YEAR" = 2001 or
T294."CAL_YEAR" = 2002)
union all
select T294."CAL_YEAR" as c2,
T1030."REVN" as c3
from
"W_DAY_D" T294,
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID" and (T294."CAL_YEAR" = 2001 or
T294."CAL_YEAR" = 2002)
) D3
group by D3.c2
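
The shape of the multi-year query can be reproduced on toy data to show why union all, rather than union, is the appropriate set operator. The sketch below uses Python's built-in sqlite3 module; the table names W_REVN_F_HIST and W_REVN_F_CURR come from the example above, while the rows are invented and the query is a hand-written approximation of the generated SQL, not actual log output.

```python
import sqlite3

# Rows are invented; the historical fragment holds two identical revenue rows.
SCHEMA = """
CREATE TABLE W_DAY_D       (ROW_WID INTEGER PRIMARY KEY, DAY_DT TEXT, CAL_YEAR INTEGER);
CREATE TABLE W_REVN_F_HIST (CLOSE_DT_WID INTEGER, REVN INTEGER);
CREATE TABLE W_REVN_F_CURR (CLOSE_DT_WID INTEGER, REVN INTEGER);
INSERT INTO W_DAY_D       VALUES (1, '2001-12-31', 2001), (2, '2002-01-01', 2002);
INSERT INTO W_REVN_F_HIST VALUES (1, 100), (1, 100);
INSERT INTO W_REVN_F_CURR VALUES (2, 500);
"""

# One sub-select per fragment, combined and then aggregated by year.
QUERY = """
select sum(D3.c3) as c1, D3.c2 as c2
from
 (select T294.CAL_YEAR as c2, T1030.REVN as c3
  from W_DAY_D T294, W_REVN_F_HIST T1030
  where T294.ROW_WID = T1030.CLOSE_DT_WID
    and (T294.CAL_YEAR = 2001 or T294.CAL_YEAR = 2002)
  {set_op}
  select T294.CAL_YEAR as c2, T1030.REVN as c3
  from W_DAY_D T294, W_REVN_F_CURR T1030
  where T294.ROW_WID = T1030.CLOSE_DT_WID
    and (T294.CAL_YEAR = 2001 or T294.CAL_YEAR = 2002)
 ) D3
group by D3.c2
order by D3.c2
"""


def revenue_by_year(set_op="union all"):
    """Return (revenue, year) rows using the given set operator between fragments."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY.format(set_op=set_op)).fetchall()
    con.close()
    return rows
```

With union all, both identical 2001 rows are kept and summed; a plain union would collapse them into one row before the sum and understate 2001 revenue.
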
Comments
Be aware of large tables that use database partitioning. In order for the database to
properly use its partitions, the query must be structured in such a way that filtering
occurs on the Fact table, and not the dimension table. This can be accomplished by
denormalizing some of the time elements into the Time Logical table as new sources.

Review Presentation Layer Aliases


The use of aliases on the Presentation Layer for both tables and columns can have
undesired and difficult-to-diagnose effects. An alias is automatically created when
the Presentation Layer object is renamed. Aliasing allows front end reports to
continue to use the names on which they were developed for backward compatibility
among Presentation layer versions.

Verify that all aliases are removed from the Presentation Layer unless required.
Note that the removal of aliases may invalidate pre-existing reports if
these reports were constructed before the column or table was renamed. These
reports should be re-developed with the new Presentation Table and Column names,
replacing the older ones.

Many-to-Many Solutions
It is common to want to model a Many-to-Many relationship between Dimensions
and Facts. For example, it may be necessary to see all employees associated with
an opportunity, not just the primary. This section presents a series of tools and
techniques that may be applied to solve a particular case.

Technique #1: Select a Primary


Although not a technical solution, the best way to solve the M:M problem is to
eliminate it. By selecting one of the many dimensional records that are associated
with a fact, the entire problem can be avoided. In the Siebel OLTP, Primaries are
used throughout the model, which are carried over and used in the Analytics model.
If it is at all possible to identify a primary, and the use of the primary is acceptable
to the user community, then it is recommended to use this technique.

Technique #2: Direct Modeling into the Dimension


This is a straightforward technique in which the table that serves as the
intersection table is modeled into a lower level in the Dimension. The specifics of
this technique are similar to those outlined in Solution B of the No direct physical
link between a base Dimension and a Fact table section above.

Note that over-counting will occur when performing the many-to-many join.

Technique #3a: Use of a Bridge Table


Instead of modeling the relationship table into a new lower level in the dimension as
in Technique #2, the relationship table can become a separate logical table that
serves as the Bridge between the dimension and the facts. Create a new Logical
table with the M:M relationship table as the source, mark the logical table as a
Bridge table, and adjust the Business model to show the relationship of Facts:Bridge
as 1:M and Bridge:Dimension as M:1. The indication that the Logical Table is a
Bridge table is merely an indicator to Analytics that the table is not a Fact table,
which it assumes to be any lowest-level table in the data model.

Note that over-counting will occur when performing the many-to-many join.
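
The over-counting noted above can be reproduced outside Analytics. The following is
a minimal sketch using Python's built-in sqlite3 module. The table and column names
(fact_opty, bridge_opty_emp, dim_employee and their keys) are hypothetical and are
not Siebel schema objects; the point is only that a fact row is repeated once per
bridge row before it is summed.

```python
import sqlite3

def build_bridge_demo():
    """Create an in-memory fact/bridge/dimension trio (all names hypothetical)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_employee (emp_wid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_opty (opty_wid INTEGER PRIMARY KEY, revenue INTEGER);
        -- The bridge holds one row per (opportunity, employee) pair.
        CREATE TABLE bridge_opty_emp (opty_wid INTEGER, emp_wid INTEGER);
        INSERT INTO dim_employee VALUES (1, 'Employee A'), (2, 'Employee B');
        INSERT INTO fact_opty VALUES (100, 1000000);
        INSERT INTO bridge_opty_emp VALUES (100, 1), (100, 2);
    """)
    return con

def total_revenue(con, through_bridge):
    """Sum revenue either directly from the fact or via the bridge join."""
    if through_bridge:
        sql = """SELECT SUM(f.revenue)
                 FROM fact_opty f
                 JOIN bridge_opty_emp b ON b.opty_wid = f.opty_wid
                 JOIN dim_employee d ON d.emp_wid = b.emp_wid"""
    else:
        sql = "SELECT SUM(revenue) FROM fact_opty"
    return con.execute(sql).fetchone()[0]

con = build_bridge_demo()
print(total_revenue(con, through_bridge=False))  # fact table only
print(total_revenue(con, through_bridge=True))   # fact row repeated per bridge row
```

With one opportunity and two bridge rows, the bridged total is double the fact-only
total, which is the behavior the weighted bridge and level-based measure techniques
are meant to address.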

Technique #3b: Use a Weighted Bridge Table


Similar to Technique #3a, this technique is the classic Kimball approach, where the
Bridge table employs weighting factors to prorate a total value over multiple records.
For example, if there is one Opportunity worth $1,000,000 and there are two
Employees associated with it, the bridge table might contain a record for each with a
weighting factor of 0.5. In this way, each employee will be associated with 0.5 of
the whole amount of $1,000,000, or $500,000. If it is determined that Employee A
should receive 75% of the credit, then the weighting factors would be stored as 0.75
and 0.25, which would give Employee A 75% of the total, or $750,000.

It is important to note that the weighting factors must all add up to 1 (One), as they
are effectively percentages of a whole. Additional ETL effort will be required to
complete this solution.

This technique eliminates over-counting, but may be difficult to implement if users
are not comfortable prorating a value over several records.
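
The proration arithmetic and the sum-to-one rule can be sketched as a small ETL
helper. This is an illustration only: the function name prorate and the employee
keys are hypothetical, and exact fractions are used here simply so that the
sum-to-one check is not affected by floating-point rounding.

```python
from fractions import Fraction

def prorate(amount, weights):
    """Split amount across bridge rows using their weighting factors.

    weights maps a (hypothetical) employee key to its weighting factor.
    Raises ValueError if the factors do not add up to exactly 1.
    """
    if sum(weights.values()) != 1:
        raise ValueError("weighting factors must add up to 1")
    return {emp: amount * w for emp, w in weights.items()}

# Employee A receives 75% of the credit, Employee B the remaining 25%.
weights = {"Employee A": Fraction(3, 4), "Employee B": Fraction(1, 4)}
shares = prorate(1_000_000, weights)
```

Here shares holds 750,000 for Employee A and 250,000 for Employee B, and the two
shares add back up to the original $1,000,000. Rejecting weight sets that do not
sum to 1 during the load is one way to spend the additional ETL effort mentioned
above.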

Technique #4: Use Level Based Measures


As an enhancement to Techniques 2 and 3, level-based measures can help prevent the
over-counting problem associated with each. Explicitly binding a metric or measure
to a specific level in a dimension indicates that the metric will be viewed at that
level. If the metrics in a fact table are to be viewed by a Dimension with which the
fact has a M:M relationship, those metrics can be set to a level in that dimension,
thereby forcing the records to be broken out across it. By forcing a breakout of
rows (one fact row for each dimensional row), aggregation across the related
dimensional rows is prevented, and therefore over-counting will not occur.

As an example, suppose there is a M:M between Employee and Fact_Opty_Revenue.


The data in the tables indicate that Tom, Larry and Bill are all linked to an
Opportunity worth $9 million. The user builds a report that asks for the Opportunity
Type and the total Potential Opportunity Revenue. Without level-setting the metrics
on the fact table, this report, which does not include the Employee dimension, will
over-count, because each of the three dimension records is brought into the query
and the three resulting rows are aggregated into one:

Opportunity Type    Potential Opportunity Revenue
Software Sales      $27,000,000

By level setting the Revenue metrics to the Employee level in the Employee
Dimension, this same report will return the following:

Opportunity Type    Potential Opportunity Revenue
Software Sales      $9,000,000
Software Sales      $9,000,000
Software Sales      $9,000,000

Although the cause of the breakout is not intuitively obvious to the end user, the
over-counting scenario is prevented. When the user adds the Employee to the
report, the breakout becomes clearer:

Opportunity Type    Employee    Potential Opportunity Revenue
Software Sales      Larry       $9,000,000
Software Sales      Tom         $9,000,000
Software Sales      Bill        $9,000,000
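
The effect of level-setting can be emulated in plain SQL. The sketch below uses
Python's sqlite3 module and is not how the Analytics server builds its queries; it
only mimics the result by adding the employee key to the grouping. The link table
opty_employee and all column names are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_opty_revenue (opty_id INTEGER PRIMARY KEY,
                                    opty_type TEXT, revenue INTEGER);
    -- Hypothetical M:M link between opportunities and employees.
    CREATE TABLE opty_employee (opty_id INTEGER, emp_id INTEGER);
    INSERT INTO employee VALUES (1, 'Larry'), (2, 'Tom'), (3, 'Bill');
    INSERT INTO fact_opty_revenue VALUES (10, 'Software Sales', 9000000);
    INSERT INTO opty_employee VALUES (10, 1), (10, 2), (10, 3);
""")

JOIN = """FROM fact_opty_revenue f
          JOIN opty_employee x ON x.opty_id = f.opty_id
          JOIN employee e ON e.emp_id = x.emp_id"""

# Measure not level-set: the three joined rows collapse into one total.
unbound = con.execute(
    f"SELECT f.opty_type, SUM(f.revenue) {JOIN} GROUP BY f.opty_type"
).fetchall()

# Emulating a measure bound to the Employee level: the employee key is
# added to the grouping even though the report does not display it.
bound = con.execute(
    f"SELECT f.opty_type, SUM(f.revenue) {JOIN} "
    "GROUP BY f.opty_type, e.emp_id ORDER BY e.emp_id"
).fetchall()
```

The unbound query returns a single Software Sales row of 27,000,000, while the
bound query returns three Software Sales rows of 9,000,000 each, matching the two
report outputs shown above.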

Technique #5: Lower the Fact Table


The most complicated and involved solution is to lower the level of the fact table,
and create a 1:M between the Dimensions and the Facts. This involves a business
rule to split up the metrics and spread them over all possible dimensional records.
In the example above, the simplest spread would be to assign Larry, Tom and Bill
each 1/3 of the total amount of $9,000,000, or $3,000,000. Thus, a report that does
not break out by Employee will still total to the correct $9,000,000. Note that this
would require three records in the fact table instead of one, hence the concept of
lowering the level of detail in the fact.
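
The simplest even spread can be sketched as follows. The function name
spread_evenly and the row layout are hypothetical, and the rule for handling a
remainder (giving the extra units to the first rows) is an assumption added here so
that the lowered rows always total the original amount; a real business rule may
differ.

```python
def spread_evenly(amount, employees):
    """Return one lower-level fact row per employee with an even share.

    Works in whole currency units; any remainder goes to the first rows so
    the rows always add back up to the original amount.
    """
    base, remainder = divmod(amount, len(employees))
    return [
        {"employee": emp, "revenue": base + (1 if i < remainder else 0)}
        for i, emp in enumerate(employees)
    ]

# One $9,000,000 fact row becomes three rows, one per employee.
rows = spread_evenly(9_000_000, ["Larry", "Tom", "Bill"])
```

Each of the three rows carries 3,000,000, so a report that does not break out by
Employee still sums to 9,000,000.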

Data Warehouse Architectural Recommendations
This section is intended to aid in the design process of the Physical Data Model upon
which Siebel Analytics sits. Many difficult Analytics MetaData Modeling scenarios can
be avoided through a properly designed Physical Data Model. As Siebel Analytics
MetaData Modeling issues are addressed in the above section, this section is aimed
at the Physical Data Model and items that may be added to an ETL process.

Simple Dimensional Attribute Denormalization


In many cases, logical dimension tables in the MetaData are overly complex and
require many tables to provide the necessary attributes. When a dimension requires
the inclusion of large tables, including fact tables, to link to this data, run time
performance will suffer greatly. These values should be determined during the ETL
process and stored in the _DX tables.

As an example, assume that the Employee Dimension requires the NAME field from
W_ORG_D, and two ATTRIB columns from W_TERR_DX. In order to accomplish this,
Analytics must join the W_PERSON_F, W_PERSON_FX, W_TERR_DX and W_ORG_D tables at
run time. Applying this logic at load time and storing these 3 values in the
W_PERSON_DX table will not only speed up the queries, but will also simplify the
Analytics MetaData by removing 4 aliases and their required joins.

This type of change can be performed independently on each dimension in a serial
fashion. For example, Employee can be addressed first, followed by Account, then
Opportunity, etc. For each dimension, review the Column Mapping for each source,
and determine if these columns can be added to the base level _DX. Next, modify
the Data Model (if need be), then the ETL code, re-import the affected tables, delete
the aliases and joins that are no longer needed, and remap the column in the Logical
Table source.

A second example demonstrates how this is done in the Siebel Analytics Horizontal
Application:
Source Table    Source Column    OLAP Table     OLAP Column
S_ORG_EXT       LOC              W_PERSON_D     EMP_ACCNT_LOC
S_ORG_EXT       LOC              W_OPTY_D       ACCNT_LOC
S_ORG_EXT       LOC              W_ORG_D        ACCNT_LOC
S_ORG_EXT       LOC              W_PRODUCT_D    VENDOR_LOC

This table clearly shows how the physical LOC column is used in four different
dimensions. With such denormalizations, it will not be necessary to perform any
additional joins to retrieve the data from the LOC column from other tables.
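
A load-time denormalization of this kind might look like the sketch below, which
uses Python's sqlite3 module as a stand-in for the warehouse. The table names and
the LOC and ACCNT_LOC columns come from the mapping above; the SRC_ORG_ID key
column, the sample rows and the helper denormalize_loc are assumptions made purely
for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Source (OLTP) table; only LOC comes from the mapping table above.
    CREATE TABLE S_ORG_EXT (ROW_ID TEXT PRIMARY KEY, LOC TEXT);
    -- Warehouse dimensions; SRC_ORG_ID is a hypothetical key to the source.
    CREATE TABLE W_ORG_D  (ROW_WID INTEGER PRIMARY KEY, SRC_ORG_ID TEXT,
                           ACCNT_LOC TEXT);
    CREATE TABLE W_OPTY_D (ROW_WID INTEGER PRIMARY KEY, SRC_ORG_ID TEXT,
                           ACCNT_LOC TEXT);
    INSERT INTO S_ORG_EXT VALUES ('1-A', 'HQ'), ('1-B', 'East');
    INSERT INTO W_ORG_D  (ROW_WID, SRC_ORG_ID) VALUES (1, '1-A'), (2, '1-B');
    INSERT INTO W_OPTY_D (ROW_WID, SRC_ORG_ID) VALUES (10, '1-A');
""")

def denormalize_loc(con, target_table, target_column):
    """Copy S_ORG_EXT.LOC into target_table.target_column at load time."""
    # Identifiers are interpolated only because they come from the fixed
    # mapping below, never from user input.
    con.execute(
        f"UPDATE {target_table} SET {target_column} = "
        f"(SELECT s.LOC FROM S_ORG_EXT s "
        f"WHERE s.ROW_ID = {target_table}.SRC_ORG_ID)"
    )

for table, column in [("W_ORG_D", "ACCNT_LOC"), ("W_OPTY_D", "ACCNT_LOC")]:
    denormalize_loc(con, table, column)

# Report-time query: no join back to the organization data is needed.
opty_locs = con.execute("SELECT ROW_WID, ACCNT_LOC FROM W_OPTY_D").fetchall()
```

After the load step, each dimension carries its own copy of the location, so the
report-time query reads a single table.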

Use of ROW_WIDs for LOVs


The use of ROW_WIDs when joining to the W_LOV_D table will simplify some of the
joins and speed up query execution. LOV lookups can be done by joining on the NAME
column and filtering on the TYPE column. Although this can be modeled in Analytics,
it is far from ideal, as it complicates the model and forces additional and
unnecessary joins. Modify the ETL process to do these lookups, and join based on the
ROW_WIDs. Note that the aliasing of W_LOV_D will still be required.
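
The ETL-side lookup can be sketched as a simple in-memory map from (TYPE, NAME) to
ROW_WID. The sample W_LOV_D rows, the staged record layout, the STATUS_WID column
and the helper resolve_lov are all hypothetical; only the ROW_WID, TYPE and NAME
columns are taken from the text above.

```python
# Hypothetical W_LOV_D rows: (ROW_WID, TYPE, NAME).
W_LOV_D = [
    (1, "OPTY_STATUS", "Open"),
    (2, "OPTY_STATUS", "Closed"),
    (3, "ACCNT_TYPE", "Open"),  # same NAME, different TYPE
]

# Build the lookup once per load: (TYPE, NAME) -> ROW_WID.
lov_wid = {(lov_type, name): wid for wid, lov_type, name in W_LOV_D}

def resolve_lov(record, lov_type, name_field, wid_field):
    """Return a copy of record with the LOV's ROW_WID stored in wid_field."""
    out = dict(record)
    # Unknown values resolve to None; a real load would need a policy for them.
    out[wid_field] = lov_wid.get((lov_type, record[name_field]))
    return out

staged = {"OPTY_ID": "1-X", "STATUS": "Open"}
loaded = resolve_lov(staged, "OPTY_STATUS", "STATUS", "STATUS_WID")
```

Because the TYPE is part of the lookup key, the staged record resolves to the
OPTY_STATUS row rather than the ACCNT_TYPE row with the same NAME, and queries can
then join on the stored ROW_WID alone.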

Ensure Full Referential Integrity in the Physical Data Model


The use of Outer Joins should be severely limited or eliminated in a properly
designed Data Warehouse. It is a best practice to ensure RI by having all children
assigned to a parent, ensuring that they are not omitted in queries. In cases where
a parent record does not exist, assign the child to the ‘Unspecified’, ‘Unknown’, ‘N/A’
or similar type of record. This will allow a proper 1:M FK join to be built in Analytics,
which provides the following benefits:

• The parent table will not be used unless it is used in the query (no overjoin)

• An index will be used on the parent when it is joined into queries using the
child

• No child records will be lost in queries that group by the parent. This is
critical, as a Data Warehouse should account for all numerical values, even if
the proper dimensionality is not known

• It allows for simpler Analytics MetaData

Note that the ETL should ensure RI before loading, as it is good warehousing practice
to remove FK constraints in the database to speed load times.
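
A pre-load RI check of this kind can be sketched as below. The key value chosen
for the 'Unspecified' row, the PARENT_WID and AMOUNT field names and the sample
data are assumptions for illustration; the rollup function imitates an inner join
so that the effect of missing parents is visible.

```python
UNSPECIFIED_WID = 0  # hypothetical key of the 'Unspecified' parent row

def enforce_ri(children, parent_wids, fk="PARENT_WID"):
    """Point every child whose parent is missing at the 'Unspecified' row."""
    fixed = []
    for child in children:
        row = dict(child)
        if row.get(fk) not in parent_wids:
            row[fk] = UNSPECIFIED_WID
        fixed.append(row)
    return fixed

parents = {0: "Unspecified", 1: "Acme"}
facts = [
    {"PARENT_WID": 1, "AMOUNT": 100},
    {"PARENT_WID": 7, "AMOUNT": 40},     # parent 7 was never loaded
    {"PARENT_WID": None, "AMOUNT": 10},  # no parent supplied
]

def total_by_parent(rows):
    """Inner-join style rollup: rows without a matching parent are dropped."""
    totals = {}
    for row in rows:
        name = parents.get(row["PARENT_WID"])
        if name is not None:
            totals[name] = totals.get(name, 0) + row["AMOUNT"]
    return totals

before = total_by_parent(facts)
after = total_by_parent(enforce_ri(facts, parents))
```

Before the fix the rollup accounts for only 100 of the 150 loaded; afterwards the
remaining 50 appears under 'Unspecified', so every numerical value is accounted for
without an outer join.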

Convert Non-Indexable Common Filters to Indexable Filters via ETL
In some cases, a complex filter is commonly applied to a column. If this filter
appears in multiple reports, or is used very heavily, then its corresponding
logic should be moved into the ETL. For example, the following filter used in a
report should be converted into an indicator or flag via the ETL:
W_ORG_D.NAME NOT LIKE '%SI%'. A pattern with a leading wildcard typically cannot
use an index on the column. By performing this calculation during the load and
creating an indicator on the table, an index may be properly used, improving
response time.
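
As a rough sketch of this conversion, the example below uses Python's sqlite3
module in place of the warehouse database. The NON_SI_FLG column, the index name
and the sample rows are hypothetical. Note also that LIKE in SQLite is
case-insensitive for ASCII by default, which may differ from the target database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- NON_SI_FLG is a hypothetical indicator column added for this sketch.
    CREATE TABLE W_ORG_D (ROW_WID INTEGER PRIMARY KEY, NAME TEXT,
                          NON_SI_FLG TEXT);
    INSERT INTO W_ORG_D (ROW_WID, NAME) VALUES
        (1, 'Acme'), (2, 'SI Partners'), (3, 'Globex');
""")

# ETL step: evaluate the expensive predicate once per row at load time.
con.execute("""
    UPDATE W_ORG_D
    SET NON_SI_FLG = CASE WHEN NAME NOT LIKE '%SI%' THEN 'Y' ELSE 'N' END
""")
con.execute("CREATE INDEX W_ORG_D_NON_SI ON W_ORG_D (NON_SI_FLG)")

# Report-time filter: a plain equality test on the indicator column.
names = [row[0] for row in con.execute(
    "SELECT NAME FROM W_ORG_D WHERE NON_SI_FLG = 'Y' ORDER BY ROW_WID")]
```

The report now filters on NON_SI_FLG = 'Y' instead of re-evaluating the pattern
match for every query. Whether the optimizer actually chooses the index depends on
the database and on how selective the flag is.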
