Siebel Analytics 7/7.5 MetaData Construction Guidelines
Version 1.0 DRAFT
Revision History:
Date Author Description
Table of Contents
Revision History
No direct physical link between a base Dimension and a Fact table
Fragmentation
Executive Summary
This document provides Siebel-recommended best practices and solution details for
several specific scenarios regarding Physical Data Modeling and MetaData
Construction in Siebel Analytics 7.x. It is intended as a reference guide
for any Analytics project that requires development of the Analytics Repository,
both for Siebel OOB Applications and Stand Alone. The recommendations presented
in this document are limited to the three layers in the MetaData (Physical, Business
and Presentation) and do not address other Analytics areas such as report
creation, server tuning, security, or installation. Additionally,
several key Physical Data Modeling recommendations are reviewed which, if
implemented, may reduce the complexity of the Analytics MetaData.
Analytics MetaData Best Practices: a brief overview of best practices
regarding overall Data Warehouse design and Siebel Analytics MetaData
construction.
Save backups of the online repository before and after every completed unit of
work. If needed, use File | Copy As to make an offline copy with the changes you
have made.
Ensure the MetaData is generating the correct record set first, then focus on
performance tuning activities.
Ensure that aliases for presentation layer columns and tables are not used unless
necessary. Verify that reports do not use the aliases. Be mindful that renaming
an element at either the Presentation or Logical layer will cause an alias to be created.
When creating table aliases in the Physical Layer, keep the original table name,
followed by the alias name, for example W_DAY_D Hire Date. This will keep all
like tables together in the Physical layer window when tables are displayed
alphabetically. Note this is the opposite of how the MetaData was developed.
Opaque Views (Physical Layer tables that consist of a Select statement) should
be used only as a last resort. Ideally a physical table should be created,
or alternatively, a database view.
In general, push as much processing to the database as possible. This includes
tasks such as filtering, string manipulation and additive measures.
Ensure that all levels of a hierarchy contain an appropriate value for the Number
of elements at this level field. Fact sources are selected on a combination of the
fields selected as well as the levels in the dimensions that they link into. By
adjusting these values, you can alter the fact source that Analytics will select.
Circular Joins
Issue Description
Ensure that all circular joins are removed from the physical model. There are two
types of circular joins, each having a different effect on queries:
Intra-Dimensional
Description: Two join paths exist between two tables in a single source for a Logical
Dimension table.
Effects: The wrong join path is chosen in the SQL, resulting in the wrong
record set.
Example: Frequently seen when small decode tables are linked into two or more
main tables in a dimensional source for the purposes of
denormalization. For example, a Country Lookup Table may be linked
to the Account table and to the Customer table, which is the parent of
Account:
Solution: Alias each of the tables, and have one alias link to one base table, and
the other alias to another. For example, alias the Dim_Country table, call it
Dim_Customer_Country, link it to the Dim_Customer table, and remove
its link to Account. For the original Dim_Country table, break its join
to Dim_Customer. Thus one Dim_Country table links to one base
table, and the other country table links to the other.
Two sources may be needed, with one source having the original
Dim_Country and another one having the new, aliased Dim_Country.
In that case, map the Dim_Country and Dim_Customer_Country
columns to the same logical columns.
Inter-Dimensional
Effects: An extra join is issued between the two dimensions when used in
conjunction with facts. In some cases, there may be no impact, but
more frequently the resulting record set may be incorrect.
Example: A common example of this is shown in the diagram below. Here, there
are two Logical Dimension Tables, Time and Customer, and one Fact
Logical Table. The Time Logical Table includes Dim_Day, and the
Customer Logical Table includes both Dim_Customer and Dim_Day.
The relationship between Dim_Customer and Dim_Day indicates when
the customer was acquired.
Solution: Alias each of the physical dimension tables and eliminate the join
between them. Join the aliased table into one dimension, and the
original, non-aliased one into the other dimension. In the example,
Dim_Day is aliased, renamed, and joined into the Customer Dimension
via a different join:
Comments
Out of the box Siebel Analytics does not have any such circular join issues. An
indication of this can be seen in the numerous aliases for common tables such as
W_ORG_D, W_GEO_D, and W_PERSON_D. For these tables, several aliases have
been made, each indicating its specific type (as defined by the dimensions in the
Business Model). For example, W_ORG_D has aliases for Created by Org,
Competitor, Owner Org, Shipped Account Org, etc. Each of these aliases is used in
one and only one logical table/dimension.
Siebel Analytics 7.x functions best when its underlying data model is of the
Star/Snowflake schema variety. In such a modeling schema, there are no cross
dimensional links – all instances of a physical table are replicated to align with their
context (i.e. Dimension). There is no concept of simple Geography for example;
there are concepts of Sales Geography, Customer Geography, Originating
Geography, Billing Geography, etc. The context of how the specific table is used is
critical; by first identifying the multiple contexts in which it will be used, a
determination of which Logical Tables/Dimensions (and therefore aliases) will be
needed can be more readily made. Continuing with the example, there would most
likely be logical tables in the Business Model for each of the Geographies: Sales
Geography, Customer Geography, Originating Geography, Billing Geography. In
each of these Logical Tables, there would be a corresponding aliased version of the
W_GEO_D table.
If it is determined that one of the joins is not needed then the join can simply be
deleted and the circular join will be solved.
Solution
Ensure that there are no Fact-to-Fact joins in the Physical Layer. Siebel Analytics will
determine that two fact sources (and therefore two fact tables) are required to
retrieve the desired results, issue parallel SQL to the database, and finally join
the results on the Analytics server.
Comments
A fact-to-fact join is frequently a very poor performing join, given the size of the
record sets involved. A query which requires a metric from two fact sources (two fact
tables) will be handled via the parallel execution of SQL. A data set is retrieved from
the first fact table, and another data set is retrieved from the second fact table. The
Analytics server then merges the two data sets on the Server, ideally on reduced
record sets due to the prior application of filters in the database.
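The merge described above can be illustrated with a small sketch. It uses Python's sqlite3 module and two fact tables, W_ORDER_F and W_QUOTA_F, that are invented for this example and are not part of the Siebel model. Each fact is aggregated separately to the shared dimension grain, and the two small result sets are then combined outside the database. This only approximates the idea and is not the Analytics server's actual algorithm.

```python
import sqlite3

def fetch_by_account(conn, table, measure):
    """Aggregate one fact table to the account grain (one 'parallel' query)."""
    sql = f"SELECT ACCNT_WID, SUM({measure}) FROM {table} GROUP BY ACCNT_WID"
    return dict(conn.execute(sql).fetchall())

def merge_results(left, right):
    """Full outer merge of two per-account result sets, done outside the database."""
    keys = sorted(set(left) | set(right))
    return [(k, left.get(k), right.get(k)) for k in keys]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE W_ORDER_F (ACCNT_WID INTEGER, ORDER_AMT REAL);  -- hypothetical fact
    CREATE TABLE W_QUOTA_F (ACCNT_WID INTEGER, QUOTA_AMT REAL);  -- hypothetical fact
    INSERT INTO W_ORDER_F VALUES (1, 100), (1, 50), (2, 70);
    INSERT INTO W_QUOTA_F VALUES (1, 200), (3, 90);
""")

orders = fetch_by_account(conn, "W_ORDER_F", "ORDER_AMT")
quotas = fetch_by_account(conn, "W_QUOTA_F", "QUOTA_AMT")
merged = merge_results(orders, quotas)
print(merged)
```

Because each fact is reduced to one row per account before the merge, no fact row is multiplied by rows from the other fact, which is the risk a direct fact-to-fact join carries.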
Fact Extensions do not fit into this category; as they are not standalone fact tables,
they require a FK to the base fact table to derive Dimensional keys.
In most cases, this scenario would not occur, as metrics that are used together
would be modeled into the same fact table.
Solution
Physical Layer:
When modeling a Dimension Extension (_DX), join it to the base Dimension table
(_D) as a 1:M FK join on the ROW_WID. By mimicking a parent-child relationship
between _DX and _D, the _DX will not be included in queries which do not require
any of its fields, reducing over-joins. Additionally, join it to the facts in the
same manner as the _D table, using a 1:M between the _DX and the Facts. By doing
so, queries that need values from the _DX and the facts but not the _D will bypass
the _D table, improving run time performance. Queries that use the Facts and values
from both the _DX and the _D will produce an extra join; however, this join is
redundant and may be ignored. This assumes that there is a 1:1 relationship between
the _DX and the _D; if this is not the case then such a join path may not return
valid results.
_DH tables should be modeled in an identical manner: 1:M _DH to _D and 1:M _DH
to Facts as shown below. Note both join to the base dimension on the ROW_WID.
Business Layer:
Add the new extension table to the existing source for the Logical Dimension Table.
A separate source for just the base dimension table should also be created for
performance reasons. To support a new source with just the _DX or _DH, the
physical joins between it and the facts must exist as shown above. Thus, a fully
described Logical Table for this sample dimension is as follows:
Comments
By modeling the _DX as a parent of the _D, Analytics simply thinks of it as another
parent table. As Analytics is primarily designed to support Star/Snowflake schemas,
it will not include parent tables when not necessary.
Note that the Extension table should be modeled in this manner even if it contains
FKs to other dimensional tables.
Solution
Physical Layer:
Model the Extension table (_FX) to the base Fact table (_F) as a parent-child 1:M.
By mimicking a parent-child relationship between the _FX and the _F, the _FX will not be
included in queries which do not require any of its fields.
Business Layer:
Add the new extension table to the existing source for the Facts. A separate source
for just the base fact table will not be necessary. If the _FX table contains additional
FKs to new dimensions, then the following will be necessary:
Comments
If there is a case where a query will be generated that only contains metrics from the
_FX and dimensions linked directly to the _FX, then create a new source for this _FX.
However this scenario is unlikely, and therefore an additional source just for the _FX
table is not needed. If this scenario is identified, a new fact table should be
considered.
Combo Tables
Issue Description
In some cases, a single table is needed to support both Attributes and Metrics,
and therefore must serve as both a Fact table and a Dimension table.
Refer to the W_ACTIVITY_F table in the Siebel 7.x Core MetaData model for a real-life
example: it serves as both the dimension and the fact.
Solution
Create the physical model as would normally be done – no aliases are necessary for
most cases. Create a logical Dimension table with a source containing all of the
necessary tables to support the dimension. If a table has both facts and attributes,
this may include a Fact table.
Do the same for the facts by creating a new source with all of the necessary tables to
support the metrics. Note that this may include several dimensions, as there may be
counts off of these dimension tables.
By not aliasing the fact table, an additional join will be eliminated when the two are
used together. If one of the tables uses an alias, then a self-join will be used.
Comments
Care must be taken if the Dimensional version of the physical table is to be used in
queries where it joins to another fact table.
Analytics will mix the direct joins that are desired for the table as a fact and as a
dimension, and will over-join when the table is a dimension.
For example, assume the physical table Activities is used (without an alias) in both a
Fact Source and a Dimension Source. When Activities serves as the fact table in a
query, it joins to other dimensions such as the Dim_Acct table in a normal manner as
shown above.
A problem will occur when a query uses the Activities table as a dimension, and
includes other dimensions that join to both the Fact table and the Combo table. The
diagram below shows a query that wishes to see facts from W_FACTS_F by Activity
and Account. The query that is generated will include an Inter-Dimensional join (see
section above) between Dim_Acct and Activities, which will most likely alter the
record set:
In this case, alias the Activities table, and have one of the Activities physical tables
be the source for the dimension and the other be the source for the fact. Thus,
Activities Dimension and Activities Fact will have different physical tables, and the
Inter-Dimensional join will not occur:
However, there are other tables in the Physical model that can be used to create this
join, as shown below:
Through the use of the tables W_PERSON_D and W_CAMP_HIST_F, a link between
W_PROGRAM_D and W_INVITEM_F can be established via the following steps. Note
that this involves a Many-to-Many relationship between W_PROGRAM_D and
W_INVITEM_F, and as such additional techniques discussed in a later section may be
applicable.
Solutions
There are two alternative solutions to this problem. Solution (A) involves modifying
the Facts, and Solution (B) involves modifying the Dimension. Solution A is simpler
than Solution B, and is therefore recommended.
Solution A
This solution involves the creation of a new fact source for the Logical Fact table,
with the added linkage tables included. By doing this, you are effectively adding a
new FK to the logical fact source. Then, add the new dimension to the Aggregation
Content filter for the source. This is in effect identical to Fact Extension (_FX) tables
that have additional FKs to other dimensions. Thus, the Facts Logical Table will have
the following two sources:
Solution B
Solution B involves putting the two tables used to join (W_PERSON_D and
W_CAMP_HIST_F) into a new, lower level in the existing Program dimension. This
requires three main steps: first the creation of a new logical table source for the
dimension with all three tables (W_PROGRAM_D, W_CAMP_HIST_F and
W_PERSON_D). Second, a new lower level in the Dimensional Hierarchy needs to be
created. Finally, the existing fact source needs to be adjusted to include the new
Dimension by adding to its Aggregation Content filters.
Comments
This is a common need, and it does represent a Many-to-Many scenario. As such,
metrics will be over-counted.
Fragmentation
Issue Description
How to properly implement Fragmentation on a fact source.
Solution
In this example, the filter will be applied to the W_REVN_F_CURR table, which holds
data for 2002 and beyond, and the W_REVN_F_HIST table, which holds data prior to
2002.
Map in both fact tables to separate sources in the Facts. Everything about
them should be identical except for the table name.
For each fragmented fact table source, enter its filter in the Fragmentation
content section as follows, using the Dimension, filter:
All other variations and derivatives from the same hierarchy must be
addressed as well, for example Month, Week, Quarter, Year, etc.
Be sure to check the check box labeled “This source should be combined with
other sources at this level” if the fact source is a sub-set of the entire data
set. If the source has data that overlaps another table, leave this check box
unchecked.
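The effect of fragmentation content can be pictured as a range-overlap test: each fragment declares the date range it covers, and a query only needs the fragments whose range overlaps the query filter. The sketch below illustrates that idea using the two tables from this example. The FRAGMENTS mapping and the fragments_for_range function are hypothetical names, and this is not how the Analytics server is implemented.

```python
from datetime import date

# Hypothetical description of the two fragments from the example:
# HIST holds data before 2002, CURR holds data for 2002 and beyond.
FRAGMENTS = {
    "W_REVN_F_HIST": (date.min, date(2001, 12, 31)),
    "W_REVN_F_CURR": (date(2002, 1, 1), date.max),
}

def fragments_for_range(start, end, fragments=FRAGMENTS):
    """Return the fragment tables whose date range overlaps [start, end]."""
    return sorted(
        name for name, (lo, hi) in fragments.items()
        if start <= hi and end >= lo
    )

# A range that falls entirely before 2002 needs only the history fragment.
one = fragments_for_range(date(2001, 12, 25), date(2001, 12, 31))
# A range that crosses the 2002 boundary needs both fragments.
both = fragments_for_range(date(2001, 12, 25), date(2002, 1, 5))
print(one)
print(both)
```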
Query by Day, with a range covering one of the fragments: Only one fragment
will be used, with Analytics performing the filter.
Logical SQL:
SELECT
W_REVN_F.REVN,
W_DAY_D."Dim Date"
FROM TheSystem
WHERE
W_DAY_D."Dim Date" BETWEEN date '2001-12-25' AND date '2001-12-31'
Physical SQL (excerpt):
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"
Query by Day, with a range covering both fragments: Unfiltered parallel SQL will
be issued, one for each fragment. Analytics will then perform the filter:
Logical SQL:
SELECT
W_REVN_F.REVN,
W_DAY_D."Dim Date"
FROM TheSystem
WHERE
W_DAY_D."Dim Date" BETWEEN date '2001-12-25' AND date '2002-01-05'
Physical SQL (excerpts, one statement per fragment):
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"

"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"
For each additional criterion that may be needed, a similar process must be
undertaken. Continuing with the sample above, the tables will be described to
Analytics so that if a user runs a report by Year, Analytics will know how to break up
the query.
Add CAL_YEAR into the Logical Table for Facts, map it to each of the
Fragments, and add it to the Logical Fact Table key.
For a query on a single year, one fragment is used with the filter in the query:
Logical SQL:
SELECT
W_REVN_F.REVN,
W_DAY_D.CAL_YEAR
FROM TheSystem
WHERE
W_DAY_D.CAL_YEAR = 2001
Physical SQL (excerpt):
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID" and T294."CAL_YEAR" = 2001
group by T294."CAL_YEAR"
For a query that hits multiple years, a single select containing two sub-selects and a
union all is used. Each sub-select has the combined filter on it (e.g. for the fact
table with 2002+, its select has a where clause of CAL_YEAR = 2002 or
CAL_YEAR = 2001):
Logical SQL:
SELECT
W_REVN_F.REVN,
W_DAY_D.CAL_YEAR
FROM TheSystem
WHERE
W_DAY_D.CAL_YEAR IN (2001, 2002)
Physical SQL (excerpt):
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID" and (T294."CAL_YEAR" = 2001 or
T294."CAL_YEAR" = 2002)
union all
select T294."CAL_YEAR" as c2,
T1030."REVN" as c3
from
"W_DAY_D" T294,
"W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID" and (T294."CAL_YEAR" = 2001 or
T294."CAL_YEAR" = 2002)
) D3
group by D3.c2
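To make the shape of the multi-year query concrete, the following sketch runs a comparable statement against an in-memory SQLite database. The sample rows, the outer select list, and the use of the fragment table names W_REVN_F_HIST and W_REVN_F_CURR in the sub-selects are assumptions made for illustration; the SQL that Analytics generates may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE W_DAY_D (ROW_WID INTEGER PRIMARY KEY, CAL_YEAR INTEGER);
    CREATE TABLE W_REVN_F_HIST (CLOSE_DT_WID INTEGER, REVN REAL);  -- before 2002
    CREATE TABLE W_REVN_F_CURR (CLOSE_DT_WID INTEGER, REVN REAL);  -- 2002 onward
    INSERT INTO W_DAY_D VALUES (1, 2000), (2, 2001), (3, 2002);
    INSERT INTO W_REVN_F_HIST VALUES (1, 5), (2, 10), (2, 20);
    INSERT INTO W_REVN_F_CURR VALUES (3, 40), (3, 2);
""")

# One sub-select per fragment, each carrying the combined year filter,
# combined with UNION ALL and aggregated once in the outer select.
rows = conn.execute("""
    SELECT D3.c2 AS CAL_YEAR, SUM(D3.c3) AS REVN
    FROM (
        SELECT T294.CAL_YEAR AS c2, T1030.REVN AS c3
        FROM W_DAY_D T294, W_REVN_F_HIST T1030
        WHERE T294.ROW_WID = T1030.CLOSE_DT_WID
          AND (T294.CAL_YEAR = 2001 OR T294.CAL_YEAR = 2002)
        UNION ALL
        SELECT T294.CAL_YEAR AS c2, T1030.REVN AS c3
        FROM W_DAY_D T294, W_REVN_F_CURR T1030
        WHERE T294.ROW_WID = T1030.CLOSE_DT_WID
          AND (T294.CAL_YEAR = 2001 OR T294.CAL_YEAR = 2002)
    ) D3
    GROUP BY D3.c2
    ORDER BY D3.c2
""").fetchall()
print(rows)
```

UNION ALL is appropriate here only because the fragments do not overlap; if the same fact row existed in both tables it would be counted twice.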
Comments
Be aware of large tables that use database partitioning. In order for the database to
properly use its partitions, the query must be structured in such a way that filtering
occurs on the Fact table, and not the dimension table. This can be accomplished by
denormalizing some of the time elements into the Time Logical table as new sources.
Verify that all aliases are removed from the Presentation Layer unless required.
Note that removing aliases may invalidate pre-existing reports that were built
before the column or table was renamed, because they still reference the old name.
These reports should be re-developed with the new Presentation Table and Column names,
replacing the older ones.
Many-to-Many Solutions
It is common to want to model a Many-to-Many relationship between Dimensions
and Facts. For example, it may be necessary to see all employees associated with
an opportunity, not just the primary. This section presents a series of tools and
techniques that may be applied to solve a particular case.
Note that over-counting will occur when performing the many-to-many join.
It is important to note that the weighting factors must all add up to 1 (One), as they
are effectively percentages of a whole. Additional ETL effort will be required to
complete this solution.
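A small sketch may help show why the weighting factors must sum to 1. It uses Python's sqlite3 module with a hypothetical fact table OPTY_F and a hypothetical bridge table OPTY_EMP_BRIDGE that carries one row per employee on an opportunity, together with a WEIGHT column populated by the ETL. None of these names come from the Siebel model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OPTY_F (OPTY_WID INTEGER, REVN REAL);
    -- Hypothetical bridge table: one row per employee on an opportunity.
    CREATE TABLE OPTY_EMP_BRIDGE (OPTY_WID INTEGER, EMP_NAME TEXT, WEIGHT REAL);
    INSERT INTO OPTY_F VALUES (1, 1000), (2, 500);
    INSERT INTO OPTY_EMP_BRIDGE VALUES
        (1, 'Ann', 0.5), (1, 'Bob', 0.25), (1, 'Cy', 0.25), (2, 'Ann', 1.0);
""")

# Check that the weights sum to 1 per opportunity; otherwise the
# allocated revenue will not add back up to the true total.
bad = conn.execute("""
    SELECT OPTY_WID FROM OPTY_EMP_BRIDGE
    GROUP BY OPTY_WID HAVING ABS(SUM(WEIGHT) - 1.0) > 1e-9
""").fetchall()

# Unweighted join: revenue is repeated once per employee (over-counted).
naive_total = conn.execute("""
    SELECT SUM(f.REVN) FROM OPTY_F f
    JOIN OPTY_EMP_BRIDGE b ON b.OPTY_WID = f.OPTY_WID
""").fetchone()[0]

# Weighted join: each employee receives a share of the revenue.
weighted_total = conn.execute("""
    SELECT SUM(f.REVN * b.WEIGHT) FROM OPTY_F f
    JOIN OPTY_EMP_BRIDGE b ON b.OPTY_WID = f.OPTY_WID
""").fetchone()[0]

print(bad, naive_total, weighted_total)
```

With these sample rows the true revenue is 1500; the unweighted join reports more because opportunity 1 is repeated for each of its three employees, while the weighted join returns the true total.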
By level-setting the Revenue metrics to the Employee level in the Employee
Dimension, this same report will return the following:
Although the cause of the breakout is not intuitively obvious to the end user, the
over-counting scenario is prevented. When the user adds the Employee to the
report, the breakout becomes clearer:
As an example, assume that the Employee Dimension requires the NAME field from
W_ORG_D, and two ATTRIB columns from W_TERR_DX. In order to accomplish this,
Analytics must join in, at run time, the W_PERSON_F, W_PERSON_FX, W_TERR_DX
and W_ORG_D tables. Applying this logic at load time and storing these 3 values
in the W_PERSON_DX table will not only speed up the queries, but will also simplify the
Analytics MetaData by removing 4 aliases and their required joins.
A second example demonstrates how this is done in the Siebel Analytics Horizontal
Application:
Source Table   Source Column   OLAP Table    OLAP Column
S_ORG_EXT      LOC             W_PERSON_D    EMP_ACCNT_LOC
S_ORG_EXT      LOC             W_OPTY_D      ACCNT_LOC
S_ORG_EXT      LOC             W_ORG_D       ACCNT_LOC
S_ORG_EXT      LOC             W_PRODUCT_D   VENDOR_LOC
This table clearly shows how the physical LOC column is used in four different
dimensions. With such denormalizations, it will not be necessary to perform any
additional joins to retrieve the data from the LOC column from other tables.
and a filter on the TYPE column. Although this can be modeled in Analytics, it is far
from ideal, as it complicates the model and forces additional and unnecessary joins.
Modify the ETL process to do these lookups, and join based on the ROW_WIDs. Note
that the aliasing of W_LOV_D will still be required.
The parent table will not be used unless it is used in the query (no over-join).
An index will be used on the parent when it is joined into queries using the
child.
No child records will be lost in queries that group by the parent. This is
critical, as a Data Warehouse should account for all numerical values, even if
the proper dimensionality is not known.
Note that the ETL should ensure RI before loading, as it is good warehousing practice
to remove FK constraints in the database to speed load times.
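As a rough illustration of an RI check performed in the ETL rather than by database constraints, the sketch below separates staged fact rows whose foreign key has no matching dimension row. The split_orphans function, the ACCNT_WID column, and the sample rows are hypothetical. What the ETL then does with the orphaned rows (reject them, or point them at a placeholder dimension row so they still appear in totals) is a design decision this sketch leaves open.

```python
def split_orphans(fact_rows, dim_keys, fk="ACCNT_WID"):
    """Separate fact rows whose foreign key has no matching dimension row."""
    valid, orphans = [], []
    for row in fact_rows:
        (valid if row[fk] in dim_keys else orphans).append(row)
    return valid, orphans

# Keys currently present in the (hypothetical) dimension table.
dim_keys = {1, 2, 3}

# Staged fact rows waiting to be loaded.
staged = [
    {"ACCNT_WID": 1, "REVN": 100.0},
    {"ACCNT_WID": 9, "REVN": 40.0},   # no matching dimension row
    {"ACCNT_WID": 3, "REVN": 25.0},
]

valid, orphans = split_orphans(staged, dim_keys)
print(len(valid), orphans)
```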