
As defined by Oracle, customisations fall into the following categories:

- Category 1 customisations, where we add a new column to an existing fact or dimension table.
- Category 2 customisations, where we create new fact or dimension tables.
- Category 3 customisations, where we add an additional data source to an existing fact or dimension table.
- Other trivial customisations, such as changing filter conditions.

Typically, with the trivial and category 1 customisations, there isn't much to go
wrong. We may need to think about auxiliary change capture (see below) if the
column comes from a table other than the base table in the OLTP.
With the category 2 customisations, we need to consider whether new fact tables
are suitable for being added to an existing subject area or (more commonly)
whether they merit a subject area of their own. I sometimes see subject areas with
multiple fact tables and many non-conformed dimensions. This happens when a
subject area is iteratively expanded again and again until the usability of the subject
area is severely diminished. Things have probably gone wrong if either a) it is
possible to create queries that the BI server refuses to interpret or b) the
development process constantly involves setting metrics to total over non-conformed dimensions.
Regarding category 2 customisations involving new dimensions, another
complication that arises is the modelling of many-to-many (M:M) relationships from
the OLTP. Take this example: a user looks at an opportunity in Siebel and as well as
seeing all the fields that are visible in OBIEE, they also see a field called Sales Team.
The user asks for this to be added to the Sales CRM Pipeline subject area. This is
non-trivial to model in OBIEE as the Siebel Sales Team field actually opens a shuttle
applet that displays the multiple sales team members for the opportunity. It might
seem obvious that Sales Team can't be modelled as an attribute of the opportunity
dimension, but what should be done instead? 99% of the time, my advice would be
to examine the business reason behind the request to add the column and find an
alternative. It may be that just the primary sales team member is required for
reporting purposes or that the Created By or Last Updated By users could be used
instead. In the remaining 1% of cases we can consider the use of bridge tables, but
considering the number of OOTB metrics that would have to be remodelled to
accommodate this change, the functional advantage gained from the change may
not be enough to justify such dramatic technical changes. In situations like this,
knowing when not to customise can be important for both ease of use and ease of
upgrade.
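For the rare cases where a bridge table genuinely is justified, the shape of the solution looks roughly like the sketch below. The table and column names are invented for illustration and are not OOTB OBAW objects.

```sql
-- Hypothetical bridge table resolving the M:M between an opportunity and its
-- sales team members. The weight factor lets metrics be allocated across team
-- members rather than double-counted when the bridge is joined in.
CREATE TABLE WC_OPTY_SALESTEAM_B (
  OPTY_WID       NUMBER(10)   NOT NULL,            -- FK to the opportunity dimension
  EMPLOYEE_WID   NUMBER(10)   NOT NULL,            -- FK to the employee dimension
  WEIGHT_FACTOR  NUMBER(10,6) DEFAULT 1 NOT NULL,  -- allocation across team members
  CONSTRAINT WC_OPTY_SALESTEAM_B_PK PRIMARY KEY (OPTY_WID, EMPLOYEE_WID)
);
```

Every OOTB metric exposed through the affected subject areas would then need its logical table sources and aggregation rules revisited in the RPD, which is exactly why the advice above is usually to find an alternative.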

Category 3 customisations do not strictly require RPD changes, but we should at
least have a column to tell us which source system the row of data originated from.
Also, if factual rows from any particular source system cannot be reconciled with
certain dimensions (due to the lack of any foreign key column), it can be beneficial
to add a row to that dimension (with ROW_WID equal to -1 for example) to cater for
these fact rows. It allows us to differentiate between rows from an Oracle source
system that do not have an associated dimension row and rows from a non-Oracle
source system that cannot have an associated dimension row.
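As a rough sketch of both points, with invented table and column names rather than OOTB OBAW objects:

```sql
-- Record which source system each fact row came from (illustrative column name).
ALTER TABLE WC_SALES_F ADD (SOURCE_SYSTEM VARCHAR2(30));

-- Seed an 'unresolved' row in the dimension so that fact rows from the
-- non-Oracle source, which have no foreign key to this dimension, still
-- join to a meaningful row rather than dropping out of queries.
INSERT INTO WC_PARTY_D (ROW_WID, PARTY_NAME, SOURCE_SYSTEM)
VALUES (-1, 'Unspecified (non-Oracle source)', 'N/A');
COMMIT;
```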
Data Lineage / Tracking Customisations
One of the questions that I sometimes get asked by end users is "where does this
column come from in EBS / Siebel?". When dealing with OOTB BI Apps, this is usually
quite easy to answer because:

- Even though our semantic model may be extremely vast, it is usually quite simple, and
- Oracle provides good data lineage documentation and, more importantly, descriptive text in the RPD which manifests as hover-over text in the UI.

However, I often find that after a couple of years of customising, the question isn't
so easy to answer. From an IT point of view, it's usually quite easy to determine
what is vanilla and what is custom, but from an end-user point of view, the exact
functional definition of custom columns is not always that obvious. The simple
solution is to follow the same standards that Oracle adheres to for the vanilla
material: give new subject areas, presentation tables and presentation columns
useful descriptions; keep subject areas small and intuitive (1 or 2 fact tables per
area); and maintain a data lineage spreadsheet that maps columns between source
applications and the Oracle data warehouse.
This documentation becomes very useful when we come to upgrade BI Apps.
Typically, BI Apps upgrades involve a combination of automated and manual code
merging using a 3-way repository merge on the RPD and some more manual steps
to re-apply customisations to the new vanilla Informatica metadata. When testing
the merged RPD and customising copies of the new Informatica mappings, the new
code should be functionally identical but may be technically different due to data
model changes between BI Apps versions. At this point, the above documentation
becomes invaluable.
Indexing and Partitioning
In my previous post, I talked about building performance testing and monitoring into
the project lifecycle. Here we will focus more on the design and development
stages.

In the above section, I describe category 1 customisations as extremely simple.
Well, they are, but the one mistake that people often make is not considering
whether an index should be created along with the new column. This can lead to a
slow but steady increase in average query times as more and more non-indexed
columns are used to group and filter data. Generally, if any dimensional attribute is
going to be used for grouping or filtering, it will probably benefit from an index. The
decision about whether to make it a bitmap index depends on data density. For
example, for a field like Customer Priority (high/med/low), use a bitmap; for
Customer ID, don't. Also, if a new column is used as a key in an update statement,
make sure that the corresponding index has a type of 'ETL' in DAC so that it is not
dropped as part of the ETL.
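As a sketch of the two cases above (the table and column names are invented, not OOTB objects):

```sql
-- Low-cardinality attribute used for grouping and filtering: bitmap index.
CREATE BITMAP INDEX WC_CUSTOMER_D_M1
  ON WC_CUSTOMER_D (CUSTOMER_PRIORITY);

-- Near-unique attribute: a standard B-tree index is the better choice.
CREATE INDEX WC_CUSTOMER_D_F1
  ON WC_CUSTOMER_D (CUSTOMER_ID);
```

In practice both would be registered in DAC rather than created by hand, so that DAC can drop and recreate them around the load; as noted above, only an index of type 'ETL' survives the ETL untouched.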
Partitioning is an interesting topic in the context of BI Apps. Typically, OOTB BI Apps
reports and dashboards perform well even with large data volumes, due to the
thousands of indexes that are provided with the OBAW. Therefore, why partition? I
can think of 2 main reasons:

- We have extremely large data volumes in some of our fact tables and have decided to partition by year and include a corresponding filter (i.e. current year) in one of our logical table sources.
- We have performed a category 3 customisation or similar and again have a logical table source filter that could be used as a partition key. For example, our revenue fact contains rows sourced from both Siebel and another system, and only the Siebel data should be visible in the OOTB subject areas.

In both of the above scenarios, we know that the partition key will be used in
queries generated by the BI server due to the RPD customisations that accompany
the partitioning. So what about scenarios that don't involve the corresponding RPD
changes? Should we consider partitioning to speed up a certain group of reports
that have certain filters applied? In my opinion, no, we should only consider it as a
viable option when we can guarantee that a) the partition key will be used in
queries and b) we expect little or no movement of rows across partitions. Even if
these criteria are met, we should only be looking at partitioning after we are
convinced that our indexing strategy is perfect. Why? Because badly chosen
partition keys can make performance worse! Even a partitioning scheme that
speeds up some reports can have a negative impact on those that do not reference
the partition key, due to the increased I/O involved in probing index partitions
across multiple table partitions.
One important point to note is that bitmap indexes must be defined as LOCAL on
partitioned tables. This means that we have to change how DAC creates its OOTB
bitmap indexes if we partition an OOTB table. This can be done using Index Actions
in DAC but should serve as another deterrent to unnecessary partitioning!
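To make the partitioning scenario concrete, here is a rough sketch with invented names; the real OBAW fact tables have many more columns and indexes:

```sql
-- Custom revenue fact partitioned by the same year column that the logical
-- table source filter uses, so queries generated by the BI server prune partitions.
CREATE TABLE WC_REVENUE_F (
  ROW_WID       NUMBER(10)     NOT NULL,
  OPTY_WID      NUMBER(10)     NOT NULL,
  YEAR_NUM      NUMBER(4)      NOT NULL,
  REVENUE_AMT   NUMBER(28,10),
  DELETE_FLG    CHAR(1)
)
PARTITION BY RANGE (YEAR_NUM) (
  PARTITION P_2012 VALUES LESS THAN (2013),
  PARTITION P_2013 VALUES LESS THAN (2014),
  PARTITION P_MAX  VALUES LESS THAN (MAXVALUE)
);

-- Bitmap indexes on a partitioned table must be LOCAL, which is why DAC's
-- OOTB bitmap index definitions need overriding (e.g. via Index Actions).
CREATE BITMAP INDEX WC_REVENUE_F_M1
  ON WC_REVENUE_F (DELETE_FLG) LOCAL;
```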
Customising the ETL

There are two main types of customisations that we make to the OOTB DAC and
Informatica metadata. Firstly, we may take a copy of an OOTB mapping and make
some minor changes. Typically, these will include new columns, sources and
lookups, depending on the type of mapping. Secondly, we can create some custom
mappings to extract from new sources or load into new targets. Before I give any
advice about making these changes, let me first make a point about ETL vs ELT.
In OBIA 7.x, the Informatica server is the ETL engine. Compare this with Oracle Data
Integrator where we typically use the Oracle Database as the ELT engine. With ODI,
a fact mapping can be used to generate a single SQL statement that takes data
from a staging table and loads into a target, populating the necessary foreign keys
and generating surrogate keys where necessary. With Informatica, this will instead
involve multiple SQL statements just to extract the staging data and data used in
lookups, followed by a large amount of processing on the Informatica server itself.
Finally, row-by-row insert / update statements will be issued to the target table.
Clearly, this is less efficient than the ELT option regardless of how much
optimisation we do on the Informatica server side.
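For contrast, here is a very rough sketch of the kind of single set-based statement an ELT approach can push down to the database. The table and column names are invented; this is not what ODI or the BI Apps actually generate.

```sql
-- One statement resolves the dimension surrogate key and applies the
-- insert/update, instead of row-by-row lookups and writes on the ETL server.
MERGE INTO WC_REVENUE_F tgt
USING (
  SELECT stg.INTEGRATION_ID,
         NVL(opty.ROW_WID, -1) AS OPTY_WID,   -- fall back to the unresolved row
         stg.REVENUE_AMT
  FROM   WC_REVENUE_FS stg                    -- staging table
  LEFT OUTER JOIN W_OPTY_D opty
         ON opty.INTEGRATION_ID = stg.OPTY_INTEGRATION_ID
) src
ON (tgt.INTEGRATION_ID = src.INTEGRATION_ID)
WHEN MATCHED THEN
  UPDATE SET tgt.OPTY_WID = src.OPTY_WID, tgt.REVENUE_AMT = src.REVENUE_AMT
WHEN NOT MATCHED THEN
  INSERT (INTEGRATION_ID, OPTY_WID, REVENUE_AMT)
  VALUES (src.INTEGRATION_ID, src.OPTY_WID, src.REVENUE_AMT);
```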
The above weakness of the tool means that when we need a mapping to exclusively
perform updates, it is often tempting to add some custom SQL to the end of a
mapping / workflow rather than creating a whole new mapping that will a) take
longer to develop and b) probably run slower. The problem with this approach is
code readability. The same applies for the use of stored procedures within mappings
(which is often not great for performance either). So my advice is to minimise the
number of PLP mappings (save them for populating aggregate tables) and to stick
to standard Informatica development methods where possible.
Changed Data Capture
Changed data capture (CDC) is one of the topics that seem to create problems for BI
Apps developers. It's not an intrinsically difficult concept to master, but I have seen
it overlooked in both the design and testing phases of a couple of projects (mainly
when working with Siebel). It's worth pointing out that CDC is handled very
differently for different source systems due to the varying degrees of metadata
capture and audit trail functionality between systems.
CDC for EBS is pretty simple: EBS tables have nice metadata columns such as
LAST_UPDATE_DATE that allow us to easily extract data that has changed since the
last ETL run (minus a few days). Siebel is slightly more complicated and involves
creating change capture tables and triggers in the OLTP database. We still have
nice metadata columns as in EBS, but we have to worry more often about auxiliary
change capture (see below) and tracking hard-deletes. When working with
PeopleSoft, complications exist because update timestamps do not exist on some
base tables.
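For EBS the incremental extract filter is essentially just a predicate on those metadata columns. A simplified sketch (the source table is illustrative, and the bind variable stands in for the last-extract-date parameter that DAC passes to the mapping):

```sql
-- Pull only rows changed since the last ETL run, minus a small prune period
-- to catch rows committed with slightly older timestamps.
SELECT trx.*
FROM   RA_CUSTOMER_TRX_ALL trx
WHERE  trx.LAST_UPDATE_DATE >= :LAST_EXTRACT_DATE - 3;
```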

Importantly, when customising the Oracle Business Analytics Warehouse and adding
new source tables to the ETL, we must consider the CDC requirements for each new
source. As an example, imagine we are loading opportunities from the S_OPTY table
in Siebel. By default, our SDE mapping will extract records from the associated
change capture view V_OPTY that has been left outer joined to a bunch of auxiliary
tables. Now imagine that we need to extract a new custom column called
X_IMPORTANCE from the auxiliary table S_OPTY_X. The steps to achieve this are
obvious: add S_OPTY_X as a source in the SDE mapping and create a left outer join
between S_OPTY and S_OPTY_X, then map the new column from source to target.
However, what happens when the value of X_IMPORTANCE is updated in the OLTP?
Do we see the update in the OBAW? The answer is "maybe": it depends on whether
the core opportunity record has also been updated since the last ETL.
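A simplified sketch of the customised extract (this is not the actual generated SQL, and it assumes the standard Siebel PAR_ROW_ID join to the extension table):

```sql
-- Extract from the change capture view over S_OPTY, outer joined to the
-- S_OPTY_X extension table to pick up the new custom column.
SELECT opty.ROW_ID,
       opty.NAME,
       optyx.X_IMPORTANCE
FROM   V_OPTY opty
LEFT OUTER JOIN S_OPTY_X optyx
       ON optyx.PAR_ROW_ID = opty.ROW_ID;
```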
By default, the change in the auxiliary table will not lead to the corresponding
opportunity appearing in our change capture view. If we want to see the update in
OBIEE without relying on an update to the base opportunity record, we must create
or extend a separate mapping that exists purely to populate the change capture
view used in the SDE mapping. In this situation, there is a trade-off between
functionality (seeing all source system updates in OBIEE) and ETL performance (due
to the extra auxiliary change capture process) but I advise starting with the
assumption that auxiliary change capture is required. I have seen faulty auxiliary
change capture go unnoticed a couple of times so make sure that this functionality
is tested!
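To give a feel for what auxiliary change capture does, here is a deliberately simplified sketch. The image table name and columns are hypothetical, and the real Siebel change capture objects and mappings differ, but the principle is the same: base-table keys are added to the change capture set because a related auxiliary row changed.

```sql
-- Register opportunities whose extension row changed since the last extract,
-- even though the base S_OPTY row itself was not updated.
INSERT INTO S_ETL_I_IMG_OPTY (ROW_ID, LAST_UPD)   -- hypothetical image table
SELECT x.PAR_ROW_ID, x.LAST_UPD
FROM   S_OPTY_X x
WHERE  x.LAST_UPD > :LAST_EXTRACT_DATE
AND    NOT EXISTS (SELECT 1
                   FROM   S_ETL_I_IMG_OPTY img
                   WHERE  img.ROW_ID = x.PAR_ROW_ID);
```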
That's it for BI Apps 7.9.6. Keep an eye out for more posts in 2014, where I will be
blogging about BI Apps 11g.
