
Experiences with Real-Time Data Warehousing Using Oracle Database 10g
Mike Schmitz
High Performance Data Warehousing
mike.schmitz@databaseperformance.com
Michael Brey
Principal Member Technical Staff
ST/NEDC Oracle Engineering
Oracle Corporation

Agenda

The Meaning of Real-Time in Data Warehousing
Customer Business Scenario
Our Real-Time Solution
  Real-Time Data Architecture
  Incremental Operational Source Change Capture
  Transformation and Population into the DW Target
Simplified Functional Demonstration
Asynchronous Change Data Capture (Oracle)
  Customer Environment
  Real-Time Requirement
  Performance Characteristics and Considerations

Mike Schmitz

My Background
An independent data warehousing consultant specializing in the
dimensional approach to data warehouse / data mart design and
implementation with in-depth experience utilizing efficient, scalable
techniques whether dealing with large-scale data warehouses or small-scale, platform-constrained data mart implementations. I deliver
dimensional design and implementation as well as ETL workshops in the
U.S. and Europe.
I have helped implement data warehouses using Red Brick, Oracle,
Teradata, DB2, Informix, and SQL Server on mainframe, UNIX, and NT
platforms, working with small and large businesses across a variety of
industries including such customers as Hewlett Packard, American
Express, General Mills, AT&T, Bell South, MCI, Oracle Slovakia, J.D.
Power and Associates, Mobil Oil, The Health Alliance of Greater Cincinnati,
and the French Railroad SNCF.

Real-Time in Data Warehousing

Data warehousing systems are complex environments
  Almost never pure real-time
  Some latency is a given
What do you need?
  Business rules
  Various data process flows and dependencies
  Real-time
  Near real-time
  Just in time for the business

Customer Business Scenario

Client provides software solutions for utility companies
Utility companies have plants generating the energy supply
  Peak demand periods are somewhat predictable
  Each day is pre-planned based on historical behavior
  Cheaper to buy energy ahead of time
  Expensive to have unused capacity
Existing data warehouse supports the planning function
  Recommended maximum output capacity
  Reserve capacity
  Buy supplemental energy as needed
  Reduced option expenses
  Reduced supplemental energy costs

Customer Real-Time Requirement

More timely, accurate data improves the operational business
Customer target
  Compare today's plant output volumes to yesterday's or last week's average
  Know when to purchase additional options or supplies
  Actual data within a 5-minute lag
  Use a single query
  Use a single tool

Sample Analysis Graph

Our Real-Time Solution

Overview

Three-step approach:
1. Implement a real-time DW data architecture
2. Near real-time incremental change capture from the operational system
3. Transformation and propagation (population) of change data to the DW

Our Real-Time Solution

Real-Time DW Data Architecture

Add a real-time partition to the Plant Output fact table for current-day activity
  Separate physical table
  No indexes or referential integrity (RI) constraints during daily activity (incoming data already has RI enforced)
  Combined with the Plant Output fact table through a UNION ALL view
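The real-time partition pattern can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle. SQLite has no partitioning, so this shows only the UNION ALL view that lets a single query span history and today, not the physical layout. The table and view names follow the demo schema but are simplified, and the sample rows are hypothetical.

```python
import sqlite3

# Simplified stand-ins for the demo objects: f_plant_output (history),
# f_current_day_plant_output (real-time partition), v_plant_output (view).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE f_plant_output (
    output_day_key INTEGER, output_minute_key INTEGER,
    generating_plant_key INTEGER, output_actual_qty_in_kwh INTEGER);
-- Same shape as the history table, deliberately without indexes or
-- constraints so that trickle inserts stay cheap during the day.
CREATE TABLE f_current_day_plant_output (
    output_day_key INTEGER, output_minute_key INTEGER,
    generating_plant_key INTEGER, output_actual_qty_in_kwh INTEGER);
CREATE VIEW v_plant_output AS
    SELECT * FROM f_plant_output
    UNION ALL
    SELECT * FROM f_current_day_plant_output;
""")

# Hypothetical sample data: yesterday in history, today in the real-time table.
conn.executemany("INSERT INTO f_plant_output VALUES (?, ?, ?, ?)",
                 [(20040101, 1345, 1, 500), (20040101, 1346, 1, 510)])
conn.execute("INSERT INTO f_current_day_plant_output VALUES (20040102, 1345, 1, 620)")

# One query over the view compares today's output with yesterday's.
rows = conn.execute("""
    SELECT output_day_key, SUM(output_actual_qty_in_kwh)
    FROM v_plant_output
    WHERE output_minute_key = 1345
    GROUP BY output_day_key
    ORDER BY output_day_key
""").fetchall()
print(rows)  # [(20040101, 500), (20040102, 620)]
```

Because the view is a plain UNION ALL, reporting tools need no knowledge of which rows are still in the current-day table.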


Our Real-Time Solution

Change Capture and Population

1. Incremental change capture from the operational site
   Synchronous or asynchronous
2. Transformation and propagation (population) of change data to the DW
   Continuous trickle feed or periodic batch

[Diagram: change flow from Operations via Synch CDC or Asynch CDC into Staging, then via Trigger or Batch into the DW]

Our Real-Time Solution

Incremental Change Capture

Done with Oracle's Change Data Capture (CDC) functionality
  Synchronous CDC available with Oracle9i
  Asynchronous CDC available with Oracle10g
Asynchronous CDC is the preferred mechanism
  Decouples change capture from the operational transaction

Asynchronous CDC

[Diagram: the OLTP DB's redo log files are read by a Log Miner-based capture process (Oracle10g) that produces logical change data; a transform step (SQL, PL/SQL, Java) loads it into the DW tables]

SQL interface to change data
Publish/subscribe paradigm
Parallel access to log files, leveraging Oracle Streams
Parallel transformation of data

Our Real-Time Solution

Population of Change Data into DW

Continuous
  Change table owner creates a trigger to populate the warehouse real-time partition
Periodic batch
  Utilize the Subscribe interface
  Subscribe to specific table and column changes through a view
  Set a window and extract the changes at the required period
  Purge the view and move the window

Our Real-Time Solution

The Daily Process

Integrate daily changes into the historical fact table
At the end of the day:
  Index the current-day table and apply constraints (NOVALIDATE)
  Create a new fact table partition
  Exchange the current-day table with the new partition
  Create the next day's real-time partition table

Simplified Functional Demo

Schema Owners

AO_CDC_OP
  Owns the operational schema
AO_CDC
  Owns the CDC change sets and change tables (needs special CDC privileges)
  CDC publish role
AO_CDC_DW
  Owns the data warehouse schema (also needs special CDC privileges)
  CDC subscribe role

Simplified Functional Demo


Operational Schema

Simplified Functional Demo

Data Warehouse Schema

D_OUTPUT_DAY
  OUTPUT_DAY_KEY: NUMBER
D_OUTPUT_MINUTE
  OUTPUT_MINUTE_KEY: NUMBER
D_GENERATING_PLANT
  GENERATING_PLANT_KEY: NUMBER(4)
  PLANT_ID: VARCHAR2(24)
  PLANT_NAME: VARCHAR2(32)
  PLANT_STATUS: VARCHAR2(15)
  PLANT_TARGET_MAX_CAPACITY_KWH: NUMBER(15)
  PLANT_ABSOL_MAX_CAPACITY_KWH: NUMBER(15)
  UPDATE_TS: TIMESTAMP(6)
F_PLANT_OUTPUT
  OUTPUT_DAY_KEY: NUMBER
  OUTPUT_MINUTE_KEY: NUMBER
  GENERATING_PLANT_KEY: NUMBER(4)
  OUTPUT_ACTUAL_QTY_IN_KWH: NUMBER(15)
F_CURRENT_DAY_PLANT_OUTPUT
  OUTPUT_DAY_KEY: NUMBER(7)
  OUTPUT_MINUTE_KEY: NUMBER(4)
  GENERATING_PLANT_KEY: NUMBER(4)
  OUTPUT_ACTUAL_QTY_IN_KWH: NUMBER(15)

What do we have?

Operational transaction table: AO_CDC_OP.PLANT_OUTPUT
DW historical partitioned fact table: AO_CDC_DW.F_PLANT_OUTPUT
DW current day table (Real-Time Partition): AO_CDC_DW.F_CURRENT_DAY_PLANT_OUTPUT
Data warehouse UNION ALL view: AO_CDC_DW.V_PLANT_OUTPUT

First

The CDC user publishes:
  Create a change set (CDC_DW)
  Add supplemental logging for the operational table
  Create a change table for the operational table (CT_PLANT_OUTPUT)
  Force database logging on the tablespace, so that bulk direct-path inserts (/*+ APPEND */) are still captured. Without it, such inserts could run unlogged and never reach the redo log.

Next: Transform and Populate

One of two ways:
Continuous feed
  Logged insert activity
  Permits nearer real-time
  Constant system load
Periodic batch feed
  Permits non-logged bulk operations
  You set the lag time: how often do you run the batch process? Hourly? Every five minutes?
  Less system load overall

The Continuous Feed

Put an insert trigger on the change table that joins to the dimension tables to pick up the dimension keys and performs any necessary transformations
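The trigger-based continuous feed can be sketched the same way, under several assumptions. This is not the demo's PL/SQL: SQLite stands in for Oracle, where the equivalent would be a row-level AFTER INSERT trigger referencing :new. The change table CT_PLANT_OUTPUT is reduced to three columns taken from the extraction query (plant_id, output_ts, output_in_kwh). CDC control columns such as OPERATION$ are ignored, so every change row is treated as a new insert. The dimension rows, including the minute surrogate key 826, are hypothetical.

```python
import sqlite3

# SQLite stand-in for the Oracle objects; all sample values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_generating_plant (generating_plant_key INTEGER, plant_id TEXT);
CREATE TABLE d_output_day (output_day_key INTEGER, output_day TEXT);
CREATE TABLE d_output_minute (output_minute_key INTEGER, output_time_24hr_nbr INTEGER);
-- Simplified change table: CDC control columns are omitted.
CREATE TABLE ct_plant_output (plant_id TEXT, output_ts TEXT, output_in_kwh INTEGER);
CREATE TABLE f_current_day_plant_output (
    generating_plant_key INTEGER, output_day_key INTEGER,
    output_minute_key INTEGER, output_actual_qty_in_kwh INTEGER);

-- Each new change row is joined to the dimensions to pick up the
-- surrogate keys, then written to the real-time partition.
CREATE TRIGGER trg_ct_plant_output AFTER INSERT ON ct_plant_output
BEGIN
    INSERT INTO f_current_day_plant_output
    SELECT p.generating_plant_key, d.output_day_key, m.output_minute_key,
           NEW.output_in_kwh
    FROM d_generating_plant p, d_output_day d, d_output_minute m
    WHERE p.plant_id = NEW.plant_id
      AND d.output_day = date(NEW.output_ts)
      AND m.output_time_24hr_nbr = CAST(strftime('%H%M', NEW.output_ts) AS INTEGER);
END;

INSERT INTO d_generating_plant VALUES (1, 'PLANT-A');
INSERT INTO d_output_day VALUES (20040102, '2004-01-02');
INSERT INTO d_output_minute VALUES (826, 1345);
""")

# Simulates the asynchronous capture process writing one change row.
conn.execute("INSERT INTO ct_plant_output VALUES ('PLANT-A', '2004-01-02 13:45:00', 620)")

fact_rows = conn.execute("SELECT * FROM f_current_day_plant_output").fetchall()
print(fact_rows)  # [(1, 20040102, 826, 620)]
```

The trade-off matches the slide. Every captured change pays the join cost immediately, which keeps the lag small but puts a constant load on the system.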


The Batch Feed

The CDC schema owner
  Authorizes AO_CDC_DW to select from the change table (the select is done through a generated view)
The DW schema owner
  Subscribes to the change table and the columns it needs (with a centralized EDW approach this would usually be the whole change table), giving a subscription name and a view name
  Activates the subscription
  Extracts:
    Extends the window
    Extracts the changed data through the view (same code as the trigger)
    Purges the window (a logical delete; physical deletion is handled by the CDC schema owner)
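The extend/extract/purge cycle can be modeled with a minimal sketch in plain Python. It is loosely based on Oracle's subscriber workflow: in 10g, the DBMS_CDC_SUBSCRIBE package exposes EXTEND_WINDOW and PURGE_WINDOW procedures. This is not Oracle's implementation. The assumption here is that each change row carries an increasing ordering number (standing in for a commit SCN), and that a window is simply a (low, high] range over those numbers. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeTable:
    # (scn, payload) pairs appended in commit order.
    rows: list = field(default_factory=list)

    def append(self, scn, payload):
        self.rows.append((scn, payload))

    def max_scn(self):
        return self.rows[-1][0] if self.rows else 0

@dataclass
class Subscription:
    table: ChangeTable
    low: int = 0   # changes with scn <= low were already consumed and purged
    high: int = 0  # upper edge of the current window

    def extend_window(self):
        # Make every change captured so far visible through the view.
        self.high = self.table.max_scn()

    def view(self):
        # What the subscriber view returns: only changes inside the window.
        return [p for scn, p in self.table.rows if self.low < scn <= self.high]

    def purge_window(self):
        # Logical delete: move the lower edge up; no rows are removed here.
        self.low = self.high

def physical_purge(table, subscriptions):
    # CDC owner's job: drop rows that every subscription has already purged.
    horizon = min(s.low for s in subscriptions)
    table.rows = [(scn, p) for scn, p in table.rows if scn > horizon]

ct = ChangeTable()
ct.append(100, "row-a")
ct.append(101, "row-b")
sub = Subscription(ct)

sub.extend_window()
first_batch = sub.view()        # ['row-a', 'row-b']
ct.append(102, "row-c")         # arrives after the window was extended
still_first = sub.view()        # unchanged until the next extend
sub.purge_window()

sub.extend_window()
second_batch = sub.view()       # ['row-c']
sub.purge_window()
physical_purge(ct, [sub])       # every row has been consumed, so ct.rows is empty
```

The fixed window is what makes the batch extract repeatable. Changes that arrive mid-extract wait for the next run instead of being half-read.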


Extraction from Change Table View

-- Load the changes exposed by the subscriber view into the real-time partition
insert /*+ APPEND */ into ao_cdc_dw.F_CURRENT_DAY_PLANT_OUTPUT
  (generating_plant_key, output_day_key, output_minute_key,
   output_actual_qty_in_kwh)
select p.generating_plant_key, d.output_day_key, m.output_minute_key,
       new.output_in_kwh
from ao_cdc_dw.PO_ACTIVITY_VIEW new
inner join ao_cdc_dw.d_generating_plant p
  on new.plant_id = p.plant_id
inner join ao_cdc_dw.d_output_day d
  on trunc(new.output_ts) = d.output_day
inner join ao_cdc_dw.d_output_minute m
  -- 'HH24MI' renders the time of day as a 24-hour number, e.g. 13:45 -> 1345
  on to_number(to_char(new.output_ts, 'HH24MI')) = m.output_time_24hr_nbr;

-- A direct-path (APPEND) insert must be committed before this session can query the table again
commit;
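The minute-dimension join depends on turning a timestamp into an HHMI number. A quick check in Python shows why the format mask matters. The functions are hypothetical analogues of the SQL expressions, not Oracle code. In Oracle, HH is a 12-hour hour, so 13:45 and 01:45 would map to the same key. HH24 (or the equivalent substr positions on a 'YYYYMMDD HH24:MI:SS' string) keeps every minute of the day distinct.

```python
from datetime import datetime

def time_24hr_nbr(ts: datetime) -> int:
    # Analogue of to_number(to_char(ts, 'HH24MI')): 13:45 -> 1345.
    return int(ts.strftime("%H%M"))

def time_24hr_nbr_substr(ts: datetime) -> int:
    # Analogue of substr() on a 'YYYYMMDD HH24:MI:SS' string: SQL positions
    # 10-11 (hour) and 13-14 (minute) are 0-based slices 9:11 and 12:14.
    s = ts.strftime("%Y%m%d %H:%M:%S")
    return int(s[9:11] + s[12:14])

def time_12hr_nbr(ts: datetime) -> int:
    # What a 12-hour 'HH' mask would yield: afternoon collides with morning.
    return int(ts.strftime("%I%M"))

afternoon = datetime(2004, 1, 2, 13, 45)
morning = datetime(2004, 1, 2, 1, 45)
print(time_24hr_nbr(afternoon), time_24hr_nbr_substr(afternoon))  # 1345 1345
print(time_12hr_nbr(afternoon), time_12hr_nbr(morning))           # 145 145
```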


Next Step

Add the current day's activity (the contents of the current-day fact table) to the historical fact table as a new partition:
  Index and apply constraints to the current-day fact table
  Add a new empty partition to the fact table
  Exchange the current-day fact table with the partition
  Create the new current-day fact table
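A minimal model in plain Python shows why this end-of-day step is cheap. In Oracle, ALTER TABLE ... EXCHANGE PARTITION ... WITH TABLE swaps segment ownership in the data dictionary rather than copying rows. The sketch models only that idea. The Segment and PartitionedTable classes and the P_<day_key> partition naming are hypothetical, and index building and constraint checks are reduced to a flag.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    # Stand-in for a table's physical storage.
    rows: list = field(default_factory=list)
    indexed: bool = False

@dataclass
class PartitionedTable:
    partitions: dict = field(default_factory=dict)  # partition name -> Segment

    def add_partition(self, name):
        self.partitions[name] = Segment()

    def exchange_partition(self, name, table_segment):
        # Swap which segment the partition points to; rows are not copied.
        old = self.partitions[name]
        self.partitions[name] = table_segment
        return old

def end_of_day(fact, current_day, day_key):
    current_day.indexed = True                  # 1. index (constraints NOVALIDATE in Oracle)
    name = f"P_{day_key}"                       # hypothetical partition naming scheme
    fact.add_partition(name)                    # 2. add an empty partition
    fact.exchange_partition(name, current_day)  # 3. exchange
    return Segment()                            # 4. fresh current-day table for tomorrow

fact = PartitionedTable()
today = Segment(rows=[(20040102, 1345, 1, 620)])
tomorrow = end_of_day(fact, today, 20040102)
print(fact.partitions["P_20040102"] is today)  # True: same storage, now a partition
print(tomorrow.rows)                            # []
```

Because the exchange is a metadata swap, its cost does not grow with the size of the day's data. The indexing in step 1 is where the real end-of-day work happens.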

Let's step through this live

Summary

We created a real-time partition for current-day activity
We put CDC on the operational table and created a change table populated by an asynchronous process (which reads the redo log)
We demonstrated a continuous feed to the DW using a trigger-based approach
We demonstrated a batch DW feed using the CDC subscribe process
We showed how to add the current-day table to the fact table and set up the next day's table
An electronic copy of the SQL used to build this prototype is available by emailing mike.schmitz@databaseperformance.com

Michael Brey
Principal Member Technical Staff
ST/NEDC Oracle Engineering
Oracle Corporation

Overview

Benchmark Description
System Description
Database Parameters
Performance Data

The Benchmark

Customer OLTP benchmark run internally at Oracle
  Insurance application handling customer inquiries and quotes over the phone
  N users perform M quotes
  Quote = actual work performed during a call with a customer
  Mixture of inserts, updates, deletes, singleton selects, cursor fetches, rollbacks/commits, and savepoints
  Compute the average time for all quotes across users

System Info

SunFire 4800
  A standard symmetric multiprocessing (SMP) server
  8 × 900 MHz CPUs
  16 GB physical memory
  Solaris 5.8
  Database storage: striped across 8 Sun StorEdge T3 arrays (9 × 36.4 GB each)

Database Parameters

Parallel_max_servers   20
Streams_pool_size      400M (default: 10% of shared pool)
Shared_pool_size       600M
Buffer cache           128M
Redo buffers           4M
Processes              600

Change Data Capture (CDC)

                     Sync               Async HotLog       Async AutoLog
Available            Oracle 9i          Oracle 10g         Oracle 10g
Source system cost   System resources   System resources   Minimal
Part of txn          Yes                No                 No
Changes seen         Real time          Near real time     Variable

Tests

Conducted tests with Asynchronous HotLog CDC enabled and disabled, and with Synchronous CDC
Asynchronous HotLog CDC tests conducted at different log usage levels
  Approximately 10, 50, and 100% of all OLTP tables with DML operations were included in CDC
Tests run with:
  250 concurrent users
  Continuous peak workload after ramp-up
  175 transactions per second

Impact on Transaction Time

[Chart: CPU Consumption, Supplemental Logging. USR + SYS time as usage (#CPUs) over time (s); series: no CDC, no CDC w/ suppl]

[Chart: CPU Consumption, 10% DML Change Tracking. USR + SYS time as usage (#CPUs) over time (s); series: no CDC w/ suppl, CDC 10%]

[Chart: CPU Consumption, 50% DML Change Tracking. USR + SYS time as usage (#CPUs) over time (s); series: no CDC w/ suppl, CDC 50%]

[Chart: CPU Consumption, 10% and 100% DML Change Tracking. USR + SYS time as usage (#CPUs) over time (s); series: no CDC w/ suppl, CDC 10%, CDC 100%]

Latency of Change Tracking

Latency is defined as the time between the actual change and its reflection in the change capture table:
  Latency = time[change record insert] - time[redo log insert]
Latency measurements were made for the 100% Asynchronous HotLog CDC run
  53.5% of records arrived in less than 1 sec
  99.7% of records arrived in less than 2 secs
  Remaining records arrived in less than 3 secs
Asynchronous CDC kept up with the constant high OLTP workload at all times
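Percentages like these follow directly from the latency definition. The sketch below shows the computation. The timestamps are hypothetical and are NOT the benchmark measurements, and it assumes both timestamps are recorded on the same clock in milliseconds.

```python
def capture_latencies(pairs):
    # latency = time[change record insert] - time[redo log insert]
    return [change_ms - redo_ms for redo_ms, change_ms in pairs]

def share_below(latencies, threshold_ms):
    # Fraction of change records captured in less than threshold_ms.
    return sum(1 for x in latencies if x < threshold_ms) / len(latencies)

# Hypothetical (redo log insert, change record insert) timestamps in ms.
sample = [(0, 400), (1000, 1900), (2000, 3500), (3000, 5600)]
lat = capture_latencies(sample)                 # [400, 900, 1500, 2600]
for limit in (1000, 2000, 3000):
    print(f"< {limit} ms: {share_below(lat, limit):.0%}")
```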

Summary

Change Data Capture enables enterprise-ready, near real-time capture of change data
  No falling behind, even in constant high-load OLTP environments
  Minimal impact on source OLTP transactions
  Predictable additional resource requirements, driven solely by the amount of change tracking
Oracle provides the flexibility to meet your on-time business needs

Q&A

Next Steps.
Data Warehousing DB Sessions
Monday

Tuesday

11:00 AM
#40153, Room 304

8:30 AM
#40125, Room 130

Oracle Warehouse Builder:


New Oracle Database 10g Release

Oracle Database 10g:


A Spatial VLDB Case Study

3:30 PM
#40176, Room 303

3:30 PM
#40177, Room 303

Security and the Data Warehouse

Building a Terabyte Data Warehouse,


Using Linux and RAC

4:00 PM
#40166, Room 130

5:00 PM
#40043, Room 104

Oracle Database 10g


SQL Model Clause

Data Pump in Oracle Database 10g:


Foundation for Ultrahigh-Speed Data
Movement

For More Info On Oracle BI/DW Go To http://otn.oracle.com/products/bi/db/dbbi.html

Thursday
8:30 AM
#40179, Room 304

Business Intelligence and Data


Warehousing Demos All Four Days
In The Oracle Demo Campground

Oracle Database 10g Data


Warehouse Backup and Recovery

Oracle Database 10g

11:00 AM
#36782, Room 304

Oracle OLAP

Experiences with Real-Time Data


Warehousing using Oracle 10g

Oracle Data Mining

1:00PM
#40150, Room 102

Turbocharge your Database, Using


the Oracle Database 10g
SQLAccess Advisor

Oracle Warehouse Builder


Oracle Application Server 10


Reminder
please complete the OracleWorld
online session survey
Thank you.
