

Oracle Data Integrator Solution Overview


Nguyen Tuan Khang, khang.nguyen@oracle.com
Senior Solutions Consultant
Fusion Middleware
Oracle Vietnam

Why Data Integration?

NEED: Information How and Where You Want It
- Business Intelligence
- Corporate Performance Management
- Business Process Management
- Business Activity Monitoring
- Data Integration: Migration, Data Warehousing, Data Synchronization, Master Data Management, Federation, Real Time Messaging

HAVE: Data in Disparate Sources
- Legacy
- ERP
- CRM
- Best-of-breed Applications

3 Pillars of Data Integration
- Sync
- Async
- Batch

Enterprise Information Integration

Enterprise Information Integration


The Traditional Approach

Source Applications -> Extract -> Transform -> Load -> Target Data Warehouse

- ETL processes often use batch processing approaches. Example: customer nightly batch runs can take more than 24 hours!
- Services that operate on data are not easily reusable in other contexts
- ETL services and processes are insecure and hard to monitor (e.g., no SLA)

Challenges in Data Integration

CHALLENGE
1. Increasing data volumes; decreasing batch windows
2. Non-integrated integration
3. Complexity, manual effort of conventional ETL design
4. Lack of knowledge capture

Oracle Data Integrator
Based on technology from Sunopsis

Data Movement and Transformation from Multiple Sources to Heterogeneous Targets

BENEFIT | DIFFERENTIATOR
1. Best Performance | Heterogeneous E-LT
2. Productivity | Declarative Design
3. Real-time Integration | Declarative CDC
4. Hot-Pluggable | Knowledge Modules
5. Future Proof | The Chosen Integration Technology of Oracle Fusion

Typical Considerations for ODI
- High-volume data synchronization (more than 20MB/min)
- Heterogeneous data sources: DB2/AS400, Oracle, Excel, File, SQL, BAM
- Capturing new data changes regardless of the data source: CDC using native journals, LogMiner, or triggers
- Real-time data synchronization
- Easy to implement without changing your current IT infrastructure: no separate server required

Challenges & Emerging Solutions in Data Integration

CHALLENGE | EMERGING SOLUTION
1. Increasing data volumes; decreasing batch windows | Shift from E-T-L to E-LT
2. Non-integrated integration | Convergence of integration solutions
3. Complexity, manual effort of conventional ETL design | Shift from custom coding to declarative design
4. Lack of knowledge capture | Shift to pattern-driven development

E-LT Architecture: High Performance

Conventional ETL Architecture
Extract -> Transform -> Load, with the transform step in a separate ETL server
- Proprietary engine
- Poor performance
- High costs
E.g., Informatica, IBM DataStage

Next Generation Architecture: E-LT (Oracle Data Integrator)
Extract -> Load -> Transform, with the transform step in the existing RDBMS
- Leverages existing resources
- Efficient
- High performance

Benefits
- Optimal performance & scalability
- Easier to manage & lower cost

Technical Detail: Traditional E-T-L
- Needs one powerful server to act as the transform server and to hold its staging data tables
- High total cost of maintenance
- Not flexible when more source and target data sources are added
- Requires coding
- Poor performance (more I/O between the staging tables and the source/target)

Conventional ETL Architecture: sources S1, S2, S3 -> Extract -> Transform Server (ETL DB holding the repository and staging tables) -> Load -> Target 1

Next Generation Architecture: E-LT

Technical Detail
- Uses existing resources for transformation: higher performance, less I/O, and lower license costs
- Data flows are designed from pre-defined templates (drag & drop) and work with all types of data sources
- Captures changed data for near-real-time data synchronization
- No coding required

E-LT Architecture: sources S1, S2, S3 -> Extract -> Load into staging tables -> Transform -> Target 1
- ODI Agent: handles scheduling and real-time monitoring
- ODI Designer: used for changes only; not needed in production
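The difference between E-T-L and E-LT is where the transform runs. This is a minimal sketch, assuming two SQLite databases (through Python's `sqlite3`) stand in for a source system and the target warehouse. All table names are hypothetical. Raw rows are extracted and loaded into a staging table on the target. The transformation is then a single set-based SQL statement executed by the target engine itself, with no separate transformation server:

```python
import sqlite3

# Two separate in-memory databases stand in for a source system and the target warehouse.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'an', 10.0), (2, 'binh', 5.5), (3, 'an', 4.5);
""")
target.executescript("""
    CREATE TABLE stg_orders (id INTEGER, customer TEXT, amount REAL);
    CREATE TABLE sales_by_customer (customer TEXT PRIMARY KEY, total REAL);
""")

def extract_load(src: sqlite3.Connection, tgt: sqlite3.Connection) -> None:
    """E and L: copy raw, untransformed rows into a staging table on the target."""
    rows = src.execute("SELECT id, customer, amount FROM orders").fetchall()
    tgt.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)

def transform(tgt: sqlite3.Connection) -> None:
    """T: one set-based statement executed by the target database engine."""
    tgt.execute("""
        INSERT INTO sales_by_customer (customer, total)
        SELECT UPPER(customer), SUM(amount) FROM stg_orders GROUP BY UPPER(customer)
    """)
    tgt.execute("DELETE FROM stg_orders")  # staging data is transient
    tgt.commit()

extract_load(source, target)
transform(target)
print(target.execute("SELECT * FROM sales_by_customer ORDER BY customer").fetchall())
# [('AN', 14.5), ('BINH', 5.5)]
```

In a real deployment the target would be the warehouse RDBMS, so the `GROUP BY` runs on its engine rather than on a middle-tier server.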

Active Integration
Batch, Event-based, and Service-oriented Integration
- Evolve from batch to near-real-time warehousing on a common platform
- Unify the silos of data integration
- Data integrity on the fly
- Services plug into Oracle SOA Suite

Oracle Data Integrator
- Data Conductor: data-oriented integration
- Event Conductor: event-oriented integration
- Service Conductor: service-oriented integration
- Common foundation: metadata and declarative design

Benefits
- Enables real-time data warehousing and operational data hubs
- Services plug into Oracle SOA Suite for comprehensive integration

Declarative Design
Developer Productivity

Conventional ETL Design
- Specify the ETL data flow graph
- Developer must define every step of complex ETL flow logic
- Traditional approach requires specialized ETL skills and significant development and maintenance effort

ODI Declarative Design (declarative set-based design)
- Define what you want
- Automatically generate the dataflow
- Define how: built-in templates
- Reduces the number of steps
- Automatically generates the data flow, whatever the source and target databases

Benefits
- Significantly reduces the learning curve
- Shorter implementation times
- Streamlines access for non-IT professionals
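Declarative design separates what (the mapping) from how (the template that generates the code). This is a minimal sketch, assuming a hypothetical `Mapping` structure and a single hard-coded INSERT...SELECT template. ODI's actual mappings and generated code are far richer:

```python
from dataclasses import dataclass, field

@dataclass
class Mapping:
    """The 'what': target table, source tables, and target-column -> expression map."""
    target: str
    sources: list[str]
    columns: dict[str, str]
    joins: list[str] = field(default_factory=list)
    filters: list[str] = field(default_factory=list)

def generate_insert_select(m: Mapping) -> str:
    """The 'how': one built-in template turning the mapping into set-based SQL."""
    cols = ", ".join(m.columns)
    exprs = ", ".join(m.columns.values())
    sql = f"INSERT INTO {m.target} ({cols})\nSELECT {exprs}\nFROM {', '.join(m.sources)}"
    conditions = m.joins + m.filters
    if conditions:
        sql += "\nWHERE " + " AND ".join(conditions)
    return sql

m = Mapping(
    target="dw_customer",
    sources=["crm_customer c", "crm_country k"],
    columns={"cust_id": "c.id", "cust_name": "UPPER(c.name)", "country": "k.label"},
    joins=["c.country_id = k.id"],
    filters=["c.active = 1"],
)
print(generate_insert_select(m))
```

The developer edits only the `Mapping`. Swapping the template, for example for a MERGE-based one, would change the generated code without touching the mapping.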

Pluggable Data Integration Architecture
Hot-Pluggable: Modular, Flexible, Extensible

Pluggable Architecture
- Reverse: reverse-engineer metadata
- Journalize: read from CDC sources
- Load: from sources to staging
- Check: constraints before load
- Integrate: transform and move to targets
- Service: expose data and transformation services

Diagram: sources and CDC sources, staging tables, and target tables, linked by the Reverse, Journalize, Load, Check, Integrate, and Services steps (with web services, WS).

Benefits
- Tailor to existing best practices
- Ease administration work
- Depending on the specific data source, the right pre-defined code module (Knowledge Module) is selected -> Hot-Pluggable
- Support all types of data sources (DB2/AS400, Oracle, Excel, File)
- Reduce cost of ownership
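In practice, "hot-pluggable" means choosing a code template by source and target technology while the mapping stays the same. This is a minimal sketch of that idea, assuming a hypothetical registry and hypothetical template functions. The DBLink template only borrows Oracle's `table@link` syntax, and `source_link` is a made-up link name. It is not ODI's Knowledge Module API:

```python
from typing import Callable

# A loading template receives (source object, staging table) and returns the
# statement or action it would run. All names below are hypothetical.
LoadTemplate = Callable[[str, str], str]
REGISTRY: dict[tuple[str, str], LoadTemplate] = {}

def knowledge_module(source_tech: str, target_tech: str):
    """Decorator that plugs a loading template in for one technology pair."""
    def register(fn: LoadTemplate) -> LoadTemplate:
        REGISTRY[(source_tech, target_tech)] = fn
        return fn
    return register

@knowledge_module("oracle", "oracle")
def load_via_dblink(src: str, stg: str) -> str:
    # The target database pulls the rows itself over a database link.
    return f"INSERT INTO {stg} SELECT * FROM {src}@source_link"

@knowledge_module("file", "oracle")
def load_via_bulk_loader(src: str, stg: str) -> str:
    # Hand the file to the database's bulk loader instead of row-by-row inserts.
    return f"bulk-load {src} into {stg}"

@knowledge_module("*", "*")
def load_via_agent(src: str, stg: str) -> str:
    # Generic fallback: the agent reads rows and inserts them in batches.
    return f"agent copies {src} into {stg}"

def plan_load(source_tech: str, target_tech: str, src: str, stg: str) -> str:
    """Pick the module registered for this pair, else the generic fallback."""
    template = REGISTRY.get((source_tech, target_tech), REGISTRY[("*", "*")])
    return template(src, stg)

print(plan_load("oracle", "oracle", "CUSTOMER", "STG_CUSTOMER"))
print(plan_load("db2", "teradata", "CUSTOMER", "STG_CUSTOMER"))
```

To support a new technology pair, you register one more template. Callers of `plan_load` stay untouched.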

Knowledge Modules
Hot-Pluggable: Modular, Flexible, Extensible

Pluggable Knowledge Modules Architecture
Diagram: the Reverse, Journalize, Load, Check, Integrate, and Service steps connecting sources and CDC sources, staging tables, target tables, and error tables (populated by Check).

Sample out-of-the-box Knowledge Modules
- Reverse: SAP/R3, Siebel
- Journalize: LogMiner, SQL Server Triggers, DB2 Journals
- Load: Oracle DBLink, DB2 Exp/Imp, JMS Queues, Oracle SQL*Loader
- Check: Check MS Excel, Check Sybase
- Integrate: TPump/MultiLoad, Type II SCD, Oracle Merge, Siebel EIM Schema
- Service: Oracle Web Services, DB2 Web Services

Benefits
- Tailor to existing best practices
- Ease administration work
- Reduce cost of ownership
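The Check step validates staged rows before they reach the target and diverts rejected rows to an error table. This is a minimal sketch, assuming SQLite through Python's `sqlite3`, hypothetical `stg_customer`, `err_customer`, and `customer` tables, and made-up validation rules:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (id INTEGER, email TEXT, country TEXT);
    CREATE TABLE err_customer (id INTEGER, email TEXT, country TEXT, reason TEXT);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL, country TEXT);
    INSERT INTO stg_customer VALUES
        (1, 'a@example.com', 'VN'),
        (2, NULL,            'VN'),
        (3, 'c@example.com', 'XX'),
        (4, 'd@example.com', 'US'),
        (4, 'e@example.com', 'US'),
        (5, 'f@example.com', 'US');
""")

# Each rule: (reason, SQL predicate that is true for a violating staged row).
CHECKS = [
    ("email is null", "email IS NULL"),
    ("unknown country", "COALESCE(country, '') NOT IN ('VN', 'US')"),
    ("duplicate id", "id IN (SELECT id FROM stg_customer GROUP BY id HAVING COUNT(*) > 1)"),
]

def check_and_load(conn: sqlite3.Connection) -> None:
    for reason, predicate in CHECKS:
        # Copy violating rows, with the reason, into the error table.
        conn.execute(
            "INSERT INTO err_customer (id, email, country, reason) "
            f"SELECT id, email, country, ? FROM stg_customer WHERE {predicate}",
            (reason,),
        )
    # Load only rows that violate no rule (ids are assumed non-NULL here).
    any_violation = " OR ".join(f"({p})" for _, p in CHECKS)
    conn.execute(
        "INSERT INTO customer (id, email, country) "
        f"SELECT id, email, country FROM stg_customer WHERE NOT ({any_violation})"
    )
    conn.commit()

check_and_load(conn)
print(conn.execute("SELECT id FROM customer ORDER BY id").fetchall())  # [(1,), (5,)]
print(conn.execute("SELECT id, reason FROM err_customer ORDER BY id").fetchall())
# [(2, 'email is null'), (3, 'unknown country'), (4, 'duplicate id'), (4, 'duplicate id')]
```

Each rule is a set-based statement, so the checks run inside the database instead of row by row in an external tool.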

KMs: Truly Heterogeneous
Out-of-Box Knowledge Modules

Generic SQL DB
Oracle DB 9i
Oracle DB 10g
Oracle DB 10g XE
IBM DB2/400
IBM DB2/UDB
IBM Informix SE
IBM LDAP Server
MS SQL Server 2000
MS SQL Server 2005
MS SQL Server 2005 SE
MS Office Access 2000
MS Office Excel 2000
MS Active Directory
Sybase ASA 8.x & 9.x
Sybase IQ 12.x
Sonic MQ v7.0
Teradata V2R5.x
Teradata V2R6.x
Netezza Performance Server 2.2.1
Hyperion Essbase
PostgreSQL 8.1
MySQL 4.0
MySQL 5.0
Oracle BI Suite 10g
Oracle BAM 10g
Oracle Internet Directory 9i
OpenLDAP 2.3
Siebel CRM 7.8
JD Edwards
PeopleSoft
SAP R/3
Oracle E-Business Suite
Oracle AQ 10g
Oracle SOA Suite
Oracle ESB 10g
Salesforce.com AppExchange
Any JMS Standard Implementation

Popular Usage Scenarios


E-LT for Data Warehouse
Create Data Warehouse for Business Intelligence
Populate Warehouse with High Performance ODI

Diagram: operational sources -> Load, Transform, Capture Changes, Incremental Update, Data Integrity -> Data Warehouse -> Aggregate, Export -> Cubes -> Analytics, with metadata driving the data transformation.

Data Warehousing
- Heterogeneous sources and targets
- Incremental load
- Slowly changing dimensions
- Data integrity and consistency
- Changed data capture
- Data lineage
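Slowly changing dimensions (listed earlier in this deck as a Type II SCD Knowledge Module) keep history. When a tracked attribute changes, the current dimension row is closed and a new version is inserted. This is a minimal sketch, written row by row for readability; a production module would typically work set-based. It assumes SQLite through Python's `sqlite3` and a hypothetical `dim_customer` table with `valid_from`, `valid_to`, and `is_current` columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        id INTEGER NOT NULL,                   -- natural (business) key
        city TEXT,                             -- tracked attribute
        valid_from TEXT NOT NULL,
        valid_to TEXT,                         -- NULL while the version is open
        is_current INTEGER NOT NULL
    );
""")

def scd2_upsert(conn: sqlite3.Connection, id_: int, city: str, as_of: str) -> None:
    """Close the current version if the tracked attribute changed, then add a new one."""
    cur = conn.execute(
        "SELECT sk, city FROM dim_customer WHERE id = ? AND is_current = 1", (id_,)
    ).fetchone()
    if cur is not None and cur[1] == city:
        return  # unchanged: keep the current version
    if cur is not None:
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 WHERE sk = ?",
            (as_of, cur[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (id_, city, as_of),
    )
    conn.commit()

scd2_upsert(conn, 42, "Hanoi", "2008-01-01")
scd2_upsert(conn, 42, "Hanoi", "2008-02-01")    # no change, no new row
scd2_upsert(conn, 42, "Da Nang", "2008-03-01")  # change: old version closed
for row in conn.execute(
        "SELECT id, city, valid_from, valid_to, is_current FROM dim_customer ORDER BY sk"):
    print(row)
# (42, 'Hanoi', '2008-01-01', '2008-03-01', 0)
# (42, 'Da Nang', '2008-03-01', None, 1)
```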

ODI for Master Data Management
Common Data Quality and Middleware Services

Solutions & Applications (vertical driven, data object centric, application focus)
- Industry Solutions: Telco, Energy, Banking, Retail, Mfr, ...
- MDM Applications: Customer, Supplier, Employee, Product, Asset, ...

Master Data Management: Golden Master Records, populated by the Oracle Data Integrator E-LT Agent and E-LT Metadata from SAP/R3, PeopleSoft, Oracle EBS, Siebel CRM, and other sources

Fusion Middleware Foundation
- Process Orchestration
- Business Intelligence
- Registry & Policies
- Data Integration & Quality

Oracle Data Integrator
- Batch & real-time integration
- Data quality & profiling
- Transformation & data routing

ODI Enhances Oracle BI
Populate Warehouse with High Performance ODI

Oracle Business Intelligence Suite EE:
- Simplified business model view
- Advanced calculation & integration engine
- Intelligent request generation
- Optimized data access
Components: Oracle BI Presentation Server (Answers, Interactive Dashboards, Publisher, Delivers), Oracle BI Server, and the Oracle BI Enterprise Data Warehouse

Oracle Data Integrator:
- Populate the enterprise data warehouse
- Optimized performance for load and transform
- Extensible pre-packaged E-LT content
Diagram: the ODI E-LT Agent and E-LT Metadata perform bulk E-LT from SAP/R3, PeopleSoft, Oracle EBS, Siebel CRM, and other sources into the Oracle BI Enterprise Data Warehouse.

ODI Enhances Oracle SOA Suite
Add Bulk Data Transformation to BPEL Process

Oracle SOA Suite:
- BPEL Process Manager for business process orchestration
Components: Business Activity Monitoring, BPEL Process Manager, Web Services Manager, Declarative Rules Engine, Enterprise Service Bus

Oracle Data Integrator:
- Efficient bulk data processing as part of a business process
- Interaction via data services and transformation services
Diagram: the ODI E-LT Agent and E-LT Metadata provide bulk data processing to the SOA Suite.

ODI with BAM
Populate BAM with ETL Data Efficiently

Oracle SOA Suite components: Business Activity Monitoring (Event Monitoring Web Applications, Event Engine, Report Cache, Active Data Cache), BPEL Process Manager, Web Services Manager, Business Rules Engine, Enterprise Service Bus
Oracle Data Integrator: bulk and real-time data processing (Agent, CDC, Metadata) from the data warehouse, PeopleSoft, SAP/R3, and message queues

Business Activity Monitoring
- For real-time business insight
- Message-based, event-driven, memory-resident architecture

Oracle Data Integrator
- High-performance loading of BAM's Active Data Cache
- Pre-built and integrated via Knowledge Modules
- BAM Java APIs exposed through an interface like any other target

Sample Combined Use Cases
- Monitor events together with the aggregate implications of those events

Integration with SOA/BI/Fusion
Resolve All Integration Challenges

Diagram components:
- Oracle BPA and Human Workflow
- Oracle BI: dashboards, reporting, analysis, publishing
- BPEL Process Manager
- Oracle BAM: Active Data Cache
- Oracle Data Integrator: transformation services, data services, E-LT Agent, Knowledge Modules, Metadata Repository
- Sources and targets: Oracle JMS, XML, CDC, Oracle BI Enterprise Data Warehouse

The components invoke one another. ODI generates data services (WSDL), can use a service as a data source, and provides high-speed batch ELT, high-speed JMS ELT, and CDC-based ELT.

Performance


ODI vs. ESB
[Comparison chart; legend: Recommended, Considered, Can use]

Performance Report
Source and target: 2 dual-core CPUs, 12GB RAM

ODI with ESB
[Chart: data latency (Synchronous (immediate), Asynchronous, Batch (over 2 hours)) against data volume processing (Message by Message, Mini Batches, Large Volume (over 1M)), positioning Oracle Enterprise Service Bus and Oracle Data Integrator across real-life scenarios]

Understanding Performance Choices
When you need to transform data at large size

Recommended tool by source format (rows) and target format (columns):

Less than 10MB
Source \ Target | XML | File | DB
XML | ESB | ESB | ESB
File | ESB | ESB | depends
DB | ESB | depends | ODI
"depends": on whether an intermediary XML format is useful for other processing (use ESB), or whether joining File data to tabular RDB data is required (use ODI).

Between 10-50MB
Source \ Target | XML | File | DB
XML | depends | depends | ODI
File | depends | ODI | ODI
DB | ODI | ODI | ODI
"depends": on how much cross-referencing among the data values and rows is required during transformation; the more there is, the faster ODI will perform relative to ESB.

Greater than 50MB
Source \ Target | XML | File | DB
XML | depends | ODI | ODI
File | ODI | ODI | ODI
DB | ODI | ODI | ODI
"depends": if the source and target are both XML, and there is no cross-referencing of data among rows, then a streaming-type or parallel-engine-type approach might scale.

*Caveat: always benchmark if you are unsure and require the best possible results.
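The size-banded guide above (source format by target format, with "depends" cells resolved by the notes) can be written as a lookup table. This is a minimal sketch that only restates the guide, so the benchmarking caveat still applies. It makes two assumptions that the slide does not state: sizes of exactly 10MB or 50MB fall into the middle band, and formats are limited to XML, File, and DB.

```python
# Rows = source format, columns = target format, in this order.
FORMATS = ("XML", "File", "DB")
GUIDE = {
    "lt10": (("ESB", "ESB", "ESB"),
             ("ESB", "ESB", "depends"),
             ("ESB", "depends", "ODI")),
    "10to50": (("depends", "depends", "ODI"),
               ("depends", "ODI", "ODI"),
               ("ODI", "ODI", "ODI")),
    "gt50": (("depends", "ODI", "ODI"),
             ("ODI", "ODI", "ODI"),
             ("ODI", "ODI", "ODI")),
}

def recommend(source: str, target: str, size_mb: float) -> str:
    """Return 'ESB', 'ODI', or 'depends'; unknown formats raise ValueError."""
    if size_mb < 10:
        band = "lt10"
    elif size_mb <= 50:  # assumption: both 10MB and 50MB count as the middle band
        band = "10to50"
    else:
        band = "gt50"
    return GUIDE[band][FORMATS.index(source)][FORMATS.index(target)]

print(recommend("File", "DB", 5))     # depends
print(recommend("XML", "File", 200))  # ODI
```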

Topology 1: Oracle to Oracle
Vietnamese Customer PoC
- Oracle 10.2+ on Linux; hardware: quad core, 4 GB RAM
- Oracle 10.2+ on Windows; hardware: dual core, 2 GB RAM
- Data synchronization between the two databases, using ODI Designer, the ODI repositories, and an ODI Agent

Performance Results
100k rows, 15 fields
- Load (LKM DBLink): 3s
- Real-time synchronization (JKM DBLink): update 65k rows in 13s; delete 30k rows in 8s

1.2M rows, 8 fields (about 120 bytes/row)
- Load: LKM DBLink 24s; JDBC 4.5 minutes
- Real-time synchronization (JKM DBLink): update 5,000 rows in 8s; delete 5,000 rows in 8s

Real-time Synchronization with CDC: CPU Usage
- Without CDC: CPU 10%, 1s-1.5s
- With CDC enabled (LogMiner) and the Agent scheduler: CPU 2%, 1s-1.5s
- Scenario with 1.2M rows: update 3,900 rows at CPU 23% in 2s; delete 3,900 rows at CPU 21% in 2s
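The 1.2M-row load timings can be converted into throughput figures. This is a rough calculation that only rescales the measured times. It assumes the stated "about 120 bytes/row" is the full payload, uses decimal megabytes, and ignores protocol and storage overhead:

```python
def throughput_mb_per_min(rows: int, bytes_per_row: int, seconds: float) -> float:
    """Decimal megabytes (10**6 bytes) moved per minute."""
    return rows * bytes_per_row / 1e6 / seconds * 60

ROWS, BYTES_PER_ROW = 1_200_000, 120  # "1.2m rows, 8 fields (about 120 bytes/row)"

volume_mb = ROWS * BYTES_PER_ROW / 1e6                       # about 144 MB
dblink = throughput_mb_per_min(ROWS, BYTES_PER_ROW, 24)      # LKM DBLink: 24 s
jdbc = throughput_mb_per_min(ROWS, BYTES_PER_ROW, 4.5 * 60)  # JDBC: 4.5 minutes

print(f"{volume_mb:.0f} MB, DBLink {dblink:.0f} MB/min, "
      f"JDBC {jdbc:.0f} MB/min, speed-up {dblink / jdbc:.2f}x")
# 144 MB, DBLink 360 MB/min, JDBC 32 MB/min, speed-up 11.25x
```

Under these assumptions, both load methods exceed the 20MB/min mark cited under "Typical Considerations for ODI", and the DBLink module is roughly 11 times faster than JDBC on this data set.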

Summary


Oracle Data Integrator
Data Movement and Transformation from Multiple Sources to Heterogeneous Targets

BENEFIT | DIFFERENTIATOR
1. Best Performance | Heterogeneous E-LT
2. Productivity | Declarative Design
3. Real-time Integration | Declarative CDC
4. Hot-Pluggable | Knowledge Modules
5. Future Proof | The Chosen Integration Technology of Oracle Fusion

Reference Customers


Customer: Overstock.com
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle 9i & 10g RAC, Dell Linux, IBM AIX, Teradata 8-node 54000

"Oracle Data Integrator is helping us turn our data into gold."

Business Problem:
- Wanted to enable sales, finance, marketing and merchandising teams to have access to near-real-time data so that they could make timely, more intelligent business decisions.
- Wanted to know at any point in time whether company performance was meeting the target metrics.
- Needed a data integration product that could handle our high-volume loading and transformation requirements in near real time.

Oracle Data Integrator Solution:
- "Data Integrator allows us to perform data transformations using the power of our Teradata Enterprise Warehousing platform. [...] With Oracle, over 300 users are now able to have access to their relevant data in real-time, hourly, daily, or weekly depending upon their needs."
- Having access to key business metrics in real time is no longer a fantasy.
- Found a way to ensure that the Teradata data warehouse was constantly updated.
- Even highly complex transformations are automated within the
- Supports several terabytes of data stored in the enterprise warehouse, and millions of daily transactions.
- "In short, Oracle Data Integrator gives us the ability to make better decisions and better manage our bottom line."

Data Sources, Targets, and Platforms
- Oracle 9i RAC & 10g RAC
- Teradata 8-node 54000
- GoldenGate TDM Transactional Management
- Platforms: IBM AIX, Dell Linux

Data Integration Architecture
- Oracle Data Integrator: 100% Java architecture, high-performance ELT transformations, business-rules-driven transformation design tool, automatic load script generation
- >1.2M SKUs, >5M daily transactions, >300 users; deployable for both batch and real-time use cases; leverages the power of the Teradata engine for faster data transformation

Company: Overstock.com
Overstock.com, Inc. (NASDAQ: OSTK) operates as an online retailer offering bed-and-bath goods, furniture, watches, jewelry, electronics, sporting goods, and designer accessories.

Product: Oracle Data Integrator
Contact: Miranda Nash
Email: miranda.nash@oracle.com

Customer: Sabre Holdings
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle DB, MQ sources, Teradata Data Warehouse target

"We needed a data integration tool that would reduce our dependency on manual coding of E-LT scripts and leverage the power of our Teradata Warehouse for data transformation."

Business Problem:
- High costs associated with loading the data warehouse from new sources
- Large Teradata data warehouse requires top performance for loading data in near real time
- Integrated views of data require complex transformations that are expensive to maintain

Oracle Data Integrator Solution:
- E-LT architecture maximizes performance and leverages existing investment in Teradata infrastructure
- Lower development and maintenance costs for E-LT, driven by declarative design tools
- Bottom line: integrated travel industry data in a consolidated view enables Sabre to better serve its customers and travel suppliers

Data Sources, Targets, and Platforms
- Oracle RDBMS
- Teradata Data Warehouse
- Flat Files
- Various other sources over MQ

Data Integration Architecture
- Oracle Data Integrator: 100% Java architecture, high-performance ELT transformations, business-rules-driven transformation design tool, automatic load script generation

Company: Sabre Holdings
For more than 40 years, Sabre Holdings (NYSE: TSG) has transformed the airline industry through technological advancement. The company offers a portfolio of travel marketing, distribution and technology solutions.

Customer: DHL
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle RDBMSs, Teradata Data Warehouse, Cobol Flat Files

"Solution completely meets our needs. [...] Oracle Data Integrator was developed by ETL developers, who really know and understand ETL concerns and pains, and how to do things better."

Business Problem:
- A 24/7 business cannot be compromised by long ETL batches (via an ETL tool)
- No daily load can last more than one hour
- When the volume of data doubles, execution time triples
- Data integration was the bottleneck in providing more services

Oracle Data Integrator Solution:
- With Oracle Data Integrator, every batch that used to last one hour now lasts seconds
- Reducing the batch window is critical to adding more functionality
- Running mini-batches more often results in more customer services and more revenue
- Using the RDBMS as an engine for data transformation simplifies the administrative workload

Data Sources, Targets, and Platforms
- Oracle RDBMS
- Teradata Data Warehouse
- Flat Files
- Platforms: Linux, Cobol

Data Integration Architecture
- Oracle Data Integrator: 100% Java architecture, high-performance ELT transformations, business-rules-driven transformation design tool, automatic load script generation
- 2.5 terabytes loaded every 15 minutes from 8 major data sources
- >50 events, >5 shipments and > piece/parcel records per day

Company: DHL
For more than 35 years, DHL has built the world's premier global delivery network by trailblazing express shipping in one country after another. Over 220 countries and territories later, DHL is the global market leader of the international express and logistics industry.

Customer: iBasis
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle 10g, Netezza Performance Server NPS8350 Warehouse Appliance

"The first thing that struck us was the speed with which we ramped up our ETL developments with Oracle Data Integrator."

Business Problem:
- The data warehouse had become obsolete and could not respond to the growing requirements of management, sales, and operational centers
- Needed more accurate and timely data
- Replaced the entire data warehouse infrastructure
- Needed a data integration product that would provide the scalability and performance required to aggregate, transform, and load their data

Oracle Data Integrator Solution:
"Given the massive volumes of data we need to process every day, getting timely data in the data warehouse requires high performance loading processes. Using Oracle Data Integrator's set of Knowledge Modules for Netezza, we are able to take advantage of the massively parallel processing capabilities of Netezza and to reduce load times significantly. [...] as our goal is to go more and more toward real-time, it will be easy for us to change the latency of these flows without having to redevelop them."

Data Sources, Targets, and Platforms
- Oracle RDBMS
- Netezza Performance Server NPS8350
- Flat Files
- Applications (future): Call Billing, Network Monitoring

Data Integration Architecture
- Oracle Data Integrator: 100% Java architecture, high-performance ELT transformations, business-rules-driven transformation design tool, automatic load script generation
- 4.5TB data warehouse, >8 billion records; the company processes >150 million transactions per day

Company: iBasis
Founded in 1996, iBasis (NASDAQ: IBAS) is one of the largest carriers of international voice traffic in the world and a leading provider of prepaid calling services.

Analyst Coverage

Gartner
Sunopsis (Oracle) has made strides in building
market awareness beyond its base in Europe.
Sunopsis has a range of capabilities, spanning ETL
and real-time messaging, and an architecture that
enables distribution of transformation workload
across data sources and targets.
Ted Friedman, Bill Gassman,
Magic Quadrant for Extraction, Transformation and Loading, 1H05,
May 11, 2005


Bloor Research
While there are many relatively young
vendors within the ETL market, Sunopsis has
undoubtedly made the biggest impression,
both in terms of the users that it has gained
and in the way that its approach has
influenced the market.
Philip Howard,
Bullseye Report - Extract, Transform & Load,
March 28, 2006


Gartner
By purchasing Sunopsis, Oracle has acquired a server-independent and
platform-independent data integration tool, which will be renamed Oracle Data
Integrator (ODI). OFM and Oracle Applications customers will welcome the
addition of the ODI's database independence. In particular, the acquisition could
provide needed new momentum for Fusion Middleware. Fusion Middleware
customers have heterogeneous IT environments, as do former PeopleSoft,
Siebel Systems and JD Edwards customers, who have an ongoing requirement
for integration with non-Oracle systems. The acquisition will provide OFM with
a data integration tool that is capable of deploying small-grained data services
within a service-oriented architecture (SOA) environment. This capability could
have a positive influence on Fusion Middleware - if Oracle leverages the
Sunopsis philosophy.
Mark A. Beyer, Ted Friedman
Sunopsis Data Integration May Fuel Oracle Fusion Middleware
October 23, 2006


Forrester Research
Oracle has recognized that its customers require
diverse data integration features without having to
integrate and manage products from many vendors.
Integrating Sunopsis' heterogeneous extract, load,
transform (ELT) and event-driven CDC capabilities
within its middleware offerings is a great start.
Rob Karel
Oracle Makes Serious Move In Data Heterogeneity by Acquiring
Sunopsis
October 29, 2006

