Business Analytics with IDS

Fred Ho, IDS Development


Disclaimer

© Copyright IBM Corporation 2009. All rights reserved.


U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES
ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE
INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT
PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM
SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE
RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS
PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR
REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND
CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR
SOFTWARE.

IBM, the IBM logo, ibm.com, Informix, solid, DataMirror, Optim, Cognos are trademarks or registered trademarks of
International Business Machines Corporation in the United States, other countries, or both. If these and other IBM
trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols
indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries.

Other company, product, or service names may be trademarks or service marks of others.
Contents
•  Definition of BI/DW/BA
•  Types of IDS BI Users
•  OLTP vs. Data Warehousing
•  Informix Warehouse
•  IDS Storage Optimization
•  Your Feedback and Requirements
Business Intelligence

•  A set of concepts and methodologies to improve decision making in business through use of facts and fact-based systems
   – Howard Dresner, The Gartner Group
•  The processes, technologies, and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business actions
   – David Loshin, Business Intelligence: The Savvy Manager’s Guide

The foundation that enables BI is the enterprise architecture: business, data, and technology. A well-implemented data warehousing program provides much of that foundation.
Data Warehousing
•  A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management needs
   – W. H. Inmon
•  The Data Warehouse is nothing more than the union of all the constituent data marts
   – Ralph Kimball et al., The Data Warehouse Lifecycle Toolkit

The data warehousing process turns raw data into potentially valuable
information usable by people and systems. Warehousing enhances the value of
data assets by:
–  Applying standards and consistency to the data
–  Organizing the data into subject areas that cross business functional
lines
–  Integrating the data
–  Enforcing data consistency over time to provide meaningful history
–  Acting as a stable and reliable source
–  Providing easy access to data
Business Analytics
The process of using information to enhance knowledge and apply that
knowledge to help a business achieve its objectives. Analytic applications
provide tools to facilitate the business analytics process.
•  Business Metrics and Business Management
–  Business Process Management
–  Business Performance Management
–  Business Activity Monitoring
–  Customer Relationship Management
–  Supply Chain Management
•  Performance Dashboards for Information Delivery
–  Real-time (or near real-time) Monitoring
•  Scorecards for Information Delivery
–  Monitoring history & trends
•  Analytic Applications for Information Delivery
–  Customer Analysis, Marketplace Analysis, Sales Channel Analysis, …


Range of Business Analytics
[Figure: analytics techniques plotted by complexity (low to high) against business value (low to high); source: TDWI]
•  Reporting: using query, reporting, and search tools
•  Analysis: using OLAP & visualization tools
•  Monitoring: using dashboards & scorecards
•  Prediction: using predictive analysis tools
IDS in BI/Warehousing
•  Given IDS’s characteristics of reliability, high availability, performance, and ease of use, why isn’t IDS in this space?
–  IDS has traditionally been viewed as an OLTP solution
•  However, there are a lot more warehousing users on IDS than one realizes!
–  Some customers have implemented IDS warehouses at the terabyte scale
–  There are a lot of features already in IDS that make it suitable
for BI/Warehousing
–  BI tools have become very sophisticated over the years
•  We recognize the need to provide better warehousing capabilities
for IDS users
What’s Available? IDS Warehousing Features

•  Performance & Scalability
–  Inherent SMP Multi-threading
–  Parallel Data Query (PDQ)
–  Light Scan for fast table scans
–  Online Index build
–  Efficient Hash Joins
–  Auto Fragment Elimination
–  Memory Grant Manager (MGM)
–  High Performance Loader
–  Optimistic Concurrency
•  Ease of Management
–  Time-cyclic data management using Range Partitioning
–  OPTCOMPIND optimization
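Time-cyclic data management usually means keeping a fixed window of range partitions (for example, one per month): each period a new partition is added and the oldest one ages out. The Python sketch below models only that bookkeeping. The class, the window size, and the `pYYYY_MM` partition naming are hypothetical; the actual attach/detach of fragments would be done with IDS fragmentation DDL, which is not shown here.

```python
from collections import deque
from datetime import date
from typing import Deque, List


class RollingWindow:
    """Track a fixed number of monthly range partitions (hypothetical names)."""

    def __init__(self, months: int) -> None:
        if months < 1:
            raise ValueError("window must hold at least one month")
        self.months = months
        self.partitions: Deque[str] = deque()

    @staticmethod
    def partition_name(day: date) -> str:
        # Zero-padded so that names sort in chronological order.
        return f"p{day.year:04d}_{day.month:02d}"

    def roll(self, day: date) -> List[str]:
        """Add the partition for day's month; return the partitions that aged out."""
        name = self.partition_name(day)
        if self.partitions and name <= self.partitions[-1]:
            raise ValueError("months must be added in chronological order")
        self.partitions.append(name)
        detached = []
        while len(self.partitions) > self.months:
            detached.append(self.partitions.popleft())
        return detached


if __name__ == "__main__":
    window = RollingWindow(months=3)
    for month in range(1, 6):
        print(month, window.roll(date(2009, month, 1)), list(window.partitions))
```

With a three-month window, adding April returns `["p2009_01"]`, the partition that would be detached, while February through April remain online.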
BI Users Classification

1.  BI on Existing OLTP Schema (Operational BI)
2.  BI on Star Schema (Data Mart)
3.  BI in a Mixed-Workload Environment
4.  Enterprise BI
Type 1: BI/Analytics on OLTP Schema

•  The majority of today’s IDS customers need to do BI/analytics on their existing IDS (OLTP) database.
•  They currently use a combination of 4GL programs, Excel, and BI tools (Business Objects, Cognos, Crystal Reports)
•  Requires custom code and maintenance by the customer
•  Performance may be acceptable even on an OLTP schema
•  Allows for “operational BI”
OLTP vs. Data Warehousing Workload

OLTP                              Data Warehousing
------------------------------    --------------------------------
Short transactions                Longer transactions
–  Relatively simple SQL          –  Complex SQL with analytics
Random updates                    Sequential updates
–  Few rows accessed              –  Many rows accessed
Sub-second response time          Seconds-to-minutes response time
ER modeling                       Dimensional modeling
–  Minimizes redundancy           –  OK to have redundancy
Normalized data (5NF)             De-normalized data (3NF)
–  Minimizes duplicates           –  Duplicates are OK
Few indexes                       OK to have more indexes
–  Avoids index maintenance       –  Mostly read-only
Pre-compiled queries              Ad-hoc queries
–  Repeated execution of queries  –  Unpredictable load
Type 2. BI/Analytics on IDS on Star Schema

•  Transform the OLTP database into a Star Schema database
•  Better performance for data warehousing and dimensional queries
•  The Star Schema database may be on a separate machine/domain
•  Suitable for customers building a separate data mart
•  Use IDS as is against the Star Schema
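To make “dimensional queries” concrete, here is a minimal Python sketch of a star schema held in memory: a fact table of sales rows whose foreign keys point into two dimension tables. All table names, columns, and data are hypothetical. The function does what SQL would express as a join of the fact table to its dimensions followed by `GROUP BY category, quarter`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Dimension tables: surrogate key -> descriptive attributes (hypothetical data).
product_dim = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}
date_dim = {
    20090101: {"year": 2009, "quarter": 1},
    20090415: {"year": 2009, "quarter": 2},
}

# Fact table: one row per sale, holding a key into each dimension plus a measure.
sales_fact: List[dict] = [
    {"date_key": 20090101, "product_key": 1, "amount": 100.0},
    {"date_key": 20090101, "product_key": 3, "amount": 20.0},
    {"date_key": 20090415, "product_key": 2, "amount": 50.0},
    {"date_key": 20090415, "product_key": 1, "amount": 30.0},
]


def sales_by_category_and_quarter(facts, products, dates) -> Dict[Tuple[str, int], float]:
    """Join each fact row to its dimensions, then sum the measure per group."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for row in facts:
        category = products[row["product_key"]]["category"]  # fact -> product dimension
        quarter = dates[row["date_key"]]["quarter"]           # fact -> date dimension
        totals[(category, quarter)] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    result = sales_by_category_and_quarter(sales_fact, product_dim, date_dim)
    for key, total in sorted(result.items()):
        print(key, total)
```

The design point is that the fact table stays narrow (keys plus measures) while descriptive attributes live in small dimension tables, which is what makes this join-then-aggregate pattern cheap to express and optimize.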
What’s Available? BI Tools

The Performance Management Framework
Cognos identifies best-practice decision areas, or information sweet spots, by business function.
Cognos 8 provides a comprehensive set of BI tools for:
•  Reporting
•  Analysis
•  Dashboards
•  Scorecards
Performance Management Framework for:
•  Solutions for different areas of the organization
Cognos Business Intelligence and Performance Management
One Platform, One Architecture

•  Industry and Functional Solutions
•  Complete Coverage of all capabilities
•  Enterprise-Class SOA Platform
Data Warehouse Architecture
SQL Warehousing Tool Overview

–  Warehousing Process
–  Design Studio
–  Admin Console
–  Summary
SQL Warehousing Tools Overview
•  Typical process
   –  Identify requirements
      •  Data Architect
   –  Define data transformation (ETL/ELT) process
      •  SQL/ETL developer
   –  Development of SQL/shell scripts
      •  SQL/ETL developer
   –  Deployment in production system
      •  Application Architect, DBA
   –  Reporting
      •  Business user
   –  Refine requirements
•  Challenges
   –  Dynamic requirements
      •  Constant refinement
   –  Multiple roles, tools
      •  Each has a different perspective
      •  Communication cost/information loss
   –  Unreadable, hard-to-debug scripts
      •  Poor productivity
   –  Impact analysis for any data model change
•  SQW Solution
   –  Data Modeling
      •  Physical Data Model (reverse engineering, new from scratch, generate DDL), compare & sync
   –  Data Flows
      •  Visual design
      •  Optimized SQL code generation
      •  Control flow supports programming logic
   –  Admin Console
      •  Schedule, monitor, parameterized values
   –  Eclipse free reporting tool
      •  e.g. BIRT
   –  Reusable flows
      •  Easy refinement
      •  Copy & paste, refactor
•  Values
   –  Easy to design & reuse
   –  Increased productivity
   –  Integrated tools
      •  Seamless integration inside Eclipse
   –  Auto-generated code from visualized flows
      •  Optimized SQL code
SQW Architecture
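[Figure: SQW architecture in three phases.
DESIGN: the Eclipse-based Design Studio (Design Center) builds data flows and control flows against databases such as IDS, DB2, Oracle, and SQL Server.
DEPLOY: deployment preparation produces a deployment package (code units, build profile, user scripts), which is deployed through the IDS SQW Admin Console.
RUNTIME: the SQW Runtime, an HTTP service in WebSphere Application Server (WAS), executes flows against the execution DB, warehouse DB, and data sources, together with a control DB, applications, and other servers such as DataStage.]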
SQW: Design Studio
•  Design Studio
–  Eclipse based IDE
•  Integrated tools, shell sharing
–  Team development
•  CVS, ClearCase for check-in/check-out of projects and flows
•  Data Warehousing Project
–  Data Models
–  Data Flows
–  Control Flows
–  Warehouse Applications (deployment
packages)
–  Subflow & Subprocess (reusable flow
module)
–  Variables
•  Data Source Explorer
–  Database connections to multiple
vendors, e.g. Informix, DB2 LUW,
Oracle, SQL Server, MySQL, DB2 z/OS
•  DataStage Servers
–  Integration with IBM DataStage
SQW: Data Modeling
•  Physical Data Model
–  Visualized data modeling
–  Impact analysis
–  Reverse engineering or new from scratch
–  Compare & sync
–  Generate DDL
–  Overview diagram
•  Shell sharing with Rational Data Architect & other Data Studio products
SQW: Data Flows

[Figure: an example data flow in which a file source and a table source are joined, aggregated, and written to a table target]
Data Flow Operators:
-- source & target operators (table, file)
-- SQL Transformation operators
-- Warehousing operators
SQW: Data Flows

[Figure: a simple flow and the SQL code generated from it]
Generated SQL code:
-- optimization across SQL statements
-- optimized staging strategy
-- in-database transformation
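The idea of in-database transformation can be sketched as turning a flow description into one set-based SQL statement, so the join and aggregation run inside the database rather than row by row in an external engine. The Python below is a hypothetical illustration, not SQW’s actual code generator: the `JoinAggregateFlow` record and all table and column names are invented, the file source is assumed to have been loaded into a staging table first, grouping columns are assumed to come from the left input, and measures from the right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JoinAggregateFlow:
    """Hypothetical description of a join-then-aggregate flow."""
    left: str             # staging table loaded from the file source (assumed)
    right: str            # table source
    join_key: str         # column present in both inputs
    group_by: List[str]   # grouping columns, taken from the left input
    measures: List[str]   # numeric columns from the right input, summed
    target: str           # table target


def generate_sql(flow: JoinAggregateFlow) -> str:
    """Emit a single INSERT ... SELECT so that the join and the
    aggregation are executed by the database itself."""
    select_list = [f"l.{col}" for col in flow.group_by]
    select_list += [f"SUM(r.{col}) AS {col}" for col in flow.measures]
    target_cols = ", ".join(flow.group_by + flow.measures)
    group_list = ", ".join(f"l.{col}" for col in flow.group_by)
    return (
        f"INSERT INTO {flow.target} ({target_cols})\n"
        f"SELECT {', '.join(select_list)}\n"
        f"FROM {flow.left} l JOIN {flow.right} r "
        f"ON l.{flow.join_key} = r.{flow.join_key}\n"
        f"GROUP BY {group_list}"
    )


if __name__ == "__main__":
    flow = JoinAggregateFlow(
        left="stage_orders", right="order_lines", join_key="order_id",
        group_by=["region"], measures=["amount"], target="sales_by_region",
    )
    print(generate_sql(flow))
```

For the example flow this prints an `INSERT INTO sales_by_region (region, amount) SELECT ... GROUP BY l.region` statement; a real generator would also decide when intermediate staging tables are worth creating.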
SQW: Control Flows

Control flow
•  Common utility operators
•  Control logic, parallel execution, loop iteration
•  Error handling
SQW Overview

[Figure: the Design Studio (Eclipse-based design environment) is used to create warehouse applications and deploy them to the Admin Console (production environment in WebSphere), which manages them]
•  Application package (zip file)
–  Deployment profile (database connections, machine resources, variable definitions, DDL files, etc.)
–  Generated code
•  Manage warehouse applications
–  Schedule
–  Monitor
Admin Console
•  Flex RIA-based Warehouse Admin Console
•  The Admin Console manages common resources (e.g. database connections, FTP servers, DataStage servers)
•  Schedule & monitor warehouse processes
XPS Customers Looking to Migrate to IDS

•  External Tables
–  XPS-style loader for easy migration
•  Partitioning Strategies
–  Auto fragmentation
–  Fragment Advisor
–  Fragment statistics update
–  Truncate fragments
•  Primary Storage Manager (PSM)
–  For simpler, easier management of backups (replacing ISM)
•  Merge
–  Upsert capabilities

* Features to be included in the next release(s)
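The “upsert” behavior behind Merge follows standard SQL `MERGE ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT`. The Python sketch below mimics only those semantics on in-memory rows; it is not the IDS implementation, and the function name and row layout are hypothetical. It rejects duplicate source keys up front, a simplification of how SQL treats a source row set that matches the same target row more than once.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def merge_upsert(target: List[Row], source: List[Row], key: str) -> Tuple[int, int]:
    """Merge source rows into target in place, matching on `key`.

    Mirrors MERGE ... WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT.
    Returns (rows_updated, rows_inserted).
    """
    keys = [row[key] for row in source]
    if len(keys) != len(set(keys)):
        # Reject ambiguous input before touching target, so a failure changes nothing.
        raise ValueError("source contains duplicate keys")
    position = {row[key]: i for i, row in enumerate(target)}
    updated = inserted = 0
    for row in source:
        if row[key] in position:
            target[position[row[key]]].update(row)  # WHEN MATCHED THEN UPDATE
            updated += 1
        else:
            position[row[key]] = len(target)
            target.append(dict(row))                # WHEN NOT MATCHED THEN INSERT
            inserted += 1
    return updated, inserted


if __name__ == "__main__":
    inventory = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
    print(merge_upsert(inventory, [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}], "id"))
    print(inventory)
```

Doing this as one statement avoids the application-side “try UPDATE, then INSERT if no rows changed” round trips that migration code otherwise needs.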


Using Mach11 for OLTP/Warehousing in IDS
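[Figure: two deployment options.
Use separate boxes: OLTP applications use an OLTP database; ETL moves the data into a separate warehouse database used by SQW.
OR
Use a MACH 11 blade server: OLTP applications and SQW connect through the Connection Manager, giving users a transparent, single-database view. The “OLTP” node group (the primary plus SDS nodes) and the “SQW” node group (SDS nodes) all work from the same shared disk.]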
IDS Storage Optimization
•  Now available as of 11.50xC4
•  Deep Compression + Storage Optimization
Row Compression Concepts

•  Compression looks for repeating patterns across the entire table
–  When a pattern is found, the string is replaced by a 12-bit symbol
–  Symbols are stored in a dictionary for fast lookup
•  Data resides compressed on pages (both on disk and in the buffer pool)
–  Significant I/O bandwidth savings: better performance
–  Significant memory savings: more efficient memory utilization
–  Some CPU overhead
•  Rows must be uncompressed before they are evaluated
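A 12-bit symbol can take 2^12 = 4096 distinct values, so a dictionary addressed this way holds at most 4096 patterns, and two symbols fit in exactly three bytes. The Python sketch below packs and unpacks 12-bit symbols to show that arithmetic. It illustrates bit packing only and makes no claim about the on-page format IDS actually uses.

```python
from typing import List

SYMBOL_BITS = 12
MAX_SYMBOL = (1 << SYMBOL_BITS) - 1  # 4095, i.e. 4096 distinct symbols


def pack_symbols(symbols: List[int]) -> bytes:
    """Pack 12-bit symbols into a byte string, most significant bit first."""
    acc = 0      # bits not yet written out
    nbits = 0    # how many bits acc currently holds
    out = bytearray()
    for sym in symbols:
        if not 0 <= sym <= MAX_SYMBOL:
            raise ValueError(f"symbol {sym} does not fit in {SYMBOL_BITS} bits")
        acc = (acc << SYMBOL_BITS) | sym
        nbits += SYMBOL_BITS
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # drop the bits already emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last partial byte
    return bytes(out)


def unpack_symbols(data: bytes, count: int) -> List[int]:
    """Recover `count` 12-bit symbols from a byte string made by pack_symbols."""
    acc = 0
    nbits = 0
    symbols: List[int] = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= SYMBOL_BITS and len(symbols) < count:
            nbits -= SYMBOL_BITS
            symbols.append((acc >> nbits) & MAX_SYMBOL)
        acc &= (1 << nbits) - 1
    return symbols


if __name__ == "__main__":
    packed = pack_symbols([1, 2])
    print(packed.hex(), unpack_symbols(packed, 2))
```

At 12 bits per symbol, 100 symbols need 1200 bits, or 150 bytes.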
Row Compression Using a Compression Dictionary

•  The dictionary contains repeated information from the rows in the table
–  Compression candidates can be across column boundaries or within columns

Uncompressed rows:
PartCode   SPart   Quantity   LotNum   BinLoc   Aisle
ANCPRPLT   220J    200        Z165-3   NE132    6157
SNCPRPLT   580T    132        Z165-3   NE132    6157
…

Dictionary:
01   NCPRPLT
02   Z165-3NE1326157

Compressed rows:
A (01) 220J 200 (02)
S (01) 580T 132 (02)
…
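The substitution on this slide can be replayed in a few lines of Python. The sketch assumes each row is stored as its column values concatenated without separators (which is what pattern 02, Z165-3NE1326157, suggests) and uses a simple greedy longest-match substitution. It reproduces the slide’s result, but it is not IDS’s dictionary-building or compression algorithm; the function names are hypothetical.

```python
from typing import Dict, List, Union

# Dictionary from the slide: symbol 01 and symbol 02.
DICTIONARY: Dict[int, str] = {1: "NCPRPLT", 2: "Z165-3NE1326157"}

Token = Union[int, str]  # int = dictionary symbol, str = run of literal characters


def compress_row(columns: List[str]) -> List[Token]:
    # Columns are concatenated first, so a pattern may span column boundaries
    # (symbol 02 covers LotNum, BinLoc and Aisle together).
    record = "".join(columns)
    patterns = sorted(DICTIONARY.items(), key=lambda item: len(item[1]), reverse=True)
    tokens: List[Token] = []
    literal = ""
    i = 0
    while i < len(record):
        for symbol, pattern in patterns:  # longest pattern first
            if record.startswith(pattern, i):
                if literal:
                    tokens.append(literal)
                    literal = ""
                tokens.append(symbol)
                i += len(pattern)
                break
        else:
            literal += record[i]  # no pattern here: keep the character as-is
            i += 1
    if literal:
        tokens.append(literal)
    return tokens


def decompress_row(tokens: List[Token]) -> str:
    # Rows must be expanded like this before predicates can be evaluated on them.
    return "".join(DICTIONARY[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    for row in (["ANCPRPLT", "220J", "200", "Z165-3", "NE132", "6157"],
                ["SNCPRPLT", "580T", "132", "Z165-3", "NE132", "6157"]):
        print(compress_row(row))
```

The first row becomes `['A', 1, '220J200', 2]`, matching “A (01) 220J 200 (02)” above.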
Storage savings
•  Tables will often compress in the range of 60% to 80%
•  Overall database storage savings will be between 40% and 50%
•  That’s up to 50% less disk space needed to support an IDS 11 database!

[Chart: Sales table 78% smaller; Product table 81% smaller]
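One way to see why a 60-80% saving per table becomes roughly 40-50% for the whole database: part of the storage never compresses (indexes, out-of-row data such as blobs, catalog tables, and so on, as listed later in this presentation), so the table-level saving is diluted by the share of storage that is actually compressible row data. The Python sketch below expresses that weighting; the function name and the 65% share in the example are hypothetical.

```python
def overall_savings(compressible_share: float, table_savings: float) -> float:
    """Fraction of total database storage saved when only part of it compresses.

    compressible_share: fraction of all storage held in compressible table rows
    table_savings: fraction by which those rows shrink (0.70 means 70% smaller)
    """
    for name, value in (("compressible_share", compressible_share),
                        ("table_savings", table_savings)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Uncompressible storage saves nothing, so the table-level
    # saving is scaled down by the compressible share.
    return compressible_share * table_savings


if __name__ == "__main__":
    # Hypothetical example: 65% of storage is table rows that compress by 70%.
    print(f"{overall_savings(0.65, 0.70):.1%}")
```

With those assumed inputs the overall saving is about 45.5%, inside the 40-50% range quoted on the slide.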


Performance Benefit
•  Performance can be improved using compression
•  Many queries will benefit from compression because they need fewer I/Os
•  Compression consumes more CPU, but most customers are not 100% CPU-bound
–  Lab tests show I/O-bound workloads improve by 30-40%
•  Many utilities (backup and recovery, for example) will be faster
–  2x as fast in some cases, as the database may now be half the size

[Chart: 40% faster]
IDS 11 Compression Operations

•  estimate_compression
–  Estimates the compression ratio for a table
•  create_dictionary
–  Creates a compression dictionary for a table
•  compress
–  Implicitly creates the dictionary (as create_dictionary does) and compresses all existing data
•  uncompress
–  Uncompresses the table and deactivates compression
•  uncompress_offline
–  Takes an exclusive lock (XLOCK) on the table and uncompresses it; also deactivates compression
•  purge_dictionary
–  Deletes old, inactive dictionaries
Storage Optimization Operations

•  repack
–  Move rows within a table or fragment to consolidate free space
•  repack_offline
–  XLOCK the table and move rows within a table or fragment to
consolidate free space
•  shrink
–  Return free space at end of table or fragment to the dbspace
–  Normally done after a repack
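The effect of repack followed by shrink can be modeled with a toy table: a list of pages, each with a fixed number of row slots. This is a simplified sketch (real IDS pages hold variable-length rows, and the server decides how rows move); the function names merely mirror the operation names above and are not an IDS API.

```python
from typing import List, Optional

Page = List[Optional[str]]  # each slot holds a row id, or None if the slot is free


def repack(pages: List[Page]) -> None:
    """Move rows from the end of the table into free slots near the start,
    so that all free space ends up at the end of the table (in place)."""
    slots = [(p, s) for p, page in enumerate(pages) for s in range(len(page))]
    lo, hi = 0, len(slots) - 1
    while lo < hi:
        lp, ls = slots[lo]
        hp, hs = slots[hi]
        if pages[lp][ls] is not None:   # slot in use: keep scanning forward
            lo += 1
        elif pages[hp][hs] is None:     # trailing slot already free: scan backward
            hi -= 1
        else:                           # move the last row into the first hole
            pages[lp][ls], pages[hp][hs] = pages[hp][hs], None
            lo += 1
            hi -= 1


def shrink(pages: List[Page]) -> int:
    """Release completely empty pages at the end of the table; return how many."""
    freed = 0
    while pages and all(slot is None for slot in pages[-1]):
        pages.pop()
        freed += 1
    return freed


if __name__ == "__main__":
    table = [["r1", None, "r2"], [None, None, "r3"], ["r4", None, None], [None, None, "r5"]]
    repack(table)
    print(table)
    print("pages freed:", shrink(table), table)
```

The example also shows why shrink is normally done after a repack: before repacking, the last page still holds a row, so shrink alone would free nothing.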
Compression On Data Page With Multiple Rows
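[Figure: compress turns uncompressed data pages into compressed pages (using the dictionary), leaving free space on each page; repack moves the compressed rows together, leaving empty data pages at the end; shrink returns those empty pages to the dbspace]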
Admin API Interface

•  All compression and storage optimization operations are invoked via the IDS Admin API built-in UDRs
–  execute function task(…);
–  execute function admin(…);
•  Example
execute function task
(
"table compress repack shrink",
"table_name", "database_name", "owner_name"
);
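When these calls are scripted, it can help to build the SQL text from validated parts. The Python sketch below produces a string shaped like the example above. It assumes, as that example suggests, that operation keywords follow "table" and that the table, database, and owner names come next; whether a given operation or combination accepts that form should be checked against the IDS documentation. `task_call` is a hypothetical helper that only builds text and does not connect to a server.

```python
import re
from typing import Sequence

# Operation names from the compression and storage optimization lists.
# purge_dictionary is left out on purpose: this sketch does not assume it
# takes the same table/database/owner arguments as the others.
TABLE_OPERATIONS = frozenset({
    "estimate_compression", "create_dictionary", "compress",
    "uncompress", "uncompress_offline",
    "repack", "repack_offline", "shrink",
})

# Conservative check so names can be embedded in quotes without escaping.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def task_call(operations: Sequence[str], table: str, database: str, owner: str) -> str:
    """Build the SQL text for an Admin API call shaped like the slide's example."""
    if not operations:
        raise ValueError("at least one operation is required")
    unknown = [op for op in operations if op not in TABLE_OPERATIONS]
    if unknown:
        raise ValueError(f"unsupported operation(s): {unknown}")
    for value in (table, database, owner):
        if not _PLAIN_NAME.fullmatch(value):
            raise ValueError(f"not a plain identifier: {value!r}")
    command = "table " + " ".join(operations)
    return f'execute function task("{command}", "{table}", "{database}", "{owner}");'


if __name__ == "__main__":
    print(task_call(["compress", "repack", "shrink"], "customer", "stores_demo", "informix"))
```

Rejecting anything that is not a plain identifier keeps the helper from ever emitting a quote inside a string literal, at the cost of refusing some legal but unusual names.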
Features That Cannot Be Compressed

•  Out-of-row data (e.g. blobs)
•  Indexes
•  Temp tables
•  Catalog tables (Data Dictionary)
•  Partition tables (tblspace tblspace)
•  Dictionary Partitions
•  Tables in the following databases:
–  Sysmaster
–  Sysutils
–  Sysuser
–  Syscdr
–  Syscdcv1
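The exclusion rules above can be captured as a simple eligibility check before a compress request is issued. In the Python sketch below, the `TableInfo` record and its fields are hypothetical (on a real server this information would come from the system catalog); only the list of excluded databases comes from the slide. Indexes, out-of-row data, and the internal partition and dictionary structures are not ordinary user tables, so they are outside this check.

```python
from dataclasses import dataclass

# Databases whose tables cannot be compressed, per the list above.
EXCLUDED_DATABASES = frozenset({"sysmaster", "sysutils", "sysuser", "syscdr", "syscdcv1"})


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical table metadata used only for this illustration."""
    database: str
    name: str
    is_temp: bool = False
    is_catalog: bool = False


def can_compress(table: TableInfo) -> bool:
    """Return False for temp tables, catalog tables, and tables in excluded databases."""
    if table.is_temp or table.is_catalog:
        return False
    return table.database.lower() not in EXCLUDED_DATABASES


if __name__ == "__main__":
    for t in (TableInfo("stores_demo", "customer"),
              TableInfo("sysmaster", "sysprofile"),
              TableInfo("stores_demo", "tmp_orders", is_temp=True)):
        print(t.database, t.name, can_compress(t))
```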
HDR, ER, CDC (DataMirror) and Compression
•  All are supported on compressed tables
•  HDR
–  Tables will be compressed on the secondary if and only if they are compressed on the primary
•  ER
–  Compression status of tables is independent between source and target, as specified by the user
•  CDC
–  Compression of targets is a function of what the target database supports and what the user specifies
Summary

•  Storage optimization through IDS 11 compression can save 40-50% of your database storage requirements
•  For I/O-bound workloads, compression can also improve performance
•  You not only see your online database shrink but, often more importantly, your backup storage and disaster recovery storage is cut in half as well
•  In real customer examples, storage savings are realized and performance benefits are apparent
•  Add in the time savings in utility processing (particularly database backup and recovery time, which is cut in half) and you can see the benefits of IDS 11 compression
