Bala Peddi

TERADATA Master
November 15, 2008
Introduction to Data Warehousing
Bala Peddi
Principal DW Consultant
Bala Peddi graduated with a BE in Computer Science from Karnataka University in 1993.
15+ years of real-world industry experience in data warehousing and computer programming.
Joined Satyam Computers in 1995 as a UNIX and C programmer.
Moved to the USA in 1996 and worked for various Fortune 500 organizations, such as Fidelity Investments in Boston, AT&T and NCR in San Francisco, and First Union Bank, Wachovia Bank and Wells Fargo Bank in Charlotte, NC.
Worked extensively with technologies such as data warehousing, Informatica, Ab Initio ETL, IBM DataStage ETL, Teradata RDBMS, Informix, UNIX, parallel processing and multi-terabyte environments.
Worked as a DBA for large production systems.
Founded Simply Track, a stock market tracking product (www.vbssol.com).
Provided senior-level consulting support for a number of high-profile data warehousing projects.
Bala's Data Warehousing Projects
1996-1997: Implemented the Fidelity Investments data warehouse, converting it from Informix to Teradata.
1998-2000: Implemented the Corporate Data Warehouse (CDW) for First Union Bank, Charlotte, NC.
2000-2002: Implemented an Operational Data Store (ODS) for Wachovia Bank, Charlotte, NC.
2003: Implemented an Anti-Money Laundering (AML) warehouse to monitor for terrorist-related activity.
2004-2006: Worked at Wachovia Bank converting a large data warehouse from Informix to Teradata, and implemented the Enterprise Data Warehouse.
2007-2008: Implemented a Corporate Risk Data Warehouse to support Basel II regulatory requirements.
2008-2010: Implemented a Profitability Data Warehouse for Customer Level Profitability Reporting (CLPR).
All of the above data warehousing projects used technologies such as Teradata, Oracle, Informatica 8.x, IBM DataStage 8.x, Ab Initio ETL and UNIX.
Testimonials
Wow Bala, what a shock. I will miss you. I always enjoyed working with you and have, and will always have, tremendous
respect for your knowledge and your wonderful attitude. I hope all goes as well as it can for you and your family, and I
wish you the very best of everything.
Thanks,
Ken Weicholz
Systems Analyst/Solutions Delivery Enterprise Information Services (EIS)
What is Raw Data?
Raw data has no use until it becomes information.
Information
Raw data plus metadata gives information. Metadata is the record format, field names, column names, etc.
Adding data types such as decimal, char and integer makes the data more useful.
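To make this concrete, here is a minimal Python sketch that applies metadata (column names plus a data type per column) to one raw record. The record contents and the `to_record` helper are hypothetical, for illustration only. Once the metadata is applied, the same raw string becomes a record whose salary is a number that can be compared or summed.

```python
from decimal import Decimal

# Raw data: a line of text with no meaning on its own (hypothetical example)
raw = "venkat,Hyderabad,30007"

# Metadata: a column name and a data type for each field (assumed schema)
schema = [("name", str), ("city", str), ("salary", Decimal)]

def to_record(line, schema):
    """Apply column names and data types to one comma-separated raw line."""
    values = line.split(",")
    return {col: typ(val) for (col, typ), val in zip(schema, values)}

record = to_record(raw, schema)
print(record)  # salary is now a Decimal, so it can be compared or summed
```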
Information is in files, folders, etc.?
What is data?
What is information?
Why can't we use files for everything?
How do you find Venkat's salary in hundreds of Excel files?
You need to open them one at a time and search for Venkat.
Organize information into tables, columns and relations in an RDBMS:
NAME      City       Salary  Date of join
srinivas  Hyderabad  30,000  1/1/2009
raj       Hyderabad  30,001  1/2/2009
bala      Hyderabad  30,002  1/3/2009
santhosh  Hyderabad  30,003  1/4/2009
veera     Hyderabad  30,004  1/5/2009
ravi      Hyderabad  30,005  1/6/2009
subba     Hyderabad  30,006  1/7/2009
venkat    Hyderabad  30,007  1/8/2009
ramesh    Hyderabad  30,008  1/9/2009
kishore   Hyderabad  30,009  1/10/2009
kumar     Hyderabad  30,010  1/11/2009
SELECT salary FROM emp WHERE name = 'venkat';
Here emp is the table. NAME, City, Salary and Date of join are its columns, and each employee is a row.
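The table and query above can be tried with a minimal Python sketch. It uses the standard-library `sqlite3` module as a stand-in RDBMS, which is an assumption: the slide does not name a product. Only a few of the rows are loaded, and dates are kept as text exactly as shown.

```python
import sqlite3

# In-memory SQLite database standing in for any RDBMS (an assumption)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emp (
    name TEXT, city TEXT, salary INTEGER, date_of_join TEXT)""")

# A few rows from the table above (dates kept as text, as written on the slide)
rows = [
    ("srinivas", "Hyderabad", 30000, "1/1/2009"),
    ("raj", "Hyderabad", 30001, "1/2/2009"),
    ("venkat", "Hyderabad", 30007, "1/8/2009"),
    ("kumar", "Hyderabad", 30010, "1/11/2009"),
]
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", rows)

# One query replaces opening hundreds of spreadsheet files one by one
salary = conn.execute(
    "SELECT salary FROM emp WHERE name = ?", ("venkat",)).fetchone()[0]
print(salary)
```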
RDBMS
Some examples of RDBMS software are:
1. Oracle
2. SQL Server
3. DB2
4. MySQL, etc.
RDBMS
In an RDBMS, tables are related to one another. These links are called relationships.
Every table should have a primary key that uniquely identifies each row.
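Primary keys and relationships can be shown with a small sketch using Python's built-in `sqlite3`. The `dept`/`emp` tables and their rows are hypothetical. Here `emp.deptno` is a foreign key that refers to the primary key `dept.deptno`. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical tables: each has a primary key, and emp.deptno refers to
# dept.deptno, which is the relationship between the two tables
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""CREATE TABLE emp (
    empno  INTEGER PRIMARY KEY,
    name   TEXT,
    deptno INTEGER REFERENCES dept(deptno))""")

conn.execute("INSERT INTO dept VALUES (20, 'RESEARCH')")
conn.execute("INSERT INTO emp VALUES (1, 'venkat', 20)")

# The relationship lets a join answer "which department does venkat work in?"
dname = conn.execute("""
    SELECT d.dname FROM emp e JOIN dept d ON e.deptno = d.deptno
    WHERE e.name = 'venkat'""").fetchone()[0]

# A row pointing at a department that does not exist is rejected
try:
    conn.execute("INSERT INTO emp VALUES (2, 'ravi', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```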
SQL (Structured Query Language)
DDL (Data Definition Language)
> CREATE, ALTER, DROP
DML (Data Manipulation Language)
> INSERT, UPDATE, DELETE
DQL (Data Query Language)
> SELECT <columns> FROM <tab> WHERE <condition>
> SELECT <columns> FROM <tab> GROUP BY <columns>
> SELECT <columns> FROM <tab> GROUP BY <columns> HAVING <condition>
> SELECT <columns> FROM <tab> ORDER BY <columns>
> SELECT * FROM emp WHERE deptno = 20 AND job = 'manager';
> JOINs, UNION, MINUS, etc.
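The three statement families can be exercised together in one sketch. It uses Python's `sqlite3` as the database (an assumption), and the employee rows are hypothetical. Notice that `HAVING` filters the groups produced by `GROUP BY`, while `WHERE` filters individual rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the structure
conn.execute("CREATE TABLE emp (name TEXT, job TEXT, deptno INTEGER, sal INTEGER)")
conn.execute("ALTER TABLE emp ADD COLUMN city TEXT")

# DML: change the data (hypothetical rows)
conn.executemany("INSERT INTO emp (name, job, deptno, sal) VALUES (?, ?, ?, ?)", [
    ("srinivas", "clerk", 10, 28000),
    ("raj", "manager", 20, 50000),
    ("bala", "clerk", 20, 30000),
    ("ravi", "manager", 30, 52000),
    ("kumar", "clerk", 30, 31000),
    ("subba", "clerk", 30, 29000),
])
conn.execute("UPDATE emp SET sal = sal + 1000 WHERE name = 'bala'")
conn.execute("DELETE FROM emp WHERE name = 'subba'")

# DQL: WHERE filters rows
managers_20 = conn.execute(
    "SELECT name FROM emp WHERE deptno = 20 AND job = 'manager'").fetchall()

# GROUP BY summarizes, HAVING filters the groups, ORDER BY sorts the result
dept_totals = conn.execute("""
    SELECT deptno, SUM(sal) AS total
    FROM emp
    GROUP BY deptno
    HAVING SUM(sal) > 50000
    ORDER BY total DESC""").fetchall()
print(managers_20, dept_totals)

# DDL again: remove the table
conn.execute("DROP TABLE emp")
```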
What is a Transaction?
A unit of work in an RDBMS is called a transaction.
During a transaction, the database inserts, updates or deletes rows in tables.
If anything fails, all changes are rolled back and the data is put back the way it was (all or nothing).
Examples of transactions:
> Withdrawing money from an ATM
> Buying a book in a bookstore
> Buying a train ticket
> Closing a bank account
> Opening an account
> Buying stocks, etc.
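The "all or nothing" rule can be seen in a minimal ATM-withdrawal sketch. It uses Python's `sqlite3`, and the `account`/`atm_log` tables and `withdraw` function are hypothetical. Each withdrawal writes a log row and then debits the balance inside one transaction. If the debit violates the `CHECK (balance >= 0)` constraint, the whole transaction is rolled back, including the log row that was already inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    acct_no INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))""")
conn.execute("CREATE TABLE atm_log (acct_no INTEGER, amount INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 5000)")
conn.commit()

def withdraw(conn, acct_no, amount):
    """Log and debit as one transaction: either both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO atm_log VALUES (?, ?)", (acct_no, amount))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acct_no = ?",
                         (amount, acct_no))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed, so the log insert was rolled back too

ok_small = withdraw(conn, 1, 2000)  # succeeds, balance becomes 3000
ok_large = withdraw(conn, 1, 9000)  # would go negative, so nothing is kept
balance = conn.execute("SELECT balance FROM account WHERE acct_no = 1").fetchone()[0]
log_rows = conn.execute("SELECT COUNT(*) FROM atm_log").fetchone()[0]
```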
What is OLTP?
Online Transaction Processing (OLTP) system
Most RDBMS systems run OLTP applications.
The database contains day-to-day transactions.
Mostly inserts, with a few updates and deletes.
Optimized for a specific application or business.
Historical data is archived for performance reasons.
Example OLTP Applications
Walk into a Reliance store: you see OLTP.
Use an ATM: you see an OLTP server.
Purchase a train ticket: OLTP.
Buy an LIC policy: OLTP.
Purchase an air ticket: OLTP.
Buy a TV in an electronics shop: OLTP.
Buy a book on Amazon.com: OLTP.
Buy stocks through a broker such as Karvy or E*TRADE: OLTP.
Problems with OLTP
Not for reporting
Not for analysis
Data must be deleted, archived or backed up.
OLTP systems must be fast and cannot go down.
What is a Warehouse?
[Picture: the warehouse of a company that makes shoes]
The Idea Behind Data Warehousing
"Study the past if you would define the future."
Confucius, Chinese philosopher and reformer (551 BC - 479 BC)
What is a Data Warehouse?
It is just an RDBMS, like an OLTP system, that stores historic information from various source
systems in order to analyze and study the past.
Also called a DSS (Decision Support System).
The database is optimized for SELECTs and joins.
Large volumes, in terabytes.
[Diagram: OLTP SYS1, OLTP SYS2, OLTP SYS3 → Data Warehouse]
A Simple OLTP Transaction Table
Trans_id  Time                Product  Quantity  Price  Total Amount
100       22/08/2010 8:00 AM  Soap     5         10     50
101       22/08/2010 8:10 AM  Soap     3         10     30
102       22/08/2010 8:20 AM  Soap     4         10     40
103       22/08/2010 8:30 AM  Soap     2         10     20
A Simple DW Table: Aggregated View (August 22nd, 2010)
OLTP:
Trans_id  Time                Product  Quantity  Price  Total Amount
100       22/08/2010 8:00 AM  Soap     5         10     50
101       22/08/2010 8:10 AM  Soap     3         10     30
102       22/08/2010 8:20 AM  Soap     4         10     40
103       22/08/2010 8:30 AM  Soap     2         10     20
DW:
Date        Product  Quantity  Price  Total Amount
22/08/2010  Soap     14        10     140
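The DW row is a `GROUP BY` over the OLTP rows. A minimal sketch using Python's `sqlite3` follows. The table name `sales_txn` and its columns are hypothetical. Dates are stored in ISO format (`2010-08-22`) so that they sort correctly, which is also an assumption. The price is derived as total amount divided by quantity; this is integer division here because all values are integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# OLTP-style detail table: one row per sale (values from the slide)
conn.execute("""CREATE TABLE sales_txn (
    trans_id INTEGER, txn_date TEXT, txn_time TEXT,
    product TEXT, quantity INTEGER, price INTEGER, total_amount INTEGER)""")
conn.executemany("INSERT INTO sales_txn VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (100, "2010-08-22", "08:00", "Soap", 5, 10, 50),
    (101, "2010-08-22", "08:10", "Soap", 3, 10, 30),
    (102, "2010-08-22", "08:20", "Soap", 4, 10, 40),
    (103, "2010-08-22", "08:30", "Soap", 2, 10, 20),
])

# DW-style summary: one row per date and product
daily = conn.execute("""
    SELECT txn_date, product,
           SUM(quantity),
           SUM(total_amount) / SUM(quantity),  -- average price per unit
           SUM(total_amount)
    FROM sales_txn
    GROUP BY txn_date, product""").fetchall()
print(daily)
```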
A Simple DW Table: Aggregated View (August 23rd, 2010)
OLTP:
Trans_id  Time                Product  Quantity  Price  Total Amount
100       23/08/2010 8:00 AM  Soap     10        10     100
101       23/08/2010 8:10 AM  Soap     3         10     30
102       23/08/2010 8:20 AM  Soap     4         10     40
103       23/08/2010 8:30 AM  Soap     2         10     20
DW:
Date        Product  Quantity  Price  Total Amount
22/08/2010  Soap     14        10     140
23/08/2010  Soap     19        10     190
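The two slides together describe a daily cycle: the OLTP system records the day's sales, and the DW keeps one summary row per day while older detail is purged from OLTP. Here is a minimal Python sketch of that cycle with `sqlite3`. The table names and the `nightly_load` function are hypothetical, and real loads are done by ETL tools such as those covered later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE oltp_sales (
    txn_date TEXT, product TEXT, quantity INTEGER, total_amount INTEGER)""")
conn.execute("""CREATE TABLE dw_daily_sales (
    sale_date TEXT, product TEXT, quantity INTEGER, total_amount INTEGER)""")

def nightly_load(conn):
    """Hypothetical nightly job: summarize today's OLTP rows into the DW, then purge them."""
    with conn:  # summary insert and purge succeed or fail together
        conn.execute("""INSERT INTO dw_daily_sales
                        SELECT txn_date, product, SUM(quantity), SUM(total_amount)
                        FROM oltp_sales
                        GROUP BY txn_date, product""")
        conn.execute("DELETE FROM oltp_sales")  # OLTP keeps only current data

# Quantities sold per day, from the two slides (price is 10 each time)
days = {"2010-08-22": [5, 3, 4, 2], "2010-08-23": [10, 3, 4, 2]}
for day, quantities in days.items():
    # During the business day, the OLTP system records each sale
    conn.executemany("INSERT INTO oltp_sales VALUES (?, 'Soap', ?, ?)",
                     [(day, q, q * 10) for q in quantities])
    nightly_load(conn)

history = conn.execute("SELECT * FROM dw_daily_sales ORDER BY sale_date").fetchall()
oltp_left = conn.execute("SELECT COUNT(*) FROM oltp_sales").fetchone()[0]
```

The DW grows by one row per product per day, which is how it accumulates years of history while the OLTP tables stay small.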
History is More Important Than Aggregation
The following table holds daily history:
Date        Product  Quantity  Avg Price  Total Amount
22/08/2009  Soap     20        8          160
23/08/2009  Soap     10        8          80
24/08/2009  Soap     15        8          120
...         (and so on, every day until 24/08/2010)
25/08/2010  Soap     14        10         140
26/08/2010  Soap     14        10         140
27/08/2010  Soap     14        10         140
This table has two important properties:
1. History
2. Aggregation (summary) by day
Why We Need History: An Example
To make decisions, we need lots of data from the past.
You make better decisions when you have an accurate history for the last 5+ years.
The next slides use a simple example to show why we need to know history to make decisions.
Marks List (1st Quarter): Real-Time DW Example
Subject  Min  Max  Score  Result
Math     35   100  90     Very Good
Science  35   100  30     Fail
Social   35   100  87     Very Good
English  35   100  65     Good
Marks List (Half Yearly): Real-Time DW Example
Subject  Min  Max  Score  Result
Math     35   100  95     Very Good
Science  35   100  27     Fail
Social   35   100  84     Very Good
English  35   100  72     Good
Marks List (Three-Dimensional): Real-Time DW Example
Half Yearly:
Subject  Min  Max  Score  Result
Math     35   100  95     Very Good
Science  35   100  27     Fail
Social   35   100  84     Very Good
English  35   100  72     Good
1st Quarter:
Subject  Min  Max  Score  Result
Math     35   100  90     Very Good
Science  35   100  30     Fail
Social   35   100  87     Very Good
English  35   100  65     Good
Mom has the history, so now she can make decisions (Decision Support System):
1. Change the teacher
2. Change the school
3. Tuition
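Mom's reasoning can be written down as a small Python sketch over the two marks lists. The data structure and variable names are illustrative only, and the pass mark is taken from the slide's Min column (35). A subject that fails in every term is a pattern rather than a single bad exam, and that pattern is what justifies a decision such as tuition.

```python
# Scores from the two marks lists above (illustrative structure)
history = {
    "1st Quarter": {"Math": 90, "Science": 30, "Social": 87, "English": 65},
    "Half Yearly": {"Math": 95, "Science": 27, "Social": 84, "English": 72},
}
PASS_MARK = 35  # the "Min" column on the slides

# Subjects failed in every term on record
always_failing = [subject for subject in history["1st Quarter"]
                  if all(term[subject] < PASS_MARK for term in history.values())]

# Change in score from the 1st quarter to the half-yearly exam
trend = {subject: history["Half Yearly"][subject] - history["1st Quarter"][subject]
         for subject in history["1st Quarter"]}
print(always_failing, trend)
```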
In Summary: Data Warehouse Definition
A data warehouse stores historic information in an RDBMS for analysis. The historic information is copied from
operational systems, also called OLTP systems.
In most cases, data is aggregated while it is copied from the
OLTP systems.
It is also called a DSS (Decision Support System).
In short:
> History of your business
> Summary of your business
Who Needs a Data Warehouse?
Senior management, such as CEOs, to look at overall business trends.
Middle managers, to look at regional business.
Lower-level managers, to look at their own store or branch.
Marketing teams, for cross-selling.
Businesses that want to make more money and stay competitive need a DW.
To retain customers, you need a DW.
To track campaigns or advertisements, you need a DW.
To find suspicious customer behavior in the financial industry, you need a data warehouse (AML).
To meet other government regulatory requirements, such as KYC and Basel II, you need a DW.
It is Important to Use the Information
John and Mary are both district managers who use the data warehouse. Both ran the following report:
"Show me, for all my stores, a breakdown of second-quarter sales compared to first-quarter sales, each store's
second-quarter sales from a year earlier, and the sales of all competitors within two square miles of each store's
location."
Mary calls the store managers whose sales are down or flat and asks them to run the promotion. Without a DW,
she could not make this decision.
Some store managers complained about inventory issues, so she took care of them.
Why a DW? Why Can't We Use OLTP for Everything?
OLTP systems cannot integrate with other systems. Sometimes a company's customer information is spread across
many OLTP systems.
A separate DW avoids affecting online system performance.
OLTP systems are not built for queries.
We need a new database and a new way of designing tables for faster queries.
Reporting tools work best with DW models.
It is the industry-standard approach.
Data warehousing needs 2 to 10 years of history, which is not possible in OLTP.
Data Warehouse Book Definition
A data warehouse is a relational database used for query, analysis and reporting. By definition, a data warehouse is
integrated, non-volatile, time-variant and subject-oriented.
Integrated: data collected from multiple sources is integrated into a unique, user-readable format.
Non-volatile: maintains historical data; data, once loaded, is not overwritten.
Time-variant: data can be displayed weekly, monthly or yearly.
Subject-oriented: the data warehouse is organized around particular subjects.
[Illustrations: Integrated, Non-volatile, Time-variant, Subject-oriented]
Data Marts
Data marts are considered subsets of the data warehouse. Each data mart is designed for a particular department
and is optimized for the analysis needs of that department.
Two types:
> Dependent data mart: loaded from the data warehouse
> Independent data mart: loaded directly from the source systems, without going through the data warehouse
[Diagram: Dependent data marts: Data Warehouse → Marketing Mart, Sales Mart, Accounting Mart.
Independent data marts: Source → Marketing Mart, Sales Mart, Accounting Mart]
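A dependent data mart can be sketched as a department-specific table built from warehouse data. The example uses Python's `sqlite3`, and the `dw_facts` and `sales_mart` tables and their rows are hypothetical. An independent mart would run a similar load against the source systems instead of the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise DW table holding data for several departments
conn.execute("CREATE TABLE dw_facts (dept TEXT, region TEXT, metric TEXT, amount INTEGER)")
conn.executemany("INSERT INTO dw_facts VALUES (?, ?, ?, ?)", [
    ("Sales", "South", "revenue", 1200),
    ("Sales", "North", "revenue", 900),
    ("Marketing", "South", "campaign_cost", 300),
    ("Accounting", "All", "expenses", 700),
])

# Dependent data mart: built from the warehouse, holding only the Sales
# department's data, already shaped for its analysis (revenue by region)
conn.execute("""CREATE TABLE sales_mart AS
                SELECT region, SUM(amount) AS revenue
                FROM dw_facts
                WHERE dept = 'Sales'
                GROUP BY region""")
mart = conn.execute("SELECT * FROM sales_mart ORDER BY region").fetchall()
```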
Top-Down and Bottom-Up Approaches
Aspect        | Top-Down                                                        | Bottom-Up
Practitioner  | Bill Inmon                                                      | Ralph Kimball
Emphasis      | Data warehouse                                                  | Data marts
Design        | Enterprise-based normalized model; marts use a subject-oriented dimensional model | Dimensional model of data marts, consisting of star schemas
Architecture  | Multi-tier, comprising a staging area and dependent data marts  | Staging area and data marts
Data set      | DW holds atomic-level data; marts hold summary data             | Contains both atomic and summary data
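The star schema named in the bottom-up column puts a fact table of measures in the centre and surrounds it with dimension tables that describe those measures. Here is a minimal sketch in Python's `sqlite3`. All table names, keys and rows are hypothetical; the Chennai store in particular is invented for illustration. A typical query joins the fact table to its dimensions, filters on dimension attributes and sums the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used to slice the facts
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);

    -- Fact table at the centre of the star: one key per dimension plus measures
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        quantity    INTEGER,
        amount      INTEGER);

    INSERT INTO dim_date    VALUES (20100822, '2010-08-22', '2010-08'),
                                   (20100823, '2010-08-23', '2010-08');
    INSERT INTO dim_product VALUES (1, 'Soap', 'Personal care');
    INSERT INTO dim_store   VALUES (1, 'Hyderabad'), (2, 'Chennai');
    INSERT INTO fact_sales  VALUES (20100822, 1, 1, 14, 140),
                                   (20100823, 1, 1, 19, 190),
                                   (20100823, 1, 2, 5, 50);
""")

# Star join: filter and group by dimension attributes, sum the measures
by_city = conn.execute("""
    SELECT s.city, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s   ON f.store_key = s.store_key
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_date d    ON f.date_key = d.date_key
    WHERE p.product = 'Soap' AND d.month = '2010-08'
    GROUP BY s.city
    ORDER BY s.city""").fetchall()
```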
Operational Data Store (ODS)
An ODS is usually designed to contain low-level data (such as transactions and prices) with limited history
that is captured in "real time" or "near real time", as opposed to the much greater volumes of data stored in
the data warehouse, which is generally loaded less frequently.
ODS systems are mainly used for the following applications:
> Call centers
> Product support
> On-demand marketing
An ODS also has more inserts and deletes than a DW.
OLTP vs. Data Warehousing
OLTP           Aspect                Data Warehousing
Transactional  Business need         Analytical
Simple         Query                 Complex
Point-in-time  Timeframe             Historical
Known          Business question     Unknown
Static         Business environment  Dynamic
Real-Time Banking DW Examples
What is an EDW?
[Diagram: Credit Cards OLTP, Deposits & Withdrawals OLTP, Loans OLTP, Investments OLTP → Informatica/DataStage ETL → Enterprise Data Warehouse (Teradata)]
Real-Time Use of DW in Banking
Cross-selling: if a customer opens a savings account, send an offer for a credit card, and vice versa.
If a customer applies for a car loan, call to see whether they would open a current or savings account.
Campaign management: run a TV ad in New York City in January 2010, then run a DW report to see whether sales in
NY increased. If yes, run the ad across the country; if not, drop the ad.
Customer retention: act immediately if a customer leaves.
Profitability: calculate profit at the customer level, and reward profitable customers with benefits.
Financial forecast: we made a $2 billion profit in the last 3 months; how much can we expect if the trend continues?
Keeping track of the bank's overall capital requirements, etc.
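Cross-selling is a good example of a query that needs the integrated view an EDW provides, because savings, credit-card and loan products live in different OLTP systems. A minimal sketch with Python's `sqlite3` follows. The `holdings` table, the product codes and the `cross_sell` function are hypothetical. It finds customers who hold one product but not another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical product-holdings table, consolidated in the EDW from several OLTP systems
conn.execute("CREATE TABLE holdings (customer_id INTEGER, product TEXT)")
conn.executemany("INSERT INTO holdings VALUES (?, ?)", [
    (1, "savings"), (1, "credit_card"),
    (2, "savings"),
    (3, "credit_card"),
    (4, "car_loan"),
])

def cross_sell(conn, has, lacks):
    """Customers who hold product `has` but not product `lacks`."""
    rows = conn.execute("""
        SELECT DISTINCT h.customer_id FROM holdings h
        WHERE h.product = ?
          AND NOT EXISTS (SELECT 1 FROM holdings x
                          WHERE x.customer_id = h.customer_id AND x.product = ?)
        ORDER BY h.customer_id""", (has, lacks)).fetchall()
    return [r[0] for r in rows]

card_offers = cross_sell(conn, "savings", "credit_card")     # savings but no card
savings_offers = cross_sell(conn, "credit_card", "savings")  # the "vice versa" case
account_calls = cross_sell(conn, "car_loan", "savings")      # car loan, no savings account
```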
Data Warehousing Usage in Anti-Money Laundering
After the September 11th attacks, the US government strengthened its anti-money laundering (AML) laws, notably
through the USA PATRIOT Act.
An AML system is a data warehousing system that tracks customer behavior over a period of time.
An AML data warehouse tracks activities such as:
> A customer receiving deposits in large amounts
> A customer receiving deposits from different countries (rogue countries)
> A customer receiving too many deposits from various people in a short time
> A customer withdrawing a lot of cash
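Each tracked activity above is essentially a rule evaluated over a customer's transaction history. Here is a minimal Python sketch of such rules. The thresholds, the placeholder country codes "XX"/"YY", the transaction fields and the `aml_flags` function are all hypothetical; real AML rules are defined by each bank's compliance team and its regulators.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds and placeholder country codes, for illustration only
LARGE_AMOUNT = 10_000
HIGH_RISK_COUNTRIES = {"XX", "YY"}
MAX_SENDERS = 3               # more distinct depositors than this ...
WINDOW = timedelta(days=1)    # ... within this period raises a flag

def aml_flags(txns):
    """Return {customer: set of rule names} for a list of transaction dicts."""
    flags = {}

    def flag(customer, rule):
        flags.setdefault(customer, set()).add(rule)

    for t in txns:
        if t["type"] == "deposit" and t["amount"] >= LARGE_AMOUNT:
            flag(t["customer"], "large_deposit")
        if t["type"] == "deposit" and t["country"] in HIGH_RISK_COUNTRIES:
            flag(t["customer"], "high_risk_country")
        if t["type"] == "cash_withdrawal" and t["amount"] >= LARGE_AMOUNT:
            flag(t["customer"], "large_cash_withdrawal")

    # Too many deposits from different people in a short time
    deposits = [t for t in txns if t["type"] == "deposit"]
    for t in deposits:
        senders = {d["sender"] for d in deposits
                   if d["customer"] == t["customer"]
                   and t["time"] <= d["time"] < t["time"] + WINDOW}
        if len(senders) > MAX_SENDERS:
            flag(t["customer"], "many_senders")
    return flags

t0 = datetime(2010, 1, 1, 9, 0)
txns = [
    {"customer": "A", "type": "deposit", "amount": 50_000, "country": "IN", "sender": "s1", "time": t0},
    {"customer": "B", "type": "deposit", "amount": 500, "country": "XX", "sender": "s2", "time": t0},
    {"customer": "C", "type": "cash_withdrawal", "amount": 20_000, "country": "IN", "sender": None, "time": t0},
] + [
    {"customer": "D", "type": "deposit", "amount": 900, "country": "IN",
     "sender": f"p{i}", "time": t0 + timedelta(hours=i)}
    for i in range(5)
]
flags = aml_flags(txns)
```

In a real warehouse, these rules would typically run as scheduled queries over years of stored transactions. That stored history is why AML is built on a data warehouse rather than on the OLTP systems.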
Size of a Data Warehouse
1 Bit = 1 Binary Digit
8 Bits = 1 Byte
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte (informal name, not an SI unit)
1000 Brontobytes = 1 Geopbyte (informal name, not an SI unit)
The top 500 companies in the world have data warehouses larger than
50 terabytes.
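The table above uses decimal steps of 1000. A short Python sketch turns a raw byte count into the largest unit that keeps the value at or above 1. The `human_size` function name is hypothetical. It stops at yottabytes because the larger names are informal.

```python
# Decimal units as listed above: each step is 1000 times the previous one.
# Brontobyte and Geopbyte are left out because they are informal names.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_size(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {UNITS[-1]}"

print(human_size(50 * 1000**4))  # a 50-terabyte warehouse: "50.0 TB"
```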
What Do We Offer?
Data warehouse concepts: 5 sessions
Teradata programming: 5 sessions, covering Teradata development tasks and concepts
Informatica / DataStage / Ab Initio ETL tool: 35 sessions, including practical examples and a project.
You can choose any ETL tool; we recommend DataStage or Informatica.
IBM Cognos: 30 sessions, including real-time reports
A real-time project with a Teradata back end and an ETL tool: 5 sessions, including real-time reports
Difference Between ETL Tools
Aspect               | Informatica                                    | DataStage                                   | Ab Initio
Company              | Informatica, founded 1993                      | IBM, founded 1911                           | Ab Initio, based in Lexington, Massachusetts, founded 1995
Learning             | Very easy to learn                             | Easy to learn                               | A little complex to learn
Jobs                 | Most jobs (1,479 jobs in the last few days, according to a job search) | Fewer than Informatica (1,000 jobs in search) | Fewer than DataStage (700 jobs in search)
Resources            | Many people know Informatica: easy to find people, but more competition | Very few people, so you are one of a few | Very, very few; you will be one of them
Who uses it          | Small to medium companies                      | Medium to large companies                   | Large, Fortune 1000 companies
Cost                 | Cheap                                          | Reasonable                                  | Very expensive
Speed                | Slow                                           | Better                                      | Very fast
Parallel processing  | Limited                                        | Yes                                         | Best
Your Goal
On your resume, you should be able to present the following skills:
DW, UNIX, data modeling, Informatica ETL, DataStage, Cognos.
Jobs you can apply for:
Data warehousing programmer
ETL developer
Informatica developer, DataStage developer
Teradata developer
Cognos report developer