TERADATA Master
November 15, 2008
Introduction to Data Warehousing
Bala Peddi
Principal DW Consultant
Bala Peddi graduated with a BE in Computer Science from Karnataka University in 1993.
15+ years of real-world industry experience in data warehousing and computer programming.
Joined Satyam Computers in 1995 as a Unix and C programmer.
Moved to the USA in 1996 and worked at various Fortune 500 organizations, such as Fidelity Investments in Boston, AT&T, and NCR in San Francisco, as well as First Union Bank, Wachovia Bank, and Wells Fargo Bank in Charlotte, NC.
Worked extensively with technologies such as data warehousing, Informatica, Ab Initio ETL, IBM DataStage ETL, Teradata RDBMS, Informix, UNIX, parallel processing, and multi-terabyte environments.
Worked as a DBA for large production systems.
Founded Simply Track, a stock market tracking product.
www.vbssol.com
Provided senior-level consulting support for a number of high-profile data warehousing projects.
Bala's Data Warehousing Projects
In 1996-1997, implemented the data warehouse of Fidelity Investments. Converted the Informix warehouse to Teradata.
In 1998-2000, implemented the Corporate Data Warehouse (CDW) for First Union Bank, Charlotte, NC.
From 2000 to 2002, implemented an Operational Data Store (ODS) for Wachovia Bank, Charlotte, NC.
In 2003, implemented an Anti-Money Laundering (AML) warehouse to monitor terrorist activities.
From 2004 to 2006, worked at Wachovia Bank to convert a large data warehouse from Informix to Teradata. Implemented the Enterprise Data Warehouse.
From 2007 to 2008, implemented a Corporate Risk Data Warehouse to support Basel II regulatory requirements.
From 2008 to 2010, implemented a Profitability Data Warehouse for Customer Level Profitability Reporting (CLPR).
All of the above data warehousing projects used technologies such as Teradata, Oracle, Informatica 8.x, IBM DataStage 8.x, Ab Initio ETL, and Unix.
Testimonials
Wow Bala, what a shock. I will miss you. I always enjoyed working with you, and I have and always will have tremendous
respect for your knowledge and your wonderful attitude. I hope all goes as well as it can for you and your family, and I
wish you the very best of everything.
Thanks,
Ken Weicholz
Systems Analyst/Solutions Delivery Enterprise Information Services (EIS)
What is Raw Data?
Raw data has no use until it becomes information.
Information
Metadata: record format, field names, column names, etc.
Add data types such as decimal, char, and integer to make the data more useful.
Is information in files, folders, etc.?
What is data?
What is information?
Why can't we use files for everything?
How do you find Venkat's salary from hundreds of Excel files?
You need to open them one at a time and search for Venkat.
Organize information into tables, columns, and relations in an RDBMS
NAME      City       Salary  Date of join
srinivas  Hyderabad  30,000  1/1/2009
raj       Hyderabad  30,001  1/2/2009
bala      Hyderabad  30,002  1/3/2009
santhosh  Hyderabad  30,003  1/4/2009
veera     Hyderabad  30,004  1/5/2009
ravi      Hyderabad  30,005  1/6/2009
subba     Hyderabad  30,006  1/7/2009
venkat    Hyderabad  30,007  1/8/2009
ramesh    Hyderabad  30,008  1/9/2009
kishore   Hyderabad  30,009  1/10/2009
kumar     Hyderabad  30,010  1/11/2009
SELECT salary FROM emp WHERE name = 'venkat';
(Diagram labels: Table, Columns, Rows, RDBMS)
Some examples of RDBMS software are:
1. Oracle
2. SQL Server
3. DB2
4. MySQL, etc.
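The salary lookup on this slide can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in RDBMS; the table name `emp` comes from the slide's query, while the column names `city` and `date_of_join` are assumptions taken from the table header, and only three of the rows are loaded.

```python
import sqlite3

# In-memory database used as a stand-in for any RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (name TEXT, city TEXT, salary INTEGER, date_of_join TEXT)"
)

# A few rows from the slide's table (column names are assumed).
rows = [
    ("srinivas", "Hyderabad", 30000, "1/1/2009"),
    ("raj", "Hyderabad", 30001, "1/2/2009"),
    ("venkat", "Hyderabad", 30007, "1/8/2009"),
]
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", rows)

# One query replaces opening every file by hand and searching for the name.
venkat_salary = conn.execute(
    "SELECT salary FROM emp WHERE name = ?", ("venkat",)
).fetchone()[0]
print(venkat_salary)
```

The `?` placeholder passes the name as a parameter, which plays the same role as the quoted string literal in the slide's query.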
RDBMS
In an RDBMS, tables are related to one another; these links are called relationships.
Every table should have a primary key.
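A minimal sketch of a relationship between two tables, assuming Python's built-in sqlite3 module. The `dept` and `emp` tables and their columns are hypothetical names chosen for illustration; they do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Each table has a primary key; emp.dept_id points at dept.dept_id.
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE emp ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES dept(dept_id))"
)
conn.execute("INSERT INTO dept VALUES (20, 'Sales')")
conn.execute("INSERT INTO emp VALUES (1, 'venkat', 20)")

# The relationship lets us join the two tables.
joined = conn.execute(
    "SELECT e.name, d.dept_name FROM emp e JOIN dept d ON e.dept_id = d.dept_id"
).fetchall()

# A row that points at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO emp VALUES (2, 'raj', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(joined, orphan_rejected)
```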
SQL: Structured Query Language
DDL (Data Definition Language)
> CREATE, ALTER, DROP
DML (Data Manipulation Language)
> INSERT, UPDATE, DELETE
DQL (Data Query Language)
> SELECT <columns> FROM <tab> WHERE <condition>
> SELECT <columns> FROM <tab> GROUP BY <columns>
> SELECT <columns> FROM <tab> GROUP BY <columns> HAVING <condition>
> SELECT <columns> FROM <tab> ORDER BY <columns>
> SELECT * FROM emp WHERE depno = 20 AND job = 'manager';
> JOINs, UNION, MINUS, etc.
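The three statement families can be seen side by side in a short sketch, assuming Python's built-in sqlite3 module. The `emp` table and the `depno` and `job` columns follow the slide's sample query; the `salary` column and all row values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute("CREATE TABLE emp (name TEXT, depno INTEGER, job TEXT, salary INTEGER)")

# DML: change the rows.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("bala", 20, "manager", 50000),
        ("ravi", 20, "clerk", 20000),
        ("kumar", 30, "manager", 45000),
    ],
)
conn.execute("UPDATE emp SET salary = 21000 WHERE name = 'ravi'")

# DQL: read the rows back with WHERE, then with GROUP BY / HAVING / ORDER BY.
managers_in_20 = conn.execute(
    "SELECT name FROM emp WHERE depno = 20 AND job = 'manager'"
).fetchall()
salary_by_dept = conn.execute(
    "SELECT depno, SUM(salary) FROM emp GROUP BY depno "
    "HAVING SUM(salary) > 50000 ORDER BY depno"
).fetchall()

print(managers_in_20, salary_by_dept)
```

HAVING filters the groups produced by GROUP BY, which is why it appears after the grouping clause rather than on its own.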
What is a Transaction?
A unit of work in an RDBMS is called a transaction.
During a transaction, the database inserts, updates, or deletes rows in tables.
If anything fails, the database is put back the way it was (all or nothing).
Examples of transactions:
> Withdraw money from an ATM: it's a transaction.
> Buy a book in a bookstore: it's a transaction.
> Buy a train ticket: it's a transaction.
> Close an account in a bank: it's a transaction.
> Open an account: it's a transaction.
> Buy stocks: it's a transaction, etc.
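The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. The `account` table, the `transfer` helper, and the balances are hypothetical; the CHECK constraint stands in for whatever rule a real bank system would enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts as one unit of work."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # both updates succeed and are committed
failed = transfer(conn, 1, 2, 500)  # the debit breaks the CHECK, so the credit is undone
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

The credit is deliberately executed before the debit so that the failed transfer has a completed step to undo.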
What is OLTP?
Online Transaction Processing (OLTP) system
Most RDBMS systems run OLTP applications.
The database contains day-to-day transactions.
Mostly inserts, with few updates and deletes.
Optimized for a specific application or business.
Historical data is archived for performance reasons.
Example OLTP applications
Walk into a Reliance store: you see OLTP.
Walk up to an ATM: you see an OLTP server.
Purchase a train ticket: OLTP.
Buy an LIC policy.
Purchase an air ticket.
Buy a TV in an electronics shop: OLTP.
Buy a book on Amazon.com: OLTP.
Buy stocks through a broker such as Karvy or E*TRADE: OLTP.
Problems with OLTP
Not built for reporting.
Not built for analysis.
Data must be deleted, archived, or backed up.
An OLTP system must be fast and cannot go down.
What is a Warehouse?
Picture below shows W that makes shoes.
The Idea Behind Data Warehousing
"Study the past if you would define the future."
Confucius, Chinese philosopher & reformer (551 BC - 479 BC)
What is a Data Warehouse?
It is just an RDBMS, like an OLTP system.
It stores historical information from various source systems for analysis and for studying the past.
Also called a DSS (Decision Support System).
The database is optimized for SELECTs and joins.
Large volumes, in terabytes.
(Diagram: OLTP SYS1, OLTP SYS2, and OLTP SYS3 feed the Data Warehouse.)
A simple OLTP transaction table
Trans_id  Time                Product  Quantity  Price  Total Amount
100       22/08/2010 8:00 AM  Soap     5         10     50
101       22/08/2010 8:10 AM  Soap     3         10     30
102       22/08/2010 8:20 AM  Soap     4         10     40
103       22/08/2010 8:30 AM  Soap     2         10     20
A simple DW table: an aggregated view
August 22nd, 2010
OLTP:
Trans_id  Time                Product  Quantity  Price  Total Amount
100       22/08/2010 8:00 AM  Soap     5         10     50
101       22/08/2010 8:10 AM  Soap     3         10     30
102       22/08/2010 8:20 AM  Soap     4         10     40
103       22/08/2010 8:30 AM  Soap     2         10     20
DW:
Date        Product  Quantity  Price  Total Amount
22/08/2010  Soap     14        10     140
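The step from the four OLTP rows to the single DW row can be sketched as a GROUP BY, assuming Python's built-in sqlite3 module. The table names `oltp_sales` and `dw_daily_sales` and the column names are hypothetical; the values are the ones on the slide, with the date written in ISO form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oltp_sales (trans_id INTEGER, trans_date TEXT, trans_time TEXT,"
    " product TEXT, quantity INTEGER, price INTEGER, total_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO oltp_sales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (100, "2010-08-22", "08:00", "Soap", 5, 10, 50),
        (101, "2010-08-22", "08:10", "Soap", 3, 10, 30),
        (102, "2010-08-22", "08:20", "Soap", 4, 10, 40),
        (103, "2010-08-22", "08:30", "Soap", 2, 10, 20),
    ],
)

# The DW table keeps one summary row per day and product.
conn.execute(
    "CREATE TABLE dw_daily_sales (sale_date TEXT, product TEXT,"
    " quantity INTEGER, price INTEGER, total_amount INTEGER)"
)
conn.execute(
    "INSERT INTO dw_daily_sales "
    "SELECT trans_date, product, SUM(quantity), MAX(price), SUM(total_amount) "
    "FROM oltp_sales GROUP BY trans_date, product"
)

dw_rows = conn.execute("SELECT * FROM dw_daily_sales").fetchall()
print(dw_rows)
```

MAX(price) is used only because the price is constant within the day in this example; a real load would need a rule for days on which the price changes.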
A simple DW table: an aggregated view
August 23rd, 2010
OLTP:
Trans_id  Time                Product  Quantity  Price  Total Amount
100       23/08/2010 8:00 AM  Soap     10        10     100
101       23/08/2010 8:10 AM  Soap     3         10     30
102       23/08/2010 8:20 AM  Soap     4         10     40
103       23/08/2010 8:30 AM  Soap     2         10     20
DW:
Date        Product  Quantity  Price  Total Amount
22/08/2010  Soap     14        10     140
23/08/2010  Soap     19        10     190
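How the DW keeps growing while the OLTP table only holds the current day can be sketched as an end-of-day load, assuming Python's built-in sqlite3 module. The `end_of_day_load` helper and the table and column names are hypothetical, and the totals are computed as quantity times price rather than read from a stored column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oltp_sales (trans_id INTEGER, trans_date TEXT,"
    " product TEXT, quantity INTEGER, price INTEGER)"
)
conn.execute(
    "CREATE TABLE dw_daily_sales (sale_date TEXT, product TEXT,"
    " quantity INTEGER, total_amount INTEGER)"
)


def end_of_day_load(conn, todays_rows):
    """Record the day's transactions, summarize them into the DW, then clear OLTP."""
    with conn:  # the whole load is one transaction
        conn.executemany("INSERT INTO oltp_sales VALUES (?, ?, ?, ?, ?)", todays_rows)
        conn.execute(
            "INSERT INTO dw_daily_sales "
            "SELECT trans_date, product, SUM(quantity), SUM(quantity * price) "
            "FROM oltp_sales GROUP BY trans_date, product"
        )
        conn.execute("DELETE FROM oltp_sales")


end_of_day_load(conn, [
    (100, "2010-08-22", "Soap", 5, 10),
    (101, "2010-08-22", "Soap", 3, 10),
    (102, "2010-08-22", "Soap", 4, 10),
    (103, "2010-08-22", "Soap", 2, 10),
])
end_of_day_load(conn, [
    (100, "2010-08-23", "Soap", 10, 10),
    (101, "2010-08-23", "Soap", 3, 10),
    (102, "2010-08-23", "Soap", 4, 10),
    (103, "2010-08-23", "Soap", 2, 10),
])

dw_history = conn.execute("SELECT * FROM dw_daily_sales ORDER BY sale_date").fetchall()
oltp_left = conn.execute("SELECT COUNT(*) FROM oltp_sales").fetchone()[0]
print(dw_history, oltp_left)
```

After two loads the DW holds one row per day, while the OLTP table is empty again and ready for the next day's transactions.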
History is more important than aggregation
The following table has daily history:
Date        Product  Quantity  Avg Price  Total Amount
22/08/2009  Soap     20        8          160
23/08/2009  Soap     10        8          80
24/08/2009  Soap     15        8          120
(and so on, until 24/08/2010)
25/08/2010  Soap     14        10         140
26/08/2010  Soap     14        10         140
27/08/2010  Soap     14        10         140
This table has two important things:
1. History
2. Aggregation (summary) by day
Example of why we need history
To make decisions, we need lots of data from the past.
You make better decisions when you have accurate history for the last 5+ years.
The next slide takes a simple example of why we need to know history to make decisions.
Marks List (1st Quarter): a real-world DW example
Subject  Min  Max  Score  Result
Math     35   100  90     Very Good
Science  35   100  30     Fail
Social   35   100  87     Very Good
English  35   100  65     Good
Marks List (Half Yearly): a real-world DW example
Subject  Min  Max  Score  Result
Math     35   100  95     Very Good
Science  35   100  27     Fail
Social   35   100  84     Very Good
English  35   100  72     Good
Marks List (Three Dimensional): a real-world DW example
Half Yearly:
Subject  Min  Max  Score  Result
Math     35   100  95     Very Good
Science  35   100  27     Fail
Social   35   100  84     Very Good
English  35   100  72     Good
1st Quarter:
Subject  Min  Max  Score  Result
Math     35   100  90     Very Good
Science  35   100  30     Fail
Social   35   100  87     Very Good
English  35   100  65     Good
Mom has history, so now she can make decisions (Decision Support System):
1. Change teacher
2. Change school
3. Tuition
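The comparison across the two marks lists can be sketched in a few lines of Python. The scores and the pass mark of 35 come from the slides; the `failing_in_every_term` helper is a hypothetical name, and the rule it applies (below the pass mark in every term) is an assumption about what would prompt a decision.

```python
# Scores from the two marks lists on the slides.
first_quarter = {"Math": 90, "Science": 30, "Social": 87, "English": 65}
half_yearly = {"Math": 95, "Science": 27, "Social": 84, "English": 72}
PASS_MARK = 35  # the "Min" column on the slides


def failing_in_every_term(*terms):
    """Return the subjects whose score is below the pass mark in every term given."""
    subjects = terms[0].keys()
    return [s for s in subjects if all(t[s] < PASS_MARK for t in terms)]


# Change per subject between the two terms: only possible with history.
change = {s: half_yearly[s] - first_quarter[s] for s in first_quarter}
needs_attention = failing_in_every_term(first_quarter, half_yearly)

print(change)
print(needs_attention)
```

With a single marks list, only the current result is visible; with both, the trend per subject and the repeated failure in Science become visible.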
In Summary: Data Warehouse Definition
A Data Warehouse stores historical information in an RDBMS for analysis. The historical information is copied from
operational systems, also called OLTP systems.
In most cases, data is aggregated during the copy from the OLTP systems.
It is also called a DSS (Decision Support System).
In short:
> History of your business
> Summary of your business
Who needs a Data Warehouse?
Senior management, such as CEOs, to look at overall business trends.
Middle managers, to look at regional business.
Lower-level managers, to look at their own store or branch.
Marketing teams, for cross-selling.
Businesses that want to make more money and stay competitive need a DW.
To retain customers, you need a DW.
To track campaigns or advertisements, you need a DW.
To find suspicious customer behavior in the financial industry (AML), you need a Data Warehouse.
To meet other government regulatory requirements, such as KYC and Basel II, you need a DW.
It is important to use the information
Data Warehouse
John
Mary
Both are district managers.
Both ran the following report: