
Data Warehousing Concepts

13th April 2004

2004 Infosys Technologies Ltd. Private and confidential. All rights reserved.

Agenda
- Datawarehouse Introduction
- Datawarehouse Architecture
- Common DW Terminology
- OLTP v/s OLAP
- ETL
- OLAP & Reporting Tools

DW - Pulling the rabbit out of the hat act..


[Figure: the hunt for the rabbit (information). Labels: ODS, DW, DM (Marketing, Finance, Branch ops), Infoscions at work, Business Users, CXOs.]

Introduction
- Datawarehousing
  - It is all about information.
  - It is a journey and not a destination.
- Uses serious hardware and software components to extract, transform, cleanse, store and analyze massive amounts of data.
- An intelligent way of managing data.

[Figure labels: DATA, HISTORY, INFORMATION, KNOWLEDGE, WISDOM]

Datawarehousing in Simple Words

A warehouse is a place where goods are physically stocked to keep business flowing smoothly, without production downtime or crisis. In simple words: a data warehouse is a read-only database that copies and stores data from the transactional database.

Datawarehouse is
- A corporate informational repository
- Created from a wealth of operational data
- A source for reporting by the various business functions: Sales & Marketing, Finance, Operations
- A provider of tools for users to access consolidated corporate information
- A collection of data to assist decision support
- Data and tools

Datawarehouse Definitions
- Datawarehousing is the process of extracting, integrating, filtering, standardizing, transforming, cleaning and quality-checking the data of the organization's applications and storing it in a consolidated database.
- Bill Inmon defines a DW as:
  - Subject oriented: all relevant information specific to a subject, e.g. Sales.
  - Integrated: integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format, resolving naming conflicts and inconsistencies among units of measure.
  - Non-volatile: read only. Once data has entered the data warehouse, it is not changed.
  - Time variant: time is a key dimension.
  - A system used by management for making important business decisions.

Need for DW
- Difficulty in obtaining integrated information
- Information structure unable to provide full and dynamic analysis of the information available
- Inconsistent results from queries and reports, arising from heterogeneous data sources
- Increased difficulty in delivering consistent, comprehensive information in a timely fashion

DW - Goals
- Understand business trends and make better forecasting decisions.
- Analyze daily, weekly, monthly and yearly sales information.
- Bring better products to market in a more timely manner.
- Provide the ammunition for the company to differentiate itself from its peers.
- Well-known successes of DW in the early days: Wal-Mart and FedEx.

Features of DW
A datawarehouse provides:
- The ability to have consistent data.
- The ability to access enterprise data from a single source.
- The ability to perform analysis quickly and easily for the various business user communities.
- Information about the data in the DW (metadata).
- End users' terminology to define and refer to the data.
- Ease of access to information.

A warehouse helps to find answers.

A company's CFO wants to know:
- What is the contribution of each operating division to the total revenue?
- Who are the top 10 customers in each continent in the last 4 quarters?
- What percentage of revenue comes from the top 10 customers?
- What is their revenue split across the operating divisions of DHL (DHL Express, DHL Freight, DHL Solutions and DHL Danzas Air & Ocean)?

Division CEOs want to know:
- What is the % increase of their revenue contribution in the last 4 quarters for DHL?
- What is the net margin % in the last 3 months for each of the product lines?
- What is the monthly employee turnover % in the express services in the last one year?
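
Questions such as "top customers" and "percentage of revenue from the top customers" reduce to grouping and ranking over consolidated data. The following is a minimal sketch in plain Python; the customer names, the amounts and the helper names (`revenue_by`, `top_customers`, `share_of_top`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical consolidated revenue rows: (customer, division, revenue).
ROWS = [
    ("Acme", "Express", 500.0),
    ("Acme", "Freight", 300.0),
    ("Globex", "Express", 400.0),
    ("Initech", "Solutions", 200.0),
    ("Umbrella", "Freight", 100.0),
]


def revenue_by(rows, index):
    """Sum revenue grouped by the column at position `index`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[index]] += row[2]
    return dict(totals)


def top_customers(rows, n):
    """Return the n customers with the highest total revenue."""
    totals = revenue_by(rows, 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def share_of_top(rows, n):
    """Percentage of total revenue contributed by the top n customers."""
    total = sum(row[2] for row in rows)
    top = sum(value for _, value in top_customers(rows, n))
    return 100.0 * top / total
```

Calling `revenue_by(ROWS, 1)` gives the contribution of each division, and `share_of_top(ROWS, 2)` gives the revenue share of the two largest customers in this sample.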


What DW can do
- Track the most profitable customers and segments
- Regional trends in sales, profits and transactions
- Product profitability
- On-the-spot decisions


DW Benefits
- Helps convert huge stacks of data efficiently into information, and in turn into better business decisions.
- Develop applications quickly as needs change, ensuring the highest return on your investment.
- Analyze daily sales information.
- Gives the company a competitive edge.


DW Terminology
- OLTP: Online Transaction Processing (data capture screens)
- OLAP: Online Analytical Processing (reporting). Variants include ROLAP (relational) and MOLAP (multidimensional).
- Transformation: the process of changing OLTP data into OLAP information.
- Data mart: a data structure that is optimized for access. It supports a single analytic application.
- Metadata: the information about the data that is stored as part of the DW. In other words, the results of the data modeling activity, when stored in a tool or repository, are called metadata.
- Cube: the central object of your data, containing information in a multidimensional structure. Each cube is defined by a set of dimensions and measures.


Concepts and Terms: Operational data - OLTP
- Updated often, through online transactions
- Non-historical data (not more than 3 to 6 months)
- Optimized for transaction processing, i.e. concurrency
- Highly normalized for easy update, maintenance and integrity
- Data which is stored, retrieved and maintained by OLTP
- Typically stored in a relational database, but can also be in hierarchical databases or flat files
- Usually handles one record at a time:
  - Create a new customer
  - Update a customer address
  - Withdraw USD 1000 for a gift
  - Enter the payment for the last bill from the gas company


Concepts and Terms: Informational data - OLAP
- Stored in a format that eases analysis
- Analysis here means decision support (queries), reports, executive information systems and in-depth statistical analysis
- Summarized operational data
- De-normalized
- Infrequently updated from operational systems
- Optimized for decision support
- Read-only
- Stored in separate systems to lessen the impact on operational data


OLTP v/s OLAP
OLTP systems run the business; data warehouses tell you how to run the business.

Characteristic     OLTP                    OLAP
Orientation        Transaction             Analysis
Data access        Record at a time        Set at a time
Updates            Frequent & unscheduled  Periodic & scheduled
Response time      Seconds required        Minutes acceptable
Concurrent users   Many                    Few
Availability       Guaranteed              As needed
Data structures    Highly normalized       Often de-normalized
Data nature        Current                 Historical
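
The difference between record-at-a-time and set-at-a-time access can be sketched with Python's built-in `sqlite3` module. The `account` table, its columns and the sample values are assumptions made for illustration; a real OLTP system and warehouse would normally be separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, customer TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Ann", 5000.0), (2, "Bob", 1200.0), (3, "Cy", 300.0)],
)


def withdraw(conn, account_id, amount):
    """OLTP style: change a single record inside one short transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )


def total_balance(conn):
    """OLAP style: read a whole set of rows and summarize it."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


withdraw(conn, 1, 1000.0)
```

The update touches exactly one row by key, while the summary query scans every row; this is why the two workloads are usually tuned and hosted differently.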

Data Warehouse Processes


- Transformation: data is extracted from OLTP databases and external data sources. The extracted data has to be transformed before it is carried to the datawarehouse.
- Standardization: as data is moved from different OLTP databases or flat file systems to one target, it needs to be standardized, for example date fields or flag fields.
- Filtering: the process of extracting only the required data from the OLTP or external data sources. For example, the user may be interested only in the last five years of sales data.
- Aggregation: aggregate data is data that has undergone summarization.
- Data scrubbing & cleansing: the process of filtering, merging, standardizing, initializing and translating the operational data to create informational data that can be stored in the datawarehouse, ensuring data quality and accuracy.
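
Standardization, filtering and aggregation can be sketched as three small functions applied in sequence. This is an illustrative outline, not a production ETL job: the two source layouts, the field names (`sold_on`, `active`, `amount`) and the cut-off year are all assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical extracts from two sources with different date and flag conventions.
SOURCE_A = [
    {"sold_on": "2003-12-31", "active": "Y", "amount": 100.0},
    {"sold_on": "1995-06-01", "active": "N", "amount": 40.0},
]
SOURCE_B = [
    {"sold_on": "15/01/2004", "active": "1", "amount": 250.0},
    {"sold_on": "20/01/2004", "active": "0", "amount": 50.0},
]


def standardize(row, date_format):
    """Standardization: one date type and one boolean flag for every source."""
    return {
        "sold_on": datetime.strptime(row["sold_on"], date_format).date(),
        "active": row["active"] in ("Y", "1"),
        "amount": row["amount"],
    }


def keep_recent(rows, first_year):
    """Filtering: keep only rows dated in or after `first_year`."""
    return [row for row in rows if row["sold_on"].year >= first_year]


def total_by_year(rows):
    """Aggregation: summarize the amount per calendar year."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["sold_on"].year] += row["amount"]
    return dict(totals)


rows = [standardize(r, "%Y-%m-%d") for r in SOURCE_A]
rows += [standardize(r, "%d/%m/%Y") for r in SOURCE_B]
recent = keep_recent(rows, 2000)  # assumed cut-off year
yearly = total_by_year(recent)
```

After these steps every row carries the same date and flag representation, the 1995 row has been filtered out, and `yearly` holds one total per year.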

ODS Key Differences
- Operational Data Store: operational data that is stored and kept for longer periods than in the OLTP system, without any transformation.
  - Essentially the data is the same as OLTP but is kept for up to 2 to 3 years.
  - No transformation or cleansing of the data is involved. All the information translation is done on the reporting side.
  - Example: mainframe sequential files and long-running SAS jobs that scan across 12 months of historical files.
  - Sometimes the data is loaded into an RDBMS and kept there for the benefit of query tools.


Steps DW Building
- Identify key business drivers, sponsorship, risks and ROI
- Survey information needs, identify desired functionality and define functional requirements for the initial subject area
- Architect the long-term data warehousing architecture
- Evaluate and finalize the DW tools and technology
- Conduct a proof of concept
- Design the target database schema
- Build data mapping, extract, transformation, cleansing and aggregation/summarization rules
- Build the initial data mart, using an exact subset of the enterprise data warehousing architecture, and expand to the enterprise architecture over subsequent phases
- Maintain and administer the data warehouse


DW Process Overview

[Figure: heterogeneous source systems (Operational, Legacy, External) pass through Extraction, Transformation & Loading (ETL) and a Staging Area into the Data Warehouse, which feeds Business Intelligence (Query & Reporting, OLAP, Data Mining).]

Datawarehousing End to End


[Figure labels: Finance, Orders, Billing, ODS, Customer, ETL, DW, Customer Service, Marketing, Product, OLTP, METADATA]

Star Schema
A central fact table surrounded by a number of dimension tables. Dimensions are the business entities along which calculations are done. Their attributes can be numeric or alphanumeric. Example: a Product table comprising brand name, category, packaging type and size.

Facts are numerical measurements of the business with respect to dimensions. They are numeric and additive (summable across any combination of dimensions), e.g. a sales fact table could contain time, product and store keys along with dollars sold, units sold and dollars cost.
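
A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The table names (`dim_product`, `dim_store`, `dim_time`, `fact_sales`), the reduced column set and the sample rows are assumptions for illustration; only the measures (dollars sold, units sold, dollars cost) and the three keys come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per business entity.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_time    (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: foreign keys to each dimension plus additive measures.
    CREATE TABLE fact_sales (
        time_key     INTEGER REFERENCES dim_time(time_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        dollars_sold REAL,
        units_sold   INTEGER,
        dollars_cost REAL
    );

    INSERT INTO dim_product VALUES (1, 'BrandA', 'Snacks'), (2, 'BrandB', 'Drinks');
    INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Mysore');
    INSERT INTO dim_time    VALUES (1, 2004, 3), (2, 2004, 4);
    INSERT INTO fact_sales  VALUES
        (1, 1, 1, 100.0, 10, 60.0),
        (1, 2, 2, 200.0, 20, 150.0),
        (2, 1, 2, 50.0, 5, 30.0);
    """
)


def sales_by_category(conn):
    """Join the fact table to one dimension and sum the additive measures."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.dollars_sold), SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
```

Because the measures are additive, the same fact rows can be summed by store, by month or by any combination of dimensions simply by changing the join and the `GROUP BY` columns.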


Snow Flake Schema

Normalized version of the star schema with the addition of normalized dimension tables.

Normalization helps to reduce redundancy in the dimension tables, but affects performance and user comprehension.


Multidimensional Schemas
- Star schema
- Snowflake schema

ETL Process
The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading.

The acronym ETL is perhaps too simplistic, because it omits the transportation phase and implies that each of the other phases of the process is distinct.

Here the entire process, including data loading, is referred to as ETL. ETL should be understood as a broad process, not three well-defined steps.


ETL Process What happens


During extraction, the desired data is identified and extracted from many different sources, including database systems and applications. Very often more data is extracted than is needed, and the relevant data is identified at a later point in time. Depending on the source system's capabilities (for example, operating system resources), some transformations may take place during this extraction process.

After extraction, the data has to be physically transported to the target system or to an intermediate system for further processing. Depending on the chosen way of transportation, some transformations can be done during this process too. For example, a SQL statement that directly accesses a remote target through a gateway can concatenate two columns as part of the SELECT statement.
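
A transformation performed while the data is being moved can be sketched with two in-memory SQLite databases standing in for the source system and the intermediate system. No real gateway is involved here, and the table and column names (`customer`, `stg_customer`, `first_name`, `last_name`) are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")   # stands in for the remote source system
staging = sqlite3.connect(":memory:")  # stands in for the intermediate system

source.execute("CREATE TABLE customer (id INTEGER, first_name TEXT, last_name TEXT)")
source.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "Rao"), (2, "Ravi", "Nair")],
)
staging.execute("CREATE TABLE stg_customer (id INTEGER, full_name TEXT)")


def extract_and_transport(source, staging):
    """Concatenate two columns in the SELECT itself, then load the result."""
    rows = source.execute(
        "SELECT id, first_name || ' ' || last_name FROM customer ORDER BY id"
    ).fetchall()
    with staging:  # one transaction for the whole batch
        staging.executemany("INSERT INTO stg_customer VALUES (?, ?)", rows)
    return len(rows)


moved = extract_and_transport(source, staging)
```

The concatenation happens in the extracting query, so the staging table never sees the two separate name columns.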


ETL Tools

Major ETL tools are:
- Informatica PowerMart
- Informatica PowerCenter
- DataStage
- Ab Initio
- DP Warehouse
- Oracle Express
- DataMirror


OLAP Terminology
OLAP tools available for exploring the information built in a DW:

Multi-dimensional On-line Analytical Processing (MOLAP): data from the data warehouse is queried and loaded periodically onto a server on the local network, into a data store called a Multi-dimensional Database (MDDB) provided by the OLAP tool. This MDDB forms a data mart, which is then used for querying and reporting.

Relational On-Line Analytical Processing (ROLAP): refers to the ability to conduct OLAP analysis directly against a relational warehouse, without any constraints on the number of dimensions, database size, analytical complexity, or number and type of users.

Hybrid On-line Analytical Processing (HOLAP): an environment with a combination of MOLAP and ROLAP data storage. Summarized information is typically stored in an MDDB and detailed data is stored in a relational environment.


MOLAP

- High-performance, multidimensional data storage format.
- Data is stored on the OLAP server.
- Gives the best query performance for small to medium-sized data sets.

ROLAP

With ROLAP, data remains in the original relational tables. A separate set of relational tables is used to store and reference aggregation data. ROLAP is ideal for large databases or legacy data that is infrequently queried.

HOLAP

HOLAP combines elements from MOLAP and ROLAP. HOLAP keeps the original data in relational tables but stores aggregations in a multidimensional format.
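
The HOLAP split can be sketched as follows: detail rows stay in a relational table, and precomputed aggregations live in a separate multidimensional structure. In this sketch a Python dict keyed by `(region, month)` stands in for the MDDB; a real OLAP server uses its own storage format, and the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Jan", 10.0),
        ("North", "Jan", 5.0),
        ("North", "Feb", 7.0),
        ("South", "Jan", 20.0),
    ],
)


def build_aggregates(conn):
    """Precompute totals into a dict keyed by (region, month)."""
    query = "SELECT region, month, SUM(amount) FROM sales GROUP BY region, month"
    return {(region, month): total for region, month, total in conn.execute(query)}


def detail_rows(conn, region, month):
    """Detail stays relational and is fetched only when a user drills through."""
    return conn.execute(
        "SELECT amount FROM sales WHERE region = ? AND month = ? ORDER BY amount",
        (region, month),
    ).fetchall()


cube = build_aggregates(conn)
```

Summary questions are answered from `cube` without touching the detail table, while `detail_rows` goes back to the relational store for individual records.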

OLAP Terminology Drill Up/ Drill Down


[Figure labels: Region, State, District; Location, Month, Product]

- Drill down is an analytical technique whereby the user navigates from the most summarized to the most detailed level. Drill up navigates in the opposite direction.
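
Drilling up and down along a Region, State, District hierarchy can be sketched as totals computed at different depths of the same rows. The sample rows and the function names `rollup` and `drill_down` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, state, district, amount).
SALES = [
    ("South", "Karnataka", "Mysore", 10.0),
    ("South", "Karnataka", "Bangalore", 30.0),
    ("South", "Kerala", "Kochi", 20.0),
    ("West", "Maharashtra", "Pune", 40.0),
]


def rollup(rows, depth):
    """Total the amount at a hierarchy depth (1 = region, 2 = state, 3 = district)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[:depth]] += row[3]
    return dict(totals)


def drill_down(rows, path):
    """Show the next level beneath `path`, e.g. the states inside one region."""
    below = [row for row in rows if row[:len(path)] == path]
    return rollup(below, len(path) + 1)
```

`rollup(SALES, 1)` is the most summarized view; `drill_down(SALES, ("South",))` expands one region into its states, and calling `rollup` again with a smaller depth drills back up.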

OLAP Terminology Slice / Dice

- Rotation or dicing

[Figure labels: MONTH, PRODUCT, Region]
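
Slicing and rotation can be sketched on a small cube held as a Python dict. The slide treats rotation and dicing as the same operation, and this sketch follows that usage; the cell values and the function names `slice_region` and `rotate` are hypothetical.

```python
# Hypothetical cube cells keyed by (product, month, region).
CUBE = {
    ("Tea", "Jan", "East"): 5.0,
    ("Tea", "Feb", "East"): 7.0,
    ("Coffee", "Jan", "East"): 3.0,
    ("Tea", "Jan", "West"): 4.0,
}


def slice_region(cube, region):
    """Slice: fix the region dimension, leaving a product-by-month view."""
    return {(p, m): v for (p, m, r), v in cube.items() if r == region}


def rotate(view):
    """Rotate: swap the two axes, so product-by-month becomes month-by-product."""
    return {(b, a): v for (a, b), v in view.items()}


east = slice_region(CUBE, "East")
east_by_month = rotate(east)
```

The values are unchanged by rotation; only the axis that leads the key, and therefore the way the view is laid out, is different.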


OLAP Tools
Querying & Reporting:
- Oracle Discoverer
- Business Objects
- Brio Enterprise
- Oracle Express
- Hyperion Essbase
- Cognos


THANK YOU

