
Data Warehousing

L. Ramanathan, Asst. Prof., SCSE, VIT University, Room No. MB E 407

The Architectural Components

Understanding Data Warehouse Architecture


Architecture- Definition
The structure that brings all the components of a data warehouse together. Architecture includes the following factors:
* Integrated data
* Things necessary to prepare and store data
* Means for delivering information from the data warehouse

Understanding Data Warehouse Architecture (Contd.)

* Composed of rules, procedures and functions that enable the data warehouse to work and fulfill the business requirements
* Made up of technology that empowers the data warehouse
* Defines the standards, measurements, general design and support techniques

[Figure: Typical data warehouse and data mart architecture. Source data (operational data sources 1 to n and an operational data store, ODS) feeds data acquisition and data staging. Data storage, run by the warehouse manager on a DBMS, holds metadata, detailed data, lightly and highly summarized data, and archive/backup data. Data marts (second tier) hold summarized data in relational or multidimensional databases. Information delivery (third tier) serves the end-user access tools: reporting, query, application development and EIS (executive information system) tools, OLAP (online analytical processing) tools, and data mining. Management and control spans the components.]

Distinguishing Characteristics
Different Objectives and Scope
The DW architecture must have components that provide data to users in large volumes in a single session.

Distinguishing Characteristics
Different Objectives and Scope (Contd.)
Defining the scope of the DW is also difficult. Factors to be considered are:
* Number and extent of data sources
* Data granularity and data volumes
* Impact of the DW on existing operational systems
Scope is measured in terms of data transformation and integration functions.

Distinguishing Characteristics
Data Content
Read-only data in the DW is the primary component in the architecture. The architecture should support business subjects as well as high data volumes.

Distinguishing Characteristics
Complex Analysis and Quick Response
The architecture should support complex analysis of strategic information, since information retrieval is complex. Users must be able to:
* Drill down, roll up, slice and dice the data
* Review results in different output options
* View results in tabular as well as graphical format
Finally, the architecture should provide a platform for making rapid decisions and dealing with situations quickly.
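The analysis operations named above can be sketched on a tiny in-memory fact table. This is only an illustration: the `SALES` rows, the dimension names and the helper functions are hypothetical, and a real warehouse would run these operations through an OLAP tool or SQL rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales_amount).
SALES = [
    ("North", "Widget", "Q1", 100),
    ("North", "Widget", "Q2", 150),
    ("North", "Gadget", "Q1", 80),
    ("South", "Widget", "Q1", 120),
    ("South", "Gadget", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def aggregate(rows, group_by):
    """Sum the sales measure over the chosen dimensions.

    Grouping by fewer dimensions is a roll-up; adding a dimension
    back is a drill-down.
    """
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall inside the allowed value sets."""
    return [
        r for r in rows
        if all(r[DIMENSIONS.index(d)] in vals for d, vals in allowed.items())
    ]


by_region = aggregate(SALES, ["region"])                      # roll up
by_region_product = aggregate(SALES, ["region", "product"])   # drill down
q1_only = slice_(SALES, "quarter", "Q1")                      # slice
south_widgets = dice(SALES, region={"South"}, product={"Widget"})  # dice
```

The same `aggregate` function serves both roll-up and drill-down; only the list of grouping dimensions changes.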

Distinguishing Characteristics
Flexible and Dynamic
The DW architecture should be flexible enough to accommodate additional requirements as and when they arise. Ex: items missed in the business requirements, or those that arise because of a change in business conditions.

Distinguishing Characteristics
Metadata-Driven
Metadata holds data about every phase of the data movement. It interleaves with and connects the other components.

Questions
1. What are the three major architectural components?
2. State any three characteristics of DW architecture.
3. What is the use of metadata?

Answers
1. Data acquisition, data storage, and information delivery.
2. The architecture should support different objectives and scope, read-only data, and complex analysis with quick response; it should be flexible and dynamic, and metadata-driven.
3. Metadata details every phase of data movement; it interleaves with and connects the other components.

TECHNICAL ARCHITECTURE

The technical architecture of a DW is the complete set of functions and services provided within its components.
* Includes the procedures and rules required to perform the functions and provide the services
* Includes the data stores needed for each component to provide the services
* Tools are the means to implement the architecture

Components, Functions & Services

Data Acquisition
* Data flow: data extraction, data transformation, data staging

Data Storage
* Data flow: flow, data groups, data repository

Information Delivery
* Data flow: flow, service locations, data stores

Data Acquisition: Technical Architecture


[Figure: Data acquisition architecture. Elements shown: source data, management & control, metadata, data extraction, intermediary flat files, data transformation, relational DB or flat files, data staging.]

DATA ACQUISITION
1. Extract data from the data sources
2. Move it to the staging area
3. Prepare the data for loading into the DW

Components
Source data and data staging

Data Flow
Flow
The data flow begins at the data sources and pauses at the staging area. After transformation and integration, the data is ready for loading into the DW repository.
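The three steps of this flow can be sketched as a small pipeline. This is a minimal sketch with hypothetical function names and in-memory lists standing in for the source systems, the staging area and the DW repository; it is not a description of any particular ETL tool.

```python
def extract(sources):
    """Step 1: pull raw records from every source, tagging each with its origin."""
    for name, records in sources.items():
        for record in records:
            yield {**record, "source": name}


def stage(records):
    """Step 2: hold the extracted records in a staging area.

    The only preparation shown here is trimming stray whitespace.
    """
    staging_area = []
    for record in records:
        cleaned = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        staging_area.append(cleaned)
    return staging_area


def load(staging_area, warehouse):
    """Step 3: move the prepared records into the DW repository."""
    warehouse.extend(staging_area)
    return len(staging_area)


# Hypothetical source systems and an empty repository.
sources = {
    "orders_sys": [{"customer": " Ann "}],
    "billing_sys": [{"customer": "Bob"}],
}
warehouse = []
loaded = load(stage(extract(sources)), warehouse)
```

The flow "pauses" at the list returned by `stage`: nothing reaches `warehouse` until `load` is called.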

TECHNICAL ARCHITECTURE DATA ACQUISITION

Data Flow Contd

Data Sources
The primary data source consists of:
* The enterprise's operational systems
  (+) Consolidated data, ready to use
  (-) Proprietary tools required to extract data
* Legacy data residing on hierarchical or network databases

TECHNICAL ARCHITECTURE DATA ACQUISITION

Data Flow Contd

Intermediate Data Stores
* Data from the data sources is moved to temporary files
* Homogeneous data from several sources is merged with other temporary files before moving into the staging area
* Flat files are used to extract data from operational systems
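Merging homogeneous extract files can be sketched with Python's standard `csv` module. The file contents, column names and the `merge_extracts` helper are assumptions made for the example; in-memory `io.StringIO` objects stand in for the temporary flat files.

```python
import csv
import io


def merge_extracts(extract_files, key):
    """Merge extract files that share the same columns into one sorted list.

    extract_files maps a source name to an open text file; each row is
    tagged with its source so the origin is not lost after the merge.
    """
    merged = []
    for source_name, handle in extract_files.items():
        for row in csv.DictReader(handle):
            row["source"] = source_name
            merged.append(row)
    merged.sort(key=lambda r: r[key])
    return merged


# In-memory stand-ins for two temporary flat files (hypothetical data).
store_a = io.StringIO("customer_id,name\n2,Bob\n1,Ann\n")
store_b = io.StringIO("customer_id,name\n3,Cy\n")
rows = merge_extracts({"store_a": store_a, "store_b": store_b}, key="customer_id")
```

Note that `csv.DictReader` yields every value as a string, so the sort here is a string sort; type conversion belongs to the later transformation step.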

TECHNICAL ARCHITECTURE DATA ACQUISITION

Data Flow Contd

Staging Area
Each extracted file is examined and reviewed against business rules. Various transformation functions are performed, the data is sorted and merged, inconsistencies are resolved, and the data is cleansed. The data resides temporarily in the staging area before it is loaded into the DW repository.

TECHNICAL ARCHITECTURE DATA ACQUISITION

Functions and Services

List of functions and Services

Data Extraction
* Select data sources and determine the types of filters to be applied to individual sources
* Generate automatic extract files from operational systems using replication
* Create intermediate files to store selected data to be merged later
* Transport extracted data from multiple platforms

TECHNICAL ARCHITECTURE DATA ACQUISITION

Functions and Services

List of functions and Services

Data Extraction (Contd.)
* Provide automated job control services for creating extract files
* Reformat input from outside sources, departmental data files, databases, etc.
* Generate common application code for data extraction
* Resolve inconsistencies for common data elements from multiple sources

TECHNICAL ARCHITECTURE DATA ACQUISITION

Functions and Services

List of functions and Services

Data Transformation
* Map input data to data for the DW repository
* Clean data, de-duplicate and merge/purge
* De-normalize extracted data structures
* Convert data types
* Calculate and derive attribute values
* Check for referential integrity
* Aggregate data as needed
* Resolve missing values
* Consolidate and integrate data
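Several of the transformation functions listed above can be sketched in one pass over the extracted rows. The column names, the `VALID_PRODUCT_IDS` set and the choice to default a missing price to 0.0 are all assumptions for illustration; real rules come from the business requirements.

```python
from datetime import date

# Hypothetical keys of the product dimension, used for the integrity check.
VALID_PRODUCT_IDS = {"P1", "P2"}


def transform(raw_rows):
    """Clean, de-duplicate, convert and derive values for extracted rows.

    Returns (clean_rows, rejected_rows). Rows are dicts of strings, as
    they would arrive from a flat-file extract.
    """
    seen = set()
    clean, rejected = [], []
    for raw in raw_rows:
        order_id = raw["order_id"].strip()              # clean
        if order_id in seen:                            # de-duplicate
            continue
        seen.add(order_id)
        if raw["product_id"] not in VALID_PRODUCT_IDS:  # referential integrity
            rejected.append(raw)
            continue
        quantity = int(raw["quantity"])                 # convert data types
        # Resolve missing values: an empty price becomes 0.0 (assumed policy).
        unit_price = float(raw.get("unit_price") or 0.0)
        clean.append({
            "order_id": order_id,
            "product_id": raw["product_id"],
            "order_date": date.fromisoformat(raw["order_date"]),
            "quantity": quantity,
            "unit_price": unit_price,
            "amount": quantity * unit_price,            # derived attribute
        })
    return clean, rejected
```

Rejected rows are returned rather than silently dropped, so they can be reviewed and corrected at the source.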

TECHNICAL ARCHITECTURE DATA ACQUISITION

Functions and Services

List of functions and Services

Data Staging
* Provide backup and recovery for staging area repositories
* Sort and merge files
* Create files as input to make changes to dimension tables
* Create and populate the database, if the staging storage is a relational database
* Preserve an audit trail to relate each data item in the DW to its input source
* Resolve and create primary and foreign keys for load tables
* Consolidate data sets and create flat files
* If storage is relational, extract load files
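Two of the staging functions, creating keys for load tables and preserving an audit trail, can be sketched together. The field names and the `build_load_rows` helper are hypothetical; the sketch assumes that the same natural key coming from two different sources identifies two different records.

```python
from itertools import count


def build_load_rows(staged_rows, key_map, next_key):
    """Assign surrogate primary keys and record an audit trail entry per row.

    key_map maps (source, natural_key) to a surrogate key and is meant to
    persist between loads; next_key is an iterator of unused key values.
    """
    load_rows, audit_trail = [], []
    for row in staged_rows:
        natural = (row["source"], row["natural_key"])
        if natural not in key_map:
            key_map[natural] = next(next_key)
        surrogate = key_map[natural]
        load_rows.append({"customer_key": surrogate, "name": row["name"]})
        # The audit entry relates the DW row back to its input source.
        audit_trail.append({
            "customer_key": surrogate,
            "source": row["source"],
            "source_key": row["natural_key"],
        })
    return load_rows, audit_trail


staged = [
    {"source": "crm", "natural_key": "C-7", "name": "Ann"},
    {"source": "billing", "natural_key": "C-7", "name": "Bob"},
    {"source": "crm", "natural_key": "C-7", "name": "Ann"},
]
key_map = {}
load_rows, audit_trail = build_load_rows(staged, key_map, count(1))
```

Because `key_map` is passed in and mutated, a later incremental run that reuses it gives the same surrogate key to a record it has already seen.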

DATA STORAGE : TECHNICAL ARCHITECTURE


[Figure: Data storage architecture. Elements shown: management & control, relational DB with E-R model, relational DB with dimensional model, data storage, data marts.]

TECHNICAL ARCHITECTURE DATA STORAGE

Deals with the entire process of loading the data from the staging area into the DW repository.

Data Flow

Flow
From the staging area to the DW repository:
* Top-down: enterprise repository to dependent data marts
* Bottom-up: independent data marts to DW

TECHNICAL ARCHITECTURE- DATA STORAGE

Data Flow Contd

Data Groups
* First group: a set of files or tables containing data for a full refresh, meant for the initial loading of the DW
* Second group: a set of files or tables containing ongoing incremental loads
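The difference between the two data groups can be sketched with Python's built-in `sqlite3` module standing in for the warehouse DBMS. The `sales_fact` table and both helper functions are hypothetical, and overwriting a row whose key already exists is one assumed policy for incremental loads among several possible ones.

```python
import sqlite3


def full_refresh(conn, rows):
    """First group: empty the table and reload everything (initial load)."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM sales_fact")
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)


def incremental_load(conn, rows):
    """Second group: add new rows and overwrite rows whose key already exists."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)")

full_refresh(conn, [(1, 10.0), (2, 20.0)])        # initial loading of the DW
incremental_load(conn, [(2, 25.0), (3, 30.0)])    # ongoing incremental load

result = conn.execute(
    "SELECT sale_id, amount FROM sales_fact ORDER BY sale_id"
).fetchall()
```

After both loads the table holds three rows: sale 1 unchanged, sale 2 updated by the incremental load, and sale 3 newly added.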

TECHNICAL ARCHITECTURE- DATA STORAGE

Data Flow Contd


Data Repository
* Most DW databases are relational
* The capabilities of the RDBMS are available for processing the data

TECHNICAL ARCHITECTURE DATA STORAGE

Functions and Services

List of functions and Services
* Load data for full refreshes of DW tables
* Perform incremental loads at specified intervals
* Support loading into multiple tables
* Optimize the loading process
* Provide automated job control services for loading data into the DW
* Provide backup and recovery
* Provide security
* Monitor and fine-tune the database
* Periodically archive data from the database

INFORMATION DELIVERY : TECHNICAL ARCHITECTURE


[Figure: Information delivery architecture. Elements shown: management & control, metadata, information delivery, multidimensional database, OLAP, temporary result sets, standard reporting data stores, DM (data mart), report/query.]

TECHNICAL ARCHITECTURE INFORMATION DELIVERY


Deals with the entire process of providing information to the user in a flexible manner.

Data Flow

Flow
* May be top-down or bottom-up
* User query data is transformed into information, in the form of either a regular or an ad hoc report

TECHNICAL ARCHITECTURE INFORMATION DELIVERY

Data Flow

Service Location
* Query service may be provided from the user desktop, an application server, or the database
* A comprehensive reporting service is needed for producing reports at regular intervals

TECHNICAL ARCHITECTURE INFORMATION DELIVERY

Data Flow

Data Stores
The following intermediary data stores are used for information delivery:
* Proprietary temporary stores to hold results of individual queries and reports for repeated use
* Data stores for standard reporting
* Proprietary multidimensional databases
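A temporary store that holds query results for repeated use can be sketched as a small cache in front of the database. The `ResultStore` class and the `sales` table are hypothetical, and `sqlite3` is only a stand-in for the warehouse DBMS; real delivery tools use their own proprietary stores.

```python
import sqlite3


class ResultStore:
    """Keep the result set of each query so repeated requests skip the database."""

    def __init__(self, conn):
        self.conn = conn
        self._results = {}
        self.hits = 0  # how many requests were served from the store

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self._results:
            self.hits += 1
            return self._results[key]
        rows = self.conn.execute(sql, params).fetchall()
        self._results[key] = rows
        return rows

    def invalidate(self):
        """Drop stored results, for example after a warehouse load."""
        self._results.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("North", 10.0), ("South", 5.0)])

store = ResultStore(conn)
sql = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
first = store.query(sql)    # runs against the database
second = store.query(sql)   # served from the temporary store
```

Stored results go stale when the warehouse is reloaded, which is why the sketch includes an explicit `invalidate` step.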

TECHNICAL ARCHITECTURE INFORMATION DELIVERY

Functions and Services

List of functions and Services
* Provide security to control information access
* Monitor user access to improve service and for future enhancements
* Allow users to browse DW content
* Simplify access by hiding the internal complexities of data storage from users
* Automatically reformat queries for optimal execution
* Govern queries and control runaway queries
* Store result sets of queries and reports for future use
* Provide multiple levels of granularity
* Provide event triggers to monitor data loading
* Make provision to perform complex analysis through OLAP
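Governing runaway queries can be sketched with `sqlite3`, whose `Connection.set_progress_handler` lets a callback abort a statement that runs too long. Using SQLite as the warehouse engine, the `run_governed` helper and the budget values are all assumptions; commercial DBMSs expose their own resource governors.

```python
import sqlite3


def run_governed(conn, sql, max_callbacks=50, every_n_instructions=1000):
    """Run a query, aborting it once it exceeds an instruction budget.

    Returns the rows, or None if the query failed or was cancelled.
    """
    calls = 0

    def governor():
        nonlocal calls
        calls += 1
        # A non-zero return value tells SQLite to abort the running query.
        return 1 if calls > max_callbacks else 0

    conn.set_progress_handler(governor, every_n_instructions)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # Raised when the governor interrupts the query (and also for
        # other operational errors, which this sketch does not separate).
        return None
    finally:
        conn.set_progress_handler(None, every_n_instructions)  # remove governor


conn = sqlite3.connect(":memory:")
small = run_governed(conn, "SELECT 1")

# A deliberately expensive query: count one hundred million generated rows.
runaway_sql = (
    "WITH RECURSIVE c(x) AS ("
    " SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 100000000"
    ") SELECT count(*) FROM c"
)
runaway = run_governed(conn, runaway_sql)
```

The small query finishes well inside the budget, while the recursive count is cut off after roughly `max_callbacks * every_n_instructions` virtual-machine instructions.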
