Вы находитесь на странице: 1из 21

SEMINAR REPORT

ON
DATA WAREHOUSE

SAM HIGGINBOTTOM INSTITUTE OF AGRICULTURE,


TECHNOLOGY & SCIENCES (DEEMED TO BE UNIVERSITY)
Department of Computer Science and I.T

Submitted To:
Mudita Shrivastava

Submitted By:
Ayush Barnawal (13BTCSE007)

CONTENTS
HISTORY
OBJECTIVES
EVOLUTION IN ORGANIZATIONAL USE
ARCHITECTURE
OLTP
OLAP
BENEFITS OF DATA WAREHOUSE
STRATEGIC USE
LIMITATIONS

HISTORY
The concept of data warehousing dates back to the late

1980s when IBM researchers Barry Devlin and Paul


Murphy developed the "business data warehouse.
Bill Inmon, Father Of Data Warehouse.
According to Inmons Definition:
It is a collection of integrated, subject-oriented databases

designed to support the DSS function, where each unit of


data is non-volatile and relevant to some moment in time.

DATA WAREHOUSE
A single, complete and consistent store of

data obtained from a variety of different


sources made available to end users in a what
they can understand and use in a business
context.
A subject-oriented, integrated, time-variant,

non-updatable collection of data used in


support of management decision-making
processes

DATA WAREHOUSING IS
Relational or Multidimensional Database Management System

designed to support Management Decision Making.


Technique for Assembling and Managing Data from various sources

for the purpose of answering Business questions.


A subject-oriented, integrated, time-variant, non-updatable

collection of data used in support of management decision-making


processes.
Common Accessing systems include queries, analysis and reporting.
The final result, however, is homogeneous data, which can be more

easily manipulated.

SUBJECT ORIENTED
DATA WAREHOUSE
Organized around major subjects, such as customer,

product, sales.
Focusing on the modeling and analysis of data for decision

makers, not on daily operations or transaction processing.


Provide a simple and concise view around particular

subject issues by excluding data that are not useful in the


decision support process.

INTEGRATED DATA WAREHOUSE


Constructed by integrating multiple, heterogeneous

data sources.
Relational databases, flat files, on-line transaction records.

Data cleaning and Data Integration techniques are

applied.
Ensures Consistency in naming conventions, encoding

structures, attribute structures, etc. among different data sources.


E.g., Hotel price: currency, tax, breakfast covered, etc.

When data is moved to the warehouse, it is converted.

TIME VARIANT DATA WAREHOUSE


The time horizon for the data warehouse is significantly

longer than that of operational systems.


Operational database: current value data.
Data warehouse data: provide information from a historical

perspective (e.g., past 5-10 years)

Every key structure in the Data Warehouse.


Contains an element of time, explicitly or implicitly.
But the key of operational data may or may not contain time

element.

NON VOLATILE
DATA WAREHOUSE
Physically separate store of Data transformed from the

Operational Environment.
Operational update of data does not occur in the data

warehouse environment.
Does not require transaction processing, recovery, and concurrency

control mechanisms.
Requires only two operations in Data Accessing.
Initial loading of Data.
Access of Data.

EVOLUTION
IN
ORGANIZATIONAL USE
Data, Data Everywhere yet,
I cant find the data I need
Data is Scattered over the Network
Many versions, Subtle differences
I cant understand the data I found
Available Data poorly documented
I cant use the data I found
Results are Unexpected
Data needs to be transformed from one form to Other

WHY DATA WAREHOUSING ?


Which
Whichare
areour
our
lowest/highest
lowest/highestmargin
margin
customers
customers??
Who
Whoare
aremy
mycustomers
customers
and
andwhat
whatproducts
products
are
arethey
theybuying?
buying?

What
Whatisisthe
themost
most
effective
effectivedistribution
distribution
channel?
channel?

What
Whatproduct
productpromprom-otions
-otionshave
havethe
thebiggest
biggest
impact
impacton
onrevenue?
revenue?

Which
Whichcustomers
customers
are
aremost
mostlikely
likelyto
togo
go
to
tothe
thecompetition
competition??
What
Whatimpact
impactwill
will
new
products/services
new products/services
have
haveon
onrevenue
revenue
and
andmargins?
margins?

ARCHITECTURE
Client

Client
Query & Analysis

Metadata

Warehouse

Integration

Source

Source

Source

OLTP
OLTP- ONLINE TRANSACTION PROCESSING
Special data organization, access methods and

implementation methods are needed to support data


warehouse queries (typically multidimensional queries).
OLTP systems are tuned for known transactions and

workloads while workload is not known a priori in a Data


Warehouse.
E.g., Average Amount spent on phone calls between 9AM-5PM in

Pune during the month of December

OLTP Vs DW
OLTP

DW

Application Oriented

Subject Oriented

Used to run business

Used to Analyze Business

Detailed Data

Summarized and Refined

Current Up-date Data

Snapshot Data

Isolated Data

Integrated Data

Few Records Accessed at a Time(tens)

Large Records Accessed at a time


(millions)

Read/Update Access

Mostly Read(Batch Update)

No Data Redundancy

Redundancy Present

Database Size (100MB -100 GB)

Database Size (100GB Few Terabytes)

Transaction throughput is performance


Metric.

Query throughput is the performance


Metric.

Thousands of users

Hundreds of Users

META DATA
Data about Data. Its Important for Designing, Constructing,
Retrieving, and controlling the Warehouse Data.
TECHNICAL META DATA
Include where the data come from, how the data were changed, how the data are
organized, how the data are stored, who owns the data, who is responsible for the
data and how to contact them, who can access the data , and the date of last update.

BUSINESS META DATA


Include what data are available, where the data are, what the data mean, how to
access the data, predefined reports and queries, and how current the data are.

OLAP
Online Analytical Processing - coined by EF Codd in 1994

paper contracted by Arbor Software.


Online analytical processing refers to such end user

activities as DSS modelling using spreadsheets and


graphics that are done online.
OLAP involves many different data items in complex

relationships.
Objective of OLAP is to analyze complex relationships

and look for patterns, trends and exceptions.

OLAP IS FASMI
FAST
ANALYTICAL
SHARED
MULTI-DIMENSIONAL
INFORMATION

BENEFITS OF DW
Provides business users with a customer-centric view of the

companys Heterogeneous data by helping to integrate data from


sales, service, manufacturing and distribution, and other customerrelated Business systems.
Provides added value to the companys customers by allowing them

to access better information when Data Warehousing is coupled with


Internet technology.
Consolidates data about individual customers and provides a

repository of all customer contacts for segmentation modeling,


customer retention planning, and cross sales analysis.
Reports on trends across multidivisional, multinational operating

units, including trends or relationships in areas such as


merchandising, production planning etc.

STRATEGIC USE

LIMITATIONS
Data warehouses are not the optimal environment for

unstructured data.
Over their life, data warehouses can have high Maintenance

costs.
Data warehouses can get outdated relatively quickly. There is a

cost of delivering suboptimal information to the organization.


There is often a fine line between data warehouses and

operational systems. Duplicate, expensive functionality may be


developed. Or, functionality may be developed in the data
warehouse that, in retrospect, should have been developed in
the operational systems and vice versa .

K
N
A
H
T

!
!
!
U
O
Y

Вам также может понравиться