Вы находитесь на странице: 1из 28

Data warehousing

The basic concepts & its


applications
What is a Data warehouse?
It means…
 collection of databases with information extracted
from other databases.
 a repository of an organization's electronically
stored data
 model of corporation's data, put together to
answers the corporation's business questions.
 A data warehouse is designed to support strategic
decision making.
 designed to facilitate reporting and analysis.
Comparing a Data Warehouse
and an Operational Database
Data Warehouse, Operational Database

subject oriented application oriented

integrated multiple diverse sources

time –variant, current real-time

nonvolatile updateable
History of DATA
WAREHOUSING
 concept dates back to the late
1980s when IBM researchers Barry Devlin
and Paul Murphy developed "business
data warehouse".
 In essence, data warehousing concept
was intended to provide an architectural
model for the flow of data from
operational systems to decision
support environments
 In absence of a data warehousing architecture, an
enormous amount of redundancy of information was
required to support the multiple decision support
environments that usually existed.

 In larger corporations it was typical for multiple


decision support environments to operate
independently. Each environment served different
users but often required much of the same data.

 The process of gathering, cleaning and integrating


data from various sources, usually long existing
operational systems was replicated for each
environment.
 the operational systems were frequently reexamined as
new decision support requirements emerged.

 Based on analogies with real-life warehouses, data


warehouses were intended as large-scale
collection/storage/staging areas for corporate data.

 Data could be retrieved from one central point or data


could be distributed to "retail stores" or "data marts"
which were tailored for ready access by users.
Types of Data Warehousing
Applications
 Data warehousing systems target three
different types of applications:

 personal productivity

 query and reporting

 planning and analysis


"Personal productivity applications

 spreadsheets, statistical packages


and graphics tools,
 are useful for manipulating and
presenting data on individual PCs.
 Developed for a standalone
environment, these tools address
applications requiring only small
volumes of warehouse data.
"Data query and reporting applications

 deliverwarehouse-wide data access


through simple, list-oriented queries,
 the generation of basic reports.
 These reports provide a view of historical
data
 they do not address the enterprise need
for in-depth analysis and planning.
"Planning and analysis applications

address essential business requirements as

 budgeting,
 forecasting,
 product line and customer profitability,
 sales analysis,
 financial consolidations
 manufacturing mix analysis
 --applications that use historical, projected and derived
data.
Benefits of Data
Warehousing
 Has a subject area orientation
 Data is arranged by subject rather than by application,
and is more intuitive for users to navigate.
 closer in concept to the way decision makers think
about their business.

 Integrates data from multiple,


diverse sources
 the integration provided by a data warehouse was seen
as very desirable.
 Allows grouping multiple, diverse sources of data &
integrating them into single administrative suite
 Allows for analysis of data over time
 data snapshots taken at times that are significant to the decision
making process make it possible to analyze trends over time.
 This can readily indicate areas where further analysis is warranted.

 Adds ad hoc reporting and enquiry


 In addition to the points highlighted in the definition, a data
warehouse is designed to be accessible with end-user tools, and this
allows ad hoc reporting and analysis by end-users.
 Relieves the development burden on IT
 With data warehouse and users trained in the use of appropriate desktop
tools, users can find answers to their own questions.
 Provides improved performance for complex analytical queries

 provides improved performance for complex


analytical queries
 Using denormalization and aggregation.
 Frequently used aggregates are often precomputed and materialized in
views known as summary tables
 These materialized views provide fast access to integrated data,
regardless of the original data sources.
 Relieves processing burden on
transaction oriented databases
 Provides analysis capabilities to decision
makers
 Allows for a continuous planning process
 Converts Corporate Data into Strategic
Information
Drawbacks of data
warehousing
 1. Time spent in careful analysis of
measurable needs
 2. Design and implementation effort
 3. Hardware costs
 4. Software costs
 5. On-going support and maintenance
 6. Resulting re-engineering effort
Design process of data
warehouse
The actual design process for developing
data warehouse

 talk to the users


 determine their needs in terms that can be measured
 design a database to support those needs
 document the data descriptions and other attributes
 design the logic for translating data from various sources
into an integrated data store
 Writing code for extracting data from various sources &
transforming it into data warehouse, with updates to the
metadata
 Finally package the procedures to handle scheduling,
management and maintenance
Data Warehouse Architecture

 Architecture,in the context of an


organization's data warehousing efforts,
is a conceptualization of how the data
warehouse is built.

 conceptualizationof data warehouse


architecture consists of the following
interconnected layers:
 OPERATIONAL DATABASE LAYER
 source data for the data warehouse - An organization's EIS systems fall
into this layer.

 INFORMATIONAL ACCESS LAYER


 data accessed for reporting and analyzing & tools for reporting and
analyzing data - Business intelligence tools fall into this layer.

 DATA ACCESS LAYER


 The interface between the operational and informational access layer -
Tools to extract, transform, load data into the warehouse fall into this
layer.

 METADATA LAYER
 The data directory -This is often usually more detailed than an
operational system data directory. There are dictionaries for the entire
warehouse and sometimes dictionaries for the data that can be
accessed by a particular reporting and analysis tool.
Data Warehouse
Components
 LOAD MANAGEMENT
 relates to the collection of info. from disparate internal or external
sources.
 loading process includes summarizing, manipulating and changing data
structures into format that lends itself to analytical processing.
 Actual raw data kept alongside or within data warehouse enabling
construction of new and different representations.

 WAREHOUSE MANAGEMENT
 relates to day-to-day management of data warehouse.
 management tasks associated with warehouse includes ensuring its
availability, the effective backup of its contents & security.

 QUERY MANAGEMENT
 relates to provision of access to contents of warehouse
 includes partitioning of info. into different areas with different
privileges to different users.
 Access may be provided through custom-built applications, or ad hoc
query tools.
Technologies Involved In Data
Warehousing

 source system identification:


 data warehouse design and creation:
 changed data capture:
 data acquisition:
 data cleansing:
 data aggregation
 multi-dimensional analysis tools:
 business intelligence (bi)
 metadata management
 data mining tools:
 data visualization tools:
 query tools:
APPROACHES TO STORING
DATA
 leading approaches
to storing data in a
data warehouse –
 DIMENSIONAL
APPROACH
 NORMALIZED
APPROACH.
DIMENSIONAL APPROACH
 transaction data are partitioned into either "facts" (numeric transaction data), or
"dimensions", (reference information giving context to facts).

 Key advantage :
 easier for the user to understand and to use.
 the retrieval of data from the data warehouse tends to operate very quickly.

 disadvantages

 to maintain integrity of facts and dimensions, loading the data warehouse with data
from different operational systems is complicated
 difficult to modify data warehouse structure if organization adopting the
dimensional approach changes the way in which it does business.
NORMALIZED APPROACH
 the data in the data warehouse are stored following the Codd
normalization rule.

 Tables grouped by subject areas that reflect general data categories


data on customers, products, finance, etc.)

 Key advantage
 it is straightforward to add information into the database.

 disadvantage
 because of no. of tables involved, difficult for users to
 1) Join data from different sources into meaningful information
 2) Access the information without a precise understanding of
sources of data & data structure of the data warehouse.

Вам также может понравиться