Chapter 7

Data Warehouse & OLAP

Database Systems: Design, Implementation, and Management


4th Edition

Peter Rob & Carlos Coronel


The Need for Data Analysis
• Constant pressure from external and internal forces requires prompt tactical and strategic decisions.
• The decision-making cycle time is shrinking, while problems are becoming increasingly complex, with a growing number of internal and external variables.
• Managers need support systems that facilitate quick decision making in a complex environment.
  o Decision support systems (DSS).
  o Building a Stock Decision Support Tool in Microsoft Excel 2010, https://www.youtube.com/watch?v=iXfxxHx21so
Data warehouse
• A data warehouse is a database that provides support for decision making.
• A data warehouse database must be:
  o Integrated
  o Subject-oriented
  o Time-variant
  o Non-volatile
• Benefits of Data warehouse (video)
The Data Warehouse
• The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
  o Subject-oriented, as the warehouse is organized around the major subjects of the enterprise (such as customers, products, and sales) rather than the major application areas (such as customer invoicing, stock control, and product sales). This is reflected in the need to store decision-support data rather than application-oriented data.
  o Integrated, because source data come together from different enterprise-wide application systems. The source data is often inconsistent, using, for example, different formats. The integrated data source must be made consistent to present a unified view of the data to the users.
  o Time-variant, because data in the warehouse is accurate and valid only at some point in time or over some time interval. The time variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
  o Non-volatile, as the data is not updated in real time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with the previous data.
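The integrated, time-variant, and non-volatile properties can be illustrated together with a small Python sketch of periodic refreshes. It is a toy model under stated assumptions, not a real ETL tool: the two source layouts (`invoicing_rows` with text amounts and ISO dates, `stock_rows` with integer cents and day/month/year dates), the `integrate` helper, and the `Warehouse` class are all hypothetical. Integration maps both layouts onto one format, each loaded row is stamped with its load date (time-variant), and refreshes only append rows (non-volatile).

```python
from datetime import date, datetime


def integrate(invoicing_rows, stock_rows):
    """Map rows from two inconsistent (hypothetical) sources onto one format."""
    unified = []
    for r in invoicing_rows:  # source A: amounts as text, ISO dates
        unified.append({
            "customer": r["cust"].strip().upper(),
            "amount": float(r["amount"]),
            "sale_date": datetime.strptime(r["when"], "%Y-%m-%d").date(),
        })
    for r in stock_rows:  # source B: integer cents, day/month/year dates
        unified.append({
            "customer": r["customer_id"].strip().upper(),
            "amount": r["amount_cents"] / 100,
            "sale_date": datetime.strptime(r["when"], "%d/%m/%Y").date(),
        })
    return unified


class Warehouse:
    """Append-only store: a refresh adds rows; nothing is updated or deleted."""

    def __init__(self):
        self._rows = []

    def refresh(self, load_date, invoicing_rows, stock_rows):
        # Time-variant: each row carries the date of the batch load that captured it.
        for row in integrate(invoicing_rows, stock_rows):
            self._rows.append({**row, "load_date": load_date})

    def rows(self):
        # Return a new list so callers cannot add or remove stored rows through it.
        return list(self._rows)


# Two periodic batch refreshes; the second adds to, and does not replace, the first.
wh = Warehouse()
wh.refresh(date(2016, 1, 31),
           [{"cust": "c1 ", "amount": "120.50", "when": "2016-01-15"}],
           [{"customer_id": "C2", "amount_cents": 9900, "when": "20/01/2016"}])
wh.refresh(date(2016, 2, 29),
           [{"cust": "C1", "amount": "80.00", "when": "2016-02-03"}],
           [])
```

After both refreshes the store holds three rows in one consistent format, and the January rows are still present alongside the February one.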

Table 13.6 A Comparison of Data Warehouse and Operational Database Characteristics
Creating A Data Warehouse
Figure 13.3
A Data Warehouse Framework and Views
The Data Warehouse
Twelve Rules That Define a Data Warehouse
1. The Data Warehouse and operational environments are separated.
2. The Data Warehouse data are integrated.
3. The Data Warehouse contains historical data over a long time horizon.
4. The Data Warehouse data are snapshot data captured at a given point in time.
5. The Data Warehouse data are subject-oriented.
6. The Data Warehouse data are mainly read-only, with periodic batch updates from operational data. No online updates are allowed.
7. The Data Warehouse development life cycle differs from classical systems development. Data Warehouse development is data driven; the classical approach is process driven.
8. The Data Warehouse contains data with several levels of detail: current detail data, old detail data, lightly summarized data, and highly summarized data.
9. The Data Warehouse environment is characterized by read-only transactions to very large data sets. The operational environment is characterized by numerous update transactions to a few data entities at a time.
10. The Data Warehouse environment has a system that traces data resources, transformation, and storage.
11. The Data Warehouse’s metadata are a critical component of this environment. The metadata identify and define all data elements. The metadata provide the source, transformation, integration, storage, usage, relationships, and history of each data element.
12. The Data Warehouse contains a charge-back mechanism for resource usage that enforces optimal use of the data by end users.
Architecture of Web-Based Data Warehousing
OLAP vs. OLTP
We can divide IT systems into transactional (OLTP) and analytical (OLAP) systems.
In general, we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.
OLTP
• OLTP deals with recording the real-time transactions of operational systems, such as the transactions that happen in e-commerce and banking ATM systems.
• OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE).
• The main emphasis for OLTP systems is on very fast query processing and on maintaining data integrity in multi-access environments; effectiveness is measured by the number of transactions per second.
• An OLTP database holds detailed, current data, and the schema used to store transactional databases is the entity model (usually 3NF).
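To make the OLTP description concrete, here is a minimal Python sketch, using the standard-library `sqlite3` module, of one short transaction of the ATM kind mentioned above. The `account` and `atm_txn` tables and the `withdraw` function are hypothetical; the point is that each transaction touches only a couple of rows and that the database keeps the data consistent when a transaction fails.

```python
import sqlite3

# Hypothetical OLTP schema: current account balances plus a log of ATM withdrawals.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acct_id INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.execute(
    "CREATE TABLE atm_txn ("
    " txn_id INTEGER PRIMARY KEY,"
    " acct_id INTEGER NOT NULL,"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO account (acct_id, balance) VALUES (1, 500.0)")
conn.commit()


def withdraw(conn, acct_id, amount):
    """Run one short OLTP transaction: debit the account and log the withdrawal."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acct_id = ?",
                (amount, acct_id),
            )
            conn.execute(
                "INSERT INTO atm_txn (acct_id, amount) VALUES (?, ?)",
                (acct_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the rollback left no partial change.
        return False
```

For example, `withdraw(conn, 1, 200.0)` returns `True` and leaves a balance of 300.0; a subsequent `withdraw(conn, 1, 1000.0)` violates the `CHECK` constraint, so it returns `False` and the balance stays at 300.0.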
On-Line Analytical Processing
• On-Line Analytical Processing (OLAP) deals with analyzing the data stored in the data warehouse.
• It is an advanced data analysis environment that supports decision making, business modeling, and operations research activities.
• Four Main Characteristics of OLAP
  o Use multidimensional data analysis techniques
  o Provide advanced database support
  o Provide easy-to-use end user interfaces
  o Support client/server architecture
OLAP
• OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions.
• Queries are often very complex and involve aggregations.
• For OLAP systems, response time is the measure of effectiveness. OLAP applications are widely used by data mining techniques. An OLAP database holds aggregated, historical data, stored in multidimensional schemas (usually star schema).
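By contrast with the OLTP transaction, a typical OLAP request is a read-only query that scans historical data and aggregates it. The sketch below uses a hypothetical `sales_history` table in an in-memory SQLite database; real OLAP queries usually span far more rows and dimensions, but the shape (filter, group, aggregate) is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_history (sale_year INTEGER, region TEXT, product TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?, ?)",
    [
        (2015, "North", "X", 10), (2015, "South", "X", 4),
        (2016, "North", "X", 12), (2016, "South", "X", 7),
        (2016, "North", "Y", 5),
    ],
)

# An OLAP-style query: read-only, scans several years of history, and aggregates.
query = """
    SELECT region, sale_year, SUM(units) AS total_units
    FROM sales_history
    WHERE product = 'X'
    GROUP BY region, sale_year
    ORDER BY region, sale_year
"""
result = conn.execute(query).fetchall()
```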
More video
• Introduction to OLAP
  o https://www.youtube.com/watch?v=2ryG3Jy6eIY
• Excel Tutorial: What is Business Intelligence and an OLAP Cube?
  o https://www.youtube.com/watch?v=yoE6bgJv08E
On-Line Analytical Processing
• Multidimensional Data Analysis Techniques
  o The processing of data in which data are viewed as part of a multidimensional structure. A multidimensional view allows end users to consolidate or aggregate data at different levels.
  o A multidimensional view allows a business analyst to easily switch business perspectives.
  o Refer to example: Excel
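A spreadsheet pivot table is the most familiar form of this multidimensional view. The Python sketch below cross-tabulates hypothetical sales records; `pivot` is an illustrative helper, not an Excel or library API. Calling it with different dimension names is the programmatic equivalent of switching business perspectives.

```python
from collections import defaultdict

# Hypothetical flat, operational-style sales records.
sales = [
    {"product": "X", "region": "North", "quarter": "Q1", "units": 10},
    {"product": "X", "region": "South", "quarter": "Q1", "units": 4},
    {"product": "Y", "region": "North", "quarter": "Q2", "units": 6},
    {"product": "X", "region": "North", "quarter": "Q2", "units": 3},
]


def pivot(rows, row_dim, col_dim, measure="units"):
    """Cross-tabulate `measure` by two dimensions, like a spreadsheet pivot table."""
    grid = defaultdict(lambda: defaultdict(int))
    for r in rows:
        grid[r[row_dim]][r[col_dim]] += r[measure]
    return {row: dict(cols) for row, cols in grid.items()}


by_product_region = pivot(sales, "product", "region")
by_region_quarter = pivot(sales, "region", "quarter")  # same data, new perspective
```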
Figure 13.4 Operational Vs. Multidimensional View Of Sales

Figure 13.5 Integration Of OLAP With A Spreadsheet Program

INTEGRATION OF OLAP WITH A SPREADSHEET PROGRAM - Pivot table in Excel
On-Line Analytical Processing
• OLAP Architecture: Three Main Modules
  o OLAP Graphical User Interface (GUI)
  o OLAP Analytical Processing Logic
  o OLAP Data Processing Logic
• OLAP systems are designed to use both operational and Data Warehouse data.

As Figure 13.17 illustrates, OLAP systems are designed to use both operational and data warehouse data. The figure shows the OLAP system components on a single computer, but this single-user scenario is only one of many. In fact, one problem with the installation shown here is that each data analyst must have a powerful computer to store the OLAP system and perform all data processing locally.
Types of On-Line Analytical Processing
• Relational OLAP (ROLAP)
  o Relational On-Line Analytical Processing (ROLAP) provides OLAP functionality by using relational databases and familiar relational query tools.
• Multidimensional OLAP (MOLAP)
  o MOLAP extends OLAP functionality to multidimensional databases (MDBMS).
  o MDBMS end users visualize the stored data as a multidimensional cube known as a data cube.
  o Data cubes are created by extracting data from the operational databases or from the data warehouse.
  o Watch the video: https://www.youtube.com/watch?v=LzmAbi5ZOhE
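A MOLAP engine answers queries from a data cube whose aggregates are prepared in advance. The following Python sketch is a simplified model of that idea, not how any particular MDBMS stores its cubes: `build_cube` precomputes the total for every combination of dimension values, using the hypothetical marker `ALL` for "aggregated over this dimension", so that each query becomes a dictionary lookup. A ROLAP tool would instead compute such totals on demand with SQL `GROUP BY` queries against relational tables.

```python
from itertools import product as cartesian

ALL = "*"  # hypothetical marker meaning "aggregated over this dimension"


def build_cube(rows, dims, measure):
    """Precompute the measure for every combination of dimension values and ALL."""
    cube = {}
    for r in rows:
        # Each row contributes to 2**len(dims) cells: per dimension, its own value or ALL.
        for cell in cartesian(*[(r[d], ALL) for d in dims]):
            cube[cell] = cube.get(cell, 0) + r[measure]
    return cube


# Data extracted (hypothetically) from the warehouse: product, region, quarter, units.
rows = [
    {"product": "X", "region": "North", "quarter": "Q1", "units": 10},
    {"product": "X", "region": "South", "quarter": "Q1", "units": 4},
    {"product": "Y", "region": "North", "quarter": "Q2", "units": 6},
]
cube = build_cube(rows, ["product", "region", "quarter"], "units")

# Queries are now lookups instead of scans.
x_total = cube[("X", ALL, ALL)]
grand_total = cube[(ALL, ALL, ALL)]
```

Here `x_total` is the total for product X across all regions and quarters (14), and `grand_total` is 20. Precomputing every combination trades storage, which grows quickly with the number of dimensions, for fast lookups.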
ROLAP – using query tool

Multidimensional OLAP (continued)

Relational Vs. Multidimensional OLAP

Table 13.8
Star Schema
• The star schema is a data-modeling technique used to map multidimensional decision support data into a relational database.
• Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.
• A star schema has four components:
  o Facts
  o Dimensions
  o Attributes
  o Attribute hierarchies
Star Schema
• Facts
  o Facts are numeric measurements (values) that represent a specific business aspect or activity. For example, sales figures are numeric measurements that represent product and service sales.
  o Facts commonly used in business data analysis are units, costs, prices, and revenues. Facts are normally stored in a fact table that is the center of the star schema.
  o The fact table contains facts that are linked through their dimensions, which are explained in the next section.
  o Facts can also be computed or derived at run time. Such computed or derived facts are sometimes called metrics to differentiate them from stored facts.
  o The fact table is updated periodically with data from operational databases.
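The difference between stored facts and metrics can be sketched in Python. The `SalesFact` class below is a hypothetical fact-table row: `units` and `unit_price` are stored facts, the `*_id` fields stand in for dimension keys, and `revenue` is a metric derived at run time rather than stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    """A hypothetical fact-table row: dimension keys plus stored facts."""
    time_id: int
    product_id: int
    location_id: int
    units: int          # stored fact
    unit_price: float   # stored fact

    @property
    def revenue(self) -> float:
        """A metric: derived from stored facts at run time, never stored."""
        return self.units * self.unit_price


facts = [SalesFact(1, 101, 7, 3, 2.5), SalesFact(1, 102, 7, 2, 10.0)]
total_revenue = sum(f.revenue for f in facts)
```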
Star Schema
• Dimensions
  o Dimensions are qualifying characteristics that provide additional perspectives to a given fact. For instance, sales might be compared by product from region to region and from one time period to the next.
  o The kind of problem typically addressed by a BI system might be to compare the sales of unit X by region for the first quarters of 2006 through 2016.
  o In that example, sales have product, location, and time dimensions. In effect, dimensions are the magnifying glass through which you study the facts.
  o Such dimensions are normally stored in dimension tables. Figure 13.6 depicts a star schema for sales with product, location, and time dimensions.
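The sales star schema with product, location, and time dimensions can be sketched in SQLite through Python's `sqlite3` module. The table and column names and the sample rows are hypothetical; the query answers a small version of the BI question mentioned above (sales of unit X by region for first quarters) by joining the central fact table to its dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used to filter and group facts.
    CREATE TABLE product_dim  (product_id INTEGER PRIMARY KEY, product_code TEXT, category TEXT);
    CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, region TEXT, state TEXT);
    CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

    -- Fact table at the center: numeric facts plus a foreign key to each dimension.
    CREATE TABLE sales_fact (
        time_id     INTEGER REFERENCES time_dim,
        product_id  INTEGER REFERENCES product_dim,
        location_id INTEGER REFERENCES location_dim,
        units       INTEGER,
        PRIMARY KEY (time_id, product_id, location_id)
    );

    INSERT INTO product_dim  VALUES (1, 'X', 'Tools'), (2, 'Y', 'Paint');
    INSERT INTO location_dim VALUES (1, 'East', 'NY'), (2, 'West', 'CA');
    INSERT INTO time_dim     VALUES (1, 2006, 1), (2, 2006, 2), (3, 2016, 1);
    INSERT INTO sales_fact   VALUES
        (1, 1, 1, 50), (1, 1, 2, 30), (2, 1, 1, 99),
        (3, 1, 1, 70), (3, 2, 2, 15);
""")

# Sales of unit X by region for first quarters: join the facts to each dimension.
q1_sales = conn.execute("""
    SELECT l.region, t.year, SUM(f.units) AS units
    FROM sales_fact f
    JOIN product_dim p  ON p.product_id = f.product_id
    JOIN location_dim l ON l.location_id = f.location_id
    JOIN time_dim t     ON t.time_id = f.time_id
    WHERE p.product_code = 'X' AND t.quarter = 1
    GROUP BY l.region, t.year
    ORDER BY l.region, t.year
""").fetchall()
```

The dimension attributes (`product_code`, `region`, `quarter`) do the filtering and grouping, while the fact table supplies only the numbers being summed.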
A Simple Star Schema
Star Schema
• Attributes
  o Each dimension table contains attributes. Attributes are often used to search, filter, or classify facts.
  o Dimensions provide descriptive characteristics about the facts through their attributes.

Table 13.10 Possible Attributes For Sales Dimensions
Star Schema
• OLAP consists of three basic analytical operations:
  o Consolidation (roll-up)
    - Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions.
    - For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends.
  o Drill-down
    - Drill-down is a technique that allows users to navigate through the details.
    - For instance, users can view the sales by the individual products that make up a region's sales.
  o Slicing and dicing
    - Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints.
    - These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson, date, customer, product, or region).
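The three operations can be sketched over a tiny in-memory cube in Python. The records, the two-level location hierarchy (region, then office), and the helper names are hypothetical. Note that `dice` here follows the description above (viewing a slice from a different viewpoint); many other sources instead use "dice" for selecting a sub-cube on several dimensions at once.

```python
from collections import defaultdict

# Hypothetical detail-level sales with a location hierarchy: region -> office.
sales = [
    {"region": "East", "office": "Boston", "product": "X", "month": "Jan", "units": 5},
    {"region": "East", "office": "NYC", "product": "X", "month": "Jan", "units": 8},
    {"region": "East", "office": "NYC", "product": "Y", "month": "Feb", "units": 2},
    {"region": "West", "office": "Seattle", "product": "X", "month": "Feb", "units": 6},
]


def aggregate(rows, by):
    """Sum units grouped by the named dimensions (keys are tuples)."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in by)] += r["units"]
    return dict(totals)


def roll_up(rows):
    # Consolidation: aggregate offices up to the region level of the hierarchy.
    return aggregate(rows, ["region"])


def drill_down(rows, region):
    # Drill-down: show the offices that make up one region's total.
    return aggregate([r for r in rows if r["region"] == region], ["office"])


def slice_(rows, dim, value):
    # Slicing: take out the part of the cube where one dimension has one value.
    return [r for r in rows if r[dim] == value]


def dice(rows, viewpoint):
    # Dicing (as described above): view a slice from another dimension's viewpoint.
    return aggregate(rows, [viewpoint])
```

For instance, `roll_up(sales)` gives East 15 and West 6, `drill_down(sales, "East")` splits East into Boston 5 and NYC 10, and `dice(slice_(sales, "month", "Jan"), "office")` views the January slice by office.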
Example of Aggregation in

A Location Attribute Hierarchy
Figure 13.15

Attribute Hierarchies In Multidimensional Analysis
Figure 13.16

Data Warehouse Implementation Road Map
Figure 13.21
• Refer to the following video about “Data Warehouse Architecture”
  o https://www.youtube.com/watch?v=CHYPF7jxlik
• Excel Tutorial: What is Business Intelligence and an OLAP Cube?
  o https://www.youtube.com/watch?v=yoE6bgJv08E
• Data Cube Operations – SQL Queries
  o https://blogs.perficient.com/2017/08/02/data-cube-operations-sql-queries/
