Rahul Sachdeva
Syllabus
13-09-2019
CSE601 4
[Figure: a row of Wrapper components, one above each Source]
Disadvantages of Query-Driven Approach
Characteristics of TPSs
Characteristic    OLTP
Screens           Unchanging
Orientation       Records
Problems
[Figure: information sources: World Wide Web, Scientific Databases, Digital Libraries]
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
[Figure: an Integration System over four sources: World Wide Web, Digital Libraries, Scientific Databases, Personal Databases]
[Figure: a row of Wrapper components, one above each Source]
Disadvantages of Query-Driven Approach
[Figure: a row of Source components]
Summary
[Figure: architecture components: Business Information Guide, Business Information Interface, Data Warehouse Catalog, Data Warehouse, Data Warehouse Population, Enterprise Modeling, Operational Systems]
Warehouse is a Specialized DB
Standard DB                                    Warehouse
• Mostly updates                               • Mostly reads
• Many small transactions                      • Queries are long and complex
• MB - GB of data                              • GB - TB of data
• Current snapshot                             • History
• Index/hash on primary key                    • Lots of scans
• Raw data                                     • Summarized, reconciled data
• Thousands of users (e.g., clerical users)    • Hundreds of users (e.g., decision-makers, analysts)
Types of Data
[Figure labels: Ad hoc access; Production platforms]
[Figure labels: Production platforms; Operational reports]
Management Issues
“A DW is a
− subject-oriented,
− integrated,
− time-varying,
− non-volatile
collection of data that is used primarily in
organizational decision making.”
-- W.H. Inmon, Building the Data Warehouse, 1992
− Integrated
• The Data Warehouse is a centralized, consolidated
database that integrates data retrieved from the
entire organization.
− Subject-Oriented
• The Data Warehouse data is arranged and optimized
to provide answers to questions coming from
diverse functional areas within a company.
− Time Variant
• The Warehouse data represent the flow of data
through time. It can even contain projected data.
− Non-Volatile
• Once data enter the Data Warehouse, they are
never removed.
• The Data Warehouse is always growing.
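The time-variant and non-volatile properties can be illustrated with a minimal Python sketch. The names `Fact` and `AppendOnlyStore`, the record layout, and the sample values are hypothetical, chosen only for illustration; they are not taken from the lecture or from any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One immutable, time-stamped warehouse record (hypothetical layout)."""
    key: str
    value: float
    as_of: date


class AppendOnlyStore:
    """Toy store: records can be added and read, never updated or deleted."""

    def __init__(self):
        self._facts = []

    def append(self, fact):
        self._facts.append(fact)

    def history(self, key):
        """All values ever recorded for a key, oldest first."""
        return sorted((f for f in self._facts if f.key == key),
                      key=lambda f: f.as_of)

    def as_of(self, key, when):
        """Latest value recorded on or before the given date, or None."""
        past = [f for f in self.history(key) if f.as_of <= when]
        return past[-1].value if past else None


store = AppendOnlyStore()
store.append(Fact("price:widget", 9.50, date(2019, 1, 1)))
store.append(Fact("price:widget", 10.25, date(2019, 6, 1)))
```

Because a price change is appended rather than written over the old value, the store only grows, and a question about any past date can still be answered from `history`.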
… Cont’d
• Large volume of data (GB, TB)
• Non-volatile
− Historical
− Time attributes are important
• Updates infrequent
• May be append-only
• Examples
− All transactions ever at Sainsbury’s
− Complete client histories at insurance firm
− LSE financial information and portfolios
[Figure labels: Warehouse, Metadata, Maintenance, Integrator, Optimization]
• Single-layer
− Virtual warehouse
Three-layer Architecture: Conceptual View
• Transformation of real-time data to derived data really requires two steps
[Figure: three data layers between operational systems and informational systems: Real-time data; Reconciled Data (physical implementation of the Data Warehouse); Derived Data (view level, “particular informational needs”)]
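The two steps, from real-time data to reconciled data and from reconciled data to derived data, can be sketched in Python. This is only an illustration under stated assumptions: the two source record layouts, the function names `reconcile` and `derive_sales_by_region`, and the sample values are all hypothetical.

```python
# Real-time data as it might sit in two operational systems.
# Both record layouts are hypothetical.
orders_system = [
    {"cust": "C1", "amount_usd": 120.0, "region": "north"},
    {"cust": "C2", "amount_usd": 80.0, "region": "South"},
]
billing_system = [
    {"customer_id": "C1", "amt": 30.0, "reg": "NORTH"},
]


def reconcile(orders, billing):
    """Step 1: map both sources onto one schema with consistent values."""
    rows = [{"customer": o["cust"], "amount": o["amount_usd"],
             "region": o["region"].lower()} for o in orders]
    rows += [{"customer": b["customer_id"], "amount": b["amt"],
              "region": b["reg"].lower()} for b in billing]
    return rows


def derive_sales_by_region(reconciled):
    """Step 2: summarize reconciled data for one informational need."""
    totals = {}
    for row in reconciled:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


reconciled = reconcile(orders_system, billing_system)
derived = derive_sales_by_region(reconciled)
```

Keeping the reconciled layer separate means that a second informational need, for example totals per customer, can be derived from the same cleaned rows without touching the operational systems again.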
DW vs DM
Issues (1)
Issues (2)
Wrapper
Wrapper Generation
Routine When...
• Many tools for dealing with “standard situations”
− Standard sources with full/many capabilities
• e.g., most commercial DBMSs, all ODBC-compliant sources
− Standard interactions
• e.g., pass-through queries, extraction from relational tables, replication
− Cooperative sources or sources under our control
• Tools
− Replication tools, ODBC, report writers, third-party “wrappers”
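Two of the standard interactions listed above, pass-through queries and extraction from relational tables, can be sketched with Python's built-in `sqlite3` module. The class `SqliteWrapper`, its method names, and the `customer` table are hypothetical, and an in-memory SQLite database stands in for a cooperative source; this is not the interface of any specific wrapper tool.

```python
import sqlite3


class SqliteWrapper:
    """Hypothetical wrapper exposing one relational source to a warehouse."""

    def __init__(self, connection):
        self._conn = connection

    def pass_through(self, sql, params=()):
        """Hand the query to the source unchanged and return its rows."""
        return self._conn.execute(sql, params).fetchall()

    def extract_table(self, table):
        """Extract a whole table as a list of dicts keyed by column name."""
        # Table names cannot be bound as parameters, so check the name
        # against the source's own catalog before interpolating it.
        known = {row[0] for row in self._conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}
        if table not in known:
            raise ValueError(f"unknown table: {table}")
        cursor = self._conn.execute(f'SELECT * FROM "{table}"')
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


# An in-memory database stands in for a cooperative source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])

wrapper = SqliteWrapper(conn)
rows = wrapper.extract_table("customer")
count = wrapper.pass_through("SELECT COUNT(*) FROM customer")
```

The case is routine because the source already answers SQL and describes its own schema; the wrapper only forwards requests and reshapes rows.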
Data Transformations
Monitors
Data Integration
Data Cleansing
Warehouse is a Specialized DB
Standard DB (OLTP)                             Warehouse (OLAP)
• Mostly updates                               • Mostly reads
• Many small transactions                      • Queries are long and complex
• MB - GB of data                              • GB - TB of data
• Current snapshot                             • History
• Index/hash on primary key                    • Lots of scans
• Raw data                                     • Summarized, reconciled data
• Thousands of users (e.g., clerical users)    • Hundreds of users (e.g., decision-makers, analysts)
Decision Support
• Information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions
− “What were the sales volumes by region and product category for the last year?”
− “How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?”
− “Which orders should we fill to maximize revenues?”
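The first question above is a typical aggregate query. A minimal sketch with Python's built-in `sqlite3` module follows; the `sales` table, its columns, the sample rows, and the choice of 2018 as "last year" are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales (
    sale_year INTEGER, region TEXT, category TEXT, volume INTEGER)""")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (2018, "North", "Laptops", 40),
    (2018, "North", "Laptops", 10),
    (2018, "South", "Phones", 25),
    (2017, "North", "Laptops", 99),   # outside the year of interest
])

last_year = 2018  # assumed "last year" for this sample data
report = conn.execute(
    """SELECT region, category, SUM(volume)
       FROM sales
       WHERE sale_year = ?
       GROUP BY region, category
       ORDER BY region, category""",
    (last_year,),
).fetchall()
```

The query reads many rows and returns a few summary rows, which is the access pattern a warehouse is built for, in contrast to the single-record updates of an operational system.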
[Figure labels: Operational DBs; extract, transform, load, refresh, etc.; serve; e.g., ROLAP; Data Marts; Query/Reporting; Data Mining]
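The extract, transform, load, and refresh steps named in the figure can be sketched in plain Python. The function bodies, the record layout, and the refresh strategy (reprocessing only the rows appended since the previous run) are assumptions for illustration; real pipelines detect changes in several other ways.

```python
def extract(operational_db):
    """Extract: copy rows out of the operational source."""
    return list(operational_db)


def transform(rows):
    """Transform: normalize names and drop rows that fail a basic check."""
    cleaned = []
    for row in rows:
        if row["qty"] is None:
            continue
        cleaned.append({"product": row["product"].strip().title(),
                        "qty": int(row["qty"])})
    return cleaned


def load(warehouse, rows):
    """Load: append the transformed rows to the warehouse table."""
    warehouse.extend(rows)


def refresh(warehouse, operational_db, already_loaded):
    """Refresh: process only the source rows added since the last run."""
    new_rows = extract(operational_db)[already_loaded:]
    load(warehouse, transform(new_rows))
    return len(operational_db)


operational_db = [{"product": " widget ", "qty": "3"},
                  {"product": "gadget", "qty": None}]
warehouse = []
loaded = refresh(warehouse, operational_db, already_loaded=0)

# A new row arrives in the operational system; the next refresh picks it up.
operational_db.append({"product": "GADGET", "qty": "5"})
loaded = refresh(warehouse, operational_db, already_loaded=loaded)
```

After both runs the warehouse holds only the cleaned rows, and the row with a missing quantity has been dropped during the transform step.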