
Data Modelling

Sessions 3 & 4
Specific Competency
• Students are able to explain the data warehouse
lifecycle and the types of data used in data
warehouse modeling (C2)
Topics
1. Data Warehouse Lifecycle
2. Designing the Dimensional Data Store
3. Data Mart
4. Data Hierarchy
5. Physical Database Design
1. Data Warehouse Lifecycle
• In software engineering, the discipline that
studies the process people use to develop an
information system is called the system
development life cycle (SDLC) or the system
development methodology.
• There are two main variants: waterfall and
iterative.
• There are three main streams of work: ETL,
data stores, and frontend applications.
1.1 Data Warehouse Lifecycle
Fig 1. Data Warehouse Lifecycle: Analysis → Design → Import data → Install front-end tools → Test and deploy


1.1.1 Stage 1: Analysis
• Identify:
–Target Questions
–Data needs
–Timeliness of data
–Granularity
• Create an enterprise-level data dictionary
• Dimensional analysis
–Identify facts and dimensions
1.1.2 Stage 2: Design
• Star schema
• Data Transformation
• Aggregates
• Pre-calculated Values
• HW/SW Architecture
1.1.2 Stage 2: Design (cont.)
• Fact Table: The primary table in a dimensional
model that is meant to contain
measurements of the business.
• Dimension Table: One of a set of companion
tables to a fact table. Most dimension tables
contain many textual attributes that are the
basis for constraining and grouping within
data warehouse queries.
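The two definitions above can be illustrated with a minimal star schema sketch using Python's built-in sqlite3 module. The table names (dim_product, fact_sales), the column names, and the sample rows are all hypothetical, chosen only for illustration; they do not come from the course material.

```python
import sqlite3

# All table names, column names, and rows here are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive, mostly textual attributes.
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    -- Fact table: numeric measurements plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        quantity     INTEGER,
        sales_amount REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Album A', 'Music'),
        (2, 'Album B', 'Music'),
        (3, 'Film C',  'Film');
    INSERT INTO fact_sales VALUES
        (1, 2, 20.0),
        (2, 1, 15.0),
        (3, 4, 60.0),
        (1, 1, 10.0);
""")

# Group by a dimension attribute (category) and sum a fact measure.
rows = conn.execute("""
    SELECT d.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(rows)
```

The query shows the typical division of labour: the dimension attribute supplies the grouping, and the fact table supplies the numbers being summed.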
1.1.3 Stage 3: Import Data
• Identify data sources
• Extract the needed data from existing systems to
a data staging area
• Transform and Clean the data
–Resolve data type conflicts
–Resolve naming and key conflicts
–Remove, correct, or flag bad data
–Conform Dimensions
• Load the data into the warehouse
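The extract, transform, and load steps above could be sketched roughly as follows. This is a simplified illustration, not a production ETL design: the two source extracts, their column names, and the helper functions (extract, parse_date, transform, load) are hypothetical, and conforming dimensions is not covered.

```python
from datetime import datetime

# Hypothetical extracts from two source systems. They disagree on the
# customer column name, on data types, and on the date format.
source_a = [
    {"cust_id": "101", "amount": "25.50", "order_date": "2008-01-15"},
    {"cust_id": "102", "amount": "n/a", "order_date": "2008-01-16"},
]
source_b = [
    {"customer_no": 103, "amount": 40, "order_date": "17/01/2008"},
]

def extract():
    """Copy source rows unchanged into a staging list, tagged by origin."""
    staging = [dict(row, source="A") for row in source_a]
    staging += [dict(row, source="B") for row in source_b]
    return staging

def parse_date(text):
    """Try each known source date format; return None if none matches."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

def transform(staging):
    """Resolve naming and type conflicts; set aside rows that cannot be cleaned."""
    clean, rejected = [], []
    for row in staging:
        # Naming conflict: both source columns map to one warehouse key.
        key = row.get("cust_id", row.get("customer_no"))
        try:
            record = {
                "customer_key": int(key),        # type conflict: str vs int
                "amount": float(row["amount"]),  # type conflict: str vs number
                "order_date": parse_date(row["order_date"]),
            }
        except (TypeError, ValueError):
            rejected.append(row)                 # flag bad data
            continue
        if record["order_date"] is None:
            rejected.append(row)
        else:
            clean.append(record)
    return clean, rejected

def load(clean, warehouse):
    """Append the cleaned rows to the warehouse table (a list in this sketch)."""
    warehouse.extend(clean)

warehouse = []
clean, rejected = transform(extract())
load(clean, warehouse)
print(len(warehouse), "rows loaded;", len(rejected), "rows rejected")
```

Keeping the rejected rows instead of silently dropping them makes it possible to correct them at the source and reload them later.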
1.1.3 Stage 3: Import Data (cont.)
Figure: Operational systems (source systems: OLTP 1, OLTP 2, OLTP 3) → Data Staging Area → Data Warehouse
1.1.4 Stage 4: Install Front-end Tools
• Reporting tools
• Data mining tools
• GIS
• Etc.
1.1.5 Stage 5: Test and Deploy
• Usability tests
• Software installation
• User training
• Performance tweaking based on usage
2. Designing the Dimensional Data Store
• Dimensional modeling is widely accepted as
the preferred technique for presenting
analytic data because it addresses two
simultaneous requirements:
–Deliver data that’s understandable to the
business users.
–Deliver fast query performance.
• Dimensional modeling is a longstanding
technique for making databases simple.
Fig 2. Star schema versus OLAP cube

Source: Ralph Kimball and Margy Ross, 2013


3. Data Mart
Fig 3. Dimensional star schema for communication subscription data mart
Source: Vincent Rainardi, 2008
Fig 4. Dimensional star schema for campaign results data mart
Source: Vincent Rainardi, 2008
4. Data Hierarchy
• In dimension tables, there is a certain structure called
hierarchy.
• Hierarchy is important because it provides you with paths
that you can use to roll up and drill down when analyzing
the data.
• In the dimension tables, sometimes an attribute (column) is
a subset of another attribute, meaning that the attribute
values can be grouped by the other attribute.
• The attribute that can be used to group is said to be on a
higher level than the attribute that is being grouped.
• If A is a subset of B, then B is on a higher level than A.
These levels in the dimension tables are called the
dimensional hierarchy.
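As a small sketch of how a hierarchy supports roll-up and drill-down, the example below groups the same facts at different levels of a date hierarchy (date, month, quarter, year). The dimension rows, fact rows, and the roll_up function are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical date dimension: each date belongs to exactly one month,
# quarter, and year, so the higher levels can group the lower ones.
dim_date = {
    "2008-01-15": {"month": "2008-01", "quarter": "2008-Q1", "year": "2008"},
    "2008-02-03": {"month": "2008-02", "quarter": "2008-Q1", "year": "2008"},
    "2008-04-20": {"month": "2008-04", "quarter": "2008-Q2", "year": "2008"},
}

# Hypothetical fact rows at date granularity: (date key, sales amount).
fact_sales = [
    ("2008-01-15", 100),
    ("2008-01-15", 50),
    ("2008-02-03", 30),
    ("2008-04-20", 70),
]

def roll_up(level):
    """Total the sales amounts at the given hierarchy level."""
    totals = defaultdict(int)
    for date_key, amount in fact_sales:
        totals[dim_date[date_key][level]] += amount
    return dict(totals)

print(roll_up("month"))    # finest of the three levels
print(roll_up("quarter"))  # rolled up one level
print(roll_up("year"))     # rolled up to the top level
```

Moving from month to quarter to year is a roll-up; reading the same calls in the opposite order is a drill-down along the same path.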
Fig 5. Hierarchies of the date, store, customer, and product dimensions

Source: Vincent Rainardi, 2008


5. Physical Database Design
• Physical database design is a fundamental
part of data warehouse design.
• The performance of a data warehouse is
largely affected by the physical design of the
underlying databases and the environment
where the databases are running.
• To do the physical database design, it is
important to understand the physical system
architecture on which the database will be
operating.
Fig 6. System architecture for the Amadeus Entertainment data warehouse
Source: Vincent Rainardi, 2008
Summary
• When building a data warehouse by following the data
warehouse lifecycle, we need to define the business
requirements.
• Dimensional modeling is a longstanding technique
for making databases simple.
• Hierarchy is important because it provides you with
paths that you can use to roll up and drill down when
analyzing the data.
THANK YOU
