
Module 3

Designing and Implementing a Data Warehouse
Module Overview

Data Warehouse Design Overview


Designing Dimension Tables
Designing Fact Tables
Physical Design for a Data Warehouse
Lesson 1: Data Warehouse Design Overview

The Dimensional Model


The Data Warehouse Design Process
Dimensional Modeling
Documenting Dimensional Models
The Dimensional Model

[Diagram: star schema. A central fact table (measures) is related directly to each dimension table (attributes).]
[Diagram: snowflake schema. Some dimension tables (attributes) are related to further dimension tables rather than directly to the fact table.]
The Data Warehouse Design Process

1. Determine analytical and reporting requirements
2. Identify the business processes that generate the required data
3. Examine the source data for those business processes
4. Conform dimensions across business processes
5. Prioritize processes and create a dimensional model for each
6. Document and refine the models to determine the database logical schema
7. Design the physical data structures for the database
Dimensional Modeling

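[Matrix: business processes (rows) against the dimensions they use (columns); an x marks a dimension used by the process]
Dimensions: Factory Line, Salesperson, Department, Warehouse, Customer, Account, Product, Shipper, Time
Business Processes
Manufacturing          x x x
Order Processing       x x x x
Order Fulfilment       x x x
Financial Accounting   x x x
Inventory Management   x x x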

Grain: 1 row per order item


Dimensions: Time (order date and ship date), Product, Customer, Salesperson
Facts: Item Quantity, Unit Cost, Total Cost, Unit Price, Sales Amount, Shipping Cost
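
The grain, dimensions, and facts listed above can be tried out in a small in-memory database. The following is only a sketch using Python's built-in `sqlite3` module rather than SQL Server; the table names, the reduced column set, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database; table names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct  (ProductKey  INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT);
-- Grain: one row per order item.
CREATE TABLE FactSalesOrder (
    OrderDateKey INTEGER,
    ShipDateKey  INTEGER,
    ProductKey   INTEGER REFERENCES DimProduct(ProductKey),
    CustomerKey  INTEGER REFERENCES DimCustomer(CustomerKey),
    ItemQuantity INTEGER,
    SalesAmount  REAL
);
""")
conn.executemany("INSERT INTO DimProduct VALUES (?, ?)",
                 [(1, "Bike"), (2, "Helmet")])
conn.execute("INSERT INTO DimCustomer VALUES (1, 'Amy Alberts')")
conn.executemany("INSERT INTO FactSalesOrder VALUES (?, ?, ?, ?, ?, ?)", [
    (20120101, 20120102, 1, 1, 1, 300.0),
    (20120101, 20120102, 2, 1, 2, 50.0),
    (20120102, 20120103, 1, 1, 1, 300.0),
])

def sales_by_product(connection):
    """Join the fact table to a dimension and aggregate the measures."""
    return connection.execute("""
        SELECT p.ProductName, SUM(f.ItemQuantity), SUM(f.SalesAmount)
        FROM FactSalesOrder AS f
        JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
        GROUP BY p.ProductName
        ORDER BY p.ProductName
    """).fetchall()

print(sales_by_product(conn))
```

Because every fact row is one order item, any coarser total (by product, customer, or date) can be produced by aggregating the same table.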
Documenting Dimensional Models

[Diagram: dimensional model for the Sales Order fact, showing the attributes of each dimension]
Sales Order (fact): Item Quantity, Unit Cost, Total Cost, Unit Price, Sales Amount, Shipping Cost
Time (Order Date and Ship Date): Calendar Year, Month, Date; Fiscal Year, Fiscal Quarter, Month, Date
Salesperson: Region, Country, Territory, Manager, Name
Product: Category, Subcategory, Product Name, Color, Size
Customer: Country, State or Province, City, Name, Age, Marital Status, Gender
Product Customer
Lesson 2: Designing Dimension Tables

Considerations for Dimension Keys


Dimension Attributes and Hierarchies
Unknown and None
Designing Slowly Changing Dimensions
Time Dimension Tables
Self-Referencing Dimension Tables
Junk Dimensions
Considerations for Dimension Keys

CustomerKey  CustomerAltKey  Name
1            1002            Amy Alberts
2            1005            Neil Black

CustomerKey is the surrogate key; CustomerAltKey is the business (alternate) key.

ProductKey  ProductAltKey  ProductName        Color  Size
1           MB1-B-32       MB1 Mountain Bike  Blue   32
2           MB1-R-32       MB1 Mountain Bike  Red    32
Dimension Attributes and Hierarchies

CustKey  CustAltKey  Name         Country  State  City       Phone    Gender
1        1002        Amy Alberts  Canada   BC     Vancouver  555 123  F
2        1005        Neil Black   USA      CA     Irvine     555 321  M
3        1006        Ye Xu        USA      NY     New York   555 222  M

Hierarchy

Drill-through detail
Unknown and None

Identify the semantic meaning of NULL


Unknown or None?

Do not assume NULL equality


Use ISNULL( )

Source
OrderNo  Discount  DiscountType
1000     1.20      Bulk Discount
1001     0.00      N/A
1002     2.00      NULL
1003     0.50      Promotion
1004     2.50      Other
1005     0.00      N/A
1006     1.50      NULL

Dimension Table
DiscKey  DiscAltKey     DiscountType
-1       Unknown        Unknown
0        N/A            None
1        Bulk Discount  Bulk Discount
2        Promotion      Promotion
3        Other          Other
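
One way to read the two tables above is as a lookup in which a missing discount type maps to the Unknown member (-1) and "N/A" maps to the None member (0). The sketch below assumes that the empty source cells are NULLs and that any unrecognized value should also fall back to Unknown; the function and constant names are hypothetical.

```python
# Keys taken from the dimension table; the mapping rules are an assumption.
UNKNOWN_KEY = -1   # value is missing, so the discount type is not known
NONE_KEY = 0       # source explicitly says no discount type applies
DISCOUNT_KEYS = {"Bulk Discount": 1, "Promotion": 2, "Other": 3}

def discount_key(discount_type):
    """Return the dimension key for a source DiscountType value."""
    if discount_type is None or discount_type.strip() == "":
        return UNKNOWN_KEY
    if discount_type == "N/A":
        return NONE_KEY
    # Unrecognized business keys also fall back to the Unknown member.
    return DISCOUNT_KEYS.get(discount_type, UNKNOWN_KEY)

source_rows = [
    (1000, "Bulk Discount"), (1001, "N/A"), (1002, None), (1003, "Promotion"),
    (1004, "Other"), (1005, "N/A"), (1006, None),
]
keys = [discount_key(discount_type) for _, discount_type in source_rows]
print(keys)
```

Keeping Unknown and None as separate rows lets reports distinguish "we do not know" from "there was no discount", which a single NULL cannot express.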
Designing Slowly Changing Dimensions
Type 1 (overwrite the existing value)
Before:
CustKey  CustAltKey  Name         Phone
1        1002        Amy Alberts  555 123
After:
CustKey  CustAltKey  Name         Phone
1        1002        Amy Alberts  555 222

Type 2 (add a new row and keep the history)
Before:
CustKey  CustAltKey  Name         City       Current  Start     End
1        1002        Amy Alberts  Vancouver  Yes      1/1/2000
After:
CustKey  CustAltKey  Name         City       Current  Start     End
1        1002        Amy Alberts  Vancouver  No       1/1/2000  1/1/2012
4        1002        Amy Alberts  Toronto    Yes      1/1/2012

Type 3 (keep the prior value in a separate column)
Before:
CustKey  CustAltKey  Name         Cars
1        1002        Amy Alberts  0
After:
CustKey  CustAltKey  Name         Prior Cars  Current Cars
1        1002        Amy Alberts  0           1
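
The Type 2 example can be expressed as a small routine that expires the current row and appends a new one. This is a minimal in-memory sketch, not an ETL implementation; the function name and the idea of passing in the next surrogate key are assumptions.

```python
from datetime import date

def apply_type2_change(rows, alt_key, new_city, change_date, next_key):
    """Expire the current row for alt_key and append a new current row."""
    for row in rows:
        if row["CustAltKey"] == alt_key and row["Current"]:
            if row["City"] == new_city:
                return rows  # attribute unchanged, nothing to do
            row["Current"] = False
            row["End"] = change_date
            # Copy the expired row, then override the changed columns.
            rows.append(dict(row, CustKey=next_key, City=new_city,
                             Current=True, Start=change_date, End=None))
            return rows
    raise KeyError(f"no current row for business key {alt_key}")

customers = [{
    "CustKey": 1, "CustAltKey": 1002, "Name": "Amy Alberts",
    "City": "Vancouver", "Current": True,
    "Start": date(2000, 1, 1), "End": None,
}]
apply_type2_change(customers, 1002, "Toronto", date(2012, 1, 1), next_key=4)
for row in customers:
    print(row)
```

Existing fact rows keep pointing at surrogate key 1, so historical sales stay attributed to Vancouver, while new facts use key 4.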


Time Dimension Tables

DateKey   DateAltKey  MonthDay  WeekDay  Day   MonthNo  Month  Year
00000000  01-01-1753  NULL      NULL     NULL  NULL     NULL   NULL
20130101  01-01-2013  1         3        Tue   01       Jan    2013
20130102  01-02-2013  2         4        Wed   01       Jan    2013
20130103  01-03-2013  3         5        Thu   01       Jan    2013
20130104  01-04-2013  4         6        Fri   01       Jan    2013

Surrogate key
Granularity
Range
Attributes and hierarchies
Multiple calendars
Unknown values
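
Rows like those in the table above are usually generated rather than typed. The sketch below builds the same columns for a date range, assuming a yyyymmdd integer surrogate key and a week that starts on Sunday (WeekDay 1), which matches the sample rows; the function name is hypothetical, and the row for unknown dates is left out.

```python
from datetime import date, timedelta

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def build_date_dimension(start, end):
    """Return one dictionary per calendar day from start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "DateKey": int(current.strftime("%Y%m%d")),
            "DateAltKey": current.isoformat(),
            "MonthDay": current.day,
            # isoweekday(): Monday=1 ... Sunday=7; shift so that Sunday=1.
            "WeekDay": current.isoweekday() % 7 + 1,
            "Day": DAY_NAMES[current.weekday()],
            "MonthNo": current.month,
            "Month": MONTH_NAMES[current.month - 1],
            "Year": current.year,
        })
        current += timedelta(days=1)
    return rows

dates = build_date_dimension(date(2013, 1, 1), date(2013, 1, 4))
for row in dates:
    print(row)
```

Fiscal-calendar columns could be added in the same loop once the fiscal year start is known.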
Self-Referencing Dimension Tables

EmployeeKey  EmployeeAltKey  EmployeeName     ManagerKey
1            1000            Kim Abercrombie  NULL
2            1001            Kamil Amireh     1
3            1002            Cesar Garcia     1
4            1003            Jeff Hay         2

Kim Abercrombie
    Kamil Amireh
        Jeff Hay
    Cesar Garcia
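
The ManagerKey column is enough to rebuild the hierarchy. As a small illustration, the sketch below walks the sample rows upward to the top of the hierarchy and lists direct reports; the dictionary layout and function names are assumptions.

```python
# EmployeeKey -> (EmployeeName, ManagerKey), copied from the sample rows.
employees = {
    1: ("Kim Abercrombie", None),
    2: ("Kamil Amireh", 1),
    3: ("Cesar Garcia", 1),
    4: ("Jeff Hay", 2),
}

def management_chain(employee_key):
    """Return names from the employee up to the top of the hierarchy."""
    chain = []
    key = employee_key
    while key is not None:
        name, manager_key = employees[key]
        chain.append(name)
        key = manager_key
    return chain

def direct_reports(manager_key):
    """Return the names of employees whose ManagerKey is manager_key."""
    return [name for name, mgr in employees.values() if mgr == manager_key]

print(management_chain(4))
print(direct_reports(1))
```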
Junk Dimensions

JunkKey  OutOfStockFlag  FreeShippingFlag  CreditOrDebit
1        1               1                 Credit
2        1               1                 Debit
3        1               0                 Credit
4        1               0                 Debit
5        0               1                 Credit
6        0               1                 Debit
7        0               0                 Credit
8        0               0                 Debit

Combine low-cardinality attributes that don't belong in existing dimensions into a junk dimension
Avoids creating many small dimension tables
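
Because the attributes have few distinct values, a junk dimension can be pre-populated with every combination. The following sketch reproduces the eight sample rows with `itertools.product`; the function name is hypothetical.

```python
from itertools import product

def build_junk_dimension():
    """Return one row per combination of the low-cardinality attributes."""
    combinations = product([1, 0], [1, 0], ["Credit", "Debit"])
    return [
        {"JunkKey": key, "OutOfStockFlag": out_of_stock,
         "FreeShippingFlag": free_shipping, "CreditOrDebit": payment}
        for key, (out_of_stock, free_shipping, payment)
        in enumerate(combinations, start=1)
    ]

junk = build_junk_dimension()
for row in junk:
    print(row)
```

The fact table then stores a single JunkKey column instead of three separate flag columns.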
Lesson 3: Designing Fact Tables

Fact Table Columns


Types of Measure
Types of Fact Table
Fact Table Columns

OrderDateKey  ProductKey  CustomerKey  OrderNo  Qty  SalesAmount
20120101      25          120          1000     1    350.99
20120101      99          120          1000     2    6.98
20120101      25          178          1001     2    701.98

Dimension Keys: OrderDateKey, ProductKey, CustomerKey
Measures: Qty, SalesAmount
Degenerate Dimensions: OrderNo


Types of Measure

Additive
OrderDateKey  ProductKey  CustomerKey  SalesAmount
20120101      25          120          350.99
20120101      99          120          6.98
20120102      25          178          701.98

Semi-Additive
DateKey   ProductKey  StockCount
20120101  25          23
20120101  99          118
20120102  25          22

Non-Additive
OrderDateKey  ProductKey  CustomerKey  ProfitMargin
20120101      25          120          25
20120101      99          120          22
20120102      25          178          27
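
The difference between the three kinds of measure shows up as soon as the rows are aggregated. The sketch below uses the SalesAmount and StockCount rows from the tables above; the sales and profit pairs used for the margin are made-up values, since the slide does not give the figures behind ProfitMargin, and the function names are hypothetical.

```python
sales = [(20120101, 25, 350.99), (20120101, 99, 6.98), (20120102, 25, 701.98)]
stock = [(20120101, 25, 23), (20120101, 99, 118), (20120102, 25, 22)]

# Additive: SalesAmount can be summed across every dimension.
total_sales = sum(amount for _, _, amount in sales)

def stock_on_date(rows, date_key):
    """Semi-additive: StockCount can be summed across products on one date."""
    return sum(count for day, _, count in rows if day == date_key)

def latest_stock(rows, product_key):
    """Across time, take the latest StockCount instead of summing."""
    product_rows = [(day, count) for day, product, count in rows
                    if product == product_key]
    return max(product_rows)[1]

def overall_margin(sales_and_profit):
    """Non-additive: recompute the ratio from its additive parts."""
    total_amount = sum(amount for amount, _ in sales_and_profit)
    total_profit = sum(profit for _, profit in sales_and_profit)
    return 100 * total_profit / total_amount

print(round(total_sales, 2))
print(stock_on_date(stock, 20120101), latest_stock(stock, 25))
# Hypothetical rows with margins of 25% and 10%.
print(overall_margin([(100.0, 25.0), (300.0, 30.0)]))
```

Summing or averaging the two hypothetical margins would give 35 or 17.5, neither of which is the margin on the combined sales.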
Types of Fact Table

Transaction Fact Tables
OrderDateKey  ProductKey  CustomerKey  OrderNo  Qty  Cost    SalesAmount
20120101      25          120          1000     1    125.00  350.99
20120101      99          120          1000     2    2.50    6.98
20120101      25          178          1001     2    250.00  701.98

Periodic Snapshot Fact Tables
DateKey   ProductKey  OpeningStock  UnitsIn  UnitsOut  ClosingStock
20120101  25          25            1        3         23
20120101  99          120           0        2         118

Accumulating Snapshot Fact Tables
OrderNo  OrderDateKey  ShipDateKey  DeliveryDateKey
1000     20120101      20120102     20120105
1001     20120101      20120102     00000000
1002     20120102      00000000     00000000
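
Unlike a transaction fact row, an accumulating snapshot row is updated as the order reaches each milestone. The sketch below models that with the three sample orders, treating 00000000 as the key for a date that is not yet known; the function names and the ship date recorded for order 1002 are hypothetical.

```python
UNKNOWN_DATE_KEY = 0  # shown as 00000000 in the sample rows

orders = {
    1000: {"OrderDateKey": 20120101, "ShipDateKey": 20120102,
           "DeliveryDateKey": 20120105},
    1001: {"OrderDateKey": 20120101, "ShipDateKey": 20120102,
           "DeliveryDateKey": UNKNOWN_DATE_KEY},
    1002: {"OrderDateKey": 20120102, "ShipDateKey": UNKNOWN_DATE_KEY,
           "DeliveryDateKey": UNKNOWN_DATE_KEY},
}

def record_milestone(order_no, milestone, date_key):
    """Fill in a milestone date key on the existing row for the order."""
    row = orders[order_no]
    if row[milestone] != UNKNOWN_DATE_KEY:
        raise ValueError(f"{milestone} already recorded for order {order_no}")
    row[milestone] = date_key

def undelivered_orders():
    """Return order numbers whose delivery date is still unknown."""
    return [order_no for order_no, row in orders.items()
            if row["DeliveryDateKey"] == UNKNOWN_DATE_KEY]

record_milestone(1002, "ShipDateKey", 20120103)  # hypothetical ship date
print(orders[1002])
print(undelivered_orders())
```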


Lesson 4: Physical Design for a Data Warehouse

Data Warehouse I/O Activity


Considerations for Database Files
Table Partitioning
Demonstration: Partitioning a Fact Table
Considerations for Indexes
Demonstration: Creating Indexes
Data Compression
Demonstration: Implementing Data Compression
Using Views to Abstract Base Tables
Data Warehouse I/O Activity

[Diagram: data warehouse I/O activity generated by ETL, data models, reports, and user queries]
ETL Loads: Bulk inserts; Some lookups and updates
Data Model Processing: Mostly table/index scans
Report Processing: Predictable queries; Many rows with range-based query filters
Self-Service BI (user queries): Potentially unpredictable queries
Large fact tables
Star joins to dimension tables
Considerations for Database Files

Data files and filegroups


Staging tables
TempDB
Transaction logs
Backup files
Table Partitioning

OrderDateKey  ProductKey  CustomerKey  OrderNo  Qty  Cost    SalesAmount
20120101      25          120          1000     1    125.00  350.99
20120101      99          120          1000     2    2.50    6.98
20120101      25          178          1001     2    250.00  701.98
...
20120201      23          76           2124     1    95.00   125.00
20120201      89          6            2125     1    45.00   76.99

Partitions: Pre-2010 | 2010 | 2011 | Jan 2012 | Feb 2012
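
To see how rows would fall into the five partitions, the sketch below maps an OrderDateKey to a partition number. It assumes boundary values of 20100101, 20110101, 20120101, and 20120201 with each boundary value belonging to the partition on its right (the behavior of a RANGE RIGHT partition function); the slide does not state the boundaries, and the function names are hypothetical.

```python
from bisect import bisect_right

# Assumed boundary values; each one starts a new partition (RANGE RIGHT).
BOUNDARIES = [20100101, 20110101, 20120101, 20120201]
PARTITION_NAMES = ["Pre-2010", "2010", "2011", "Jan 2012", "Feb 2012"]

def partition_number(order_date_key):
    """Return the 1-based partition number for a date key."""
    return bisect_right(BOUNDARIES, order_date_key) + 1

def partition_name(order_date_key):
    """Return the label of the partition that would hold the date key."""
    return PARTITION_NAMES[partition_number(order_date_key) - 1]

for key in (20091231, 20100101, 20120101, 20120201):
    print(key, partition_number(key), partition_name(key))
```

With these boundaries the last partition also receives every later date, so a new boundary would normally be split off before the next month's data is loaded.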
Demonstration: Partitioning a Fact Table

In this demonstration, you will see how to:


Create a Partitioned Table
View Partition Metadata
Split a Partition
Merge Partitions
Considerations for Indexes

Dimension table indexes
    Clustered index on surrogate key column
    Non-clustered index on business key and SCD columns
    Non-clustered indexes on frequently searched columns

Fact table indexes
    Clustered index on most commonly searched date key
    Non-clustered indexes on other dimension keys
    Or
    Columnstore index on all columns
Demonstration: Creating Indexes

In this demonstration, you will see how to:


Create Indexes on Dimension Tables
View Index Usage and Execution Statistics
Create Indexes on a Fact Table
Create a Columnstore Index
Data Compression

Apply page compression on all dimension tables, indexes, and fact table partitions
If performance becomes CPU-bound, fall back to row compression on the most queried partitions
Demonstration: Implementing Data Compression

In this demonstration, you will see how to:


Create Uncompressed Tables and Indexes
Estimate Compression Savings
Create Compressed Tables and Indexes
Compare Query Performance
Using Views to Abstract Base Tables

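CREATE VIEW dw_views.SalesOrder
WITH SCHEMABINDING
AS
-- SCHEMABINDING blocks base table changes that would break this view.
SELECT [OrderDateKey]
      ,[ProductKey]
      ,[ShipDateKey]
      ,[CustomerKey]
      ,[OrderNumber]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[SalesAmount]
-- NOLOCK reads without taking shared locks, so dirty reads are possible.
FROM [dbo].[FactSalesOrder] WITH (NOLOCK)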
Lab: Implementing a Data Warehouse

Exercise 1: Implementing a Star Schema
Exercise 2: Implementing a Snowflake Schema
Exercise 3: Implementing a Time Dimension Table

Logon Information
Virtual machine: 20463C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa$$w0rd

Estimated Time: 45 Minutes


Lab Scenario

You have gathered analytical and reporting requirements from stakeholders at Adventure Works Cycles. Now you must implement a data warehouse schema to support them.
Module Review and Takeaways

Review Question(s)
