
Building a Data Warehouse using SQL Server 2008
Presented by Wes Dumey
Orlando SQL Saturday
October 16, 2010
First Things First
Networking is key at these events. Please
take a minute and introduce yourself to
the people to your left and right.
Let's Talk Trash
We'll discuss data warehousing with a
view of how a trash company like Waste
Management could build a data
warehouse
All photographs and logos are property of
Waste Management, Inc.
Fun Facts about Trash
Municipal solid waste (a.k.a. trash) is
generated at a rate of 250 million tons
per year (in the USA)
Each person produces an average of 4.5
lbs of trash per day
The nationwide recycling rate in 2008 was
33.2%

*Source: www.epa.gov
About the Presenter
Senior Consultant, Durable Impact
Consulting, a Florida-based data
warehouse consulting practice
10+ years of experience developing business
intelligence solutions
Personal Interests: Economics and
Aviation
Agenda
Overview of Data Warehouse principles
Data Modeling and Data Warehouse
Architecting exercises
SSIS Example
Question/Answer Session
Let's Get Started
Our client today is Waste Management,
Inc.
Our project is to develop a business
intelligence solution covering residential
and commercial service routes

Problem Definition
We need to solve the following business
problems:
Business has no long-term trend picture of
commissioned employee performance
Business has no ability to verify whether sales
contracts are profitable
Business would like to be able to conduct
elasticity modeling on pricing
Steps to Complete Project
Determine metrics to be captured
Analyze source systems
Develop data model
Architect ETL solution
Design and develop reporting/analysis
solution
Project Overview
Overview of a data warehouse:
A centralized database system optimized for
analysis that contains information from one or
more source systems
ETL (extract, transform, and load) jobs are
created to load the data warehouse
A reporting package typically sits on top of
the data warehouse to provide end user
analysis
Data Modeling Primer
A data model is a logical and physical
representation of the star (or snowflake)
schemas used for the relational model
Three schematic table types:
Dimension: descriptions and attributes
Facts: measures and quantities
Aggregates: pre-computed answers (rolled up
facts)
Exercise: Can you think of some dimensions, facts,
and aggregates used for this example?
Data Model
Dimensions: Date, Customer, Employee, Route, Vehicle,
Rate
Facts: Sales activity, haul activity
Aggregates: Sales amount by employee, hauls by vehicle
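The dimension and fact split above can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server 2008, and the column names and sample rows are assumptions made up for the example, not the presenter's actual model.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server 2008 (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edw_employee_dim (
    employee_key  INTEGER PRIMARY KEY,  -- surrogate key
    employee_id   TEXT NOT NULL,        -- natural key from the source system
    employee_name TEXT
);
CREATE TABLE edw_date_dim (
    date_key      INTEGER PRIMARY KEY,  -- e.g. 20101016
    calendar_date TEXT NOT NULL
);
CREATE TABLE edw_sales_act_fact (
    employee_key  INTEGER REFERENCES edw_employee_dim(employee_key),
    date_key      INTEGER REFERENCES edw_date_dim(date_key),
    sales_amount  REAL                  -- the measure
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO edw_employee_dim VALUES (?, ?, ?)",
                 [(1, "E100", "Alice"), (2, "E200", "Bob")])
conn.executemany("INSERT INTO edw_date_dim VALUES (?, ?)",
                 [(20101015, "2010-10-15"), (20101016, "2010-10-16")])
conn.executemany("INSERT INTO edw_sales_act_fact VALUES (?, ?, ?)",
                 [(1, 20101015, 100.0), (1, 20101016, 50.0), (2, 20101016, 75.0)])

# "Sales amount by employee": facts joined to a dimension, then rolled up.
sales_by_employee = conn.execute("""
    SELECT d.employee_name, SUM(f.sales_amount)
    FROM edw_sales_act_fact AS f
    JOIN edw_employee_dim AS d ON d.employee_key = f.employee_key
    GROUP BY d.employee_name
    ORDER BY d.employee_name
""").fetchall()
print(sales_by_employee)
```

The fact table holds only keys and measures; every descriptive attribute lives in a dimension and is reached through a join.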


How Facts and Dimensions are joined
By use of a surrogate key (generally a meaningless number)
Each dimension has a surrogate key as the primary identifier
Natural keys in the data are used to find the surrogate keys
which are then passed into the fact tables
This design allows for high performance
Aggregates are joined to facts through the common keys
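A minimal sketch of the natural-key to surrogate-key lookup described above, in plain Python. The helper make_key_lookup, the source field names, and the sample rows are all hypothetical; a real load would look the keys up in the dimension tables rather than in an in-memory dictionary.

```python
from itertools import count

def make_key_lookup():
    """Return a function that maps a natural key to a surrogate key.

    A new surrogate key (1, 2, 3, ...) is assigned the first time a
    natural key is seen; later lookups return the same key.
    """
    keys = {}
    next_key = count(1)

    def lookup(natural_key):
        if natural_key not in keys:
            keys[natural_key] = next(next_key)
        return keys[natural_key]

    return lookup

customer_key_for = make_key_lookup()
employee_key_for = make_key_lookup()

# Hypothetical source-system rows carrying natural keys.
source_rows = [
    {"customer_id": "C-9", "employee_id": "E100", "amount": 120.0},
    {"customer_id": "C-4", "employee_id": "E100", "amount": 80.0},
    {"customer_id": "C-9", "employee_id": "E200", "amount": 60.0},
]

# Fact rows carry only surrogate keys plus the measure.
fact_rows = [
    (customer_key_for(r["customer_id"]), employee_key_for(r["employee_id"]), r["amount"])
    for r in source_rows
]
print(fact_rows)
```

Because the fact rows store small integer keys instead of source-system identifiers, joins back to the dimensions stay compact.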
Data Warehouse Dimensions
EDW_DATE_DIM (date_key, date attributes, ...)
EDW_CUSTOMER_DIM (customer_key,
customer name, customer address, ...)
EDW_EMPLOYEE_DIM (employee_key,
employee id, employee name, ...)
EDW_ROUTE_DIM (route_key, route id, route
name, city, state, region, ...)
EDW_VEHICLE_DIM (vehicle_key, vehicle id,
vehicle type, make, model, year, acquire date,
disposal date, ...)
EDW_RATE_DIM (rate_key, rate id, rate type,
begin date, end date, current ind, ...) SCD (slowly changing dimension)
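The begin date, end date, and current indicator columns on the rate dimension suggest history-preserving (commonly called Type 2) handling, although the slides do not say which SCD type is intended. A minimal sketch under that assumption follows; RateRow, apply_rate_change, and the rate_amount attribute are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RateRow:
    rate_key: int            # surrogate key
    rate_id: str             # natural key
    rate_amount: float       # hypothetical tracked attribute
    begin_date: str
    end_date: Optional[str]  # None while the row is current
    current_ind: str         # "Y" or "N"

def apply_rate_change(dim: List[RateRow], rate_id: str,
                      new_amount: float, effective: str) -> None:
    """Expire the current row for rate_id and append a new current row."""
    current = next(
        (r for r in dim if r.rate_id == rate_id and r.current_ind == "Y"), None
    )
    if current is not None:
        if current.rate_amount == new_amount:
            return  # nothing changed, so keep the existing row
        current.end_date = effective
        current.current_ind = "N"
    next_key = max((r.rate_key for r in dim), default=0) + 1
    dim.append(RateRow(next_key, rate_id, new_amount, effective, None, "Y"))

rate_dim: List[RateRow] = []
apply_rate_change(rate_dim, "RES-WEEKLY", 25.0, "2010-01-01")
apply_rate_change(rate_dim, "RES-WEEKLY", 27.5, "2010-07-01")
apply_rate_change(rate_dim, "RES-WEEKLY", 27.5, "2010-08-01")  # no change
for row in rate_dim:
    print(row)
```

Facts loaded before the change keep pointing at the expired row, so historical sales stay tied to the rate that applied at the time.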
Facts and Aggregates
EDW_SALES_ACT_FACT (account_key,
customer_key, employee_key, date_key,
sales_amount, ...)
EDW_HAUL_ACT_FACT (account_key,
customer_key, date_key, vehicle_key, haul
volume, ...)
EDW_DAILY_SALES_AGG (account_key,
customer_key, employee_key, date_key,
sales_amount, ...)
The key determinant here is granularity
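To make the granularity point concrete, here is a small sketch that rolls fact rows up to one row per employee per day. The chosen grain (employee_key, date_key), the function name daily_sales_by_employee, and the sample rows are assumptions for illustration; the actual aggregate would use whatever grain the reports need.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer_key, employee_key, date_key, sales_amount)
fact_rows = [
    (1, 1, 20101015, 100.0),
    (2, 1, 20101015, 40.0),
    (1, 2, 20101015, 75.0),
    (2, 1, 20101016, 10.0),
]

def daily_sales_by_employee(rows):
    """Roll facts up to the (employee_key, date_key) grain."""
    totals = defaultdict(float)
    for _customer_key, employee_key, date_key, amount in rows:
        totals[(employee_key, date_key)] += amount
    return dict(totals)

daily_agg = daily_sales_by_employee(fact_rows)
print(daily_agg)
```

Dropping customer_key from the grain turns four fact rows into three aggregate rows; the coarser the grain, the smaller the aggregate and the fewer questions it can answer.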
ETL Solution
ETL = Extract, transform, and load
Typically performed using ETL tools such
as SQL Server 2008 Integration Services (SSIS)
Designed to read data from the source
system and load it into the star schema
Typically scheduled on a repeating basis
to keep data current
Can be simple or very complex
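At the simple end of that range, the three ETL steps can be sketched as three small functions. This is a plain-Python illustration, not an SSIS package: the CSV layout, the sales_stage table, and the rule of skipping rows without an amount are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the last row is missing its amount.
SOURCE_CSV = """employee_id,sale_date,amount
E100,2010-10-15, 100.00
E200,2010-10-15,75.50
E100,2010-10-16,
"""

def extract(text):
    """Read source rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Clean values, derive date_key, and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed rule: skip incomplete rows
        date_key = int(row["sale_date"].replace("-", ""))
        cleaned.append((row["employee_id"], date_key, float(amount)))
    return cleaned

def load(conn, rows):
    """Insert cleaned rows into a staging table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_stage "
        "(employee_id TEXT, date_key INTEGER, sales_amount REAL)"
    )
    conn.executemany("INSERT INTO sales_stage VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
loaded = load(conn, transform(extract(SOURCE_CSV)))
print(loaded)
```

Keeping the three steps separate makes each one easy to test and rerun on its own, which helps when the job is scheduled on a repeating basis.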
Data Architecture Considerations
To stage or not to stage (creating a staging
area, a temporary place for source data)
Data volumes will depend on how we
build our jobs
Designed for ease of support and
maintenance
Auditing
Use batch audit tables to keep track of
what is running
Track insert/update metrics
Always know what is going on in your
warehouse (and maybe trash, too)
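One possible shape for a batch audit table is sketched below, again using sqlite3 as a stand-in for SQL Server. The table name etl_batch_audit, its columns, and the helpers start_batch and finish_batch are hypothetical; the slides do not prescribe a layout.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE etl_batch_audit (
    batch_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    job_name      TEXT NOT NULL,
    started_at    TEXT NOT NULL,
    finished_at   TEXT,
    status        TEXT NOT NULL,
    rows_inserted INTEGER DEFAULT 0,
    rows_updated  INTEGER DEFAULT 0
)""")

def start_batch(conn, job_name):
    """Record that a job has started and return its batch id."""
    cur = conn.execute(
        "INSERT INTO etl_batch_audit (job_name, started_at, status) "
        "VALUES (?, ?, 'RUNNING')",
        (job_name, datetime.now().isoformat()),
    )
    return cur.lastrowid

def finish_batch(conn, batch_id, inserted, updated, status="SUCCEEDED"):
    """Record the outcome and the insert/update metrics for a batch."""
    conn.execute(
        "UPDATE etl_batch_audit "
        "SET finished_at = ?, status = ?, rows_inserted = ?, rows_updated = ? "
        "WHERE batch_id = ?",
        (datetime.now().isoformat(), status, inserted, updated, batch_id),
    )

batch_id = start_batch(conn, "load_sales_fact")
finish_batch(conn, batch_id, inserted=120, updated=8)
audit_row = conn.execute(
    "SELECT job_name, status, rows_inserted, rows_updated "
    "FROM etl_batch_audit WHERE batch_id = ?",
    (batch_id,),
).fetchone()
print(audit_row)
```

A row left in RUNNING status with no finished_at value is a quick signal that a job failed or is still in flight.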
Reporting Solution
Create reports using SQL Server Reporting
Services





Introduction to SSIS

Question/Answer Session

Additional Resources
Durable Impact white papers
www.durableimpact.com
Microsoft blogs
Some books of interest:
