Warehouse Architecture
May 7-9, 2014 | San Jose, CA
About Me
Business Intelligence Consultant, in IT for 28 years
Microsoft, Big Data Evangelist
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer
Been perm, contractor, consultant, business owner
Presenter at PASS Business Analytics Conference and PASS Summit
MCSE for SQL Server 2012: Data Platform and BI
Blog at JamesSerra.com
SQL Server MVP
Author of book Reporting with Microsoft SQL Server 2012
Agenda
A data warehouse is not a copy of a source database with the name prefixed with "DW"
It is not a copy of multiple tables (e.g., customer) from various source systems unioned together in a view
It is not a dumping ground for tables from various sources with little design put into it
Figure (courtesy of Wayne Eckerson): Finance, Marketing, Sales, Accounting, Management, Reporting, Engineering, Actuarial, and Human Resources all draw on one Enterprise Data Warehouse, a single version of the truth.
Every question = decision
Two purposes of a data warehouse: 1) save time building reports; 2) slice and dice data in ways you could not before
Hardware Solutions
Fast Track Data Warehouse - A reference configuration optimized for data
warehousing. This saves an organization from having to commit resources to
configure and build the server hardware. Fast Track Data Warehouse hardware is
tested for data warehousing, which eliminates guesswork, and is designed to save
you months of configuration, setup, testing and tuning. You just need to install
the OS and SQL Server
Appliances - Microsoft has made available SQL Server appliances (SMP and MPP)
that allow customers to deploy data warehouse (DW), business intelligence (BI)
and database consolidation solutions in a very short time, with all the
components pre-configured and pre-optimized. These appliances include all the
hardware, software and services for a complete, ready-to-run, out-of-the-box,
high-performance, energy-efficient solution
Figure: Fast Track components (Windows Server 2012 R2, SQL Server 2014). Hardware: processors, networking, servers, storage. Software: SQL Server 2014 Enterprise, Windows Server 2012 R2. Database configuration (workload-specific): database architecture, SQL Server settings, Windows Server settings, performance guidance.
Figure: Three deployment options, from high to low time to solution: by yourself (installation on existing or procured hardware and support), with a reference architecture (installation on existing or procured hardware and support), and with an appliance (installation on a procured appliance and support). The figure also compares price and highlights faster deployment, reduced risk, and flexibility and choice.
Dell
EMC
HP/SanDisk
Lenovo
NEC
Tegile
Myth: Inmon requires a ton of up-front design that takes a long time
Inmon says to build the DW iteratively, not with a big-bang approach (p. 91 BDW, p. 21 Imhoff)
Myth: Star schema data marts are not allowed in Inmon's model
Inmon says they are good for direct end-user access to data (p. 365 BDW) and good for data marts (p. 12 TTA)
Dimensional Modeling:
Dimensional Model
Inmon:
Kimball Model
Figure: DW Bus Architecture. OLTP data sources are loaded by SSIS into Staging Areas 1, 2, and 3, then by SSIS into the Data Warehouse (star schema subject areas holding atomic data). From there, a cube process and SSIS feed Data Mart 1 and Data Mart 2 (both Multidimensional), and a Reporting Layer reads a dimensionalized view.
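The Kimball model keeps atomic data in star schema subject areas. The following is a minimal sketch, not the author's design: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the tables FactSales, DimDate, and DimProduct, their columns, and the rows are hypothetical. It shows how one fact table joined to its dimensions lets a report slice and dice the same measure by any dimension attribute.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table, two dimensions (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DimDate    (DateKey INTEGER PRIMARY KEY, CalendarYear INTEGER);
        CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Category TEXT);
        -- Fact rows hold atomic measures plus a foreign key to each dimension.
        CREATE TABLE FactSales (
            DateKey     INTEGER REFERENCES DimDate(DateKey),
            ProductKey  INTEGER REFERENCES DimProduct(ProductKey),
            SalesAmount REAL
        );
        INSERT INTO DimDate VALUES (20130101, 2013), (20140101, 2014);
        INSERT INTO DimProduct VALUES (1, 'Bikes'), (2, 'Helmets');
        INSERT INTO FactSales VALUES
            (20130101, 1, 500.0), (20140101, 1, 700.0),
            (20140101, 2, 50.0),  (20140101, 2, 25.0);
        """
    )
    return conn


def sales_by(conn: sqlite3.Connection, attribute: str):
    """Slice and dice: total SalesAmount grouped by one dimension attribute."""
    # Whitelist the attribute so it can be safely interpolated into the SQL text.
    allowed = {"CalendarYear": "d.CalendarYear", "Category": "p.Category"}
    column = allowed[attribute]
    sql = f"""
        SELECT {column}, SUM(f.SalesAmount)
        FROM FactSales f
        JOIN DimDate d    ON f.DateKey = d.DateKey
        JOIN DimProduct p ON f.ProductKey = p.ProductKey
        GROUP BY {column}
        ORDER BY {column}
    """
    return conn.execute(sql).fetchall()
```

For example, sales_by(build_star_schema(), "CalendarYear") returns [(2013, 500.0), (2014, 775.0)]; passing "Category" instead regroups the same facts by product category without any change to the tables.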
Inmon Model
Figure: Corporate Information Factory (CIF). OLTP data sources are loaded by SSIS into Staging Areas 1, 2, and 3, then by SSIS into the Data Warehouse (Normalized), which holds atomic data. SSIS then loads Data Mart 1 (Normalized) and Data Mart 2 (Normalized). A cube process builds Multidimensional and Tabular models, and a Reporting Layer reads a dimensionalized view.
Hybrid Model
Figure: Combines the Corporate Information Factory (CIF) with the DW Bus Architecture. OLTP data sources and a Mirror OLTP (subset) are loaded by SSIS into Staging Areas 1, 2, and 3, then by SSIS into the Data Warehouse (Normalized) in the CIF, which holds atomic data. SSIS then loads the EDW (DW Bus Architecture, also atomic data), which feeds Data Mart 1 and Data Mart 2; cube processes produce Multidimensional and Tabular models, and a Reporting Layer sits on top.
In the DW Bus Architecture, each data mart could be a schema (broken out by business process subject
areas), all in one database. Another option is to have each data mart in its own database, with all databases
on one server or spread among multiple servers. Also, the staging areas, CIF, and DW Bus can all be on the
same powerful server (e.g., an MPP appliance).
Advice: Use SQL Server views as the interface between each level of the model
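As a rough illustration of that advice, this sketch again uses sqlite3 in place of SQL Server. The names dw_Customer, dw_Customer_v2, and rpt_Customer are hypothetical. The reporting query only ever references the view, so when the warehouse table is restructured, only the view definition changes and the report keeps working.

```python
import sqlite3

# The reporting layer only queries the view, never the underlying table.
REPORT_QUERY = "SELECT CustomerName FROM rpt_Customer ORDER BY CustomerName"


def build_layers() -> sqlite3.Connection:
    """Create a hypothetical warehouse table plus the view that reports use."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dw_Customer (CustomerKey INTEGER PRIMARY KEY, Name TEXT);
        INSERT INTO dw_Customer (Name) VALUES ('Contoso'), ('Fabrikam');
        -- The view is the contract between layers: reports see CustomerName,
        -- not the underlying table or column name.
        CREATE VIEW rpt_Customer AS
            SELECT CustomerKey, Name AS CustomerName FROM dw_Customer;
        """
    )
    return conn


def restructure_warehouse(conn: sqlite3.Connection) -> None:
    """Move data to a redesigned table, then redefine only the view."""
    conn.executescript(
        """
        DROP VIEW rpt_Customer;
        CREATE TABLE dw_Customer_v2 (
            CustomerKey INTEGER PRIMARY KEY, FullName TEXT, Region TEXT);
        INSERT INTO dw_Customer_v2 (CustomerKey, FullName)
            SELECT CustomerKey, Name FROM dw_Customer;
        CREATE VIEW rpt_Customer AS
            SELECT CustomerKey, FullName AS CustomerName FROM dw_Customer_v2;
        """
    )
```

Running REPORT_QUERY before and after restructure_warehouse returns the same rows, which is the point of putting a view between levels: each level can change internally without breaking the level above it.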
Kimball Methodology
ETL vs ELT
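The difference between the two approaches is where the transformation runs. In this hedged sketch, plain Python stands in for an ETL tool such as SSIS and sqlite3 stands in for the warehouse database; the table names and sample rows are hypothetical. Both paths produce the same cleaned rows: ETL cleans the data before loading it, while ELT loads the raw rows into a staging table first and transforms them with SQL inside the database engine.

```python
import sqlite3

# Hypothetical raw extract: (customer id, untrimmed name, amount as text).
RAW_ROWS = [("c1", "  contoso ", "100.50"), ("c2", "fabrikam", "20")]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform outside the database (here, in Python), then load."""
    cleaned = [(cid, name.strip().upper(), float(amt)) for cid, name, amt in RAW_ROWS]
    conn.execute("CREATE TABLE etl_Sales (CustomerId TEXT, Name TEXT, Amount REAL)")
    conn.executemany("INSERT INTO etl_Sales VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows into staging, then transform with SQL in the database."""
    conn.execute("CREATE TABLE stg_Sales (CustomerId TEXT, Name TEXT, Amount TEXT)")
    conn.executemany("INSERT INTO stg_Sales VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """CREATE TABLE elt_Sales AS
           SELECT CustomerId,
                  UPPER(TRIM(Name))    AS Name,
                  CAST(Amount AS REAL) AS Amount
           FROM stg_Sales"""
    )
```

One common argument for ELT is that it pushes the transformation work onto the database engine, which suits a powerful MPP server; ETL keeps that work in the integration layer.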
Surrogate Keys
Surrogate key: a unique identifier not derived from the source system
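One common reason for surrogate keys is integrating several source systems whose natural keys can overlap. The minimal sketch below, with a hypothetical DimensionKeyMap class and made-up system names, issues an integer surrogate key per (source system, natural key) pair, so the same customer code arriving from two systems gets two distinct warehouse keys, and repeated loads reuse the key already issued. In SQL Server this is typically an IDENTITY column on the dimension table; the Python version just makes the lookup explicit.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionKeyMap:
    """Assigns warehouse surrogate keys independent of source-system keys.

    Lookups are on (source_system, natural_key), so the same natural key
    coming from two different sources does not collide.
    """
    next_key: int = 1
    _map: dict = field(default_factory=dict)

    def surrogate_for(self, source_system: str, natural_key: str) -> int:
        lookup = (source_system, natural_key)
        if lookup not in self._map:
            # New dimension member: issue the next integer key.
            self._map[lookup] = self.next_key
            self.next_key += 1
        return self._map[lookup]
```

For example, with keys = DimensionKeyMap(), keys.surrogate_for("CRM", "C-100") returns 1, keys.surrogate_for("ERP", "C-100") returns 2, and asking for ("CRM", "C-100") again returns 1.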
SSAS Cubes
Reasons to report off cubes instead of the data warehouse:
Scalable
MPP architecture
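In an MPP system such as PDW, a table's rows are spread across compute nodes by hashing a distribution column, and each node works on its own slice in parallel before the results are combined. The sketch below only simulates that idea in a single Python process; the node count, column names, and the use of zlib.crc32 as the hash function are assumptions for illustration, not how PDW is implemented.

```python
import zlib
from collections import defaultdict


def distribute(rows, key, node_count):
    """Assign each row to a simulated node by hashing its distribution column."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        node = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        nodes[node].append(row)
    return nodes


def partial_sum(node_rows, group_col, measure):
    """Per-node work; on a real MPP system each node would do this in parallel."""
    totals = defaultdict(float)
    for row in node_rows:
        totals[row[group_col]] += row[measure]
    return totals


def mpp_sum(rows, key, group_col, measure, node_count=4):
    """Control-node step: combine the per-node partial aggregates."""
    combined = defaultdict(float)
    for node_rows in distribute(rows, key, node_count):
        for group, subtotal in partial_sum(node_rows, group_col, measure).items():
            combined[group] += subtotal
    return dict(combined)
```

With 100 hypothetical customers split evenly between two regions at 10.0 each, mpp_sum returns {"East": 500.0, "West": 500.0} however the hash spreads the rows, because a SUM can be computed per node and then combined.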
HDInsight
Polybase
Resources
Q&A