Warehouse Architecture
Finding business value in Big Data (What exactly is Big Data and why
should I care?)
Very similar to “Building a Big Data Solution” but target audience is business users/CxO instead of architects
Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.
Deep dives into the various Microsoft Big Data related products
About Me
Business Intelligence Consultant, in IT for 28 years
Microsoft, Big Data Evangelist
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW developer
Been perm, contractor, consultant, business owner
Presenter at PASS Business Analytics Conference and PASS Summit
MCSE for SQL Server 2012: Data Platform and BI
Blog at JamesSerra.com
SQL Server MVP
Author of book “Reporting with Microsoft SQL Server 2012”
I tried to build a data warehouse on my own…
• A data warehouse is not a copy of a source database with the name prefixed with “DW”
• It is not a copy of multiple tables (e.g., customer) from various source systems unioned
together in a view
• It is not a dumping ground for tables from various sources with not much design put into it
Data Warehouse Maturity Model
[Diagram labels: Parts Management, Accounting, Logistics Management, Reporting, Shipping, Engineering, Raw Goods, Actuarial; Enterprise Data Warehouse: single version of the truth]
Two purposes of a data warehouse: 1) save time building reports; 2) slice and dice in ways you could not do before
Hardware Solutions
Fast Track Data Warehouse - A reference configuration optimized for data warehousing. This
saves an organization from having to commit resources to configure and build the server
hardware. Fast Track Data Warehouse hardware is tested for data warehousing which
eliminates guesswork and is designed to save you months of configuration, setup, testing
and tuning. You just need to install the OS and SQL Server
Appliances - Microsoft has made available SQL Server appliances (SMP and MPP) that allow
customers to deploy data warehouse (DW), business intelligence (BI) and database
consolidation solutions in a very short time, with all the components pre-configured and
pre-optimized. These appliances include all the hardware, software and services for a
complete, ready-to-run, out-of-the-box, high-performance, energy-efficient solution
Data Warehouse Fast Track for SQL Server 2014
Software: SQL Server 2014 Enterprise, Windows Server 2012 R2
[Figure: three groups of offerings compared by time to solution and price (LOW to HIGH), showing the configuration, tuning and optimization, and installation work involved, on existing or procured hardware and support]
Offerings:
• SQL Server 2014, Windows Server 2012 R2, System Center 2012 SP1
• Private Cloud Fast Track, Data Warehouse Fast Track
• Data Warehouse Fast Track, Analytics Platform System
Optional, if you have hardware already
Data Warehouse Fast Track advantages
In short, a data warehouse contains many subject areas, and a data mart
contains just one of those subject areas
Kimball and Inmon Methodologies
Two approaches for building data warehouses
Kimball and Inmon Myths
In the DW Bus Architecture, each data mart could be a schema (broken out by business process subject areas), all in one
database. Another option is to have each data mart in its own database with all databases on one server or spread among
multiple servers. Also, the staging areas, CIF, and DW Bus can all be on the same powerful server (MPP)
Advice: Use SQL Server Views to interface between each level in the model
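The advice about views can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table and view names (stg_customer, v_dim_customer) are hypothetical, not taken from the deck. The idea is that the next level reads only from the view, so the underlying table can change without breaking consumers.

```python
import sqlite3

def build_layers(conn):
    """Create a staging table and the view that the next level reads from."""
    conn.executescript("""
        CREATE TABLE stg_customer (      -- hypothetical staging table
            cust_id INTEGER, cust_name TEXT, load_date TEXT
        );
        -- The view is the interface: downstream code selects from it,
        -- not from the staging table directly.
        CREATE VIEW v_dim_customer AS
            SELECT cust_id AS customer_key,
                   TRIM(cust_name) AS customer_name
            FROM stg_customer;
    """)

conn = sqlite3.connect(":memory:")
build_layers(conn)
conn.execute("INSERT INTO stg_customer VALUES (1, '  Contoso ', '2015-01-01')")
rows = conn.execute(
    "SELECT customer_key, customer_name FROM v_dim_customer"
).fetchall()
print(rows)
```

If the staging table is later renamed or restructured, only the view definition needs to change.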
Kimball Methodology
Kimball defines a development lifecycle, whereas Inmon covers only the data warehouse itself (not how it is used)
Populating a Data Warehouse
Determine the frequency of the data pull (daily, weekly, etc.)
Full Extraction – All data (usually dimension tables)
Incremental Extraction – Only data changed since the last run (fact tables)
How to determine data that has changed
Timestamp - Last Updated
Change Data Capture (CDC)
Partitioning by date
Triggers on tables
MERGE SQL Statement
Column DEFAULT value populated with date
Online Extraction – Data pulled from the source. First create a copy of the source:
Replication
Database Snapshot
Availability Groups
Offline Extraction – Data from flat file
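As a rough illustration of incremental extraction with a "Last Updated" timestamp, the sketch below copies only rows changed since the previous run. It assumes Python's built-in sqlite3 as a stand-in for the source and warehouse databases; the table names (sales, fact_sales), the column names, and the function extract_incremental are made up for the example. SQLite's INSERT OR REPLACE is used here as a simple substitute for a MERGE statement.

```python
import sqlite3

def extract_incremental(source, target, last_run):
    """Copy rows changed after last_run and return the new watermark."""
    changed = source.execute(
        "SELECT sale_id, amount, last_updated FROM sales "
        "WHERE last_updated > ?",
        (last_run,),
    ).fetchall()
    # INSERT OR REPLACE acts as a simple upsert keyed on the primary key.
    target.executemany(
        "INSERT OR REPLACE INTO fact_sales (sale_id, amount, last_updated) "
        "VALUES (?, ?, ?)",
        changed,
    )
    target.commit()
    # The new watermark is the latest timestamp seen, or the old one
    # if nothing changed.
    return max((row[2] for row in changed), default=last_run)

ddl = "(sale_id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT)"
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales " + ddl)
source.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 10.0, "2015-01-01"),
    (2, 20.0, "2015-01-02"),
    (3, 30.0, "2015-01-03"),
])
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE fact_sales " + ddl)

watermark = extract_incremental(source, target, "2015-01-01")
loaded = target.execute(
    "SELECT sale_id FROM fact_sales ORDER BY sale_id"
).fetchall()
```

The returned watermark would be stored and passed in as last_run on the next scheduled pull. ISO-formatted date strings are assumed so that plain string comparison orders them correctly.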
ETL vs ELT
• Extract, Transform, and Load (ETL)
• Transform while hitting source system
• No staging tables
• Processing done by ETL tools (SSIS)
• Extract, Load, Transform (ELT)
• Uses staging tables
• Processing done by target database engine (SSIS: Execute T-SQL Statement task instead
of Data Flow Transform tasks)
• Use for big volumes of data
• Use when source and target databases are the same
• Use with the Analytics Platform System (APS)
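The difference between the two approaches can be sketched in a few lines. This is only an illustration under stated assumptions: Python stands in for the ETL tool, an in-memory SQLite database stands in for the target engine, and all table names, function names, and sample rows are invented for the example.

```python
import sqlite3

raw_rows = [("  alice ", "10"), ("BOB", "20")]  # made-up source extract

def load_etl(conn, rows):
    """ETL: transform in the tool (here, Python), then load the final table."""
    cleaned = [(name.strip().upper(), int(amount)) for name, amount in rows]
    conn.executemany("INSERT INTO dw_sales_etl VALUES (?, ?)", cleaned)

def load_elt(conn, rows):
    """ELT: load raw rows into staging, then transform with SQL in the target."""
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", rows)
    conn.execute("""
        INSERT INTO dw_sales_elt
        SELECT UPPER(TRIM(name)), CAST(amount AS INTEGER) FROM stg_sales
    """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales    (name TEXT, amount TEXT);
    CREATE TABLE dw_sales_etl (name TEXT, amount INTEGER);
    CREATE TABLE dw_sales_elt (name TEXT, amount INTEGER);
""")
load_etl(conn, raw_rows)
load_elt(conn, raw_rows)

etl_rows = conn.execute("SELECT * FROM dw_sales_etl ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT * FROM dw_sales_elt ORDER BY name").fetchall()
```

Both paths end with the same cleaned rows. What differs is where the work happens: in the ELT path the set-based SQL runs inside the target database, which is why it tends to suit large volumes and MPP systems.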
Follow-on presentation: “Building a Big Data Solution (Building an Effective Data Warehouse
Architecture with Hadoop, the cloud, and MPP)”
Resources
Data Warehouse Architecture – Kimball and Inmon methodologies: http://bit.ly/SrzNHy
SQL Server 2012: Multidimensional vs tabular: http://bit.ly/SrzX1x
Data Warehouse vs Data Mart: http://bit.ly/SrAi4p
Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6
Complex reporting off a SSAS cube: http://bit.ly/SrAEYw
Surrogate Keys: http://bit.ly/SrAIrp
Normalizing Your Database: http://bit.ly/SrAHnc
Difference between ETL and ELT: http://bit.ly/SrAKQa
Microsoft’s Data Warehouse offerings: http://bit.ly/xAZy9h
Microsoft SQL Server Reference Architecture and Appliances: http://bit.ly/y7bXY5
Methods for populating a data warehouse: http://bit.ly/SrARuZ
Great white paper: Microsoft EDW Architecture, Guidance and Deployment Best Practices: http://bit.ly/SrAZug
End-User Microsoft BI Tools – Clearing up the confusion: http://bit.ly/SrBMLT
Microsoft Appliances: http://bit.ly/YQIXzM
Why You Need a Data Warehouse: http://bit.ly/1fwEq0j
Data Warehouse Maturity Model: http://bit.ly/xl4mGM
Operational Data Store (ODS) Defined: http://bit.ly/1H6wnE7
The Modern Data Warehouse: http://bit.ly/1xuX4Py
Q&A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck will be posted under the “Presentations” tab)