
MICROSOFT LEARNING PRODUCT

10777A
Implementing a Data Warehouse with
Microsoft SQL Server 2012

MCT USE ONLY. STUDENT USE PROHIBITED

OFFICIAL

Implementing a Data Warehouse with Microsoft SQL Server 2012


Information in this document, including URL and other Internet Web site references, is subject to change
without notice. Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with
any real company, organization, product, domain name, e-mail address, logo, person, place or event is
intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the
user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in
or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.

The names of manufacturers, products, or URLs are provided for informational purposes only and
Microsoft makes no representations or warranties, either express, implied, or statutory, regarding
these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a
manufacturer or product does not imply endorsement by Microsoft of the manufacturer or product. Links
may be provided to third party sites. Such sites are not under the control of Microsoft and Microsoft is not
responsible for the contents of any linked site or any link contained in a linked site, or any changes or
updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission
received from any linked site. Microsoft is providing these links to you only as a convenience, and the
inclusion of any link does not imply endorsement of Microsoft of the site or the products contained
therein.
© 2012 Microsoft Corporation. All rights reserved.

Microsoft and the trademarks listed at
http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx
are trademarks of the Microsoft group of companies. All other trademarks are property of their
respective owners.

Product Number: 10777A
Part Number: X18-28026
Released: 05/2012

MICROSOFT LICENSE TERMS
OFFICIAL MICROSOFT LEARNING PRODUCTS
MICROSOFT OFFICIAL COURSE Pre-Release and Final Release Versions

These license terms are an agreement between Microsoft Corporation and you. Please read them. They apply to
the Licensed Content named above, which includes the media on which you received it, if any. These license
terms also apply to any updates, supplements, internet-based services and support services for the Licensed
Content, unless other terms accompany those items. If so, those terms apply.
BY DOWNLOADING OR USING THE LICENSED CONTENT, YOU ACCEPT THESE TERMS. IF YOU DO NOT ACCEPT
THEM, DO NOT DOWNLOAD OR USE THE LICENSED CONTENT.
If you comply with these license terms, you have the rights below.
1. DEFINITIONS.

a. "Authorized Learning Center" means a Microsoft Learning Competency Member, Microsoft IT Academy
Program Member, or such other entity as Microsoft may designate from time to time.
b. "Authorized Training Session" means the Microsoft-authorized instructor-led training class using only
MOC Courses that are conducted by an MCT at or through an Authorized Learning Center.

c. "Classroom Device" means one (1) dedicated, secure computer that you own or control that meets or
exceeds the hardware level specified for the particular MOC Course located at your training facilities or
primary business location.
d. "End User" means an individual who is (i) duly enrolled for an Authorized Training Session or Private
Training Session, (ii) an employee of an MPN Member, or (iii) a Microsoft full-time employee.
e. "Licensed Content" means the MOC Course and any other content accompanying this agreement.
Licensed Content may include (i) Trainer Content, (ii) sample code, and (iii) associated media.
f. "Microsoft Certified Trainer" or "MCT" means an individual who is (i) engaged to teach a training session
to End Users on behalf of an Authorized Learning Center or MPN Member, (ii) currently certified as a
Microsoft Certified Trainer under the Microsoft Certification Program, and (iii) holds a Microsoft
Certification in the technology that is the subject of the training session.

g. "Microsoft IT Academy Member" means a current, active member of the Microsoft IT Academy
Program.
h. "Microsoft Learning Competency Member" means a Microsoft Partner Network Program Member in
good standing that currently holds the Learning Competency status.
i. "Microsoft Official Course" or "MOC Course" means the Official Microsoft Learning Product instructor-led
courseware that educates IT professionals or developers on Microsoft technologies.

j. "Microsoft Partner Network Member" or "MPN Member" means a silver- or gold-level Microsoft Partner
Network program member in good standing.

k. "Personal Device" means one (1) device, workstation or other digital electronic device that you
personally own or control that meets or exceeds the hardware level specified for the particular MOC
Course.
l. "Private Training Session" means the instructor-led training classes provided by MPN Members for
corporate customers to teach a predefined learning objective. These classes are not advertised or
promoted to the general public and class attendance is restricted to individuals employed by or
contracted by the corporate customer.

m. "Trainer Content" means the trainer version of the MOC Course and additional content designated
solely for trainers to use to teach a training session using a MOC Course. Trainer Content may include
Microsoft PowerPoint presentations, instructor notes, lab setup guide, demonstration guides, beta
feedback form and trainer preparation guide for the MOC Course. To clarify, Trainer Content does not
include virtual hard disks or virtual machines.
2. INSTALLATION AND USE RIGHTS. The Licensed Content is licensed, not sold. The Licensed Content is
licensed on a one copy per user basis, such that you must acquire a license for each individual that
accesses or uses the Licensed Content.
2.1 Below are four separate sets of installation and use rights. Only one set of rights applies to you.

a. If you are an Authorized Learning Center:

i. If the Licensed Content is in digital format, for each license you acquire you may either:
1. install one (1) copy of the Licensed Content in the form provided to you on a dedicated, secure
server located on your premises where the Authorized Training Session is held for access and
use by one (1) End User attending the Authorized Training Session, or by one (1) MCT teaching
the Authorized Training Session, or
2. install one (1) copy of the Licensed Content in the form provided to you on one (1) Classroom
Device for access and use by one (1) End User attending the Authorized Training Session, or by
one (1) MCT teaching the Authorized Training Session.
ii. You agree that:
1. you will acquire a license for each End User and MCT that accesses the Licensed Content,
2. each End User and MCT will be presented with a copy of this agreement and each individual
will agree that their use of the Licensed Content will be subject to these license terms prior to
their accessing the Licensed Content. Each individual will be required to denote their
acceptance of the EULA in a manner that is enforceable under local law prior to their accessing
the Licensed Content,
3. for all Authorized Training Sessions, you will only use qualified MCTs who hold the applicable
competency to teach the particular MOC Course that is the subject of the training session,
4. you will not alter or remove any copyright or other protective notices contained in the
Licensed Content,

5. you will remove and irretrievably delete all Licensed Content from all Classroom Devices and
servers at the end of the Authorized Training Session,
6. you will only provide access to the Licensed Content to End Users and MCTs,
7. you will only provide access to the Trainer Content to MCTs, and
8. any Licensed Content installed for use during a training session will be done in accordance
with the applicable classroom set-up guide.

b. If you are an MPN Member:

i. If the Licensed Content is in digital format, for each license you acquire you may either:
1. install one (1) copy of the Licensed Content in the form provided to you on (A) one (1)
Classroom Device, or (B) one (1) dedicated, secure server located at your premises where
the training session is held for use by one (1) of your employees attending a training session
provided by you, or by one (1) MCT that is teaching the training session, or
2. install one (1) copy of the Licensed Content in the form provided to you on one (1)
Classroom Device for use by one (1) End User attending a Private Training Session, or one (1)
MCT that is teaching the Private Training Session.
ii. You agree that:
1. you will acquire a license for each End User and MCT that accesses the Licensed Content,
2. each End User and MCT will be presented with a copy of this agreement and each individual
will agree that their use of the Licensed Content will be subject to these license terms prior
to their accessing the Licensed Content. Each individual will be required to denote their
acceptance of the EULA in a manner that is enforceable under local law prior to their
accessing the Licensed Content,
3. for all training sessions, you will only use qualified MCTs who hold the applicable
competency to teach the particular MOC Course that is the subject of the training session,
4. you will not alter or remove any copyright or other protective notices contained in the
Licensed Content,
5. you will remove and irretrievably delete all Licensed Content from all Classroom Devices and
servers at the end of each training session,
6. you will only provide access to the Licensed Content to End Users and MCTs,
7. you will only provide access to the Trainer Content to MCTs, and
8. any Licensed Content installed for use during a training session will be done in accordance
with the applicable classroom set-up guide.
c. If you are an End User:
You may use the Licensed Content solely for your personal training use. If the Licensed Content is in
digital format, for each license you acquire you may (i) install one (1) copy of the Licensed Content in
the form provided to you on one (1) Personal Device and install another copy on another Personal
Device as a backup copy, which may be used only to reinstall the Licensed Content; or (ii) print one (1)
copy of the Licensed Content. You may not install or use a copy of the Licensed Content on a device
you do not own or control.

d. If you are an MCT:


i. For each license you acquire, you may use the Licensed Content solely to prepare and deliver an
Authorized Training Session or Private Training Session. For each license you acquire, you may
install and use one (1) copy of the Licensed Content in the form provided to you on one (1) Personal
Device and install one (1) additional copy on another Personal Device as a backup copy, which may
be used only to reinstall the Licensed Content. You may not install or use a copy of the Licensed
Content on a device you do not own or control.
ii. Use of Instructional Components in Trainer Content. You may customize, in accordance with the
most recent version of the MCT Agreement, those portions of the Trainer Content that are logically
associated with instruction of a training session. If you elect to exercise the foregoing rights, you
agree: (a) that any of these customizations will only be used for providing a training session, (b) any
customizations will comply with the terms and conditions for Modified Training Sessions and
Supplemental Materials in the most recent version of the MCT agreement and with this agreement.
For clarity, any use of "customize" refers only to changing the order of slides and content, and/or
not using all the slides or content; it does not mean changing or modifying any slide or content.

2.2 Separation of Components. The Licensed Content components are licensed as a single unit and you
may not separate the components and install them on different devices.

2.3 Reproduction/Redistribution Licensed Content. Except as expressly provided in the applicable
installation and use rights above, you may not reproduce or distribute the Licensed Content or any portion
thereof (including any permitted modifications) to any third parties without the express written permission
of Microsoft.

2.4 Third Party Programs. The Licensed Content may contain third party programs or services. These
license terms will apply to your use of those third party programs or services, unless other terms accompany
those programs and services.
2.5 Additional Terms. Some Licensed Content may contain components with additional terms,
conditions, and licenses regarding its use. Any non-conflicting terms in those conditions and licenses also
apply to that respective component and supplement the terms described in this Agreement.
3. PRE-RELEASE VERSIONS. If the Licensed Content is a pre-release (beta) version, then, in addition to the
other provisions in this agreement, these terms also apply:
a. Pre-Release Licensed Content. This Licensed Content is a pre-release version. It may not contain the
same information and/or work the way a final version of the Licensed Content will. We may change it
for the final version. We also may not release a final version. Microsoft is under no obligation to
provide you with any further content, including the final release version of the Licensed Content.

b. Feedback. If you agree to give feedback about the Licensed Content to Microsoft, either directly or
through its third party designee, you give to Microsoft, without charge, the right to use, share and
commercialize your feedback in any way and for any purpose. You also give to third parties, without
charge, any patent rights needed for their products, technologies and services to use or interface with
any specific parts of a Microsoft software, Microsoft product, or service that includes the feedback. You
will not give feedback that is subject to a license that requires Microsoft to license its software,
technologies, or products to third parties because we include your feedback in them. These rights

MCT USE ONLY. STUDENT USE PROHIBITED

survive this agreement.

c. Term. If you are an Authorized Learning Center, MCT or MPN Member, you agree to cease using all copies
of the beta version of the Licensed Content upon (i) the date which Microsoft informs you is the end date
for using the beta version, or (ii) sixty (60) days after the commercial release of the Licensed Content,
whichever is earlier (the "beta term"). Upon expiration or termination of the beta term, you will
irretrievably delete and destroy all copies of the same in your possession or under your control.
4. INTERNET-BASED SERVICES. Classroom Devices located at an Authorized Learning Center's physical location
may contain virtual machines and virtual hard disks for use while attending an Authorized Training
Session. You may only use the software on the virtual machines and virtual hard disks on a Classroom
Device solely to perform the virtual lab activities included in the MOC Course while attending the
Authorized Training Session. Microsoft may provide Internet-based services with the software included
with the virtual machines and virtual hard disks. It may change or cancel them at any time. If the
software is a pre-release version, some of its Internet-based services may be turned on by
default. The default settings in these versions of the software do not necessarily reflect how the features
will be configured in the commercially released versions. If Internet-based services are included with the
software, they are typically simulated for demonstration purposes in the software and no transmission
over the Internet takes place. However, should the software be configured to transmit over the Internet,
the following terms apply:
a. Consent for Internet-Based Services. The software features described below connect to Microsoft or
service provider computer systems over the Internet. In some cases, you will not receive a separate
notice when they connect. You may switch off these features or not use them. By using these features,
you consent to the transmission of this information. Microsoft does not use the information to identify
or contact you.
b. Computer Information. The following features use Internet protocols, which send to the appropriate
systems computer information, such as your Internet protocol address, the type of operating system,
browser and name and version of the software you are using, and the language code of the device
where you installed the software. Microsoft uses this information to make the Internet-based services
available to you.

Accelerators. When you use, click on, or move your mouse over an Accelerator, the title and full web
address or URL of the current webpage, as well as standard computer information, and any content
you have selected, might be sent to the service provider. If you use an Accelerator provided by
Microsoft, the information sent is subject to the Microsoft Online Privacy Statement, which is
available at go.microsoft.com/fwlink/?linkid=31493. If you use an Accelerator provided by a third
party, use of the information sent will be subject to the third party's privacy practices.

Automatic Updates. This software contains an Automatic Update feature that is on by default. For
more information about this feature, including instructions for turning it off, see
go.microsoft.com/fwlink/?LinkId=178857. You may turn off this feature while the software is
running (opt out). Unless you expressly opt out of this feature, this feature will (a) connect to
Microsoft or service provider computer systems over the Internet, (b) use Internet protocols to send
to the appropriate systems standard computer information, such as your computer's Internet
protocol address, the type of operating system, browser and name and version of the software you
are using, and the language code of the device where you installed the software, and (c)
automatically download and install, or prompt you to download and/or install, current Updates to
the software. In some cases, you will not receive a separate notice before this feature takes effect.

By installing the software, you consent to the transmission of standard computer information and
the automatic downloading and installation of updates.

Auto Root Update. The Auto Root Update feature updates the list of trusted certificate authorities.
You can switch off the Auto Root Update feature.

Customer Experience Improvement Program (CEIP), Error and Usage Reporting; Error Reports. This
software uses CEIP and Error and Usage Reporting components enabled by default that
automatically send to Microsoft information about your hardware and how you use this software.
This software also automatically sends error reports to Microsoft that describe which software
components had errors and may also include memory dumps. You may choose not to use these
software components. For more information please go to
<http://go.microsoft.com/fwlink/?LinkID=196910>.

Digital Certificates. The software uses digital certificates. These digital certificates confirm the
identity of Internet users sending X.509 standard encrypted information. They also can be used to
digitally sign files and macros, to verify the integrity and origin of the file contents. The software
retrieves certificates and updates certificate revocation lists. These security features operate only
when you use the Internet.

Extension Manager. The Extension Manager can retrieve other software through the internet from
the Visual Studio Gallery website. To provide this other software, the Extension Manager sends to
Microsoft the name and version of the software you are using and language code of the device
where you installed the software. This other software is provided by third parties to Visual Studio
Gallery. It is licensed to users under terms provided by the third parties, not from Microsoft. Read
the Visual Studio Gallery terms of use for more information.

IPv6 Network Address Translation (NAT) Traversal service (Teredo). This feature helps existing
home Internet gateway devices transition to IPv6. IPv6 is a next generation Internet protocol. It
helps enable end-to-end connectivity often needed by peer-to-peer applications. To do so, each
time you start up the software the Teredo client service will attempt to locate a public Teredo
Internet service. It does so by sending a query over the Internet. This query only transfers standard
Domain Name Service information to determine if your computer is connected to the Internet and
can locate a public Teredo service. If you (i) use an application that needs IPv6 connectivity, or (ii)
configure your firewall to always enable IPv6 connectivity, then by default standard Internet Protocol
information will be sent to the Teredo service at Microsoft at regular intervals. No other information is
sent to Microsoft. You can change this default to use non-Microsoft servers. You can also switch off this
feature using a command line utility named netsh.

Malicious Software Removal. During setup, if you select "Get important updates for installation,"
the software may check and remove certain malware from your device. "Malware" is malicious
software. If the software runs, it will remove the Malware listed and updated at
www.support.microsoft.com/?kbid=890830. During a Malware check, a report will be sent to
Microsoft with specific information about Malware detected, errors, and other information about
your device. This information is used to improve the software and other Microsoft products and
services. No information included in these reports will be used to identify or contact you. You may
disable the software's reporting functionality by following the instructions found at
www.support.microsoft.com/?kbid=890830. For more information, read the Windows Malicious
Software Removal Tool privacy statement at go.microsoft.com/fwlink/?LinkId=113995.

Microsoft Digital Rights Management. If you use the software to access content that has been
protected with Microsoft Digital Rights Management (DRM), then, in order to let you play the
content, the software may automatically request media usage rights from a rights server on the
Internet and download and install available DRM updates. For more information, see
go.microsoft.com/fwlink/?LinkId=178857.

Microsoft Telemetry Reporting Participation. If you choose to participate in Microsoft Telemetry
Reporting through a basic or advanced membership, information regarding filtered URLs,
malware and other attacks on your network is sent to Microsoft. This information helps Microsoft
improve the ability of Forefront Threat Management Gateway to identify attack patterns and
mitigate threats. In some cases, personal information may be inadvertently sent, but Microsoft will
not use the information to identify or contact you. You can switch off Telemetry Reporting. For
more information on this feature, see http://go.microsoft.com/fwlink/?LinkId=130980.

Microsoft Update Feature. To help keep the software up-to-date, from time to time, the software
connects to Microsoft or service provider computer systems over the Internet. In some cases, you
will not receive a separate notice when they connect. When the software does so, we check your
version of the software and recommend or download updates to your devices. You may not receive
notice when we download the update. You may switch off this feature.

Network Awareness. This feature determines whether a system is connected to a network by either
passive monitoring of network traffic or active DNS or HTTP queries. The query only transfers
standard TCP/IP or DNS information for routing purposes. You can switch off the active query
feature through a registry setting.

Plug and Play and Plug and Play Extensions. You may connect new hardware to your device, either
directly or over a network. Your device may not have the drivers needed to communicate with that
hardware. If so, the update feature of the software can obtain the correct driver from Microsoft and
install it on your device. An administrator can disable this update feature.

Really Simple Syndication (RSS) Feed. This software's start page contains updated content that is
supplied by means of an RSS feed online from Microsoft.

Search Suggestions Service. When you type a search query in Internet Explorer by using the Instant
Search box or by typing a question mark (?) before your search term in the Address bar, you will see
search suggestions as you type (if supported by your search provider). Everything you type in the
Instant Search box or in the Address bar when preceded by a question mark (?) is sent to your
search provider as you type it. In addition, when you press Enter or click the Search button, all the
text that is in the search box or Address bar is sent to the search provider. If you use a Microsoft
search provider, the information you send is subject to the Microsoft Online Privacy Statement,
which is available at go.microsoft.com/fwlink/?linkid=31493. If you use a third-party search
provider, use of the information sent will be subject to the third party's privacy practices. You can
turn search suggestions off at any time in Internet Explorer by using Manage Add-ons under the
Tools button. For more information about the search suggestions service, see
go.microsoft.com/fwlink/?linkid=128106.

SQL Server Reporting Services Map Report Item. The software may include features that retrieve
content such as maps, images and other data through the Bing Maps (or successor branded)
application programming interface (the "Bing Maps APIs"). The purpose of these features is to
create reports displaying data on top of maps, aerial and hybrid imagery. If these features are
included, you may use them to create and view dynamic or static documents. This may be done only
in conjunction with and through methods and means of access integrated in the software. You may
not otherwise copy, store, archive, or create a database of the content available through the Bing
Maps APIs. You may not use the following for any purpose even if they are available through the
Bing Maps APIs:

• Bing Maps APIs to provide sensor-based guidance/routing, or

• Any Road Traffic Data or Bird's Eye Imagery (or associated metadata).

Your use of the Bing Maps APIs and associated content is also subject to the additional terms and
conditions at http://www.microsoft.com/maps/product/terms.html.

URL Filtering. The URL Filtering feature identifies certain types of web sites based upon predefined
URL categories, and allows you to deny access to such web sites, such as known malicious sites and
sites displaying inappropriate or pornographic materials. To apply URL filtering, Microsoft queries
the online Microsoft Reputation Service for URL categorization. You can switch off URL filtering. For
more information on this feature, see http://go.microsoft.com/fwlink/?LinkId=130980.

Web Content Features. Features in the software can retrieve related content from Microsoft and
provide it to you. To provide the content, these features send to Microsoft the type of operating
system, name and version of the software you are using, type of browser and language code of the
device where you run the software. Examples of these features are clip art, templates, online
training, online assistance and Appshelp. You may choose not to use these web content features.

Windows Media Digital Rights Management. Content owners use Windows Media digital rights
management technology (WMDRM) to protect their intellectual property, including copyrights. This
software and third party software use WMDRM to play and copy WMDRM-protected content. If the
software fails to protect the content, content owners may ask Microsoft to revoke the software's
ability to use WMDRM to play or copy protected content. Revocation does not affect other content.
When you download licenses for protected content, you agree that Microsoft may include a
revocation list with the licenses. Content owners may require you to upgrade WMDRM to access
their content. Microsoft software that includes WMDRM will ask for your consent prior to the
upgrade. If you decline an upgrade, you will not be able to access content that requires the upgrade.
You may switch off WMDRM features that access the Internet. When these features are off, you can
still play content for which you have a valid license.

Windows Media Player. When you use Windows Media Player, it checks with Microsoft for

• compatible online music services in your region;
• new versions of the player; and
• codecs if your device does not have the correct ones for playing content.

You can switch off this last feature. For more information, go to
www.microsoft.com/windows/windowsmedia/player/11/privacy.aspx.

Windows Rights Management Services. The software contains a feature that allows you to create
content that cannot be printed, copied or sent to others without your permission. For more
information, go to www.microsoft.com/rms. You may choose not to use this feature.

Windows Time Service. This service synchronizes with time.windows.com once a week to provide
your computer with the correct time. You can turn this feature off or choose your preferred time
source within the Date and Time Control Panel applet. The connection uses standard NTP protocol.

Windows Update Feature. You may connect new hardware to the device where you run the
software. Your device may not have the drivers needed to communicate with that hardware. If so,
the update feature of the software can obtain the correct driver from Microsoft and run it on your
device. You can switch off this update feature.

c. Use of Information. Microsoft may use the device information, error reports, and malware reports to
improve our software and services. We may also share it with others, such as hardware and software
vendors. They may use the information to improve how their products run with Microsoft software.

d. Misuse of Internet-based Services. You may not use any Internet-based service in any way that could
harm it or impair anyone else's use of it. You may not use the service to try to gain unauthorized access
to any service, data, account or network by any means.
5. SCOPE OF LICENSE. The Licensed Content is licensed, not sold. This agreement only gives you some rights
to use the Licensed Content. Microsoft reserves all other rights. Unless applicable law gives you more
rights despite this limitation, you may use the Licensed Content only as expressly permitted in this
agreement. In doing so, you must comply with any technical limitations in the Licensed Content that only
allow you to use it in certain ways. Except as expressly permitted in this agreement, you may not:

install more copies of the Licensed Content on devices than the number of licenses you acquired;

allow more individuals to access the Licensed Content than the number of licenses you acquired;

publicly display, or make the Licensed Content available for others to access or use;

install, sell, publish, transmit, encumber, pledge, lend, copy, adapt, link to, post, rent or lease,
make available or distribute the Licensed Content to any third party, except as expressly permitted
by this Agreement;

reverse engineer, decompile, remove or otherwise thwart any protections or disassemble the
Licensed Content except and only to the extent that applicable law expressly permits, despite this
limitation;

access or use any Licensed Content for which you are not providing a training session to End Users
using the Licensed Content;

access or use any Licensed Content that you have not been authorized by Microsoft to access and
use; or

transfer the Licensed Content, in whole or in part, or assign this agreement to any third party.

6. RESERVATION OF RIGHTS AND OWNERSHIP. Microsoft reserves all rights not expressly granted to you in
this agreement. The Licensed Content is protected by copyright and other intellectual property laws and
treaties. Microsoft or its suppliers own the title, copyright, and other intellectual property rights in the
Licensed Content. You may not remove or obscure any copyright, trademark or patent notices that
appear on the Licensed Content or any components thereof, as delivered to you.

7. EXPORT RESTRICTIONS. The Licensed Content is subject to United States export laws and regulations. You
must comply with all domestic and international export laws and regulations that apply to the Licensed
Content. These laws include restrictions on destinations, End Users and end use. For additional
information, see www.microsoft.com/exporting.


8. LIMITATIONS ON SALE, RENTAL, ETC. AND CERTAIN ASSIGNMENTS. You may not sell, rent, lease, lend or
sublicense the Licensed Content or any portion thereof, or transfer or assign this agreement.

9. SUPPORT SERVICES. Because the Licensed Content is provided "as is", we may not provide support services for it.

10. TERMINATION. Without prejudice to any other rights, Microsoft may terminate this agreement if you fail
to comply with the terms and conditions of this agreement. Upon any termination of this agreement, you
agree to immediately stop all use of and to irretrievably delete and destroy all copies of the Licensed
Content in your possession or under your control.

11. LINKS TO THIRD PARTY SITES. You may link to third party sites through the use of the Licensed Content.
The third party sites are not under the control of Microsoft, and Microsoft is not responsible for the
contents of any third party sites, any links contained in third party sites, or any changes or updates to third
party sites. Microsoft is not responsible for webcasting or any other form of transmission received from
any third party sites. Microsoft is providing these links to third party sites to you only as a convenience,
and the inclusion of any link does not imply an endorsement by Microsoft of the third party site.

12. ENTIRE AGREEMENT. This agreement, and the terms for supplements, updates and support services are
the entire agreement for the Licensed Content.

13. APPLICABLE LAW.
a. United States. If you acquired the Licensed Content in the United States, Washington state law governs
the interpretation of this agreement and applies to claims for breach of it, regardless of conflict of laws
principles. The laws of the state where you live govern all other claims, including claims under state
consumer protection laws, unfair competition laws, and in tort.
b. Outside the United States. If you acquired the Licensed Content in any other country, the laws of that
country apply.

14. LEGAL EFFECT. This agreement describes certain legal rights. You may have other rights under the laws of
your country. You may also have rights with respect to the party from whom you acquired the Licensed
Content. This agreement does not change your rights under the laws of your country if the laws of your
country do not permit it to do so.

15. DISCLAIMER OF WARRANTY. THE LICENSED CONTENT IS LICENSED "AS-IS," "WITH ALL FAULTS," AND "AS
AVAILABLE." YOU BEAR THE RISK OF USING IT. MICROSOFT CORPORATION AND ITS RESPECTIVE
AFFILIATES GIVE NO EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS UNDER OR IN RELATION TO
THE LICENSED CONTENT. YOU MAY HAVE ADDITIONAL CONSUMER RIGHTS UNDER YOUR LOCAL LAWS
WHICH THIS AGREEMENT CANNOT CHANGE. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAWS,
MICROSOFT CORPORATION AND ITS RESPECTIVE AFFILIATES EXCLUDE ANY IMPLIED WARRANTIES OR
CONDITIONS, INCLUDING THOSE OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NON-INFRINGEMENT.

16. LIMITATION ON AND EXCLUSION OF REMEDIES AND DAMAGES. TO THE EXTENT NOT PROHIBITED BY
LAW, YOU CAN RECOVER FROM MICROSOFT CORPORATION AND ITS SUPPLIERS ONLY DIRECT
DAMAGES UP TO USD$5.00. YOU AGREE NOT TO SEEK TO RECOVER ANY OTHER DAMAGES, INCLUDING
CONSEQUENTIAL, LOST PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES FROM MICROSOFT
CORPORATION AND ITS RESPECTIVE SUPPLIERS.


This limitation applies to:

o anything related to the Licensed Content, services made available through the Licensed Content, or
content (including code) on third party Internet sites or third-party programs; and

o claims for breach of contract, breach of warranty, guarantee or condition, strict liability, negligence,
or other tort to the extent permitted by applicable law.

It also applies even if Microsoft knew or should have known about the possibility of the damages. The
above limitation or exclusion may not apply to you because your country may not allow the exclusion or
limitation of incidental, consequential or other damages.

Please note: As this Licensed Content is distributed in Quebec, Canada, some of the clauses in this agreement
are provided below in French.
Remarque : Ce contenu sous licence étant distribué au Québec, Canada, certaines des clauses dans ce
contrat sont fournies ci-dessous en français.

EXONÉRATION DE GARANTIE. Le contenu sous licence visé par une licence est offert « tel quel ». Toute
utilisation de ce contenu sous licence est à votre seule risque et péril. Microsoft n'accorde aucune autre garantie
expresse. Vous pouvez bénéficier de droits additionnels en vertu du droit local sur la protection des
consommateurs, que ce contrat ne peut modifier. Là où elles sont permises par le droit local, les garanties
implicites de qualité marchande, d'adéquation à un usage particulier et d'absence de contrefaçon sont exclues.

LIMITATION DES DOMMAGES-INTÉRÊTS ET EXCLUSION DE RESPONSABILITÉ POUR LES DOMMAGES. Vous
pouvez obtenir de Microsoft et de ses fournisseurs une indemnisation en cas de dommages directs uniquement
à hauteur de 5,00 $ US. Vous ne pouvez prétendre à aucune indemnisation pour les autres dommages, y
compris les dommages spéciaux, indirects ou accessoires et pertes de bénéfices.

Cette limitation concerne :

tout ce qui est relié au contenu sous licence, aux services ou au contenu (y compris le code)
figurant sur des sites Internet tiers ou dans des programmes tiers ; et

les réclamations au titre de violation de contrat ou de garantie, ou au titre de responsabilité
stricte, de négligence ou d'une autre faute dans la limite autorisée par la loi en vigueur.

Elle s'applique également, même si Microsoft connaissait ou devrait connaître l'éventualité d'un tel dommage.
Si votre pays n'autorise pas l'exclusion ou la limitation de responsabilité pour les dommages indirects,
accessoires ou de quelque nature que ce soit, il se peut que la limitation ou l'exclusion ci-dessus ne s'appliquera
pas à votre égard.

EFFET JURIDIQUE. Le présent contrat décrit certains droits juridiques. Vous pourriez avoir d'autres droits prévus
par les lois de votre pays. Le présent contrat ne modifie pas les droits que vous confèrent les lois de votre pays
si celles-ci ne le permettent pas.
Revised March 2012


Acknowledgments


Microsoft Learning would like to acknowledge and thank the following for their contribution towards
developing this title. Their effort at various stages in the development has ensured that you have a good
classroom experience.

Graeme Malcolm, Lead Content Developer

Graeme Malcolm is a Microsoft SQL Server subject matter expert and professional content developer at
Content Master, a division of CM Group Ltd. As a Microsoft Certified Trainer, Graeme has delivered
training courses on SQL Server since version 4.2; as an author, Graeme has written numerous books,
articles, and training courses on SQL Server; and as a consultant, Graeme has designed and implemented
business solutions based on SQL Server for customers all over the world.

Geoff Allix, Content Developer

Geoff Allix is a Microsoft SQL Server subject matter expert and professional content developer at Content
Master, a division of CM Group Ltd. Geoff is a Microsoft Certified IT Professional for SQL Server with
extensive experience in designing and implementing database and BI solutions on SQL Server
technologies, and has provided consultancy services to organizations seeking to implement and optimize
data warehousing and OLAP solutions.

Martin Ellis, Content Developer

Martin Ellis is a Microsoft SQL Server subject matter expert and professional content developer at Content
Master, a division of CM Group Ltd. Martin is a Microsoft Certified Technical Specialist on SQL Server and
an MCSE. He has been working with SQL Server since version 7.0, as a DBA, consultant, and Microsoft
Certified Trainer, and has developed a wide range of technical collateral for Microsoft Corp. and other
technology enterprises.

Chris Testa-O'Neill, Technical Reviewer

Chris Testa-O'Neill is a Senior Consultant at Coeo (www.coeo.com), a leading provider of SQL Server
Managed Support and Consulting in the UK and Europe. He is also a Microsoft Certified Trainer, a Microsoft
Most Valuable Professional for SQL Server, and lead author of Microsoft E-Learning MCTS courses for SQL
Server 2008. Chris has spoken at a range of SQL Server events in the UK, Europe, Australia, and the United
States. He is also one of the organizers of SQLBits and SQLServerFAQ, and a UK Regional Mentor for SQLPASS.
You can contact Chris at chris@coeo.com, @ctesta_oneill, or through his blog at
http://www.coeo.com/sql-server-events/sql-events-and-blogs.aspx.

Contents

Module 1: Introduction to Data Warehousing
  Lesson 1: Overview of Data Warehousing   1-3
  Lesson 2: Considerations for a Data Warehouse Solution   1-14
  Lab 1: Exploring a Data Warehousing Solution   1-28

Module 2: Data Warehouse Hardware
  Lesson 1: Considerations for Building a Data Warehouse   2-3
  Lesson 2: Data Warehouse Reference Architectures and Appliances   2-11

Module 3: Designing and Implementing a Data Warehouse
  Lesson 1: Logical Design for a Data Warehouse   3-3
  Lesson 2: Physical Design for a Data Warehouse   3-17
  Lab 3: Implementing a Data Warehouse Schema   3-27

Module 4: Creating an ETL Solution with SSIS
  Lesson 1: Introduction to ETL with SSIS   4-3
  Lesson 2: Exploring Source Data   4-10
  Lesson 3: Implementing Data Flow   4-21
  Lab 4: Implementing Data Flow in an SSIS Package   4-38

Module 5: Implementing Control Flow in an SSIS Package
  Lesson 1: Introduction to Control Flow   5-3
  Lesson 2: Creating Dynamic Packages   5-14
  Lesson 3: Using Containers   5-21
  Lab 5A: Implementing Control Flow in an SSIS Package   5-33
  Lesson 4: Managing Consistency   5-41
  Lab 5B: Using Transactions and Checkpoints   5-51

Module 6: Debugging and Troubleshooting SSIS Packages
  Lesson 1: Debugging an SSIS Package   6-3
  Lesson 2: Logging SSIS Package Events   6-12
  Lesson 3: Handling Errors in an SSIS Package   6-21
  Lab 6: Debugging and Troubleshooting an SSIS Package   6-30

Module 7: Implementing an Incremental ETL Process
  Lesson 1: Introduction to Incremental ETL   7-3
  Lesson 2: Extracting Modified Data   7-9
  Lab 7A: Extracting Modified Data   7-31
  Lesson 3: Loading Modified Data   7-54
  Lab 7B: Loading Incremental Changes   7-73

Module 8: Incorporating Data from the Cloud into a Data Warehouse
  Lesson 1: Overview of Cloud Data Sources   8-3
  Lesson 2: SQL Azure   8-9
  Lesson 3: The Windows Azure Marketplace DataMarket   8-19
  Lab: Using Cloud Data in a Data Warehouse Solution   8-26

Module 9: Enforcing Data Quality
  Lesson 1: Introduction to Data Quality   9-3
  Lesson 2: Using Data Quality Services to Cleanse Data   9-13
  Lab 9A: Cleansing Data   9-20
  Lesson 3: Using Data Quality Services to Match Data   9-29
  Lab 9B: Deduplicating Data   9-38

Module 10: Using Master Data Services
  Lesson 1: Introduction to Master Data Services   10-3
  Lesson 2: Implementing a Master Data Services Model   10-10
  Lesson 3: Managing Master Data   10-23
  Lesson 4: Creating a Master Data Hub   10-36
  Lab 10: Implementing Master Data Services   10-46

Module 11: Extending SQL Server Integration Services
  Lesson 1: Using Custom Components in SSIS   11-3
  Lesson 2: Using Scripts in SSIS   11-10
  Lab 11: Using Custom Components and Scripts   11-21

Module 12: Deploying and Configuring SSIS Packages
  Lesson 1: Overview of SSIS Deployment   12-3
  Lesson 2: Deploying SSIS Projects   12-9
  Lesson 3: Planning SSIS Package Execution   12-19
  Lab 12: Deploying and Configuring SSIS Packages   12-30

Module 13: Consuming Data in a Data Warehouse
  Lesson 1: Introduction to Business Intelligence   13-3
  Lesson 2: Introduction to Reporting   13-8
  Lesson 3: Introduction to Data Analysis   13-12
  Lab 13: Using Business Intelligence Tools   13-18

Appendix: Lab Answer Keys
  Module 1 Lab 1: Exploring a Data Warehousing Solution   L1-1
  Module 3 Lab 3: Implementing a Data Warehouse Schema   L3-7
  Module 4 Lab 4: Implementing Data Flow in an SSIS Package   L4-13
  Module 5 Lab 5A: Implementing Control Flow in an SSIS Package   L5-25
  Module 5 Lab 5B: Using Transactions and Checkpoints   L5-33
  Module 6 Lab 6: Debugging and Troubleshooting an SSIS Package   L6-37
  Module 7 Lab 7A: Extracting Modified Data   L7-45
  Module 7 Lab 7B: Loading Incremental Changes   L7-65
  Module 8 Lab 8: Using Cloud Data in a Data Warehouse Solution   L8-81
  Module 9 Lab 9A: Cleansing Data   L9-91
  Module 9 Lab 9B: Deduplicating Data   L9-99
  Module 10 Lab 10: Implementing Master Data Services   L10-105
  Module 11 Lab 11: Using Custom Components and Scripts   L11-117
  Module 12 Lab 12: Deploying and Configuring SSIS Packages   L12-123
  Module 13 Lab 13: Using Business Intelligence Tools   L13-129

About This Course


This section provides you with a brief description of the course, audience, suggested prerequisites, and
course objectives.

Course Description

This course describes how to implement a BI platform to support information worker analytics. Students
will learn how to create a data warehouse with Microsoft SQL Server 2012, implement ETL with SQL
Server Integration Services, and validate and cleanse data with SQL Server Data Quality Services and SQL
Server Master Data Services.

Audience

This course is intended for database professionals who need to fulfill a Business Intelligence (BI) Developer
role. They will need to focus on hands-on work creating BI solutions, including data warehouse
implementation, ETL, and data cleansing. Primary responsibilities include:

Implementing a data warehouse.

Developing SSIS packages for data extraction, transformation, and loading.

Enforcing data integrity by using Master Data Services.

Cleansing data by using Data Quality Services.

Student Prerequisites
This course requires that you meet the following prerequisites:

At least two years' experience of working with relational databases, including:

Designing a normalized database.

Creating tables and relationships.

Querying with Transact-SQL.

Some exposure to basic programming constructs (such as looping and branching).

An awareness of key business priorities such as revenue, profitability, and financial accounting is
desirable.

Course Objectives
After completing this course, students will be able to:

Describe data warehouse concepts and architecture considerations.

Select an appropriate hardware platform for a data warehouse.

Design and implement a data warehouse.

Implement Data Flow in an SSIS Package.

Implement Control Flow in an SSIS Package.

Debug and Troubleshoot SSIS packages.

Implement an SSIS solution that supports incremental data warehouse loads and changing data.

Integrate cloud data into a data warehouse ecosystem infrastructure.


Implement data cleansing by using Microsoft Data Quality Services.

Implement Master Data Services to enforce data integrity.

Extend SSIS with custom scripts and components.

Deploy and Configure SSIS packages.

Describe how information workers can consume data from the data warehouse.

Course Outline
This section provides an outline of the course:
Module 1, Introduction to Data Warehousing
Module 2, Data Warehouse Hardware
Module 3, Designing and Implementing a Data Warehouse
Module 4, Creating an ETL Solution with SSIS
Module 5, Implementing Control Flow in an SSIS Package
Module 6, Debugging and Troubleshooting SSIS Packages
Module 7, Implementing an Incremental ETL Process
Module 8, Incorporating Data from the Cloud into a Data Warehouse
Module 9, Enforcing Data Quality
Module 10, Using Master Data Services
Module 11, Extending SQL Server Integration Services
Module 12, Deploying and Configuring SSIS Packages
Module 13, Consuming Data in a Data Warehouse


Course Materials
The following materials are included with your kit:

Course Handbook: A succinct classroom learning guide that provides all the critical technical
information in a crisp, tightly focused format, which is just right for an effective in-class learning
experience.


Lessons: Guide you through the learning objectives and provide the key points that are critical to
the success of the in-class learning experience.

Labs: Provide a real-world, hands-on platform for you to apply the knowledge and skills learned
in the module.

Module Reviews and Takeaways: Provide improved on-the-job reference material to boost
knowledge and skills retention.

Lab Answer Keys: Provide step-by-step lab solution guidance at your fingertips when it's
needed.

Course Companion Content on the http://www.microsoft.com/learning/companionmoc/ Site:


Searchable, easy-to-navigate digital content with integrated premium on-line resources designed to
supplement the Course Handbook.

Modules: Include companion content, such as questions and answers, detailed demo steps and
additional reading links, for each lesson. Additionally, they include Lab Review questions and answers
and Module Reviews and Takeaways sections, which contain the review questions and answers, best
practices, common issues and troubleshooting tips with answers, and real-world issues and scenarios
with answers.

Resources: Include well-categorized additional resources that give you immediate access to the most
up-to-date premium content on TechNet, MSDN, Microsoft Press.

Student Course files on the http://www.microsoft.com/learning/companionmoc/ Site: Includes the
Allfiles.exe, a self-extracting executable file that contains all the files required for the labs and
demonstrations.

Course evaluation: At the end of the course, you will have the opportunity to complete an online
evaluation to provide feedback on the course, training facility, and instructor.

To provide additional comments or feedback on the course, send e-mail to


support@mscourseware.com. To inquire about the Microsoft Certification Program, send e-mail
to mcphelp@microsoft.com.


Virtual Machine Environment


This section provides the information for setting up the classroom environment to support the business
scenario of the course.

Virtual Machine Configuration


In this course, you will use Microsoft Hyper-V to perform the labs.
The following table shows the role of each virtual machine used in this course:
Virtual machine        Role
10777-8A-MIA-SQLBI     Application Server
10777-8-MIA-DC1        Domain Controller
MT11-MSL-TMG1          Internet Gateway

Software Configuration
The following software is installed on each VM:

Windows Server 2008 R2 SP1

Microsoft SQL Server 2012 (on 10777-8A-MIA-SQLBI only)

Microsoft SharePoint Server 2010 (on 10777-8A-MIA-SQLBI only)

Microsoft Office 2010 (on 10777-8A-MIA-SQLBI only)

Course Files
There are files associated with the labs in this course. The lab files are located in the folder
D:\10777A\Labfiles\LabXX on the 10777-8A-MIA-SQLBI VM.

Classroom Setup
Each classroom computer will have the same virtual machine configured in the same way.
Course Hardware Level 6+
To ensure a satisfactory student experience, Microsoft Learning requires a minimum equipment
configuration for trainer and student computers in all Microsoft Certified Partner for Learning Solutions
(CPLS) classrooms in which Official Microsoft Learning Product courseware is taught.


Module 1
Introduction to Data Warehousing
Contents:
Lesson 1: Overview of Data Warehousing   1-3
Lesson 2: Considerations for a Data Warehouse Solution   1-14
Lab 1: Exploring a Data Warehousing Solution   1-28

Module Overview

Data warehousing is a solution that organizations can use to centralize business data for reporting and
analysis. Implementing a data warehouse solution can provide a business or other organization with
significant benefits, including:

Comprehensive and accurate reporting of key business information.

A centralized source of business data for analysis and decision making.

The foundation for an enterprise business intelligence (BI) solution.

This module provides an introduction to the key components of a data warehousing solution and the
high-level considerations that you must take into account when you embark on a data warehousing
project.

After completing this module, you will be able to:

Describe the key elements of a data warehousing solution.

Describe the key considerations for a data warehousing project.


Lesson 1: Overview of Data Warehousing

Data warehousing is a well-established technique for centralizing business data for reporting and analysis.
Although the specific details of individual solutions can vary, there are some common elements in most
data warehousing implementations. Familiarity with these elements will enable you to better plan and
build an effective data warehousing solution.

After completing this lesson, you will be able to:

Describe the business problem that data warehouses address.

Define a data warehouse.

Describe the commonly used data warehouse architectures.

Identify the components of a data warehousing solution.

Describe a high-level approach to implementing a data warehousing project.

Identify the roles that are involved in a data warehousing project.

Describe the components and features of Microsoft SQL Server and other Microsoft products that
you can use in a data warehousing solution.

The Business Problem

Running a business effectively can present a significant challenge, particularly as the business grows or is
affected by trends in the business's target market or the global economy. To be successful, a business
must adapt to changing conditions, which requires individuals within the organization to make good
strategic and tactical business decisions. However, the following business problems can often make
effective business decision making difficult:

Key business data is distributed across multiple systems. This makes it hard to collate all of the
information necessary for a particular business decision.

Finding the information required for business decision making is time-consuming and error-prone. The
need to gather and reconcile data from multiple sources results in slow, inefficient decision making
processes that can be further undermined through inconsistencies between duplicate, contradictory
sources of the same information.

Fundamental business questions are hard to answer. Most business decisions require a knowledge of
fundamental facts, such as "How many customers do we have?" or "Which products do we sell most
often?" Although these may seem like simple questions, the distribution of data throughout multiple
systems in a typical organization can make them difficult, or even impossible, to answer.

By resolving these problems, it is possible to make effective decisions that will help the business to be
more successful, both at the strategic, executive level and during day-to-day business operations.

What Is a Data Warehouse?

A data warehouse provides a solution to the problem of distributed data that prevents effective business
decision making. There are many definitions for the term "data warehouse," and disagreements over
specific implementation details, but it is generally agreed that a data warehouse is a centralized store of
business data that can be used for reporting and analysis to inform business decisions.

Typically, a data warehouse:

Contains a large volume of data that relates to historical business transactions.

Is optimized for read operations that support querying the data. This is in contrast to a typical online
transaction processing (OLTP) database that is designed to support data insert, update, and delete
operations, too.

Is loaded with new or updated data at regular intervals.

Provides the basis for enterprise BI applications.

Data Warehouse Architectures

There are many ways that you can implement a data warehouse solution in an organization. Some
common approaches include:

Creating a single, central enterprise data warehouse for all business units.

Creating small, departmental data warehouses for individual business units.

Creating a hub-and-spoke architecture that synchronizes a central enterprise data warehouse with
departmental data marts that contain a subset of the data warehouse data.

The right architecture for a given business might be one of these, or a combination of various elements
from all three approaches.

Components of a Data Warehousing Solution

A data warehousing solution usually consists of the following elements:

Data sources. Sources of business data for the data warehouse, often including OLTP application
databases and data that has been exported from proprietary systems such as accounting applications.

An extract, transform, and load (ETL) process. A workflow for accessing data in the data sources,
modifying it to conform to the data model for the data warehouse, and loading it into the data
warehouse.

Data staging areas. Intermediary locations where the data that is being transferred to the data
warehouse is stored to prepare it for import into the data warehouse and synchronize data
warehouse loads.

A data warehouse. A relational database that has been designed to provide high-performance
querying of historical business data for reporting and analysis.

In addition, many data warehousing solutions also include:

Data cleansing and deduplication. A solution for resolving quality issues in the data before it is loaded
into the data warehouse.

Master data management (MDM). A solution that provides an authoritative data definition for
business entities that multiple systems across the organization use.
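As a rough sketch of how the ETL and staging elements fit together, the following script lands raw source rows in a staging table, applies a simple transformation, and loads the result into a warehouse table. All table and column names here are invented for illustration, and SQLite stands in for SQL Server so the sketch runs anywhere; in this course, this kind of workflow is built with SSIS rather than hand-written code.

```python
# Simplified staging-to-warehouse load; names are hypothetical and
# SQLite stands in for SQL Server.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging area: raw extract from a source system
    CREATE TABLE stg_Customer (CustomerCode TEXT, CustomerName TEXT);
    -- Warehouse table: conformed, query-ready data
    CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY,
                              CustomerCode TEXT, CustomerName TEXT);
""")

# Extract: rows arrive from the source (here, hard-coded sample data).
conn.executemany("INSERT INTO stg_Customer VALUES (?, ?)",
                 [("c001", "  contoso ltd "), ("c002", "fabrikam inc")])

# Transform and load: conform the values while copying them from the
# staging area into the warehouse table.
conn.execute("""
    INSERT INTO DimCustomer (CustomerCode, CustomerName)
    SELECT UPPER(CustomerCode), TRIM(CustomerName)
    FROM stg_Customer""")

print(list(conn.execute(
    "SELECT CustomerCode, CustomerName FROM DimCustomer")))
```

The staging table lets the load be validated and synchronized before the warehouse itself is touched, which is the role the staging area plays in the architecture above.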

Data Warehousing Projects

A data warehousing project has a great deal in common with any other IT implementation project, so it is
possible to apply most commonly used methodologies, such as Agile or Microsoft Solutions Framework
(MSF). However, a data warehousing project often requires a deeper understanding of the key business
objectives and metrics that are used to drive decision making than other software development or
infrastructure projects.

A high-level approach to implementing a data warehousing project usually includes the following steps:
1. Work with business stakeholders and information workers to determine the business questions to
   which the data warehouse must provide answers. They may include questions such as:

   What was the total sales revenue for each geographic sales territory in a given month?

   What are our most profitable products or services?

   Are our costs growing or reducing over time?

   Which sales employees are meeting their sales targets?

2. Determine the data that is required to answer these questions. It is normal to think of this data in
   terms of dimensions and facts. Facts contain the numerical measures that you need to aggregate
   so that you can answer the business questions that were identified in step 1 (for example, to
   determine sales revenue, you may need the sales amount for each individual sales transaction).
   Dimensions represent the different aspects of the business by which you want to aggregate the
   measures (for example, to determine sales revenue for each territory in a given month, you may need
   two dimensions: a geographic dimension so that you can aggregate sales by territory, and a time
   dimension so that you can aggregate sales by month).

   Note: Fact and dimensional modeling is covered in more detail in Module 3, Designing
   and Implementing a Data Warehouse.
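The fact and dimension concepts in step 2 can be sketched as a tiny star schema. The table names (FactSales, DimDate, DimGeography) follow common warehouse naming conventions but are hypothetical, and SQLite again stands in for SQL Server so that the example is self-contained and runnable.

```python
# Minimal star schema: one fact table with a numeric measure, plus two
# dimension tables to aggregate it by. All names and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimGeography (GeographyKey INTEGER PRIMARY KEY,
                               Territory TEXT);
    CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY,
                          CalendarMonth TEXT);
    -- One row per sales transaction: the SalesAmount measure plus
    -- keys that link each row to the dimensions.
    CREATE TABLE FactSales (DateKey INTEGER, GeographyKey INTEGER,
                            SalesAmount REAL);
""")
conn.executemany("INSERT INTO DimGeography VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO DimDate VALUES (?, ?)",
                 [(201201, "2012-01"), (201202, "2012-02")])
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?)",
                 [(201201, 1, 100.0), (201201, 2, 200.0),
                  (201202, 1, 50.0)])

# Aggregating the measure by both dimensions answers "What was the
# total sales revenue for each territory in a given month?"
for row in conn.execute("""
        SELECT d.CalendarMonth, g.Territory, SUM(f.SalesAmount)
        FROM FactSales AS f
        JOIN DimDate AS d ON f.DateKey = d.DateKey
        JOIN DimGeography AS g ON f.GeographyKey = g.GeographyKey
        GROUP BY d.CalendarMonth, g.Territory
        ORDER BY d.CalendarMonth, g.Territory"""):
    print(row)
```

Grouping the SalesAmount measure by attributes of the time and geography dimensions is exactly the fact-and-dimension aggregation that step 2 describes.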

3. Identify data sources that contain the data that is required to answer the business questions. These
   are commonly relational databases that existing line-of-business applications use, but they can also
   include:

   Flat files or XML documents that have been extracted from proprietary systems.

   Data in Microsoft SharePoint lists.

   Commercially available data that has been purchased from a data supplier such as the Microsoft
   Windows Azure Marketplace.
4. Determine the priority of each business question based on:

   The importance of answering the question in relation to driving key business objectives.

   The feasibility of answering the question from the data available.

A common approach to prioritizing the business questions that you will address in the data
warehousing solution is to work with key business stakeholders and plot each question on a
quadrant-based matrix like the one shown below. The position of the questions in the matrix helps
you to agree the scope of the data warehousing project. The vertical axis shows the business
importance of the question, and the horizontal axis shows the feasibility of answering it:

   High importance, low feasibility   |  High importance, high feasibility
   Low importance, low feasibility    |  Low importance, high feasibility

If a large number of questions fall into the high importance, high feasibility category, you may want to
consider taking an incremental approach to the project in which you break down the challenge into a
number of sub-projects. Each sub-project tackles the problem of implementing the data warehouse
schema, ETL solution, and data quality procedures for a specific area of the business, starting with the
highest-priority business questions. If you take this incremental approach, you should take care to create
an overall design for dimension and fact tables in early iterations of the solution so that subsequent
additions to the solution can reuse them.

Data Warehousing Project Roles

A data warehousing project typically involves several roles. These roles include:

A project manager. Coordinates project tasks and schedules and ensures that the project is completed on time and within budget.

A solution architect. Has overall responsibility for the technical design of the data warehousing solution.

A data modeler. Designs the data warehouse schema.

A database administrator. Designs the physical architecture and configuration of the data warehouse database. In addition, database administrators who have responsibility for data sources that are used in the data warehousing solution must be involved in the project to provide access to the data sources that the ETL process uses.

An infrastructure specialist. Implements the server and network infrastructure for the data warehousing solution.

An ETL developer. Builds the ETL workflow for the data warehousing solution.

Business users. Provide requirements and help to prioritize the business questions that the data warehousing solution will answer. Often, the team includes a business analyst as a full-time member to help to interpret the business questions and ensure that the solution design meets the needs of the users.

Testers. Verify the business and operational functionality of the solution as it is developed.

Data stewards for each key subject area in the data warehousing solution. Determine data quality rules and validate data before it enters the data warehouse. Data stewards are sometimes referred to as data governors.

In addition to ensuring the appropriate assignment of these roles, you should also consider the
importance of executive-level sponsorship of the data warehousing project. The project is significantly
more likely to succeed if a high-profile executive sponsor is seen to actively support the creation of the
data warehousing solution.

SQL Server As a Data Warehousing Platform

SQL Server includes components and features that you can use to implement various architectural elements of a data warehousing solution. These components and features include:

The SQL Server database engine. A highly scalable relational database management system (RDBMS) on which you can implement a data warehouse. SQL Server Enterprise includes features that make it particularly appropriate for data warehousing solutions. One feature is optimization of star join queries, which significantly enhances the performance of queries in a typical data warehouse schema. Another feature is columnstore indexes, which can significantly enhance the performance of data warehouse workloads.

SQL Server Integration Services. A comprehensive and extensible platform for creating ETL solutions, including support for a wide range of data sources and numerous built-in data flow transformations and control flow tasks for common ETL requirements.

SQL Server Master Data Services. A master data management solution that enables organizations to create authoritative data definitions for key business entities, and ensure data consistency across multiple applications and systems.

SQL Server Data Quality Services. A knowledge-based solution for validating, cleansing, and deduplicating data.

Microsoft SQL Azure. A cloud-based database platform that could be used to provide a data source in a data warehousing solution.

The Windows Azure Marketplace DataMarket. A cloud-based repository of commercially available datasets that can be incorporated into your data warehouse or that SQL Server Data Quality Services can use to validate and cleanse data.

In addition, you can use some SQL Server components and other Microsoft products to build an
enterprise BI solution that extends the value of your data warehouse significantly. These components and
products include:

SQL Server Analysis Services. A service for creating multidimensional and tabular analytical data
models for so-called slice and dice analysis, and for implementing data mining models that you can
use to identify trends and patterns in your data.

SQL Server Reporting Services. A solution for creating and distributing reports in a variety of formats
for online viewing or printing.

Microsoft SharePoint Server. A web-based portal through which information workers can consume
reports and other BI deliverables.

Microsoft Excel. The world's most commonly used spreadsheet and data analysis tool.

Microsoft PowerPivot technologies. A powerful analytical engine that enables analysis of large volumes
of data in Excel and sharing of tabular data models in SharePoint Server.

Microsoft Power View. A data visualization tool that provides an intuitive, interactive experience for
users who need to perform unstructured analysis of data in a BI semantic model.

Lesson 2

Considerations for a Data Warehouse Solution

Before starting a data warehousing project, there are several considerations of which you should be aware. Understanding these considerations will help you to create a data warehousing solution that addresses your specific needs and constraints.
This lesson describes some of the key considerations for planning a data warehousing solution. After completing this lesson, you will be able to:

Describe considerations for designing a data warehouse database.

Describe considerations for data sources.

Describe considerations for designing an ETL process.

Describe considerations for implementing data quality and master data management.

Data Warehouse Database and Storage

A data warehouse is a relational database that is optimized for reading data for analysis and reporting. When you are planning a data warehouse, you should take the following considerations into account.

Database Schema

The logical schema of a data warehouse is typically designed to denormalize the data into a structure that minimizes the number of JOIN operations that are required in the queries that are used to retrieve and aggregate data. A common approach is to design a star schema in which numerical measures are stored in fact tables that have foreign keys to multiple dimension tables that contain the business entities by which the measures can be aggregated. Before you design your data warehouse, you must know which dimensions your business users need to use when aggregating data, which measures need to be analyzed and at what granularity, and which facts include those measures. You must also plan the keys that will be used to link facts to dimensions carefully, and consider whether your data warehouse must support the use of dimensions that change over time (for example, handling dimension records for customers who change their address).

You must also consider the physical implementation of the database, because this will affect the performance and manageability of the data warehouse. It is common to use table partitioning to distribute large fact data across multiple filegroups, each on a different physical disk. This can increase query performance and enables you to implement a filegroup-based backup strategy that can help reduce downtime in the event of a single-disk failure. You should also consider the appropriate indexing strategy for your data, and whether to use data compression when storing the data.

Note Designing a data warehouse schema is covered in more detail in Module 3, Designing and Implementing a Data Warehouse.
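To make the star schema and partitioning concepts concrete, the following Transact-SQL sketch shows a minimal fact table with foreign keys to two dimension tables, partitioned by date key across filegroups. All table, column, filegroup, and partition names here are illustrative assumptions, not objects from the course's lab solution, and the sketch assumes the named filegroups already exist in the database.

```sql
-- Hypothetical dimension tables: business entities used to aggregate measures.
CREATE TABLE dbo.DimDate
(
    DateKey       int      NOT NULL PRIMARY KEY,  -- for example, 20120630
    CalendarDate  date     NOT NULL,
    MonthNumber   tinyint  NOT NULL,
    CalendarYear  smallint NOT NULL
);

CREATE TABLE dbo.DimSalesTerritory
(
    TerritoryKey  int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    TerritoryName nvarchar(50)      NOT NULL
);

-- Partition function and scheme: spread fact rows across filegroups by year.
-- (Assumes filegroups FG2010, FG2011, and FG2012 already exist.)
CREATE PARTITION FUNCTION pfOrderDate (int)
    AS RANGE RIGHT FOR VALUES (20110101, 20120101);

CREATE PARTITION SCHEME psOrderDate
    AS PARTITION pfOrderDate TO (FG2010, FG2011, FG2012);

-- Fact table: numeric measures plus foreign keys to the dimension tables.
CREATE TABLE dbo.FactSales
(
    DateKey       int   NOT NULL REFERENCES dbo.DimDate (DateKey),
    TerritoryKey  int   NOT NULL REFERENCES dbo.DimSalesTerritory (TerritoryKey),
    SalesAmount   money NOT NULL,
    OrderQuantity int   NOT NULL
)
ON psOrderDate (DateKey);

-- A typical aggregation query: revenue by territory for a given month.
SELECT t.TerritoryName, SUM(f.SalesAmount) AS SalesRevenue
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON f.DateKey = d.DateKey
JOIN dbo.DimSalesTerritory AS t ON f.TerritoryKey = t.TerritoryKey
WHERE d.CalendarYear = 2012 AND d.MonthNumber = 6
GROUP BY t.TerritoryName;
```

Note how the query joins each dimension to the fact table only once; this is the shape of query that the star join optimization in SQL Server Enterprise is designed to accelerate.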

Hardware
The choice of hardware for your data warehouse solution can make a significant difference to the
performance, manageability, and cost of your data warehouse. The hardware considerations for a data
warehouse include:

Query processing requirements, including anticipated peak memory and CPU utilization.

Storage volume and disk input/output requirements.

Network connectivity and bandwidth.

Component redundancy for high availability.

You can choose to build your own data warehouse solution by purchasing and assembling individual
components, use a pretested reference architecture, or purchase a hardware appliance that includes
preconfigured components in a ready-to-use package. Factors that influence your choice of hardware
include:

Budget.

Existing enterprise agreements with hardware vendors.

Time to solution.

Hardware assembly and configuration expertise.


Note Hardware for data warehousing solutions is discussed in more detail in Module 2,
Data Warehouse Hardware.

High Availability and Disaster Recovery


A data warehouse can very quickly become a business-critical part of your overall application
infrastructure, so it is essential to consider how you will ensure its availability. SQL Server includes support
for several high-availability techniques including database mirroring and server clustering. You must
assess these technologies and choose the best one for your individual solution based on:

Failover time requirements.

Hardware requirements and cost.

Configuration and management complexity.

In addition to a server-level high-availability solution, you must also consider redundancy at the individual
component level for network interfaces and storage arrays.


The most robust high-availability solution cannot protect your data warehouse from every eventuality, so
you must also plan a suitable disaster recovery solution that includes a comprehensive backup strategy.
Your backup strategy should take into account:

The volume of data in the data warehouse.

The frequency of changes to data in the data warehouse.

The effect of the backup process on data warehouse performance.

The time to recover the database in the event of a failure.


For More Information For more information about high availability and disaster
recovery techniques for SQL Server, you should attend Course 10775A, Administering
Microsoft SQL Server 2012 Databases.

Security

Your data warehouse contains a huge volume of data that is typically commercially sensitive. In addition,
you may want to make some data available to all users, but restrict access to other data to a subset of
users.
Considerations for securing your data warehouse include:

The authentication mechanisms that you must support to provide access to the data warehouse.

The permissions that the various users who access the data warehouse will require.

The connections over which data is accessed.

The physical security of the database and backup media.


For More Information For more information about security features in SQL Server, you
should attend Course 10775A, Administering Microsoft SQL Server 2012 Databases.

Data Sources

You must identify the data sources that provide the data for your data warehouse, and consider the following factors when planning your solution.

Data Source Connection Types

Your data warehouse may require data from a variety of data sources. For each source, you must consider how your ETL process can connect and extract the required data. In many cases, your data sources will be relational databases for which you can use an OLE DB or Open Database Connectivity (ODBC) provider. However, some data sources may use proprietary storage that requires a bespoke provider or for which no provider exists. In this case, you must develop a custom provider or determine whether it is possible to export data from the data source in a format that the ETL process can easily consume (such as XML or comma-delimited text).

Credentials and Permissions

Most data sources require secure access in the form of user authentication and potentially individual permissions on the data. You must work with the owners of the data sources that you use in your data warehousing solution to establish:

Credentials that your ETL process can use to access the data source.

The required permissions to access the data that the data warehouse uses.

Data Formats

A data source may store data in a format that differs from the one that the data warehouse requires. Your solution must take into account issues arising from this, including:

Conversion of data from one data type to another (for example, extracting numeric values from a
text file).

Truncation of data when copying data to a destination that has a limited data length.

Date and time formats that are used in data sources.

Numeric formats, scales, and precisions.

Support for Unicode characters.
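As a sketch of how some of these conversion issues can be handled in Transact-SQL, the following example uses the TRY_CONVERT function (introduced in SQL Server 2012) to convert text values extracted from a flat file into typed values without failing the whole load. The staging table, columns, and schema name are hypothetical, chosen only for illustration.

```sql
-- Hypothetical staging table holding raw text values extracted from a flat file.
CREATE TABLE stg.RawPayments
(
    PaymentAmountText nvarchar(50) NULL,   -- numeric value stored as text
    PaymentDateText   nvarchar(50) NULL    -- date in a regional dd/mm/yyyy format
);

-- TRY_CONVERT returns NULL instead of raising an error for unconvertible values.
SELECT
    TRY_CONVERT(decimal(18,2), PaymentAmountText) AS PaymentAmount,
    TRY_CONVERT(date, PaymentDateText, 103)       AS PaymentDate  -- style 103 = dd/mm/yyyy
FROM stg.RawPayments;

-- Find the source rows that fail conversion so that they can be investigated.
SELECT PaymentAmountText
FROM stg.RawPayments
WHERE PaymentAmountText IS NOT NULL
  AND TRY_CONVERT(decimal(18,2), PaymentAmountText) IS NULL;
```

The second query is a simple way to surface format problems early, before truncation or type errors interrupt the data warehouse load.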

Data Acquisition Windows

Depending on the workload patterns of the business, each data source may have time periods where the
data source is unavailable or the level of usage is such that the additional overhead of a data extraction is
undesirable. When you plan a data warehousing solution, you must work with each data source owner to
determine appropriate data acquisition windows based on:

The workload pattern of the data source, and its resource utilization and capacity levels.

The volume of data to be extracted, and the time that it takes to extract it.

The frequency with which you need to update the data warehouse with fresh data.

If applicable, the time zones in which business users are accessing the data.

Extract, Transform, and Load Processes

A significant part of the effort in creating a data warehouse solution is the implementation of an ETL process. When you design an ETL process for a data warehousing solution, you must consider the following factors.

Staging

In some data warehousing solutions, you can transfer data directly from data sources to the data warehouse without any intermediary staging. However, in many cases, you should consider staging data to:

Synchronize a data warehouse refresh that includes source data that has been extracted during multiple data acquisition windows.

Perform data validation, cleansing, and deduplication operations on the data before it is loaded into the data warehouse.

Perform transformations on the data that cannot be performed during the data extraction or data flow processes.

If a staging area is required in your solution, you must decide on a format for the staged data. Possible formats include:

A relational database.

Text or XML files.

Raw files (binary files in a proprietary format of the ETL platform being used).

The decision on format is based on several factors including:

The need to access and modify the staged data.

The time that is taken to store and read the staging data.

Finally, if a relational database is used as the staging area, you must decide where this database will reside.
Possible choices include:

A dedicated staging server.

A dedicated SQL Server instance on the data warehouse server.

A dedicated staging database in the same instance of SQL Server as the data warehouse.

A collection of staging tables (perhaps in a dedicated schema) in the data warehouse database.

Factors that you should consider when deciding the location of the staging database include:

Server hardware requirements and cost.

The time that is taken to transfer data across network connections.

The use of Transact-SQL loading techniques that perform better when the staging data and data
warehouse are co-located on the same SQL Server instance.

The server resource overheads that are associated with the staging and data warehouse load
processes.
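As an illustration of the co-location point above, the following Transact-SQL sketch loads new rows from a staging database into a dimension table on the same SQL Server instance with a single set-based statement, so no rows cross the network. The Staging database, stg schema, and table and column names are assumptions made for this example; they are not the course's lab objects.

```sql
-- When the staging data and the data warehouse share a SQL Server instance,
-- a set-based INSERT...SELECT using a three-part name avoids network transfer.
INSERT INTO dbo.DimProduct (ProductAltKey, ProductName, ListPrice)
SELECT s.ProductID, s.Name, s.ListPrice
FROM Staging.stg.Product AS s              -- staging database on the same instance
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.DimProduct AS d
    WHERE d.ProductAltKey = s.ProductID    -- skip products already loaded
);
```

A load written this way runs entirely inside the database engine, which is one reason co-locating the staging database with the data warehouse can perform better than a dedicated staging server.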

Required Transformations

Most ETL processes require that the data that is being extracted from data sources is modified to match
the schema of the data warehouse. When you plan an ETL process for a data warehousing solution, you
must examine the source data and destination schema, and identify what transformations are required.
Then you must determine the optimal place within the ETL process to perform these transformations.
Choices for implementing data transformations include:

During the data extraction. For example, by concatenating two fields in a SQL Server data source into
a single field in the Transact-SQL query that is used to extract the data.

In the data flow. For example, by using a Derived Column data transformation task in a SQL Server
Integration Services data flow.

In the staging area. For example, by using a Transact-SQL query to apply default values to null fields
in a staging table.
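The first and third choices can be sketched in Transact-SQL as follows. The source and staging table names and columns are hypothetical, used only to show the shape of each technique; the data flow option would be implemented with a Derived Column transformation in SQL Server Integration Services rather than in code.

```sql
-- 1. Transformation during extraction: concatenate two source fields in the
--    extraction query itself (source table and columns are hypothetical).
SELECT c.CustomerID,
       c.FirstName + N' ' + c.LastName AS FullName
FROM Sales.Customer AS c;

-- 2. Transformation in the staging area: apply a default value to null fields
--    with a set-based Transact-SQL update before the warehouse load.
UPDATE stg.Customer
SET PhoneNumber = N'Unknown'
WHERE PhoneNumber IS NULL;
```

Both statements are set-based, which is typically the lowest-overhead option when the data already resides in a SQL Server table.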

Factors that affect the choice of data transformation technique include:

The performance overhead of the transformation. Typically, it is best to use the approach that has the
least performance overhead. Set-based operations that are performed in Transact-SQL queries usually
perform better than row-based transformations that are applied in a data flow.

The level of support for querying and updating in the data source or staging area. In cases where you
are extracting data from a comma-delimited file and staging it in a raw file, your options to perform
transformations are limited to row-by-row transformations in the data flow.

Dependencies on data that is required for the transformation. For example, you might need to look up
a value in one data source to obtain additional data from another data source. In this case, you must
perform the data transformation in a location where both data sources are accessible.

The complexity of the logic that is involved in the transformation. In some cases, a transformation may
require multiple steps and branches depending on the presence or value of specific data fields. In this
case, it is often easier to apply the transformation by combining several steps in a data flow than it is
to create a Transact-SQL statement to perform the transformation.

Incremental ETL


After the initial load of the data warehouse, you will usually need to incrementally load new or updated
source data into the data warehouse. When you plan your data warehousing solution, you must consider
the following factors that relate to incremental ETL:

How will you identify new or modified records in the data sources?

Do you need to delete records in the data warehouse when corresponding records in the data
sources are deleted? If so, will you physically delete the records, or simply mark them as inactive
(often referred to as a logical delete)?

How will you determine whether a record that is to be loaded into the data warehouse should be a
new record or an update to an existing record?

Are there records in the data warehouse for which historical values must be preserved by creating a
new version of the record instead of updating the existing record?
Note Managing data changes in an incremental ETL process is discussed in more detail in
Module 7, Implementing an Incremental ETL Process.
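One common way to decide between inserting a new record and updating an existing one, and to apply logical deletes, is a Transact-SQL MERGE statement. The sketch below uses hypothetical dimension and staging table names; it is an illustration of the pattern, not the course's lab implementation. Note that the logical-delete clause assumes the staging table holds a full extract of the source; with an incremental extract you would omit it.

```sql
MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerAltKey = src.CustomerID       -- match on the source business key
WHEN MATCHED THEN
    -- Existing record: update it in place.
    UPDATE SET tgt.EmailAddress = src.EmailAddress,
               tgt.Phone        = src.Phone
WHEN NOT MATCHED BY TARGET THEN
    -- New record in the source: insert it.
    INSERT (CustomerAltKey, EmailAddress, Phone, IsActive)
    VALUES (src.CustomerID, src.EmailAddress, src.Phone, 1)
WHEN NOT MATCHED BY SOURCE THEN
    -- Record deleted in the source: mark it inactive (a logical delete).
    UPDATE SET tgt.IsActive = 0;
```

Preserving historical values for a changed record (the last question above) requires a more involved pattern, in which the existing row is expired and a new version is inserted, rather than a simple update.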

Data Quality and Master Data Management

The usefulness of a data warehouse is largely determined by the quality of the data that it contains. For this reason, when you plan a data warehousing project, you should determine how you will ensure data quality and you should consider the use of a master data management solution.

Data Quality

To validate and enforce the quality of the data in your data warehouse, it is recommended that business users who have knowledge of each subject area that the data warehouse addresses take on the role of data steward for that area. A data steward is responsible for:

Building and maintaining a knowledge base that identifies common data errors and corrections.

Validating data against the knowledge base.

Ensuring that consistent values are used for data attributes where multiple forms of the value may be considered valid (for example, ensuring that a Country field always uses the value United States when referring to America, even though USA, The U.S. and America are also valid values).

Identifying and correcting missing data values.

Identifying and consolidating duplicate data entities (such as a customer record for Robert Smith and a customer record for Bob Smith that both refer to the same physical customer).

You can use SQL Server Data Quality Services to provide a data quality solution that helps the data steward to perform these tasks.

Note SQL Server Data Quality Services is discussed in more detail in Module 9, Enforcing Data Quality.
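The consistent-values rule described above can be sketched in plain Transact-SQL as a simplified stand-in for what a Data Quality Services knowledge base encodes. The dq schema, mapping table, and staging table names are assumptions made for this example.

```sql
-- Synonym mapping table maintained by the data steward (hypothetical).
CREATE TABLE dq.CountrySynonym
(
    RawValue      nvarchar(50) NOT NULL PRIMARY KEY,
    StandardValue nvarchar(50) NOT NULL
);

INSERT INTO dq.CountrySynonym (RawValue, StandardValue)
VALUES (N'USA',      N'United States'),
       (N'The U.S.', N'United States'),
       (N'America',  N'United States');

-- Replace valid-but-inconsistent values with the standard form during staging.
UPDATE c
SET c.Country = s.StandardValue
FROM stg.Customer AS c
JOIN dq.CountrySynonym AS s
    ON c.Country = s.RawValue;
```

Data Quality Services generalizes this idea, letting the steward maintain such domain rules interactively rather than in hand-written scripts.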

Master Data Management


It is common for large organizations to have multiple business applications, and in many cases, these
systems perform tasks that are related to the same business entities. For example, an organization may
have an e-commerce application that enables customers to purchase products, and a separate inventory
management system that also stores data about products. A record representing a particular product may
exist in both systems. It can be useful in this scenario to implement a master data management system
that provides an authoritative definition of each business entity (in this example, a particular product) that
you can use across multiple applications to ensure consistency.
In a data warehousing scenario, the use of master data management is especially important because it
ensures that the data in the data warehouse conforms to the agreed definition for the business entities
that will be included in any analysis and reporting solutions that it must support.
You can use SQL Server Master Data Services to implement a master data management solution.
Note SQL Server Master Data Services is discussed in more detail in Module 10, Using
Master Data Services.

Lab Scenario

The labs in this course are based on a fictional company named Adventure Works Cycles that manufactures and sells cycles and cycling accessories to customers all over the world. Adventure Works sells direct to customers through an e-commerce website, and also through an international network of resellers.
Throughout this course, you will develop a data warehousing solution for Adventure Works Cycles, including a data warehouse, an ETL process to extract data from source systems and populate the data warehouse, a data quality solution, and a master data management solution.

The lab for this module provides a high-level overview of the solution that you will create in later labs. Use this lab to become familiar with the various elements of the data warehousing solution that you will learn to build in later modules. Don't worry if you do not understand the specific details of how each component of the solution has been built; you will explore each element of the solution in greater depth later in the course, as described in the following table.
Lab   Tasks

      Explore the complete data warehousing solution that will be developed in this course.

      Create a data warehouse schema.

      Use SQL Server Integration Services to implement data flows that extract, load, and transform data.

5A    Implement control flow to perform sequential and iterative tasks in an ETL solution.

5B    Enhance the reliability of the ETL process with transactions and checkpoints.

      Debug an SSIS package and add logging and error handling functionality.

7A    Modify the ETL process to extract only data that has been modified since the previous load cycle.

7B    Modify the ETL process to insert or update data in the data warehouse as appropriate.

      Include a cloud-based data source in an ETL process.

9A    Use Data Quality Services to cleanse data before loading it into the data warehouse.

9B    Use Data Quality Services to deduplicate data before loading it into the data warehouse.

10    Use Master Data Services to manage data entity consistency across the enterprise.

11    Extend the ETL solution with custom components and scripts.

12    Deploy and manage a SQL Server Integration Services project.

13    Explore business intelligence solutions based on the data warehouse you have created.

The completed lab solution that you will create throughout this course is illustrated in the following
image.


Note The illustration includes a Master Data Services model for product data, a Data
Quality Services task to cleanse data as it is staged, and cloud data sources. These elements
form part of the complete solution for the lab scenario in this course, but they are not
present in this lab.

Lab 1: Exploring a Data Warehousing Solution

Exercise 1: Exploring Data Sources

Scenario

Adventure Works uses various software applications to manage different aspects of the business, and each application has its own data store. Specifically:

Internet sales are processed through an e-commerce web application.

Reseller sales are processed by sales representatives, who use a reseller sales application. Details of the sales employees themselves are stored in a separate human resources system.

Reseller payments are processed by an accounting application.

The senior sales executives use a SharePoint application to manage reseller account managers.

Products are managed in a product catalog and inventory system.

Some business partners, such as the marketing agency that Adventure Works uses to conduct marketing campaigns, provide data to Adventure Works through cloud-based data stores.

This distribution of data has made it difficult for business users to answer key questions about the overall performance of the business.

In this exercise, you will examine some of the data sources within Adventure Works that will be used in the data warehousing solution.

The main tasks for this exercise are as follows:

1. Prepare the lab environment.
2. View the solution architecture.
3. View the Internet Sales data source.
4. View the Reseller Sales data source.
5. View the Products data source.
6. View the Human Resources data source.
7. View the Accounts data source.
8. View the Regional Account Managers data source.
9. View the staging database.

X Task 1: Prepare the lab environment


Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to
MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

Run the Setup Windows Command Script file (Setup.cmd) in the D:\10777A\Labfiles\Lab01\Starter
folder as Administrator.

X Task 2: View the solution architecture

Use Paint to view the Adventure Works DW Solution.jpg JPEG image in the
D:\10777A\Labfiles\Lab01\Starter folder, and note the data sources in the solution architecture.

X Task 3: View the Internet Sales data source

Use Microsoft SQL Server Management Studio to open the View Internet Sales.sql Microsoft
SQL Server query file in the D:\10777A\Labfiles\Lab01\Starter folder. Use Windows authentication to
connect to the localhost instance of SQL Server.

Execute the query and examine the results. Note that this data source contains data about customers
and the orders that they have placed through the e-commerce web application.

X Task 4: View the Reseller Sales data source

Use SQL Server Management Studio to open the View Reseller Sales.sql Microsoft SQL Server query
file in the D:\10777A\Labfiles\Lab01\Starter folder.

Execute the query and examine the results. Note that this data source contains data about resellers
and the orders that they have placed through Adventure Works reseller account managers.

X Task 5: View the Products data source

Use SQL Server Management Studio to open the View Products.sql Microsoft SQL Server query file
in the D:\10777A\Labfiles\Lab01\Starter folder.

Execute the query and examine the results. Note that this database contains data about products that
Adventure Works sells, and that products are organized into categories and subcategories.

X Task 6: View the Human Resources data source


Use SQL Server Management Studio to open the View Employees.sql Microsoft SQL Server query file
in the D:\10777A\Labfiles\Lab01\Starter folder.

Execute the query and examine the results. Note that this database contains data about employees,
including sales representatives.

X Task 7: View the Accounts data source

Examine the comma-delimited text files in the D:\10777A\Accounts folder by opening them in
Microsoft Excel 2010, and note that they contain details of payments that resellers have made.

Close all files when you have finished reviewing them.

X Task 8: View the Regional Account Managers data source

Use Internet Explorer to view the SharePoint site at http://mia-sqlbi, and examine the Regional
Account Managers list. There is a link to the Regional Account Managers list in the Quick Launch
area of the SharePoint site home page.

X Task 9: View the Staging database

In SQL Server Management Studio, in the Object Explorer pane, examine the tables in the Staging
database in the localhost instance of SQL Server (ensure you examine the Staging database, not the
DQS_STAGING_DATA database).

Note that all tables other than dbo.ExtractLog in this database are empty.

Results: After this exercise, you should have viewed data in the InternetSales, ResellerSales, and
Products SQL Server databases; viewed payments data in comma-delimited files; viewed a list of regional
account managers in a SharePoint site; and viewed an empty staging database.

Exercise 2: Exploring an ETL Process


Scenario


Now that you are familiar with the data sources in the Adventure Works data warehousing solution, you
will examine the ETL process that is used to stage the data, and then load it into the data warehouse.
Adventure Works uses a solution based on SQL Server Integration Services to perform this ETL process.
The main tasks for this exercise are as follows:
1. View the solution architecture.
2. Run the ETL staging process.
3. View the staged data.
4. Run the ETL data warehouse load process.

X Task 1: View the solution architecture

Use Paint to view the Adventure Works DW Solution.jpg JPEG image in the
D:\10777A\Labfiles\Lab01\Starter folder, and note the ETL processes in the solution architecture.

X Task 2: Run the ETL staging process

Open the AdventureWorksETL.sln solution file in the D:\10777A\Labfiles\Lab01\Starter folder with
SQL Server Data Tools (which is a Microsoft Visual Studio-based development environment).

In the Solution Explorer pane, view the SSIS packages that this solution contains, and then
double-click Stage Data.dtsx to open it in the designer.

View the control flow of the Stage Data.dtsx package, and then run the package by clicking Start
Debugging on the Debug menu. The package will run other packages to perform the tasks in the
control flow. This may take several minutes.

When the package has finished running, a message box will be displayed. After viewing this message
box, stop the package by clicking Stop Debugging on the Debug menu.

Note The message box may be hidden by the Visual Studio window. Look for a new icon
on the taskbar, and then click it to bring the message box to the front.

X Task 3: View the staged data

Use SQL Server Management Studio to view the Staging database in the localhost instance of SQL
Server (take care to view the Staging database, not the DQS_STAGING_DATA database).

Note that some of the tables now contain data.

X Task 4: Run the ETL data warehouse load process

In Visual Studio, in the Solution Explorer pane, view the SSIS packages that the AdventureWorksETL
solution contains, and then double-click Load DW.dtsx to open it in the designer.

View the control flow of the Load DW.dtsx package, and then run the package by clicking Start
Debugging on the Debug menu. The package will run other packages to perform the tasks in the
control flow. This may take several minutes.

When the package has finished running, a message box will be displayed. After viewing this message
box, stop the package by clicking Stop Debugging on the Debug menu.

Note The message box may be hidden by the Visual Studio window. Look for a new icon
on the taskbar, and then click it to bring the message box to the front.

Results: After this exercise, you should have viewed and run the SQL Server Integration Services packages
that perform the ETL process for the Adventure Works data warehousing solution.

Exercise 3: Exploring a Data Warehouse


Scenario
Now that you have explored the ETL process that is used to populate the Adventure Works data
warehouse, you can explore the data warehouse itself to see how it enables business users to view key
business information.
The main tasks for this exercise are as follows:
1. View the solution architecture.
2. Query the data warehouse.

X Task 1: View the solution architecture

Use Paint to view the Adventure Works DW Solution.jpg JPEG image in the
D:\10777A\Labfiles\Lab01\Starter folder, and note the data warehouse in the solution architecture.

X Task 2: Query the data warehouse


Use SQL Server Management Studio to open the Query DW.sql Microsoft SQL Server query file in the
D:\10777A\Labfiles\Lab01\Starter folder.

Use Windows authentication to connect to the localhost instance of SQL Server, and execute the
query in the AWDataWarehouse database.

Execute the query and examine the results. Note that the data warehouse contains the data necessary
to view key business metrics across multiple aspects of the business.

Results: After this exercise, you should have successfully retrieved business information from the data
warehouse.

Module Review and Takeaways

Review Questions

1. Why might you consider including a staging area in your ETL solution?
2. What options might you consider for performing data transformations in an ETL solution?
3. Why would you assign the data steward role to a business user rather than a database technology specialist?

For More Information For more information about Best Practices for Data Warehousing
with SQL Server 2008 R2, see http://go.microsoft.com/fwlink/?LinkID=246719.


Module 2
Data Warehouse Hardware
Contents:
Lesson 1: Considerations for Building a Data Warehouse 2-3
Lesson 2: Data Warehouse Reference Architectures and Appliances 2-11

Module Overview

The hardware choices that you make when designing and building a data warehouse will directly affect
the performance that the data warehouse delivers. Therefore, it is very important to identify the right
hardware at an early stage in the design process. However, data warehouse workloads differ significantly
from the workloads of transactional systems, and it is not always obvious what the best approach to
hardware design might be for any given situation.

This module describes the characteristics of typical data warehouse workloads, and explains how you can
use reference architectures and data warehouse appliances to ensure that you build the system that is
right for your organization.

After completing this module, you will be able to:

Describe the main hardware considerations for building a data warehouse.

Explain how to use reference architectures and data warehouse appliances to create a data
warehouse.

Lesson 1

Considerations for Building a Data Warehouse

To build a data warehouse that meets the requirements of your organization, it is important that you
understand the characteristics of typical data warehouse workloads, how hardware affects data warehouse
performance, and what options are available to you for implementing a data warehouse solution.

This lesson describes data warehouse workloads and explains how they differ from the workloads that
transactional databases handle. It also explains how hardware affects data warehouse performance, and
describes the choices that you have for building a data warehouse.

After completing this lesson, you will be able to:

Describe data warehouse workloads.

Describe the typical components of a data warehouse system.

Describe the considerations for data warehouse hardware.

Describe the options for implementing data warehouse hardware.

Data Warehouse Workloads

A data warehouse might contain millions of rows of data, and will increase in size with every data load. A
typical data warehouse query involves selecting, summarizing, grouping, and filtering rows to return a
range of data that might itself consist of a large subset of the rows in the database. For example, a
business analyst might issue a query that returns a summary of sales for a particular product between
two defined dates. Depending on the dates that the analyst chooses, the query might require Microsoft
SQL Server to access hundreds of thousands or even millions of rows. This is quite different to the way
that an online transaction processing (OLTP) database is generally used. With OLTP databases, most
activity involves the addition of new rows, and the updating or deleting of existing rows. Users usually
work with data in OLTP databases a few rows at a time; therefore, administrators must optimize the
database for the retrieval of small numbers of rows, for example by creating nonclustered indexes.

The different characteristics of data warehouse queries require a different approach to hardware and
software configuration than for OLTP databases. Generally, you should optimize data warehouses for
sequential disk input/output (I/O) activity, which involves reading rows from the disk in the order that
they are requested. For example, if most queries request data for ranges of dates, then you can store the
data in date order, which enables the data to be read from the disk as a sequence. You should also keep
the following points in mind when considering data warehouse workloads:

Queries typically scan large numbers of rows. Scanning instead of seeking to retrieve rows is more
efficient when a large number of rows is involved, particularly when those rows are stored
sequentially on the disk. For example, in a fact table that stores rows ordered by date, it is possible
to process queries for date ranges by accessing the data sequentially.

Data warehouses contain relatively static data. The contents of a data warehouse typically remain
static between each bulk loading of data because users rarely perform update or delete operations.
Consequently, database fragmentation is minimized and data remains in the same sequential order
on the disk, which improves scanning performance.


Nonclustered indexes can decrease performance. Although nonclustered indexes can speed up
queries that return a small number of rows, for queries that return large datasets nonclustered
indexes can reduce response times because of the random I/O scans that their use generates. In
addition, nonclustered indexes require maintenance and must be rebuilt every time you load data,
which adds considerable management and processing overheads, and which can be problematic
when you have very narrow processing windows available.

Partitioning can improve query response times. Partitioning enables faster processing of data because
it reduces contention and can reduce the number of rows included in a table scan. Using partitions
also simplifies management of sets of data in the data warehouse and helps to minimize
fragmentation.
Note Workloads can vary significantly between data warehouses, so it is important to
assess each data warehouse independently and not to assume that the considerations
outlined above will apply in every case.
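To make the scan-versus-seek point concrete, the following Python sketch (a simplified model of date-ordered storage, not SQL Server's actual access methods; the sample rows are hypothetical) answers a date-range query by reading one contiguous slice of rows rather than seeking to each row individually:

```python
from bisect import bisect_left, bisect_right

# Hypothetical fact rows kept in date order: (date, sales_amount).
rows = [
    ("2012-01-01", 100.0),
    ("2012-01-02", 250.0),
    ("2012-01-03", 75.0),
    ("2012-02-01", 300.0),
    ("2012-02-15", 125.0),
]
dates = [d for d, _ in rows]

def range_total(start, end):
    """Sum sales for start <= date <= end by scanning one contiguous slice."""
    lo = bisect_left(dates, start)   # first row in the range
    hi = bisect_right(dates, end)    # one past the last row in the range
    return sum(amount for _, amount in rows[lo:hi])

print(range_total("2012-01-01", "2012-01-31"))  # 425.0
```

Because the rows are kept in date order, the whole range is one sequential read; the same layout on disk lets the storage subsystem stream the range instead of performing scattered seeks.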

Data Warehouse System Architecture

While you can install database software on practically any computer hardware and use it as a data
warehouse, you will realize optimal performance and effectiveness by using a system architecture that is
optimized for data warehouse workloads.

Choosing the right components for your data warehouse is not just about purchasing the fastest storage
solution or as much memory as possible. To build an effective data warehouse solution, you must balance
these components together so that a single component does not become a bottleneck in the system and
slow down overall throughput. Additionally, you must balance the hardware specification for your data
warehouse against the cost of the components. Over-specifying the hardware configuration for your
data warehouse may result in expensive, under-utilized hardware that exceeds the requirements for
your data warehouse workload.

Software

SQL Server data warehouses should be based on the Datacenter or Enterprise edition of Windows Server
and SQL Server 2012 Enterprise Edition. These editions of the software enable your data warehouse
system to take maximum advantage of hardware resources such as memory and storage, as well as
enterprise-level capabilities such as server clustering for high availability.

Depending on the specific server and storage hardware used, you may also benefit from hardware
vendor-specific management software to help configure, monitor, and operate the data warehouse
hardware.

Server Hardware
A data warehouse requires appropriate server hardware to manage its workload. In most enterprise
scenarios, a data warehouse is implemented as one or more server nodes in a rack, and includes the
following hardware resources:


Processors. The number of processor cores and processor speed can be limiting factors where the
total processing capacity is not great enough to handle the throughput from the other components
in the system. However, adding more or faster processors will only improve performance if the other
components of the system can pass data to and from the processors at fast enough speeds.

Memory. Memory aids performance in various ways, for example by enabling SQL Server queries to
be answered from cache, or by enabling join and sort operations to be performed more efficiently.
When there is insufficient memory present, sort and join operations can utilize disk space, which
reduces the available disk capacity and can cause fragmentation.

Storage
While it is possible to host a data warehouse on internal hard-disks in a database server, in enterprise
scenarios it is more common to use a dedicated storage subsystem that includes:

Enclosures. Storage enclosures include on-board disk controllers that manage redundant array of
independent disks (RAID) storage across multiple disks. The server is connected to the storage
enclosures through a direct access connection to a host adapter, or more commonly through a
network connection.

Disk arrays. Each enclosure in a data warehouse system contains multiple hard disks, usually
configured as RAID 10 arrays. The number and speed of disks in a storage array can affect the
performance of the data warehouse. You can choose from a number of disk form factors depending
on your requirements for storage capacity, physical size, and read/write performance. Some disk form
factors to consider include:

Serial Attached SCSI (SAS) magnetic disk drives in large form-factor (LFF) or small form-factor
(SFF). SAS disks offer large storage capacities and sufficient read/write performance for data
warehouse workloads.

Solid State Drive (SSD), a storage device that uses solid state memory instead of a spinning disk.
The lack of moving parts makes SSDs robust and reduces access time when reading data,
increasing overall performance. Additionally, SSDs typically require less power than mechanical
disks. However, the cost-per-gigabyte of SSDs is typically higher than that of SAS disks.

Disks are often the cause of bottlenecks in data warehouses because if the storage system does not
have enough drives, or the drives are not fast enough, throughput to the other components in the
system is lower, and performance will suffer. Additionally, storage requirements for a data warehouse
typically grow considerably over time, so you must plan for extensibility.
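As a quick illustration of the RAID 10 arrays mentioned above, the sketch below estimates usable capacity: RAID 10 mirrors every disk, so roughly half of the raw capacity is available for data. This is a simplified estimate (real arrays also reserve space for hot spares and formatting), and the function name and figures are illustrative only:

```python
def raid10_usable_tb(disk_count, disk_size_tb):
    """Estimate usable RAID 10 capacity: mirroring halves the raw capacity.
    Simplified: ignores hot spares, formatting, and filesystem overhead."""
    if disk_count % 2 != 0:
        raise ValueError("RAID 10 requires an even number of disks")
    return (disk_count // 2) * disk_size_tb

# An enclosure of 24 x 1 TB SAS disks yields about 12 TB of usable space.
print(raid10_usable_tb(24, 1.0))  # 12.0
```

Planning for growth means sizing the array for future data volumes, not just the initial load, since halving the raw capacity makes under-provisioning easy.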


Networking
When implementing networking for a data warehouse system, you must consider two network
connections:


Storage connectivity. The data warehouse server is typically connected to the storage subsystem
through a network connection. In most enterprise data warehouse systems, a fibre channel switch is
used to provide high-speed connectivity between a host bus adapter (HBA) on the server and the
storage enclosures.

External network connectivity. In addition to the internal connection between the data warehouse
server and its storage subsystem, you must consider how you will connect the data warehouse to an
external network so that client applications can connect and use the data warehouse. The type of
network connectivity used for client access to the data warehouse depends on the network topology
of your organization's local area network (LAN), but you should use a networking technology that
provides adequate bandwidth and throughput for the volumes of data that will be loaded into the
data warehouse by the extract, transform, and load (ETL) process, and retrieved by client applications.

Options for Implementing a Data Warehouse

To build a data warehouse that meets the reporting and data analysis demands of your organization
requires careful planning. A data warehouse is not simply a modified version of a transactional database.
The design considerations for data warehouses are quite different to those for OLTP systems. When
deciding how to approach building a data warehouse, you should consider several factors, including the
available budget, the planned delivery date for the completed solution, and whether your organization
has individuals who have the right skills and experience to design and build a data warehouse.

Custom-Build Solution

Custom-build solutions generally take the greatest amount of time to complete. They also require the
organization to design, assess, assemble, and test everything in-house. Therefore, the organization must
either already employ individuals with the necessary skills, or hire them. Although the apparent cost of a
custom-build solution might be less than for reference architecture or appliance-based solutions,
extended development times and hiring skilled individuals can significantly increase those costs.
Furthermore, there is a risk that despite the planning and testing that you perform, a self-built system
might not be capable of meeting the demands placed on it. This is particularly a risk when the individuals
involved have limited experience with data warehouse implementations and limited knowledge of data
warehouse architecture.

Reference Architectures

The purpose of data warehouse reference architectures is to minimize the risk of failure, reduce costs, and
to speed up the time to delivery for the solution. A reference architecture is essentially a blueprint that
enables you to create a data warehouse that is based on a tried and tested design, reducing the design
time and level of knowledge and expertise that an organization requires. Microsoft Fast Track Data
Warehouse is a set of reference architectures that are based on the SQL Server platform. The Fast Track
reference architectures use a range of dedicated hardware configurations that are designed to suit many
different requirements, enabling companies to get their data warehouse up and running quickly and in a
cost-effective manner.

Appliances


A data warehouse appliance is a pre-built system that is designed and optimized for data warehousing.
Appliances include servers, storage hardware, an operating system, and a database management system
(DBMS). Data warehouse appliances can be based on symmetric multiprocessing (SMP) hardware
architectures, or increasingly very powerful, massively parallel processing (MPP) systems that are targeted
at large organizations. Because every component in an appliance is already built and configured, they
offer the simplest implementation experience, but can be a less flexible solution than self-build or
reference architectures.

Lesson 2

Data Warehouse Reference Architectures and Appliances

Building a data warehouse by sourcing and testing hardware components yourself can be a complex,
expensive, and time-consuming process. Reference architectures and data warehouse appliances simplify
the process of choosing data warehouse hardware, helping you to stay on budget and on schedule, and
to create a data warehouse that genuinely meets the needs of your company.

After completing this lesson, you will be able to:

Explain the benefits of Fast Track reference architectures for building a data warehouse.

Explain the benefits of using a data warehouse appliance for your data warehouse.

Describe the key features of a Parallel Data Warehouse appliance.

Fast Track Data Warehouse

Validated Hardware Configurations

Fast Track Data Warehouse enables organizations to create a data warehouse that is based on a validated
design in a time-efficient and cost-efficient way. Fast Track Data Warehouse includes a set of validated
systems from a range of well-known hardware vendors. These systems use standard hardware, which
reduces costs, and the range of vendors enables companies who already have relationships with a
particular vendor to remain with that vendor if they choose. When selecting a data warehouse system
from the pre-existing configurations, you can choose either a basic evaluation or a full evaluation. The
basic evaluation involves a workload-based assessment of the data warehouse requirements. This process
is relatively brief, and enables companies to get their data warehouses up and running very quickly. The
full evaluation option involves undertaking a more rigorous assessment of workloads, which results in a
longer wait time, but delivers a system that is likely to meet the organization's requirements. Furthermore,
a full evaluation can reduce the cost of hardware if testing reveals that, for example, a less powerful
system is required than a basic evaluation would have recommended.

In addition to selecting from the pre-configured systems, you can use the Fast Track methodology to help
you design and build your own data warehouse. The Fast Track methodology enables you to profile
workloads and identify benchmarks so that you can be confident in the design that you create, but this
approach can be time-consuming and requires technical knowledge and experience to ensure success.

Balanced Hardware

Fast Track Data Warehouse configurations balance the component parts of the system to ensure that it
achieves optimal throughput and that no bottlenecks are accidentally created that will impede
performance. A balanced approach starts with the processors, evaluating the amount of data that each
core can process as it is fed in, and the other components are balanced against this. In addition to
identifying the optimal hardware setup for a given scenario, Fast Track Data Warehouse also provides
recommendations for the configuration of SQL Server, including Resource Governor, partitions, indexes,
and data compression, as well as recommendations on how to perform data loads without disrupting the
sequential organization of data on the disks.
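The balanced-hardware idea can be sketched numerically: end-to-end throughput is capped by the slowest component in the chain, so spending on faster processors is wasted if, say, the disk arrays cannot keep them fed. The per-component figures below are hypothetical:

```python
# Hypothetical sustained throughput (MB/s) for each component in a candidate design.
component_throughput = {
    "cpu_cores":    2000,   # aggregate consumption rate of all processor cores
    "hba":          1600,   # host bus adapter bandwidth
    "fibre_switch": 1500,
    "disk_arrays":  1200,
}

def system_throughput(components):
    """A balanced system is limited by its slowest component."""
    return min(components.values())

bottleneck = min(component_throughput, key=component_throughput.get)
print(system_throughput(component_throughput), bottleneck)  # 1200 disk_arrays
```

In this example the disk arrays cap the system at 1200 MB/s; a balanced design would either add spindles or scale the other components down to match.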

Fast Track System Sizing Tool


You can use the Fast Track System Sizing Tool to help you to get a basic understanding of the type of
system that you might require. The Fast Track System Sizing Tool is a Microsoft Excel document into
which you can enter maximum consumption rate (MCR), number of concurrent sessions, and data
capacity requirements values, and it will calculate the approximate number of processor cores and storage
units that are required to satisfy these requirements. MCR is a measure of throughput in MB per second.
To calculate MCR, you should execute a predefined, read-only query from the buffer cache and measure
the time it takes to execute the query and the amount of data processed.

For More Information For more information about SQL Server Fast Track Data
Warehousing for Microsoft SQL Server 2008 R2, see http://go.microsoft.com/fwlink/?LinkID=246719.
You can also download the Fast Track System Sizing Tool from this website.
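As a rough sketch of the kind of arithmetic involved (the formulas here are simplified assumptions for illustration, not the Fast Track System Sizing Tool's actual calculations):

```python
import math

def mcr_mb_per_sec(data_processed_mb, query_seconds):
    """Maximum consumption rate: data processed per second by a benchmark
    query executed from the buffer cache (a simplified definition)."""
    return data_processed_mb / query_seconds

def cores_needed(target_scan_mb_per_sec, per_core_mcr):
    """Rough core count so that aggregate consumption meets the target scan rate."""
    return math.ceil(target_scan_mb_per_sec / per_core_mcr)

# Hypothetical benchmark: 1800 MB processed in 9 seconds on one core.
per_core = mcr_mb_per_sec(data_processed_mb=1800, query_seconds=9.0)
print(per_core, cores_needed(1500, per_core))  # 200.0 8
```

A target scan rate of 1500 MB/s with a 200 MB/s per-core MCR suggests roughly eight cores; the real tool also factors in concurrent sessions and data capacity.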

Data Warehouse Appliances

While Fast Track Data Warehouse reference architectures can reduce the time and effort taken to
implement a data warehouse, organizations still require technical expertise to assemble the solution. To
reduce the technical burden on organizations that need a data warehouse, and reduce the time it takes to
implement a solution, Microsoft has partnered with hardware vendors to create pre-configured data
warehouse appliances that you can procure with a single purchase.

The data warehouse appliances that are available from Microsoft and its hardware partners are based on
tested configurations, including Fast Track reference architectures, and can significantly reduce the time it
takes to design, install, and optimize a data warehouse system.

Data warehouse appliances based on Fast Track Data Warehouse reference architectures are available for
organizations or departments that need to deploy a data warehouse solution quickly and with minimal
installation and configuration effort. Additionally, large organizations that need an enterprise data
warehouse can purchase an appliance based on SQL Server Parallel Data Warehouse for extreme
scalability and performance.

Data warehouse appliances form part of a range of SQL Server-based appliances that Microsoft and its
hardware partners have developed for common database workloads. Other types of appliance include
business decision appliances that provide self-service business intelligence (BI) capabilities, and database
server consolidation appliances that use virtualization technologies to create a private cloud infrastructure
for database servers. SQL Server-based appliances are available from multiple hardware vendors, and
include technical support for the entire appliance, including software and hardware.

For More Information For more information about SQL Server 2008 R2 Data Warehouse
and Business Intelligence Appliances, see http://go.microsoft.com/fwlink/?LinkID=246721.

Parallel Data Warehouse Appliances

Fast Track Data Warehouse systems and appliances that are based on them use a symmetric
multiprocessing (SMP) architecture. With SMP systems, the system bus is the limiting component that
prevents scaling up beyond a certain level. As the number of processors and the data load increases, the
bus can become overloaded and become a bottleneck. For data warehouses that require greater
scalability than an SMP system can provide, you can use an enterprise data warehouse appliance based on
Microsoft SQL Server Parallel Data Warehouse.

SQL Server Parallel Data Warehouse

Microsoft SQL Server Parallel Data Warehouse is an edition of SQL Server that is only available as a
preinstalled and configured solution in enterprise data warehouse appliances from Microsoft and its
hardware partners. Parallel Data Warehouse is designed specifically for extremely large-scale data
warehouses that need to store and query hundreds of terabytes of data.

Massively Parallel Processing

Parallel Data Warehouse uses a shared-nothing, massively parallel processing (MPP) architecture, which
delivers improved scalability and performance over SMP systems. MPP systems deliver much better
performance than SMP servers for large data loads. MPP systems use multiple servers, called nodes, which
process queries independently in parallel. Parallel processing involves distributing queries across the
nodes so that each node processes only a part of the query; the results of the partial queries are
combined after processing completes to create a single result set.

Shared-Nothing Architecture

Systems that use shared components, such as memory or disk storage, can suffer from performance issues
because of contention for those shared components. Contention occurs when multiple nodes attempt to
access a component at the same time, and it usually results in degraded performance as nodes queue to
access resources. Shared-nothing architectures eliminate contention because each node has its own
dedicated set of hardware, which is not used by the other nodes. Removing contention from a system
results in improved performance, and enables it to handle larger workloads.

Control Nodes, Compute Nodes, and Storage Nodes


A Parallel Data Warehouse appliance consists of a server that acts as the control node, and multiple
servers that act as compute nodes and storage nodes. Each compute node has its own dedicated
processors and memory, and is associated with a dedicated storage node. A dual InfiniBand network connects
the nodes together, and dual fiber channels link the compute nodes to the storage nodes. The control
node intercepts incoming queries, divides each query into multiple smaller operations, and then passes
these on to the compute nodes to process. Each compute node returns the results of its processing back
to the control node. The control node integrates the data to create a result set, which it then returns to
the client.

Control nodes are housed in a rack called the control rack. There are three other types of nodes that share
this rack with the control node:

Management nodes, through which administrators manage the appliance.

Landing Zone nodes, which act as staging areas for data that you load into the data warehouse by using an extract, transform, and load (ETL) tool.

Backup nodes, which back up the data warehouse.

Compute nodes and storage nodes are housed in a separate rack called the data rack. To scale the appliance, you can add more racks as required. Hardware components, including control and compute nodes, are duplicated to provide redundancy.
You can use a Parallel Data Warehouse appliance as the hub in a hub and spoke configuration, and
populate data marts directly from the data warehouse. Using a hub and spoke configuration enables you
to integrate the appliance with existing data marts or to create local data marts as required. If you use
Fast Track Data Warehouse systems to build the data marts, you can achieve very fast transfers of data
between the hub and the spokes.
For More Information: For more information about the Parallel Data Warehouse for Microsoft SQL Server 2008 R2, see http://go.microsoft.com/fwlink/?LinkID=246722.

Module Review

Review Questions

1. How do data warehouse workloads differ from typical OLTP workloads?

2. What are the advantages of using reference architectures to create a data warehouse?

3. What are the key differences between SMP and MPP systems?

Module 3
Designing and Implementing a Data Warehouse
Contents:
Lesson 1: Logical Design for a Data Warehouse    3-3
Lesson 2: Physical Design for a Data Warehouse    3-17
Lab 3: Implementing a Data Warehouse Schema    3-27

Module Overview

A data warehouse provides a centralized source of data for reporting and analysis. In most cases, a data warehouse must store extremely large volumes of data and provide users with fast responses to complex queries. It is therefore imperative that you implement a data warehouse by using design principles that optimize data storage efficiency and query performance.

In this module, you will learn how to implement the logical and physical architecture of a data warehouse based on industry-proven design principles.

After completing this module, you will be able to:

Implement a logical design for a data warehouse.

Implement a physical design for a data warehouse.

Lesson 1

Logical Design for a Data Warehouse

The logical schema of a data warehouse plays an important role in determining its effectiveness as a source of reporting and analytical data. A data warehouse is used primarily to answer questions about the business, and must therefore be optimized for data read operations. Optimizing the data in this way makes data warehouses fundamentally different from online transaction processing (OLTP) databases used by business applications, which usually need to handle a combination of data read and write operations.

Although there are several data warehouse design methodologies, the dimensional modeling approach that this lesson describes is an industry-proven technique for creating effective data warehouses.
After completing this lesson, you will be able to:

Describe the key principles of dimensional modeling.

Design a star schema for a data warehouse.

Design and implement dimension tables.

Design and implement fact tables.

Design a snowflake schema for a data warehouse.

Design and implement a time dimension table.

Introduction to Dimensional Modeling

A data warehouse is designed to support reporting and analysis that answers key questions about the business. In most cases, the questions that business executives and information workers ask are concerned with numerical measures (such as sales revenue, cost, profit, or stock level) aggregated by various key aspects, or dimensions, of the business (such as products, customers, employees, or fiscal time periods). For example, it is common for business executives to request reports that show measures such as:

Sales revenue by salesperson

Profit by product line

Order quantity by product

Cost by product

Sales revenue by customer

Profit by region

Sales revenue by a time period such as fiscal quarter

In each of these examples, the required information consists of a numerical business measure, aggregated by a different dimension of the business. This approach to modeling a data warehouse is called dimensional modeling.

The first step in designing a dimensional model for a data warehouse is to determine the questions that the business users want the data warehouse to provide answers to. Compiling a list of these questions will help you identify the numerical measures (sometimes referred to as facts) and dimensions that the data warehouse must support.

Star Schemas

When you have identified the measures and dimensions that your data warehouse must support, you can start to design the logical schema of the database. A common technique for data warehouse design is to use a star schema, in which:

Related dimensions are grouped into one or more dimension tables.

Related measures are grouped into one or more fact tables.

Fact tables are related to dimension tables by a foreign key.

Fact tables are generally related to multiple dimension tables, creating a schema that can be visualized with the fact table at the center and each dimension table as the point of a star.

Fact tables generally store rows that contain numerical measures involved in a discrete business event, such as a sales order or an account transaction. The dimension tables store data about the business entities that are involved in these events, such as the customer or salesperson. In addition, most data warehouses include a dimension table that stores temporal (time-based) data, so that you can identify when a particular fact event occurred.
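A star schema of this kind can be sketched in Transact-SQL. The following tables are purely illustrative; the names (DimCustomer, DimDate, FactSalesOrders) and column lists are assumptions for this example, not taken from the course files. They show denormalized dimension tables related to a central fact table by foreign keys:

```sql
-- Illustrative dimension table: one denormalized table per business entity.
CREATE TABLE DimCustomer
(
    CustomerKey int NOT NULL PRIMARY KEY,   -- surrogate key
    CustomerAltKey nvarchar(10) NOT NULL,   -- business (alternate) key
    CustomerName nvarchar(50) NOT NULL,
    City nvarchar(50) NULL
);

CREATE TABLE DimDate
(
    DateKey int NOT NULL PRIMARY KEY,       -- e.g. 20110101 for January 1, 2011
    FullDate date NOT NULL,
    MonthName nvarchar(12) NOT NULL,
    CalendarYear int NOT NULL
);

-- The fact table sits at the center of the star, with a foreign key to each
-- dimension table and columns for the numeric measures.
CREATE TABLE FactSalesOrders
(
    CustomerKey int NOT NULL REFERENCES DimCustomer (CustomerKey),
    OrderDateKey int NOT NULL REFERENCES DimDate (DateKey),
    OrderNo int NOT NULL,
    SalesAmount money NOT NULL
);
```

Queries can then join each dimension to the fact table directly, which is what gives the star schema its simple, read-optimized shape.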

Considerations for Dimension Tables

Dimension tables contain the business attributes by which users may want to aggregate the measures in the fact tables. When implementing dimension tables, you should consider two important aspects of dimension table design: denormalization and keys.

Denormalization

In most data warehouses, the dimension tables are often wide, meaning that they include a potentially large number of columns to support the attributes by which users want to view the data. In some cases, this design can lead to a large amount of duplication in the table. For example, consider a dimension table named DimSalesPerson that stores information about sales employees where each sales employee is based in a particular store. Implementing this dimension as a single dimension table may result in the following data.
SalesPersonKey  EmployeeNo  SalesPersonName  StoreName     StoreCity    StoreRegion
…               S1201       Ellen Adams      West Seattle  Seattle      Washington
…               S1343       Jeff Price       West Seattle  Seattle      Washington
…               L1214       Don Hall         Hollywood     Los Angeles  California
…               L1567       Jane Dow         Hollywood     Los Angeles  California


In an OLTP database, you would typically normalize this table to eliminate redundancy by creating a
separate Store table. You could even further normalize the data by creating separate City and Region
tables. You could then retrieve the salesperson, store name, city, and region in a single query that uses
JOIN clauses to link the tables. The primary advantages to this approach in an OLTP solution are:

Any modifications to a store name, city, or region can be limited to updating a single value in a single
row, rather than updating the same store name or region in multiple rows.

The data requires less physical storage space.

However, join operations can slow query performance, and in a data warehouse solution, query
performance is generally more important than saving disk space. In addition, because data warehouse
workloads generally involve few or no updates, the value of normalizing the data is diminished. For these
reasons, dimension tables in a data warehouse are generally denormalized to optimize query
performance, and therefore usually contain duplicate values.

Keys

Each row in a dimension table is uniquely identified by a primary key. In most data warehousing solutions,
the data for the dimensions in the data warehouse originates in a business application, where it may
already have a key assigned. For example, the sales employee records in the DimSalesPerson table that
was described earlier in this topic may have been extracted from an existing human resources database,
where each employee is identified by a unique employee number. In a data warehouse, keys that are
assigned in the source system are generally referred to as business keys.
It may seem sensible, therefore, to reuse the existing source business key in the data warehouse. However,
the best practice is generally to define a new key, known as a surrogate key, for the rows in the dimension
table. This is for the following reasons:

The dimension table may contain records that originate from multiple source systems. In this case,
there is no guarantee that the source system keys are unique or of compatible data types.

The business key used in the source system could be a complex string or a globally unique identifier (GUID) data type. Although it is possible to use such values as primary keys in a data warehouse, simple integer
keys usually result in better query performance when joins must be made between fact and
dimension tables. This generally makes it more effective to create a new integer surrogate key than to
use the source business key.

Data warehouses deal with historical data, and you should anticipate potential changes in dimension
attributes. For example, an employee may transfer from the West Seattle store to the Hollywood
store, but your data warehouse must still reflect sales by that employee prior to the transfer as being
related to the West Seattle store, while sales after the transfer should be related to the Hollywood
store. To accomplish this, you need two versions of the salesperson record in the DimSalesPerson
table, and if the employee number was used as the primary key, this would result in a unique key
violation.
Note: A dimension that retains historical versions while reflecting updates in source data as described above is known as a slowly changing dimension. Slowly changing dimensions are discussed in Module 7, Implementing an Incremental ETL Process.

In most cases, your dimension tables should include a unique surrogate key as the primary key for the
table, with the original business key retained as a column in the dimension table as a second identifier for
the dimension business entity. For this reason, the business key is sometimes referred to as the alternative
key.
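This practice can be sketched in Transact-SQL as follows. The table and column names are illustrative assumptions: the surrogate key is a simple IDENTITY integer, and the source system's business key is retained as an ordinary column.

```sql
CREATE TABLE DimSalesPerson
(
    -- Surrogate key: a simple integer generated by the data warehouse.
    SalesPersonKey int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    -- Business (alternate) key from the source HR system. It is deliberately
    -- not unique here, because a slowly changing dimension may hold multiple
    -- historical versions of the same employee.
    EmployeeNo nvarchar(10) NOT NULL,
    SalesPersonName nvarchar(50) NOT NULL,
    StoreName nvarchar(50) NOT NULL,
    StoreCity nvarchar(50) NOT NULL,
    StoreRegion nvarchar(50) NOT NULL
);
```

With this design, a transferred employee simply gets a second row with a new SalesPersonKey but the same EmployeeNo, avoiding the unique key violation described above.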


Conformed Dimensions


In most cases, data warehouses include conformed dimensions. A conformed dimension is a dimension
table representing a business entity that has the same meaning for all fact tables. For example, a date
dimension table usually contains values for calendar and fiscal dates that are applicable across the entire
organization.

In some cases, some users of the data warehouse may use different definitions for business entities than other users. For example, the manufacturing division might require a different definition for a product
than all other departments. In this scenario, you can either create a separate non-conformed product
dimension for the manufacturing division to use, or add additional columns to the product dimension to
satisfy the needs of the manufacturing division while maintaining a single, conformed product dimension.

Considerations for Fact Tables

Fact tables contain the business measures that can be aggregated across dimensions. When you implement fact tables, you must consider the level of detail stored in the table, the key columns in the table, the measures in the table, and any additional dimension data that needs to be stored in the fact table.

Grain

One of the most important considerations for a fact table is the granularity, or grain, of the measures that it contains. For example, consider a data warehouse in which sales orders must be stored in a fact table. A single order can include multiple line items. Therefore, you can store the sales order measures at the order level, or at the line item level.

If you choose to define the fact table at the order level, the measures in your table might look like the following.
CustomerKey  SalesPersonKey  OrderNo  SalesAmount  Shipping  Discount
…            …               1001     $3000        $30       $6
…            …               1002     $1500        $10       $5

This level of grain enables you to include measures that reflect the total sales amount for the individual items in the order, in addition to measures such as shipping and discount that exist at the order level. However, because the order can include multiple products, you cannot aggregate the measures across a product dimension.

Defining the fact table at the line item level might result in a table like the following table.
CustomerKey  SalesPersonKey  ProductKey  OrderNo  ItemNo  Quantity  SalesAmount  Shipping  Discount
…            …               45          1001     …       …         $100         $10       $2
…            …               106         1001     …       …         $200         $20       $4
…            …               15          1002     …       …         $150         $10       $5

Using the line item level of grain enables you to aggregate the sales amount by product, and also include
a quantity measure so that you can aggregate sales volume in terms of the number of units sold. The
order-level measures, such as discount and shipping, are spread proportionally across the line items in the
order.
In practice, many organizations need to analyze data at multiple levels of grain, in which case you should
consider creating multiple fact tables. For example, in the scenario described above, you could create a
detailed sales order fact table, and a sales order summary fact table.

Keys

The primary key of a fact table is usually a composite key that includes the columns containing the
foreign-key references to the dimension tables. In some cases, you may choose to include additional
business key columns in the primary key to ensure uniqueness. For example, in the sales order fact table
discussed previously, the primary key should consist of the foreign-key columns and the OrderNo and
ItemNo columns, because it is possible for a customer to place two identical orders with the same
salesperson.
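A composite key of this kind can be sketched as follows; the table and column names are assumptions for illustration, not the course's demonstration script:

```sql
-- The fact table's primary key combines the dimension foreign keys with the
-- degenerate OrderNo and ItemNo columns, guaranteeing uniqueness even when a
-- customer places two identical orders with the same salesperson.
CREATE TABLE FactSalesOrders
(
    CustomerKey int NOT NULL,
    SalesPersonKey int NOT NULL,
    ProductKey int NOT NULL,
    OrderDateKey int NOT NULL,
    OrderNo int NOT NULL,
    ItemNo int NOT NULL,
    Quantity int NOT NULL,
    SalesAmount money NOT NULL,
    CONSTRAINT PK_FactSalesOrders PRIMARY KEY
        (CustomerKey, SalesPersonKey, ProductKey, OrderDateKey, OrderNo, ItemNo)
);
```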

Measures

The measures in a fact table are usually aggregated across dimensions, and the most common way to
aggregate measures is to use a sum function to add them together. However, when defining a fact table,
you must consider the kinds of measure that it contains and how they can be aggregated. Measures
typically fall into one of three categories:

Additive measures. Measures that can be added together across all dimensions to create a meaningful
summary. For example, in a sales order fact table, a sales amount measure can be totaled across
products, customers, or employees.

Nonadditive measures. Measures that cannot be added together across any dimension. For example, a
sales order fact table might include a measure for profit margin. However, four sales orders that have
a profit margin of 25 percent do not add up to a total profit margin of 100 percent.

Semi-additive measures. Measures that can be summed across some dimensions, but not others. For
example, a bank transactions fact table might contain an account balance measure. The account
balance measure can be added across a customer dimension to calculate the total amount of
customer money deposited, but adding the balances across a time dimension would result in a
meaningless total because the balance for the year would be calculated as the sum of the balances
for January, February, and so on.
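The semi-additive case can be made concrete with a query sketch. The table and column names here (FactAccountBalance, Balance, DateKey, AccountKey) are assumed for the example: the balance measure may be summed across accounts for a single date, but a year-end total must use each account's closing balance rather than a sum over time.

```sql
-- Additive across the customer/account dimension: total money on deposit
-- on one specific date.
SELECT SUM(f.Balance) AS TotalDeposits
FROM FactAccountBalance AS f
WHERE f.DateKey = 20111231;

-- NOT additive across time: instead, take each account's closing balance
-- for the year, then sum those closing balances.
SELECT SUM(f.Balance) AS YearEndTotal
FROM FactAccountBalance AS f
WHERE f.DateKey = (SELECT MAX(f2.DateKey)
                   FROM FactAccountBalance AS f2
                   WHERE f2.AccountKey = f.AccountKey
                     AND f2.DateKey <= 20111231);
```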

Degenerate Dimensions

Sometimes it makes sense for a fact table to contain some dimension attributes. Typically, this is the case
for attributes that would ordinarily belong in a dimension, but no other related attributes exist (so the
dimension would have only one attribute, the business key); or instances where a dimension would have
the same cardinality as the fact (such as the line number of an invoice).

Snowflake Schemas

In most cases, a star schema in which fact tables are related to denormalized dimension tables is the optimal design for the data warehouse. However, in some cases, it can make sense to partially or completely normalize some dimension tables to create what is commonly referred to as a snowflake schema.

You should consider a snowflake schema in the following scenarios:


A subdimension can be shared between multiple dimensions. For example, a data warehouse may contain a dimension table for customers and a dimension table for stores. Both customers and stores include some dimension attributes that relate to their geographical location, such as a street address, a city, a state, a postal code, and a country. You can ensure consistency of geographic hierarchies across both dimensions by creating a separate table for the geography dimension and relating both the customer and store dimension tables to the new geography dimension table.

A hierarchy exists, and the dimension table contains a small subset of data that may be changed frequently. For example, consider a product dimension that includes details of individual products and the product lines to which they belong. Storing product data and product line data in the same table may result in optimal query performance, but if product line data changes frequently, you may want to isolate those changes from the product data. By factoring out the product line data into a separate table, you can ensure that frequent changes to product line data don't affect the product dimension table.


A sparse dimension has several different subtypes. For example, consider a dimension table for
products in an organization that sells many different kinds of products. Some products will include
an attribute such as size or color that may not be applicable for some other products. This can result
in a table that contains many null values (commonly described as being sparse). You can reduce
sparseness by creating a generic product dimension table that includes the core attributes that all
products share, and then create a type-specific dimension table for each individual kind of product,
with a relationship to the core product table.

Multiple fact tables of varying grain reference different levels in the dimension hierarchy. For example,
consider a data warehouse that includes a dimension table for salesperson. The salesperson table
may include details of the store at which the salesperson is employed. However, you may want to
create a fact table that includes measures at the individual salesperson grain, and a second fact table
that includes measures at the store grain. In this case, it makes sense to use separate tables for the
salesperson and store dimensions, with a relationship from salesperson to store to enable a reporting
hierarchy that includes both store and salesperson levels.
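The shared-subdimension scenario can be sketched in Transact-SQL as follows. The column lists are assumptions for illustration; a shared DimGeography table is referenced by both the customer and store dimensions, forming the snowflake:

```sql
-- Shared geography subdimension, referenced by multiple dimensions to keep
-- geographic hierarchies consistent across them.
CREATE TABLE DimGeography
(
    GeographyKey int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    City nvarchar(50) NOT NULL,
    Region nvarchar(50) NOT NULL,
    PostalCode nvarchar(10) NULL,
    Country nvarchar(50) NOT NULL
);

CREATE TABLE DimCustomer
(
    CustomerKey int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerName nvarchar(50) NOT NULL,
    GeographyKey int NOT NULL REFERENCES DimGeography (GeographyKey)
);

CREATE TABLE DimStore
(
    StoreKey int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StoreName nvarchar(50) NOT NULL,
    GeographyKey int NOT NULL REFERENCES DimGeography (GeographyKey)
);
```

Queries that aggregate by city or region now join through DimGeography, trading one extra join for guaranteed consistency between the two dimensions.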

Time Dimensions

Most data reporting and analysis includes a temporal aspect. For example, it is common to aggregate sales over time periods such as months, quarters, and years. To ensure consistency when comparing measures across time, most data warehouses include a time dimension table.

When you create a time dimension table, consider the following guidance:
onsider the fo llowing guida nce:


Define the time dimension at a suitable level of granularity. You can define a time dimension table at the unit of time at which you need to record measures, from milliseconds to centuries. However, the lower the granularity that you choose, the greater the number of dimension members (rows) you will need in the time dimension table. In most business scenarios, a time dimension that includes a member for each day is appropriate. However, some businesses where real-time analysis is important might choose to record events by the hour or even by the minute, and some others may only need to record measures at weekly or monthly intervals.

Include temporal hierarchies. Most business reporting and analysis involves drilling down through summarized levels of data, for example, viewing sales by year, then drilling into a specific quarter, and then drilling down into an individual month. Your time dimension table should include attributes for each hierarchy level at which your users need to summarize the measures.

Include business-specific temporal attributes. Some temporal attributes, such as days, months, and calendar years, are widely used across many businesses, but others may be specific to your own business. For example, your organization may use a fiscal year that runs from July to June, or there may be specific public holidays that your organization recognizes. Your time dimension table should include these business-specific attributes.


Consider how you will populate the time dimension table. Unlike most other tables in a data
warehouse, time dimension tables are not usually populated with data that has been extracted from
a source system. Generally, the data warehouse developer populates the time dimension table with
rows at the appropriate granularity. These rows usually consist of a numeric primary key that is
derived from the temporal value (for example 20110101 for January 1, 2011) and a column for each
dimension attribute (such as the date, day of year, day name, month of year, month name, year,
and so on). To generate the rows for the time dimension table, you can use one of the following
techniques:

Create a Transact-SQL script. Transact-SQL includes many date and time functions that you can
use in a loop construct to generate the required attribute values for a sequence of time intervals.
The following Transact-SQL functions are commonly used to calculate date and time values:

DATEPART (datepart, date) returns the numerical part of a date, such as the weekday
number, day of month, month of year, and so on.

DATENAME (datepart, date) returns the string name of a part of the date, such as the
weekday name or month name.

MONTH (date) returns the month number of the year for a given date.

YEAR (date) returns the year for a given date.

Use Microsoft Excel. Excel includes several functions that you can use to create formulas for
date and time values. You can then use the auto-fill functionality in Excel to quickly create a large
table of values for a sequence of time intervals.

Use a business intelligence (BI) tool to autogenerate a time dimension table. Some BI tools include
time dimension generation functionality that you can use to quickly create a time dimension
table.

Regardless of the technique that you use to populate the time dimension table, you must choose an
appropriate start and end point for the sequence of time intervals stored in the table. If necessary,
you must also consider how you will extend the range of time values stored in the table in the future.
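The Transact-SQL scripting technique can be sketched with a simple loop. The DimDate column list below is an assumption for illustration (the course's CreateDW.sql script may differ); each row gets a numeric key derived from the date and attribute values computed with the date functions listed above.

```sql
-- Populate a day-grain time dimension for January 1, 2005 through today.
DECLARE @d date = '20050101';
WHILE @d <= CAST(GETDATE() AS date)
BEGIN
    INSERT INTO DimDate (DateKey, FullDate, DayOfMonth, WeekdayName,
                         MonthOfYear, MonthName, CalendarYear)
    VALUES (CONVERT(int, CONVERT(varchar(8), @d, 112)),  -- e.g. 20110101
            @d,
            DATEPART(day, @d),        -- day of month
            DATENAME(weekday, @d),    -- weekday name
            MONTH(@d),                -- month number
            DATENAME(month, @d),      -- month name
            YEAR(@d));                -- calendar year
    SET @d = DATEADD(day, 1, @d);
END;
```

Style 112 in the CONVERT call formats the date as yyyymmdd, which yields the numeric surrogate key pattern described above.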

Demonstration: Implementing a Data Warehouse

Task 1: Create dimension and fact tables

1. Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

2. Start SQL Server Management Studio and, when prompted, connect to the localhost database engine instance using Windows authentication.

3. Open the CreateDW.sql script file in the D:\10777A\Demofiles\Mod03 folder.

4. Select the code under the comment CREATE A DATABASE, and then click Execute. This creates the database for the demonstration.

5. Select the code under the comment CREATE DIMENSION TABLES, and then click Execute. This creates the DimProduct, DimGeography, DimCustomer, DimSalesperson, and DimDate dimension tables. Note that the DimProduct and DimDate dimensions are designed for a star schema, but that the DimCustomer and DimSalesperson tables have been normalized into a snowflake schema to create a shared DimGeography subdimension table.

6. Select the code under the comment CREATE A FACT TABLE, and then click Execute.

This creates a fact table named FactSalesOrders. Note that the primary key of the fact table includes all dimension foreign keys and also the degenerate dimensions for the order number and line item number. Also note that the grain for this table is defined at the line item level.
7. In Object Explorer, expand Databases, expand DemoDW, and then expand Tables. Note that the database contains five dimension tables and one fact table.

8. Right-click the Database Diagrams folder, and then click New Database Diagram. If you are prompted to create the required support objects, click Yes. Then, in the Add Table dialog box, click each table while holding down the Ctrl key to select them all, click Add, and then click Close.

9. In the database diagram, click each table while holding down the Ctrl key to select them all, and on the toolbar, in the Table View drop-down list, click Standard. Then arrange the tables and adjust the zoom level so that you can see the entire database schema, and then examine the tables, noting the columns that they contain.

10. Save the diagram as DemoDW Schema.

Task 2: Populate a time dimension table


1. Return to the CreateDW.sql query tab, select the code under the comment POPULATE THE TIME DIMENSION TABLE, and then click Execute.

This code performs a loop that generates the required dimension attributes for the DimDate table for a range of dates between 2005 and the current date.
2. When the query has completed, in Object Explorer, expand Databases, expand DemoDW, expand Tables, right-click DimDate, and then click Select Top 1000 Rows. Note that the script has populated the table with a sequence of dates.

3. Close SQL Server Management Studio without saving any changes.

Lesson 2

Physical Design for a Data Warehouse

After you have designed the logical schema for a data warehouse, you must consider the physical design of the data warehouse. The choices that you make about how the data is physically stored and indexed have a significant effect on the performance, scalability, and manageability of your data warehouse. Therefore, it is important to understand the implications of each aspect of the physical design and adopt proven best practices that will optimize the effectiveness of your data warehouse.
Th
his lesson desccribes best practices for the physical desig n of a data waarehouse. After completing tthis
le
esson, you will be able to:

Describe th
he best practice
es for the physsical placemen
nt of data in a data warehou
use.

Design effe
ective indexes for
f a data warehouse.

Design an effective
e
partittioning strateg
gy for a data w
warehouse.

Use data co
ompression efffectively in a data
d
warehousee.

Physical Data Placement

The physical placement of data on a storage device has a significant effect on the performance of a data warehouse. Although the specific details of how the data is allocated to physical storage media can vary from one hardware solution to another, there are some general principles that you should consider when planning the physical storage for your data warehouse:

Distribute data across physical devices. The Microsoft SQL Server query engine can take advantage of parallel thread processing when reading data from multiple physical devices. This can significantly reduce the time taken to retrieve data for a query. In most data warehouses, the actual distribution of the data across physical devices is accomplished through a redundant array of independent disks (RAID) storage solution, usually in a storage area network (SAN) or a dedicated storage server. In the absence of a RAID solution, you can accomplish the physical distribution of data by creating multiple filegroups that span physical disks and allocating the fact and dimension tables in the data warehouse to those filegroups.

When using a RAID solution, you must consider the RAID level that provides the best balance of cost, performance, and reliability for your specific needs. The following list shows some commonly used RAID levels:

RAID 0 (disk striping) provides high performance, but no data redundancy, so in the event of a disk failure, the data becomes unavailable.

RAID 1 (disk mirroring) provides a full redundant copy of the data, but does not enable the query processor to read data from multiple disks simultaneously.

RAID 5 (disk striping with parity) can provide a low-cost solution that combines data redundancy with physical distribution across drives.

RAID 10 (a combination of disk striping and mirroring) provides a highly robust, high-performance solution.

In addition to distributing the data across physical disks, you should also consider allocating large fact
tables to dedicated filegroups.

Separate log files from data files. Although data warehouses have fewer logged transactions than an
OLTP database, the recognized best practice for any database is to separate log files from data files
to prevent them from competing for disk resources. You should apply this best practice to your data
warehouse.

Separate workspace objects from data. Some data warehouses include objects that are dropped
and re-created, such as staging tables or temporary workspace tables. You should allocate these
to a separate filegroup from the fact and dimension data to reduce the risk of table and index
fragmentation in the data warehouse, which can cause deterioration in query performance over time.

Preallocate space and disable autogrow. It may seem prudent to allocate only the physical space that
you need when initially creating the data warehouse and have the space allocation grow dynamically
as needed. However, this can lead to index and table fragmentation, which has a negative effect on
query performance. A better approach is to preallocate the space that you think your data warehouse
will need when it is fully populated, and disable the autogrow feature for the database files. This helps
ensure that your data is stored in contiguous blocks on the physical disk, increasing the efficiency
with which the data can be read.
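Two of these guidelines, preallocating space with autogrow disabled and separating the log file from the data files, can be sketched in a single CREATE DATABASE statement. The database name, logical file names, paths, and sizes below are illustrative assumptions, not values from this course:

```sql
-- Sketch: preallocate data and log space and disable autogrow (FILEGROWTH = 0)
CREATE DATABASE AWDataWarehouse
ON PRIMARY
    (NAME = 'awdw_data',
     FILENAME = 'D:\Data\awdw_data.mdf',   -- data file on its own physical disk
     SIZE = 100GB,                          -- preallocate the space the fully populated data warehouse will need
     FILEGROWTH = 0)                        -- disable autogrow to keep storage contiguous
LOG ON
    (NAME = 'awdw_log',
     FILENAME = 'E:\Logs\awdw_log.ldf',     -- log file on a separate physical disk from the data files
     SIZE = 10GB,
     FILEGROWTH = 0);
```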

Ind
dexing

MCT USE ONLY. STUDENT USE PROHIBITED

3-20 Designingg and Implementing a Data Warehouse

The SQL Server qu


uery processorr uses indexes where approp
priate to reducce physical diskk read operations
and increase querry performance. You should therefore imp
plement appropriate indexess for the querie
es
thatt your data wa
arehouse must perform. The optimal indexxes for a speciffic set of tables can vary acro
oss
data
a warehouses, but there are some general best practicess that you should consider.

Dim
mension Tab
ble Indexes
Whe
en you are deffining indexes for a dimensio
on table, cons ider the follow
wing guidelines:

Create a noncclustered prim


mary key index on the surrog ate key colum
mn.

Create a clusttered index on


n the business (or alternativee) key column. This can imprrove data loading
performance for slowly cha
anging dimenssions where thee source busin
ness key must be looked up.

Create nonclu
ustered indexe
es on any columns that are ffrequently incl uded in a query filter.
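The dimension table guidelines above can be sketched in Transact-SQL as follows. The table, column, and index names here are illustrative assumptions:

```sql
-- Nonclustered primary key on the surrogate key column
ALTER TABLE dbo.DimCustomer
ADD CONSTRAINT PK_DimCustomer PRIMARY KEY NONCLUSTERED (CustomerKey);

-- Clustered index on the business (alternate) key column,
-- to support business-key lookups during slowly changing dimension loads
CREATE CLUSTERED INDEX cix_DimCustomer_AltKey
ON dbo.DimCustomer (CustomerAltKey);

-- Nonclustered index on a column that is frequently used in query filters
CREATE NONCLUSTERED INDEX ix_DimCustomer_City
ON dbo.DimCustomer (City);
```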

Facct Table Indexes


Whe
en you are deffining indexes for a fact table, consider thee following gu
uidelines:

Most data wa
arehouse queries involve a tiime dimension
n, so you should generally co
onsider creatin
ng a
clustered inde
ex on the mosst commonly used
u
time dimeension key in tthe fact table. This is especiaally
true if the facct table is partiitioned by a tim
me dimension
n key so that yo
ou can align in
ndex partitioniing
with table partitioning.

Create additio
onal noncluste
ered indexes on
o individual d imension foreign-key colum
mns that are
frequently inccluded in a query.
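A minimal Transact-SQL sketch of these fact table guidelines, reusing the FactSalesOrder table and key column names that appear in the code examples in this module; the index names themselves are illustrative:

```sql
-- Clustered index on the most commonly used time dimension key
CREATE CLUSTERED INDEX cix_FactSalesOrder_OrderDateKey
ON dbo.FactSalesOrder (OrderDateKey);

-- Nonclustered indexes on dimension foreign-key columns
-- that are frequently included in queries
CREATE NONCLUSTERED INDEX ix_FactSalesOrder_CustomerKey
ON dbo.FactSalesOrder (CustomerKey);

CREATE NONCLUSTERED INDEX ix_FactSalesOrder_ProductKey
ON dbo.FactSalesOrder (ProductKey);
```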
Note	The SQL Server query optimizer can identify star join queries between fact and dimension tables, and use a variety of techniques to increase the performance of this specific kind of query. It is therefore important to have appropriate indexes on dimension key columns for the optimizer to choose when selecting an execution plan.
Columnstore Indexes

MCT USE ONLY. STUDENT USE PROHIBITED

10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012

3-21

SQL Server 2012 supports columnstore indexes that are based on xVelocity in-memory technology.
Columnstore indexes consist of data pages that store data from each column in the index on a dedicated
set of pages. Creating a columnstore index on multiple columns in a fact table (or a large dimension table)
can significantly increase query performance.
To create a columnstore index, use the CREATE COLUMNSTORE INDEX statement as shown in the
following code example.
CREATE COLUMNSTORE INDEX cidx_FactSalesOrder ON FactSalesOrder
(CustomerKey,
SalesPersonKey,
ProductKey,
OrderDateKey,
OrderNo,
ItemNo,
Quantity,
Cost,
SalesAmount,
Shipping,
Discount)

The use of columnstore indexes can have a dramatically positive effect on performance, but you should
be aware of the following considerations:

You can only define one columnstore index per table.

Columnstore indexes are read-only, so you must drop and rebuild the index if you load the table with
new or updated data.

Columnstore indexes on a partitioned table must be partition aligned.

The base object for a columnstore index must be a table (you cannot use a columnstore index to
create an indexed view).
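Because the index is read-only in SQL Server 2012, a common load pattern is to disable the columnstore index, load the data, and then rebuild the index. A minimal sketch using the cidx_FactSalesOrder index created earlier; the staging step is an illustrative placeholder:

```sql
-- Disable the columnstore index before loading new data
ALTER INDEX cidx_FactSalesOrder ON dbo.FactSalesOrder DISABLE;

-- Load new or updated rows here (for example, from a staging table):
-- INSERT INTO dbo.FactSalesOrder SELECT * FROM dbo.StagingSalesOrder;

-- Rebuild the index so that queries can use the columnstore again
ALTER INDEX cidx_FactSalesOrder ON dbo.FactSalesOrder REBUILD;
```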
For More Information	For more information on Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0, see http://go.microsoft.com/fwlink/?LinkID=246723.

Partitioning

MCT USE ONLY. STUDENT USE PROHIBITED

3-22 Designingg and Implementing a Data Warehouse

Partitioning is a technique that is used to split the data in a single table or index across multiple filegroups. When you create a data warehouse, you should consider partitioning large fact tables. In most cases, the table should be partitioned based on a date key field, such as an order date in a sales order fact table.
Partitioning a fact table can provide the following benefits:

Improved queery performancce. SQL Server 2012 support s partitioned ttable parallelissm, enabling th
he
query processsor to read data from multip
ple partitions ssimultaneouslyy.

Faster data lo
oading and delletion. In data warehouses
w
w
where a sliding window appro
oach to adding
g and
nto a new parttition and dele
archiving data is used, you can quickly load new data in
ete the data in
n the
oldest partitio
on.

Improved index manageabiility. You can manage


m
indexees at the partittion level, for e
example, rebuiilding
an index for one
o partition where
w
the data
a has been mo
odified withoutt having to reb
build indexes o
on
other partitio
ons.

Increased bacckup and restore flexibility. Th


he table is parrtitioned acros s multiple fileg
groups, so you
u can
manage disasster recovery for each filegro
oup separatelyy. This can redu
uce backup an
nd restore time
es for
data warehou
uses that includ
de extremely large tables wiith data that c hanges infrequently.

To create a partitioned fact table, you must first create a partition function that defines the boundaries of the partitions. For example, the following Transact-SQL code example creates a partition function that defines three partitions: one for values up to and including 20081231, one for values higher than 20081231 up to 20091231, and one for values higher than 20091231.
CREATE PARTITION FUNCTION pf_OrderDateKey(int)
AS RANGE LEFT
FOR VALUES (20081231, 20091231)

MCT USE ONLY. STUDENT USE PROHIBITED

10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012

3-23

After creating the partition function, you need to create a partition scheme that assigns each partition to a
filegroup, as shown in the following code example.
CREATE PARTITION SCHEME ps_OrderDateKey
AS PARTITION pf_OrderDateKey
TO (fg2008,fg2009,fg2010,fg2011)

Note that although the partition function defines three partitions, four filegroups are specified. The fourth
filegroup is an optional parameter that specifies the next filegroup to be used when a new partition is
added during a split operation.
Now you can create the partitioned table, as shown in the following code example.
CREATE TABLE [dbo].[FactSalesOrder]
(
[CustomerKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[OrderNo] [int] NOT NULL,
[LineNo] [int] NOT NULL,
[Quantity] [smallint] NULL,
[SalesAmount] [money] NULL,
CONSTRAINT [PK_FactSalesOrder] PRIMARY KEY CLUSTERED
(
[CustomerKey],[ProductKey],[OrderDateKey],[OrderNo],[LineNo]
)
)
ON ps_OrderDateKey(OrderDateKey)

To implement a sliding window solution for adding and removing data from the fact table, you can use
the Transact-SQL statements in the following code example.
-- Create a new empty partition for new data
ALTER PARTITION FUNCTION pf_OrderDateKey()
SPLIT RANGE (20101231)
GO
-- Switch the first partition in the fact table into a temporary table on the same filegroup
ALTER TABLE FactSalesOrder SWITCH PARTITION 1
TO FactSalesOrderTemp PARTITION 1
GO
-- Insert the data from the temporary table into the archive table
INSERT INTO FactSalesOrderArchive
SELECT * FROM FactSalesOrderTemp
-- Merge the deleted partition to remove it from the filegroup
ALTER PARTITION FUNCTION pf_OrderDateKey()
MERGE RANGE (20081231)
GO
-- You can now drop the temp table and remove the fg2008 filegroup from which you archived the data

Data Compression
SQL Server 2012 supports data compression, which can reduce the physical storage requirements for a data warehouse and increase the performance of I/O-bound queries (that is, queries where the main bottleneck is the reading of the data from disk). The processing overhead that is incurred to handle the compression can reduce the performance of CPU-bound queries (that is, queries that require a lot of processing logic) by up to 30 percent. In many cases, the increased CPU overhead is a worthwhile tradeoff for the improved I/O performance and space savings.

Forrms of Data Compression


Data compression
n is available in
n two forms:

Row compression. Fixed-len


ngth columns are
a internally sstored as variaable-length columns, removiing
any unused bytes
b
from the data page.

Page compresssion. In additiion to applying


g row compreession to reducce the space re
equired by fixe
edlength columns, any duplicated values on
n a page are sttored only oncce.
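Before choosing a compression form, you can estimate the space savings for a specific table by using the sp_estimate_data_compression_savings system stored procedure. The table name below is illustrative:

```sql
-- Estimate how much space PAGE compression would save for a table
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'FactSalesOrder',
    @index_id = NULL,             -- NULL: estimate for all indexes on the table
    @partition_number = NULL,     -- NULL: estimate for all partitions
    @data_compression = 'PAGE';   -- compare results with 'ROW' or 'NONE'
```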

Enabling Compression

MCT USE ONLY. STUDENT USE PROHIBITED

10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012

3-25

You can enable compression for a table, an index, or a partition by specifying the DATA_COMPRESSION
keyword. For example, the Transact-SQL code in the following code example creates a table on a partition
function, specifying page compression for the first partition, row compression for the second partition,
and no compression for any other partitions.
CREATE TABLE [dbo].[FactSalesOrder]
(
[CustomerKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[OrderNo] [int] NOT NULL,
[LineNo] [int] NOT NULL,
[Quantity] [smallint] NULL,
[SalesAmount] [money] NULL,
CONSTRAINT [PK_FactSalesOrder] PRIMARY KEY CLUSTERED
(
[CustomerKey],[ProductKey],[OrderDateKey],[OrderNo],[LineNo]
)
)
ON ps_OrderDateKey(OrderDateKey)
WITH
(
DATA_COMPRESSION = PAGE ON PARTITIONS (1),
DATA_COMPRESSION = ROW ON PARTITIONS (2)
)

Lab Scenario

In this lab, you will start the development of the Adventure Works data warehousing solution by creating the data warehouse itself.
An existing data warehouse contains a fact table for reseller sales, and dimension tables for resellers, employees, and products. This enables users to query the data warehouse and analyze reseller sales by reseller, sales representative, and product. However, Adventure Works also sells products directly to individual customers through an e-commerce Web site, and the company's executives would like to be able to analyze these Internet sales as well as the reseller sales. To enable this analysis, you must add a dimension table for customers and a fact table for Internet sales. The existing product dimension is a conformed dimension that can be used by both the reseller sales and Internet sales fact tables.

Afte
er you have ad
dded the required tables for Internet sales analysis, you m
must refactor tthe data warehouse
by normalizing
n
tw
wo of the star schema
s
dimen
nsions into sno
owflake dimenssions. Specificaally, you must::

Create a hiera
archy of tabless for the produ
uct dimension to separate product subcate
egories and
categories. Making
M
this hierarchy into a snowflake
s
dimeension helps issolate any chaanges to produ
uct
subcategory or
o category records from the
e product dim
mension table.

Create a subd
dimension table for geograp
phic data that iis shared by th
he customer an
nd reseller
dimension tables.

Finally, the executives at Adventure Works want to be able to analyze reseller and Internet sales by order date and ship date, so you need to create a time dimension table and populate it with temporal data. In addition to supporting calendar dates, the time dimension table must indicate fiscal years and quarters for the Adventure Works fiscal year, which runs from July to June.

Lab 3: Implementing a Data Warehouse Schema

Exercise 1: Implementing a Star Schema
Scenario
Adventure Works Cycles requires a data warehouse to enable information workers and executives to create reports and perform analysis of key business measures. The company has identified two sets of related measures that it wants to include in fact tables: sales order measures that relate to sales to resellers, and sales order measures that relate to Internet sales. These measures will be aggregated by product, reseller (in the case of reseller sales), and customer (in the case of Internet sales) dimensions.

MCT USE ONLY. STUDENT USE PROHIBITED

10777A: Im
mplementing a Data Warehouse with Miccrosoft SQL Server 20012

3-27

The data warehouse has been partially completed, and you must now add the necessary dimension and fact tables to complete a star schema.
The main tasks for this exercise are as follows:
1.	Prepare the lab environment.
2.	View the AWDataWarehouse database.
3.	Create a dimension table for customers.
4.	Create a fact table for Internet sales.
5.	View the database schema.
X Task 1: Prepare the lab environment
•	Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
•	Run the Setup Windows Command Script file (Setup.cmd) in the D:\10777A\Labfiles\Lab03\Starter folder as Administrator.
X Task 2: View the AWDataWarehouse database

MCT USE ONLY. STUDENT USE PROHIBITED

3-28 Designing and Implementing a Data Warehouse

Start SQL Server Management Studio and connect to the localhost instance of the SQL Server
database engine by using Windows authentication.

Create a new database diagram in the AWDataWarehouse database (creating the required objects
to support database diagrams if prompted). The diagram should include all of the tables in the
database.

In the database diagram, modify the tables so that they are shown in Standard view, and arrange
them so that you can view the partially complete data warehouse schema, which should look similar
to the following diagram.

X Task 3: Create a dimension table for customers

MCT USE ONLY. STUDENT USE PROHIBITED

10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012

3-29

•	Create a new table named DimCustomer in the AWDataWarehouse database. The table should be based on the following diagram.
•	If preferred, you can use the DimCustomer.sql Transact-SQL query file in D:\10777A\Labfiles\Lab03\Starter to create the table.
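The table diagram referenced above is not reproduced in this text. As a purely illustrative sketch (the column names and data types here are assumptions, not the lab's actual DimCustomer definition), a customer dimension table that follows this module's indexing guidelines might look like this:

```sql
-- Illustrative customer dimension table with a surrogate key and a business key
CREATE TABLE dbo.DimCustomer
(
    CustomerKey int IDENTITY(1,1) NOT NULL,   -- surrogate key
    CustomerAltKey nvarchar(25) NOT NULL,     -- business (alternate) key
    CustomerName nvarchar(50) NULL,
    EmailAddress nvarchar(50) NULL,
    CONSTRAINT PK_DimCustomer PRIMARY KEY NONCLUSTERED (CustomerKey)
);

-- Clustered index on the business key to support lookups during data loads
CREATE CLUSTERED INDEX cix_DimCustomer_AltKey ON dbo.DimCustomer (CustomerAltKey);
```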

X Task 4: Create a fact table for Internet sales

MCT USE ONLY. STUDENT USE PROHIBITED

3-30 Designing and Implementing a Data Warehouse

Create a new table named FactInternetSales in the AWDataWarehouse database, with foreign-key
references to the DimCustomer and DimProduct tables. The table should be based on the following
diagram.

If preferred, you can use the FactInternetSales.sql Transact-SQL query file in


D:\10777A\Labfiles\Lab03\Starter to create the table.

X Task 5: View the database schema

Add the tables that you have created in this exercise to the database diagram that you created.

Note that when adding tables to a diagram, you need to click Refresh in the Add table dialog
box to see tables you have created or modified since the diagram was initially created.

Save the database diagram as AWDataWarehouse Schema.

Keep SQL Server Management Studio open for the next exercise.

Results: After this exercise, you should have a database diagram in the AWDataWarehouse database
that shows a star schema that consists of two fact tables (FactResellerSales and FactInternetSales) and
four dimension tables (DimReseller, DimEmployee, DimProduct, and DimCustomer).

Exercise 2: Implementing a Snowflake Schema


Scenario
Having created a star schema, you have identified two dimensions that would benefit from being
normalized to create a snowflake schema. Specifically, you want to create a hierarchy of related tables for
product category, product subcategory, and product, and you want to create a separate geography
dimension table that can be shared between the reseller and customer dimensions.
The main tasks for this exercise are as follows:
1.	Create product category and product subcategory dimension tables.
2.	Create a geography dimension table.
3.	View the database schema.
X Task 1: Create product category and product subcategory dimension tables

In the AWDataWarehouse database, create new tables named DimProductCategory and


DimProductSubcategory, and modify the existing DimProduct table to create a hierarchy of related
tables as shown here.

If preferred, you can use the DimProductCategory.sql Transact-SQL query file in


D:\10777A\Labfiles\Lab03\Starter to create the tables.

X Task 2: Create a geography dimension table

In the AWDataWarehouse database, create a new table named DimGeography, and modify the
existing DimCustomer and DimReseller tables to create a shared subdimension as shown here.

If preferred, you can use the DimGeography.sql Transact-SQL query file in


D:\10777A\Labfiles\Lab03\Starter to create the tables.

X Task 3: View the database schema

MCT USE ONLY. STUDENT USE PROHIBITED

3-32 Designing and Implementing a Data Warehouse

Delete the tables that you modified in the previous two tasks from the AWDataWarehouse Schema
diagram (DimProduct, DimReseller, and DimCustomer).

Add the new and modified tables that you created in this exercise to the AWDataWarehouse
Schema diagram and view the revised data warehouse schema, which now includes some snowflake
dimensions. You will need to refresh the list of tables when adding tables and you may be prompted
to update the diagram to reflect foreign-key relationships.

Results: After this exercise, you should have a database diagram in the AWDataWarehouse database
that shows a snowflake schema that contains a dimension consisting of a DimProduct,
DimProductSubcategory, and DimProductCategory hierarchy of tables, and a DimGeography
dimension table that is referenced by the DimCustomer and DimReseller dimension tables.

Exercise 3: Implementing a Time Dimension Table


Scenario
The schema for the Adventure Works data warehouse now contains two fact tables and several dimension
tables. However, users need to be able to analyze the measures in the fact table across consistent time
periods. To enable this, you must create a time dimension table.
Users will need to be able to aggregate measures across calendar years (which run from January to
December) and fiscal years (which run from July to June). Your time dimension must include the following
attributes:

Date (this should be the business key)

Day number of week (for example 1 for Sunday, 2 for Monday, and so on)

Day name of week (for example Sunday, Monday, Tuesday, and so on)

Day number of month

Day number of year

Week number of year

Month name (for example, January, February, and so on)

Month number of year (for example, 1 for January, 2 for February, and so on)

Calendar quarter (for example, 1 for dates in January, February, and March)

Calendar year

Calendar semester (for example, 1 for dates between January and June)

Fiscal quarter (for example, 1 for dates in July, August, and September)

Fiscal year

Fiscal semester (for example, 1 for dates between July and December)

The main tasks for this exercise are as follows:


1.

Create a time dimension table.

2.

View the database schema.

3.

Populate the time dimension table.

4.

View the time dimension data.

X Task 1: Create a time dimension table

In the AWDataWarehouse database, create a new table named DimDate that contains the required
time dimension attributes.

Add foreign-key columns in the FactInternetSales and FactResellerSales tables to relate sales order
dates and sales ship dates to the DimDate table. Leave the existing OrderDate and ShipDate
columns in the fact tables as degenerate dimensions.

Create nonclustered indexes on the foreign-key columns that you have added to the fact tables.

MCT USE ONLY. STUDENT USE PROHIBITED

3-34 Designing and Implementing a Data Warehouse

If preferred, you can use the DimDate.sql Transact-SQL query file in D:\10777A\Labfiles\Lab03\Starter
to create the time dimension table and modify the fact tables.

Your completed DimDate table should match the following diagram.

X Task 2: View the database schema

Delete the tables that you modified in the previous task from the AWDataWarehouse Schema
diagram (FactInternetSales and FactResellerSales).

Add the new and modified tables that you created in this exercise to the AWDataWarehouse
Schema diagram and view the revised data warehouse schema, which now includes a time
dimension.

X Task 3: Populate the time dimension table

Populate the table with appropriate values for a date range spanning from January 1, 2000 to the
current date. You can create a Transact-SQL script to do this, or you can use Excel if you wish.

If preferred, you can use the GenerateDates.sql Transact-SQL query file in


D:\10777A\Labfiles\Lab03\Starter to populate the time dimension table.
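The GenerateDates.sql script itself is not reproduced here. A minimal Transact-SQL sketch of such a population loop follows; it assumes an illustrative subset of the DimDate columns (the lab's actual table includes more attributes), and the fiscal-year labeling convention shown is also an assumption:

```sql
-- Sketch: populate a time dimension one day at a time, from January 1, 2000 to today
DECLARE @d date = '20000101';
WHILE @d <= CAST(GETDATE() AS date)
BEGIN
    INSERT INTO dbo.DimDate (DateKey, FullDate, CalendarYear, MonthNumberOfYear,
                             FiscalYear, FiscalQuarter)
    VALUES (CONVERT(int, CONVERT(char(8), @d, 112)),              -- yyyymmdd integer key
            @d,
            YEAR(@d),
            MONTH(@d),
            YEAR(@d) + CASE WHEN MONTH(@d) >= 7 THEN 1 ELSE 0 END, -- fiscal year runs July to June
            ((MONTH(@d) + 5) % 12) / 3 + 1);                       -- fiscal quarter: Jul-Sep = 1
    SET @d = DATEADD(day, 1, @d);
END
```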

X Task 4: View the time dimension data

Query the DimDate table to verify that it contains temporal values.

Results: After this exercise, you should have a database that contains a DimDate dimension table that is
populated with date values from January 1, 2000 to the current date.

Module Review and Takeaways

Review Questions
1.	Why should you favor a star schema that has denormalized dimension tables over a snowflake schema where the dimensions are implemented as multiple related tables?
2.	What is the grain of a fact table, and why is it important?
3.	If your data warehouse includes staging tables, why should you allocate them to a different filegroup than the filegroup for data tables?
Module 4
Creating an ETL Solution with SSIS
Contents:
Lesson 1: Introduction to ETL with SSIS
Lesson 2: Exploring Source Data
Lesson 3: Implementing Data Flow
Lab 4: Implementing Data Flow in an SSIS Package
Module Overview
Successful data warehousing solutions rely on the efficient and accurate transfer of data from the various data sources in the business to the data warehouse. This transfer of data is referred to as an extract, transform, and load (ETL) process, and implementing ETL is a core skill that is required in any data warehousing project.
This module discusses considerations for implementing an ETL process, and then focuses on Microsoft SQL Server Integration Services (SSIS) as a platform for building ETL solutions.
After completing this module, you will be able to:

Describe the key features of


o SSIS.

Explore sourcce data for an ETL solution.

Implement a data flow by using


u
SSIS.

Lesson 1
Introduction to ETL with SSIS
While you can implement an ETL solution by using several tools and technologies, SSIS is the primary ETL tool for SQL Server. Before you use it to implement an ETL solution, it is important to understand some of its key features and components.
This lesson describes options for implementing an ETL solution, and then introduces SSIS.
After completing this lesson, you will be able to:

•	Describe the options for ETL.
•	Describe the key features of SSIS.
•	Describe the high-level architecture of an SSIS project.
•	Identify key elements of the SSIS design environment.
•	Describe strategies for upgrading SSIS solutions from previous versions of SQL Server.
Options for ETL
An ETL solution generally involves the transfer of data from one or more data sources to a destination, often transforming the data structure and values in the process. There are several tools and technologies that you can use to accomplish this process, each of which has specific strengths and weaknesses that you should take into account when you are choosing the approach for your ETL solution.
The following list describes some common techniques for implementing an ETL solution:

SQL Server Integration Servvices. This is the


e primary platfform for ETL s olutions that aare provided w
with
SQL Server, and generally provides
p
the most
m
flexible waay to implemeent an enterpriise ETL solution.

The Import an
nd Export Data
a Wizard. This wizard is inclu
uded with the SQL Server maanagement too
ols,
and provides a simple way to create an SSIS-based dataa transfer solu
ution. You shou
uld consider using
the Import an
nd Export Data
a Wizard when
n your ETL solu
ution requires only a few, sim
mple data tran
nsfers
that do not in
nclude any com
mplex transforrmations in thee data flow.

Transact-SQLL. Transact-SQLL is a powerfull language thaat can enable yyou to implem


ment complex d
data
transformatio
ons while extra
acting, inserting, or modifyin
ng data. Most EETL solutions iinclude some
Transact-SQLL logic combined with other technologies. In some scenaarios, such as w
when data sou
urces
and destinatio
ons are co-loccated, you can implement a ccomplete ETL solution by ussing only TranssactSQL queries.

The bcp utilityy. This utility provides


p
an intterface that is based on the ccommand line
e for extracting
g data
from, and inserting data intto, SQL Server.. The bcp utilitty provides a vversatile tool that you can usse to
create schedu
uled data extra
actions and inssertions, but itts relatively co mplex syntax aand console-b
based
operation ma
ake it difficult to
t create a ma
anageable, entterprise-scale EETL solution byy using the bccp
utility alone.

Replication. SQL Server includes built-in replication


r
fun ctionality thatt you can use tto synchronize
e data
across SQL Se
erver instancess. You can also
o include otherr relational datta sources such as Oracle
databases in a replication solution. Repliccation is a suitaable technolog
gy for ETL whe
en all data sou
urces
are supported
d in a replication topology, and
a the data rrequires minim
mal transformations.

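As a sketch of the Transact-SQL option described above, when the source and destination are co-located a simple ETL step can be a single INSERT...SELECT statement. The database, table, and column names below are illustrative assumptions:

```sql
-- Extract from an OLTP source, transform, and load into a warehouse dimension table
INSERT INTO AWDataWarehouse.dbo.DimCustomer (CustomerAltKey, CustomerName)
SELECT c.AccountNumber,
       c.FirstName + N' ' + c.LastName                -- simple transformation
FROM AdventureWorks.dbo.Customer AS c
WHERE NOT EXISTS (SELECT 1
                  FROM AWDataWarehouse.dbo.DimCustomer AS d
                  WHERE d.CustomerAltKey = c.AccountNumber);  -- load only new customers
```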
What Is SSIS?

SSIS is an extensible platform for building complex ETL solutions. SSIS is included with SQL Server and consists of a Windows service that manages the execution of ETL workflows, and several tools and components for developing those workflows. The SSIS service is installed when you select Integration Services on the Feature Selection page of the SQL Server setup wizard.
Note	After you have installed SSIS, you can use the DCOM Configuration tool (Dcomcnfg.exe) to grant permission to specific users to use the SQL Server Integration Services 11.0 service.

The SSIS Windows service is primarily a control flow engine that manages the execution of task workflows. Task workflows are defined in packages, which you can execute on demand or at scheduled times. When you are developing an SSIS package, the task workflow is referred to as the control flow of the package.
A control flow can include a special type of task to perform data flow operations. SSIS executes these Data Flow tasks by using a data flow engine that encapsulates the data flow in a pipeline architecture. Each step in the Data Flow task operates in sequence on a rowset of data as it passes through the pipeline. The data flow engine uses buffers to optimize the rate of flow for the data that is passing through the pipeline, resulting in a high-performance solution for extracting, transforming, and loading data.
In addition to the SSIS Windows service, SSIS includes:

SSIS Design
ner. A graphica
al design interfface for develo
oping SSIS solu
utions in the M
Microsoft Visuaal
Studio de
evelopment en
nvironment. Tyypically, you sttart the SQL Seerver Data Too
ols application to
access this environment.

Wizards. Grraphical utilitie


es that you can
n use to quicklly create, conffigure, and dep
ploy SSIS soluttions.

Command--line tools. Utilities that you can


c use to man
nage and execcute SSIS packkages.

Creating ann ETL Solution with SSIS

SSIIS Projectss and Pack


kages

MCT USE ONLY. STUDENT USE PROHIBITED

4-6

An SSIS solution usually consists of one or more SSIS projects, each containing one or more SSIS packages.
SSIS Projects
In SQL Server 2012, a project is the unit of deployment for SSIS solutions. You can define project-level parameters to enable users to specify run-time settings, and project-level connection managers that reference data sources and destinations used in package data flows. You can then deploy projects to an SSIS catalog in a SQL Server instance, and configure project-level parameter values and connections as appropriate for execution environments. You can use SQL Server Data Tools to create, debug, and deploy SSIS projects.
SSIS Packages
A project contains one or more packages, each defining a workflow of tasks to be executed. The workflow of tasks in a package is referred to as its control flow. A package control flow can include one or more Data Flow tasks, each of which encapsulates its own data flow pipeline. Packages can include package-level parameters so that dynamic values can be passed to the package at run time.
In previous releases of SSIS, deployment was managed at the package level. In SQL Server 2012, you can still deploy individual packages in a package deployment model.
Note	Deployment of SSIS solutions is discussed in more detail in Module 12, "Deploying and Configuring SSIS Packages."

The SSIS Design Environment
You can use SQL Server Data Tools to develop SSIS projects and packages. SQL Server Data Tools is based on Microsoft Visual Studio and provides a graphical development environment for business intelligence (BI) solutions. When you create an Integration Services project, the design environment includes the following elements:

Solution Explorer. A pane in the SQL Server Data Tools user interface where you can create and view project-level resources, including parameters, packages, data connection managers, and other shared objects. A solution can contain multiple projects, in which case each project is shown in Solution Explorer.

The Properties pane. A pane in the SQL Server Data Tools user interface that you can use to view and edit the properties of the currently selected object.

The Control Flow design surface. A graphical design surface in SSIS Designer where you can define the workflow of tasks for the control flow of a package.

The Data Flow tab. A graphical design surface in SSIS Designer where you can define the pipeline for a Data Flow task within a package.

The Event Handlers tab. A graphical design surface in SSIS Designer where you can define the workflow for an event handler within a package.

Package Explorer. A tree view of the components within a package.

The Connection Managers pane. A list of the connection managers used in a package.


The Variables pane. A list of variables used in a package. You can display this pane by clicking the
Variables button at the upper right of the design surface.

The SSIS Toolbox pane. A collection of components that you can add to a package control flow or
data flow. You can display this pane by clicking the SSIS Toolbox button at the upper right of the
design surface or by clicking SSIS Toolbox on the SSIS menu. Note that this pane is distinct from the
standard Visual Studio Toolbox pane.


Upgrading from Previous Versions

If you have developed ETL solutions by using SSIS in SQL Server 2005 or 2008, or by using Data Transformation Services (DTS) solutions in SQL Server 2000, you should consider how you will include them in a SQL Server 2012 SSIS solution.

SQL Server 2000 DTS Packages

There is no direct upgrade path for DTS packages to SQL Server 2012 SSIS packages, and you cannot run a DTS package in the SQL Server 2012 SSIS runtime engine. To upgrade a DTS-based solution to work with SQL Server 2012, you must re-create the solution by using the latest SSIS tools and components, or use the SSIS Package Migration Wizard in SQL Server 2005 or 2008 to perform an interim upgrade of the DTS package to SQL Server 2005 or 2008 format, and then upgrade to the SQL Server 2012 format.

SQL Server 2005 or 2008 SSIS Packages

You can run SSIS packages that were built by using SQL Server 2005 or SQL Server 2008 in the SQL Server 2012 SSIS runtime engine by using the dtexec tool. However, you will not be able to take advantage of the new project-level deployment to the SSIS catalog capabilities of SQL Server 2012. To upgrade SSIS packages that were built by using SQL Server 2005 or SQL Server 2008 to the SQL Server 2012 format, use the SSIS Package Migration Wizard in SQL Server 2012.

Scripts

SSIS packages can include script tasks to perform custom actions. In previous releases of SSIS, you could implement scripted actions by including a Microsoft ActiveX Script task (written in Microsoft Visual Basic Scripting Edition (VBScript)) or a Script task (written for the .NET Visual Studio for Applications, or VSA, runtime) in a control flow. In SQL Server 2012, the ActiveX Script task is no longer supported, and any VBScript-based custom logic must be replaced. In addition, the SQL Server 2012 Script task uses the Visual Studio Tools for Applications (VSTA) runtime, which differs in some details from the VSA runtime that was used in previous releases. When you use the SSIS Package Migration Wizard to upgrade a package that includes a Script component, the script is automatically updated for the VSTA runtime.

Lesson 2

Exploring Source Data

Now that you understand the basic architecture of SSIS, you can start to plan the data flows in your ETL solution. However, before you start implementing an ETL process, you should explore the existing data in the sources that your solution will use. By gaining a thorough knowledge of the source data on which your ETL solution will be based, you can design the most effective SSIS data flows for transferring the data and anticipate data quality issues that you may need to resolve in your SSIS packages.

This lesson discusses the value of exploring source data, and describes techniques for examining and profiling source data.

After completing this lesson, you will be able to:

Describe the value of exploring source data.

Examine an extract of data from a data source.

Profile source data by using the Data Profiling SSIS task.

Why Explore Source Data?

The design and integrity of your data warehouse ultimately rely on the data that it contains. Before you can design an appropriate ETL process to populate the data warehouse, you must have a thorough knowledge of the source data that your solution will consume.

Specifically, you need to understand:

The business entities that are represented by the source data, and their attributes. For example, the specific attributes that fully describe a product or a customer entity may be stored in multiple columns, tables, or even databases across the organization.

How to interpret data values and codes. For example, does a value of 1 in an InStock column in a Products table mean that the company has a single unit in stock, or does 1 simply indicate the value true, meaning that there is an unspecified quantity of units in stock?

The relationships between business entities, and how those relationships are modeled in the data sources.

In addition to understanding the data modeling of the business entities, you also need to examine source data to help identify:

Column data types and lengths for specific attributes that will be included in data flows. For example, what maximum lengths exist for string values? What formats are used to indicate date, time, and numeric values?

Data volume and sparseness. For example, how many rows of sales transactions are typically recorded in a single trading day? Are there any attributes that frequently contain null values?

Data quality issues. For example, are there any obvious data entry errors? Are there commonly used values that are synonyms for one another?

Finding the answers to questions like these before you implement the ETL solution can help you anticipate data flow problems and proactively design effective solutions for them.

Examining Source Data

You can explore source data by using several tools and techniques. The following list describes some of the approaches that you can use to extract data to examine:

Running queries against data sources in Microsoft SQL Server Management Studio and copying the results to the clipboard.

Creating an SSIS package with a data flow that extracts a sampling of data or a row count for a specific data source.

Using the Import and Export Data Wizard to extract a sample of data.

After extracting the sample data, you need to examine it. One of the most effective ways to do this is to extract the sample data in a format that you can open in Microsoft Excel, such as comma-delimited text, and then use the functionality of Excel to explore the data. Using Excel, you can:

Sort the data by columns.

Apply column filters to help identify the range of values used in particular columns.

Use formulas to calculate minimum, maximum, and average values for numerical columns.

Search the data for specific string values.

Demonstration: Exploring Source Data

X Task 1: Use the Import and Export Data Wizard to extract a sample of data

1. Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

2. In the D:\10777A\Demofiles\Mod04 folder, right-click Setup.cmd, and then click Run as administrator.

3. When you are prompted to confirm, click Yes, and then wait for the batch file to complete.

4. Click Start, click All Programs, click Microsoft SQL Server 2012, and then click Import and Export Data (64-bit).

5. On the Welcome to SQL Server Import and Export Wizard page, click Next.

6. On the Choose a Data Source page, select the following options, and then click Next:

Data source: SQL Server Native Client 11.0

Server name: (local)

Authentication: Use Windows Authentication

Database: ResellerSales

7. On the Choose a Destination page, select the following options, and then click Next:

Destination: Flat File Destination

File name: D:\10777A\Demofiles\Mod04\Top 500 Resellers.csv

Locale: English (United States)

Unicode: Unselected

Code page: 1252 (ANSI Latin 1)

Format: Delimited

Text qualifier: " (a quotation mark)

Column names in the first data row: Selected

Note: The text qualifier is used to enclose text values in the exported data. This is required because some European address formats include a comma, and these must be distinguished from the commas that are used to separate each column value in the exported text file.


8. On the Specify Table Copy or Query page, select Write a query to specify the data to transfer, and then click Next.

9. On the Provide a Source Query page, enter the following Transact-SQL code, and then click Next.

SELECT TOP 500 * FROM Resellers

10. On the Configure Flat File Destination page, select the following options, and then click Next:

Source query: Query

Row delimiter: {CR}{LF}

Column delimiter: Comma {,}

11. On the Save and Run Package page, select only Run immediately, and then click Next.

12. On the Complete the Wizard page, click Finish.

13. When the data extraction has completed successfully, click Close.

X Task 2: Explore source data in Microsoft Excel


1. In the D:\10777A\Demofiles\Mod04 folder, double-click Top 500 Resellers.csv to open it in Excel.

2. Click any cell that contains data, on the Home tab of the ribbon, click Format as Table, and then select a table style for the data.

3. In the Format As Table dialog box that is displayed, ensure that the range of cells that contain the data is selected and that the My table has headers check box is selected, and then click OK.

4. Adjust the column widths so that you can see all of the data.

5. View the drop-down filter list for the CountryRegionCode column, and note the range of values in this column. Then select only FR, and then click OK.

6. Note that the table is filtered to show only the resellers in France. Note also that many of the addresses include a comma. If no text qualifier had been selected in the Import and Export Data Wizard, these commas would have created additional columns in these rows, making the data difficult to examine as a table.

7. Clear all filters for the CountryRegionCode column.

8. In a blank cell in column O, enter the following formula.

=Min(Table1[@YearOpened])

9. Note that this formula shows the earliest year that a store in this sample data was opened.

10. Close Excel without saving any changes.

Profiling Source Data

In addition to examining samples of source data, you can use the Data Profiling task in an SSIS package to obtain statistics about the data. This can help you understand the structure of the data that you will extract and identify columns where null or missing values are likely. Profiling your source data can help you plan effective data flows for your ETL process.

You can specify multiple profile requests in a single instance of the Data Profiling task. The following kinds of profile request are available:

Candidate Key determines whether you can use a column as a key for the selected table.

Column Length Distribution reports the range of lengths for string values in a column.

Column Null Ratio reports the percentage of null values in a column.

Column Pattern identifies regular expressions that are applicable to the values in a column.

Column Statistics reports statistics such as minimum, maximum, and average values for a column.

Column Value Distribution reports the groupings of distinct values in a column.

Functional Dependency determines whether the value of a column is dependent on the value of other columns in the same table.

Value Inclusion reports the percentage of time that a column value in one table matches a column in another table.

The Data Profiling task gathers the requested profile statistics and writes them to an XML document. This can be saved as a file for later analysis, or written to a variable for programmatic analysis within the control flow.

To view the profile statistics, you can use the Data Profile Viewer. This is available as a stand-alone tool in which you can open the XML file that the Data Profiling task generates, or you can open the Data Profile Viewer window in SQL Server Data Tools from the Properties dialog box of the Data Profiling task while the package is running in the development environment.

Use the following procedure to collect and view data profile statistics:

1. Create an SSIS project that includes a package.

2. Add an ADO.NET connection manager for each data source that you want to profile.

3. Add the Data Profiling task to the control flow of the package.

4. Configure the Data Profiling task to specify:

The file or variable to which the resulting profile statistics should be written.

The individual profile requests that should be included in the report.

5. Run the package.

6. View the resulting profile statistics in the Data Profile Viewer.


Demonstration: Using the Data Profiling Task

X Task 1: Use the Data Profiling task

1. If you did not complete the previous demonstration:

a. Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

b. In the D:\10777A\Demofiles\Mod04 folder, right-click Setup.cmd, and then click Run as administrator.

c. When you are prompted to confirm, click Yes, and then wait for the batch file to complete.

2. Start SQL Server Data Tools, and then create a new Integration Services project named ProfilingDemo in the D:\10777A\Demofiles\Mod04 folder.

3. In the Solution Explorer pane, create a new ADO.NET connection manager with the following settings:

Server name: localhost

Log on to the server: Use Windows Authentication

Select or enter a database name: ResellerSales

4. In the SSIS Toolbox pane, in the Common section, double-click Data Profiling Task to add it to the Control Flow surface. (Alternatively, you can drag the task icon to the Control Flow surface.)

Note: If the SSIS Toolbox pane is not visible, on the SSIS menu, click SSIS Toolbox.

5. Double-click the Data Profiling Task icon on the Control Flow surface.

6. In the Data Profiling Task Editor dialog box, on the General tab, in the Destination property value drop-down list, click <New File connection>.

7. In the File Connection Manager Editor dialog box, in the Usage type drop-down list, click Create file.

8. In the File box, type D:\10777A\Demofiles\Mod04\Reseller Sales Data Profile.xml, and then click OK.

9. In the Data Profiling Task Editor dialog box, on the General tab, set OverwriteDestination to True.

10. In the Data Profiling Task Editor dialog box, on the Profile Requests tab, in the Profile Type drop-down list, select Column Statistics Profile Request, and then click the RequestID column.
11. In the Request Properties pane, set the following property values. Do not click OK when finished:

ConnectionManager: localhost.ResellerSales

TableOrView: [dbo].[SalesOrderHeader]

Column: OrderDate

12. In the row under the Column Statistics Profile Request, add a Column Length Distribution
Profile Request profile type with the following settings:

ConnectionManager: localhost.ResellerSales

TableOrView: [dbo].[Resellers]

Column: AddressLine1

13. Add a Column Null Ratio Profile Request profile type with the following settings:

ConnectionManager: localhost.ResellerSales

TableOrView: [dbo].[Resellers]

Column: AddressLine2

14. Add a Value Inclusion Profile Request profile type with the following settings:

ConnectionManager: localhost.ResellerSales

SubsetTableOrView: [dbo].[SalesOrderHeader]

SupersetTableOrView: [dbo].[PaymentTypes]

InclusionColumns:
Subset side Columns: PaymentType
Superset side Columns: PaymentTypeKey

InclusionThresholdSetting: None

SupersetColumnsKeyThresholdSetting: None

MaxNumberOfViolations: 100

15. In the Data Profiling Task Editor dialog box, click OK.
16. On the Debug menu, click Start Debugging.

X Task 2: View a data profiling report


1. When the Data Profiling task has completed, with the package still running, double-click the Data Profiling task, and then click Open Profile Viewer.

2. Maximize the Data Profile Viewer window and under the [dbo].[SalesOrderHeader] table, click Column Statistics Profiles. Then review the minimum and maximum values for the OrderDate column.

3. Under the [dbo].[Resellers] table, click Column Length Distribution Profiles and click the AddressLine1 column to view the statistics. Click the bar chart for any of the column lengths, and then click the Drill Down button (at the right edge of the title area for the middle pane) to view the source data that matches the selected column length.

4. Close the Data Profile Viewer window, and then in the Data Profiling Task Editor dialog box, click Cancel.

5. On the Debug menu, click Stop Debugging, and then close SQL Server Data Tools, saving your changes if you are prompted.

6. Click Start, click All Programs, click Microsoft SQL Server 2012, click Integration Services, and then click Data Profile Viewer to start the stand-alone Data Profile Viewer tool.

7. Click Open, and open Reseller Sales Data Profile.xml in the D:\10777A\Demofiles\Mod04 folder.

8. Under the [dbo].[Resellers] table, click Column Null Ratio Profiles and view the null statistics for the AddressLine2 column. Select the AddressLine2 column, and then click the Drill Down button to view the source data.

9. Under the [dbo].[SalesOrderHeader] table, click Inclusion Profiles and review the inclusion statistics for the PaymentType column. Select the inclusion violation for the payment type value of 0, and then click the Drill Down button to view the source data.

Note: The PaymentTypes table includes two payment types, using the value 1 for invoice-based payments and 2 for credit account payments. The Data Profiling task has revealed that for some sales, the value 0 is used, which may indicate an invalid data entry or may be used to indicate some other kind of payment that does not exist in the PaymentTypes table.

10. Close the Data Profile Viewer window.

Lesson 3

Implementing Data Flow

After you have thoroughly explored the data sources for your data warehousing solution, you can start to implement an ETL process by using SSIS. This ETL process will consist of one or more SSIS packages, with each package containing one or more Data Flow tasks. Data flow is at the core of any SSIS-based ETL solution, so it is important to understand how you can use the components of an SSIS data flow pipeline to extract, transform, and load data.

This lesson describes the various components that are used to implement a data flow, and provides some guidance for optimizing data flow performance.

After completing this lesson, you will be able to:

Create a connection manager.

Add a Data Flow task to a package control flow.

Add a Source component to a data flow.

Add a destination to a data flow.

Add transformations to a data flow.

Optimize data flow performance.

Connection Managers

To extract or load data, an SSIS package must be able to connect to the data source or destination. In an SSIS solution, you define data connections by creating a connection manager for each data source or destination that is used in the workflow. A connection manager encapsulates the following information, which is used to make a connection to the data source or destination:

The data provider to be used. For example, you can create a connection manager for a relational database by using an OLE DB or ADO.NET provider. Alternatively, you can create a connection manager for a text file by using a flat file provider.

The connection string used to locate the data source. For a relational database, the connection string includes the network name of the database server and the name of the database. For a file, the file name and path must be specified.

The credentials used to access the data source.

You can create connection managers at the project level or the package level:

Project-level connection managers are listed in Solution Explorer and can be shared across multiple packages in the same project. Use project-level connection managers when multiple packages need to access the same data source. To create a project-level connection manager, right-click the Connection Managers folder in Solution Explorer or click the Project menu, and then click New Connection Manager.


Package-level connection managers exist only within the package in which they are defined. Both
project-level and package-level connection managers that are used by a package are shown in its
Connection Managers pane in the SSIS Designer.

To create a package-level connection manager, right-click in the Connection Managers pane and
choose the kind of connection manager that you want to create. Alternatively, create a new
connection manager in the Properties dialog box of a task, source, destination, or transformation.
Note: When you create a new connection manager, SQL Server Data Tools enables you to select connection details that you have created previously, even if the connection managers that relate to those connection details do not exist in the current project or have been deleted.
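The three pieces of information a connection manager encapsulates (provider, connection string, credentials) can be illustrated by assembling an ODBC-style connection string of the kind a database connection manager ultimately resolves to. This is a sketch, not SSIS itself: the helper function and its parameters are hypothetical, and the server and database names simply echo this module's demonstrations.

```python
def build_connection_string(server, database, trusted=True):
    """Assemble a SQL Server ODBC-style connection string (illustrative only)."""
    parts = {
        "Driver": "{SQL Server Native Client 11.0}",  # the data provider
        "Server": server,                             # where the data source lives
        "Database": database,                         # which database to open
    }
    if trusted:
        # Windows Authentication: credentials come from the logged-on user.
        parts["Trusted_Connection"] = "yes"
    return ";".join(f"{k}={v}" for k, v in parts.items())

conn_str = build_connection_string("(local)", "ResellerSales")
print(conn_str)
```

When you create a connection manager in SQL Server Data Tools, the dialog box collects exactly these elements and stores them for reuse by sources, destinations, and tasks.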

The Data Flow Task

A package defines a control flow for actions that the SSIS runtime engine is to perform. A package control flow can contain several different tasks, and include complex branching and iteration, but the core task in any ETL control flow is the Data Flow task.

To include a data flow in a package control flow, drag the Data Flow task from the SSIS Toolbox pane and drop it on the Control Flow surface. Alternatively, you can double-click the Data Flow task icon in the SSIS Toolbox pane and the task will be added to the design surface. After you have added a Data Flow task to the control flow, you can rename it and set its properties in the Properties pane.

Note: This module focuses on the Data Flow task. Other control flow tasks will be discussed in detail in Module 5, Implementing Control Flow in an SSIS Package.

To define the pipeline for the Data Flow task, double-click the task. SSIS Designer will display a design surface onto which you can add data flow components. Alternatively, you can click the Data Flow tab in SSIS Designer and then select the Data Flow task that you want to edit in the drop-down list that is displayed at the top of the design surface.

A typical data flow pipeline includes one or more data sources, transformations that operate on the data as it flows through the pipeline, and one or more destinations for the data. The pipeline flow is defined by connecting the output from one component to the input of the next component in the pipeline.

Data Sources

The starting point for a data flow is a data source. A data source definition includes:

The connection manager that is used to connect to the data source.

The table, view, or query that is used to extract the data.

The columns that are included in the output from the data source and passed to the next component in the data flow pipeline.

The following table describes the kinds of data source that SSIS supports.

Databases

ADO.NET: Any database for which an ADO.NET data provider is installed.

OLE DB: Any database for which an OLE DB provider is installed.

CDC Source: A SQL Server or Oracle database in which change data capture (CDC) has been enabled. CDC is discussed in Module 7: Implementing an Incremental ETL Process.

Files

Excel: A Microsoft Excel workbook.

Flat file: Data in a text file, such as comma-delimited text.

XML: A file that contains data in XML format.

Raw file: An SSIS-specific binary format file.

Other sources

Script component: A custom source that is implemented as a script.

Custom: A custom data source that is implemented as a .NET assembly.


In addition to the sources listed in the table, you can download the following additional sources from the
Microsoft Web site.

Oracle Source

SAP BI Source

Teradata Source

To add a data source for SQL Server, Excel, a flat file, or Oracle to a data flow, drag the Source Assistant
icon from the Favorites section of the SSIS Toolbox pane to the design surface and use the wizard to
select or create a connection manager for the source. For other data sources, drag the appropriate icon
from the Other Sources section of the SSIS Toolbox pane to the design surface and then double-click the
data source on the design surface to define the connection, data, and output columns for the data source.
By default, the output from a data source is represented as an arrow at the bottom of the data source
icon on the design surface. To create a data flow, you simply drag this arrow and connect it to the next
component in the data flow pipeline, which could be a destination or a transformation.

Data Destinations

A destination is an endpoint for a data flow. It has input columns, which are determined by the connection from the previous component in the data flow pipeline, but no output. A destination definition includes:

A connection manager for the data store where the data is to be inserted.

The table or view into which the data must be inserted (where supported).

The following table describes the kinds of destination that SSIS supports.

Databases

ADO.NET: Any database for which an ADO.NET data provider is installed.

OLE DB: Any database for which an OLE DB provider is installed.

SQL Server: A SQL Server database.

SQL Server Compact: An instance of SQL Server Compact.

Files

Excel: A Microsoft Excel workbook.

Flat file: A text file.

Raw file: An SSIS-specific binary format file.


(continued)

SQL Server Analysis Services

Data mining model training: Used to build data mining models for data analysis.

Dimension processing: Used to populate a dimension in an online analytical processing (OLAP) cube.

Partition processing: Used to populate a partition in an OLAP cube.

Rowsets

DataReader: An ADO.NET DataReader interface that can be read by another application.

Recordset: An ADO Recordset interface that can be read by another application.

Other destinations

Script component: A custom destination that is implemented as a script.

Custom: A custom destination that is implemented as a .NET assembly.

To add a SQL Server, Excel, or Oracle destination to a data flow, drag the Destination Assistant icon from
the Favorites section of the SSIS Toolbox pane to the design surface and use the wizard to select or
create a connection manager for the destination. For other kinds of destination, drag the appropriate icon
from the Other Destinations section of the SSIS Toolbox pane to the design surface.
After you have added a destination to the data flow, connect the output from the previous component in
the data flow to the destination, double-click the destination, and then edit it to define:

The connection manager and destination table (if relevant) to be used when loading the data.

The column mappings between the input columns and the columns in the destination.

Data Transformations

Data transforma
ations enable you
y to perform
m operations o
on rows of datta as they passs through the d
data
flo
ow pipeline. Transformations have both in
nputs and outp
puts.
Th
he following ta
able lists the trransformations that SSIS inc ludes.

Row transformations: update column values or create new columns for each row in the data flow
Character Map: Applies string functions to column values, such as conversion from lowercase to uppercase.
Copy Column: Creates a copy of a column and adds it to the data flow.
Data Conversion: Converts data of one type to another, for example, numerical values to strings.
Derived Column: Adds a new column based on an expression. For example, you could use an expression to multiply a Quantity column by a UnitPrice column to create a new TotalPrice column.
Export Column: Saves the contents of a column as a file.
Import Column: Reads data from a file and adds it as a column in the data flow.
OLE DB Command: Runs a Structured Query Language (SQL) command for each row in the data flow.
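The Derived Column example above (TotalPrice = Quantity * UnitPrice) can be sketched in plain Python. This is an illustration of the per-row behavior, not SSIS itself; the row values are made up.

```python
# Illustrative sketch of a Derived Column transformation: evaluate an
# expression for each row and append the result as a new column.

rows = [
    {"Quantity": 2, "UnitPrice": 10.0},
    {"Quantity": 5, "UnitPrice": 3.5},
]

def derive_total_price(row):
    # Equivalent to the SSIS expression: [Quantity] * [UnitPrice]
    row["TotalPrice"] = row["Quantity"] * row["UnitPrice"]
    return row

# Each input row passes through unchanged except for the added column.
transformed = [derive_total_price(dict(r)) for r in rows]
```

The same pattern applies to any row transformation: one input row in, one (modified) row out, with no buffering of the whole rowset.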

Rowset transformations: create new rowsets
Aggregate: Creates a new rowset by applying aggregate functions such as SUM.
Sort: Creates a new sorted rowset.
Percentage Sampling: Creates a rowset by randomly selecting a specified percentage of rows.
Row Sampling: Creates a rowset by randomly selecting a specified number of rows.
Pivot: Creates a rowset by condensing multiple records with a single column into a single record with multiple columns.
Unpivot: Creates a rowset by expanding a single record with multiple columns into multiple records with a single column.
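The Pivot/Unpivot distinction is easiest to see with a small sketch. The following plain-Python example (column names are hypothetical) shows the Unpivot direction: one wide record with several measure columns becomes several narrow records with a single value column.

```python
# Illustrative sketch of an Unpivot transformation: expand one record with
# multiple columns into multiple records with a single value column.

def unpivot(row, key_col, value_cols):
    return [
        {key_col: row[key_col], "Attribute": col, "Value": row[col]}
        for col in value_cols
    ]

wide = {"ProductKey": 1, "Q1Sales": 100, "Q2Sales": 150}
narrow = unpivot(wide, "ProductKey", ["Q1Sales", "Q2Sales"])
# narrow holds two rows, one per quarter; Pivot performs the reverse.
```

Unlike row transformations, these rowset transformations produce a different number of output rows than input rows, so SSIS must buffer data to perform them.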

Split and Join transformations: merge or branch data flows
Conditional Split: Splits a single-input rowset into multiple-output rowsets based on conditional logic.
Multicast: Distributes all input rows to multiple outputs.
Union All: Adds multiple inputs into a single output.
Merge: Merges two sorted inputs into a single output.
Merge Join: Joins two sorted inputs to create a single output based on a FULL, LEFT, or INNER join operation.
Lookup: Looks up columns in a data source by matching key values in the input, and creates an output for matched rows and a second output for rows with no matching value in the lookup data source.
Cache: Caches data from a data source to be used by a Lookup transformation.
CDC Splitter: Splits inserts, updates, and deletes from a CDC source into separate data flows. CDC is discussed in Module 7: Implementing an Incremental ETL Process.
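Because the Lookup transformation is used in the demonstration and lab in this module, a sketch of its two-output behavior is worth spelling out. This is plain Python with made-up data, not SSIS: matched rows gain the looked-up columns, and unmatched rows are redirected to a separate output.

```python
# Illustrative sketch of a Lookup transformation with a "no match" output.

def lookup(input_rows, reference, key):
    ref_index = {r[key]: r for r in reference}   # cache of the lookup rowset
    matched, no_match = [], []
    for row in input_rows:
        ref = ref_index.get(row[key])
        if ref is None:
            no_match.append(row)                 # redirected, not discarded
        else:
            merged = dict(row)
            merged.update({k: v for k, v in ref.items() if k != key})
            matched.append(merged)
    return matched, no_match

sales = [{"ProductKey": 1, "Qty": 2}, {"ProductKey": 99, "Qty": 1}]
products = [{"ProductKey": 1, "Category": "Bikes"}]
matched, orphaned = lookup(sales, products, "ProductKey")
```

Routing the no-match output to a flat file destination, as the lab does with orphaned sales records, keeps the load running while preserving the problem rows for later investigation.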

Auditing transformations: add audit information or count rows
Audit: Provides execution environment information that can be added to the data flow.
Row Count: Counts the rows in the data flow and writes the result to a variable.

BI transformations: perform BI tasks
Slowly Changing Dimension: Redirects rows when loading a data warehouse to preserve historical dimension values.
Fuzzy Grouping: Uses fuzzy logic to deduplicate rows in the data flow.
Fuzzy Lookup: Looks up columns in a data source by finding approximate matches for values in the input.
Term Extraction: Extracts nouns or noun phrases from text for statistical analysis.
Term Lookup: Matches terms extracted from text with terms in a reference table.
Data Mining Query: Runs a data mining prediction query against the input to predict unknown column values.
Data Cleansing: Applies a Data Quality Services knowledge base to data as it flows through the pipeline.

Custom transformations: perform custom operations
Script Component: Runs custom script code for each row in the input.
Custom Component: A custom .NET assembly.

To add a transformation to a workflow, drag the transformation from the Common or Other Transforms
section of the SSIS Toolbox pane to the design surface, and then connect the required inputs to the
transformation. Double-click the transformation to configure the specific operation that it will perform,
and then define the columns that will be included in the outputs from the transformation.
For More Information: For more information about Integration Services transformations, see http://go.microsoft.com/fwlink/?LinkID=246724.

Optimizing Data Flow Performance

There are several techniques that you can apply to optimize the performance of a data flow. When you are implementing a data flow, consider the following guidelines:
- Optimize queries. Select only the rows and columns that you need to reduce the overall volume of data in the data flow.
- Avoid unnecessary sorting. If you require sorted data from a single data source, sort it during the extraction by using a query with an ORDER BY clause if possible. If subsequent transformations in your data flow rely on sorted data, use the IsSorted property of the output to indicate that the data is already sorted.
- Configure Data Flow task properties. Use the following properties of the Data Flow task to optimize performance:
  - DefaultBufferSize and DefaultBufferMaxRows. Configuring the size of the buffers that the data flow uses can significantly improve performance. When there is sufficient memory available, you should try to achieve a small number of large buffers without incurring any disk paging. The default values for these properties are 10 MB and 10,000 rows respectively.
  - BufferTempStoragePath and BLOBTempStoragePath. Using these properties to locate temporary objects created by the data flow on a fast disk, or spreading them across multiple storage devices, can improve performance.
  - EngineThreads. Setting the number of threads available to the Data Flow task can improve execution performance, particularly in packages where the MaxConcurrentExecutables property has been set to enable parallel execution of the package's tasks across multiple processors.
  - RunInOptimizedMode. Setting a Data Flow task to run in optimized mode increases performance by removing any columns or components that are not required further downstream in the data flow.
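The interaction between DefaultBufferSize and DefaultBufferMaxRows comes down to simple arithmetic: the engine fits as many rows into a buffer as both limits allow, so a wide row reduces the effective rows per buffer. The following back-of-the-envelope sketch uses the default values quoted above; the row widths are illustrative, and real buffer sizing involves additional per-buffer overhead not modeled here.

```python
# Rough sketch of buffer sizing: rows per buffer is bounded by both the
# byte-size limit and the row-count limit, whichever is hit first.

DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # 10 MB default
DEFAULT_BUFFER_MAX_ROWS = 10_000         # default row cap

def rows_per_buffer(row_width_bytes,
                    buffer_size=DEFAULT_BUFFER_SIZE,
                    max_rows=DEFAULT_BUFFER_MAX_ROWS):
    return min(max_rows, buffer_size // row_width_bytes)

narrow = rows_per_buffer(row_width_bytes=100)    # capped by the row limit
wide = rows_per_buffer(row_width_bytes=4096)     # capped by the byte limit
```

This is why the guideline above favors a small number of large buffers: for wide rows, raising DefaultBufferSize (memory permitting) lets each buffer carry more rows and reduces per-buffer overhead.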

Demonstration: Implementing a Data Flow

Task 1: Configure a data source
1. If you did not complete the previous demonstration:
   - Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
   - In the D:\10777A\Demofiles\Mod04 folder, right-click Setup.cmd, and then click Run as administrator.
   - When you are prompted to confirm, click Yes, and then wait for the batch file to complete.
2. Use SQL Server Management Studio to connect to the localhost database engine instance and view the contents of the Product, ProductSubcategory, and ProductCategory tables in the Products database, and the DimProduct table in the DemoDW database (which should be empty).
3. Start SQL Server Data Tools and create a new Integration Services project named DataFlowDemo in the D:\10777A\Demofiles\Mod04 folder.
4. In Solution Explorer, rename Package.dtsx to ExtractProducts.dtsx.
5. In Solution Explorer, create a new OLE DB connection manager with the following settings:
   - Server name: localhost
   - Log on to the server: Use Windows Authentication
   - Select or enter a database name: Products

6. In the SSIS Toolbox pane, in the Favorites section, double-click Data Flow Task to add it to the Control Flow surface. (Alternatively, you can drag the task icon to the Control Flow surface.)

Note: If the SSIS Toolbox pane is not visible, on the SSIS menu, click SSIS Toolbox.


7. Rename the Data Flow task to Extract Products, and then double-click it to switch to the Data Flow tab.
8. In the SSIS Toolbox pane, in the Favorites section, double-click Source Assistant to add a source to the Data Flow surface. (Alternatively, you can drag the Source Assistant icon to the Data Flow surface.)
9. In the Source Assistant - Add New Source dialog box, in the list of types, click SQL Server; in the list of connection managers, click localhost.Products; and then click OK.
10. Rename the OLE DB source Products, and then double-click it to edit its settings.
11. In the OLE DB Source Editor dialog box, on the Connection Manager tab, view the list of available tables and views in the drop-down list.
12. Change the data access mode to SQL Command, and then enter the Transact-SQL in the following code example.

SELECT ProductKey, ProductName FROM Product

13. Click Build Query to open the Query Builder dialog box.
14. In the Product table, select the ProductSubcategoryKey, StandardCost, and ListPrice columns, and then click OK.
15. In the OLE DB Source Editor dialog box, click Preview to see a preview of the data, and then click Close to close the preview.
16. In the OLE DB Source Editor dialog box, click the Columns tab, view the list of external columns that the query has returned and the output columns that the data source has generated, and then click OK.

Task 2: Use a Derived Column transformation
1. In the SSIS Toolbox pane, in the Common section, double-click Derived Column to add a Derived Column transformation to the Data Flow surface, and then position it under the Products source. (Alternatively, you can drag the Derived Column icon to the Data Flow surface.)
2. Rename the Derived Column transformation Calculate Profit.
3. Select the Products source, and then drag the output arrow to the Derived Column transformation.
4. Double-click the Derived Column transformation to edit its settings, and then in the Derived Column Name box, type Profit.
5. Ensure that <add as new column> is selected in the Derived Column box.
6. Expand the Columns folder, and then drag the ListPrice column to the Expression box.
7. In the Expression box, after [ListPrice], type a minus sign (-), and then drag the StandardCost column to the Expression box to create the following expression.

[ListPrice]-[StandardCost]

8. Click the Data Type box, ensure that it is set to currency [DT_CY], and then click OK.

Task 3: Use a Lookup transformation
1. In the SSIS Toolbox pane, in the Common section, double-click Lookup to add a Lookup transformation to the Data Flow surface, and then position it under the Calculate Profit transformation. (Alternatively, you can drag the Lookup icon to the Data Flow surface.)
2. Rename the Lookup transformation Lookup Category.
3. Select the Calculate Profit transformation, and then drag the output arrow to the Lookup Category transformation.
4. Double-click the Lookup Category transformation to edit its settings.
5. In the Lookup Transformation Editor dialog box, on the General tab, in the Specify how to handle rows with no matching entries list, select Redirect rows to no match output.
6. In the Lookup Transformation Editor dialog box, on the Connection tab, ensure that the localhost.Products connection manager is selected, and then click Use results of an SQL query.
7. Click Browse, and then in the D:\10777A\Demofiles\Mod04 folder, open the LookupProductCategories.sql query.
8. Click Preview to view the product category data, note that it includes a ProductSubcategoryKey column, and then click Close to close the preview.
9. In the Lookup Transformation Editor dialog box, on the Columns tab, in the Available Input Columns list, drag ProductSubcategoryKey to ProductSubCategoryKey in the Available Lookup Columns list.
10. Select the ProductSubcategoryName and ProductCategoryName columns to add them as new columns to the data flow, and then click OK.

Task 4: Configure a destination
1. In Solution Explorer, create a new OLE DB connection manager with the following settings:
   - Server name: localhost
   - Log on to the server: Use Windows Authentication
   - Select or enter a database name: DemoDW
2. In the SSIS Toolbox pane, in the Favorites section, double-click Destination Assistant to add a destination to the Data Flow surface, and then position it under the Lookup Category transformation. (Alternatively, you can drag the Destination Assistant icon to the Data Flow surface.)
3. In the Destination Assistant - Add New Destination dialog box, in the list of types, click SQL Server; in the list of connection managers, click localhost.DemoDW; and then click OK.
4. Rename the OLE DB destination DemoDW.
5. Select the Lookup Category transformation, and then drag the output arrow to the DemoDW destination.


6. In the Input Output Selection dialog box, in the Output list, click Lookup Match Output, and then click OK.
7. Double-click the DemoDW destination to edit its settings, and then in the Name of the table or the view list, click [dbo].[DimProduct].
8. In the OLE DB Destination Editor dialog box, on the Mappings tab, note that input columns are automatically mapped to destination columns with the same name.
9. In the Available Input Columns list, drag the ProductKey column to the ProductAltKey column in the Available Destination Columns list, and then click OK.

10. In the SSIS Toolbox pane, in the Other Destinations section, double-click Flat File Destination to
add a destination transformation to the Data Flow surface, and then position it to the right of the
Lookup Category transformation. (Alternatively, you can drag the Flat File Destination icon to the
Data Flow surface.)
11. Rename the flat file destination Uncategorized Products.

12. Select the Lookup Category transformation, and then drag the output arrow to the Uncategorized
Products destination. The Lookup No Match Output output is automatically selected.
13. Double-click the Uncategorized Products destination to edit its settings, and then click New to
create a new flat file connection manager for delimited text values.
14. Name the new connection manager Unmatched Products and specify the file name
D:\10777A\Demofiles\Mod04\UnmatchedProducts.csv.

15. In the Flat File Destination Editor dialog box, click the Mappings tab, note that the input columns
are mapped to destination columns with the same names, and then click OK.
16. Right-click the Data Flow design surface, and then click Execute Task. Observe the data flow as it runs, noting the number of rows transferred along each path.
17. When the data flow has completed, on the Debug menu, click Stop Debugging.
18. Close Visual Studio, saving your changes if you are prompted.

19. In Excel, open the UnmatchedProducts.csv flat file and note that there were no unmatched products.
20. Use SQL Server Management Studio to view the contents of the DimProduct table in the DemoDW
database, and note that the product data has been transferred.

Lab Scenario

In this lab, you will start the development of the ETL solution for the Adventure Works data warehouse. Initially, you will focus on the core task of the ETL solution, which is to extract data from data sources so that it can be loaded into the data warehouse.
In this lab, you will focus on the extraction of customer and sales order data from the InternetSales database used by the company's e-commerce site, which you must load into the Staging database. This database contains customer data (in a table named Customers), and sales order data (in tables named SalesOrderHeader and SalesOrderDetail). You will extract sales order data at the line item level of granularity, and the total sales amount for each sales order line item must be calculated by multiplying the unit price of the product purchased by the quantity ordered. Additionally, the sales order data includes only the ID of the product purchased, so your data flow must look up the details of each product in a separate Products database.

Lab 4: Implementing Data Flow in an SSIS Package

Exercise 1: Exploring Source Data
Scenario
You have designed a data warehouse schema for Adventure Works Cycles, and now you must design an ETL process to populate it with data from various source systems. Before creating the ETL solution, you have decided to examine the source data so that you can understand it better.
Specifically, you want to:
- Examine the customer data to become familiar with the kinds of values it contains.
- Profile the source data to determine:
  - The range of order dates for sales orders.
  - The maximum field length required for address data.
  - The proportion of null values for the second line of a customer's address.
  - Whether the sales order data includes orders with a payment type code that is not present in the table of known payment types.

The main tasks for this exercise are as follows:
1. Prepare the lab environment.
2. Extract and view sample source data.
3. Profile source data.

Task 1: Prepare the lab environment
- Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
- Run the Setup Windows Command Script file (Setup.cmd) in the D:\10777A\Labfiles\Lab04\Starter folder as Administrator.

Task 2: Extract and view sample source data
- Use the Import and Export Data Wizard to extract a sample of customer data from the InternetSales database on the localhost instance of SQL Server to a comma-delimited flat file.
- Your sample should consist of the first 1,000 records in the Customers table.
- You should use a text qualifier because some string values in the table may contain commas.
- After you have extracted the sample data, use Excel to view it.

Note: You may observe some anomalies in the data, such as invalid gender codes and multiple values for the same country or region. The purpose of examining the source data is to identify as many of these problems as possible, so that you can resolve them in the development of the extract, transform, and load (ETL) solution. You will address the problems in this data in later labs.

Task 3: Profile source data
- Create an Integration Services project named Explore Internet Sales in the D:\10777A\Labfiles\Lab04\Starter folder.
- Add an ADO.NET connection manager that uses Windows authentication to connect to the InternetSales database on the localhost instance of SQL Server.
- Use a Data Profiling task to generate the following profile requests for data in the InternetSales database:
  - Column statistics for the OrderDate column in the SalesOrderHeader table. You will use this data to find the earliest and latest dates on which orders have been placed.
  - Column length distribution for the AddressLine1 column in the Customers table. You will use this data to determine the appropriate column length to allow for address data.
  - Column null ratio for the AddressLine2 column in the Customers table. You will use this data to determine how often the second line of an address is null.
  - Value inclusion for matches between the PaymentType column in the SalesOrderHeader table and the PaymentTypeKey column in the PaymentTypes table. Do not apply an inclusion threshold and set a maximum limit of 100 violations. You will use this data to find out if any orders have payment types that are not present in the table of known payment types.
- View the report that the Data Profiling task generates in the Data Profile Viewer.
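The four profile requests in this task each boil down to a simple statistic. The following plain-Python sketch computes equivalents over a couple of in-memory sample rows so you can see what the Data Profiling task is measuring; the real task runs against the InternetSales database, and the data below is made up for illustration.

```python
# Rough sketch of the statistics behind the four profile requests.

orders = [
    {"OrderDate": "2006-01-15", "PaymentType": 1},
    {"OrderDate": "2008-07-04", "PaymentType": 9},
]
customers = [
    {"AddressLine1": "1 Main St", "AddressLine2": None},
    {"AddressLine1": "99 Long Avenue West", "AddressLine2": "Suite 5"},
]
known_payment_types = {1, 2, 3}

# Column statistics: earliest and latest order dates.
date_range = (min(o["OrderDate"] for o in orders),
              max(o["OrderDate"] for o in orders))

# Column length distribution: longest AddressLine1 value observed.
max_addr_len = max(len(c["AddressLine1"]) for c in customers)

# Column null ratio: proportion of null AddressLine2 values.
null_ratio = sum(c["AddressLine2"] is None for c in customers) / len(customers)

# Value inclusion: payment type codes not present in the reference set.
violations = [o for o in orders if o["PaymentType"] not in known_payment_types]
```

Reading the Data Profile Viewer report is then a matter of interpreting these numbers: the date range sizes your time dimension, the length distribution and null ratio drive staging column definitions, and any inclusion violations flag referential problems to handle in the ETL process.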

Results: After this exercise, you should have a comma-separated text file that contains a sample of
customer data, and a data profile report that shows statistics for data in the InternetSales database.

Exercise 2: Transferring Data by Using a Data Flow Task


Scenario


Now that you have explored the source data in the InternetSales database, you are ready to start
implementing data flows for the ETL process. A colleague has already implemented data flows for reseller
sales data, and you plan to model your Internet sales data flows on those.
The main tasks for this exercise are as follows:
1. Examine an existing data flow.
2. Create a Data Flow task.
3. Add a data source to a data flow.
4. Add a data destination to a data flow.
5. Test the Data Flow task.

Task 1: Examine an existing data flow
- In the D:\10777A\Labfiles\Lab04\Starter\Ex2 folder, open the AdventureWorksETL.sln solution.
- Open the Extract Reseller Data.dtsx package and examine its control flow. Note that it contains two Data Flow tasks.
- On the Data Flow tab, view the Extract Resellers task and note that it contains a source named Resellers and a destination named Staging DB.
- Examine the Resellers source, noting the connection manager that it uses, the source of the data, and the columns that its output contains.
- Examine the Staging DB destination, noting the connection manager that it uses, the destination table for the data, and the mapping of input columns to destination columns.
- Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data flow as it runs, noting the number of rows transferred.
- When the data flow has completed, stop the debugging session.

Task 2: Create a Data Flow task
- Add a new package to the project and name it Extract Internet Sales Data.dtsx.
- Add a Data Flow task named Extract Customers to the new package's control flow.

Task 3: Add a data source to a data flow
- Create a new project-level OLE DB connection manager that uses Windows authentication to connect to the InternetSales database on the localhost instance of SQL Server.
- In the Extract Customers data flow, add a source that uses the connection manager that you created for the InternetSales database, and name it Customers.
- Configure the Customers source to extract all columns from the Customers table in the InternetSales database.

Task 4: Add a data destination to a data flow
- Add a destination that uses the existing localhost.Staging connection manager to the Extract Customers data flow, and then name it Staging DB.
- Connect the output from the Customers source to the input of the Staging DB destination.
- Configure the Staging DB destination to load data into the Customers table in the Staging database.
- Ensure that all columns are mapped, and in particular that the CustomerKey input column is mapped to the CustomerBusinessKey destination column.
- Your completed data flow should look like the following image.

Task 5: Test the Data Flow task
- Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data flow as it runs, noting the number of rows transferred.
- When the data flow has completed, stop the debugging session.

Results: After this exercise, you should have an SSIS package that contains a single Data Flow task, which
extracts customer records from the InternetSales database and inserts them into the Staging database.

Exercise 3: Using Transformations in a Data Flow


Scenario


You have implemented a simple data flow to transfer customer data to the staging database. Now you
must implement a data flow for Internet sales records. The new data flow must add a new column that
contains the total sales amount for each line item (which is derived by multiplying the list price by the
quantity of units purchased), and use a product key value to find additional product data in a separate
Products database. Once again, you will model your solution on a data flow that a colleague has already
implemented for reseller sales data.
The main tasks for this exercise are as follows:
1. Examine an existing data flow.
2. Create a Data Flow task.
3. Add a data source to a data flow.
4. Add a Derived Column transformation to a data flow.
5. Add a Lookup transformation to a data flow.
6. Add a data destination to a data flow.
7. Test the Data Flow task.

Task 1: Examine an existing data flow
- In the D:\10777A\Labfiles\Lab04\Starter\Ex3 folder, open the AdventureWorksETL.sln solution.
- Open the Extract Reseller Data.dtsx package and examine its control flow. Note that it contains two Data Flow tasks.
- On the Data Flow tab, view the Extract Reseller Sales task:
  - Examine the Reseller Sales source, noting the connection manager that it uses, the source of the data, and the columns that its output contains.
  - Examine the Calculate Sales Amount transformation, noting the expression that it uses to create a new derived column.
  - Examine the Lookup Product Details transformation, noting the connection manager and query that it uses to look up product data, and the column mappings that it uses to match data and add rows to the data flow.
  - Examine the Staging DB destination, noting the connection manager that it uses, the destination table for the data, and the mapping of input columns to destination columns.
- Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data flow as it runs, noting the number of rows transferred.
- When the data flow has completed, stop the debugging session.

Task 2: Create a Data Flow task
- Open the Extract Internet Sales Data.dtsx package, and then add a new Data Flow task named Extract Internet Sales to its control flow.
- Connect the pre-existing Extract Customers Data Flow task to the new Extract Internet Sales task.

Task 3: Add a data source to a data flow
- Add a source that uses the existing localhost.InternetSales connection manager to the Extract Internet Sales data flow, and then name it Internet Sales.
- Configure the Internet Sales source to use the InternetSales.sql query in the D:\10777A\Labfiles\Lab04\Starter\Ex3 folder to extract Internet sales records.

Task 4: Add a Derived Column transformation to a data flow
- Add a Derived Column transformation named Calculate Sales Amount to the Extract Internet Sales data flow.
- Connect the output from the Internet Sales source to the input of the Calculate Sales Amount transformation.
- Configure the Calculate Sales Amount transformation to create a new column named SalesAmount that contains the UnitPrice column value multiplied by the OrderQuantity column value.

Task 5: Add a Lookup transformation to a data flow
- Add a Lookup transformation named Lookup Product Details to the Extract Internet Sales data flow.
- Connect the output from the Calculate Sales Amount transformation to the input of the Lookup Product Details transformation.
- Configure the Lookup Product Details transformation to:
  - Redirect unmatched rows to the no match output.
  - Use the localhost.Products connection manager and the Products.sql query in the D:\10777A\Labfiles\Lab04\Starter\Ex3 folder to retrieve product data.
  - Match the ProductKey input column to the ProductKey lookup column.
  - Add all lookup columns other than ProductKey to the data flow.
- Add a flat file destination named Orphaned Sales to the Extract Internet Sales data flow. Then redirect non-matching rows from the Lookup Product Details transformation to the Orphaned Sales destination, which should save any orphaned records in a comma-delimited file named Orphaned Internet Sales.csv in the D:\10777A\ETL folder.

Task 6: Add a data destination to a data flow
- Add a destination that uses the existing localhost.Staging connection manager to the Extract Internet Sales data flow, and name it Staging DB.
- Connect the match output from the Lookup Product Details transformation to the input of the Staging DB destination.
- Configure the Staging DB destination to load data into the InternetSales table in the Staging database. Ensure that all columns are mapped. In particular, ensure that the *Key input columns are mapped to the *BusinessKey destination columns.
- Your completed data flow should look like the following image.

Task 7: Test the Data Flow task
- Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data flow as it runs, noting the number of rows transferred.
- When the data flow has completed, stop the debugging session.

Results: After this exercise, you should have a package that contains a Data Flow task that includes a
Derived Column transformation and a Lookup transformation.

Module Review and Takeaways

Review Questions
1. How could you determine the range of OrderDate values in a data source to plan a time dimension table in a data warehouse?
2. What kind of source should you use to extract data from a comma-delimited file?
3. How could you combine data from two identically structured sources into a single destination?


Module 5
Implementing Control Flow in an SSIS Package
Contents:
Lesson 1: Introduction to Control Flow 5-3
Lesson 2: Creating Dynamic Packages 5-14
Lesson 3: Using Containers 5-21
Lab 5A: Implementing Control Flow in an SSIS Package 5-33
Lesson 4: Managing Consistency 5-41
Lab 5B: Using Transactions and Checkpoints 5-51

Module Overview

Control flow in SQL Server Integration Services (SSIS) packages enables you to implement complex extract, transform, and load (ETL) solutions that combine multiple tasks and workflow logic. By learning how to implement control flow, you can design robust ETL processes for a data warehousing solution that coordinate data flow operations with other automated tasks.
After completing this module, you will be able to:
- Implement control flow with tasks and precedence constraints.
- Create dynamic packages that include variables and parameters.
- Use containers in a package control flow.
- Enforce consistency with transactions and checkpoints.

Lesson 1
Introduction to Control Flow

Control flow in an SSIS package consists of one or more tasks, usually executed as a sequence based on precedence constraints that define a workflow. Before you can implement a control flow, you need to know what tasks are available and how to define a workflow sequence using precedence constraints. You also need to understand how you can use multiple packages to create complex ETL solutions.
After completing this lesson, you will be able to:
- Describe the control flow tasks provided by SSIS.
- Define a workflow for tasks by using precedence constraints.
- Use design time features of SSIS to help you develop control flow efficiently.
- Use multiple packages in an SSIS solution.
- Create reusable package templates.

Implementing Control Flow in an


a SSIS Package

Control Flow Tasks

A control flow consists of one or more tasks. SSIS includes the following control flow tasks that you can use in a package.

Data Flow Tasks
• Data Flow: Encapsulates a data flow that transfers data from a source to a destination.

Database Tasks
• Data Profiling: Generates statistical reports based on a data source.
• Bulk Insert: Inserts data into a data destination in a bulk load operation.
• Execute SQL: Runs a structured query language (SQL) query in a database.
• Execute T-SQL: Runs a Transact-SQL query in a Microsoft SQL Server database.
• CDC Control: Performs a change data capture (CDC) status management operation. CDC is discussed in Module 7: Implementing an Incremental ETL Process.

File and Internet Tasks
• File System: Performs file system operations, such as creating folders or deleting files.
• FTP: Performs file transfer protocol (FTP) operations, such as copying files.
• XML: Performs XML processing operations, such as applying a style sheet.
• Web Service: Calls a method on a specific web service.
• Send Mail: Sends an email message.

Process Execution Tasks
• Execute Package: Runs a specified SSIS package.
• Execute Process: Runs a specified program.

WMI Tasks
• WMI Data Reader: Runs a Windows Management Instrumentation (WMI) query.
• WMI Event Watcher: Monitors a specific WMI event.

Custom Logic Tasks
• Script: Runs a Microsoft Visual Studio Tools for Applications (VSTA) script.
• Custom Task: A custom task implemented as a .NET assembly.

Database Transfer Tasks
• Transfer Database: Transfers a database from one SQL Server instance to another.
• Transfer Error Messages: Transfers custom error messages from one SQL Server instance to another.
• Transfer Jobs: Transfers SQL Agent jobs from one SQL Server instance to another.
• Transfer Logins: Transfers logins from one SQL Server instance to another.
• Transfer Master Stored Procedures: Transfers stored procedures in the master database from one SQL Server instance to another.
• Transfer SQL Server Objects: Transfers database objects such as tables and views from one SQL Server instance to another.

Analysis Services Tasks
• Analysis Services Execute DDL: Runs a data definition language (DDL) statement in an Analysis Services instance, for example to create a cube.
• Analysis Services Processing: Processes an Analysis Services object, such as a cube or data mining model.
• Data Mining Query: Runs a prediction query using a data mining model.

SQL Server Maintenance Tasks
• Backup Database: Backs up a SQL Server database.
• Check Database Integrity: Checks the integrity of a SQL Server database.
• History Cleanup: Deletes out-of-date history data for SQL Server maintenance operations.
• Maintenance Cleanup: Deletes files left by maintenance operations.
• Notify Operator: Sends a notification by email message, pager message, or network alert to a SQL Agent operator.
• Rebuild Index: Rebuilds a specified index on a SQL Server table or view.
• Reorganize Index: Reorganizes a specified index on a SQL Server table or view.
• Shrink Database: Reduces the size of the specified SQL Server database.
• Update Statistics: Updates value distribution statistics for tables and views in a SQL Server database.

To add a task to a control flow, drag it from the SSIS Toolbox to the control flow design surface. Then double-click the task on the design surface to configure its settings.


Precedence Constraints

A control flow usually defines a sequence of tasks to be executed. You define the sequence by connecting tasks with precedence constraints. These precedence constraints evaluate the outcome of a task to determine the flow of execution.

Control Flow Conditions
You can define precedence constraints for one of three conditions:
• Success: the execution flow to be followed when a task completes successfully. In the control flow designer, success constraints are shown as green arrows.
• Failure: the execution flow to be followed when a task fails. In the control flow designer, failure constraints are shown as red arrows.
• Completion: the execution flow to be followed when a task completes, regardless of whether it succeeds or fails. In the control flow designer, completion constraints are shown as blue arrows.

By using these conditional precedence constraints, you can define a control flow that executes tasks based on conditional logic. For example, you could create a control flow with the following steps:
1. An FTP task downloads a file of sales data to a local folder.
2. If the FTP download succeeds, a Data Flow task imports the downloaded data into a SQL Server database. However, if the FTP download fails, a Send Mail task notifies an administrator that there has been a problem.
3. When the Data Flow task completes, regardless of whether it fails or succeeds, a File System task deletes the folder where the data file was downloaded.

Using Multiple Constraints

You can connect multiple precedence constraints to a single task. For example, a control flow might include two Data Flow tasks, and a Send Mail task that you want to use to notify an administrator if something goes wrong. To accomplish this, you could connect a failure precedence constraint from each of the Data Flow tasks to the Send Mail task. However, you need to determine whether the notification should be sent if either one of the Data Flow tasks fails, or only if both Data Flow tasks fail.

By default, when multiple precedence constraints are connected to a single task, a logical AND operation is applied to the precedence condition, meaning that all of the precedence constraints must evaluate to True in order to execute the connected task. In the example above, this means that the Send Mail task would only be executed if both Data Flow tasks failed. In the control flow designer, logical AND constraints are shown as solid arrows.

You can double-click a precedence constraint to edit it and configure it to use a logical OR operation, in which case the connected task is executed if any of the connections evaluates to True. Setting the constraints in the example above to use a logical OR operation would result in the Send Mail task being executed if either (or both) of the Data Flow tasks failed. In the control flow designer, logical OR constraints are shown as dotted arrows.
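As a purely illustrative aside (this sketch is not part of the course materials, and the function and task names are hypothetical), the AND/OR evaluation described above can be modeled in a few lines of Python:

```python
# Illustrative sketch: how multiple precedence constraints connected to one
# task gate its execution. Task outcomes are "Success" or "Failure"; each
# constraint has a condition it tests against its source task's outcome.

def constraint_satisfied(outcome, condition):
    # A "Completion" constraint is satisfied whether the task succeeded or failed.
    return condition == "Completion" or outcome == condition

def should_execute(results, constraints, logical_and=True):
    """results: task name -> outcome; constraints: list of
    (source_task, condition) pairs feeding the connected task."""
    checks = [constraint_satisfied(results[task], cond)
              for task, cond in constraints]
    return all(checks) if logical_and else any(checks)

# Two Data Flow tasks, each connected to a Send Mail task by a Failure constraint.
constraints = [("Data Flow 1", "Failure"), ("Data Flow 2", "Failure")]
results = {"Data Flow 1": "Failure", "Data Flow 2": "Success"}

# Logical AND (the default): mail is sent only if both tasks fail.
print(should_execute(results, constraints, logical_and=True))   # False
# Logical OR: mail is sent if either task fails.
print(should_execute(results, constraints, logical_and=False))  # True
```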

Grouping and Annotations

As your control flows become more complex, it can become difficult to interpret the control flow surface. The SSIS Designer includes two features that can help SSIS developers work more efficiently.

Grouping Tasks
You can group multiple tasks on the design surface in order to manage them as a single unit. A task grouping is a design-time-only feature and has no effect on runtime behavior. With a grouped set of tasks, you can:
• Move the tasks around the design surface as a single unit.
• Show or hide the individual tasks to make the best use of space on the screen.

To create a group of tasks, select the tasks you want to group by dragging around them or clicking them while holding the CTRL key, and then right-click any of the selected tasks and click Group.

Adding Annotations
You can add annotations to the design surface to document your workflow. An annotation is a text-based note that you can use to describe important features of your package design. To add an annotation, right-click the design surface and click Add Annotation. Then type the annotation text.

Note: You can add annotations to the Control Flow design surface, the Data Flow design surface, and the Event Handler design surface.


Demonstration: Implementing Control Flow

Task 1: Add tasks to a control flow
1. Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. In the D:\10777A\Demofiles\Mod05 folder, run Setup.cmd as Administrator, and then double-click ControlFlowDemo.sln to open the solution in SQL Server Data Tools.
3. In Solution Explorer, double-click Control Flow.dtsx.
4. From the SSIS Toolbox pane, drag a File System Task to the control flow surface. Then double-click the File System Task and configure the following settings:
   • Name: Delete Files
   • Operation: Delete directory content
   • SourceConnection: A new connection with a Usage type of Create folder, and a Folder value of D:\10777A\Demofiles\Mod05\Demo.
5. From the SSIS Toolbox pane, drag a File System Task to the control flow surface. Then double-click the File System Task and configure the following settings:
   • Name: Delete Folder
   • Operation: Delete directory
   • SourceConnection: Demo
6. From the SSIS Toolbox pane, drag a File System Task to the control flow surface. Then double-click the File System Task and configure the following settings:
   • Name: Create Folder
   • Operation: Create directory
   • UseDirectoryIfExists: True
   • SourceConnection: Demo
7. From the SSIS Toolbox pane, drag a File System Task to the control flow surface. Then double-click the File System Task and configure the following settings:
   • Name: Copy File
   • Operation: Copy file
   • DestinationConnection: Demo
   • OverwriteDestination: True
   • SourceConnection: A new connection with a Usage type of Existing file, and a File value of D:\10777A\Demofiles\Mod05\Demo.txt.
8. From the SSIS Toolbox pane, drag a Send Mail Task to the control flow surface. Then double-click the Send Mail Task and configure the following settings:
   • Name (on the General tab): Send Failure Notification
   • SmtpConnection (on the Mail tab): Create a new SMTP connection manager with a Name property of Local SMTP Server and an SMTP Server property of localhost. Use the default values for all other settings.
   • From (on the Mail tab): demo@adventureworks.msft
   • To (on the Mail tab): student@adventureworks.msft
   • Subject (on the Mail tab): Control Flow Failure
   • MessageSource (on the Mail tab): A task failed

Task 2: Use precedence constraints to define a control flow
1. Select the Delete Files task and drag its green arrow to the Delete Folder task. Then connect the Delete Folder task to the Create Folder task and the Create Folder task to the Copy File task.
2. Connect each of the file system tasks to the Send Failure Notification task.
3. Right-click the connection between Delete Files and Delete Folder and click Completion.
4. Right-click the connection between Delete Folder and Create Folder and click Completion.
5. Click each of the connections between the file system tasks and the Send Failure Notification task while holding the Ctrl key and press F4. Then in the Properties pane, set the Value property to Failure.
6. Click anywhere on the control flow surface to clear the current selection, and then double-click any of the red constraints connected to the Send Failure Notification task. Then in the Precedence Constraint Editor dialog box, in the Multiple constraints section, select Logical OR. One constraint must evaluate to True, and click OK. Note that all connections to the Send Failure Notification task are now dotted to indicate that a logical OR operation is applied.
7. Right-click the control flow surface next to the Send Failure Notification task and click Add Annotation. Then type Send an email message if a task fails.
8. Select the Delete Files and Delete Folder tasks, then right-click either of them and click Group. Drag the group to rearrange the control flow so you can see that the Delete Folder task is still connected to the Create Folder task.
9. On the Debug menu, click Start Debugging to run the package, and note that the Delete Files and Delete Folder tasks failed because the specified folder did not previously exist. This caused the Send Failure Notification task to be executed.
10. You can view the email message that was sent by the Send Failure Notification task in the C:\inetpub\mailroot\Drop folder. Double-click it to open it with Microsoft Outlook.
11. In SQL Server Data Tools, on the Debug menu, click Stop Debugging, and then run the package again. This time all of the file system tasks should succeed because the folder was created during the previous execution. Consequently, the Send Failure Notification task is not executed.

Using Multiple Packages

While you can implement an SSIS solution that includes only one package, most enterprise solutions include multiple packages. By dividing your solution into multiple packages, you can:
• Create reusable units of workflow that can be used multiple times in a single ETL process.
• Run multiple control flows in parallel, taking advantage of multi-processing computers and improving the overall throughput of your ETL processes.
• Separate data extraction workflows to suit data acquisition windows.

You can execute each package independently, and you can also use the Execute Package task to run one package from another.

Creating a Package Template

SSIS developers often need to create multiple similar packages. To make the development process more efficient, you can use the following procedure to create a package template, which you can reuse to create multiple packages with pre-defined objects and settings.
1. Create a package that includes the elements you want to reuse. These elements can include:
   • Connection Managers
   • Tasks
   • Event Handlers
   • Parameters and Variables
2. Save the package to the DataTransformationItems folder on your development workstation. By default, this folder is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies\ProjectItems\DataTransformationProject.
3. When you want to reuse the package, add a new item to the project and select the package in the Add New Item dialog box.
4. Change the Name and ID properties of the new package to avoid naming conflicts.

Lesson 2
Creating Dynamic Packages

You can use variables, parameters, and expressions to make your SSIS packages more dynamic. For example, rather than hard-coding a database connection string or file path in a data source, you can create a package that sets the value dynamically at run time. This produces a more flexible and reusable solution and helps mitigate differences between the development and production environments.
This lesson describes how you can create variables and parameters, and use them in expressions.
After completing this lesson, you will be able to:
• Create variables in an SSIS solution.
• Create parameters in an SSIS solution.
• Use expressions in an SSIS solution.

Variables

You can use variables to store values that a control flow uses at run time. Variable values can change as the package is executed to reflect run time conditions. For example, a variable used to store a file path might change depending on the specific server on which the package is running. You can use variables to:
• Set property values for tasks and other objects.
• Store an iterator or enumerator value for a loop.
• Set input and output parameters for a SQL query.
• Store results from a SQL query.
• Implement conditional logic in an expression.

SSIS packages can contain two kinds of variables: user variables and system variables.

User Variables
You can define user variables to store dynamic values that your control flow uses. To create a variable, view the Variables pane in SSIS Designer and click the Add Variable button. For each user variable, you can specify the following properties:
• Name: A name for the variable. The combination of name and namespace must be unique within the package. Note that variable names are case-sensitive.
• Scope: The scope of the variable. Variables can be accessible throughout the whole package, or scoped to a particular container or task. You cannot set the scope in the Variables pane; it is determined by the object that is selected when you create the variable.
• Data Type: The type of data the variable will hold, for example string, datetime, or decimal.
• Value: The initial value of the variable.
• Namespace: The namespace within which the variable name is unique. By default, user variables are defined in the User namespace, but you can create additional namespaces as required.
• Raise Change Event: Causes an event to be raised when the variable value changes. You can then implement an event handler to perform some custom logic.
• IncludeInDebugDump: Causes the variable value to be included in debug dump files.

System Variables
System variables store information about the running package and its objects, and are defined in the System namespace. Some useful system variables include:
• MachineName: The computer on which the package is running.
• PackageName: The name of the package that is running.
• StartTime: The time that the package started running.
• UserName: The user who started the package.

Note: For a full list of system variables, refer to the SQL Server Integration Services documentation in SQL Server Books Online.


Parameters

You can use parameters to pass values to a project or package at run time. When you define a parameter, you can set a default value, which can be overridden when the package is executed in a production environment. For example, you could use a parameter to specify a database connection string for a data source, and use one value during development and a different value when the project is deployed to a production environment.
Parameters have three kinds of value:
• Design default value: A default value assigned to the parameter in the design environment.
• Server default value: A default value assigned to the parameter during deployment. This value overrides the design default value.
• Execution value: A value specified for a specific execution of a package. This value overrides both the server default value and the design default value.

When the project is deployed to an SSIS Catalog, administrators can define multiple environments for the project and specify server default parameter values for each environment.
SSIS supports two kinds of parameter:
• Project parameters, which are defined at the project level and can be used in any packages within the project.
• Package parameters, which are scoped at the package level and are only available within the package for which they are defined.

Note: Parameters are only supported in the project deployment model. When the legacy deployment model is used, you can set dynamic package properties by using package configurations. Deployment is discussed in Module 12: Deploying and Configuring SSIS Packages.
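The three kinds of parameter value form a simple override chain. As a minimal illustrative sketch (plain Python, not an SSIS API), the resolution logic behaves like this:

```python
# Illustrative only: model how an SSIS parameter's effective value is resolved.
# An execution value overrides a server default, which overrides the design default.

def effective_value(design_default, server_default=None, execution_value=None):
    if execution_value is not None:
        return execution_value
    if server_default is not None:
        return server_default
    return design_default

# A hypothetical connection-string parameter at each stage:
print(effective_value("DevServer"))                               # DevServer
print(effective_value("DevServer", server_default="ProdServer"))  # ProdServer
print(effective_value("DevServer", "ProdServer", "TestServer"))   # TestServer
```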

Expressions

SSIS provides a rich expression language that you can use to set values for numerous elements in an SSIS package, including:
• Properties
• Conditional Split transformation criteria
• Derived Column transformation values
• Precedence constraint conditions

Expressions are based on Integration Services expression syntax, which uses similar functions and keywords to common programming languages like Microsoft C#. Expressions can include variables and parameters, enabling you to set values dynamically based on specific run time conditions.

For example, you could use an expression in a Data Flow task to specify the location of a file to be used as a data source. The following sample code shows an expression that concatenates a parameter containing a folder path and a variable containing a file name to produce a full file path.

@[$Project::folderPath]+[@User::fName]

Note that variable names are prefixed with a @ symbol, and that square brackets are used to enclose identifier names in order to support identifiers with names containing spaces. Also, note that the fully qualified parameter and variable names are used, including the namespace, and that the parameter name is prefixed with a $ symbol.
You can type expressions, or in many cases you can create them by using the Expression Builder, a graphical tool that enables you to create expressions by dragging and dropping variables, parameters, constants, and functions. The Expression Builder automatically adds prefixes and text qualifiers for variables and parameters, simplifying the task of creating complex expressions.
omplex expresssions.

Demonstration: Using Variables and Parameters

Task 1: Create a variable
1. Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. In the D:\10777A\Demofiles\Mod05 folder, double-click VariablesAndParameters.sln to open the solution in SQL Server Data Tools.
3. In Solution Explorer, double-click Control Flow.dtsx.
4. On the View menu, click Other Windows, and click Variables.
5. In the Variables pane, click the Add Variable button and add a variable with the following properties:
   • Name: fName
   • Scope: Control Flow
   • Data type: String
   • Value: Demo1.txt

Task 2: Create a parameter
1. In Solution Explorer, double-click Project.params.
2. In the Project.params [Design] window, click the Add Parameter button and add a parameter with the following properties:
   • Name: folderPath
   • Data type: String
   • Value: D:\10777A\Demofiles\Mod05\Files\
   • Sensitive: False
   • Required: True
   • Description: Folder containing text files
3. Save all files and close the Project.params [Design] window.

Task 3: Use a variable and a parameter in an expression
1. On the Control Flow.dtsx package design surface, in the Connection Managers pane, click the Demo.txt connection manager and press F4.
2. In the Properties pane, in the Expressions property box, click the ellipsis (...) button. Then in the Property Expressions Editor dialog box, in the Property box, select ConnectionString, and in the Expression box, click the ellipsis button.
3. In the Expression Builder dialog box, expand the Variables and Parameters folder, and drag the $Project::folderPath parameter to the Expression box. Then in the Expression box, type a plus (+) symbol. Then drag the User::fName variable to the Expression box to create the following expression:
   @[$Project::folderPath]+[@User::fName]
4. In the Expression Builder dialog box, click Evaluate Expression and verify that the expression produces the result D:\10777A\Demofiles\Mod05\Files\Demo1.txt. Then click OK to close the Expression Builder dialog box, and in the Property Expressions Editor dialog box, click OK.
5. Run the project, and when it has completed, stop debugging. Ignore the failure of the Delete Files and Delete Folder tasks if the demo folder did not previously exist.
6. View the contents of the D:\10777A\Demofiles\Mod05\Demo folder and verify that Demo1.txt has been copied.

Lesson 3
Using Containers

You can create containers in SSIS packages to group related tasks together or define iterative processes. Using containers in packages helps you create complex workflows and create a hierarchy of execution scopes that you can use to manage package behavior.
This lesson describes the kinds of containers that are available and how to use them in an SSIS package control flow.
After completing this lesson, you will be able to:
• Describe the types of container available in an SSIS package.
• Use a Sequence container to group related tasks.
• Use a For Loop container to repeat a process until a specific condition is met.
• Use a Foreach Loop container to process items in an enumerated collection.

Introduction to Containers

SSIS packages can contain the following kinds of container:
• Task containers: Each control flow task has its own implicit container.
• Sequence containers: You can group tasks and other containers into a sequence container. This creates an execution hierarchy and enables you to set properties at the container level that apply to all elements within the container.
• For Loop containers: You can use a For Loop container to perform an iterative process until a specified condition is met. For example, you could use a For Loop container to execute the same task a specific number of times.
• Foreach Loop containers: You can use a Foreach Loop container to perform an iterative task that processes each element in an enumerated collection. For example, you could use a Foreach Loop container to execute a Data Flow task that imports data from each file in a specified folder into a database.

Containers can be start or endpoints for precedence constraints, and you can nest containers within other containers.
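As a rough analogy (plain Python, not SSIS), a Foreach Loop container with a file enumerator behaves like a loop that runs its child tasks once per matching file. The folder and file names below are throwaway examples invented for this sketch:

```python
# Illustrative only: a Foreach Loop container with a file enumerator runs its
# child tasks once for each file matching a pattern, mapping the current file
# path into a variable on every iteration.
import tempfile
from pathlib import Path

def foreach_file(folder, pattern, task):
    for path in sorted(Path(folder).glob(pattern)):
        task(path)  # in SSIS, child tasks would read the mapped variable

# Demonstrate with a throwaway folder containing three files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.txt", "b.txt", "c.csv"):
        Path(folder, name).write_text("demo")
    imported = []
    foreach_file(folder, "*.txt", lambda p: imported.append(p.name))
    print(imported)  # ['a.txt', 'b.txt']
```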

Sequence Containers

You can use a sequence container to group tasks and other containers together and define a subset of the package control flow. By using a sequence container, you can:
• Manage properties for multiple tasks as a unit.
• Disable a logical subset of the package for debugging purposes.
• Create a scope for variables.
• Manage transactions at a granular level.

To create a sequence container, drag the Sequence Container icon from the SSIS Toolbox pane to the design surface. Then drag the tasks and other containers you want to include in the sequence into the sequence container.

Note: In the design environment, the sequence container behaves similarly to a grouped set of tasks. However, unlike a group, a sequence container exists at run time and its properties can affect the behavior of the control flow.

Demonstration: Using a Sequence Container

Task 1: Use a Sequence container
1. Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. In the D:\10777A\Demofiles\Mod05 folder, double-click SequenceContainer.sln to open the solution in SQL Server Data Tools.
3. In Solution Explorer, double-click Control Flow.dtsx.
4. Right-click the Group indicator around the Delete Files and Delete Folder tasks and click Ungroup to remove it.
5. Drag a Sequence Container from the SSIS Toolbox to the control flow design surface.
6. Right-click the precedence constraint that connects Delete Files to Send Failure Notification, and click Delete. Then delete the precedence constraints connecting Delete Folder to Send Failure Notification and Create Folder.
7. Click and drag around the Delete Files and Delete Folder tasks to select them both, and then drag them both into the sequence container.
8. Drag a precedence constraint from the sequence container to Create Folder. Then right-click the precedence constraint and click Completion.
9. Drag a precedence constraint from the sequence container to Send Failure Notification. Then right-click the precedence constraint and click Failure.
10. Run the package and view the results. Then stop debugging.
11. Click the sequence container and press F4. Then in the Properties pane, set the Disable property to True.
12. Run the package and note that neither of the tasks in the sequence container is executed. Then stop debugging.

For Loop Containers

You can use a For Loop container to repeat a portion of the control flow until a specific condition is met. For example, you could run a task a specified number of times.
Conceptually, a For Loop container behaves similarly to a For loop construct in common programming languages such as Microsoft C#. A For Loop container uses the following expression-based properties to determine the number of iterations it performs:
• An optional initialization expression, which sets a counter variable to an initial value.
• An evaluation expression that typically evaluates a counter variable in order to exit the loop when it matches a specific value.
• An iteration expression that typically modifies the value of a counter variable.

To use a For Loop container in a control flow, drag the For Loop Container icon from the SSIS Toolbox to the control flow surface, and then double-click it to set the expression properties required to control the number of loop iterations. Then drag the tasks and containers you want to repeat into the For Loop container on the control flow surface.
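The three expression properties map directly onto the clauses of a C-style for loop. The following Python sketch (illustrative only, not an SSIS API) models a For Loop container configured with InitExpression @counter = 1, EvalExpression @counter < 4, and AssignExpression @counter = @counter + 1, the values used in the demonstration that follows:

```python
# Illustrative only: model the For Loop container's init/eval/assign
# expressions as the clauses of a C-style for loop.
def for_loop(init, eval_cond, assign, body):
    state = {"counter": None}
    init(state)                 # InitExpression runs once before the loop
    while eval_cond(state):     # EvalExpression decides whether to iterate
        body(state)             # the tasks inside the container run here
        assign(state)           # AssignExpression updates the counter

iterations = []
for_loop(
    init=lambda s: s.update(counter=1),                  # @counter = 1
    eval_cond=lambda s: s["counter"] < 4,                # @counter < 4
    assign=lambda s: s.update(counter=s["counter"] + 1), # @counter = @counter + 1
    body=lambda s: iterations.append(s["counter"]),
)
print(iterations)  # [1, 2, 3]  (the body executes three times)
```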

Demonstra
D
ation: Using
g a For Loop Contaiiner

X Task 1: Usse a For Loo


op containerr

MCT USE ONLY. STUDENT USE PROHIBITED

10777A: Im
mplementing a Data Warehouse with Miccrosoft SQL Server 20012

5-27

1.

Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to
MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

2.

In the D:\10777A\Demofiles\Mod05 folder, double-click ForLoopContainer.sln to open the solution
in SQL Server Data Tools.

3.

In Solution Explorer, double-click Control Flow.dtsx.

4.

On the View menu, click Other Windows, and click Variables. Then add a variable with the
following properties:

Name: counter

Scope: Control Flow

Data type: Int32

Value: 0

5.

From the SSIS Toolbox, drag a For Loop Container to the control flow design surface.

6.

Double-click the For Loop container and set the following properties. Then click OK.

InitExpression: @counter = 1

EvalExpression: @counter < 4

AssignExpression: @counter = @counter + 1

7.

From the SSIS Toolbox, drag an Execute Process task and drop it in the For Loop container.

8.

Double-click the Execute Process task and set the following properties, then click OK.

Name (on the General tab): Open File

Executable (on the Process tab): Notepad.exe

Expressions (on the Expressions tab): Use the Property Expressions Editor to set the following
expression for the Arguments property:
@[$Project::folderPath] + "Demo" + (DT_WSTR,1)@[User::counter] + ".txt"

Note This expression concatenates the folderPath parameter (which has a default value
of "D:\10777A\Demofiles\Mod05\Files\"), the literal text Demo, the value of the counter
variable (converted to a 1-character string using the DT_WSTR data type cast), and the
literal text .txt. Because the For Loop is configured to start the counter with a value of 1
and loop until it is no longer less than 4, this will result in the following arguments for the
Notepad executable:
D:\10777A\Demofiles\Mod05\Files\Demo1.txt
D:\10777A\Demofiles\Mod05\Files\Demo2.txt
D:\10777A\Demofiles\Mod05\Files\Demo3.txt
9.

Drag a precedence constraint from the For Loop container to the Sequence container and rearrange
the control flow if necessary.

10. Run the package, and note that the For Loop starts Notepad three times, opening the text file with
the counter variable value in its name (Demo1.txt, Demo2.txt, and Demo3.txt). Close Notepad each
time it opens, and when the execution is complete, stop debugging.
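The argument strings produced by the expression in the note above can be mirrored in ordinary code. A minimal sketch (plain Python, using the same folderPath value; the variable names here are illustrative):

```python
# Sketch of the argument strings the Execute Process task receives on each
# iteration, mirroring the SSIS expression:
#   @[$Project::folderPath] + "Demo" + (DT_WSTR,1)@[User::counter] + ".txt"
folder_path = r"D:\10777A\Demofiles\Mod05\Files" + "\\"  # folderPath parameter

args = []
counter = 1                      # InitExpression: @counter = 1
while counter < 4:               # EvalExpression: @counter < 4
    # (DT_WSTR,1) casts the integer counter to a 1-character string
    args.append(folder_path + "Demo" + str(counter) + ".txt")
    counter += 1                 # AssignExpression: @counter = @counter + 1

for a in args:
    print(a)
```

Running this prints the same three file paths listed in the note, one per loop iteration.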

Foreach Loop Containers


You can use a Foreach Loop container to perform an iterative process on each item in an enumerated
collection. SSIS supports the following enumerators in a Foreach Loop container:

ADO: You can use this enumerator to loop through elements of an ADO object, for example records
in a Recordset.

ADO.NET Schema Rowset: You can use this enumerator to iterate through objects in an ADO.NET
schema, for example tables in a dataset or rows in a table.

File: You can use this enumerator to iterate through files in a folder.

Variable: You can use this enumerator to iterate through elements in a variable that contains an
array.

Item: You can use this enumerator to iterate through a property collection for an SSIS object.

Nodelist: You can use this enumerator to iterate through elements and attributes in an XML
document.

SMO: You can use this enumerator to iterate through a collection of SQL Server Management
Objects.

To use a Foreach Loop container in a control flow:

1.

Drag the Foreach Loop Container icon from the SSIS Toolbox to the control flow surface.

2.

Double-click the Foreach Loop container and select the enumerator you want to use. Each
enumerator has specific properties you need to set; for example, the File enumerator requires the path
to the folder containing the files you want to iterate through.

3.

Specify the variable in which you want to store the enumerated collection value during each iteration.

4.

Drag the tasks you want to perform during each iteration into the Foreach Loop container and
configure their properties appropriately to reference the collection value variable.
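The File enumerator's behavior — binding each file name to a variable and running the contained tasks once per file — can be sketched as follows. This is an analogy in Python, not SSIS; the function name and the sample payment file names are invented for the sketch:

```python
import os
import tempfile

# Analogy for a Foreach Loop container using the File enumerator with
# "Name and extension" retrieval: each file name is mapped (index 0) to a
# variable that the contained tasks can reference on that iteration.

def foreach_file(folder, task):
    """Run `task` once per file, passing the name-and-extension value."""
    results = []
    for entry in sorted(os.listdir(folder)):
        if os.path.isfile(os.path.join(folder, entry)):
            fname = entry          # variable mapping: User::fName <- index 0
            results.append(task(fname))
    return results

# Demo with a temporary folder standing in for the accounts folder.
with tempfile.TemporaryDirectory() as demo:
    for name in ("Payments - US.csv", "Payments - UK.csv"):
        open(os.path.join(demo, name), "w").close()
    processed = foreach_file(demo, lambda f: f)

print(processed)
```

Each contained task sees one file name per iteration, which is exactly how the labs below feed a dynamically built file path into a Data Flow task.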

Demonstration: Using a Foreach Loop Container

X Task 1: Use a Foreach Loop container

1.

Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to
MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

2.

In the D:\10777A\Demofiles\Mod05 folder, double-click ForeachLoopContainer.sln to open the
solution in SQL Server Data Tools.

3.

In Solution Explorer, double-click Control Flow.dtsx.

4.

From the SSIS Toolbox, drag a Foreach Loop Container to the control flow design surface. Then
double-click the Foreach Loop container to view the Foreach Loop Editor dialog box.

5.

On the Collection tab, ensure Foreach File Enumerator is selected, and in the Expressions box, click
the ellipsis button. Then in the Property Expressions Editor dialog box, in the Property list, select
Directory and in the Expression box click the ellipsis button.

6.

In the Expression Builder dialog box, expand the Variables and Parameters folder and drag the
$Project::folderPath parameter to the Expression box to specify that the loop should iterate
through files in the folder referenced by the folderPath project parameter. Then click OK.

7.

In the Foreach Loop Editor dialog box, on the Collection tab, in the Retrieve file name section,
select Name and extension to return the file name and extension for each file the loop finds in the
folder.

8.

In the Foreach Loop Editor dialog box, on the Variable Mappings tab, in the Variable list, select
User::fName and in the Index column select 0 to assign the file name of each file found in the folder
to the fName variable. Then click OK.

9.

Remove the precedence constraints that are connected to and from the Copy File task, and then
drag the Copy File task into the Foreach Loop container.


10. Create a success precedence constraint from the Create Folder task to the Foreach Loop container,
and a failure precedence constraint from the Foreach Loop container to the Send Failure
Notification task.
11. Run the package, closing each instance of Notepad as it opens. When the package execution has
completed, stop debugging and verify that the D:\10777A\Demofiles\Mod05\Demo folder contains
each of the files in the D:\10777A\Demofiles\Mod05\Files folder.

Lab Scenario

In this lab, you will continue to develop the ETL solution for the Adventure Works data warehouse. You
have created data flows to extract customer and sales order data and load it into the staging database.
Now you must encapsulate these data flows in a control flow that executes the data flows and notifies an
operator by e-mail when the data flows succeed, or if a failure occurs.

As well as the Internet and reseller sales data that the solution currently processes, Adventure Works has
an accounts system that records payments from resellers. Details of these payments are exported to
comma-delimited files, and you need to include this data in the ETL solution. Because the location and file
names of these files may change in the future, you must create a package that can be easily adapted. You
have decided to use a project-level parameter for the folder path and a variable for the file name so that
your package can determine the complete file path dynamically at run-time when loading the payments
data into the staging database.

Having used precedence constraints to define a control flow for the customer and Internet sales order
data flows, you now want to combine the data flows into a discrete sequence so that they can be
configured as a unit. The tasks to send an e-mail notification will remain outside of the sequence and will
be executed on success or failure of the sequence as a whole.

Finally, you have created a control flow that loads data from a single payments file, but the accounts
system actually exports a file for each country where Adventure Works has resellers. You must modify the
control flow to iterate through all of the payments files in the folder and load them all into the staging
database.

Lab 5A: Implementing Control Flow in an SSIS Package

Exercise 1: Using Tasks and Precedence in a Control Flow

Scenario

You have implemented data flows to extract data and load it into a staging database as part of the ETL
process for your data warehousing solution. Now you want to coordinate these data flows by
implementing a control flow that notifies an operator of the outcome of the process.

The main tasks for this exercise are as follows:

1. Prepare the lab environment.

2. Examine an existing package.

3. Add tasks to a package control flow.

4. Test the control flow.

X Task 1: Prepare the lab environment

Ensure the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to
MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

Run the Setup Windows Command Script file (Setup.cmd) in the D:\10777A\Labfiles\Lab05A\Starter
folder as Administrator.

X Task 2: Examine an existing package

Open the AdventureWorksETL.sln solution in the D:\10777A\Labfiles\Lab05A\Starter\Ex1 folder.

Open the Extract Reseller Data.dtsx package and examine its control flow. Note that it contains two
Send Mail tasks: one that runs when either the Extract Resellers task or the Extract Reseller Sales
task fails, and one that runs when the Extract Reseller Sales task succeeds.


Examine the settings for the precedence constraint connecting the Extract Resellers task to the Send
Failure Notification task to determine the conditions under which this task will be executed.

Examine the settings for the Send Mail tasks, noting that they both use the Local SMTP Server
connection manager.

Examine the settings of the Local SMTP Server connection manager.

On the Debug menu, click Start Debugging to run the package, and observe the control flow as the
task executes. Then, when the task has completed, on the Debug menu, click Stop Debugging.

In the C:\inetpub\mailroot\Drop folder, double-click the most recent file to open it in Microsoft
Outlook. Then read the email message and close Microsoft Outlook.

X Task 3: Add tasks to a package control flow

Open the Extract Internet Sales Data.dtsx package and examine its control flow.

Add a Send Mail task to the control flow, configure it with the following settings, and create a
precedence constraint that runs this task if the Extract Internet Sales task succeeds.

Name: Send Success Notification

SmtpConnection: A new SMTP Connection Manager named Local SMTP Server that connects
to the localhost SMTP server

From: ETL@adventureworks.msft

To: Student@adventureworks.msft

Subject: Data Extraction Notification

MessageSourceType: Direct Input

MessageSource: The Internet Sales data was successfully extracted

Priority: Normal

Add a second Send Mail task to the control flow, configure it with the following settings, and create a
precedence constraint that runs this task if either the Extract Customers or Extract Internet Sales
task fails.

Name: Send Failure Notification

SmtpConnection: The Local SMTP Server connection manager you created previously.

From: ETL@adventureworks.msft

To: Student@adventureworks.msft

Subject: Data Extraction Notification

MessageSourceType: Direct Input

MessageSource: The Internet Sales data extraction process failed

Priority: High

Verify that your control flow looks similar to the following.

X Task 4: Test the control flow


Set the ForceExecutionResult property of the Extract Customers task to Failure. Then run the
package and observe the control flow.

When package execution is complete, stop debugging and verify that the failure notification email
message has been delivered to the C:\inetpub\mailroot\Drop folder. You can double-click the email
message to open it in Microsoft Outlook.

Set the ForceExecutionResult property of the Extract Customers task to None. Then run the
package and observe the control flow.

When package execution is complete, stop debugging and verify that the success notification email
message has been delivered to the C:\inetpub\mailroot\Drop folder.

Close the AdventureWorksETL project when you have completed the exercise.

Results: After this exercise, you should have a control flow that sends an email message if the Extract
Internet Sales task succeeds, or sends an email message if either the Extract Customers or Extract
Internet Sales task fails.

Exercise 2: Using Variables and Parameters


Scenario


You need to enhance your ETL solution to include the staging of payments data that is generated in
comma-separated value (CSV) format from a financial accounts system. You have implemented a simple
data flow that reads data from a CSV file and loads it into the staging database, and you must now modify
the package to construct the folder path and file name for the CSV file dynamically at run time instead of
relying on a hard-coded name in the settings of the Data Flow task.
The main tasks for this exercise are as follows:

1. Examine an existing package.

2. Create a variable.

3. Create a parameter.

4. Use a variable and a parameter in an expression.

X Task 1: Examine an existing package

View the contents of the D:\10777A\Accounts folder and note the files it contains. In this exercise, you
will modify an existing package to create a dynamic reference to one of these files.

Open the AdventureWorksETL.sln solution in the D:\10777A\Labfiles\Lab05A\Starter\Ex2 folder.

Open the Extract Payment Data.dtsx package and examine its control flow. Note that it contains a
single Data Flow task named Extract Payments.

View the Extract Payments data flow and note that it contains a flat file source named Payments
File, and an OLE DB destination named Staging DB.

View the settings of the Payments File source and note that it uses a connection manager named
Payments File.

In the Connection Managers pane, double-click Payments File, and note that it references the
Payments.csv file in the D:\10777A\Labfiles\Lab05A\Starter\Ex2 folder. This file has the same
data structure as the payments file in the D:\10777A\Accounts folder.

Run the package, and stop debugging when it has completed.

On the Execution Results tab, find the following line in the package execution log.
[Payments File [2]] Information: The processing of the file
D:\10777A\Labfiles\Lab05A\Starter\Ex2\Payments.csv has started

X Task 2: Create a variable

Add a variable with the following properties to the package:

Name: fName

Scope: Extract Payments Data

Data type: String

Value: Payments - US.csv


Note The value includes a space on either side of the - character.

X Task 3: Create a parameter

Add a project parameter with the following settings:

Name: AccountsFolderPath

Data type: String

Value: D:\10777A\Accounts\

Sensitive: False

Required: True

Description: path to accounts files

X Task 4: Use a variable and a parameter in an expression


Select the Payments File connection manager and view its properties in the Properties pane.

Click the ellipsis (...) button for the Expressions property to open the Property Expressions Editor
dialog box, and then set the ConnectionString property to an expression that concatenates the
AccountsFolderPath parameter and fName variable. Your expression should look like the following.
@[$Project::AccountsFolderPath] + @[User::fName]

Run the package and view the execution results to verify that the data in the
D:\10777A\Accounts\Payments - US.csv file was loaded.

Close SQL Server Data Tools when you have completed the exercise.

Results: After this exercise, you should have a package that loads data from a text file based on a
parameter that specifies the folder path where the file is stored, and a variable that specifies the file name.

Exercise 3: Using Containers


Scenario


You have created a control flow that loads Internet Sales data and sends a notification email message to
indicate whether the process succeeded or failed. You now want to encapsulate the data flow tasks for
this control flow in a sequence container so you can manage them as a single unit.

You have also successfully created a package that loads payments data from a single CSV file based on a
dynamically derived folder path and file name. Now you must extend this solution to iterate through all of
the files in the folder and import data from each of them.
The main tasks for this exercise are as follows:

1. Add a Sequence Container to a package control flow.

2. Add a Foreach Loop Container to a package control flow.

X Task 1: Add a Sequence Container to a package control flow

Open the AdventureWorksETL solution in the D:\10777A\Labfiles\Lab05A\Starter\Ex3 folder.

Open the Extract Internet Sales Data.dtsx package and modify its control flow so that:

The Extract Customers and Extract Internet Sales tasks are contained in a Sequence container
named Extract Customer Sales Data.

The Send Failure Notification task is executed if the Extract Customer Sales Data container
fails.

The Send Success Notification task is executed if the Extract Customer Sales Data container
succeeds.

Your completed control flow should look like the following.


Run the package to verify that it successfully completes both Data Flow tasks in the sequence and
then executes the Send Success Notification task.

X Task 2: Add a Foreach Loop Container to a package control flow

Open the Extract Payment Data.dtsx package.

Add a Foreach Loop container to the control flow and drag the existing Extract Payments Data Flow
task into the Foreach Loop container.

Configure the Foreach Loop container with the following settings on the Collection tab of the
Foreach Loop Editor dialog box:

Enumerator: Foreach File Enumerator

Expressions: Set the Directory property to an expression that references the


AccountsFolderPath project parameter.

Folder: C:\

Files: *.*

Retrieve File name: Name and extension

Traverse subfolders: Not selected

Note The value you specify for the Folder property will be overridden by the expression
you have set for the Directory property.

The Collection tab of the Foreach Loop Editor dialog box should look like the following.


On the Variable Mappings tab, add the fName user variable and map it to index 0 (which represents
the file name and extension value retrieved by the Foreach File Enumerator).

Run the package and count the number of times the Foreach loop is executed.

When execution has completed, stop debugging and view the execution results to verify that all files
in the D:\10777A\Accounts folder were processed.

Close SQL Server Data Tools when you have completed the exercise.

Results: After this exercise, you should have a package that encapsulates two data flow tasks in a
sequence container, and another package that uses a Foreach loop to iterate through the files in a folder
specified in a parameter and uses a Data Flow task to load their contents into a database.

Lesson 4

Managing Consistency

SSIS solutions are generally used to transfer data from one location to another. Often, the overall SSIS
solution can include multiple data flows and operations, and it may be important to ensure that the
process always results in data that is in a consistent state, even if some parts of the process fail.

This lesson discusses techniques for ensuring data consistency when packages fail.

After completing this lesson, you will be able to:

Configure failure behavior for tasks, containers, and packages.

Use transactions to enforce data consistency.

Use checkpoints to restart failed packages.

Configuring Failure Behavior

An SSIS package control flow can contain nested hierarchies of containers and tasks, and you can use the
following properties to control how a failure in one element of the control flow determines the overall
outcome of the package.

FailPackageOnFailure: When set to True, the failure of the task or container results in the failure of
the package in which it is defined. The default value for this property is False.

FailParentOnFailure: When set to True, the failure of the task or container results in the failure of its
container. If the item with this property is not in a container, then its parent is the package; in which
case this property has the same effect as the FailPackageOnFailure property. When setting this
property on a package that is executed by an Execute Package task in another package, a value of
True causes the calling package to fail if this package fails. The default value for this property is False.

MaximumErrorCount: This property specifies the maximum number of errors that can occur before
the item fails. The default value for this property is 1.

You can use these properties to achieve fine-grained control of package behavior in the event of an error
that causes a task to fail.
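How a single task failure bubbles upward through these properties can be sketched as a simple propagation rule. The following is an illustrative model only — the class, names, and logic are invented for this sketch and simplify what the SSIS runtime actually does:

```python
# Illustrative model of SSIS failure propagation (simplified, not the engine).
# Each executable counts errors; once errors reach MaximumErrorCount the item
# fails, and FailParentOnFailure / FailPackageOnFailure decide what else fails.

class Executable:
    def __init__(self, name, parent=None, max_error_count=1,
                 fail_parent_on_failure=False, fail_package_on_failure=False):
        self.name = name
        self.parent = parent
        self.max_error_count = max_error_count
        self.fail_parent_on_failure = fail_parent_on_failure
        self.fail_package_on_failure = fail_package_on_failure
        self.errors = 0
        self.failed = False

    def package(self):
        """Walk up the hierarchy to the root package."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def report_error(self):
        self.errors += 1
        if self.errors >= self.max_error_count:      # default threshold is 1
            self.failed = True
            if self.fail_package_on_failure:
                self.package().failed = True
            elif self.fail_parent_on_failure and self.parent is not None:
                self.parent.failed = True

package = Executable("Package")
container = Executable("Sequence", parent=package)
task = Executable("Load", parent=container, fail_parent_on_failure=True)

task.report_error()
print(task.failed, container.failed, package.failed)  # True True False
```

Here FailParentOnFailure fails the sequence but not the whole package; setting FailPackageOnFailure on the task instead would fail the package directly.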

Using Transactions

Transactions ensure that all data changes in a control flow either succeed or fail as a single, atomic unit of
work. When tasks are enlisted in a transaction, a failure of any single task causes the failure of all tasks in
the transaction, ensuring that the data affected by the control flow remains in a consistent state with no
partial data modifications.

A task, container, or package's participation in a transaction is determined by its TransactionOption
property, which you can set to one of three possible values:

Required: This executable requires a transaction, and will create a new one if none exists.

Supported: This executable will enlist in a transaction if its parent is participating in one.

NotSupported: This executable does not support transactions and will not enlist in an existing
transaction.

SSIS transactions rely on the Microsoft Distributed Transaction Coordinator (MSDTC), a system
component that coordinates transactions across multiple data sources. An error will occur if an SSIS
package attempts to start a transaction when the MSDTC service is not running.


SSIS supports multiple concurrent transactions within a single hierarchy of packages, containers, and tasks;
but it does not support nested transactions. To understand how multiple transactions behave in a
hierarchy, consider the following facts:

If a container with a TransactionOption value of Required includes a container with a


TransactionOption of NotSupported, the child container will not participate in the parent
transaction.

If the child container includes a task with a TransactionOption value of Supported, the task will not
participate in the existing transaction.

If the child container contains a task with a TransactionOption value of Required, the task will start
a new transaction. However, the new transaction is unrelated to the existing transaction, and the
outcome of one transaction will have no effect on the other.
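The nesting facts above can be modeled as a small resolution function that walks the hierarchy from the root down and decides which transaction (if any) each executable runs in. This is an illustrative sketch of the enlistment rules only — the function and tuple layout are invented here, and real coordination is done by MSDTC:

```python
import itertools

# Sketch of how TransactionOption values resolve as a control-flow hierarchy
# is walked top-down (illustrative, not the SSIS engine).

def resolve_transactions(tree, ambient=None, out=None, ids=None):
    """tree = (name, transaction_option, children); returns {name: txn or None}."""
    if out is None:
        out, ids = {}, itertools.count(1)
    name, option, children = tree
    if option == "Required" and ambient is None:
        ambient = "txn%d" % next(ids)   # Required: create one if none exists
    elif option == "NotSupported":
        ambient = None                  # NotSupported: never enlist
    # "Supported" simply inherits the ambient transaction, if any
    out[name] = ambient
    for child in children:
        resolve_transactions(child, ambient, out, ids)
    return out

flow = ("Package", "Required", [
    ("Child", "NotSupported", [
        ("TaskA", "Supported", []),     # no ambient transaction -> none
        ("TaskB", "Required", []),      # starts its own, unrelated transaction
    ]),
])
print(resolve_transactions(flow))
# {'Package': 'txn1', 'Child': None, 'TaskA': None, 'TaskB': 'txn2'}
```

Note how TaskB's transaction is distinct from the package's: the two are unrelated, so the outcome of one has no effect on the other, matching the facts listed above.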

Demonstration: Using a Transaction

X Task 1: Use a transaction


1.

Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to
MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

2.

In the D:\10777A\Demofiles\Mod05 folder, right-click Setup.cmd and click Run as administrator.
Click Yes when prompted to confirm the action.

3.

Start SQL Server Management Studio and connect to the localhost database engine instance using
Windows authentication. Then view the contents of the dbo.StagingTable and
dbo.ProductionTable tables in the DemoDW database, noting that dbo.StagingTable contains
product data and dbo.ProductionTable is empty.

4.

In the D:\10777A\Demofiles\Mod05 folder, double-click Transactions.sln to open the solution in
SQL Server Data Tools.

5.

In Solution Explorer, double-click Move Products.dtsx. Note that the control flow consists of a Data
Flow task that moves products from a staging table to a production table, and a SQL Command task
that sets the price for the products.

6.

Run the package and note that the SQL Command task fails. Then stop debugging.

7.

In SQL Server Management Studio, view the contents of the dbo.StagingTable and
dbo.ProductionTable tables in the DemoDW database, noting that dbo.ProductionTable now
contains product data but the prices are all set to 0.00. You want to avoid having products with
invalid prices in the production table, so you need to modify the SSIS package to ensure that when
the price update task fails, the production table remains empty.

8.

In the D:\10777A\Demofiles\Mod05 folder, run Setup.cmd as Administrator again to reset the
database.

9.

In SQL Server Data Tools, click anywhere on the control flow surface and press F4. Then in the
Properties pane, set the TransactionOption property to Required.


10. Click the Copy Products task, and in the Properties pane, set the FailPackageOnFailure property to
True and ensure the TransactionOption property is set to Supported.
11. Repeat the previous step for the Update Prices task.
12. Run the package and note that the SQL Command task fails again. Then stop debugging.

13. In SQL Server Management Studio, view the contents of the dbo.StagingTable and
dbo.ProductionTable tables in the DemoDW database, noting that dbo.ProductionTable is empty,
even though the Copy Products task succeeded. The transaction has rolled back the changes to the
production table because the Update Prices task failed.
14. In SQL Server Data Tools, double-click the Update Prices task and change the SQL Statement
property to UPDATE ProductionTable SET Price = 100. Then click OK.
15. Run the package and note that all tasks succeed. Then stop debugging.
16. In SQL Server Management Studio, view the contents of the dbo.StagingTable and
dbo.ProductionTable tables in the DemoDW database, noting that dbo.ProductionTable now
contains products with valid prices.

Using Checkpoints

Another way you can manage data consistency is to use checkpoints. Checkpoints enable you to restart
a failed package after the issue that caused it to fail has been resolved. Any tasks that were previously
completed successfully are ignored, and the execution resumes at the point in the control flow where the
package failed. While checkpoints do not offer the same level of atomic consistency as a transaction, they
can provide a useful solution when a control flow includes a long-running or resource-intensive task that
you do not wish to repeat unnecessarily, such as downloading a large file from an FTP server.

Checkpoints work by saving information about work in progress to a checkpoint file. When a failed
package is restarted, the checkpoint file is used to identify where in the control flow to resume execution.

To enable a package to use checkpoints, you must set the following properties of the package:

CheckpointFileName: The full file path where you want to save the checkpoint file.

SaveCheckpoints: A Boolean value used to specify whether or not the package should save
checkpoint information to the checkpoint file.

CheckpointUsage: An enumeration with one of the following values:

Always: The package will always look for a checkpoint file when starting. If none exists, the
package will fail with an error.

Never: The package will never use a checkpoint file to resume execution and will always begin
execution with the first task in the control flow.

IfExists: If a checkpoint file exists, the package will use it to resume where it failed previously. If
no checkpoint file exists, the package will begin execution with the first task in the control flow.
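The restart behavior can be sketched as follows. This is an illustrative model of the IfExists usage only — the function, the JSON file format, and the task names are invented for the sketch; the real checkpoint file records richer state than completed task names:

```python
import json
import os
import tempfile

# Illustrative model of checkpoint-based restart (IfExists semantics): tasks
# that completed before a failure are recorded and skipped on the next run;
# the checkpoint file is deleted after a fully successful run.

def run_with_checkpoint(tasks, checkpoint_file):
    done = []
    if os.path.exists(checkpoint_file):       # IfExists: resume if present
        done = json.load(open(checkpoint_file))
    executed = []
    for name, action in tasks:
        if name in done:
            continue                          # skip previously completed task
        try:
            action()
        except Exception:
            json.dump(done, open(checkpoint_file, "w"))  # save progress
            return executed, "failed"
        done.append(name)
        executed.append(name)
    if os.path.exists(checkpoint_file):       # success: remove checkpoint file
        os.remove(checkpoint_file)
    return executed, "succeeded"

chk = os.path.join(tempfile.gettempdir(), "Checkpoint.chk")
if os.path.exists(chk):
    os.remove(chk)

def boom():
    raise RuntimeError("data flow error")

# First run: the third task fails, so only the first two are checkpointed.
first, status1 = run_with_checkpoint(
    [("Create Folder", lambda: None), ("Copy File", lambda: None),
     ("Load to Staging Table", boom)], chk)

# Second run with the task fixed: the first two tasks are skipped.
second, status2 = run_with_checkpoint(
    [("Create Folder", lambda: None), ("Copy File", lambda: None),
     ("Load to Staging Table", lambda: None)], chk)

print(first, status1)    # ['Create Folder', 'Copy File'] failed
print(second, status2)   # ['Load to Staging Table'] succeeded
```

This mirrors the demonstration that follows: after the fix, only the task that failed is re-executed, and the checkpoint file disappears once the run succeeds.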

Demonstration: Using a Checkpoint

X Task 1: Use a checkpoint


1.

Ensure that th
he MIA-DC1 and MIA-SQLBII virtual machiines are both rrunning, and then log on to
MIA-SQLBI ass ADVENTURE
EWORKS\Stud
dent with the password Pa$
$$w0rd.

2.

In the D:\10777A\Demofile
es\Mod05 folde
er, right-click SSetup.cmd an
nd click Run ass administrattor.
Click Yes whe
en prompted to
t confirm the action.

3.

Start SQL Servver Management Studio and


d connect to th
he localhost d
database engin
ne instance ussing
Windows authentication. Th
hen, in the De
emoDW datab
base, view the ccontents of the
dbo.StagingTable, noting that it contain
ns data about tthree productts.

4.

In the D:\10777A\Demofile
es\Mod05 folde
er, double-clicck Products.cssv to open it w
with Microsoft Excel,
and note thatt it contains de
etails for three
e more produccts. Then close Excel.

5.

In the D:\10777A\Demofile
es\Mod05 folde
er, double-clicck Checkpointts.sln to open the solution in
SQL Server Data Tools.

6.

In Solution Exxplorer, double


e-click Load Data.dtsx.
D
Notte that the con
ntrol flow conssists of a file syystem
task to create
e a folder, a seccond file syste
em task to cop
py the productss file to the ne
ew folder, and Data
Flow task that loads the datta in the produ
ucts file into th
he staging tab
ble.

7.

Click anywhere on the control flow surface to select the package, and press F4. Then in the Properties pane, set the following properties:

CheckpointFileName: D:\10777A\Demofiles\Mod05\Checkpoint.chk

CheckpointUsage: IfExists

SaveCheckpoints: True

8.

Set the FailPackageOnFailure property for all three tasks in the control flow to True.

9.

Run the package, and note that the data flow task fails. Then stop debugging.


10. In the D:\10777A\Demofiles\Mod05 folder, note that a file named Checkpoint.chk has been created,
and that the file system tasks that succeeded have created a folder named Data and copied the
Products.csv file into it.
11. In SQL Server Data Tools, view the Data Flow tab for the Load to Staging Table task, and double-click the Derive Columns transformation. Then change the expression for the NewPrice column to
100 and click OK.
12. View the Control Flow tab, and then run the package. Note that the Create Folder and Copy File
tasks, which succeeded previously, are not re-executed. Only the Load to Staging Table task is
executed.

13. Stop debugging, and verify that the Checkpoint.chk file has been deleted now that the package has
been executed successfully.

14. In SQL Server Management Studio, view the contents of the dbo.StagingTable, and note that it now
contains data about six products.

Lab Scenario

In this lab, you will continue to develop the Adventure Works ETL solution.


The current solution extracts Internet customer and sales order data by executing a sequence of two data flow tasks. However, you are concerned that it is possible that one of these tasks might fail, leaving you with a partially loaded staging database. To avoid this, you intend to modify the control flow so that the sequence containing the customer data flow and the Internet sales order data flow is executed as a transaction.

A similar problem exists with the reseller sales data extraction, but in this case you want to take a different approach. The data flow that extracts the reseller data has the potential to be a long-running operation, and you want to avoid repeating it in the event that the subsequent data flow to extract reseller sales orders fails. To accomplish this, you intend to use a checkpoint so that if the reseller sales order extraction fails, you can resolve the problem and restart the package without having to re-extract the reseller data.

Lab 5B: Using Transactions and Checkpoints

Exercise 1: Using Transactions

Scenario


You have created an SSIS package that uses two data flows to extract, transform, and load Internet sales data. You now want to ensure that package execution always results in a consistent data state, so that if any of the data flows fail, no data is loaded.

The main tasks for this exercise are as follows:

1.

Prepare the lab environment.

2.

View the data in the database.

3.

Run a package to stage Internet sales data.

4.

Implement a transaction.

5.

Observe transaction behavior.

X Task 1: Prepare the lab environment

Ensure the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

Run the Setup Windows Command Script file (Setup.cmd) in the D:\10777A\Labfiles\Lab05B\Starter folder as Administrator.

X Task 2: View the data in the database


Start SQL Server Management Studio and connect to the localhost database engine instance by
using Windows authentication.

In the Staging database, view the contents of the dbo.Customers and dbo.InternetSales tables to
verify that they are both empty.

X Task 3: Run a package to stage Internet sales data

Open the AdventureWorksETL.sln solution in the D:\10777A\Labfiles\Lab05B\Starter\Ex1 folder.

Open the Extract Internet Sales Data.dtsx package and examine its control flow.

Run the package, noting that the Extract Customers task succeeds, but the Extract Internet Sales
task fails. When execution is complete, stop debugging.

In SQL Server Management Studio, verify that the InternetSales table is still empty, but the
Customers table now contains customer records.

In SQL Server Management Studio, execute the following Transact-SQL query to reset the staging
tables.
TRUNCATE TABLE Staging.dbo.Customers

X Task 4: Implement a transaction

Configure the Extract Customer Sales Data sequence container in the Extract Internet Sales
Data.dtsx package so that it requires a transaction.

Ensure that the Extract Customers and Extract Internet Sales tasks both support transactions, and
configure them so that if they fail, their parent also fails.

X Task 5: Observe transaction behavior

Run the Extract Internet Sales Data.dtsx package, noting once again that the Extract Customers
task succeeds, but the Extract Internet Sales task fails. Note also that the Extract Customer Sales
Data sequence container fails. When execution is complete, stop debugging.

In SQL Server Management Studio, verify that both the InternetSales and Customers tables are
empty.

View the data flow for the Extract Internet Sales task, and modify the expression in the Calculate
Sales Amount derived column transformation to remove the text / (OrderQuantity %
OrderQuantity). The completed expression should match the following code sample.
UnitPrice * OrderQuantity
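The removed text is what makes the task fail by design: for any nonzero OrderQuantity, the expression OrderQuantity % OrderQuantity evaluates to 0, so the original derived column expression always divides by zero. A quick sketch of the arithmetic, written in Python rather than the SSIS expression language, with illustrative sample values:

```python
# Hypothetical sample values standing in for one row of the data flow.
unit_price = 24.99
order_quantity = 3

# Any nonzero quantity modulo itself is 0, so the broken expression
# UnitPrice * OrderQuantity / (OrderQuantity % OrderQuantity)
# always attempts a division by zero.
failed = False
try:
    unit_price * order_quantity / (order_quantity % order_quantity)
except ZeroDivisionError:
    failed = True

# The corrected expression from the lab step succeeds.
sales_amount = unit_price * order_quantity
```

This is why removing the trailing division term is all that is needed to let the data flow complete.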

Run the Extract Internet Sales Data.dtsx package, noting that the Extract Customers and Extract
Internet Sales tasks both succeed. When execution is complete, stop debugging.

In SQL Server Management Studio, verify that both the InternetSales and Customers tables contain
data.

Close the AdventureWorksETL project when you have completed the exercise.

Results: After this exercise, you should have a package that uses a transaction to ensure that all data flow
tasks succeed or fail as an atomic unit of work.
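The all-or-nothing behavior this exercise demonstrates can be sketched outside SSIS as a simple unit of work. This is only an illustration — SSIS actually enlists tasks in a distributed transaction rather than taking snapshots — and the staging "tables" here are just Python lists standing in for dbo.Customers and dbo.InternetSales:

```python
def run_in_transaction(tables, loads):
    """Apply each load function; roll every table back if any load fails."""
    snapshot = {name: list(rows) for name, rows in tables.items()}
    try:
        for load in loads:
            load(tables)
    except Exception:
        tables.clear()
        tables.update(snapshot)  # restore the pre-transaction state
        return False
    return True

# Stand-ins for the two data flows in the exercise:
staging = {"Customers": [], "InternetSales": []}

def extract_customers(t):
    t["Customers"].append({"CustomerKey": 1})

def extract_internet_sales(t):
    raise RuntimeError("simulated divide-by-zero failure")

ok = run_in_transaction(staging, [extract_customers, extract_internet_sales])
# ok is False, and both staging tables are empty again
```

As in the lab, the first load succeeds on its own but its rows do not survive the failure of the second, leaving the staging tables in a consistent (empty) state.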

Exercise 2: Using Checkpoints


Scenario


You have created an SSIS package that uses two data flows to extract, transform, and load reseller sales
data. You now want to ensure that if any task in the package fails, it can be restarted without re-executing
the tasks that had previously succeeded.
The main tasks for this exercise are as follows:
1.

View the data in the database.

2.

Run a package to stage reseller sales data.

3.

Implement checkpoints.

4.

Observe checkpoint behavior.

X Task 1: View the data in the database

Use SQL Server Management Studio to view the contents of the dbo.Resellers and
dbo.ResellerSales tables in the Staging database on the localhost database engine instance.

Verify that both of these tables are empty.

X Task 2: Run a package to stage reseller sales data

Open the AdventureWorksETL.sln solution in the D:\10777A\Labfiles\Lab05B\Starter\Ex2 folder.

Open the Extract Reseller Data.dtsx package and examine its control flow.

Run the package, noting that the Extract Resellers task succeeds, but the Extract Reseller Sales task
fails. When execution is complete, stop debugging.

In SQL Server Management Studio, verify that the ResellerSales table is still empty, but the Resellers
table now contains reseller records.

In SQL Server Management Studio, execute the following Transact-SQL query to reset the staging
tables.
TRUNCATE TABLE Staging.dbo.Resellers

X Task 3: Implement checkpoints

Set the following properties of the Extract Reseller Data package:

CheckpointFileName: D:\10777A\ETL\CheckPoint.chk

CheckpointUsage: IfExists

SaveCheckpoints: True

Configure the properties of the Extract Resellers and Extract Reseller Sales tasks so that if they fail,
the package also fails.

X Task 4: Observe checkpoint behavior


View the contents of the D:\10777A\ETL folder and verify that no file named CheckPoint.chk exists.

Run the Extract Reseller Data.dtsx package, noting once again that the Extract Resellers task
succeeds, but the Extract Reseller Sales task fails. When execution is complete, stop debugging.

View the contents of the D:\10777A\ETL folder and verify that a file named CheckPoint.chk has been
created.

In SQL Server Management Studio, verify that the ResellerSales table is still empty, but the Resellers
table now contains reseller records.

View the data flow for the Extract Reseller Sales task, and modify the expression in the Calculate
Sales Amount derived column transformation to remove the text / (OrderQuantity %
OrderQuantity). The completed expression should match the following code sample.
UnitPrice * OrderQuantity

Run the Extract Reseller Data.dtsx package, noting that the Extract Resellers task is not
re-executed, and package execution starts with the Extract Reseller Sales task, which failed on the
last attempt. When execution is complete, stop debugging.

In SQL Server Management Studio, verify that the ResellerSales table now contains data.

Close SQL Server Data Tools when you have completed the exercise.

Results: After this exercise, you should have a package that uses checkpoints to enable execution to be
restarted at the point of failure on the previous execution.

Module Review and Takeaways

Review Questions


1.

You want Task 3 to run if Task 1 or Task 2 fails. How can you accomplish this?

2.

Which container should you use to perform the same task once for each file in a folder?

3.

Your package includes an FTP task that downloads a large file from an FTP folder, and a Data Flow task that inserts data from the file into a database. The Data Flow task may fail because the database is unavailable, in which case you plan to run the package again after bringing the database online. How can you avoid downloading the file again when the package is re-executed?
