
Mahindra Satyam Learning World version 1.0



Workbook for SQL Server
Integration Services
A guide to effective implementation
























Why This Module

Microsoft Integration Services is a platform for building enterprise-level data integration and
data transformation solutions. SQL Server Integration Services (SSIS) is an ETL tool from
Microsoft that moves data among heterogeneous databases, applying transformations to the
data through a well-defined workflow.

It is the new data transformation standard for SQL Server 2005 and replaces the older SQL
Server Data Transformation Services (DTS). Integration Services includes a rich set of built-in tasks
and transformations, tools for constructing packages, and the Integration Services service for
running and managing packages.

The Integration Services architecture separates data flow from control flow. This provides
better control of package execution and improves extensibility by simplifying the
creation and implementation of custom tasks and transformations. As a result, Integration
Services makes integration and warehousing a seamless, manageable operation.


















Contents

1. Introduction to SQL Server 2008 Integration Services
1.1 Overview of SQL Server Integration Services
1.2 Using Integration Services Tools
1.3 Crossword

2. Planning an SSIS/ETL Solution
2.1 Planning Packages
2.2 Planning Package Development
2.3 Designing Package Control Flow
2.4 Crossword

3. Developing Integration Services Solutions
3.1 Creating an Integration Services Solution
3.2 Creating packages
3.3 Building and Running a Solution
3.4 Crossword

4. Implementing Control Flow
4.1 Control Flow Tasks
4.2 Control Flow Precedent Constraints
4.3 Control Flow Containers
4.4 Crossword

5. Designing Data Flow
5.1 Understanding Data Flow
5.2 Designing Data Flow Operations
5.3 Handling Data Changes
5.4 Crossword

6. Implementing Data Flow
6.1 Data Flow Sources and Destinations
6.2 Basic Data Flow Transformations
6.3 Advanced Data Flow Transformations
6.4 Data Flow Paths
6.5 Crossword

7. Logging, Error Handling, and Reliability
7.1 Logging ETL Operations
7.2 Handling Errors in SSIS
7.3 Implementing Reliable ETL Processes with SSIS
7.4 Crossword

8. Debugging and Error Handling
8.1 Debugging a Package
8.2 Implementing Error Handling
8.3 Crossword

9. Implementing Checkpoints and Transactions
9.1 Implementing Checkpoints
9.2 Implementing Transaction
9.3 Crossword

10. Configuring and Deploying Packages
10.1 Package Configurations
10.2 Preparing and Deploying Packages
10.3 Crossword

11. Optimizing an SSIS Solution
11.1 Monitoring SSIS Performance
11.2 Optimizing SSIS Packages
11.3 Scaling Out SSIS Packages
11.4 Crossword

12. Managing and Securing Packages
12.1 Managing Packages
12.2 Securing Packages
12.3 Crossword

Answers for Crosswords


























Guide to Use this Workbook

Conventions Used

Convention: Description

Topic: Indicates the Topic Statement being discussed.
Estimated Time: Gives an idea of the estimated time needed to understand the Topic and complete the Practice session.
Presentation: Gives a brief introduction about the Topic.
Scenario: Gives a real-time situation in which the Topic is used.
Demonstration/Code Snippet: Gives an implementation of the Topic along with screenshots and real-time code.
Code in Italic: Represents a few lines of code in italics, generated by the system in relation to that particular event.
// OR ': Represents a few lines of code (Code Snippet) from the complete program which describes the Topic.
Context: Explains when this Topic can be used in a particular Application.
Practice Session: Gives a practice example for the participant to implement the Topic, giving a brief idea of how to develop an Application using the Topic.
Check list: Lists the brief contents of the Topic.
Common Errors: Lists the common errors that occur while developing the Application.
Exceptions: Lists the exceptions which result from the execution of the Application.
Lessons Learnt: Lists the lessons learnt from the article of the workbook.
Best Practices: Lists the best ways for the efficient development of the Application.
Notes: Gives important information related to the Topic in the form of a note.

































Database Diagram







Database Schema






GROUP
GROUP_ID                       Char(10) (PK)
GROUP_NAME                     varchar(255)
Address1                       varchar(255)
Address2                       varchar(255)
Address3                       varchar(255)
City                           varchar(255)
State                          varchar(255)
Zip                            char(10)
COUNTY                         char(2)
Country_Code                   varchar(255)
Phone                          char(15)
Phone_Ext                      char(15)
FAX                            char(15)
Email                          varchar(255)
Group_Term_Date                datetime
Group_Effective_Date           datetime

SUBGROUP
GROUP ID                       Char(10) (FK)
SUB GROUP ID                   Char(10) (PK)
SUB GROUP NAME                 varchar(255)
Address1                       varchar(255)
Address2                       varchar(255)
Address3                       varchar(255)
City                           varchar(255)
State                          varchar(255)
Zip                            char(10)
COUNTY                         char(2)
Country Code                   varchar(255)
Phone                          char(15)
Phone Ext                      char(15)
FAX                            char(15)
Email                          varchar(255)
Sub Group Term Date            datetime
Sub Group Effective Date       datetime
CLASS ID                       Char(10) (FK)

Claims
Claim ID                       Char(10) (PK)
Member ID                      Char(10)
GROUP ID                       Char(10)
SUB GROUP ID                   Char(10)
SUBscriber ID                  Char(10)
Claim Type                     char(10)
Claim Sub Type                 char(10)
Class ID                       char(10)

Plan
GROUP ID                       char(10) (FK)
Class ID                       char(10) (PK)
GroupPlanID                    char(10) (FK)
Open Month                     numeric
End Month                      numeric
Plan Age Calculation Method    varchar(255)

Member Eligibility
MemEligibilty ID               int (PK)
Member ID                      Char(10) (FK)
Member Effective Date          datetime
Member Term Date               datetime
Group ID                       Char(10)
SubGroupId                     Char(10)
Product ID                     Char(10)
Eligibility Indicator          char(1)

Member Handicap
Member ID                      Char(10) (FK)
Member Effective Date          datetime
Term Date                      datetime
Handicap Desc                  varchar(255)
Type                           char(1)
Last Verified Date             datetime























SUBSCRIBER
GROUP ID                       Char(10) (FK)
SUB GROUP ID                   Char(10) (FK)
SUBscriber ID                  Char(10) (PK)
First Name                     varchar(255)
Middle Name                    varchar(255)
Last Name                      varchar(255)
Title                          Char(10)
Subscriber Effective Date      datetime

MEMBER
Group ID                       Char(10) (FK)
Subscriber ID                  Char(10) (FK)
Member ID                      Char(10) (PK)
Member Suffix                  numeric
First Name                     varchar(255)
Middle Name                    varchar(255)
Last Name                      varchar(255)
Title                          Char(10)
Member Effective Date          datetime
SSN                            Char(10)
Sex                            char(2)
Birth Date                     datetime
Phone                          char(15)
Phone Ext                      char(15)
Eligibility Date               datetime
Term Date                      datetime

Product
Product ID                     char(10) (PK)
Product Desc                   varchar(255)
Effective Date                 datetime
Term Date                      datetime
Product Type                   char(10)
Price Indicator                varchar(255)
Claims Indicator               varchar(255)

Plan Age Limit
GroupPlanID                    char(10) (PK)
Description                    varchar(255)
Dependent Stop Age             numeric
Student Stop Age               numeric










1.0 Introduction to SQL Server 2008 Integration Services




Topics

1.1 Overview of SQL Server Integration Services

1.2 Using Integration Services Tools

1.3 Crossword














Topic: Overview of Integration Services Estimated Time: 30 min.



Objectives: At the end of the activity, participants will be able to understand

The features of the SSIS development environment.
The features of the SSIS runtime
SSIS Architecture

Presentation:

SSIS is a set of utilities, applications, designers, components, and services wrapped into
one powerful software application suite.
SSIS is an ETL tool:
Extract - Extracting data from any data source.
Transform - Altering the data according to one or more logical rules.
Load - Loading the transformed data into the destination.
SSIS is the successor of DTS (Data Transformation Services).
SSIS consists of four key parts: the Integration Services service, the Integration Services object
model, the Integration Services runtime, and the run-time executables.
SSIS can process large volumes of data efficiently through complex operations such as
extracting and loading data, and transforming data by cleaning, aggregating, merging, and
copying it. SSIS is therefore intended for both traditional ETL and non-traditional data
integration.
Developers can extend SSIS with custom objects such as tasks, log providers, enumerators, connection
managers, and data flow components. These custom objects can be integrated into the user
interface of BIDS.
Package maintenance and execution can be automated by programmatically loading, modifying, and
executing new or existing Integration Services packages.
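
The extract, transform, and load steps described above can be illustrated outside SSIS with a minimal Python sketch. This is not SSIS code: the sample rows and the function names (extract, transform, load) are assumptions made for illustration, and a plain list stands in for the destination table.

```python
import csv
import io

# Hypothetical source data; in SSIS this would come from a data flow source.
SOURCE_CSV = """group_id,group_name,state
G001, acme health ,tx
G002,Cure Insurance,NY
"""

def extract(text):
    """Extract: read rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: apply simple cleaning rules (trim names, upper-case states)."""
    return [
        {
            "group_id": row["group_id"].strip(),
            "group_name": row["group_name"].strip().title(),
            "state": row["state"].strip().upper(),
        }
        for row in rows
    ]

def load(rows, destination):
    """Load: append the transformed rows to the destination (a list here)."""
    destination.extend(rows)
    return len(rows)

destination_table = []
loaded = load(transform(extract(SOURCE_CSV)), destination_table)
```

Each function maps to one letter of ETL, so a cleaning rule can change without touching how data is read or written.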

Evolution of SSIS
SQL Server 6.5 - BCP
SQL Server 7.0 - DTS (With the help of Scripts)
SQL Server 2000 - DTS (With the help of Scripts)
SQL Server 2005 - SSIS (With the help of Services)

Difference between BCP and DTS


Interface: BCP is a command-line utility, whereas DTS provides a graphical user interface (GUI).
Customization: With BCP, the user must remember many command-line options. With DTS, the
user can work with VBScript, which is easy to understand.
Difference between DTS and SSIS

Traditional warehouse loading with DTS
The integration process simply conforms the data and loads it into the database server.
The database performs aggregations, sorting, and other operations.
The database competes with user queries for resources.
This solution does not scale very well.


Warehouse loading with SSIS
SQL Server Integration Services conforms the data and also aggregates and sorts it
before loading the database.
This frees up the database server for user queries.



SSIS Architecture
SSIS is a new, highly scalable platform for building high-performance data integration solutions,
including extract, transform, and load (ETL) packages for data warehousing. SSIS overcomes many of the
limitations of DTS. In SQL Server 2005, the Integration Services architecture separates data flow from
control flow by introducing two distinct engines:
The Data Transformation Run-time engine: The Run-time engine provides package
storage, package execution, logging, debugging, event handling, package deployment,
and management of variables, transactions, and connections.
The Data Flow engine: The Data Flow engine handles the flow of data from source systems,
through transformations, and finally to destination systems.
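
The division of labour between the two engines can be pictured with a small Python sketch. It is only an analogy and does not use the SSIS object model: the names run_package, data_flow, and load_members, and the sample rows, are hypothetical.

```python
def data_flow(source_rows, transformations):
    """Data flow 'engine': stream each row from the source through every
    transformation in order and yield it for the destination."""
    for row in source_rows:
        for transformation in transformations:
            row = transformation(row)
        yield row

def run_package(tasks):
    """Run-time 'engine': execute tasks in order and log each outcome.
    It knows nothing about rows; it only sequences the work."""
    log = []
    for name, task in tasks:
        try:
            task()
            log.append((name, "Success"))
        except Exception as error:
            log.append((name, "Failure: %s" % error))
            break
    return log

destination = []

def load_members():
    # Hypothetical rows standing in for a source system.
    rows = [{"member_id": 1, "name": " ann "}, {"member_id": 2, "name": "bob"}]
    trim = lambda row: {**row, "name": row["name"].strip()}
    upper = lambda row: {**row, "name": row["name"].upper()}
    destination.extend(data_flow(rows, [trim, upper]))

execution_log = run_package([
    ("Prepare staging", lambda: None),
    ("Load members", load_members),
])
```

Here run_package plays the role of the run-time engine (sequencing and logging), while data_flow plays the role of the data flow engine (moving rows through transformations).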






Scenario:

Mr. George is the National Sales Manager of Cure Health and Insurance Company and requires
transformations on the existing data.

Context:

Merging Data from Heterogeneous Data Stores
Populating Data Warehouses and Data Marts
Cleaning and Standardizing Data


Practice Session:

In the above scenario, identify the tasks that perform control flow and data flow
activities.
In how many places do you feel logging is required?
Common Errors:

Using SQL Server Management Studio instead of Business Intelligence Development Studio.
Lessons Learnt:

SSIS Package v/s DTS Package
Execution and storage of SSIS Package
Runtime Engine v/s Data Flow Engine













Topic: Using Integration Services Tools Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

The Business Intelligence Development Studio
SSIS Designer

Presentation:

Business Intelligence Platform




Business Intelligence Development Studio

The Business Intelligence Development Studio (BIDS) is a light version of Microsoft Visual
Studio 2005 and is where you create Integration Services projects. The following tasks can be
performed from either Visual Studio or BIDS:
Create packages that include control flow, data flow, event-driven logic, and logging.
Test and debug packages by using the troubleshooting and monitoring features in SSIS
Designer, and the debugging features in BIDS.
Create configurations that update the properties of packages and package objects at run time.
Create a deployment utility that can install packages and their dependencies on other
computers.
Save copies of packages to the SQL Server msdb database, the SSIS Package Store, and the
file system.
Run the SQL Server Import and Export Wizard to create a basic package that copies data from a
source to a destination.





SSIS Designer
The SSIS Designer is a graphical tool used to create and maintain Integration Services
packages. The SSIS Designer is available in BIDS or Visual Studio as part of Integration
Services projects, where the following tasks can be performed:
Constructing the data flows in a package.
Adding event handlers to the package and package objects.
Viewing the package content.
At run time, viewing the execution progress of the package.



The SQL Server Management Studio allows you to perform the following tasks:
Create folders to organize packages in a way that aligns with your organization.
Run packages that are stored on the local computer by using the Execute Package utility.
Run the Execute Package utility to generate a command line to use when running the dtexec
command prompt utility.
Import and export packages to and from the SQL Server msdb database, the SSIS Package
Store, and the file system.
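
As a rough idea of what a dtexec command line looks like, the Python helper below assembles one. The helper name build_dtexec_command and the sample package locations are assumptions for illustration. /File, /SQL, and /Server are dtexec options for a package in the file system, a package in msdb, and the server to read it from. The sketch only builds the argument list and does not run dtexec.

```python
def build_dtexec_command(package, storage="file", server=None):
    """Return a dtexec argument list for a package stored in the file
    system (storage="file") or in the msdb database (storage="sql")."""
    if storage == "file":
        return ["dtexec", "/File", package]
    if storage == "sql":
        command = ["dtexec", "/SQL", package]
        if server:
            command += ["/Server", server]
        return command
    raise ValueError("storage must be 'file' or 'sql'")

# Hypothetical package locations, used only for illustration.
file_command = build_dtexec_command(r"C:\Packages\LoadMembers.dtsx")
msdb_command = build_dtexec_command("LoadMembers", storage="sql", server="localhost")
```

On a machine where dtexec is installed, such a list could be passed to subprocess.run to execute the package.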









Scenario:

Mr. George is the National Sales Manager of Cure Health and Insurance Company and requires
transformations on the existing data.

Context:

Building Business Intelligence into a Data Transformation Process
Automating Administrative Functions and Data Loading

Practice Session:
Identify the packages that need to be put in Database and the file system
Check list:

Using BIDS with SQL Server 2005
Using Business Intelligence with Visual Studio 2005.
Common Errors:

Using Data Flow instead of Control Flow
Lessons Learnt:

Control Flow v/s Data Flow
Package Development using the Import and Export Wizard
Package Development using BI Designer









Crossword: Unit-1 Estimated Time: 10 min

Across:
1) SSIS is successor of _____ (3)
3) ETL Stands for ________________ (20)
4) SSIS is an ___ Tool (3)
Down:
1) The flow of data from Data sources through transformation and finally to the Data
Destination target system is achieved by the _____________ Task (14)
2) The Command line utility of SSIS is __________________(3)








2.0 Planning an SSIS/ETL Solution




Topics

2.1 Planning Packages
2.2 Planning Package Development
2.3 Designing Package Control Flow
2.4 Crossword
















Topic: Planning Packages Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

What is a Package
Planning Package

Presentation:
Packages are the key component of SQL Server Integration Services.
A package is a collection of tasks that execute in an orderly fashion.
Precedence constraints help manage the order in which the tasks execute.



A package is a collection of SSIS objects. SSIS Designer presents it through four permanent tabs:
1. Control Flow: used to construct the control flow in a package on the design surface.
2. Data Flow: used to construct the data flows in a package on the design surface.
3. Event Handlers: used to construct the event handlers in a package on the design surface.
4. Package Explorer: used for viewing the contents of a package.



When the package executes, a fifth tab becomes available that displays the execution progress. The
following diagram shows the SSIS Designer, the Toolbox, and Solution Explorer.
A package can be saved to SQL Server, in which case it is actually stored in the msdb
database. It can also be saved as a .dtsx file, which is an XML-structured file.
Building a Successful ETL Project

When building ETL projects, we need to do some planning up front to make sure we have everything
we need to make the project a success. Here are some of the things we consider when embarking on an ETL
project. There are two lists: the first covers an ETL project in general, and the second covers things
we think about specifically for SSIS.
Review target data model
Identify source systems (owners, RDBMS types, permissions)
Analyze and profile source systems (usage patterns, windows of opportunity)
Document source data defects and anomalies (known issues, data profiling)
Define business rules required for the project (what is it we are trying to achieve?)
Define data quality rules (thresholds, out-of-range (OOR) values, etc.)
Develop mappings for the target tables
Integrate business and quality rules with mappings
Lineage
Security
Compliance
Available Skills
Legacy items (Is anything about to retire? Can you still get drivers?)

Rules around specifics

Good naming convention for objects
Break down packages into manageable pieces of work (scalability, manageability,
restartability)
Consider restarts for the package
Consider touching down on disk after screens (Raw Files)
Logging (use of event handlers)
Do not hardcode values; use Package Configurations
If you use Package Configurations, then use Indirect Configurations
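
The idea behind an indirect configuration is that the package does not store the configuration location itself. Instead, an environment variable holds the location, so the same package can run unchanged on different servers. The Python sketch below mimics that lookup. It is an analogy only: the variable name ETL_CONFIG_PATH, the setting names, and the use of a JSON file in place of an SSIS configuration file are all assumptions.

```python
import json
import os
import tempfile

def load_indirect_configuration(env_var):
    """Look up the configuration file location in an environment variable,
    then read the settings from that file."""
    path = os.environ.get(env_var)
    if path is None:
        raise KeyError("environment variable %s is not set" % env_var)
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

# Demo: write a throwaway configuration file and point the variable at it.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as handle:
    json.dump({"ServerName": "DEVSQL01", "Database": "Staging"}, handle)
    config_path = handle.name

os.environ["ETL_CONFIG_PATH"] = config_path
settings = load_indirect_configuration("ETL_CONFIG_PATH")
os.remove(config_path)
```

Moving the package to another environment then only requires changing the environment variable, not the package.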





Scenario:

Mr. George needs to gather data from multiple data sources into a single Excel file.
Context:
Performing ETL
Configuring a data warehouse
Using OLTP and OLAP

Practice Session:

Identify the role of DtexecUI.exe in package execution.
Identify the usage of various containers in various scenarios.

Check list:

Select the appropriate task for better performance.


Lessons Learnt:

Control Flow is the Main Flow of the package.
The entire ETL task can be performed by using the Data Flow tab.


















Topic: Planning Package Development Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

Package Development
Advantages of developing the package in File system/msdb database
Importance of Variables and Annotations
Importance of Naming Conventions
Connection Manager
Data Sources
Presentation:
Package Storage




SSIS packages may be stored in either the file system or the SQL Server msdb system database.
SQL Server Management Studio allows management of packages stored in the msdb database as
well as in the file system.
The Stored Packages folder has two subfolders.
o The File System folder lists the packages that have been stored on the local server.
o The MSDB folder lists packages stored in the SQL Server instances where the Integration Services
service is configured to enumerate packages.


There are no default subfolders in these two folders; subfolders may be created, renamed, and
deleted as required, using the New Folder menu option.
When a new folder is created beneath the File System folder, a new directory is created in the file
system as well. For folders created under the MSDB folder, a new entry is added to the
sysdtspackagefolders90 table, which tracks the folder structure.
Subfolders make it possible to better organize packages.
If multiple packages exist in a solution, grouping them into one subfolder named after the
solution is recommended. This makes it easier to link the production
storage location with the solution.

Advantages of saving packages to the file system:
Can use source control to manage packages
Very secure when using the Encrypt with User Key encryption option
Not subject to network downtime problems (saved locally)
May escrow deployment bundles, including miscellaneous files
Fewer steps to load into the designer
Easier direct access for viewing
May store packages hierarchically in file system
Projects in Visual Studio are disk based and require the package to be in the file system
Generally a better experience during development

Advantages of saving to SQL Server msdb database:
Generally easier access by multiple individuals
Benefits of database security, DTS roles, and Agent interaction
Packages get backed up with normal DB backup processes
Able to filter packages via queries
May store packages hierarchically via new package folders
Generally a better in-production experience



SSIS development flow


Annotation
An annotation is a comment that can be placed in the package to help others understand what is
happening in the package.
Typically, an annotation is used to show the title and version of the package. A version history
can also be maintained in an annotation so that it is clear what has changed in the
package between releases and who made the change. The following are examples:
Version 1.0 Bryan Thiel 9/1/2006 Initial Release
Version 1.1 Bryan Thiel 9/2/2006 Fixed Data Type Conversion Issue
Variables

Variables are SSIS objects used to dynamically set values and control processes in packages,
containers, tasks, and event handlers.
Variables are used to pass values to the scripts in the Script task or Script component.
The precedence constraints that sequence tasks and containers into a workflow can also use
variables when their constraint definitions include expressions.
A variable can be within the scope of a package or within the scope of a container, task, or event
handler in the package.
Variables with package scope are equivalent to global variables in DTS
2000. These variables can be used by all the containers in the package.
Variables that are defined within the scope of a container, such as a For Loop container, can be
used by all tasks or containers within that container.
SSIS supports two types of variables: system variables and user-defined variables.
System Variables
System variables are variables that are defined by SSIS.
These variables contain useful information about the package and its containers, tasks, and event
handlers, for example, the MachineName and StartTime system variables.
System variables cannot be added or updated; we can only view the information they contain.
User-Defined Variables
User-defined variables are defined by package developers.
User-defined variables can be created in packages, containers, tasks, transformations, and
precedence constraints in any namespace.
The scope of a user-defined variable is set when the variable is created.
If you set an expression as the value of a variable, the expression is evaluated at run time, and the
variable is set to the result of the evaluation.
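
The scoping rules above can be modelled in a few lines of Python. This is a conceptual sketch, not the SSIS object model: the Container class, the variable names, and the use of a callable to stand in for an expression evaluated at run time are assumptions for illustration.

```python
class Container:
    """A package, container, or task that can hold variables and has a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.variables = {}

    def get(self, variable):
        """Resolve a variable in this scope first, then in enclosing scopes."""
        scope = self
        while scope is not None:
            if variable in scope.variables:
                value = scope.variables[variable]
                # A callable stands in for an expression evaluated at run time.
                return value() if callable(value) else value
            scope = scope.parent
        raise KeyError(variable)

package = Container("Package")
loop = Container("For Loop Container", parent=package)
task = Container("Data Flow Task", parent=loop)

package.variables["User::BatchSize"] = 500  # package scope: visible everywhere
loop.variables["User::Counter"] = 3         # container scope: visible inside the loop
package.variables["User::Label"] = lambda: "Batch of %d" % package.get("User::BatchSize")
```

A task inside the loop can read both User::BatchSize and User::Counter, while the package itself cannot see User::Counter because that variable is scoped to the loop.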




Naming Conventions
The acronyms below should be used at the beginning of the names of tasks to identify what type of task it
is.
Task                                 Prefix
For Loop Container                   FLC
Foreach Loop Container               FELC
Sequence Container                   SEQC
ActiveX Script                       AXS
Analysis Services Execute DDL        ASE
Analysis Services Processing         ASP
Bulk Insert                          BLK
Data Flow                            DFT
Data Mining Query                    DMQ
Execute DTS 2000 Package             EDPT
Execute Package                      EPT
Execute Process                      EPR
Execute SQL                          SQL
File System                          FSYS
FTP                                  FTP
Message Queue                        MSMQ
Script                               SCR
Send Mail                            SMT
Transfer Database                    TDB
Transfer Error Messages              TEM
Transfer Jobs                        TJT
Transfer Logins                      TLT
Transfer Master Stored Procedures    TSP
Transfer SQL Server Objects          TSO
Web Service                          WST
WMI Data Reader                      WMID
WMI Event Watcher                    WMIE
XML                                  XML
These acronyms should be used at the beginning of the names of components to identify what type
of component it is.
Component                       Prefix
DataReader Source               DR_SRC
Excel Source                    EX_SRC
Flat File Source                FF_SRC
OLE DB Source                   OLE_SRC
Raw File Source                 RF_SRC
XML Source                      XML_SRC
Aggregate                       AGG
Audit                           AUD
Character Map                   CHM
Conditional Split               CSPL
Copy Column                     CPYC
Data Conversion                 DCNV
Data Mining Query               DMQ
Derived Column                  DER
Export Column                   EXPC
Fuzzy Grouping                  FZG
Fuzzy Lookup                    FZL
Import Column                   IMPC
Lookup                          LKP
Merge                           MRG
Merge Join                      MRGJ
Multicast                       MLT
OLE DB Command                  CMD
Percentage Sampling             PSMP
Pivot                           PVT
Row Count                       CNT
Row Sampling                    RSMP
Script Component                SCR
Slowly Changing Dimension       SCD
Sort                            SRT
Term Extraction                 TEX
Term Lookup                     TEL
Union All                       ALL
Unpivot                         UPVT
Data Mining Model Training      DMMT_DST
DataReader Destination          DR_DST
Dimension Processing            DP_DST
Excel Destination               EX_DST
Flat File Destination           FF_DST
OLE DB Destination              OLE_DST
Partition Processing            PP_DST
Raw File Destination            RF_DST
Recordset Destination           RS_DST
SQL Server Destination          SS_DST
SQL Server Mobile Destination   SSM_DST
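
A naming convention is easier to keep when it can be checked mechanically. The Python sketch below tests whether an object name starts with the prefix for its type. The function name follows_convention and the sample object names are hypothetical, and only a handful of prefixes from the tables above are included.

```python
# A few prefixes copied from the tables above; the rest follow the same pattern.
TASK_PREFIXES = {
    "Data Flow": "DFT",
    "Execute SQL": "SQL",
    "Send Mail": "SMT",
    "Foreach Loop Container": "FELC",
}

def follows_convention(object_type, object_name, prefixes=TASK_PREFIXES):
    """Return True when the name starts with the prefix for its type."""
    prefix = prefixes.get(object_type)
    if prefix is None:
        raise KeyError("no prefix defined for %r" % object_type)
    return object_name.startswith(prefix)

good_name = follows_convention("Data Flow", "DFT Load Members")
bad_name = follows_convention("Execute SQL", "Truncate staging")
```

In this sketch, "DFT Load Members" passes the check, while "Truncate staging" fails because it lacks the SQL prefix.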





Connection Manager
A connection manager is a logical representation of a connection. SSIS provides different types of
connection managers, such as ADO.NET, OLE DB, ODBC, Flat File, Excel, WMI, FTP, and HTTP.

These connection managers can be used to connect to the different types of data stores in a
package.
To implement a connection, specify the connection manager properties that describe the
attributes of the required connection. This connection information is used at run time to create the
physical connection.
A package can use multiple instances of a connection manager type, with different properties
set for each instance. At run time, each of these instances creates a connection with the
specified attributes.
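
The difference between the logical description and the physical connection can be sketched in Python. This is an analogy rather than SSIS code: the ConnectionManager class and its acquire_connection method are hypothetical, and an in-memory sqlite3 database stands in for a real data store.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class ConnectionManager:
    """Logical description of a connection; no connection exists yet."""
    name: str
    database: str  # sqlite3 database path, a stand-in for a connection string

    def acquire_connection(self):
        """Create the physical connection from the stored properties."""
        return sqlite3.connect(self.database)

# Two instances of the same connection manager type with different properties.
source = ConnectionManager("Source", ":memory:")
staging = ConnectionManager("Staging", ":memory:")

source_connection = source.acquire_connection()
source_connection.execute("CREATE TABLE members (member_id TEXT)")
source_connection.execute("INSERT INTO members VALUES ('M001')")
row_count = source_connection.execute("SELECT COUNT(*) FROM members").fetchone()[0]

staging_connection = staging.acquire_connection()
staging_tables = staging_connection.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]

source_connection.close()
staging_connection.close()
```

Each instance opens its own physical connection at run time, so the table created through the source connection does not appear in the staging database.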

Data Sources

Data sources represent a connection to a data store and can be nearly any OLE-DB compliant data source
such as SQL Server, Sybase, DB2, or even nontraditional data sources such as Analysis Services and
Outlook. Data sources can be localized to a single SSIS package or shared across multiple packages.
Connections are defined in the Connection Manager.





The connection can be configured completely offline, and the SSIS package will not use it until it is
instantiated in the package.

Data Source Views
Data source views (DSVs) are a new concept in SQL Server 2005. A DSV allows you to create a
logical view of your business data.
DSVs are a collection of tables, views, stored procedures, and queries that can be shared across
the project.
DSVs are especially useful in large, complex data models that are common in ERP systems like
Siebel or SAP.
DSVs map the relationships between tables that may not necessarily exist in the physical model.
DSVs allow you to segment a large data model into more bite-sized chunks. For example, a Siebel
system may be segmented into DSVs called Accounting, Human Resources, and Inventory.
DSVs are deployed as a connection manager.
There are a few key things to remember with data source views. Like data sources, DSVs allow you to
define the connection logic once and reuse it across your packages.

Scenario:

Mr. George needs to gather data from multiple databases into a single Excel file and
needs to save the package in the file system.

Context:
Storing Data in Database or File System

Practice Session:

Identify some scenarios of saving data in the File System and in the database.
Identify the need to move the package from database system to the File System.


Check list:

Packages stored in the file system can be moved into the database, if required.
Appropriate naming convention
Appropriate use of annotations and variables in package development


Common Errors:



Not understanding the data storage requirement properly.
Using System defined variables for storing data.
Not updating the version History of SSIS package.

Lessons Learnt:

Advantages of storing package in File System /SQL Server msdb database
Naming Convention
Variables and Annotations

Best Practices:

SSIS is an in-memory pipeline, so ensure that all transformations occur in memory.
Plan for capacity by understanding resource utilization.



Topic: Designing Package Control Flow Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

Designing a Package
Use of tasks, containers and constraints


Presentation:
A package is made up of three main components: the control flow, the data flow, and
precedence constraints, which are used to link tasks in a package together. Control flow provides the
steps for the execution when a package runs.
There are three types of objects within the control flow.

o Containers: Containers provide structure by grouping tasks and other containers into
meaningful units of work. There are four types of containers that can be used in SSIS.
Task host container: The core type of container that every task implicitly belongs to
by default. The SSIS architecture extends variables and event handlers to the task
through the task host container.
Sequence container: Allows you to group tasks into logical subject areas. In the
package designer, you can then collapse or expand this container for usability.
For Loop container: Loops through a series of tasks a given number of times or
until a condition is met.
Foreach Loop container: Loops through a series of files or records in a data set and
executes the tasks in the container for each item in the collection.


o Tasks: Tasks perform a variety of functions in the package. Tasks are broken into two
types: control flow tasks and data flow tasks.
Control flow tasks handle workflow responsibilities.
Data flow tasks provide the ability to move data between different data sources by
defining a source and a destination, such as a flat file, Excel, OLE DB, or
SQL Server.

o Precedence Constraints: Constraints link containers and tasks in a package into a logical
flow and specify the conditions under which the items are executed. SSIS supports creating
constraints based on an evaluation operator or the execution results of a task. There are three
types of constraints based on execution results:
Success: The constrained task will execute only when the prior task completes
successfully.
Completion: The constrained task will execute when the prior task completes, whether
it succeeds or fails.
Failure: The constrained task will execute only when the prior task fails to complete. This
constraint is frequently used to notify an operator of a failed event.
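
The effect of the three constraint types can be sketched in Python. This is a simplified model and not how the SSIS runtime is implemented: the function run_control_flow, the task names, and the deliberately failing task are assumptions for illustration.

```python
def run_control_flow(tasks, constraints):
    """Run tasks in order; a task with an incoming constraint runs only if
    the constraint's condition matches the outcome of the prior task.

    tasks: dict of task name -> callable (listed in execution order)
    constraints: list of (prior task, condition, next task) where condition
                 is "Success", "Failure", or "Completion"
    """
    outcomes = {}
    for name, action in tasks.items():
        incoming = [(prior, cond) for prior, cond, nxt in constraints if nxt == name]
        allowed = all(
            prior in outcomes and (cond == "Completion" or outcomes[prior] == cond)
            for prior, cond in incoming
        )
        if not allowed:
            continue  # constraint not satisfied: the task is skipped
        try:
            action()
            outcomes[name] = "Success"
        except Exception:
            outcomes[name] = "Failure"
    return outcomes

def failing_load():
    raise RuntimeError("destination unavailable")

tasks = {
    "Load data": failing_load,
    "Archive file": lambda: None,     # runs only after a successful load
    "Notify operator": lambda: None,  # runs only after a failed load
    "Write audit row": lambda: None,  # runs either way
}
constraints = [
    ("Load data", "Success", "Archive file"),
    ("Load data", "Failure", "Notify operator"),
    ("Load data", "Completion", "Write audit row"),
]
outcomes = run_control_flow(tasks, constraints)
```

Because the load fails in this example, the archive step is skipped, the operator is notified, and the audit step still runs.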


Scenario:

Mr. George needs to gather data from multiple databases into a single Excel file, with some
data cleansing, and needs to save the package in the file system.

Context:
Developing SSIS Package

Practice Session:

Identify the task to copy a large amount of data into some other database.
Identify the task to send a mail to the user on the failure of the package.

Check list:

Using the control flow, the data flow and precedence constraints in appropriate situations.
Correct mapping of the precedence constraints with the tasks.

Common Errors:

Using a Foreach Loop container in place of a For Loop container.
Using a Success precedence constraint in place of a Failure constraint.

Lessons Learnt:

Use of Control Flow, Data Flow and Precedence constraints.
The importance of tasks in designing packages.








Crossword: Unit-2 Estimated Time: 15 min

ACROSS:
3) SSIS packages may be stored in either _______ or ______ (join both answers) (14)
6) _________ is a comment that can be placed to help others understand what is happening in the
package (10)
9) There are __ types of objects within the control flow (5)
10) For easier access to the packages by multiple users, the package should be placed in ________ (4)
11) _____ are used to pass values to the script in Script tasks or Script components (8)



Down:
1) _______ is maintained by annotation
2) A .dtsx file is an _____ structured file
4) A connection manager is used to connect different types of ________ in a package
5) The key component of SSIS
7) Collection of these objects in SSIS makes a package
8) The flow that provides the steps for the execution of a package























3.0 Developing Integration Services Solutions




Topics

3.1 Creating an Integration
Services Solution

3.2 Creating packages

3.3 Building and Running a
Solution.

3.4 Crossword














Topic: Creating an Integration Services Solution Estimated Time: 30 min.

Objectives: At the end of the activity participant should understand

SSIS Integrated Development Environment
SSIS Project Lifecycle

Presentation:

The following are some of the common dialogs and windows available in BIDS and Visual Studio.

The Solution Explorer window is where you will find all of the SSIS packages, connections,
and Data Source Views. A solution is a container that holds a series of projects, which can
include SSIS projects as well as other types of projects (e.g., Database Projects, Class
Libraries). Each project holds a variety of objects related to the type of project. For SSIS,
it holds packages and shared connections.

The Toolbox contains all the items that you can use in the designer at any given point in time.
For example, when you work on the Control Flow tab, the Toolbox displays items related to
the control flow; when you work on the Data Flow tab, it displays a completely different set
of items related to creating the data flow. By right-clicking the Toolbox, you can customize it
by adding or removing tabs and adding, renaming, or removing items.

The Properties window is where you customize almost any item that is selected. The view of
the Properties window varies greatly based on the item that is selected.

The Navigation Pane is a new feature that allows you to quickly navigate through a package.
The pane is visible only when the package is more than one screen in size. To access the
pane, left-click and hold on the cross-arrow in the bottom-right corner of the screen. This
allows you to scroll up and down a large package easily.
o In addition to these common dialogs and windows, the following windows are
available at design time.

The Error List window shows errors and warnings that have been detected in the package.
Double-click an entry to go to the object causing the error.

The Output window shows the results from the build or execution of a package.

The Task List window shows tasks that a developer can create for descriptive purposes or to
use as a follow-up for later development.
o The last set of windows to cover is related to testing a package.

The Call Stack window shows the names of functions or tasks on the stack.

The Breakpoints window shows all of the breakpoints set in the current project.

The Command window is used to execute commands or aliases directly in Visual Studio.

The Immediate window is used to debug and evaluate expressions, execute statements, and
view variable values.

The Autos window displays variables used in the current statement and the previous
statement.

The Locals window shows all of the local variables in the current scope.

The Watch window allows you to add specific variables to the window that can be viewed as
a package executes. Read / write variables can be directly modified in this window.


There are four steps in the lifecycle of an SSIS project: design, store, execute, and
manage.
Design: The first step is to design an SSIS package. Packages are created using
Visual Studio 2005 or the BI Development Studio included with SQL Server 2005.
Development occurs on the developer's workstation; upon completion, the package is
deployed to the Development Server.
Store: After the package is implemented, it may be stored in either the file
system or the SQL Server database (msdb). The dtutil command-line utility provides the
ability to move packages between SQL Server systems and from the msdb database to the
file system.
Execute: SQL Server Agent or the dtexec or dtexecui utilities can be used to run the
package. For BCBSLA, we will use the dtexec command-line utility to run packages.
Manage: The new SQL Server Management Studio provides the ability to monitor and
manage packages.

The following diagram illustrates this high-level flow.

[Diagram: SSIS project lifecycle. Design (Visual Studio 2005, Business Intelligence
Development Studio, or Import/Export Wizard) -> Store (DTUtil; File System (.dtsx files) or
MSDB (sysdtspackages90)) -> Execute (DTExec, SQL Agent) -> Manage (SQL Server
Management Studio). The diagram groups the stages under "Design and Deployment" and
"Administration".]




Scenario:

Mr. George needs to gather data from multiple data sources into a single Excel file.

Demonstration/Code Snippet:

Step 1: To open BI Development Studio from the SQL Server 2005 program group, on the Start
menu, point to All Programs, point to Microsoft SQL Server 2005, and click SQL
Server Business Intelligence Development Studio. The Microsoft Visual Studio
Start Page appears.
Step 2: On the File menu, point to New, and click Project.
Step 3: If Visual Studio .NET is installed, in the New Project dialog box, click the Business
Intelligence Projects template.
Step 4: In the Templates box, select the Integration Services Project template.
Step 5: In the Name box, change the default name of the project, select an appropriate location,
change the Solution name (select the check box to create the SSIS solution), and then
click OK. By default, an empty package titled Package.dtsx is added to the project.


Context:

Developing an SSIS package
To load data into the Data Warehouse.
For parallel execution of multiple tasks.

Practice Session:
Develop an SSIS application with two packages using the SSIS Package Wizard, as
demonstrated.
Develop an SSIS application using VS2005 to demonstrate parallel execution of two
packages.
Check list:

The use of error handler and package execution tabs.




Common Errors:
Choosing an inappropriate template (project type) to create an SSIS project.
Renaming the package without the extension (.dtsx).
Not saving the package at appropriate location.
Not creating a solution for the SSIS project.

Lessons Learnt:

SSIS Project Lifecycle
Various Components of SSIS Designer


Best Practices:

Use the Execution Results tab to understand the performance of the task.
Use the In Progress tab to understand the package progress at execution time.



Topic: Creating Packages Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

How to create a package
Presentation:
A package is a set of operations defined in the form of tasks. A package has a control flow and
can also contain data flows. A control flow is constructed in an SSIS package from three types
of control flow elements:
1) Containers
2) Tasks and
3) Precedence constraints
To construct the control flow, you need to first add tasks and containers to the control flow, and
then connect them by using precedence constraints. If the control flow includes tasks and
containers that connect to data sources, you also need to add connection managers to the package.

Scenario:
Mr. George is looking for a way to create a package that implements a set of tasks as part of a
business requirement. This demonstration provides an approach to creating a package for Mr.
George.
Demonstration/Code Snippet:

Step 1: To open BI Development Studio from the SQL Server 2005 program group, on the Start
menu, point to All Programs, point to Microsoft SQL Server 2005, and click SQL
Server Business Intelligence Development Studio. The Microsoft Visual Studio
Start Page appears.
Step 2: On the File menu, point to New, and click Project.
Step 3: If Visual Studio .NET is installed, in the New Project dialog box, click the Business
Intelligence Projects template.
Step 4: In the Templates box, select the Integration Services Project template.
Step 5: In the Name box, change the default name of the project, select an appropriate location,
change the Solution name (select the check box to create the SSIS solution), and then click
OK. By default, an empty package titled Package.dtsx is added to the project.
Step 6: In the Solution Explorer pane, right-click Package.dtsx, and then rename the default
package.
Step 7: When prompted to rename the package object as well, click OK.
Step 8: Finally, to open the renamed package in SSIS Designer, double-click the package.



Context:

When developing any SSIS package.
When data must be loaded into a data warehouse.
When multiple tasks must be executed in parallel.
Practice Session:

Create a package to run an executable file from the control flow (Hint: Use Execute Process
task)


Common Errors:

Not using the Integration Services Project template.
Lessons Learnt:

There are two ways to create a package: using wizards, or adding new SSIS packages to the solution.




























Topic: Building and Running a Solution Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:
Execution of an SSIS Package

Presentation:
There are three tools to run an Integration Services package:
The dtexec command prompt utility (dtexec.exe).
The Execute Package Utility (dtexecui.exe).
A SQL Server Agent job.

Dtexec Utility: - Using the dtexec utility, you can run packages that are stored in the file
system, in an instance of SQL Server, or in the Integration Services Package Store.
Use the Execute Package Utility dialog box to specify package run-time configurations, to run
packages on the local computer, and to generate command lines for use with the dtexec command
prompt utility.
Use dtexec to run an existing package at the command prompt. The dtexec utility provides
access to all the package configuration and execution features, such as connections, properties,
variables, logging, and progress indicators. It can load packages from three sources: a
Microsoft SQL Server database, the SSIS service, and the file system.
The utility has four phases that it proceeds through as it executes. The phases are:
1. Command Sourcing Phase: The command prompt reads the list of options and
arguments specified. All subsequent phases are skipped if a /? or /HELP option is
encountered.
2. Package Load Phase: The package specified by the /SQL, /FILE, or /DTS option is
loaded.
3. Configure Phase: Options are processed as follows:
o Options that set package flags, variables, and properties.
o Options that verify the package versioning and build.
o Options that configure the utility operation, such as reporting.
4. Validation and Execution Phase: The package is run, or validated without running if
the /VALIDATE option was specified.
When a package runs, dtexec can return an exit code. The following table lists the values
that the dtexec utility can set when exiting.

Value   Description
0       The package executed successfully.
1       The package failed.
3       The package was canceled by the user.
4       The utility was unable to locate the requested package.
        The package could not be found.
5       The utility was unable to load the requested package.
        The package could not be loaded.
6       The utility encountered an internal error caused by syntactic or
        semantic errors in the command line.
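
A script that launches dtexec can translate the exit codes in the table above into readable messages. The following is a minimal Python sketch; the names DTEXEC_EXIT_CODES, describe_dtexec_exit, and package_succeeded are assumptions made for illustration, and the descriptions are paraphrased from the table.

```python
# Exit codes and meanings, paraphrased from the dtexec table above.
DTEXEC_EXIT_CODES = {
    0: "The package executed successfully.",
    1: "The package failed.",
    3: "The package was canceled by the user.",
    4: "The utility was unable to locate the requested package.",
    5: "The utility was unable to load the requested package.",
    6: "The command line contained syntactic or semantic errors.",
}


def describe_dtexec_exit(code: int) -> str:
    """Return a readable description for a dtexec exit code."""
    # Codes not listed in the table are reported as unrecognized, not guessed.
    return DTEXEC_EXIT_CODES.get(code, f"Unrecognized exit code: {code}")


def package_succeeded(code: int) -> bool:
    """Only exit code 0 means the package executed successfully."""
    return code == 0


if __name__ == "__main__":
    for code in (0, 1, 3, 4, 5, 6, 2):
        print(code, describe_dtexec_exit(code))
```

A scheduling or wrapper script could log the description and treat any non-zero value as a failure.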

When specifying options, all options must begin with a slash (/) or a minus sign (-). The options that are
shown here begin with a slash (/), but the minus sign (-) can be substituted. Arguments must be strings
that are either enclosed in quotation marks or contain no white space. Double quotation marks within
quoted strings represent escaped single quotation marks. Options and arguments, except for passwords,
are not case sensitive.

Execute Package Utility: - The Execute Package Utility is available only in a 32-bit version. On a 64-
bit computer, any commands that the Execute Package Utility creates should also be tested in 64-bit mode
before you deploy or schedule them on a production server. To test these commands in 64-bit mode, use
the 64-bit version of the dtexec utility.

SQL Server Agent job: - There must be a separate step for each package that you want to run. The job
can be associated with one or more schedules, or can be an unscheduled job that you run manually. The
account that runs an Integration Services package as a SQL Server Agent job step requires all the same
permissions as an account that runs the package directly.

Dtutil
Use the dtutil command prompt utility to manage existing packages at the command prompt. You can
access packages that are stored in the SQL Server msdb database, the SSIS Package Store and the file
system, and perform tasks such as copying, deleting, moving, and signing packages. You can also verify
that a specified package exists.
The dtutil command prompt utility includes the following features:
Remarks in the command prompt, which make the command prompt action
self-documenting and easier to understand.
Overwrite protection, which prompts for confirmation before overwriting an existing package
when you are copying or moving packages.
Console help, which provides information about the command options for dtutil.
If the utility accesses a package that is stored in msdb, the command prompt may require a user name and
a password. If the instance of SQL Server uses SQL Server Authentication, the command prompt requires
both a user name and a password. If the user name is missing, dtutil tries to log on to SQL Server using
Windows Authentication.
The dtutil command prompt utility does not support the use of command files or redirection.


When syntax errors are detected, incorrect arguments are used, or invalid combinations of options are
specified, dtutil can return an exit code. The following table lists the values that the dtutil utility can set
when exiting.
Value   Description
0       The utility executed successfully.
1       The utility failed.
4       The utility was unable to locate the requested package. The package
        could not be found.
5       The utility was unable to load the requested package. The package
        could not be loaded.
6       The utility encountered an internal error caused by syntactic or
        semantic errors in the command line.

When specifying options, all options must begin with a slash (/) or a minus sign (-). The options that are
shown here begin with a slash (/), but the minus sign (-) can be substituted. Arguments must be strings
that are either enclosed in quotation marks or contain no white space. Double quotation marks within
quoted strings represent escaped single quotation marks. Options and arguments, except for passwords,
are not case sensitive.
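
To make the dtutil option syntax more concrete, the following Python sketch assembles argument lists for two of the operations mentioned above: copying a package and checking that it exists. The helper names (build_dtutil_copy, build_dtutil_exists) and the sample package names are hypothetical. The sketch assumes the /FILE, /SQL, /DTS, /COPY, /EXISTS, and /QUIET options behave as in the dtutil console help; verify them against your installed version before relying on it.

```python
from typing import List

# Storage locations that dtutil accepts as a source or a destination.
_LOCATIONS = ("FILE", "SQL", "DTS")


def _location(name: str) -> str:
    """Normalize a location name and reject unknown ones."""
    upper = name.upper()
    if upper not in _LOCATIONS:
        raise ValueError(f"location must be one of {_LOCATIONS}, got {name!r}")
    return upper


def build_dtutil_copy(src_location: str, src_path: str,
                      dest_location: str, dest_path: str,
                      quiet: bool = False) -> List[str]:
    """Build arguments that copy a package from one store to another."""
    args = ["dtutil", f"/{_location(src_location)}", src_path,
            "/COPY", f"{_location(dest_location)};{dest_path}"]
    if quiet:
        # Assumption: /QUIET suppresses the overwrite confirmation prompt.
        args.append("/QUIET")
    return args


def build_dtutil_exists(location: str, path: str) -> List[str]:
    """Build arguments that check whether a package exists."""
    return ["dtutil", f"/{_location(location)}", path, "/EXISTS"]


if __name__ == "__main__":
    # Hypothetical package: copy a .dtsx file into msdb, then verify it.
    print(build_dtutil_copy("file", r"C:\Packages\LoadSales.dtsx", "sql", "LoadSales"))
    print(build_dtutil_exists("sql", "LoadSales"))
```

Passing the arguments as a list (for example to subprocess.run) avoids having to quote paths that contain white space.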
Scenario:
Mr. George wants to know all the possible ways to run a package, because some of the developers on
his team have hands-on experience with the command-line utilities and others have good
knowledge of scheduling jobs through SQL Server Agent.
Demonstration/Code Snippet:

To run a package by using the dtexec utility
Step 1: At the command prompt, type dtexec / followed by the DTS, SQL, or File
option and the package path. Make sure to include the package file name in the
package path.


Step 2: If the package encryption level is EncryptSensitiveWithPassword or
EncryptAllWithPassword, use the Decrypt option to provide the password. If
you do not include a password, dtexec will prompt you for the password.
Step 3: (Optional) Enter additional command-line options.
Step 4: Press the ENTER key.
Step 5: (Optional) View logging and reporting information before closing the Command
Prompt window.
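
The dtexec steps above can be sketched as a small Python helper that assembles the command line. This is only an illustration under stated assumptions: build_dtexec_args and the sample package path are hypothetical, and the sketch covers only the /FILE, /SQL, /DTS, /DECRYPT, and /VALIDATE options mentioned in this topic.

```python
from typing import List, Optional

# Step 1: the package source is chosen with one of these options.
_SOURCE_OPTIONS = {"file": "/FILE", "sql": "/SQL", "dts": "/DTS"}


def build_dtexec_args(source: str, package_path: str,
                      decrypt_password: Optional[str] = None,
                      validate_only: bool = False) -> List[str]:
    """Build the dtexec argument list for one package."""
    try:
        option = _SOURCE_OPTIONS[source.lower()]
    except KeyError:
        raise ValueError(f"source must be one of {sorted(_SOURCE_OPTIONS)}") from None
    args = ["dtexec", option, package_path]
    if decrypt_password is not None:
        # Step 2: supply the password for a password-encrypted package.
        args += ["/DECRYPT", decrypt_password]
    if validate_only:
        # Step 3 (optional): validate the package without running it.
        args.append("/VALIDATE")
    return args


if __name__ == "__main__":
    # Hypothetical package path, used only for illustration.
    print(build_dtexec_args("file", r"C:\Packages\LoadSales.dtsx"))
```

On a machine where dtexec is installed, the list could be passed to subprocess.run and the returncode compared with the exit code table in the Presentation section.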
To run a package by using the Execute Package Utility

Step 1: In Management Studio, on the View menu, click Object Explorer.
Step 2: In Object Explorer, click Connect, and then click Integration Services.
Step 3: Expand the Stored Packages folder and its subfolders to locate the package to
run, right-click the package, and then click Run Package.
Step 4: (Optional) In the Execute Package Utility dialog box, perform one or more of
the following tasks:
Specify a different package to run.
Click Configurations, Command Files, Connection Managers,
Execution Options, Reporting, Logging, Set Values, or Verification to
update run-time options.
Click Command Line to review the command line that the utility uses.
Step 5: Click Execute.
Step 6: To stop the running package, click Stop in the Package Execution Progress
dialog box.
Step 7: When the package finishes, click Close to exit the Package Execution Progress
dialog box.
To run a package by using a SQL Server Agent job
Step 1: In SQL Server Management Studio, open the instance of SQL Server in which
you want to create a job, or the instance that contains the job to which you want
to add a step.



Step 2: Expand SQL Server Agent and perform one of the following tasks:
To create a new job, right-click Jobs and then click New.
To use an existing job, expand Jobs, right-click the job, and then click
Properties.
Step 3: On the General page, if you are creating a new job, provide a job name, select an
owner and job category, and, optionally, provide a job description.
Step 4: To make the job available for scheduling, select Enabled.
Step 5: Click Steps and then click New.
Step 6: Provide a step name and, in the Type list, select a job step type that is based on
the version of the dtexec utility (dtexec.exe) that you want to run the job:
To run the job by using the version of the dtexec utility that the system
automatically invokes, select SQL Server Integration Services
Package.
On a 32-bit computer that is running SQL Server and SQL Server Agent,
this setting invokes the 32-bit version of the dtexec utility.
On a 64-bit computer that has the 64-bit version of SQL Server and SQL
Server Agent installed and running, this setting invokes the 64-bit
version of the dtexec utility.
On a 64-bit computer that only has the 32-bit version of SQL Server and
SQL Server Agent installed and running, this setting invokes the 32-bit
version of the dtexec utility.
To run a package in 32-bit mode from a 64-bit version of SQL Server
Agent, in the New Job Step dialog box, on the Execution options tab,
select Use 32 bit runtime.
Step 7: In the Run as list, select the proxy account that has the credentials that the job
will use.
Step 8: On the General tab, select the package source.
Step 9: To specify command-line options, do the following:
Click the Command Files tab to specify the files that contain the options
that the package uses.


Click the Command Line tab to modify or restore the command-line
options.
Click the Configurations tab to add configuration files and, optionally,
export configured variables to a file.
Click the Data Sources tab, select the connection manager check box
and then update the connection string.
Click the Execution Options tab to specify the package run-time
behavior, such as whether the package fails if warnings occur.
Click the Logging tab to add log providers. Select a log provider in the
Log Provider list and then type the connection string.
Click the Set Values tab to map properties and variables to values.
Click the Verification tab to specify whether only signed packages can
run and to specify the version of package to run.
Step 10: Click OK.
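
The rules in Step 6 for which version of dtexec a SQL Server Agent job step invokes can be summarized as a small decision function. The following Python sketch is a simplified model of those rules, not an API of SQL Server; the function name agent_dtexec_bits and its parameters are hypothetical.

```python
def agent_dtexec_bits(computer_bits: int, sql_server_bits: int,
                      use_32bit_runtime: bool = False) -> int:
    """Return 32 or 64: the dtexec version the job step would invoke."""
    if computer_bits not in (32, 64) or sql_server_bits not in (32, 64):
        raise ValueError("bit widths must be 32 or 64")
    if sql_server_bits > computer_bits:
        raise ValueError("64-bit SQL Server cannot run on a 32-bit computer")
    if use_32bit_runtime:
        # The "Use 32 bit runtime" execution option forces 32-bit mode.
        return 32
    # Otherwise the version follows the installed SQL Server and Agent:
    # 32-bit computer -> 32; 64-bit computer with 64-bit SQL Server -> 64;
    # 64-bit computer with only 32-bit SQL Server -> 32.
    return sql_server_bits


if __name__ == "__main__":
    for combo in ((32, 32, False), (64, 64, False), (64, 32, False), (64, 64, True)):
        print(combo, "->", agent_dtexec_bits(*combo))
```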

Context:

Implementing the tasks by executing a package
Practice Session:

Create and execute packages using various package execution mechanisms
Create a package to understand the difference between parallel and sequential execution.

Common Errors:

Command-line flags not provided in the specified order.
Lessons Learnt:

Executing a package by using various tools (the dtexec utility, the Execute Package Utility, and
SQL Server Agent).

Best Practices:

Do not use parallel execution unless the tasks are independent of the success or
failure of other tasks.




Crossword: Unit-3 Estimated Time: 10 min

Across:
1) The SSIS packages, connection and data source views can be viewed in this window (16)
3) The command line utility to manage an existing package at the command prompt (6)
6) The results from the build or execution of a package are shown in this window (6)
Down:
2) The window to debug and evaluate expression, execute statements and view variable value (9)
4) The no. of steps in the lifecycle of an SSIS project (Hint: Value in number) (4)
5) ________ is used to run an existing package at the command prompt (6)














4.0 Implementing Control Flow




Topics

4.1 Control Flow Tasks

4.2 Control Flow Precedent
Constraints

4.3 Control Flow Containers

4.4 Crossword














Topic: Control Flow Tasks Estimated Time: 45 min.

Objectives: At the end of the activity, participants should be able to relate

The use of various tasks to fulfill the requirements
Presentation:
The Control Flow tab contains the workflow parts of the package. This includes the tasks and precedence
constraints. In the Control Flow tab, drag and drop tasks from the Toolbox into the Control Flow designer
pane. Double-click the task to configure it. The task may display a yellow warning or red error icon until
it is configured.
After configuring a task, link it to other tasks by using precedence constraints. When a task is
selected, a green arrow points down from it. To create an On Success precedence constraint,
click the arrow and drag it to the task you wish to link to. An On Failure constraint is
represented as a red arrow between tasks.
Tasks
A task can be described as an individual unit of work. Tasks provide functionality to your package in
much the same way that a method does in a programming language. The following are the tasks available
in SQL Server Integration Services:
ActiveX Script Task: This task provides backward compatibility for DTS packages to continue
use of custom code that was developed using ActiveX script, until such scripts can be upgraded to
use the more advanced features provided by the Script task.
Analysis Services Execute DDL Task: This task runs Data Definition Language (DDL)
statements that can create, drop or alter mining models and multidimensional objects such as
cubes and dimensions.
Analysis Services Processing Task: This task processes Analysis Services objects such as cubes,
dimensions and mining models.
Bulk Insert Task: This task provides a quick way to copy large amounts of data into a SQL
Server table or view. To ensure high-speed copying, transformations cannot be performed on the
data while it is moving from the source file to the table or view.
Data Flow Task: This task encapsulates the data flow engine that moves data between sources
and destinations, providing the facility to transform, clean and modify data as it is moved. A data
flow consists of at least one data flow component, but it is typically a set of connected data flow
components: sources that extract data; transformations that modify, route, or summarize data; and
destinations that load data. Components are connected in the data flow by paths. Each path
specifies the two components that are the start and the end of the path.
Data Mining Query Task: This task runs prediction queries based on data mining models built
in Analysis Services. The prediction query creates a prediction for new data by using mining
models.
Execute DTS 2000 Package Task: This task runs packages that were developed by using the
SQL Server 2000 tools. By using this task, you can include SQL Server 2000 DTS packages in
SQL Server 2005 data transformation solutions. A package may include both Execute Package
tasks and Execute DTS 2000 Package tasks, because each type of task uses a different version of
the run-time engine.
Execute Package Task: This task extends the enterprise capabilities of Integration Services by
letting packages run other packages as part of a workflow. Examples when you may consider
using the Execute Package task include breaking down complex package workflows, reusing
parts of packages, grouping work units, and/or controlling package security.
Execute Process Task: This task runs an application or batch file as part of a SQL Server 2005
Integration Services (SSIS) package workflow. Although you can use the Execute Process task to
open any standard application, such as Microsoft Excel or Microsoft Word, you typically use it to
run business applications or batch files that work against a data source. For example, you can use
the Execute Process task to run a custom Visual Basic application that generates a daily sales
report. Then you can attach the report to a Send Mail task and forward the report to a distribution
list.
Execute SQL Task: This task runs SQL statements or stored procedures from a package. The
task can contain either a single SQL statement or multiple SQL statements that run sequentially.
You can use the Execute SQL task for the following purposes:
o Truncate a table or view in preparation for inserting data.
o Create, alter, and drop database objects such as tables and views.
o Re-create fact and dimension tables before loading them.
o Run stored procedures.
o Save the rowset returned from a query into a variable.
File System Task: This task performs operations on files and directories in the file system. For
example, by using the File System task, a package can create, move or delete directories and files.
You can also use the File System task to set attributes on files and directories.


FTP Task: This task downloads and uploads data files and manages directories on servers. For
example, a package can download data files from a remote server or an Internet location as part of
an Integration Services package workflow.
Message Queue Task: This task allows you to use Microsoft Message Queuing (MSMQ) to send
and receive messages between SQL Server Integration Services packages, or to send messages to
an application queue that is processed by a custom application. These messages can take the form
of simple text, files, or variables and their values.
Transfer Database Task: This task transfers a SQL Server database between two instances of
SQL Server. In contrast to the other tasks that only transfer SQL Server objects by copying them,
the Transfer Database task can either copy or move a database. The task can copy a database
between instances of SQL Server 2000, instances of SQL Server 2005, or one of each. This task
can also be used to copy a database within the same server.
Transfer Error Messages Task: This task transfers one or more SQL Server user-defined error
messages between instances of SQL Server. User-defined messages are messages with an
identifier that is equal to or greater than 50000. Messages with an identifier less than 50000 are
system error messages, which cannot be transferred by using the Transfer Error Messages task.
Transfer Jobs Task: This task transfers one or more SQL Server Agent jobs between instances
of SQL Server.
Transfer Logins Task: This task transfers one or more logins between instances of SQL Server.
Transfer Master Stored Procedures Task: This task transfers one or more user-defined stored
procedures between master databases on instances of SQL Server. To transfer a stored procedure
from the master database, the owner of the procedure must be dbo.
Transfer SQL Server Objects Task: This task transfers one or more types of objects in a SQL
Server database between instances of SQL Server. For example, the task can copy tables and
stored procedures. Depending on the version of SQL Server that is used as a source, different
types of objects are available to copy. The Transfer SQL Server Objects task can be configured to
transfer all objects, all objects of a type, or only specified objects of a type.
Script Task: This task provides code to perform functions that are not available in the built-in
tasks and transformations that SQL Server 2005 Integration Services provides. The Script task
can also combine functions in one script instead of using multiple tasks and transformations. The
code is custom Microsoft Visual Basic .NET code that is compiled and executed at package run
time.
Send Mail Task: This task sends an e-mail message. By using the Send Mail task, a package can
send messages if tasks in the package workflow succeed or fail, or send messages in response to
an event that the package raises at run time. For example, the task can notify a database
administrator about the success or failure of the Backup Database task.


Web Service Task: This task executes a Web service method. You can use the Web Service
Task for the following purposes:
o Writing to a variable the value that a Web service method returns.
o Writing to file the values that a Web service method returns.
WMI Data Reader Task: This task runs queries using the Windows Management
Instrumentation (WMI) Query Language that returns information from WMI about a computer
system.
WMI Event Watcher Task: This task watches for a Windows Management Instrumentation
(WMI) event using a WMI Query Language (WQL) event query to
specify events of interest.
XML Task: This task is used to work with XML data. Using this task, a package can retrieve
XML documents, apply operations to the documents using Extensible StyleSheet Language
Transformations (XSLT) style sheets and XPath expressions, merge multiple documents or
validate, compare and save the updated documents to files and variables.
In addition to these tasks, the following tasks are available for database administration and
maintenance:
Back Up Database Task: This task performs different types of SQL Server database backups.
Using the Back Up Database task, a package can back up a single database or multiple databases.
Check Database Integrity Task: This task checks the allocation and structural integrity of all the
objects in the specified database. The task can check a single database or multiple databases, and
you can choose whether to check the database indexes.
Execute SQL Server Agent Job Task: This task runs SQL Server Agent jobs. SQL Server
Agent jobs automate tasks that you perform repeatedly. You can create jobs that execute
Transact-SQL statements and ActiveX scripts, perform Analysis Services and Replication
maintenance tasks, or run packages. SQL Server Agent is a Microsoft Windows service that runs
jobs, monitors Microsoft SQL Server and fires alerts.
Execute T-SQL Statement Task: This task runs Transact-SQL statements. This task is similar
to the Execute SQL task. However, the Execute T-SQL Statement task supports only the
Transact-SQL version of the SQL language and you cannot use this task to run statements on
servers that use other dialects of the SQL language. If you need to run parameterized queries,
save the query results to variables, or use property expressions, you should use the Execute SQL
task instead of the Execute T-SQL Statement task.
History Cleanup Task: This task deletes entries in the following history tables in the SQL
Server msdb database:
o backupfile
o backupfilegroup
o backupmediafamily
o backupmediaset
o backupset
o restorefile
o restorefilegroup
o restorehistory
By using the History Cleanup task, a package can delete historical data related to backup and
restore activities, SQL Server Agent jobs, and database maintenance plans.
Notify Operator Task: This task sends notification messages to SQL Server Agent operators. A
SQL Server Agent operator is an alias for a person or group that can receive electronic
notifications.
Rebuild Index Task: This task rebuilds indexes in SQL Server database tables and views.
Reorganize Index Task: This task reorganizes indexes in SQL Server database tables and views.
Shrink Database Task: This task reduces the size of SQL Server database data and log files.
Update Statistics Task: This task updates information about the distribution of key values for
one or more statistics groups (collections) in the specified table or indexed view.












Topic: Control Flow Precedence Constraints Estimated Time: 40 min.


Objectives: At the end of the activity, the participant should understand

The use of Precedence Constraints.

Presentation:

Precedence Constraints
Precedence constraints link the containers and tasks in a package into a logical flow and specify
the conditions under which the items are executed. Three constraint values control the flow of
package execution; SSIS also supports constraints based on an evaluation operator in addition to
the execution result of a task.
Success:
The subsequent task executes only when the prior task completes successfully.
This constraint is identified by a green arrow.
To use this constraint, connect the prior task's green arrow to the subsequent task.
Completion:
The subsequent task executes when the prior task completes. Whether the
prior task succeeds or fails is inconsequential.
This constraint is identified by a blue arrow.
To use this constraint, connect the tasks and then change the constraint's value to Completion.
Failure:
The subsequent task executes only when the prior task fails to complete.
This constraint is frequently used to notify an operator of a failed event.
This constraint is identified by a red arrow.
To use this constraint, connect the tasks and then change the constraint's value to Failure.
Conditional Expressions
A major improvement to precedence constraints in SSIS 2005 is the ability to dynamically follow
workflow paths based on certain conditions being met. These conditions use conditional
expressions to drive the workflow. An expression allows you to evaluate whether certain conditions
have been met before the task is executed and the path is followed. A constraint on its own evaluates
only the success or failure of the previous task to determine whether the next step will be executed.
Conditions can be set by using evaluation operators. Once a precedence constraint is created, you can
set the EvalOp property to any one of the following options:


Constraint: This is the default setting and specifies that only the constraint will be followed in
the workflow.
Expression: This option gives you the ability to write an expression in the SSIS expression
language that controls the workflow based on conditions that you specify.
ExpressionAndConstraint: Specifies that both the expression and the constraint must be met
before proceeding.
ExpressionOrConstraint: Specifies that either the expression or the constraint must be met before
proceeding.
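
The four EvalOp options above can be pictured with a small Python model. This is only a sketch of the decision logic for a single constraint, not SSIS code: the names EvalOp, constraint_met and should_run_next are invented for illustration, and the model ignores the case where several constraints point at the same task.

```python
from enum import Enum


class EvalOp(Enum):
    CONSTRAINT = "Constraint"
    EXPRESSION = "Expression"
    EXPRESSION_AND_CONSTRAINT = "ExpressionAndConstraint"
    EXPRESSION_OR_CONSTRAINT = "ExpressionOrConstraint"


def constraint_met(value, prior_result):
    """Check a Success / Failure / Completion value against the prior task's result."""
    if value == "Completion":
        # Completion does not care whether the prior task succeeded or failed.
        return prior_result in ("Success", "Failure")
    return value == prior_result


def should_run_next(eval_op, value, prior_result, expression_result):
    """Decide whether the task downstream of the constraint may start."""
    by_constraint = constraint_met(value, prior_result)
    if eval_op is EvalOp.CONSTRAINT:
        return by_constraint
    if eval_op is EvalOp.EXPRESSION:
        return expression_result
    if eval_op is EvalOp.EXPRESSION_AND_CONSTRAINT:
        return expression_result and by_constraint
    return expression_result or by_constraint


# Prior task failed, the expression is true, and the value is Success.
print(should_run_next(EvalOp.EXPRESSION_AND_CONSTRAINT, "Success", "Failure", True))
print(should_run_next(EvalOp.EXPRESSION_OR_CONSTRAINT, "Success", "Failure", True))
```

With the same inputs, ExpressionAndConstraint blocks the next task while ExpressionOrConstraint lets it run, which is the practical difference between the two options.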


Scenario:
Mr. George has found that some of the members have provided special characters in their first
name. Given the Member ID, George would like to extract the member names through a stored
procedure by invoking it from the control flow and accordingly make corrections to the data.

Demonstration/Code Snippet:

Table to be used : Tbl_Member
Tasks to be used : Execute SQL Task, Script Task with Visual Basic .NET code, stored procedure

Prerequisite: Create a stored procedure in the BCBS database that fetches a member's FirstName
from the MemberID. The steps below call this procedure GetMemberName.

Step 1: Start Visual Studio or BI Development Studio and create a new solution. Name the solution
SSIS Lab and the project SSIS Training Labs.

Step 2: In the Solution Explorer, expand the SSIS Training Labs project and right-click the
SSIS Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type GetMemberFirstNameWithSP.dtsx for the
new name. When prompted "Do you want to rename the package object as well?", select
Yes.



Step 3: Drag the Execute SQL Task from the Toolbox onto the Control Flow design surface
and rename it SQL_MemberName.




Step 4: Double-click SQL_MemberName to open the Execute SQL Task Editor.
Step 5: On the General page, set the following properties, including those in the SQL Statement
section:

ConnectionType = ADO.NET
Connection = LocalHost.BCBS
IsQueryStoredProcedure = True
SQLStatement = GetMemberName

Step 6: On the Parameter Mapping page, click the Add button to add each parameter and set
the following properties:
o Variable Name = Click the ellipsis and set the variable name as VarMemID
o Direction = Set the direction as Input or Output
o Data Type = Set the data type as String for both variables
o Parameter Name = Set the name of the stored procedure parameter to map

Step 7: Click OK.





Step 8: Drag a Script Task from the Toolbox onto the Control Flow design surface and rename
it SCR_MemberName. Place it below SQL_MemberName and connect the success path
(green arrow) of SQL_MemberName to it.
Step 9: Double-click SCR_MemberName to open the Script Task Editor and set the following
property on the Script page:
ReadOnlyVariables = User::VarMemID, User::VarFirstName

Step 10: Click the Design Script button. The Microsoft Visual Studio for Applications editor is
displayed; add the code to Main() of the script.

Step 11: The package is ready to execute. Right-click the package and select Execute Package
to run it.
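
The parameter mapping in this demonstration (one input variable passed to the procedure, one output captured in a variable, then a check in script) can be sketched outside SSIS. The Python below is an illustration under stated assumptions, not the workbook's stored procedure or script: SQLite stands in for SQL Server, a parameterized SELECT stands in for GetMemberName, the sample rows are made up, and the special-character rule in has_special_characters is an assumption.

```python
import re
import sqlite3

# Hypothetical sample data; the real Tbl_Member lives in the BCBS database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl_Member (MemberID TEXT PRIMARY KEY, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Tbl_Member VALUES (?, ?)",
    [("M001", "Anita"), ("M002", "Ra@vi#")],
)


def get_member_name(connection, member_id):
    """Stand-in for the GetMemberName stored procedure: one input, one output."""
    row = connection.execute(
        "SELECT FirstName FROM Tbl_Member WHERE MemberID = ?", (member_id,)
    ).fetchone()
    return row[0] if row else None


def has_special_characters(name):
    """True when the name holds anything other than letters, spaces, hyphens or apostrophes."""
    return re.search(r"[^A-Za-z '\-]", name) is not None


# The dictionary plays the role of the package variables mapped in Step 6.
variables = {"VarMemID": "M002", "VarFirstName": None}
variables["VarFirstName"] = get_member_name(conn, variables["VarMemID"])
print(variables["VarFirstName"], has_special_characters(variables["VarFirstName"]))
```

Passing the member ID as a bound parameter rather than concatenating it into the statement is the same idea as mapping VarMemID on the Parameter Mapping page.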



Context:

To create the workflow for a package
To connect the tasks in a sequence

Practice Session:



Check list:

To make use of stored procedure from SSIS solution.
Establishing the Database Connection
To invoke the stored procedure, set the property IsQueryStoredProcedure = true

Common Errors:

Not changing the IsQueryStoredProcedure property to True when calling the stored
procedure from the Execute SQL Task.
Not building the script code before executing the package.
Not passing matching parameters.
Not using the Dts prefix (for example, Dts.Variables) when accessing variables in the script code.
Choosing an inappropriate connection for invoking the stored procedure.


Lessons Learnt:

Using Stored Procedure with Execute SQL Task
Using Script Task to build custom scripts in VB

Topic: Control Flow Containers Estimated Time: 30 min.

Objectives: At the end of the activity, the participant should understand
The Use of Various Containers
Presentation:
Containers are a new concept in SSIS that didn't previously exist in DTS. They are a core unit in the SSIS
architecture that helps you logically group tasks together into units of work or create complex conditions.
By using containers, SSIS variables and event handlers can be defined to have the scope of the container
instead of the package.
There are four types of containers in the Control Flow tab: Task Host, Sequence, For Loop, and Foreach Loop
containers.
Task Host Containers
The task host container is the default container that encapsulates a single task. The task host is not
configured separately. Instead, it is configured when you set the properties of the task it encapsulates. The
SSIS architecture extends the use of variables and event handlers to the task through the task host
container.
Sequence Containers
The Sequence container defines a control flow that is a subset of the package control flow. This can help
to divide a package into smaller, more-manageable pieces. Some benefits of using a Sequence container
are:
Disabling groups of tasks to focus package debugging on one subset of the package control flow.
Managing properties on multiple tasks in one location by setting properties on a Sequence
container instead of on the individual tasks.
Providing scope for variables that a group of related tasks and containers use.
You can set a transaction attribute on the Sequence container to define a transaction for a subset
of the package control flow. In this way, you can manage transactions at a more granular level.
Sequence containers are available in the Control Flow Toolbox just like any other task. After adding a
container to the Control Flow pane, drag the tasks you require into the container.
For Loop Container


The For Loop container defines a repeating control flow in a package. The loop implementation is similar
to the for looping structure in common programming languages. The For Loop container evaluates an
expression and repeats the workflow until the expression evaluates to False.
The For Loop container uses the following elements to define the loop:
InitExpression: An optional initialization expression that assigns the values to the loop counters.
EvalExpression: An evaluation expression that contains the expression used to test whether the
loop should stop or continue.
AssignExpression: An optional iteration expression that increments or decrements the loop
counter.
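
The way the three expressions cooperate can be sketched as an ordinary loop. The Python below is an illustrative model only: run_for_loop and the variables dictionary are invented names, and the expression values (start at 1, continue while the counter is at most 50, add 1) are chosen to match the demonstration in this topic.

```python
def run_for_loop(init, evaluate, assign, body, variables):
    """Run body repeatedly, mirroring InitExpression / EvalExpression / AssignExpression."""
    if init is not None:          # InitExpression is optional
        init(variables)
    iterations = 0
    while evaluate(variables):    # stop as soon as EvalExpression is False
        body(variables)
        if assign is not None:    # AssignExpression is optional
            assign(variables)
        iterations += 1
    return iterations


variables = {"VarCount": 0}
seen = []
count = run_for_loop(
    init=lambda v: v.update(VarCount=1),
    evaluate=lambda v: v["VarCount"] <= 50,
    assign=lambda v: v.update(VarCount=v["VarCount"] + 1),
    body=lambda v: seen.append(v["VarCount"]),
    variables=variables,
)
print(count, seen[0], seen[-1], variables["VarCount"])  # prints: 50 1 50 51
```

The counter ends at 51 because the assignment runs once more after the fiftieth pass, and only then does the evaluation expression turn False. Leaving out the assignment while the evaluation expression depends on the counter would loop forever, which is why all three properties are normally set together.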
Foreach Loop Container
The Foreach Loop container provides you the ability to loop through a collection of objects. As you loop
through the collection, the container will assign the value from the collection to a task or connection
inside the container. You may also map the value to a variable. The type of objects that you will loop
through can vary based on the enumerator you set in the editor in the Collection page. SSIS provides the
following enumerator types:
Foreach ADO enumerator to enumerate rows in a table. For example, you can get the rows in an
ADO recordset.
Foreach ADO.NET Schema Rowset enumerator to enumerate the schema information about a
data source. For example, you can enumerate and get a list of the tables in the AdventureWorks
SQL Server database.
Foreach File enumerator to enumerate files in a folder. The enumerator can traverse subfolders.
For example, you can read all the files that have the *.log file name extension in the Windows
folder.
Foreach from Variable enumerator to enumerate the enumerable object that a specified variable
contains. For example, the variable contains the result of a query that is enumerated at run time.
Foreach Item enumerator to enumerate items that are collections. For example, you can
enumerate the rows and the columns in an Excel spreadsheet.
Foreach Nodelist enumerator to enumerate the result set of an XML Path Language (XPath)
expression. For example, this expression enumerates and gets a list of all the authors in the
classical period: /authors/author[@period='classical'].
Foreach SMO enumerator to enumerate SQL Server Management Objects (SMO) objects. For
example, you can enumerate and get a list of the views in a SQL Server database.
A Foreach Loop container can include multiple tasks and containers, but it can use only one type of
enumerator. If the Foreach Loop container includes multiple tasks, you can map the enumerator collection
value to multiple properties of each task.
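
Two of the enumerators above, Foreach File and Foreach Nodelist, have close analogues in general-purpose code. The following Python sketch shows the idea under stated assumptions: foreach_file and foreach_nodelist are invented helper names, the files are created in a temporary folder rather than the Windows folder, and the sample author names are made up.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def foreach_file(folder, pattern, traverse_subfolders=False):
    """Yield matching file paths in sorted order, optionally descending into subfolders."""
    root = Path(folder)
    matches = root.rglob(pattern) if traverse_subfolders else root.glob(pattern)
    for path in sorted(matches):
        if path.is_file():
            yield path


def foreach_nodelist(xml_text, xpath):
    """Yield the elements selected by a (limited) XPath expression."""
    yield from ET.fromstring(xml_text).findall(xpath)


with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "a.log").write_text("first")
    (Path(folder) / "notes.txt").write_text("skip me")
    (Path(folder) / "sub").mkdir()
    (Path(folder) / "sub" / "b.log").write_text("second")
    top_level = [p.name for p in foreach_file(folder, "*.log")]
    recursive = [p.name for p in foreach_file(folder, "*.log", traverse_subfolders=True)]

AUTHORS = """<authors>
  <author period="classical">Homer</author>
  <author period="modern">Orwell</author>
  <author period="classical">Virgil</author>
</authors>"""
# ElementTree paths are relative to the root element, so /authors/author becomes ./author.
classical = [a.text for a in foreach_nodelist(AUTHORS, "./author[@period='classical']")]
print(top_level, recursive, classical)
```

In both cases each value produced by the enumerator would be mapped to a variable and consumed by the tasks inside the container, one iteration per item.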



Scenario:
Mr. George needs to update the Product Type in the Product table for the first 50 rows.
Demonstration/Code Snippet:

Task to be used: For Loop Container, Execute SQL Task, and Script Task
Table to be used: Product
Demonstration:
Step 1: Start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type ForLoopContainer.dtsx for the new name.
When prompted "Do you want to rename the package object as well?", select Yes.
Step 2: Drag the For Loop Container from the Toolbox onto the Control Flow design surface
and rename it FLC_InsertData.
Step 3: Right-click the Control Flow design surface and select Variables from the shortcut
menu. The Variables window is displayed.
Step 4: Add two variables by clicking the Add Variable button. Name them VarResult and
VarCount, and select Int32 as the data type.




Step 5: Double-click FLC_InsertData. The For Loop Editor is displayed. Set the following
properties to run the For Loop Container:

InitExpression = @VarCount=1
EvalExpression = @VarCount<=50
AssignExpression = @VarCount=@VarCount+1





Step 6: Drag a Script Task from the Toolbox into FLC_InsertData and rename it
SCR_Product. Double-click SCR_Product to open the Script Task Editor and set the
following properties on the Script page:

PrecompileScriptIntoBinaryCode = False
ReadWriteVariables = User::VarResult

Step 7: Click the Design Script button and write the script code.

Step 8: Build the script code, close the window, and click OK in the Script Task Editor.




Step 9: Drag two Execute SQL Tasks from the Toolbox into the For Loop Container,
below SCR_Product. Connect SCR_Product to both Execute SQL Tasks using the
success (green) precedence constraint.



Step 10: Double-click the precedence constraint that leads to Execute SQL Task 1. The Precedence
Constraint Editor is displayed. Set the following properties:

Evaluation operation = Expression
Expression = @VarResult==1
Click OK to close the editor. The green precedence constraint changes to
blue.

Step 11: Double-click the precedence constraint that leads to Execute SQL Task 2. The Precedence
Constraint Editor is displayed. Set the following properties:

Evaluation operation = Expression
Expression = @VarResult==2
Click OK to close the editor. The green precedence constraint changes to
blue.

Step 12: Rename the two Execute SQL Tasks SQL_UpdateProductType and
SQL_UpdateProductType1.
Step 13: Double-click SQL_UpdateProductType. The Execute SQL Task Editor is displayed.
On the General page, set the following properties, including the one in the SQL Statement
section:

ConnectionType = OLE DB
Connection = LocalHost.BCBS
SQLStatement = Update Tbl_Product set Product_Type='A'




Step 14: Double-click SQL_UpdateProductType1. The Execute SQL Task Editor is displayed.
On the General page, set the following properties, including the one in the SQL Statement
section:

ConnectionType = OLE DB
Connection = LocalHost.BCBS
SQLStatement = Update Tbl_Product set Product_Type='B'

Step 15: The package is ready to execute. Right-click ForLoopContainer.dtsx in the
Solution Explorer and select Execute Package.

Step 16: Run a SQL query in SQL Server Management Studio to list the updated Product_Type values.

Context:


To execute a set of tasks repeatedly

Practice Session:

The constituent providers of a health insurance company have made plans to suit the
requirements of their businesses. These plans are shared with the company in flat files on a
monthly basis. George is now required to consolidate the data into a single table in which all the
plans exist. (Hint: Use the Foreach Loop Container)







Check list:
Setting the values of the Precedence Constraint

Common Errors:
Using sequence container in place of Foreach loop container
Not setting all the three properties of For Loop Container

Lessons Learnt:

Use of Script Task, Control Flow and Execute SQL Task.
The types of containers and their differences
To set the values for a Precedence Constraint
To convert a Success precedence constraint into Completion.

Crossword: Unit-4 Estimated Time: 10 min

Across:
1) The task to transfer a SQL Server database between two instances of SQL Server (16)
3) This container defines a control flow that is a subset of the package control flow (8)
4) The task to provide a quick way to copy large amounts of data into a SQL Server table or view (10)
5) The default container that encapsulates a single task (8)
6) The task to encapsulate the data flow engine that moves data between sources and destinations and
provides the facility to transform, clean and modify data as it is moved (8)
Down:
1) An individual unit of work (4)
2) The task to run a SQL statement or stored procedure from a package (10)













5.0 Designing Data Flow




Topics

5.1 Understanding Data Flow

5.2 Designing Data Flow Operations

5.3 Handling Data Changes

5.4 Crossword












Topic: Understanding and Designing Data Flow Estimated Time: 30 min.
Objectives: At the end of the activity, the participant should understand
ETL with Dataflow
Importance of Data Sources, Transformations and Destinations
Various Wizards for developing SSIS application


Presentation:

The Data Flow task encapsulates the data flow engine
that moves data between a source and its destination.
Data Flow provides the facility to transform, clean, and
modify data as it is moved. Addition of a Data Flow task
to a package control flow makes it possible for the
package to extract, transform, and load data.
A Data Flow task usually consists of a set of connected
data flow elements. These elements may include sources
that extract data, transformations that modify and
aggregate data, destinations that load data, and paths
that connect the outputs and inputs of the data flow
elements into a data flow.
At run time, the Data Flow task first builds an execution
plan based on the elements within the task container.
Then, the data flow engine executes the plan. It is possible to create a Data Flow task with no data
flow, but the task will execute only if it includes at least one data flow.
A single Data Flow task can include multiple data flows.
Data Flow tasks also manage error flows. At run time, row-level errors may occur when data flow
components convert data, perform a lookup, or evaluate expressions. These errors can be
redirected using an error flow.
A data flow allows for a successful code path and error conditions for each data-read,
transformation, or write operation. It also allows for highly parallel operations.
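
The shape of a data flow with an error flow can be sketched with plain generators. This is a simplified model rather than the SSIS engine: the function names, the sample rows, and the choice of an integer conversion as the failing operation are all assumptions made for illustration.

```python
def source(rows):
    """Source: makes external rows available to the data flow."""
    yield from rows


def transform(rows, errors):
    """Transformation: convert Age to int; redirect failing rows to the error flow."""
    for row in rows:
        try:
            yield {**row, "Age": int(row["Age"])}
        except (KeyError, ValueError) as exc:
            errors.append({"row": row, "error": str(exc)})


def destination(rows, target):
    """Destination: load the rows that reached the end of the success path."""
    target.extend(rows)


incoming = [
    {"Name": "Asha", "Age": "34"},
    {"Name": "Ben", "Age": "unknown"},
    {"Name": "Chen", "Age": "51"},
]
loaded, error_rows = [], []
destination(transform(source(incoming), error_rows), loaded)
print(len(loaded), len(error_rows))  # prints: 2 1
```

The row that cannot be converted does not stop the flow; it is redirected to error_rows while the remaining rows continue to the destination, which is the behavior an error flow provides.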

Data Flow Tasks

A Data Flow task is a special type of task that encapsulates data sources, transformations,
destinations, and the paths that connect these components. It is the redesigned and enhanced
version of the SQL Server 2000 data pump.

Source: A source is the data flow component that makes data from different external data
sources available to the other components in the data flow. SSIS provides a variety of sources
such as DataReader, Microsoft Excel, Flat File, and OLE DB source.

Transformation: A transformation is the data flow component that aggregates, merges,
distributes, and modifies the data in a package. SSIS provides a variety of transformations such
as business intelligence transformations, row transformations, and split and join transformations.


SSIS also provides Multicast transformation. This transformation is used by a package to create
logical copies of data and apply multiple sets of transformations to the same data.

Destination: A destination is the data flow component that loads the data in a data flow into
various types of data sources or creates an in-memory dataset. SSIS offers various destinations
such as DataReader, Excel, OLE DB, and Flat File destinations. The DataReader destination
exposes the data in a data flow by using the ADO.NET DataReader interface. This data can then
be used by other applications. For example, you can configure the data source of a Reporting
Services report to use the result of running an SSIS package with a data flow that implements the
DataReader destination.










Wizards
Microsoft SQL Server 2005 Integration Services includes a set of wizards to construct simple packages,
create package configurations, deploy Integration Services projects, and migrate SQL Server 2000 DTS
packages. The four wizards are described below.

Import and Export Wizard

The Import and Export wizard provides the easiest means to move data between data sources.
This wizard allows a developer to delegate the trivial work to the wizard and focus on the more
complex tasks required.

Package Installation Wizard

The Package Installation wizard guides the process of deploying packages to the file system or to
SQL Server.
The file-based dependencies for packages are always installed to the file system.
If the package is installed to the file system, the dependencies are installed in the same folder as
the one that you specify for the package.
If the package is installed to SQL Server, you can specify the folder in which to store the
file-based dependencies.
If the package includes configurations that must be modified for use on the destination computer,
the values of the properties can be updated by using the wizard.



Package Configuration Wizard

The Package Configuration wizard creates the configurations that update the values of properties
of packages and package objects at run time.

Package Migration Wizard

The Package Migration Wizard migrates SQL Server 2000 DTS packages to SQL Server 2005.
SQL Server 2000 DTS packages that are stored in a SQL Server 2000 msdb database, in Meta
Data Services, or in structured storage files can also be migrated.
The packages can be migrated to the file system as .dtsx files or to the SQL Server 2005 msdb
database.

Scenario:
Mr. George needs to generate a separate text file for Member Details and wants to explore the
option of using a wizard for this assignment.
Demonstration/Code Snippet:

Step 1: Start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder.
Step 2: Select SSIS Import and Export Wizard from the menu. The SQL Server Import and Export
Wizard is displayed. Click Next to go to the data source page.
Step 3: Select the data source from which you want to load the data and set the following
properties.
Data Source : SQL Native Client
Server Name : LocalHost
Authentication : Windows Authentication
Database Name : BCBS
Click Next.

Step 4: Select the destination to which you want to load the data and set the following
properties.
Destination : Flat File Destination
File Name : Click Browse and select the destination file
C:\MembersData\MemberDetails.txt
Select the Column names in the first data row check box and click Next.




Step 5: On the Specify Table Copy or Query page, select Copy data from one or more
tables or views and click Next.
Step 6: On the next page, select the Tbl_Member table as the source table and click Next.
Step 7: Click Next. The package is ready to execute.
Step 8: Right-click the package in the Solution Explorer and rename it
MemberDetails.dtsx.
Step 9: Right-click the package and select Execute Package from the menu. The text file is
created at the given location.
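
What the generated package does, copying table rows to a delimited text file with the column names in the first row, can be sketched in a few lines of Python. This is an illustration only: export_to_flat_file is an invented helper, the column names and rows are hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io


def export_to_flat_file(column_names, rows, out, column_names_in_first_row=True):
    """Write rows as comma-delimited text, optionally preceded by a header row."""
    writer = csv.writer(out, lineterminator="\n")
    if column_names_in_first_row:
        writer.writerow(column_names)
    writer.writerows(rows)


# Hypothetical columns and rows; the real ones come from Tbl_Member in BCBS.
columns = ["MemberID", "FirstName"]
members = [("M001", "Anita"), ("M002", "Ravi")]

buffer = io.StringIO()  # stands in for C:\MembersData\MemberDetails.txt
export_to_flat_file(columns, members, buffer)
print(buffer.getvalue())
```

Turning the header option off corresponds to clearing the Column names in the first data row check box in Step 4.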

Context:

Extraction of data from various data source
Transforming the data before loading into Data Warehouse
Loading Data into Data Warehouse with or without Transformation

Practice Session:

Create an application to generate the text file for the subscribers.

Check list:

Data Destination as SQL Native Client
Whether the connection has been initialized

Common Errors:

Using SQL Data Destination in place of OLE DB Data Destination
Using a connection without initialization
Wrong selection of Data Source

Exceptions:

Using OLE DB Data destination to load data into XML files.

Lessons Learnt:

Various Data Sources
Various Transformation for cleaning the data in the database






Crossword: Unit-5 Estimated Time: 10 min.


Across:
1) The wizard that guides the process of deploying packages to the file system or to SQL Server (19)
2) The wizard that provides the easiest means to move data between data sources (12)
3) The data flow component that aggregates, merges, distributes, and modifies the data in a package (14)
Down:
1) The wizard to migrate SQL Server 2000 DTS packages to SQL Server 2005 (16)










6.0 Implementing Data Flow




Topics

6.1 Data Flow Sources and Destinations
6.2 Basic Data Flow Transformations
6.3 Advanced Data Flow Transformations
6.4 Data Flow Paths
6.5 Crossword














Topic: Data Flow Sources and Destinations Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

Various sources to Extract Data in Data Flow
Various destination to load Data in Data Flow

Presentation:
The data flow handles the transformation of data. Almost anything that manipulates data falls into
the data flow category. As data moves through each step of the data flow, the data changes based
on what the transform does.

Sources
A source is where you specify the location of the source data that the data pump pulls from.
Sources generally point to a connection manager in SSIS. By pointing to a connection
manager, you can reuse connections throughout your package, because you need to change a
connection in only one place. Six sources can be used out of the box with SSIS:
DataReader Source: This source uses an ADO.NET connection manager to connect to an
Integration Services data source.
Excel Source: This source extracts data from worksheets or ranges in Microsoft Excel
workbooks. The Excel source provides four different data access modes for extracting data:
o A table or view.
o A table or view specified in a variable.
o The results of an SQL statement. The query can be a parameterized query.
o The results of an SQL statement stored in a variable.
Flat File Source: This source reads data from a text file. The text can be in delimited, fixed width
or mixed format.
o Delimited format uses column and row delimiters to define columns and rows.
o Fixed width format uses width to define columns and rows. This format also includes a
character for padding fields to their maximum width.
o Ragged right format uses width to define all columns, except for the last column, which
is delimited by the row delimiter.
OLE DB Source: The OLE DB source extracts data from a variety of OLE DB-compliant
relational databases using a database table, a view, or an SQL command. For example, the OLE
DB source can extract data from tables in Microsoft Access or SQL Server databases.
The OLE DB source provides four different data access modes for extracting data:
o A table or view.
o A table or view specified in a variable.
o The results of an SQL statement. The query can be a parameterized query.
o The results of an SQL statement stored in a variable.


Raw File Source: This source reads raw data from a file. Because the representation of the data
is native to the source, the data requires no translation and almost no parsing. This means that the
Raw File source can read data more quickly than other sources such as the Flat File and the OLE
DB sources. The Raw File source is used to retrieve raw data that was previously written by the
Raw File destination.
XML Source: This source reads an XML data file and populates the columns in source outputs
with the data.
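
The three text layouts listed under the Flat File source above differ only in how column and row boundaries are found. The Python sketch below parses the same kind of member data in each layout; the function names, the widths, and the sample values are assumptions for illustration and are not part of SSIS.

```python
def parse_delimited(text, column_delimiter=",", row_delimiter="\n"):
    """Delimited: both columns and rows are found by delimiters."""
    return [line.split(column_delimiter) for line in text.split(row_delimiter) if line]


def parse_fixed_width(text, widths, pad=" "):
    """Fixed width: every column, and therefore every row, has a known width."""
    row_width = sum(widths)
    rows = []
    for start in range(0, len(text), row_width):
        record, fields, pos = text[start:start + row_width], [], 0
        for width in widths:
            fields.append(record[pos:pos + width].rstrip(pad))
            pos += width
        rows.append(fields)
    return rows


def parse_ragged_right(text, widths, row_delimiter="\n", pad=" "):
    """Ragged right: leading columns are fixed width; the last runs to the row delimiter."""
    rows = []
    for line in text.split(row_delimiter):
        if not line:
            continue
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width].rstrip(pad))
            pos += width
        fields.append(line[pos:])
        rows.append(fields)
    return rows


print(parse_delimited("M001,Anita\nM002,Ravi\n"))
print(parse_fixed_width("M001Anita M002Ravi  ", widths=[4, 6]))
print(parse_ragged_right("M001Anita Kumar\nM002Ravi\n", widths=[4]))
```

Note that the fixed width sample has no row delimiter at all: the rows are recovered purely from the column widths, and the trailing padding characters are stripped from each field.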

Destinations
Destinations are the data flow components that load data in a data flow into different types of data
sources or create an in-memory dataset. SSIS can send the data to nearly any OLE DB-compliant
data source or to a flat file. The following destinations are available in SSIS.
Data Mining Model Training Destination: This destination trains data mining models by
passing the data that the destination receives through the data mining model algorithms. Multiple
data mining models can be trained by one destination if the models are built on the same data
mining structure.
DataReader Destination: This destination exposes the data in a data flow by using the
ADO.NET DataReader interface. The data can then be consumed by other applications. For
example, you can configure the data source of a Reporting Services report to use the result of
running a Microsoft SQL Server 2005 Integration Services (SSIS) package.
Dimension Processing Destination: This destination loads and processes an SQL Server 2005
Analysis Services (SSAS) dimension.
Excel Destination: This destination loads data into worksheets or ranges in Microsoft Excel
workbooks.
Flat File Destination: This destination writes data to a text file. The text file can be in delimited,
fixed width, fixed width with row delimiter, or ragged right format.
OLE DB Destination: This destination loads data into a variety of OLE DB-compliant databases
using a database table or view or an SQL command. For example, the OLE DB destination can load
data into tables in Sybase and SQL Server 2005 databases.
Partition Processing Destination: This destination loads and processes an SQL Server 2005
Analysis Services (SSAS) partition.
Raw File Destination: This destination writes raw data to a file. Because the format of the data is
native to the destination, the data requires no translation and little parsing. This means that the
Raw File destination can write data more quickly than other destinations such as the Flat File and
the OLE DB destinations.
Recordset Destination: This destination creates and populates an in-memory ADO recordset.
The shape of the recordset is defined by the input to the Recordset destination at design time.
SQL Server Mobile Destination: This destination writes data to SQL Server Mobile databases.
SQL Server Destination: This destination connects to a SQL Server database and bulk loads
data into SQL Server tables and views. This destination offers the same high-speed insertion of
data into SQL Server that the Bulk Insert task provides; however, by using the SQL Server
destination, a package can apply transformations to column data before the data is loaded into
SQL Server.

Scenario:

Mr. George needs to generate a separate Excel file for Member Details, but does not want to
use the wizards for the data flow.

Demonstration/Code Snippet:

Tasks to be used: Data Flow Task, OLE DB Source and Excel Destination
Table to be used: Tbl_Member
Step 1: Start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type SimpleDataFlow.dtsx for the new name.
When prompted "Do you want to rename the package object as well?", select Yes.

Step 2: Drag the Data Flow Task from the Toolbox onto the Control Flow design surface
and rename it DFT_MemberDetails.



Step 3: Double click on the DFT_MemberDetails; Data Flow Task Designer surface will get
displayed.
Step 4: Drag and Drop OLEDB Data Source from the Toolbox to the Data Flow Designer surface
and name it as OLE_SRC_MemberDetails.
Step 5: Double-click the OLE_SRC_MemberDetails. The OLE DB Source Editor Dialog box is
displayed. Configure this dialog by setting the following properties
OLEDB Connection Manager : BCBS
Data access mode : Table or View
Name of the table : Tbl_Member
Select the required columns from the column option and click on OK button.
Step 6: Drag and Drop Excel Destination from the Toolbox to the Data Flow Designer surface
and name it as Ex_Dst_MemberDetails. Connect OLE_SRC_MemberDetails to
Ex_Dst_MemberDetails with the green connector.
Step 7: Double-click Ex_Dst_MemberDetails. The Excel Destination Editor is displayed.
Set the following properties.
Connection Manager
OLEDB Connection Manager-> New -> Browse -> Select the Excel File
Click OK, then click the New button to create a new sheet and click OK to
create the Excel file.
Step 8: Go to the Mappings page and verify that the input columns are mapped to the columns
of the Excel sheet.
Step 9: Right-click the package in the Solution Explorer and execute it. After the package
runs, the Excel file is created at the given destination.

Context:

Extracting data from single or multiple sources
Loading data into data warehouse from heterogeneous data sources.

Practice Session:

Explore Other Data Sources and Data Destination

Check list:

Required data sources and data destinations

Common Errors:

Not selecting the checkbox (Column names in the first data row) while creating the
connection.


Exceptions:

Improper use of separator in Flat File destination

Lessons Learnt:

Use of Various Data Sources
SSIS Data Flow task
Use of Various Data Destinations

Topic: Basic Data Flow Transformation Estimated Time: 40 min.

Objectives: At the end of the activity, the participant will be able to understand:

Various transformations before loading data into multiple data destinations

Presentation:
Transformations
Transformations are key components of the data flow that aggregate, merge, distribute, and
modify data. Transformations can also perform lookup operations and generate sample datasets.
A major change from DTS is that transformations in SSIS are all done in-memory. The following
transformations are available in SSIS.
Aggregate Transformation: This transformation applies aggregate functions, such as Average,
to column values and copies the results to the transformation output. Besides aggregate functions,
the transformation provides the GROUP BY clause, which can be used to specify groups to
aggregate across.
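
As a rough illustration of what the Aggregate transformation computes, the following Python sketch groups rows by one column and averages another. The column names (State, Premium) and the sample rows are hypothetical and are not taken from the lab database. In SSIS the same result is configured in the Aggregate Transformation Editor rather than written as code.

```python
from collections import defaultdict


def aggregate_average(rows, group_by, value_column):
    """Return {group value: average of value_column} for the given rows."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, row count]
    for row in rows:
        bucket = totals[row[group_by]]
        bucket[0] += row[value_column]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical sample rows (not from the workbook's tables).
rows = [
    {"State": "TX", "Premium": 100.0},
    {"State": "TX", "Premium": 300.0},
    {"State": "NY", "Premium": 250.0},
]
averages = aggregate_average(rows, "State", "Premium")
```

Here the group-by column plays the role of the GROUP BY clause, and the average is one of the aggregate functions the transformation offers.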
Audit Transformation: This transformation enables the data flow in a package to include data
about the environment in which the package runs. For example, the name of the package,
computer, and operator can be added to the data flow. SSIS includes system variables that
provide this information.
Character Map Transformation: This transformation applies string functions, such as
conversion from lowercase to uppercase, to character data. This transformation operates only on
column data with a string data type.
Conditional Split Transformation: This transformation can route data rows to different outputs
depending on the content of the data. The implementation of the Conditional Split transformation
is similar to a CASE decision structure in a programming language. The transformation evaluates
expressions, and, based on the results, directs the data row to the specified output. This
transformation also provides a default output, so that if a row matches no expression it is directed
to the default output.
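
The CASE-like routing described above can be sketched in Python as follows. This is only an illustration of the idea, not how SSIS implements it: the output names (Analysts, Managers), the Title values, and the function name are assumptions chosen for the example.

```python
def conditional_split(rows, conditions, default_output="Default"):
    """Route each row to the first output whose condition is true.

    conditions is an ordered list of (output name, predicate) pairs.
    Rows that match no predicate go to default_output.
    """
    outputs = {name: [] for name, _ in conditions}
    outputs[default_output] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break  # a row is sent to a single output only
        else:
            outputs[default_output].append(row)
    return outputs


# Hypothetical rows and conditions.
rows = [{"Title": "Analyst"}, {"Title": "Manager"}, {"Title": "Clerk"}]
conditions = [
    ("Analysts", lambda r: r["Title"] == "Analyst"),
    ("Managers", lambda r: r["Title"] == "Manager"),
]
split = conditional_split(rows, conditions)
```

The order of the conditions matters: a row that satisfies several conditions is sent only to the first matching output.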
Copy Column Transformation: This transformation creates new columns by copying input
columns and adding the new columns to the transformation output. Later in the data flow,
different transformations can be applied to the column copies. For example, use the Copy Column
transformation to create a copy of a column and then convert the copied data to uppercase
characters by using the Character Map transformation, or apply aggregations to the new column
by using the Aggregate transformation.
Data Conversion Transformation: This transformation converts the data in an input column to
a different data type and then copies it to a new output column. For example, a package can
extract data from multiple sources, and then use this transformation to convert columns to the
data type required by the destination data store. You can apply multiple conversions to a single
input column.
Data Mining Query Transformation: This transformation performs prediction queries against
data mining models. This transformation contains a query builder for creating Data Mining
Extensions (DMX) queries. The query builder lets you create custom statements for evaluating
the transformation input data against an existing mining model using the DMX language.
Derived Column Transformation: This transformation creates new column values by applying
expressions to transformation input columns. An expression can contain any combination of
columns from the transformation input, variables, functions, and operators. The result can be
added as a new column or inserted into an existing column as a replacement value. The Derived
Column transformation can define multiple derived columns, and any variable or input columns
can appear in multiple expressions.
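
To make the idea of a derived column concrete, here is a small Python sketch that adds a new column computed from existing ones. The column names (FirstName, MiddleName, LastName, FullName) and the helper names are assumptions for illustration. In SSIS the expression would be written in the SSIS expression language inside the Derived Column Transformation Editor.

```python
def add_derived_column(rows, new_column, expression):
    """Return copies of rows with new_column set to expression(row)."""
    return [{**row, new_column: expression(row)} for row in rows]


def full_name(row):
    """Join the non-empty name parts with single spaces."""
    parts = [row.get("FirstName"), row.get("MiddleName"), row.get("LastName")]
    return " ".join(p.strip() for p in parts if p and p.strip())


# Hypothetical input row.
rows = [{"FirstName": "Ann", "MiddleName": "", "LastName": "Lee"}]
derived = add_derived_column(rows, "FullName", full_name)
```

The sketch adds the result as a new column and leaves the input rows unchanged, which corresponds to the "add as a new column" choice rather than replacing an existing column.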
Export Column Transformation: This transformation reads data in a data flow and inserts the
data into a file. For example, if the data flow contains product information, such as a picture of
each product, you could use the Export Column transformation to save the images to files.
Fuzzy Grouping Transformation: This transformation performs data cleaning tasks by
identifying rows of data that are likely to be duplicates and selecting a canonical row of data to
use in standardizing the data. The transformation requires a connection to an instance of SQL
Server 2005 to create the temporary SQL Server tables that the transformation algorithm requires
to do its work. The connection must resolve to a user who has permission to create tables in the
database.
Fuzzy Lookup Transformation: This transformation performs data cleaning tasks such as
standardizing data, correcting data, and providing missing values. This transformation differs
from the Lookup transformation in its use of fuzzy matching. The Lookup transformation uses an
equi-join to locate matching records in the reference table. It returns either an exact match or
nothing from the reference table. In contrast, the Fuzzy Lookup transformation uses fuzzy
matching to return one or more close matches from the reference table.
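
The difference between an exact lookup and a fuzzy lookup can be illustrated with Python's standard difflib module. Note that difflib uses its own similarity measure, which is a stand-in here and is not the algorithm SSIS uses. The reference names and the cutoff value are assumptions for the example.

```python
import difflib


def exact_lookup(value, reference):
    """Lookup-style match: return the value only if it is in the reference list."""
    return value if value in reference else None


def fuzzy_lookup(value, reference, max_matches=3, cutoff=0.6):
    """Fuzzy-style match: return up to max_matches close reference values."""
    return difflib.get_close_matches(value, reference, n=max_matches, cutoff=cutoff)


# Hypothetical reference values; the input contains a spelling variation.
reference = ["Jonathan Smith", "Maria Garcia", "Wei Chen"]
exact = exact_lookup("Jonathon Smith", reference)
close = fuzzy_lookup("Jonathon Smith", reference)
```

The exact lookup finds nothing for the misspelled name, while the fuzzy lookup still returns the closest reference value first.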
Import Column Transformation: This transformation reads data from files and adds the data to
columns in a data flow. Using this transformation, a package can add text and images stored in
separate files to a data flow. For example, a data flow that loads data into a table that stores
product information can include the Import Column transformation to import customer reviews of
each product from files and add the reviews to the data flow.
Lookup Transformation: This transformation performs lookups by joining data in input
columns with columns in a reference dataset. The reference dataset can be an existing table or
view, a new table, or the result of an SQL statement. The Lookup transformation uses an OLE
DB connection manager to connect to the database that contains the data that is the source of the
reference dataset.
Merge Transformation: This transformation combines two sorted datasets into a single dataset.
The rows from each dataset are inserted into the output based on values in their key columns.
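
A minimal Python sketch of merging two inputs that are already sorted on a key column is shown below. The MemberID column and the sample rows are hypothetical. Like the Merge transformation, the sketch assumes both inputs arrive sorted and does not sort them itself.

```python
import heapq


def merge_sorted(left, right, key):
    """Merge two inputs that are already sorted on key into one sorted output."""
    return list(heapq.merge(left, right, key=key))


# Hypothetical inputs, each already sorted by MemberID.
left = [{"MemberID": 1}, {"MemberID": 4}]
right = [{"MemberID": 2}, {"MemberID": 3}, {"MemberID": 5}]
merged = merge_sorted(left, right, key=lambda row: row["MemberID"])
```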
Merge Join Transformation: This transformation provides an output that is generated by joining
two sorted datasets using a FULL, LEFT, or INNER join. For example, you can use a LEFT join
to join a table that includes product information with a table that lists the country/region in which
a product was manufactured. The result is a table that lists all products and their country/region of
origin.
Multicast Transformation: This transformation distributes its input to one or more outputs. This
transformation is similar to the Conditional Split transformation. Both transformations direct an
input to multiple outputs. The difference between the two is that the Multicast transformation
directs every row to every output, and the Conditional Split directs a row to a single output.
OLE DB Command Transformation: This transformation runs an SQL statement for each row
in a data flow. For example, you can run an SQL statement that inserts, updates, or deletes rows
in a database table.
Union All Transformation: This transformation combines multiple inputs into one output. For
example, the outputs from five different Flat File sources can be inputs to the Union All
transformation and combined into one output.

Scenario:

Mr. Tom is a National Sales Manager at a leading insurance company. He needs to obtain
the Member Details grouped by Title. The output, generated as a separate text file for each
member title, must contain the Fully Qualified Customer Name.

Demonstration/Code Snippet:

Task to be used: Derived Column, Conditional Split, Flat File Destination and Dataflow Task
Table to be used: Tbl_Subscriber
Step 1: First, start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type ConditionalSplitPackage.dtsx for the new
name. When prompted "Do you want to rename the package object as well?", select Yes.


Step 2: Drag and drop the Data Flow Task from the Toolbox onto the design surface of the
Control Flow tab and rename it as DFT_Group.

Step 3: Double click on the DFT_Group; Data Flow Task Designer surface will get displayed.
Step 4: Drag and Drop OLEDB Data Source from the Toolbox to the Data Flow Designer surface
and name it as OLE_SRC_Group.
Step 5: Double-click the OLE_SRC_Group. The OLE DB Source Editor dialog box is
displayed. Configure this dialog by setting the following properties
OLEDB Connection Manager : BCBS
Data access mode : Table or View
Name of the table : Tbl_Subscriber



Step 6: Go to the Columns option, select the required columns, and click OK.

Step 7: Select and drag a Derived Column transformation from the Toolbox to the design surface
and name it as DER_Names, placing it directly below the OLE_SRC_Group. Select the
OLE_SRC_Group and drag the data path (green arrow) from the bottom and drop it on
DER_Names.
Step 8: Double click on DER_Names to display Derived Column Transformation Editor. This
editor has three panes: a pane for variables and columns available for use in deriving a
new column, a pane that provides operations to be performed on derived columns, and a
pane that contains a data grid for creating the derived columns.


Change the Following Properties in Data Grid Type.

Step 9: Click OK.
Step 10: Select and drag a Conditional Split Transformation from the toolbox to the design surface
and rename it as CSPL_StateWise, drop it below the DER_Names and connect it to the
success (green arrow) path of the DER_Names.
Step 11: Double-click the CSPL_StateWise to display Conditional Split Transformation Editor.
Much like the Derived Column Transformation Editor, this editor is broken into three
panes. The top two provide variables, columns, and operations that can be used to split
the dataset. The grid located in the bottom pane allows you to enter the expression that
will be used to split the dataset and provide a name and order for the output datasets.
Step 12: Drag and drop Title from columns folder to Condition and set the following properties:


Step 13: Drag and drop four Flat File Destinations from the Toolbox onto the design surface of
the Data Flow Task and rename them as FF_DST_DST_Analyst,
FF_DST_DST_Executive, FF_DST_DST_Manager and FF_DST_DST_Others.
Step 14: Connect all four Flat File Destinations to CSPL_StateWise with green connectors and
select the specific case name from the Input Output Selection -> Output combo box.
Step 16: Double-click FF_DST_DST_Analyst and set the following properties of
Connection manager -> New
General
Flat file format : Delimited
Connection Manager Name : Case Analyst
File Name : Browse -> c:\Members Data\Analyst.txt
Select the checkbox (Column names in the first data row)
Columns
Select the Row delimiter and Column delimiter.
Advanced
Select the FirstName, Middle Name and LastName fields and delete them.
Mapping
Check whether the fields are mapped properly

Step 17: Click OK. The package is ready to execute.


Step 18: Right Click on the package from solution explorer and Execute the Package.
Step 19: All Files have been generated at the given destination.

Context:
Transforming data
Merging data from multiple sources before loading into data warehouse
Cleaning the data

Practice Session:

Mr. Jefferson is a National Sales Manager at a leading insurance company. He needs to
obtain the top 4 states that have the most customers for their policies. The output, generated as
a separate text file for each of the 4 states, must contain the Fully Qualified Customer Name.

Check list:



Use the red dotted line to check whether the data entered in the condition is right or
wrong.

Common Errors:

Using derived column and original columns together.

Lessons Learnt:

Splitting the table into multiple parts
Creating a new column from multiple columns using Derived column
Using Multiple conditions to split the table
Using Multiple Destinations

Topic: Advanced Data Flow Transformation Estimated Time: 60 min.

Objectives: At the end of the activity, the participant will be able to understand:

Advanced data flow transformations

Presentation:
Percentage Sampling Transformation: This transformation creates a sample data set by
selecting a percentage of the transformation input rows. The sample data set is a random selection
of rows from the transformation input, to make the resultant sample representative of the input.
The Percentage Sampling transformation is useful for creating sample data sets for package
development. By applying the Percentage Sampling transformation to a data flow, you can
uniformly reduce the size of the data set while preserving its data characteristics.
Pivot Transformation: This transformation makes a normalized data set into a less normalized
but more compact version by pivoting the input data on a column value. For example, a
normalized Orders data set that lists customer name, product, and quantity purchased typically
has multiple rows for any customer who purchased multiple products, with each row for that
customer showing order details for a different product. By pivoting the data set on the product
column, the Pivot transformation can output a data set with a single row per customer. That single
row lists all the purchases by the customer, with the product names shown as column names, and
the quantity shown as a value in the product column. Because not every customer purchases every
product, many columns may contain null values.
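
The Orders example above can be sketched in Python as follows. The customer names, products, and quantities are made-up sample data, and the sketch assumes at most one input row per customer and product pair (if there were duplicates, the last one would win here).

```python
def pivot(rows, key_column, pivot_column, value_column):
    """Return one row per key, with one column per distinct pivot value.

    Combinations that do not occur in the input are filled with None.
    """
    pivot_values = []
    for row in rows:
        if row[pivot_column] not in pivot_values:
            pivot_values.append(row[pivot_column])
    result = {}
    for row in rows:
        out = result.setdefault(
            row[key_column],
            {key_column: row[key_column], **{v: None for v in pivot_values}},
        )
        out[row[pivot_column]] = row[value_column]
    return list(result.values())


# Hypothetical normalized Orders data.
orders = [
    {"Customer": "Kate", "Product": "Ham", "Quantity": 2},
    {"Customer": "Kate", "Product": "Soda", "Quantity": 6},
    {"Customer": "Fred", "Product": "Ham", "Quantity": 1},
]
pivoted = pivot(orders, "Customer", "Product", "Quantity")
```

Fred bought no Soda, so his Soda column holds None, which corresponds to the null values mentioned in the description.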
Row Count Transformation: This transformation counts rows as they pass through a data flow
and stores the final count in a variable.
Row Sampling Transformation: This transformation is used to obtain a randomly selected
subset of an input dataset. You can specify the exact size of the output sample, and specify a seed
for the random number generator.
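
A minimal Python sketch of row sampling with a fixed sample size and an optional seed is shown below. The function name and sample data are assumptions for illustration, and Python's random module is used in place of whatever generator SSIS uses internally.

```python
import random


def row_sample(rows, sample_size, seed=None):
    """Return sample_size randomly chosen rows; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, min(sample_size, len(rows)))


# Hypothetical input of 100 rows, represented here by their row numbers.
rows = list(range(100))
first = row_sample(rows, 10, seed=42)
second = row_sample(rows, 10, seed=42)
```

Using the same seed twice yields the same sample, which is useful when a test run must be reproducible.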
Script Component: This transformation hosts script and enables a package to include and run
custom script code.
Slowly Changing Dimension Transformation: This transformation coordinates the updating
and inserting of records in data warehouse dimension tables.
Sort Transformation: This transformation sorts input data in ascending or descending order and
copies the sorted data to the transformation output. You can apply multiple sorts to an input; each
sort is identified by a numeral that determines the sort order. The column with the lowest number
is sorted first, the sort column with the second lowest number is sorted next, and so on.
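
The effect of numbered sort columns can be sketched in Python as below. The column names and rows are hypothetical. The position of each pair in sort_columns stands in for the sort order number described above.

```python
def multi_column_sort(rows, sort_columns):
    """Sort rows by several columns.

    sort_columns is a list of (column name, descending flag) pairs in sort
    order: the first pair has the lowest sort number, and so on.
    """
    result = list(rows)
    # Python's sort is stable, so applying the keys from last to first yields
    # a sort by the first column, with ties broken by the later columns.
    for column, descending in reversed(sort_columns):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


# Hypothetical rows: sort by State ascending, then by Name descending.
rows = [
    {"State": "TX", "Name": "Bob"},
    {"State": "NY", "Name": "Zoe"},
    {"State": "TX", "Name": "Amy"},
]
ordered = multi_column_sort(rows, [("State", False), ("Name", True)])
```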
Term Extraction Transformation: This transformation extracts terms from text in a
transformation input column, and then writes the terms to a transformation output column. The
transformation works only with English text and it uses its own English dictionary and linguistic
information about English.
Term Lookup Transformation: This transformation matches terms extracted from text in a
transformation input column with terms in a reference table. It then counts the number of times a
term in the lookup table occurs in the input data set, and writes the count together with the term
from the reference table to columns in the transformation output. This transformation is useful for
creating a custom word list based on the input text, complete with word frequency statistics.
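
The counting step of a term lookup can be approximated in Python as shown below. This is a deliberately simplified sketch: it handles only single-word terms, matches case-insensitively, and does none of the linguistic processing that SSIS applies. The sample texts and reference terms are made up.

```python
import re
from collections import Counter


def term_lookup(texts, reference_terms):
    """Count how often each reference term occurs as a whole word in texts."""
    counts = Counter()
    wanted = {term.lower() for term in reference_terms}
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in wanted:
                counts[word] += 1
    return counts


# Hypothetical input column values and reference table terms.
texts = ["Claim denied, claim resubmitted.", "Premium paid; claim approved."]
frequency = term_lookup(texts, ["claim", "premium", "refund"])
```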
Unpivot Transformation: This transformation makes a non-normalized dataset into a more
normalized version by expanding values from multiple columns in a single record into multiple
records with the same values in a single column. For example, a dataset that lists customer names
has one row for each customer, with the products and the quantity purchased shown in columns in
the row. After the Unpivot transformation normalizes the data set, the data set contains a different
row for each product that the customer purchased.
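
The following Python sketch illustrates the unpivot idea on a small, made-up data set: one wide row per customer becomes one row per purchased product. The column names and the choice to skip empty (None) values are assumptions for the example.

```python
def unpivot(rows, key_column, value_columns, name_column, value_column):
    """Turn each non-null value column into its own output row."""
    result = []
    for row in rows:
        for column in value_columns:
            if row.get(column) is not None:
                result.append({
                    key_column: row[key_column],
                    name_column: column,
                    value_column: row[column],
                })
    return result


# Hypothetical non-normalized input: one row per customer.
wide = [
    {"Customer": "Kate", "Ham": 2, "Soda": 6},
    {"Customer": "Fred", "Ham": 1, "Soda": None},
]
normalized = unpivot(wide, "Customer", ["Ham", "Soda"], "Product", "Quantity")
```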

Scenario:

Mr. George has a new policy to launch for the existing insurance members. Members can get the benefit
of the policy by fulfilling certain conditions.
Condition 1: Members who are also the subscribers.

Demonstration/Code Snippet:

Table(s) to be used : Tbl_Subscriber
Task(s) to be used : Data Flow, Flat File Source, Fuzzy Lookup, OLEDB Destination

Step 1: First, start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type FuzzyLookUpPackage.dtsx for the new
name. When prompted "Do you want to rename the package object as well?", select Yes.
Step 2: Drag and drop the Data Flow Task from the Toolbox onto the design view of the package
and rename it as DFT_FuzzyLookUp.
Step 3: Double click on the DFT_FuzzyLookup to open the Data Flow Designer Surface.
Step 4: Drag and drop a Flat File Source from the Toolbox onto the designer surface of the data flow
and rename it as FF_SRC_Members. Double-click FF_SRC_Members. The Flat File
Source Editor is displayed. Set the connection properties as follows.

In the Connection Manager section, click the New button and provide the new connection
manager name as MemberConnection. In the General section, select the file name
C:\MemberData\MemberDetials.txt and select the checkbox (Column names in the first
data row). Preview the column names and click OK.
Step 5: Drag and drop the Fuzzy Lookup transformation from the Toolbox's Data Flow
Transformations section onto the designer surface of the data flow and rename it as FZL_Members.
Step 6: Connect the Flat File Source to FZL_Members with the green arrow and double-click
FZL_Members to set the properties as follows
Reference Table tab
Connection Manager LocalHost.BCBS
Table Name Tbl_Subscriber


Step 7: Columns tab
Map the Available input columns to Available lookup columns

Step 8: Advanced tab
Set the following properties
Space : Checked
Tab : Checked
Carriage return : Checked
Line Feed : Checked



Step 9: Drag and drop an OLEDB Destination from the Toolbox onto the designer surface of the data
flow and rename it as OLE_DST_Members. Establish the connection from
FZL_Members to OLE_DST_Members.

Step 10: Double-click it to open the OLEDB Destination Editor and set the properties as follows.
Connection Manager Tab
OLEDB Connection Manager : BCBS
Data access mode : Table or View
Name of the table : Select new button to create a new table

Step 11: Go to the Mappings tab of the OLEDB Destination Editor and map Available Input
Columns to Available Destination Columns and click on OK button
Step 12: The package is ready to run. Save it and execute it by right-clicking the package in
the Solution Explorer.
Step 13: Navigate to SQL Server Management Studio to view the FuzzyResult

Context:
Satisfying more than one transformation at the same time
Transforming data before loading into the database.
Cleansing and extending the input data

Practice Session:
The xyz Health Insurance Company planned to extend the benefits of the existing plan by 1 year to the
plan A and plan B based on their (GroupID, ClassID) without any additional cost to the member. In
this context, it is required to obtain the details of members for each group and merge them into the new
data file of the new plan.


Check list:

Provide the close-match values from the reference table while using Fuzzy Lookup.

Common Errors:

The Fuzzy Lookup transformation needs access to a reference data source that contains
the values that are used to clean and extend the input data

Lessons Learnt:

The Fuzzy Lookup transformation performs data cleansing tasks.
This transformation differs from the Lookup transformation in its use of fuzzy matching.
The Lookup transformation returns either an exact match or nothing from the reference table.

Topic: Data Flow Paths Estimated Time: 60 min.

Objectives: At the end of the activity, the participant will be able to understand:

Importance of Data Paths

Presentation:
A path connects two components in a data flow by connecting the output of one data flow
component to the input of another component.
A path has a source and a destination. For example, if a path connects an OLE DB source and a Sort
transformation, the OLE DB source is the source of the path, and the Sort transformation is the
destination of the path. The source is the component where the path starts, and the destination is
the component where the path ends.
The configurable path properties include the name, description, and annotation of the path.
A path annotation displays the name of the path source or the path name on the design surface of
the Data Flow tab in SSIS Designer.
Path annotations are similar to the annotations that can be added to data flows, control flows, and
event handlers. The only difference is that a path annotation is attached to a path, whereas other
annotations appear on the Data Flow, Control Flow, and Event Handler tabs of the SSIS
Designer.
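
As an analogy only, the way a path hands the output of one component to the input of the next can be sketched with Python generators. The component functions and the Member_ID column below are hypothetical, and SSIS paths are drawn in the designer rather than coded.

```python
def source(rows):
    """Path source: emits rows one at a time."""
    for row in rows:
        yield row


def sort_transformation(upstream, column):
    """Destination of the first path and, in turn, source of the next path."""
    yield from sorted(upstream, key=lambda row: row[column])


def destination(upstream):
    """Final component: collects whatever arrives on its input path."""
    return list(upstream)


# Hypothetical rows.
members = [{"Member_ID": 3}, {"Member_ID": 1}, {"Member_ID": 2}]
# Each nested call plays the role of a path: the output of one component
# becomes the input of the next.
loaded = destination(sort_transformation(source(members), "Member_ID"))
```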

Scenario:

All the national sales managers of HIM Company are required to submit the number of new members
subscribed to the policies to the CEO of the company. Mr. George needs to apply filters to the
extracted data using a .NET application to generate reports on the fly with inputs from the user
interface of an ASP.NET application. The aim is to improve performance by caching the data in the
data reader, thereby reducing the load on the database.

Demonstration/Code Snippet:

Task to be used : Dataflow Task, OLEDB Data Source, Data Reader Destination
Table to be used : Tbl_Member
Step 1: First, start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type DataReaderPackage.dtsx for the new name.
When prompted "Do you want to rename the package object as well?", select Yes.
Step 2: Drag and drop the Data Flow Task from the Toolbox onto the design surface of the
Control Flow tab and rename it as DFT_Members.
Step 3: Double click on the DFT_Members; Data Flow Task Designer surface will get displayed.
Step 4: Drag and Drop OLEDB Data Source from the Toolbox to the Data Flow Designer surface
and name it as OLE_SRC_Members.
Step 5: Double-click the OLE_SRC_Members. The OLE DB Source Editor Dialog box is
displayed. Configure this dialog by setting the following properties
OLEDB Connection Manager : BCBS
Data access mode : Table or View
Name of the table : Tbl_Member
Go to the Columns tab, select required columns and click on OK.
Step 6: Select and drag a Derived Column transformation from the Toolbox to the design surface
and name it as DER_Names, placing it directly below the OLE_SRC_Members. Select the
OLE_SRC_Members and drag the data path (green arrow) from the bottom and drop it on
the DER_Names.
Step 7: Double click on DER_Names to display Derived Column Transformation Editor. This
editor has three panes: a pane for variables and columns available for use in deriving a
new column, a pane that provides operations to be performed on derived columns, and a
pane that contains a data grid for creating the derived columns.
Change the Following Properties in Data Grid Type.



Step 8: Click OK.
Step 9: Drag and drop a DataReader Destination from the Toolbox onto the design surface of the
data flow and name it as DR_DST_Names.
Step 10: Create a connection from DER_Names to DR_DST_Names. Then double-click the
connecting path to open the Data Flow Path Editor.

Step 11: Select Data Viewers to configure data viewers, click the Add button, and select Grid
Viewer. This will act as the OLE DB Source Output data viewer. Click OK.

Step 12: Now the data is in the DR_DST_Names. You can also configure the data in
DR_DST_Names. Double Click on DR_DST_Names, Advanced Editor for
DR_DST_Names will get displayed. Go to Input Columns tab and select the columns
required in the report (Group_ID, SubscriberID, Member_ID, Member Name,


Step 13: Select the Input and Output tab, check the output parameter, and click OK.
Step 14: The package is ready to run. Once the package runs, the output is loaded into
DR_DST_Names and can be used in any .NET application.
Step 15: Right-click the package in the Solution Explorer and execute the package.


Context:
Using Data Paths
Output of one data flow component to the input of another component
Using the output of SSIS application as an input of SSRS to generate the reports.

Practice Session:



Use the above application to generate a report from SSRS/Crystal Reports or any .Net application.

Check list:

Understanding of Data Path

Common Errors:

Not connecting the path to the appropriate destination
Not setting the input and output parameters in Data Reader Destination
Not using Add new columns while creating a new column in Derived Column

Lessons Learnt:

Passing output of one dataflow to the input of another component.
Using two data flows together.

Crossword: Unit-6 Estimated Time: 10 min.

Across:
1) The transformation that reads data in the data flow and inserts data into a file (12)
4) The transformation to obtain a randomly selected subset of an input dataset (11)
5) The Data source used by ADO.Net connection manager to connect to an integration services data
source (10)
6) The GROUP BY clause is supported by this transformation (9)
8) The transformation that combines multiple inputs into one output (8)
Down:
2) The transformation to convert from lowercase to uppercase for the character data (12)
3) ________ transformation enables the dataflow in a package to include data about the environment in
which the package runs (5)
7) _______________ transformation performs data cleaning tasks such as standardizing data, correcting
data and providing missing values (11)





7.0 Logging, Error Handling, and Reliability




Topics

7.1 Logging ETL Operations

7.2 Handling Errors in SSIS

7.3 Implementing Reliable ETL Processes with SSIS

7.4 Crossword

Topic: Logging ETL Operations Estimated Time: 45 min.

Objectives: At the end of the activity, the participant will be able to understand:

Importance of Logging in ETL operation.
Multiple logging files
Various methods to implement logging

Presentation:
SSIS provides several features that enhance ETL operations.
After deploying a package, implement log providers on packages, containers, and tasks to
capture information on events that occur at run time during ETL operations.
Logging records information about events that occur in a running package.
The logging information can be written to a text or XML file, to a SQL Server table, to the
Windows Event Log, or to a file suitable for Profiler.
Logging can be enabled for all or some tasks and containers and for all or any events.
Tasks and containers can inherit the setting from parent containers.
Multiple logs can be set up, and a task or event can log to any or all logs configured.
Logging can control which pieces of information are recorded for any event.
Logging settings can be saved as a template, and a previously saved template can be
used in a new package.
There are more than a dozen events that can be logged for each task or package.
Partial logging can be enabled for one task and much more detailed logging for another
task in the same package.
Some of the common events that may be monitored are OnError, OnPostValidate,
OnProgress, and OnWarning.
The logs can be written to nearly any connection: SQL Profiler, text files, SQL Server,
Windows Event Log, or an XML file.

The following table lists the locations to which the SSIS log providers write.
Log Provider Type | Description
Text File Log Provider | Writes log entries to ASCII text files in comma-separated value (CSV) format. The default file extension for this provider is .log.
SQL Profiler Log Provider | Writes traces that you can view by using SQL Server Profiler. The default file extension for this provider is .trc.
SQL Server Log Provider | Writes log entries to the sysdtslog90 table in a SQL Server 2005 database (the table is named sysssislog in SQL Server 2008).
Microsoft Windows Event Log Provider | Writes entries to the Application log in the Windows Event log.
XML File Log Provider | Writes log entries to an XML file. The default file extension for this provider is .xml.
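
Because the text file provider writes CSV, a log can be summarized with a short script during a postmortem. The following is a minimal sketch in Python, not part of SSIS itself. The sample log content, the machine and operator names, and the column order in the "#Fields:" header are assumptions made for illustration; check the header of your own log file before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of a text-file log. The field list below is an
# assumption for illustration; verify it against your own log header.
SAMPLE_LOG = """#Fields: event,computer,operator,source,sourceid,executionid,starttime,endtime,datacode,databytes,message
OnError,SRV01,george,DataReaderDestination,{A1},{E1},2008-09-08 14:22:31,2008-09-08 14:22:31,-1071607780,0x,Conversion failed
OnWarning,SRV01,george,DFT_Members,{A2},{E1},2008-09-08 14:22:32,2008-09-08 14:22:32,0,0x,Truncation may occur
OnError,SRV01,george,DataReaderDestination,{A1},{E1},2008-09-08 14:22:33,2008-09-08 14:22:33,-1071607780,0x,Conversion failed
"""


def read_log(text):
    """Parse CSV log text into a list of dicts keyed by the header fields."""
    lines = text.splitlines()
    header = lines[0].replace("#Fields:", "").strip().split(",")
    reader = csv.DictReader(io.StringIO("\n".join(lines[1:])), fieldnames=header)
    return list(reader)


def count_events(rows, event_name):
    """Count how many entries of one event type each source produced."""
    return Counter(row["source"] for row in rows if row["event"] == event_name)


rows = read_log(SAMPLE_LOG)
errors_by_source = count_events(rows, "OnError")
print(errors_by_source)
```

Grouping by the source column shows which task or container raised the most errors, which is usually the first question in a postmortem.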



Scenario:

Mr. George maintains SSIS projects for the Health Insurance Management Company. He has
noticed anomalies when packages run against different data sets. He realizes that logged
timestamps would show the execution time of each task, and that the performance of the package could be
improved by caching the datasets appropriately. Logging of package activity is therefore planned, so that a postmortem
can be carried out to fix the anomalies and the performance problems.

Demonstration/Code Snippet:

Step 1: Open SQL Server BIDS, open the SSISLab solution, and select the SSISTrainingLab project.
Step 2: Select the Logging option from the SSIS menu. The Configure SSIS Logs editor is displayed.

Step 3: In the Containers section on the left side of the editor, select the DataReaderDestination checkbox.
Step 4: On the Providers and Logs tab on the right side of the editor, click the Add button. A log entry
is added to the list. Select its checkbox, and in the Configuration column create a new connection
that points to a new text file in which the log will be kept.

Click New connection and set the following properties:
Usage Type : Create File
File : D:\Various Logs\DataReaderLogFile.txt

Step 5: Go to the Details tab of the Configure SSIS Logs editor, where a list of all the events is displayed.
Select the checkbox in front of the OnError event and click OK.

Step 6: Execute the package. If an error occurs, DataReaderLogFile.txt is generated at the specified
location.

Context:
To enhance ETL operations
To enable recording features of the package
Using same logging setting for multiple packages
Diagnosing package performance problems by using logged execution times

Practice Session:

Create a log file:
a) using the advanced features of logging.
b) using events other than OnError.
c) using more than one event at the same time.
d) using an existing log file in the same package.


Check list:

Appropriate use of events while creating log files.

Common Errors:

Using OnError event to define actions to perform at the progress interval.
Using OnInformation event to define actions when warning occurs.
Not saving the Log File for further use.




Exceptions:
Neither selecting any package template nor setting the values for configuration file.
Lessons Learnt:

Logging can be used to monitor package activity.
Logging supports parallel execution of the package with high performance.




















Topic: Handling Errors in SSIS Estimated Time: 45 min.

Objectives: At the end of the activity, the participant will be able to understand:

Error handling in SSIS

Presentation:
SSIS makes it straightforward to handle errors in data.
For each transformation or connection in the data flow, you can specify what happens when an
error exists in the data. The entire transformation can fail and exit on an error, or the bad
rows can be redirected to a separate error branch of the data flow.
An error can also be ignored.
Some useful components in error handling:
o Precedence Constraints - Precedence constraints, the green, red, and blue arrows,
can be used to handle error conditions and control the workflow of a package.
o Precedence Constraints and Expressions - The workflow within a package can be
controlled by using Boolean expressions in place of, or in addition to, the outcome of
the preceding task or container.
Evaluation Operation | Definition
Constraint | Success, Failure, or Completion
Expression | Any expression that evaluates to True or False
Expression AND Constraint | Both conditions must be satisfied
Expression OR Constraint | One of the conditions must be satisfied

o Multiple Constraints - Multiple precedence constraints can point to the same task. By
default, the conditions of all of them must be True to enable execution of the constrained
task. Alternatively, the task can run if at least one of the conditions is True, by
setting the Multiple Constraints property to "Logical OR. One constraint must evaluate
to True".
o Event Handling - Each task and container raises events as it runs, such as the OnError
event, among several others that are described below. SSIS allows you to trap
and handle these events by setting up workflows that run when particular events
fire.
o Events - As the package and each task or container executes, a dozen different events
are raised. You can capture the events by adding event handlers that run when
the event fires. The OnError event may be the event most frequently handled, but
some of the other events are useful in complex ETL packages. Events can also be
used to set breakpoints and control logging.
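
The evaluation rules for precedence constraints can be written down as a small truth function. The following Python sketch is a conceptual model only, not an SSIS API; the function names and the string labels are assumptions chosen for illustration.

```python
def constraint_passes(eval_op, required_result, task_result, expression_value):
    """Evaluate one precedence constraint (conceptual model, not an SSIS API).

    eval_op          -- "Constraint", "Expression", "ExpressionAndConstraint",
                        or "ExpressionOrConstraint"
    required_result  -- "Success", "Failure", or "Completion"
    task_result      -- actual outcome of the preceding task: "Success" or "Failure"
    expression_value -- Boolean result of the constraint's expression
    """
    # "Completion" is satisfied by either outcome of the preceding task.
    outcome_ok = required_result == "Completion" or required_result == task_result
    if eval_op == "Constraint":
        return outcome_ok
    if eval_op == "Expression":
        return expression_value
    if eval_op == "ExpressionAndConstraint":
        return outcome_ok and expression_value
    if eval_op == "ExpressionOrConstraint":
        return outcome_ok or expression_value
    raise ValueError(f"unknown evaluation operation: {eval_op}")


def task_can_run(constraint_results, logical_and=True):
    """Combine several constraints that point to the same task."""
    return all(constraint_results) if logical_and else any(constraint_results)


# Two constraints point to the same task: one passes, one does not.
results = [
    constraint_passes("Constraint", "Success", "Success", False),
    constraint_passes("ExpressionAndConstraint", "Success", "Success", False),
]
print(task_can_run(results, logical_and=True))   # default: all must be True
print(task_can_run(results, logical_and=False))  # Logical OR: one is enough
```

With the default setting the constrained task does not run in this example, because the second constraint's expression is False; switching to Logical OR lets it run.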
The following table shows a list of all of the events.
Event | Description
OnError | The OnError event is raised whenever an error occurs. You can use this event to capture errors instead of using the failure precedence constraint to redirect the workflow.
OnExecStatusChanged | Each time the execution status changes on a task or container, this event fires.
OnInformation | During the validation and execution of the tasks and containers, this event reports information. This is the information displayed in the Progress tab.
OnPostExecute | Just after task or container execution completes, this event fires. You could use this event to clean up work tables or delete files that are no longer needed.
OnPostValidate | This event fires after validation of the task is complete.
OnPreExecute | Just before a task or container runs, this event fires. This event could be used to check the value of a variable before the task executes.
OnPreValidate | Before validation of a task begins, this event fires.
OnProgress | As measurable progress is made, this event fires. The information about the progress of an event can be viewed in the Progress tab.
OnQueryCancel | The OnQueryCancel event is raised when an executable checks to see if it should stop or continue running.
OnTaskFailed | It is possible for a task or container to fail without actual errors. You can trap that condition with this event.
OnVariableValueChanged | Any time a variable value changes, this event fires. Setting the RaiseChangedEvent property to False prevents this event from firing. This event is very useful when debugging a package.
OnWarning | Warnings are less critical than errors. This event fires when a warning occurs. Warnings are displayed in the Progress tab.



Scenario:

There is a requirement to generate a separate list of the members who are registered in the
database but are not part of the member handicap policy.

Demonstration/Code Snippet:

Table to be used: Tbl_Member , Tbl_MemberEligibility

Task to be used: Data Flow Task, OLEDB Data Source, Look Up Transformation, Flat File Destination

Step 1: First, start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type ErrorHandlingPackage.dtsx for the new
name. When prompted "Do you want to rename the package object as well?", select Yes.
Step 2: Drag and drop a Data Flow Task from the Toolbox onto the design surface of the
Control Flow tab and rename it as DFT_Members. Double-click DFT_Members; the
Data Flow design surface is displayed.
Step 3: Drag and drop an OLE DB Source from the Toolbox onto the Data Flow design surface
and name it OLE_SRC_Members. Double-click OLE_SRC_Members; the OLE DB
Source Editor is displayed. Configure this dialog by setting the following
properties:
OLEDB Connection Manager : BCBS
Data access mode : Table or View
Name of the table : Tbl_Member
Step 4: Drag and drop a Lookup transformation from the Toolbox onto the Data Flow design
surface and name it LKP_Members. Connect OLE_SRC_Members to LKP_Members with the
green connector.
Step 5: Double-click LKP_Members and set the following properties on the Reference Table tab:
OLEDB Connection Manager : BCBS
Use a Table or View : Tbl_MemberEligibility

Step 6: Go to the Columns tab, map only MemberID from the source to the reference table, and click OK.

Step 7: Drag two Flat File Destinations from the Toolbox onto the Data Flow design surface,
below LKP_Members. Name the Flat File Destinations FF_SRC_EligibleMembers
and FF_SRC_NotEligibleMembers.





Step 8: Connect LKP_Members to FF_SRC_EligibleMembers with the green connector and
double-click FF_SRC_EligibleMembers to set the following properties.
Connection manager -> New
General
Flat file format : Delimited
Connection Manager Name : EligibleMemberConnection
File Name : Browse -> D:\Eligible Members.txt
Select the checkbox "Column names in the first data row"
Columns
Select the row delimiter and column delimiter
Advanced
Select Member_ID and click Delete
Mapping
Check that the fields are mapped correctly




Step 9: Connect LKP_Members to FF_SRC_NotEligibleMembers with the red connector.
The Configure Error Output editor is displayed; select the Redirect row option in the
Error column.






Step 10: Double-click FF_SRC_NotEligibleMembers to set the following properties.
Connection manager -> New
General
Flat file format : Delimited
Connection Manager Name : NotEligibleMemberConnection
File Name : Browse -> D:\Not Eligible Members.txt
Select the checkbox "Column names in the first data row"
Columns
Select the row delimiter and column delimiter
Advanced
Select Member_ID and click Delete
Mapping
Check that the fields are mapped correctly


Step 11: The package is ready to execute. Right-click the package in Solution Explorer and click
Execute Package on the menu. The output is loaded into the two destinations.
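
The routing that this package performs, where rows that find a match follow the green path and rows that fail the lookup are redirected along the red path, can be modeled in a few lines. The Python sketch below is a conceptual illustration only; the sample rows and the MemberName column are hypothetical and not taken from Tbl_Member.

```python
def lookup_redirect(members, eligible_ids):
    """Split rows the way a lookup with 'Redirect row' on its error output does.

    Rows whose MemberID is found in the reference set go to the regular
    (green) output; all other rows go to the error (red) output.
    """
    matched, redirected = [], []
    for row in members:
        if row["MemberID"] in eligible_ids:
            matched.append(row)
        else:
            redirected.append(row)
    return matched, redirected


# Hypothetical sample data standing in for Tbl_Member and Tbl_MemberEligibility.
members = [
    {"MemberID": 1, "MemberName": "A"},
    {"MemberID": 2, "MemberName": "B"},
    {"MemberID": 3, "MemberName": "C"},
]
eligible_ids = {1, 3}

eligible, not_eligible = lookup_redirect(members, eligible_ids)
print([row["MemberID"] for row in eligible])      # rows for the eligible file
print([row["MemberID"] for row in not_eligible])  # rows for the not-eligible file
```

No row is lost in this design: every input row ends up in exactly one of the two outputs, which is what distinguishes redirecting rows from failing the component or ignoring the error.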
Context:
Handling errors at runtime



Practice Session:



Common Errors:

Not mapping the OnFailure precedence constraint to the appropriate task.
Not registering the error details.
Lessons Learnt:

Adding error handling to the package.
Registering error details.
Maintaining the error reports using the logging.


























Topic: Implementing Reliable ETL Processes Estimated Time: 45 min.

Objectives: At the end of the activity, the participant will be able to understand:

Reliable ETL Process

Presentation:
Errors and the unexpected conditions that precipitate them are the most obvious threats to a
reliable process. There are several features of SQL Server 2005 Integration Services that allow
handling these situations with grace and integrity, keeping the data moving and systems running.

Error outputs and checkpoints are the two features that can be used in the context of reliability. The
implementation of these methods can also have a direct effect on package performance, and
therefore scalability.

Checkpoints do not natively extend inside the Data Flow, but there are
methods that can be applied to achieve a similar result. These methods transfer almost directly into the
context of scalability, allowing you to partition packages and improve both reliability and
scalability at the same time. All of these methods can be combined, and while there is no perfect
answer, knowing the options gives you the information needed to make informed
choices for your own SSIS implementations.

Context:
Managing the Runtime errors
Better performance
Flawless execution of the package










Crossword: Unit-7 Estimated Time: 10 min.

Across:
1) The event that fires each time the execution status changes on a task or container (18)
2) The event that fires when a warning occurs (9)
3) This enables recording information about event exits in the running package (7)
Down:
1) This events fires after validation of the task completes (14)









8.0 Debugging and Error Handling




Topics

8.1 Debugging a Package.

8.2 Implementing Error handling
with breakpoints.

8.3 Crossword















Topic: Debugging a Package Estimated Time: 60 min.

Objectives: At the end of the activity, the participant will be able to understand:

Debugging an SSIS Package
Presentation:
After creating the control flow and data flow for a package, you need to debug the package
to ensure that it executes successfully and delivers the expected results.
SSIS Designer includes tools and features for debugging the control flow and data flow in a
package. By using these tools, you can set breakpoints in a package and then view the
package information in debug windows.
Data viewers can also be attached to the outputs of sources and transformations.
Together with the progress reports, these viewers let you inspect the data and follow the control and
data flows during package execution.

Breakpoints
Programmers use breakpoints to debug programs, viewing the values of variables and following the
flow of the logic as they step through the source code.
SSIS allows setting breakpoints on the package or on any Control Flow level task or container.
Breakpoints can also be set in Script task code, just as in most programming environments.
An additional debugging window, the Call Stack window, may also help troubleshoot packages.
This window shows a list of the tasks that have executed up to the breakpoint,
which can be very useful when trying to understand a complex workflow.
The ability to set breakpoints on tasks and containers saves a lot of time while
troubleshooting packages. Data viewers are similar to breakpoints, but they are used to
view data as the package executes.

Setting Breakpoints
SSIS Designer provides the Set Breakpoints dialog box, in which breakpoints can be set by
enabling break conditions.
o In this dialog box, you can specify the number of times a break condition must occur before the
execution of the package is suspended.
o If break conditions are enabled on a task or container, the Breakpoint icon appears next to the
task or container on the design surface of the Control Flow tab.
o If break conditions are enabled on the package, the Breakpoint icon appears on the
label of the Control Flow tab.
o If break conditions are enabled on the Data Flow task, a red dot appears on the Data
Flow task.
o When a breakpoint is hit, the Breakpoint icon changes to help identify the source
of the breakpoint. You can add, delete, and change breakpoints while the package is
running.
SSIS provides 10 break conditions that you can enable on all tasks and
containers. In addition, some tasks and containers include custom break conditions. For
example, you can enable a break condition on the For Loop container that sets a
breakpoint to suspend execution at the beginning of each iteration of the loop.

BI Development Studio includes a number of windows that you can use to work with breakpoints and
debug packages that contain breakpoints. To open these windows in BI Development Studio, click the
Debug menu, point to Windows, and then click Breakpoints, Output, or Immediate.
The following table lists the various types of windows that you can use to work with breakpoints.
Window | Description
Breakpoints | Lists the breakpoints in a package and provides options to enable and delete breakpoints.
Output | Displays status messages of features in BI Development Studio.
Immediate | Used to debug and evaluate expressions and print variable values.

The steps to set breakpoints in a package, a task, or a container are as follows.
1. In BI Development Studio, open the Integration Services project that contains the package
you want.
2. Double-click the package in which you want to set breakpoints.
3. In the SSIS Designer, do the following:
To set breakpoints in the package object, click the Control Flow tab, place the cursor
anywhere on the background of the design surface, right-click, and then click Edit
Breakpoints.
To set breakpoints in a package control flow, click the Control Flow tab, right-click a
task, a For Loop container, a Foreach Loop container, or a Sequence container, and then
click Edit Breakpoints.


To set breakpoints in an event handler, click the Event Handlers tab, right-click a task, a
For Loop container, a Foreach Loop container, or a Sequence container, and then click
Edit Breakpoints.
4. In the Set Breakpoints <container name> dialog box, select the breakpoints to enable.
5. Optionally, modify the hit count type and the hit count number for each breakpoint.

6. To save the package, on the File menu, click Save Selected Items.
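
The hit count type and hit count number chosen in step 5 decide on which occurrences of a break condition execution is actually suspended. The Python sketch below models that decision; it is a conceptual illustration rather than an SSIS API, and the function name is an assumption. The four type labels follow the options offered in the Set Breakpoints dialog box.

```python
def should_break(hit_count_type, hit_count, hits_so_far):
    """Decide whether execution is suspended on this occurrence.

    hit_count_type -- one of the four options in the Set Breakpoints dialog
    hit_count      -- the number entered for the breakpoint
    hits_so_far    -- how many times the break condition has occurred,
                      counting the current occurrence
    """
    if hit_count_type == "Always":
        return True
    if hit_count_type == "Hit count equals":
        return hits_so_far == hit_count
    if hit_count_type == "Hit count greater than or equal to":
        return hits_so_far >= hit_count
    if hit_count_type == "Hit count multiple":
        return hits_so_far % hit_count == 0
    raise ValueError(f"unknown hit count type: {hit_count_type}")


# A loop runs six iterations; list the iterations on which each setting breaks.
iterations = range(1, 7)
on_multiple_of_2 = [i for i in iterations if should_break("Hit count multiple", 2, i)]
from_fourth_on = [
    i for i in iterations
    if should_break("Hit count greater than or equal to", 4, i)
]
print(on_multiple_of_2)
print(from_fourth_on)
```

A hit count is most useful on loop containers, where breaking on every iteration would make debugging a long loop impractical.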





Scenario:
Mr. George's company has recently acquired an organization with about 500 employees. He needs to retrieve the data
from the data source of the new organization, transform it to suit the requirements of his own
organization, and load it into a data destination. While debugging, he wants to add a data viewer on the
path between the data source and the destination, redirect the error output to an XML destination file,
and then inspect the data in the data viewer and the error output in the XML file.
Demonstration/Code Snippet:

Tables to be used : Tbl_Group, Group.txt
Tasks to be used : Bulk Insert Task, Data Flow Task, OLE DB Source, Aggregate
Transformation, OLE DB Destination
Step 1: First, start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type NewOrganizationPackage.dtsx for the new
name. When prompted "Do you want to rename the package object as well?", select Yes.
Step 2: Drag and Drop Bulk Insert Task from the Toolbox to the Designer surface of Control
Flow and Rename it as BLK_GroupData. Right-click the BLK_GroupData, and then
click Edit. The Bulk Insert Task Editor window appears. In the left pane, click
Connection and set the following properties
Connection : LocalHost.BCBS
Destination Table : Tbl_Group
File : D:\NewMembers\Group.txt




Step 3: Drag and drop a Data Flow Task from the Toolbox onto the design surface of the
Control Flow tab and rename it as DFT_Group. Connect BLK_GroupData to
DFT_Group with the green connector. Double-click DFT_Group; the Data Flow design
surface is displayed.
Step 4: Drag and drop an OLE DB Source from the Toolbox onto the Data Flow design surface
and name it OLE_SRC_Group.
Step 5: Double-click OLE_SRC_Group. The OLE DB Source Editor dialog box is displayed.
Configure this dialog by setting the following properties:
OLEDB Connection Manager : SQL Native Client\BCBS
Data access mode : Table or View
Name of the table : Tbl_Group
Step 6: Drag and drop an Aggregate transformation from the Toolbox onto the Data Flow design
surface. Rename it as AGG_GroupID. Click the OLE_SRC_Group object and
drag the green arrow below the object to the AGG_GroupID object. A green
connecting arrow is displayed between the OLE_SRC_Group and AGG_GroupID
objects. This indicates that the regular output of OLE_SRC_Group is provided
as input to the Aggregate transformation.
Step 7: Double-click AGG_GroupID. The Aggregate Transformation Editor window appears.
Set the following properties for the aggregation:
Input Column | Output Alias | Operation
Group_ID | Group_ID | Group By




Step 8: Drag and drop an OLE DB Destination from the Toolbox onto the Data Flow design
surface and name it OLE_DST_FactData. Connect AGG_GroupID to
OLE_DST_FactData with the green connector.
Step 9: Double-click OLE_DST_FactData. The OLE DB Destination Editor dialog box is
displayed. Configure this dialog by setting the following properties:
OLEDB Connection Manager : SQL Native Client\BCBS
Data access mode : Table or View
Name of the table : Click the New button to create the table

Click OK to create the table. Go to the Mappings pane on the left side and check the
mapping between the two tables, then click OK. Data will be loaded into the destination
when the package is executed.
Step 10: Execute the package by right-clicking it in Solution Explorer and clicking Execute Package.
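
A Group By operation on a single column with no other aggregates, as configured for AGG_GroupID, reduces the input to one row per distinct Group_ID. The Python sketch below illustrates that effect on hypothetical rows; it is not how SSIS implements the transformation, and the output order of a real Aggregate transformation should not be relied on.

```python
def group_by(rows, key):
    """Return one value per distinct key, in first-seen order."""
    seen = {}
    for row in rows:
        # dict keys keep insertion order, so duplicates are simply ignored.
        seen.setdefault(row[key], None)
    return list(seen)


# Hypothetical rows standing in for Tbl_Group after the bulk insert.
group_rows = [
    {"Group_ID": 101, "Email": "a@example.com"},
    {"Group_ID": 102, "Email": "b@example.com"},
    {"Group_ID": 101, "Email": "c@example.com"},
]

distinct_group_ids = group_by(group_rows, "Group_ID")
print(distinct_group_ids)
```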



[Figures: Control Flow view; Data Flow view]
Context:
Enabling break conditions
Troubleshoot an SSIS Package interactively
Determining variables and the package status
Review the variables and overall SSIS Package status at particular points in time.

Practice Session:

Create a data viewer in NewOrganizationPackage.dtsx to see the output in a grid.
Create a log file in NewOrganizationPackage.dtsx to capture the error output.
Use breakpoints in NewOrganizationPackage.dtsx to trace the performance of all the tasks.

Check list:

A red dot appears on the Data Flow task when a breakpoint is set on it.
Rather than relying on the error messages that SSIS generates in production, consider setting
up SSIS breakpoints in your development or test environment.

Common Errors:

Not setting the Hit Values for the Breakpoint.
Not setting the operation in aggregation transformation
Not observing the DataTypes used in the Bulk Insert Source and Destination





Lessons Learnt:

Setting breakpoints (add, update and delete) while the package is running
The Hit Count value is an integer greater than 1.
The breakpoints can also be used in combination. One typical example is using both the
OnPreExecute and OnPostExecute events to determine the status of the variables as the
process begins and ends
Using Bulk insert and Dataflow task together
Debugging the data.
Using aggregation Methods for Group by operations


































Crossword: Unit-8 Estimated Time: 10 min.

Across:
2) The window that shows a list of the tasks that have executed up to the breakpoint (6)
Down:
1) This window displays status messages of features in BI Development Studio (9)



















9.0 Implementing Checkpoints and Transactions



Topics

9.1 Implementing Checkpoints

9.2 Implementing Transaction

9.3 Crossword















Topic: Implementing Checkpoints Estimated Time: 40 min.

Objectives: At the end of the activity, the participant will be able to understand

The use of Checkpoints

Presentation:
Checkpoints can be enabled on a package to allow a failed package to restart at the point of failure.
If checkpoints are configured for a package, Integration Services writes information about package
execution to a checkpoint file.
The checkpoint file is an XML file.
The checkpoint file includes the execution results of all completed containers, the current values of
system and user-defined variables, and package configuration information. The file also includes
the unique identifier of the package.
To successfully restart a package, the package identifier in the checkpoint file and that of the package
must match; otherwise the restart fails.
The following are the package properties that you set when enabling checkpoints.
Property | Description
CheckpointFileName | Specifies the name of the checkpoint file.
CheckpointUsage | Specifies whether checkpoints are used.
SaveCheckpoints | Indicates whether the package saves checkpoints. This property must be set to True to restart a package from a point of failure.
FailPackageOnFailure | Indicates whether the package fails when a failure occurs in a task or container in the package. This property must be set to True.

Scenario:

Mr. George has a requirement to update the Group table from time to time. These updates are considered
only for the e-mail, fax, and address attributes. Hence, it has been decided to have a separate update
task for each of the attributes. On execution, one of the tasks may fail for
want of valid data. On the next execution, the package has to resume from the task that failed. To achieve this
objective, implement checkpoints appropriately.
Update tasks on Group_ID:
For a given group ID, the e-mail has to be updated.
For a given group ID, the fax has to be updated.
For a given group ID, the address has to be updated.

Demonstration/Code Snippet:

Task to be used: Execute SQL Task
Table to be used: Tbl_Group

Step 1: First, start Visual Studio or BI Development Studio and open the SSIS Lab solution. In
the Solution Explorer, expand the SSIS Training Labs project and right-click the SSIS
Packages folder. Select New SSIS Package from the menu. A new package named
Package1.dtsx is added to the Solution Explorer. Right-click the newly created package,
select Rename from the menu, and type CheckPointPackage.dtsx for the new name.
When prompted "Do you want to rename the package object as well?", select Yes.
Step 2: Drag and Drop three Execute SQL Task and rename them as SQL_Email, SQL_Fax and
SQL_Address. Now connect SQL_Email to SQL_Fax with the green color connector and
SQL_Fax to SQL_Address with the green connector.



Step 3: Double-click SQL_Email. The Execute SQL Task Editor is displayed. Set the
following properties on the General tab, in the SQL Statement section.
Connection : LocalHost.BCBS
SQL Statement : UPDATE Tbl_Group SET Email = 'Smith@gmail.com' WHERE
Group_ID = 101

Step 4: Double-click SQL_Fax. The Execute SQL Task Editor is displayed. Set the
following properties on the General tab, in the SQL Statement section.
Connection : LocalHost.BCBS
SQL Statement : UPDATE Tbl_Group SET Fax = 1212 WHERE Group_ID = 101
Step 5: Double-click SQL_Address. The Execute SQL Task Editor is displayed. Set the
following properties on the General tab, in the SQL Statement section.
Connection : LocalHost.BCBS
SQL Statement : UPDATE Tbl_Group SET Address1 = 'BCBS LA' WHERE
Group_ID = 101
Step 6: The package is now ready to execute. Right-click the package in Solution Explorer and
click Execute Package.


Step 7: The package fails at SQL_Fax because the value supplied does not match the data type of
the table field.

Step 8: Correct the value in the update query and run the package again. Without checkpoints,
the package starts again from the first task.






Step 9: Configure checkpoints on the package by setting the following properties:
CheckpointFileName : Create a new file, C:\Checkpoint.xml
CheckpointUsage : IfExists
SaveCheckpoints : True


Step 10: Select all three tasks together and set one more property:
FailPackageOnFailure : True
Step 11: Provide a wrong value in one of the three tasks and execute the package again.
The package fails at the task where the wrong value was passed.

Step 12: Pass appropriate input and execute the package again. The package restarts
from the task that failed.
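
The restart behavior seen in Steps 11 and 12 can be imitated with a small script. The Python sketch below is a conceptual model only: SSIS writes its own XML checkpoint file, whereas this sketch stores a JSON list of completed task names in a temporary folder, and the task actions and the run_package function are hypothetical stand-ins for the three Execute SQL tasks.

```python
import json
import os
import tempfile


def run_package(tasks, checkpoint_path):
    """Run tasks in order, skipping those recorded in the checkpoint file."""
    completed = []
    if os.path.exists(checkpoint_path):  # comparable to CheckpointUsage = IfExists
        with open(checkpoint_path) as f:
            completed = json.load(f)
    executed = []
    for name, action in tasks:
        if name in completed:
            continue  # finished in an earlier run, so it is not repeated
        try:
            action()
        except Exception:
            # Save progress and fail the whole run, comparable to
            # SaveCheckpoints = True with FailPackageOnFailure = True.
            with open(checkpoint_path, "w") as f:
                json.dump(completed, f)
            return "Failure", executed
        completed.append(name)
        executed.append(name)
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # the file is not needed after a successful run
    return "Success", executed


fax_value = {"value": "not-a-number"}  # hypothetical bad input


def update_fax():
    int(fax_value["value"])  # raises ValueError while the value is invalid


tasks = [
    ("SQL_Email", lambda: None),
    ("SQL_Fax", update_fax),
    ("SQL_Address", lambda: None),
]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

first_run = run_package(tasks, path)   # stops at SQL_Fax
fax_value["value"] = "1212"            # correct the input
second_run = run_package(tasks, path)  # resumes at SQL_Fax
print(first_run)
print(second_run)
```

The second run does not repeat SQL_Email, which is the point of a checkpoint: work that already succeeded is not executed again.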

Context:

For restarting the package from the last failure point

Practice Session:


Check list:

Select all the tasks to which the checkpoint applies
Save the checkpoint file as an XML file



Common Errors:

Not setting the property FailPackageOnFailure=true
Not selecting the appropriate tasks for checkpoint

Lessons Learnt:

Using Checkpoints to Restart After Failure.
Shorter reruns, because tasks that already completed are not executed again






















Topic: Implementing Transaction Estimated Time: 45 min.

Objectives: At the end of the activity, the participant will be able to understand:

Package transaction
Handling data consistency

Presentation:
A transaction within the packages is used to handle data consistency.
Transactions in the packages are used for the following.
o Gather the results of several tasks into a single transaction to ensure consistent updates.
E.g. information about orders and line items that is stored in two different tables can be
uploaded by two tasks that succeed or fail together.
o Ensure consistent updates on multiple database servers. E.g. A customer address can be
changed in two different online transaction processing (OLTP) systems, all in the context
of one transaction.
o Guarantee updates in an asynchronous environment. E.g. A package might use a Message
Queue task to read and delete a message bearing the name of a file to upload. If the task
that uploads the file fails, the subsequent rollback both reverses the database changes and
puts the message back on the queue.
o Carry out multiple transactions under the control of a single package. E.g. using Execute
Package tasks, a package can simultaneously run an end-of-day sequence of transactions
on three different servers.
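
The first use above, two uploads that succeed or fail together, can be demonstrated with any transactional database. The Python sketch below uses the built-in sqlite3 module rather than SQL Server or SSIS, and the Orders and LineItems tables, their columns, and the upload function are hypothetical names chosen for illustration.

```python
import sqlite3


def upload(conn, order, line_items):
    """Insert an order and its line items as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO Orders (order_id, customer) VALUES (?, ?)", order
            )
            conn.executemany(
                "INSERT INTO LineItems (order_id, product, qty) VALUES (?, ?, ?)",
                line_items,
            )
        return True
    except sqlite3.Error:
        return False


def row_counts(conn):
    """Return (number of orders, number of line items)."""
    orders = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
    items = conn.execute("SELECT COUNT(*) FROM LineItems").fetchone()[0]
    return orders, items


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE LineItems (order_id INTEGER, product TEXT, qty INTEGER NOT NULL)"
)

# The second line item violates NOT NULL, so the order insert is undone too.
failed = upload(conn, (1, "Contoso"), [(1, "Widget", 2), (1, "Gadget", None)])
after_failure = row_counts(conn)

succeeded = upload(conn, (1, "Contoso"), [(1, "Widget", 2), (1, "Gadget", 5)])
after_success = row_counts(conn)
print(failed, after_failure)
print(succeeded, after_success)
```

After the failed attempt neither table contains a row, even though the order insert and the first line item insert had already run; that all-or-nothing result is what a package transaction provides across tasks.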
There are two types of transactions available in an SSIS package.
o Distributed Transaction Coordinator (DTC) Transactions: one or more transactions
that require a DTC and can span connections, tasks, and packages
o Native Transaction: A transaction at the SQL Server engine level, using a single
connection and managed through T-SQL transaction commands
All SSIS Containers and Tasks can be configured to use transactions. Integration Services
provides three options for configuring transactions:
o Required indicates that the container starts a transaction, unless one is already started by
its parent container. If a transaction already exists, the container joins the transaction. E.g.
if a package that is not configured to support transactions includes a Sequence container
that uses the required option, the Sequence container would start its own transaction. If
the package were configured to use the required option, the Sequence container would
join the package transaction.


o Supported indicates that the container does not start a transaction, but joins any
transaction started by its parent container. E.g. if a package with four Execute SQL tasks
starts a transaction and all four tasks use the Supported option, the database updates
performed by the Execute SQL tasks are rolled back if any task fails. If the package does
not start a transaction, the four Execute SQL tasks are not bound by a transaction, and no
database updates except the ones performed by the failed task are rolled back.
o NotSupported indicates that the container does not start a transaction or join an existing
transaction. A transaction started by a parent container does not affect child containers
that have been configured to not support transactions. For example, if a package is
configured to start a transaction and a For Loop container in the package uses the
NotSupported option, none of the tasks in the For Loop can roll back if they fail.
Transactions can be configured by setting the TransactionOption property of the container using
the Properties window in Visual Studio or BIDS. The property can also be set programmatically.
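
The three options can be summarized as a small decision rule. The sketch below is a plain Python model of the behaviour described above, not SSIS API code; the function name effective_transaction and the "txn:<name>" labels are illustrative assumptions.

```python
from enum import Enum
from typing import Optional


class TransactionOption(Enum):
    REQUIRED = "Required"
    SUPPORTED = "Supported"
    NOT_SUPPORTED = "NotSupported"


def effective_transaction(option: TransactionOption,
                          parent_txn: Optional[str],
                          container_name: str) -> Optional[str]:
    """Return the transaction a container runs in, or None if it runs without one.

    parent_txn is the transaction already started by the parent container
    (None when the parent has not started one). The returned label is a
    hypothetical identifier used only for this illustration.
    """
    if option is TransactionOption.REQUIRED:
        # Join the parent's transaction if one exists, otherwise start a new one.
        return parent_txn if parent_txn is not None else f"txn:{container_name}"
    if option is TransactionOption.SUPPORTED:
        # Never start a transaction; only join one started by the parent.
        return parent_txn
    # NotSupported: neither start a transaction nor join an existing one.
    return None
```

For instance, a Sequence container set to Required inside a package with no transaction yields its own transaction, while an Execute SQL task set to Supported in the same situation runs without one.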

Scenario:

Mr. George is trying to build an SSIS package where the entire package is encapsulated in a transaction.
In addition there is a table that needs to remain locked for the duration of the SSIS package execution.

Demonstration/Code Snippet:


Step 1: The Test Initialization sequence container is used to create a test environment. Two
tables are created (TranQueue and TranQueueHistory).
Step 2: A row is inserted into TranQueue.
Step 3: The Process sequence container has its TransactionOption property set
to Required.


Step 4: Process TranQueue is an Execute SQL task that executes the following SQL command
to simulate processing a group of rows in the TranQueue table:
DELETE TOP (10) dbo.TranQueue
OUTPUT DELETED.*
INTO dbo.TranQueueHistory
FROM dbo.TranQueue WITH (TABLOCKX)
Step 5: The Placeholder for Breakpoint Execute SQL task does not execute a command; it's there
so we can set a breakpoint and run some queries while the package is running and the
transaction is open (discussed below). The Simulate Failure Execute SQL task is
executed if the package variable v_SimulateFailure = 1; it does a SELECT 1/0 to
generate an error (i.e. a divide by zero) which will cause a rollback on the package
transaction.
Follow these steps to see the transaction handling in an SSIS package:
o Make sure the value of the variable v_SimulateFailure = 1; this will demonstrate
the rollback.
o Make sure there is a breakpoint on the Placeholder for Breakpoint Execute SQL
task.
o Execute the package; your screen should look like this (stopping at the
breakpoint):
Step 6: Open a new query window in SQL Server Management Studio, connect to the mssqltips
database and execute the command below. You should see a single-row result set, e.g.
Test Message 2008-09-08 14:22:31.043 (your date and time will of course be different).
The NOLOCK hint ignores locks; the row you see is not committed yet.
SELECT * FROM dbo.TranQueueHistory WITH (NOLOCK)

Step 7: Open another new query window in SQL Server Management Studio, connect to the
mssqltips database and execute the command below. You will be blocked waiting for the
transaction executing in the SSIS package to either roll back or commit, since we added
the TABLOCKX hint, which keeps the TranQueue table locked for the duration of the
transaction. Alternatively, you could issue an INSERT INTO the dbo.TranQueue table
and you will see that it is also blocked until the transaction either commits or rolls
back.
SELECT * FROM dbo.TranQueue
Step 8: Click Continue in BIDS (or click Debug on the top-level menu, then Continue) and you
will see the package fail. Execute the SELECT statement above on the
TranQueueHistory table again and you will see no rows; the SELECT statement above on
the TranQueue table will complete, showing a single row. Thus the error caused the
transaction to roll back. After the rollback, the deleted row(s) in the TranQueue table are
restored and the inserted row(s) in the TranQueueHistory table are not committed (i.e.
they will disappear).
You can change the value of the v_SimulateFailure variable to 0 and run the package and
queries above again to validate that the transaction commit works as we expect.
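
The commit and rollback outcomes of this demonstration can be reproduced in miniature outside SSIS. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, so it does not model the TABLOCKX lock, the DTC, or the OUTPUT clause; the function names process_queue and run_demo are illustrative assumptions, while the table names follow the demonstration.

```python
import sqlite3


def process_queue(conn: sqlite3.Connection, simulate_failure: bool) -> None:
    """Move all queued rows to the history table inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO TranQueueHistory (message) "
                "SELECT message FROM TranQueue"
            )
            conn.execute("DELETE FROM TranQueue")
            if simulate_failure:
                # Stand-in for the Simulate Failure task (SELECT 1/0).
                raise ZeroDivisionError("simulated failure")
    except ZeroDivisionError:
        pass  # the transaction has already been rolled back at this point


def run_demo(simulate_failure: bool) -> tuple:
    """Return (rows left in TranQueue, rows in TranQueueHistory)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TranQueue (message TEXT)")
    conn.execute("CREATE TABLE TranQueueHistory (message TEXT)")
    conn.execute("INSERT INTO TranQueue VALUES ('Test Message')")
    conn.commit()
    process_queue(conn, simulate_failure)
    queue = conn.execute("SELECT COUNT(*) FROM TranQueue").fetchone()[0]
    history = conn.execute("SELECT COUNT(*) FROM TranQueueHistory").fetchone()[0]
    conn.close()
    return queue, history
```

Calling run_demo(True) corresponds to v_SimulateFailure = 1: the row stays in the queue and the history table remains empty. Calling run_demo(False) corresponds to v_SimulateFailure = 0: the row moves to the history table.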

Context:
Consistent update on multiple servers
Updates in asynchronous environment


Check list:

Identify the transaction type (DTC or native transaction).
Identify the transaction option (Required, Supported or NotSupported).


Common Errors:

The Distributed Transaction Coordinator (MSDTC) service must be running;
otherwise, the following error message is generated: Error: 0xC001401A at
Transaction: The SSIS Runtime has failed to start the distributed transaction due to error
0x8004D01B "The Transaction Manager is not available.". The DTC transaction failed to
start. This could occur because the MSDTC Service is not running.
Lessons Learnt:

Maintaining consistency in data
Using transactions for consistent updates on database servers and tasks










Crossword: Unit-9 Estimated Time: 10 min.

Across:
1) This writes information about package execution to a checkpoint file if configured to a package (19)
3) The one within the package to handle data consistency (11)
4) _____________ can be enabled on a package to allow a failed package to restart at the point of failure
(10)
5) The number of transaction types that are available in an SSIS package (3)

Down:
2) This property indicates whether the package saves checkpoint (14)










10.0 Configuring and Deploying Packages



Topics

10.1 Package Configurations

10.2 Preparing and Deploying Packages

10.3 Crossword















Topic: Package Configurations Estimated Time: 45 min.

Objectives: At the end of the activity, the participant will be able to understand:

Configuring SSIS Packages
SSIS Package at Runtime

Presentation:
Package configurations allow data to be supplied to an SSIS package at run time.
A common use of configurations is to enable the server name and user login information to be
dynamically applied at run time.
The two most common uses of configurations are to pass in variables and connection strings
to a package at run time.
To dynamically set the connection properties of a database connection, use a separate
configuration to assign a value to each of the Connection Manager's properties:
ConnectionString, ServerName, and InitialCatalog. When the package executes, each of the
configuration values will be used to create the connection.
Configuration data can be loaded to a package from the following locations:
o The Registry
o Environment Variables
o A parent package
o XML File
o SQL Server database
Multiple configurations can be created for a single package. The configurations are applied to
the package in the order in which they are listed in the Package Configurations Organizer.
A single configuration file may be created and applied to multiple packages.
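
Because configurations are applied in order, the position of a configuration in the list matters when two configurations set the same property. The sketch below models this with plain Python dictionaries, under the assumption that a value applied later replaces one applied earlier for the same property; the function name apply_configurations and the sample server names are illustrative, not part of SSIS.

```python
def apply_configurations(defaults: dict, configurations: list) -> dict:
    """Apply each configuration in list order and return the resulting settings.

    defaults holds the property values saved in the package at design time.
    Each entry of configurations maps property names to run-time values.
    """
    settings = dict(defaults)  # copy so the design-time values stay untouched
    for config in configurations:
        # A later configuration replaces values set by an earlier one.
        settings.update(config)
    return settings
```

With defaults of ServerName = "DevServer" and two configurations that set ServerName to "TestServer" and then "ProdServer", the package would run against "ProdServer" under this model.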
Follow the Steps to Configure an SSIS Package

Step 1: In Visual Studio 2005, open the package to which you want to add a configuration. Select
Package Configurations from the SSIS menu. The Package Configurations Organizer
opens.
Step 2: Select Enable package configurations and click Add. The Package Configuration
Wizard opens and steps through creating a package configuration.
Step 3: Set the configuration type, which determines where the configuration data is
loaded from.
Step 4: Select the package properties or variables that will be set by the configuration
when the package is run. XML and SQL Server configurations support selecting multiple
properties in a single configuration object. The other configuration objects allow for only
one configurable property per configuration.
Step 5: After selecting the configuration options, provide a name for the configuration
and click Finish. You may edit the configuration later to change objects and properties.

Demonstration/Code Snippet:

Step 1: Open an SSIS Package for configuration.
Step 2: Save the package configurations outside the package
1. Click the Package Configurations option,
2. Enable the package configurations



3. Choose XML configuration file
4. Provide the XML file location and file name.
5. Select the LocalHost.SSIS_Package_Config connection manager, which is the
connection manager for the SQL Server database.
6. Select the properties you want to save as a package configuration from the
following screen.





7. Select ServerName, UserName, InitialCatalog to construct the connection
string
8. For text file configurations, select the Load Txt file
9. Select the ConnectionString




The next time you load the package, it will read the configurations
from the XML file. You can verify this by changing the XML file contents and
reloading the package. After reloading the package, view the connection manager
properties; you can see that they have been updated with the values from the XML file.

10. After creating those two configurations, the following screen will be shown.
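
Changing the XML file contents, as suggested above, can also be scripted. The sketch below edits a configured value with Python's standard xml.etree.ElementTree module. The element and attribute names (DTSConfiguration, Configuration, Path, ConfiguredValue) follow the layout commonly seen in generated .dtsConfig files, but treat them as an assumption and compare against a file produced by the wizard; the connection manager name SourceDb and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical property path for a connection manager named SourceDb.
SERVER_PATH = r"\Package.Connections[SourceDb].Properties[ServerName]"

# Minimal stand-in for a generated configuration file (assumed layout).
SAMPLE_CONFIG = f"""<DTSConfiguration>
  <Configuration ConfiguredType="Property" Path="{SERVER_PATH}" ValueType="String">
    <ConfiguredValue>DevServer</ConfiguredValue>
  </Configuration>
</DTSConfiguration>"""


def read_configured_values(xml_text: str) -> dict:
    """Return a mapping of property path to configured value."""
    root = ET.fromstring(xml_text)
    return {c.get("Path"): c.findtext("ConfiguredValue")
            for c in root.iter("Configuration")}


def set_configured_value(xml_text: str, path: str, new_value: str) -> str:
    """Return the XML text with the value for the given property path replaced."""
    root = ET.fromstring(xml_text)
    for config in root.iter("Configuration"):
        value = config.find("ConfiguredValue")
        if config.get("Path") == path and value is not None:
            value.text = new_value
    return ET.tostring(root, encoding="unicode")
```

After writing the updated text back to the file and reopening the package, the connection manager should show the new server name.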




Context:
Moving packages from development to production environments
Dynamically updating the package to run successfully in a different environment.

Practice Session:

Configure other Packages by using the steps in the above demonstration.

Check list:
Deploying the package without configuration.

Common Errors:
Configure a File without setting the TransactionOption property
When you add an Environment variable configuration (see step 9 above) and the environment
variable does not appear in the Environment variable drop down list, close BIDS and reopen
it. Any environment variable added after opening BIDS will not show up until you close and
reopen.
When you are working with an SSIS package in BIDS, the package configuration is read
when you open the package. Any changes made to the configuration will not be reflected
until you close and reopen the SSIS package.
You can use an Environment variable package configuration to specify the ConnectionString
property of the Configuration database Connection Manager, allowing you to change the
server or database that holds the package configuration data. However, the table name that
you specify (see step 12 above) is hard-coded in the SSIS package.



Lessons Learnt:

Configuring the package.
Best to assign one setting for each configuration
To prevent unauthorized users from accessing the database credentials, a SQL Server
configuration is ideal.
XML files must be kept in a secured folder so that users do not have access to them.





























Best Practices:




Topic: Preparing and Deploying Packages Estimated Time: 25 min.

Objectives: At the end of the activity, the participant will be able to understand:

Deploying a package
Presentation:
After debugging a package, the next step is to save the package.
The package can be saved in the msdb database in SQL Server 2005 or in the package store.
The package store represents the folders in the file system location that Integration Services
service manages.
SSIS Packages can be deployed in one of four ways:
o Use the Deployment Utility in BI Development Studio.
o Use the import and export package features in SQL Server Management Studio.
o Save a copy of the package in the file system.
o Run the dtutil command line utility.

Package Deployment Utility
The Deployment Utility is the preferred method to deploy multiple packages. The Deployment
Utility allows package dependencies to be deployed with the SSIS packages; a deployment folder
is created that contains an executable setup file, the SSIS packages, package configurations, and
supporting files.



SSIS deployment flow with the Deployment Utility







To create a package deployment, create an Integration Services project and add all required
packages to the project. After all packages have been added, right-click the project in the
Solution Explorer window and select Properties. In the property pages dialog box, select the
Deployment Utility option and set the following property values:

o AllowConfigurationChanges: A value that specifies whether configurations can be
updated during deployment. The default value of this property is True.
o CreateDeploymentUtility: A value that specifies whether a package deployment utility
is created when the project is built. The default value of this property is False. The
property must be True to create a deployment utility.
o DeploymentOutputPath: The location, relative to the SSIS project, of the files the
project deployment uses.



To create a Deployment Utility, set the CreateDeploymentUtility option to True on the project
property page. Then build the project by selecting the Build Solution option on the Visual Studio
menu. Building the project creates the file, DTSDeploymentManifest.xml, and copies the
project packages, along with the DTSInstall.exe, to the bin/Deployment folder, or to the location
specified in the DeploymentOutputPath property. The DTSDeploymentManifest.xml file lists
the packages and the package configurations in the project. DTSInstall.exe is the application that
runs the Package Installer Wizard.






Using the Package Installer Wizard

To deploy the SSIS project, run the package installation executable program by right-clicking the
file [ProjectName].SSISDeploymentManifest (created by the Package Deployment Utility) and
selecting Deploy.
o The Package Installer Wizard steps through the installation process.
o The first step prompts to install to the file system or SQL Server. The SQL Server
deployment installs the packages in the sysdtspackages90 table in the SQL Server 2005
msdb database. Any package dependency files, such as XML Configuration files, are
copied to a folder on the file system that is specified during installation.
o Next, the wizard prompts for an installation folder location for a file system install, or
target SQL Server for SQL Server deployments. SQL Server deployments will also
prompt for the file system folder to copy the package dependency files. For packages that
contain configurations, the wizard provides the option of editing the updatable
configuration values.



Manual Package Deployment
SSIS provides the DTUTIL command line utility for managing packages. This utility allows
packages to be published to the SQL Server msdb database or the file system. When manually
deploying packages, support files must be explicitly included in the deployment script.
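
A deployment script for the manual route might assemble the dtutil call as shown below. This is a minimal sketch that only builds the argument list and does not run anything. It assumes the /FILE, /DestServer and /COPY SQL;<name> options of dtutil; check them against the dtutil documentation for your version before use. The function name and the sample paths are hypothetical.

```python
def build_dtutil_copy_command(package_file: str,
                              dest_server: str,
                              dest_name: str) -> list:
    """Build a dtutil argument list that copies a package file to msdb.

    package_file: path of the .dtsx file on the file system.
    dest_server:  SQL Server instance whose msdb database receives the package.
    dest_name:    name the package gets in msdb.
    """
    return [
        "dtutil",
        "/FILE", package_file,        # source package on the file system
        "/DestServer", dest_server,   # target SQL Server instance
        "/COPY", f"SQL;{dest_name}",  # copy into the msdb package store
    ]


# Example with hypothetical values; pass the list to subprocess.run to execute it.
command = build_dtutil_copy_command(r"C:\Deploy\LoadSales.dtsx", "PRODSQL01", "LoadSales")
```

Support files such as XML configuration files are not covered by this command; the script would need a separate copy step for each of them.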




Demonstration/Code Snippet:

Step 1: Deploy the package after configuration
Step 2: Build the Integration Services project to create a package deployment utility.
Step 3: Copy the deployment folder, which was created when building the Integration Services
project, to the target computer.
Step 4: Finally, to install the packages, run the Package Installation Wizard.
Context:
Using the Package at Runtime

Practice Session:

Deploy the packages developed earlier, as demonstrated.

Check list:

Whether the Package has been configured properly or not

Common Errors:

Configuring the package without saving

Lessons Learnt:

Deployment of the package
Understanding of the sysdtspackages90 table in the SQL Server 2005 msdb database

The Deployment Utility is the preferred method to deploy multiple packages




Best Practices:



Crossword: Unit-10 Estimated Time: 10 min



Across:
2) Package configurations allow data to be supplied to an SSIS package at ________ (7)
Down:
1) Multiple configurations can exist for a ________ package (6)
















11.0 Optimizing an SSIS Solution




Topics

11.1 Monitoring SSIS Performance
11.2 Optimizing SSIS Packages

11.3 Scaling Out SSIS Packages

11.4 Crossword














Topic: Monitoring and Optimizing Performance Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

Monitoring the performance by using various tools
Presentation:
Package performance can be improved by effectively monitoring the package at run
time.
Tools and features such as SSIS logging, the SSIS counters in Performance Monitor,
and SQL Server Profiler can be used to monitor the performance of SSIS packages.
SSIS provides a logging feature that captures information when log-enabled events occur at
run time, making it easier to troubleshoot package performance and failures. It provides a
schema of commonly logged information to include in log entries.
A log provider specifies a format and destination, such as a SQL Server database or text file,
for the log data.
Logging can be implemented for the entire package or for any task or container that is
included in the package. However, logs are associated with packages and configured at the
package level. Each of the tasks and containers in a package can log information to
any package log. The tasks and containers in a package can be enabled for logging even if the
package itself is not. A package, container, or task can write to multiple logs with different
information.
The following table displays the locations to which the SSIS log providers write.

Log Provider     Location
Text File        Uses the File connection manager to write log entries to ASCII text
                 files using a comma-separated value (CSV) format. The default file
                 extension for this provider is .log.
SQL Profiler     Uses the File connection manager to write traces that you can view
                 using SQL Server Profiler. The default file extension for this
                 provider is .trc.
Windows Event    Writes entries to the Application log in the Windows Event log on
                 the local computer.
SQL Server       Uses the OLE DB connection manager to write log entries to the
                 sysdtslog90 table in a SQL Server database.
XML File         Uses the File connection manager to write the log files to an XML
                 file. The default file extension for this provider is .xml.
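
Once log entries have been captured, task durations can be derived from them. The sketch below assumes that the entries have already been read into Python dictionaries with the keys event, source and starttime (these match columns of the sysdtslog90 table, but how the rows are fetched is left out), and that the OnPreExecute and OnPostExecute events were enabled for logging. The function name task_durations and the timestamp format are illustrative assumptions.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of the starttime values


def task_durations(log_rows: list) -> dict:
    """Return seconds elapsed between OnPreExecute and OnPostExecute per source."""
    started = {}    # source name -> time its OnPreExecute event was logged
    durations = {}  # source name -> elapsed seconds
    for row in log_rows:
        when = datetime.strptime(row["starttime"], TIME_FORMAT)
        if row["event"] == "OnPreExecute":
            started[row["source"]] = when
        elif row["event"] == "OnPostExecute" and row["source"] in started:
            elapsed = when - started.pop(row["source"])
            durations[row["source"]] = elapsed.total_seconds()
    return durations
```

Sorting the result by duration gives a quick view of which tasks or containers dominate the run time of a package.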


SSIS includes a set of performance counters for monitoring the performance of the data flow
engine. For example, the performance counters can monitor the number of rows a source produces.
Use the Performance snap-in in the Microsoft Management Console (MMC) to create a log that
captures performance counters.

The following table describes the Performance Counters available for SSIS.

Performance Counter      Description
BLOB bytes read          The number of bytes of binary large object (BLOB) data that the
                         data flow engine has read from all sources.
BLOB bytes written       The number of bytes of BLOB data that the data flow engine has
                         written to all destinations.
BLOB files in use        The number of BLOB files that the data flow engine uses for
                         spooling.
Buffer Memory            The amount of memory, in buffers of all types, that is in use.
                         When this amount is larger than the amount of physical memory,
                         the Buffers Spooled count rises, indicating that memory
                         swapping is increasing.
Buffers in use           The number of buffer objects, of all types, that the data flow
                         engine is currently using.
Buffers Spooled          The number of buffers written to disk. If the data flow engine
                         runs low on physical memory, buffers not currently used are
                         written to disk and then reloaded when needed.
Flat buffer memory       The total amount of memory, in bytes, that all flat buffers
                         use. Flat buffers are blocks of memory that a component uses
                         to store data that is accessed byte by byte.
Flat buffers in use      The number of flat buffers that the data flow engine uses. All
                         flat buffers are private buffers.
Private buffer memory    The total amount of memory in use by all private buffers. A
                         private buffer is a buffer that a transformation uses for
                         temporary work.
Private buffers in use   The number of buffers that transformations use.
Rows Read                The number of rows a source adapter produces. The number does
                         not include rows read from reference tables that the Lookup
                         transformation uses.
Rows Written             The number of rows offered to a destination adapter. The
                         number does not reflect rows written to the destination data
                         store.
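
The Buffers Spooled counter is a direct indicator of memory pressure, because it rises only when buffers are written to disk. The sketch below scans counter samples for that condition. It assumes the samples were exported from a Performance Monitor log into Python dictionaries keyed by counter name plus a "time" key; that layout and the function name find_memory_pressure are illustrative assumptions.

```python
def find_memory_pressure(samples: list) -> list:
    """Return the sample times at which the data flow engine spooled buffers to disk.

    Each sample is a dict such as
    {"time": "10:00", "Buffers Spooled": 0, "Buffer Memory": 52428800}.
    """
    flagged = []
    for sample in samples:
        # Any value above zero means buffers were written to disk.
        if sample.get("Buffers Spooled", 0) > 0:
            flagged.append(sample["time"])
    return flagged
```

An empty result suggests that the data flow stayed in memory for the sampled period; any flagged time is a starting point for looking at package design or available memory.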

SQL Profiler shows how SQL Server resolves queries internally. This allows administrators to see
exactly what Transact-SQL statements or Multi-Dimensional Expressions have been submitted to the
server and how the server accesses the database or cube to return result sets. This tool can be used to
analyze the kind of statements load operations generate and how to optimize them.

Scenario:

Approach 1
Mr. George has adopted SQL Server 2005 Integration Services and is moving a great deal of data
on a consistent basis throughout the day for a number of systems. Unfortunately, he has been
seeing some memory-related issues and wants to find out how he can monitor them on a
regular basis. He wants some way to collect performance-related data and monitor the overall
process.
Approach 2
Use a trace in SQL Profiler to monitor the performance.

Demonstration/Code Snippet:

Step 1: Go to Start Menu -> All Programs -> Administrative Tools -> Performance. Performance
Monitor is displayed.



Step 2: Click the + button on the menu bar to load the SSIS-related counters.
Step 3: In the Performance Object list, select SQL Server: SSIS Pipeline and SQL Server: SSIS Service.





Step 4: Monitor the performance and observe the results. On the menu bar, select the light bulb icon
to highlight a single counter.
Step 5: Press the Up and Down arrow keys to highlight various counters.




Approach 2

Step 1: Create a trace that is based on a reusable template.
Step 2: Watch the trace results as the trace runs.
Step 3: Store the trace results in a table.
Step 4: Start, stop, pause, and modify the trace results as necessary.
Step 5: Replay the trace results.




Context:
To Monitor the performance of a package

Practice Session:
Monitor the performance of all the above packages used in SSISLab Solution.

Check list:
Trace should be based on a reusable template

Lessons Learnt:

Monitoring the package performance using SQL Profiler
Monitoring the package performance using Performance Monitor





























Topic: Scaling Out SSIS Packages Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

Scale Out SSIS Packages

Presentation:
Scale Out Memory Pressures
The pipeline processing takes place almost exclusively in memory. This makes for faster data
movement and transformations, and a design goal should always be to make a single pass over
your data. In this way, you eliminate the time-consuming staging and the costs of reading and
writing the same data several times. The potential disadvantage of this is that for large amounts of
data and complicated set of transformations, you need a large amount of memory, and it needs to
be the right type of memory for optimum performance.
The virtual memory space for 32-bit Windows operating systems is limited to 2 GB by default.
Although you can increase this amount through the use of the /3GB switch applied in the boot.ini
file, this often falls short of the total memory available today. This limit is applied per process,
which for your purposes means a single package during execution, so by partitioning a process
across multiple packages, you can ensure that each of the smaller packages is its own process and
therefore takes advantage of the full 2-3 GB virtual space independently. The most common
method of chaining packages together to form a consolidated process is through the Execute
Package task, in which case it is imperative that you set the child package to execute out of
process. You must set the ExecuteOutOfProcess property to true to allow this to happen.
It is worth noting that unlike the SQL Server database engine, SSIS does not support Address
Windowing Extensions (AWE), so scaling out to multiple packages across processes is the only
way to take advantage of larger amounts of memory. If you have a very large memory
requirement, then you should consider a 64-bit system for hosting these processes.

Scale Out by Staging Data
Staging of data is very much on the decline; after all, why incur the cost of writing to and reading
from a staging area, when you can perform all the processing in memory with a single pass of
data? With the inclusion of the Dimension and Partition Processing Destinations, you no
longer need a physical data source to populate your SQL Server Analysis Services (SSAS) cubes,
which is yet another reason for the decline of staging or even the traditional data warehouse. Although this
is still a contentious subject for many, the issue here is this: Should you use staging during the
SSIS processing flow? Although it may not be technically required to achieve the overall goal,
there are still some very good reasons why you may want to, coming from both the scalability and
reliability perspective.


For this discussion, staging could also be described as partitioning. The process could be
implemented within a single data flow, but for one or more of the reasons described below, it may
be subdivided into multiple data flows. These smaller units could be within a single package, or
they may be distributed across several, as discussed below. The staged data will be used only by
another Data Flow and does not need to be accessed directly through regular interfaces. For this
reason, the ideal choices for the source and destination are the raw file adapters. This could be
described as vertical partitioning, but you could also overlay a level of horizontal partitioning,
such as by executing multiple instances of a package in parallel.
Raw file adapters allow you to persist the native buffer structures to disk. The in-memory buffer
structure is simply dumped to and from the file, without any translation or processing as found in
all other adapters, making these the fastest adapters for staging data. You can take advantage of
this to artificially force a memory checkpoint to be written to disk, allowing you to span multiple
Data Flow tasks and packages.
The key use for raw files is that by splitting a Data Flow into at least two individual tasks, the
primary task can end with a raw file destination and the secondary task can begin with a raw file
source. The buffer structure is exactly the same between the two tasks, so the split can be
considered irrelevant from an overall flow perspective, but it provides perfect preservation
between the two.
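
Horizontal partitioning, as mentioned above, needs a rule for dividing the input among the parallel package instances. One simple possibility is to split a numeric key range into contiguous slices and pass one slice to each instance, for example through package variables. The sketch below shows only the slicing; the function name partition_ranges and the idea of a numeric key are illustrative assumptions, not something the workbook prescribes.

```python
def partition_ranges(min_id: int, max_id: int, partitions: int) -> list:
    """Split the inclusive range [min_id, max_id] into contiguous slices.

    Returns a list of (start, end) tuples, one per package instance.
    Slice sizes differ by at most one row key; empty slices are omitted.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    total = max_id - min_id + 1
    if total <= 0:
        return []
    base, extra = divmod(total, partitions)
    ranges = []
    start = min_id
    for i in range(partitions):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first slices
        if size == 0:
            continue
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

Each tuple can then drive one instance of the package, so that every instance reads a distinct slice and runs in its own process.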

Context:
Pipeline Processing in Memory
Dimension and Partition Processing Destinations

Lessons Learnt:

Scaling Out Memory









Crossword: Unit-11 Estimated Time: 10 min

Across:
3) This specifies a format and destination such as a SQL server database or text file for the log data (11)
4) This performance counter is the number of bytes of binary large object (BLOB) data that the data flow
engine has read from all sources (9)
5) The default file extension for text file provider (4)
Down:
1) This performance counter is the number of BLOB files that the dataflow engine uses for spooling (10)
2) _____________ log provider uses the file connection manager to write log entries to ASCII text file
using CSV format (8)









12.0 Managing and Securing Packages




Topics

12.1 Managing Packages

12.2 Securing Packages

12.3 Crossword
















Topic: Managing Packages Estimated Time: 40 min.

Objectives: At the end of the activity, the participant will be able to understand:

Managing SSIS Service
Managing SSIS Package

Presentation:
Managing the SSIS Service
The execution of SSIS packages can be monitored using SQL Server Management Studio.
SQL Server Management Studio includes a new SSIS Server node that lists saved and
running SSIS packages.
The SSIS management node only appears after the SSIS service is started.
The SSIS service is installed when you select the option to install SQL Server Integration
Services, and its purpose is to enable the management of SSIS packages.
The SSIS service is normally started by default when SSIS is installed on the system. In case it
isn't started, you can use the following procedure to start it manually.

To manually start the SSIS Service:
1. On the Start menu, click All Programs.
2. Select Microsoft SQL Server 2005, and then select SQL Computer Manager.
3. Scroll down to the Services and Applications section and expand the SQL Computer
Manager. Then expand the SQL Server 2005 Services node.
4. Select SSIS Server.
5. Right-click the service entry in the right pane and select Start on the shortcut menu to start the
service as shown below.

Starting the SSIS Server


If you want the SSIS service to always run, you can change the startup type to Automatic. This
will automatically start the SSIS service whenever the server starts.
It's important to understand that the SSIS service is designed to enable the monitoring of SSIS
packages; it does not need to be running in order to execute a package. Likewise, stopping
the SSIS service won't prevent you from running SSIS packages. However, if the SSIS service is
running, the SSIS Designer will be able to use it to cache objects that are used in the designer,
enhancing the performance of the designer.

Managing SSIS Packages with SQL Server Management Studio
After the SSIS service has been started, you can use it to monitor running SSIS packages in SQL
Server Management Studio.
One of the key advantages to the SSIS service is the fact that it enables you to monitor packages
running on both the local SQL Server as well as remote SQL Server systems that are registered
in the SQL Server Management Studio.
It's important to note that while SQL Server Management Studio enables you to manage
existing SSIS packages, it does not allow you to create them. Packages are created using the BI
Development Studio, the Import and Export Wizard, or programmatically using the SSIS APIs.
To manage SSIS packages using the SQL Server Management Studio:
Open the SQL Server Management Studio.
In the Connect to Server dialog box, select Integration Services from the Server Type
list.
Supply the name of the SQL Server at the Server Name prompt and provide your
authentication information. SQL Server Management Studio opens and the Object
Explorer displays the SSIS management information.

Managing SSIS packages with SQL Server Management Studio



By default, the Integration Services server node presents two folders for working with SSIS
packages: the Running Packages folder and the Stored Packages Folder.
The Running Packages folder displays the SSIS packages that are currently executing on the
local server. The contents of this folder are constantly changing to reflect the current system
activity. The contents of the Running Packages folder must be manually refreshed to keep the
display updated with the current running packages.
The Stored Packages folder lists the saved SSIS packages that have been registered on the local
server. By default this folder contains two subfolders: the File System Folder and the MSDB
folder.
o The File System folder lists the SSIS packages that have been saved in the file system,
while the MSDB folder lists the packages that are stored in the sysdtspackages90 table
in the msdb database. It's important to note that the SSIS server isn't aware of packages
stored in the file system until those packages have been imported to the File System
folder in the SSIS service. In addition to listing the saved SSIS packages, SQL Server
Management Studio also enables you to work with them. Right-clicking a package
displays a shortcut menu that enables you to perform a number of tasks, including:
New Folder. Creates a new folder in Object Explorer for displaying
packages saved in the file system or in the sysdtspackages90 table.
Import Package. Imports the package from the file system to the msdb
database
Export Package. Exports the package from the msdb database to the file
system.
Run Package. Executes the package using dtexecui.
Delete. Deletes the package.
Rename. Renames the package.
While SQL Server Management Studio ships with the default folder locations of MSDB
and File System, you can freely add new folders to this structure using the New Folder
option. When you create a new folder beneath the File System folder, a new directory is
created in the file system. By default, these directories are located in the
C:\Program Files\Microsoft SQL Server\90\DTS\Packages directory. Importing packages to a
File System folder results in the package being copied to the like-named directory in the
file system. For folders that are created under the MSDB folder, a new entry is added to
the sysdtspackagefolders90 table, which tracks the folder structure. However, it's important
to realize that the packages themselves are still stored in the msdb sysdtspackages90 table.
The folders in SQL Server Management Studio essentially give you a way to apply an
organizational structure to your packages, enabling you to group like packages together.
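The relationship between the folder table and the package table can be illustrated with a minimal Python sketch. It uses hypothetical in-memory rows rather than a live msdb connection, and the column names (folderid, parentfolderid, foldername, name) are assumptions made for illustration. It shows how a folder hierarchy kept in one table gives each package, stored in another table, a display path.

```python
# Hypothetical rows standing in for the msdb folder table.
# Column names are assumptions for illustration only.
FOLDERS = [
    {"folderid": 0, "parentfolderid": None, "foldername": "MSDB"},
    {"folderid": 1, "parentfolderid": 0, "foldername": "Finance"},
    {"folderid": 2, "parentfolderid": 1, "foldername": "Nightly"},
]

# Hypothetical rows standing in for the msdb package table.
# Each package only records the folder it belongs to.
PACKAGES = [
    {"name": "LoadLedger", "folderid": 2},
    {"name": "AdHocExport", "folderid": 0},
]


def folder_path(folder_id, folders):
    """Walk parent links from a folder up to the root and join the names."""
    by_id = {f["folderid"]: f for f in folders}
    parts = []
    current = by_id[folder_id]
    while current is not None:
        parts.append(current["foldername"])
        parent_id = current["parentfolderid"]
        current = by_id[parent_id] if parent_id is not None else None
    return "/".join(reversed(parts))


def package_paths(packages, folders):
    """Return the sorted display path of every package."""
    return sorted(
        folder_path(p["folderid"], folders) + "/" + p["name"] for p in packages
    )
```

Calling package_paths(PACKAGES, FOLDERS) yields one path per package. Moving a package to another folder would only change its folderid, not where the package data itself is stored.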
Modifying the Default SSIS Package Folders
The two default folders provided by SQL Server Management Studio, File System and
MSDB, are themselves configurable.


The definitions for these folders are stored in the XML file that the SSIS service reads at startup.
The SSIS service retrieves the location of this file from the following registry location:
HKLM\SOFTWARE\Microsoft\MSDTS\ServiceConfigFile.
To customize the SSIS startup folders you can create a new XML file that follows the required
format and then point the SSIS service to that file by updating the ServiceConfigFile registry
key. The following listing illustrates a sample of the SSIS service configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<DtsServiceConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown>
  <TopLevelFolders>
    <Folder xsi:type="FileSystemFolder">
      <Name>Fsystem__SQL2005-SSIS</Name>
      <StorePath>C:\_work\VisualStudioProjects\DTS</StorePath>
    </Folder>
    <Folder xsi:type="FileSystemFolder">
      <Name>Fsystem__SQL2005-SSIS MSN Money Projects</Name>
      <StorePath>C:\_MoneyChartingRebuild\MoneyChartingRebuild2</StorePath>
    </Folder>
    <Folder xsi:type="FileSystemFolder">
      <Name>Fsystem__SQL2005-SSISdts01 test packages</Name>
      <StorePath>\\SQL2005-SSISdts01\c$\_work\testPackages</StorePath>
    </Folder>
    <Folder xsi:type="SqlServerFolder">
      <Name>SQL__SQL2005-SSIS</Name>
      <ServerName>SQL2005-SSIS</ServerName>
    </Folder>
    <Folder xsi:type="SqlServerFolder">
      <Name>SQL__SQL2005-SSISdts01</Name>
      <ServerName>SQL2005-SSISdts01</ServerName>
    </Folder>
  </TopLevelFolders>
</DtsServiceConfiguration>
You can see the results of using this custom SSIS service configuration file below.

Customizing the SSIS service folders


One way you might want to use the SSIS service configuration ability is to create a common
management folder structure for multiple servers. To do this, you could store the service
configuration file in a central file share and point multiple servers to the shared configuration file.
This would enable all of the servers to have the same SSIS folder structure.
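Before pointing several servers at a shared configuration file, it can help to check which top-level folders the file defines. The following is a minimal Python sketch, assuming the file follows the layout of the sample listing above. The function name list_top_level_folders and the shortened SAMPLE text are hypothetical, and the sketch only reads the XML; it does not touch the registry or the SSIS service.

```python
import xml.etree.ElementTree as ET

# Namespace used by the xsi:type attribute in the sample listing.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened copy of the sample configuration (two folders only).
# A raw string keeps the backslashes in the Windows path intact.
SAMPLE = r"""<DtsServiceConfiguration
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown>
  <TopLevelFolders>
    <Folder xsi:type="FileSystemFolder">
      <Name>Fsystem__SQL2005-SSIS</Name>
      <StorePath>C:\_work\VisualStudioProjects\DTS</StorePath>
    </Folder>
    <Folder xsi:type="SqlServerFolder">
      <Name>SQL__SQL2005-SSIS</Name>
      <ServerName>SQL2005-SSIS</ServerName>
    </Folder>
  </TopLevelFolders>
</DtsServiceConfiguration>"""


def list_top_level_folders(xml_text):
    """Return (folder type, display name, location) for each top-level folder."""
    root = ET.fromstring(xml_text)
    folders = []
    for folder in root.findall("./TopLevelFolders/Folder"):
        kind = folder.get("{%s}type" % XSI)
        name = folder.findtext("Name")
        # File system folders point at a path, SQL Server folders at a server.
        if kind == "FileSystemFolder":
            location = folder.findtext("StorePath")
        else:
            location = folder.findtext("ServerName")
        folders.append((kind, name, location))
    return folders
```

Running list_top_level_folders on the text of a candidate file gives a quick view of the folder names Object Explorer would show and where each one points.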
Managing DTS 2000 Packages with SQL Server Management Studio
SQL Server Management Studio can also manage DTS packages that were created in
SQL Server 2000 and that are stored in the sysdtspackages table of the msdb database.
To manage legacy SQL Server 2000 DTS packages using SQL Server Management Studio,
open the Object Explorer using the server type of Database Engine in the Connect to Server
window. The Object Explorer will display a DTS 2000 Packages node as shown below.

Managing DTS 2000 packages with SQL Server Management Studio
The DTS 2000 Packages folder lists the SQL Server 2000 packages that are in the
sysdtspackages table in the msdb database. You manage a DTS package by right-clicking it,
which displays a shortcut menu that you can use to perform a number of tasks, including:
o Open. Starts the SQL Server 2000 DTS Designer.
o Migrate a package. Opens the Migration Wizard to migrate the DTS package to an
  SSIS package.
o Export. Exports the package from the msdb database to the file system.
o Delete. Deletes the package from the msdb database.
o Rename. Renames the package.
It is important to note that in order to use the Open option, the SQL Server 2000 DTS Designer
must be installed on the SQL Server 2005 system. The SQL Server 2000 DTS Designer will be
present if an existing SQL Server installation has been upgraded to SQL Server 2005 or if the
SQL Server 2000 Management Tool has been installed on the SQL Server 2005 system. More
information about working with legacy DTS packages is presented in the Migrating SQL Server
2000 DTS Packages section in this paper.



Scheduling Package Execution
You schedule the execution of SSIS packages by using the SQL Server Agent. The SQL Server
Agent is the built-in job scheduling tool that is provided with SQL Server 2005. Like the
SSIS service, the SQL Server Agent is implemented as a Windows service, and that service
must be running in order to support job scheduling. Also like the SSIS service, the SQL
Server Agent service is managed using SQL Server Configuration Manager, which is part of
the Computer Management MMC console.
To create a new SQL Server Agent job to schedule an SSIS package using SQL Server
Management Studio:
1. Open Object Explorer.
2. Expand the SQL Server Agent node.
3. Right-click the Jobs node, and then click New Job.
4. A SQL Server Agent job is made up of a series of job steps. To execute an SSIS
package, you add a new job step to the SQL Server Agent job. To do this, select the
Steps page and then click New to display the New Job Step dialog box, as
shown below.

Scheduling SSIS packages


When you create a job step that executes an SSIS package, the SQL Server Agent enables
you to specify the same run-time properties that you can use when the package is
executed from the SSIS Designer or by the dtexec utility. This includes supplying
configuration files, enabling checkpoints, and adding logging. If the job contains multiple
packages or successive job steps, you can set up procedures between each step that
control the execution of the job based on the completion, success, or failure of each job
step.
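The run-time properties mentioned above correspond to options on the dtexec command line. As a minimal sketch, the following Python function assembles such a command line as an argument list. The function name build_dtexec_args and the file paths are hypothetical; the options shown (/FILE, /CONFIGFILE, /CHECKPOINTING, /CHECKFILE) should be checked against the dtexec documentation for your SQL Server version before use.

```python
def build_dtexec_args(package_file, config_files=(), checkpoint_file=None):
    """Assemble a dtexec argument list for a package stored in the file system.

    The list is only built here, not executed. On a machine where dtexec is
    installed it could be passed to subprocess.run.
    """
    args = ["dtexec", "/FILE", package_file]
    # Each configuration file is supplied with its own /CONFIGFILE option.
    for config_file in config_files:
        args += ["/CONFIGFILE", config_file]
    # Checkpointing needs both the switch and the checkpoint file to use.
    if checkpoint_file is not None:
        args += ["/CHECKPOINTING", "on", "/CHECKFILE", checkpoint_file]
    return args
```

For example, build_dtexec_args(r"C:\pkgs\Load.dtsx", [r"C:\pkgs\prod.dtsConfig"]) produces the arguments for running a package with one configuration file. Keeping the arguments as a list avoids quoting problems with paths that contain spaces.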
Remote Package Execution
To run SSIS packages on remote SQL Server systems, you can use SQL Server
Management Studio to create a SQL Server Agent job on the remote server. That job
can then include a job step that calls the dtexec utility to run the SSIS
package on the remote system.
In addition, you can design packages that execute SSIS packages on remote
SQL Server systems by using the Execute SQL Server Agent Job task, which is found in the
SSIS Designer toolbox under the Maintenance Plan Tasks section. When you add the
Execute SQL Server Agent Job task to the SSIS Designer, you can set its Connection
property to point to the remote server. Then, when the task is executed, the SQL Server
Agent on the remote machine runs the job that executes the package.
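A remote SQL Server Agent job can also be started from a script by calling the msdb stored procedure sp_start_job through the sqlcmd utility. The Python sketch below only builds that command. The function name build_start_job_command and the server and job names are hypothetical, and the sketch assumes Windows authentication (the -E option) and a job that already exists on the remote server.

```python
def build_start_job_command(server, job_name):
    """Build a sqlcmd argument list that starts a SQL Server Agent job.

    The command is only assembled, not run. Single quotes in the job name
    are doubled so the name stays a valid T-SQL string literal.
    """
    escaped_name = job_name.replace("'", "''")
    query = "EXEC dbo.sp_start_job @job_name = N'%s'" % escaped_name
    # -S: target server, -E: Windows authentication,
    # -d: database to use, -Q: run the query and exit.
    return ["sqlcmd", "-S", server, "-E", "-d", "msdb", "-Q", query]
```

Note that sp_start_job returns as soon as the job has been started; it does not wait for the package to finish, so the outcome has to be checked separately in the job history.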

Context:
Managing SSIS packages
Executing packages remotely

Check list:

SSIS Service should be running while using SSIS

Common Errors:

Executing SSIS Package when SSIS Service was stopped

Lessons Learnt:

Monitoring SSIS packages on local as well as remote servers.







Topic: Securing Packages Estimated Time: 30 min.

Objectives: At the end of the activity, the participant will be able to understand:

Securing SSIS package

Presentation:
Security in SQL Server Integration Services consists of several layers that provide a rich and flexible
security environment. These security layers include the use of digital signatures, package properties,
SQL Server database roles, and operating system permissions. Most of these security features fall into
the categories of Identity and Access control.

Identity Features: Ensure that you only open and run packages from trusted sources.
To do this, you first have to identify the source of packages. You can identify the source by
signing packages with certificates. Then, when you open or run the packages, you can have
Integration Services check for the presence and the validity of the digital signatures.

Access Control Features: Ensure that only authorized users open and run packages.
To do this, you have to do the following:
Control access to the contents of packages, especially sensitive data.
Control access to packages and package configurations that are stored in SQL Server.
Control access to packages and to related files such as configurations, logs, and
checkpoint files that are stored in the file system.
Control access to the Integration Services service and to the information about packages
that the service displays in SQL Server Management Studio.

Controlling Access to the Contents of Packages
Access restriction to the contents of a package can be achieved by setting the
ProtectionLevel property to the desired level.
Integration Services automatically detects sensitive properties and handles these
properties according to the specified package protection level.

Controlling Access to Packages
Integration Services packages can be saved to the msdb database in an instance of
SQL Server, or to the file system as XML files that have the .dtsx file name
extension.



Saving Packages to the msdb Database
Saving the packages to the msdb database helps provide security at the server,
database, and table levels. In the msdb database, Integration Services packages are
stored in the sysssispackages table (named sysdtspackages90 in SQL Server 2005), whereas
SQL Server 2000 DTS packages are stored in the sysdtspackages table.
Packages stored in the msdb database can also be protected by applying
the Integration Services database-level roles. Integration Services includes three fixed
database-level roles (db_ssisadmin, db_ssisltduser, and db_ssisoperator) for
controlling access to packages.

Saving Packages to the File System
If you store packages in the file system instead of in the msdb database, make sure to secure the
package files and the folders that contain package files.

Controlling Access to Files Used by Packages

Packages that have been configured to use configurations, checkpoints, and logging generate
information that is stored outside the package. Checkpoint files can be saved only to the file
system, but configurations and logs can be saved to the file system or to tables in a SQL Server
database.

Storing Package Configurations Securely
Package configurations can be saved to a table in a SQL Server database or to the file system.
Configurations can be saved to any SQL Server database, not just the msdb database. Thus, you
are able to specify which database serves as the repository of package configurations. You can
also specify the name of the table that will contain the configurations, and Integration Services
automatically creates the table with the correct structure. Saving the configurations to a table
makes it possible to provide security at the server, database, and table levels. In addition,
configurations that are saved to SQL Server are automatically backed up when you back up the
database.
If you store configurations in the file system instead of in SQL Server, make sure to secure the
folders that contain the package configuration files.

Controlling Access to the Integration Services Service
SQL Server Management Studio uses the SQL Server service to list stored packages. To prevent
unauthorized users from viewing information about packages that are stored on local and remote
computers, and thereby learning private information, restrict access to computers that run the
SQL Server service.







Understanding the Protection Levels

The following list names the protection levels that Integration Services provides. The values in
parentheses are values from the DTSProtectionLevel enumeration. These values appear in the
Properties window that you use to configure the properties of the package when you work with
packages in Business Intelligence Development Studio.

1. Do not save sensitive (DontSaveSensitive)
2. Encrypt all with password (EncryptAllWithPassword)
3. Encrypt all with user key (EncryptAllWithUserKey)
4. Encrypt sensitive with password (EncryptSensitiveWithPassword)
5. Encrypt sensitive with user key (EncryptSensitiveWithUserKey)
6. Rely on server storage for encryption (ServerStorage)
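The protection levels listed above can also be applied from the command line with the dtutil utility, which takes the level as a number. The following Python sketch is an illustration only: the numeric values are the ones commonly documented for the DTSProtectionLevel enumeration and for dtutil, and they differ from the list numbering above, so verify them against the documentation for your version. The function names are hypothetical, and the command is built but not executed.

```python
from enum import IntEnum


class ProtectionLevel(IntEnum):
    """Protection levels with the numeric values assumed for dtutil."""
    DontSaveSensitive = 0
    EncryptSensitiveWithUserKey = 1
    EncryptSensitiveWithPassword = 2
    EncryptAllWithPassword = 3
    EncryptAllWithUserKey = 4
    ServerStorage = 5


def needs_password(level):
    """Only the two password-based levels require a password."""
    return level in (
        ProtectionLevel.EncryptSensitiveWithPassword,
        ProtectionLevel.EncryptAllWithPassword,
    )


def build_dtutil_encrypt_args(source_file, dest_file, level, password=None):
    """Build a dtutil argument list that re-saves a package file
    with a different protection level."""
    if level == ProtectionLevel.ServerStorage:
        # ServerStorage relies on msdb storage, so it does not apply to files.
        raise ValueError("ServerStorage is not valid for file system packages")
    if needs_password(level) and not password:
        raise ValueError("this protection level requires a password")
    target = "FILE;%s;%d" % (dest_file, int(level))
    if needs_password(level):
        target += ";" + password
    return ["dtutil", "/FILE", source_file, "/ENCRYPT", target]
```

Separating needs_password from the command builder keeps the rule about which levels require a password in one place, so a missing password is caught before the utility is ever called.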


Using Integration Services Roles

SQL Server Integration Services includes three fixed database-level roles, db_ssisadmin,
db_ssisltduser, and db_ssisoperator, for controlling access to packages. Roles can be
implemented only on packages that are saved to the msdb database in SQL Server. The
read and write actions in Integration Services are assigned to the following Windows and
fixed database-level roles:

db_ssisadmin or sysadmin
db_ssisltduser
db_ssisoperator

If user-defined roles are not assigned to packages, access to packages is determined by the
fixed database-level roles.
New database roles can be created in the msdb database by using SQL Server Management
Studio and then assigned to packages.
The Integration Services database-level roles grant rights on the Integration Services system
tables in the msdb database, but not on the DTS system tables, such as sysdtspackages in the
msdb database.

Steps for Assigning Roles to Package
Step 1: Open Object Explorer by using SQL Server Management Studio and connect to
Integration Services. The Integration Services service must be started before
you can connect to Integration Services.
Step 2: Assign Reader and Writer Roles to Packages. You can assign a reader and a writer
role to each package. In Object Explorer, locate the Integration Services
connection.


1. Expand the Stored Packages folder, and then expand the subfolder that
contains the package to which you want to assign roles.
2. Right-click the package to which you want to assign roles, and then click Package Roles.
3. In the Package Roles dialog box, select a reader role in the Reader
Role list and a writer role in the Writer Role list.
4. Click OK.
Step 3: Create a User-Defined Role. SQL Server (the MSSQLSERVER service) must be
started before you can connect to the Database Engine and access the msdb
database. To create a user-defined role:
1. Open SQL Server Management Studio.
2. Click Object Explorer on the View menu.
3. On the Object Explorer toolbar, click Connect, and then click Database
Engine.
4. In the Connect to Server dialog box, provide a server name and select
an authentication mode. You can use a period (.), (local), or localhost to
indicate the local server.
5. Click Connect.
6. Expand Databases, System Databases, msdb, Security, and Roles.
7. In the Roles node, right-click Database Roles, and click New Database
Role.
8. On the General page, provide a name and optionally, specify an owner
and owned schemas and add role members.
9. Optionally, click Permissions and configure object permissions.
10. Optionally, click Extended Properties and configure any extended
properties.
11. Click OK.
Context:
To open or run packages only from trusted sources and by authorized users.
To set reader and writer roles on a package.
To enable passwords on a package for security.

Check list:

Ensure that you only open and run packages from trusted sources.
Ensure that only authorized users open and run packages.
Control Access to the Contents of Packages
Save Packages to the msdb Database


Save Packages to the File System
Control Access to Files Used by Packages
Store Package Configurations Securely
Control Access to the Integration Services Service

Common Errors:

Differentiate users by database role and assign roles accordingly.
The fixed roles are:
o db_ssisadmin or sysadmin
o db_ssisltduser
o db_ssisoperator

Lessons Learnt:

The security features of SSIS, that is, identity and access control for packages.














Crossword: Unit-12 Estimated Time: 10 min


Across:
2) _____________ feature ensures that only authorized users open and run packages.
4) ___________ features ensure that you only open and run packages from trusted sources.
Down:
1) New database roles in SQL Server Management Studio can be created in the ______ database.
3) SSIS includes ___________ fixed database-level roles.










Answers for Crosswords
Unit-1
Across 1) DTS 3) ExtractTransformationLoad 4) ETL
Down 1) DataFlowEngine 2) BCP

Unit-2
Across 3) FilesystemMSDB 6) Annotation 9) Three 10) MSDB 11) variable
Down 1) versioning 2) XML 4) datastores 5) packages 7) object 8) controlflow

Unit-3
Across 1) Solution Explorer 3) dtutil 6) o/p
Down 2) Immediate 4) 4 5) dtexec

Unit-4
Across 1) Transfer database 3) sequence 4) bulk insert 5) taskhost 6) dataflow
Down 1) Task 2) Execute SQL

Unit-5
Across 1) Package Installation 2) Import Export 3) Transformation
Down 1) Package Migration

Unit-6
Across 1) ExportColumn 4) RowSampling 5) DataReader 6) Aggregate 8) UnionAll
Down 2) Charactermap 3) Audit 7) fuzzyLookup

Unit-7
Across 1) OnExecStatusChange 2) OnWarning 3) Logging
Down 1) OnPostValidate

Unit-8
Across 2) Output
Down 1) Callstack

Unit-9
Across 1) Integrationservices 2) transaction 4) checkpoint 5) two
Down 2) savecheckpoint






Unit-10
Across 2) Single
Down 1) Runtime

Unit-11
Across 3) LogProvider 4) BytesRead 5) .Log
Down 1) FileInUse 2) TextFile

Unit-12
Across 2) AccessControl 4) Identity
Down 1) MSDB 3) Three