
SSIS Tutorial Transcript for bbv Techday 2006

September, 2006 by Urs Gehrig

SQL Server Integration Services

Goal and Scenario


This tutorial shows what SSIS is for: it is Microsoft's new ETL (Extract, Transform and Load) tool. SSIS is part of SQL Server 2005 and the successor to SQL Server 2000's DTS. This tutorial is not a summary of best practices; it is a demonstration of the tool's power and should give you some ideas for your next project. During the tutorial you will see an example of how to load address data from a legacy system while adding some missing data and auditing information. The SSIS package does not overwrite existing data. Instead it writes a history of all changes, so you always know who made which change at what time. What you will learn here is useful, but it is only the tip of the iceberg. Therefore, at the end of this tutorial you will find some ideas for your own exercises as well as web links and books about SSIS. Have fun!

Prerequisite
Reading this tutorial is one thing; programming it on your own machine is another. You don't need much to follow along, only:
- SQL Server 2005 SP1.
- Access to bbv's FTP server. The FTP server plays the role of the legacy system.
- An Internet connection. You will query geonames.org.
- Last but not least, this transcript. You can get a copy from wiki.bbv.ch. There you will also find a BIDS project containing two packages: one is the result of this tutorial, the other sets up the DB.

Some Words about SQL Server Integration Services


"It took a unique team to build SQL Server 2005 Integration Services. If I had told you in 2000 when we started the Integration Services project that we would assemble a team of almost 30 people at Microsoft who were utterly passionate about ETL, you would have been skeptical."
Bill Baker, General Manager for SQL Server BI at Microsoft

"All are correct because Integration Services is a set of utilities, applications, designers, components, and services all wrapped up into one powerful software application suite. SQL Server Integration Services (SSIS) is many things to many people."
Kirk Haselden, Development Manager of the SSIS team at Microsoft

Step by Step Tutorial

1 Create a new SSIS project and solution.

2 Add user variables to hold the working directory etc. The working directory will contain all of our input files. NOTE: Variables are case sensitive.

1. From the SSIS menu choose Variables.
2. Click the Add new Variable button.
3. Name it WorkingDir.
4. Scope should be the topmost.
5. Data Type is String.
6. For Value enter D:\Work\bbv TechDay 2006\SSIS Tutorial\Work (no quotes).
7. Do the same for the variable InputFiles of type String and Value AD*.tab. The Variables window now lists both variables. [Screenshot not reproduced.]

3 Add and configure an FTP task. We will download the address files from there.

1. Make the Control Flow canvas (vs. Data Flow) active.
2. From the toolbox, add an FTP task and open it.
3. On the General page set Name to Load Input Files from Legacy.
4. For FtpConnection select <New Connection>.
5. For Server name type ftp.bbv.ch, User name is bbvftp and Password is ??? (for security reasons the password is not shown here).
6. Click OK to close the FTP Connection Manager Editor.
7. On the File Transfer page set IsLocalPathVariable to True, LocalVariable to User::WorkingDir, OverwriteFileAtDest to True and Operation to Receive Files.
8. On the Expressions page enter the expression "/System/SSIS_Tutorial/" + @[User::InputFiles] for the property RemotePath.
9. Click OK to close the FTP Task Editor.

4 Add and configure a Foreach Loop container. We will loop over the address files. The container returns the full file name in a mapped variable.

1. Add a Foreach Loop container to the control flow, connect it and open it.
2. On the General page set Name to Process Input Files.
3. On the Collection page use the default enumerator type of Foreach File Enumerator. It will loop once for each file in the specified folder.
4. Enter the expression @[User::WorkingDir] for the property Directory and @[User::InputFiles] for FileSpec.
5. Leave the default Fully Qualified for Retrieve File name.
6. Change to the Variable Mappings page.
7. In the Variable column, drop the list down and choose to add a new variable that is scoped to the topmost container, which is the package itself.
8. Name the variable InputFile and set Value Type to String.
9. Click OK twice to close the Add Variable dialog and the Foreach Loop Editor.
10. After you hit Enter a warning icon may appear on the transform. If you hover your mouse over the transform, a tool tip will mention an empty path. [Screenshot not reproduced.]
11. We need to delay the validation until runtime. Set the property DelayValidation to True.
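
The RemotePath expression from step 3 and the file enumeration from step 4 can be illustrated outside SSIS. The following is a minimal Python sketch, not something SSIS itself runs. It assumes the two user variables are plain strings; the function names and the use of `fnmatch` for the file spec are illustrative choices, not part of the tutorial.

```python
import fnmatch
import os

# Stand-ins for the SSIS user variables (values taken from step 2).
WORKING_DIR = r"D:\Work\bbv TechDay 2006\SSIS Tutorial\Work"
INPUT_FILES = "AD*.tab"


def remote_path(input_files, base="/System/SSIS_Tutorial/"):
    """Mimic the RemotePath expression: base folder + file spec."""
    return base + input_files


def enumerate_input_files(working_dir, file_spec):
    """Yield fully qualified names of the files in working_dir that match
    file_spec, roughly what the Foreach File Enumerator hands to the
    mapped variable on each iteration."""
    for name in sorted(os.listdir(working_dir)):
        full_name = os.path.join(working_dir, name)
        if os.path.isfile(full_name) and fnmatch.fnmatch(name, file_spec):
            yield full_name
```

Each value yielded by `enumerate_input_files` plays the role of the InputFile variable for one pass through the loop.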

5 Add a new user variable to hold row counts. Later, inside the Data Flow task, we will map the Row Count transform to this variable, in effect storing the row count of the data flow path.

1. From the SSIS menu choose Variables. You should see InputFile already there.
2. Click the Add new Variable button.
3. Name should be LineCount.
4. Scope should be the topmost.
5. Data Type is Int32.
6. Value can be left at 0.

6 Add a Data Flow task. The task is processed once per iteration of the loop, so in our case it is executed once for each file in the folder.

1. From the Toolbox add a Data Flow Task to the inside of the loop container and open it.
2. Rename it to Read Input File.
3. The data flow looks like: [Screenshot not reproduced.]

7 Add a Flat File Source and Connection Manager. We define a single, specific file in this step; it could be any of our existing files. After this step we will define a property expression on the new connection manager to load a different file per iteration of the loop.

1. Open the Data Flow Task.
2. From the Toolbox add a Flat File Source to the Data Flow Task and open the Flat File Source.
3. Click New to create a new Flat File Connection Manager.
4. For Connection Manager Name enter Load Address Data.
5. For the File Name point to our first file D:\Work\bbv TechDay 2006\SSIS Tutorial\Work\AD20060920.tab (no quotes).
6. Select Tab {t} as Column delimiter.
7. In Preview rows 1 - 100 you can see three dummy rows and some strange characters.
8. After you select 10000 (MAC Roman) for Code page and 3 for Header rows to skip, everything is OK.
9. Rename the columns as follows: ID, Phone, Company, Title, Surname, Name, Address, Country, Zip and City.
10. Click OK twice to close the Connection Manager dialog and the Flat File Source editor.

8 Modify the connection string to change dynamically with each loop iteration. Remember the variable InputFile we created earlier. We need it to feed our connection string per iteration of the loop via a property expression.

1. In the Connection Managers window select (but do not open) the Load Address Data connection manager. We want to view its properties in the property sheet, not the editor window.
2. In the property pane click in the empty row for Expressions and then click the ellipsis button.
3. Choose the ConnectionString property and click the ellipsis button for the Expression column to go into the expression builder.
4. Expand the Variables folder and drag the InputFile variable down to the expression.
5. Click OK twice to close the Expression Builder and the Property Expressions Editor.
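
To make the flat file settings from step 7 concrete, here is a hedged Python sketch of what they amount to: a tab-delimited file, decoded as Mac Roman, with the first three rows skipped and the ten column names assigned by position. The function name is hypothetical and the sketch only approximates the Flat File Source; it is not how SSIS reads the file internally.

```python
import csv

# Column names assigned in step 7.
COLUMNS = ["ID", "Phone", "Company", "Title", "Surname",
           "Name", "Address", "Country", "Zip", "City"]


def read_address_file(path, header_rows_to_skip=3):
    """Read one tab-delimited address file encoded as Mac Roman."""
    rows = []
    # newline="" lets the csv module handle the line endings itself.
    with open(path, encoding="mac_roman", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, fields in enumerate(reader):
            if index < header_rows_to_skip:
                continue  # the dummy rows at the top of the file
            rows.append(dict(zip(COLUMNS, fields)))
    return rows
```

Calling `read_address_file` once per file name delivered by the loop corresponds to the property expression on ConnectionString in step 8: the same reader definition is reused, only the path changes.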

9 Add a column to hold the name of the file processed. This adds a new column, containing the name of the file we are processing, to each data row of our data flow. Nice for auditing.

1. With the Flat File Source selected, view the Properties window.
2. Set the FilenameColumnName property to SourceFilename.
3. After you hit Enter a warning icon may appear on the transform. If you hover your mouse over the transform, a tool tip will mention a metadata error.
4. We need to refresh the metadata. Right click the source and choose Advanced Editor.
5. Select the Refresh button at the bottom and click OK to close the source.

10 Add a Row Count Transform. It captures the number of rows processed in a variable. An anomaly of the Row Count Transform is that you have to type in the variable name manually; it will not let you pick from a list. Remember that variable names are case sensitive.

1. Add a Row Count Transform to the data flow, connect it and open it.
2. Enter LineCount in the VariableName property (the variable we created earlier).
3. Click OK to close the Row Count Transform.

11 Add an Audit Transform. It adds useful metadata to our data stream for capturing in log data, such as package name and start time.

1. Add an Audit Transform to the data flow, connect it and open it.
2. In the first blank row, click the Audit Type column and select Execution Start Time.
3. Note that the name is filled in automatically for you.
4. Now keep adding the audit types Machine Name and User Name. [Screenshot not reproduced.]
5. Click OK to close the Audit Transform.
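
The effect of steps 9 to 11 on the rows can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rows are plain dictionaries, the function name is made up, and the audit values are passed in or read from the local machine rather than from the SSIS runtime.

```python
import getpass
import socket
from datetime import datetime


def add_audit_columns(rows, source_filename, start_time=None,
                      machine_name=None, user_name=None):
    """Return (rows with audit columns, row count).

    Mirrors the SourceFilename column, the Audit Transform columns and
    the Row Count Transform; the input rows are left unchanged."""
    start_time = start_time or datetime.now()
    machine_name = machine_name or socket.gethostname()
    user_name = user_name or getpass.getuser()
    audited = []
    for row in rows:
        enriched = dict(row)  # copy so the caller's row is not modified
        enriched["SourceFilename"] = source_filename
        enriched["Execution Start Time"] = start_time
        enriched["Machine Name"] = machine_name
        enriched["User Name"] = user_name
        audited.append(enriched)
    return audited, len(audited)
```

The second return value corresponds to what the Row Count Transform stores in the LineCount variable.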


12 Split off all addresses without any country info. In the next step we will try to guess the missing country code.

1. Add a Conditional Split Transform to the data flow, connect it and open it.
2. Enter Without Country for Output Name and TRIM(Country) == "" for Condition.
3. Enter With Country for Default output name.
4. Click OK to close the Conditional Split Transform.

13 Try to find out the missing country codes. For this we make a call to the net, e.g. http://ws.geonames.org/postalCodeSearch?postalcode=9011&placename=Irnsum&maxRows=1. So we need an HTTP Connection Manager. The answer looks like:

    <?xml version="1.0" encoding="UTF-8" ?>
    <geonames>
      <totalResultsCount>1</totalResultsCount>
      <code>
        <postalcode>9011</postalcode>
        <name>Irnsum (Jirnsum)</name>
        <countryCode>NL</countryCode>
        <lat>53.09166665</lat>
        <lng>5.75</lng>
      </code>
    </geonames>

1. Right click into the Connection Managers window and select New Connection...
2. Select Type HTTP and press Add.
3. Point the Server URL to http://ws.geonames.org/postalCodeSearch?postalcode={Zip}&placename={City}&maxRows=1 and verify it by pressing Test Connection.
4. Press OK to close the HTTP Connection Manager Editor.
5. Rename it to geonames.org.
6. Add a Script Component to the data flow, connect it to the Without Country path and open it.
7. Rename it to Add Country.
8. From Available Input Columns select Country, Zip and City.
9. For Country change Usage Type to ReadWrite.
10. Go to the tab Connection Managers and add the Connection Manager geonames.org. Rename it to Geonames.
11. Go to the tab Script and press Design Script.
12. In the Project Explorer right click on the node References and select Add Reference.
13. Select the component System.XML and press Add. Do the same for Microsoft.SQLServer.ManagedDTS.
14. Press OK to close Add Reference.
15. Add the following two import lines at the very beginning of the source code: Imports System.Text and Imports Microsoft.SqlServer.Dts.Runtime
16. Type in the whole code as shown in Appendix A.
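
The logic of steps 12 and 13 can be sketched in Python as follows. This is a hedged illustration, not the script from Appendix A: the function names are invented, rows are assumed to be dictionaries, and the sketch deliberately stops short of making the HTTP call so that it can be tried against the sample answer shown above.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Server URL from the HTTP Connection Manager, with its two placeholders.
URL_TEMPLATE = ("http://ws.geonames.org/postalCodeSearch"
                "?postalcode={Zip}&placename={City}&maxRows=1")


def split_by_country(rows):
    """Conditional split: rows whose Country is blank go to the second list.

    Note: str.strip() removes all whitespace, which is slightly broader
    than the TRIM() used in the SSIS condition."""
    with_country, without_country = [], []
    for row in rows:
        if row["Country"].strip() == "":
            without_country.append(row)
        else:
            with_country.append(row)
    return with_country, without_country


def build_lookup_url(zip_code, city):
    """Fill the {Zip} and {City} placeholders with URL-escaped values."""
    url = URL_TEMPLATE.replace("{Zip}", urllib.parse.quote(zip_code))
    return url.replace("{City}", urllib.parse.quote(city))


def parse_country_code(xml_data):
    """Return the countryCode of the first hit, or '<unknown>' if none."""
    root = ET.fromstring(xml_data)
    node = root.find("code/countryCode")
    return "<unknown>" if node is None else node.text
```

For the sample answer above, `parse_country_code` yields NL; an answer without a `code` element yields the same `<unknown>` marker that the script component writes.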

14 Merge both data paths together.

1. Add a Union All component to the data flow and connect it to the With Country path.
2. Add a path from the Script Component Add Country to the Union All component.

15 Add two Data Viewers to see the effect of Add Country.

1. Right click the path Without Country and select the menu Data Viewers.
2. Select the tab Data Viewers and click the button Add.
3. Name it Without Country: before adding and for Type select Grid.
4. Under the tab Grid remove all columns from Displayed Columns except ID, Name, Address, Country, Zip and City.
5. Click OK twice to close the Configure Data Viewer dialog and the Data Flow Path Editor.
6. Do the same for the path connecting Add Country and Union All but name it Without Country: after adding.
7. Note the two little icons indicating that a Data Viewer is on the path. [Screenshot not reproduced.]
8. Run the package. The two Data Viewers appear but only one shows data. A Data Viewer always pauses the package processing.
9. To continue press the green arrow button at the top of the Data Viewer window.
10. The package pauses processing again and the second Data Viewer now shows data too. Have a look at the Country column, where you will now see the country code.
11. Continue the package processing again and stop the debugger.

16 Have a look at the distribution of the countries. Add an additional Data Viewer, but this time of type Column Chart.

1. Add a Data Viewer to the path from Conditional Split to Union All.
2. Name it Distribution of Country and make it of Type Column Chart.
3. On the tab Chart Column select Country as Visualized column.
4. Click OK twice to close the Configure Data Viewer dialog and the Data Flow Path Editor.
5. Run the package again and have a look at the chart.

6. Stop the debugger.

17 Cleanse the columns. Have a look at one of the two Data Viewers of type Grid and you will see that some string values have leading white space. We have to remove all of it.

1. Add a Derived Column to the data flow, connect it, open it and name it Trim all values.
2. For Derived Column select Replace Phone and set Expression to TRIM(Phone).
3. Do the same for the Derived Columns Replace Company through Replace City.

4. Click OK to close the Derived Column Transformation Editor.

18 Prepare the columns for saving to the database. In our DB we save all strings as Unicode, but the data from the files is encoded as Macintosh (Roman). So we have to convert the strings.

1. Add a Data Conversion to the data flow, connect it, open it and name it Macintosh -> Unicode.
2. Convert the Input Column Phone to Unicode string and name it nPhone.
3. Do the same for Company, Title, Surname, Name, Address, Country, Zip and City.
4. Convert Execution Start Time to database timestamp and name it nExecution Start Time.

5. Click OK to close the Data Conversion Transformation Editor.

19 It's time to save the data to the DB. Because we will maintain a history of all changes, we add an SCD (Slowly Changing Dimension) to the data flow.

1. Add a Slowly Changing Dimension to the data flow, connect it, open it and name it Save Addresses.
2. For Table or view select dbo.Addresses.
3. For Input Columns select the corresponding column, like nAddress for Address.
4. Don't assign an Input Column for ValidFrom and ValidTo.
5. ID is the only one of key type Business key. Click Next >.
6. The dimension columns Address to Zip are of change type Historical attribute. Click Next >.
7. Use start and end dates to identify current and expired records: ValidFrom and ValidTo, and select System::StartTime for the value to set. Click Next >.
8. Deselect Enable inferred member support. Click Next > and Finish to close the wizard.
9. The wizard adds several data flow controls to the path. [Screenshot not reproduced.]
10. Let's have a look at the SQL statement of the OLE DB Command task. [Screenshot not reproduced.]
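
The history-keeping behaviour configured in step 19 can be sketched outside SSIS. The following Python is not the SQL generated by the wizard; it is a minimal in-memory illustration of the "historical attribute" idea, assuming that the current version of an address is the row whose ValidTo is empty (None here) and that every tracked column is a historical attribute. The function name and the in-memory table are hypothetical.

```python
from datetime import datetime


def apply_history(table, incoming, tracked, now):
    """Append new versions of changed rows instead of overwriting them.

    table:    list of dict rows with ID, tracked columns, ValidFrom, ValidTo
    incoming: list of dict rows with ID and the tracked columns
    tracked:  names of the columns treated as historical attributes
    now:      timestamp written to ValidFrom / ValidTo
    """
    current = {row["ID"]: row for row in table if row["ValidTo"] is None}
    for new in incoming:
        old = current.get(new["ID"])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # unchanged: nothing to write
        if old is not None:
            old["ValidTo"] = now  # expire the previous version
        version = {"ID": new["ID"], "ValidFrom": now, "ValidTo": None}
        version.update({c: new[c] for c in tracked})
        table.append(version)
        current[new["ID"]] = version
    return table
```

Loading the same address twice leaves one row; loading it with a changed Address leaves two rows, the older one closed with a ValidTo timestamp, which is the history the tutorial is after.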

20 Last but not least, delete the processed input file.

1. Make the Control Flow canvas (vs. Data Flow) active.
2. Add a File System Task to the Success task flow, connect it and open it.
3. Name it Delete Input File.
4. Select Delete file as Operation.
5. Set IsSourcePathVariable to True and SourceVariable to User::InputFile.
6. Click OK to close the File System Task Editor.
7. The control flow looks like the following: [Screenshot not reproduced.]
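
As a rough picture of the finished control flow, the loop body can be read as "process the file, then delete it only if processing succeeded". The Python below is a small sketch of that ordering; the function name and the `process` callback are placeholders for the Data Flow task, not SSIS objects.

```python
import os


def process_and_delete(paths, process):
    """Call process(path) for each file and delete the file afterwards.

    If process raises, the exception propagates, the loop stops and the
    failing file is kept, similar to a task connected on Success only."""
    deleted = []
    for path in paths:
        process(path)
        os.remove(path)
        deleted.append(path)
    return deleted
```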

21 Have a look at the running package. [Screenshot not reproduced.]

22 Save environment-specific parameters to a configuration file. If the path to the configuration file were embedded in the package, the whole package would no longer be independent of the environment. Therefore we will use an indirect configuration, i.e. the path to the configuration file is placed in an environment variable.

1. From the Windows Start menu select Run and start the program sysdm.cpl.
2. Go to the Advanced page and click Environment Variables.
3. Under System variables click New.
4. Define the variable SSISTutorial_Configuration and set its value to D:\Work\bbv TechDay 2006\SSIS Tutorial\SSISTutorial_Configuration.dtsconfig.
5. Click OK three times to close the System Properties dialog.
6. Go back to BIDS, then close and reopen it to pick up the new environment variable. Open the project and go to the package.
7. Select from the menu SSIS / Package Configurations.
8. Select Enable package configurations.
9. Click Add and Next.
10. As Configuration type select XML configuration file.
11. Select Configuration location is stored in an environment variable.
12. Select the environment variable SSISTutorial_Configuration.
13. Click Next >.
14. Name it Configuration and click Finish.
15. Click Add and Next again.
16. This time select XML configuration file and Specify configuration settings directly.
17. Name the configuration file D:\Work\bbv TechDay 2006\SSIS Tutorial\SSISTutorial_Configuration.dtsconfig and click Next >.
18. Select the following objects:
    - Executables / Load Input Files from Legacy / Properties / RemotePath
    - Connection Managers / FTP Connection Manager / Properties / ServerName
    - Connection Managers / FTP Connection Manager / Properties / ServerPassword
    - Connection Managers / FTP Connection Manager / Properties / ServerUserName
    - Connection Managers / geonames.org / Properties / ConnectionString
    - Connection Managers / SSIS_Tutorial / Properties / ConnectionString
    - Connection Managers / SSIS_Tutorial / Properties / InitialCatalog
    - Connection Managers / SSIS_Tutorial / Properties / ServerName
    - Variables / InputFiles / Properties / Value
    - Variables / WorkingDir / Properties / Value
19. Click Next >.
20. Name it XML File and click Finish.
21. Click Close to close the Package Configurations Organizer.

23 Inspect the configuration file.

1. In the Solution Explorer right click on the project node and select Add / Existing Item.
2. Select the file D:\Work\bbv TechDay 2006\SSIS Tutorial\SSISTutorial_Configuration.dtsconfig and click Add.
3. Open the file and press Ctrl+K, Ctrl+D to format the configuration file.
4. Go to the line for the FTP server password. For security reasons the password was not exported. Type it in again.
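
The idea behind the indirect configuration in step 22 can be shown in a short Python sketch: the environment variable holds only the location of the file, and the file holds the values. The sketch assumes the configuration file consists of `Configuration` elements that carry a `Path` attribute and a `ConfiguredValue` child; check this against the file you inspect in step 23. The function name is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

ENV_VAR = "SSISTutorial_Configuration"


def load_configuration(env=os.environ):
    """Resolve the configuration file via the environment variable and
    return a dict mapping each property path to its configured value."""
    config_path = env[ENV_VAR]  # indirect step: the variable names the file
    root = ET.parse(config_path).getroot()
    settings = {}
    for node in root.iter("Configuration"):
        settings[node.get("Path")] = node.findtext("ConfiguredValue", default="")
    return settings
```

Moving the package to another machine then only requires setting the environment variable there and supplying a configuration file with that machine's values.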

Are you ready for some more interesting stuff?


The following are some ideas for your own exercises:
- Error Flow: Many data flow controls can redirect erroneous data to a different path. Try to redirect addresses with truncated columns to an error file for later investigation.
- Package logging: Enable logging for the package and play around with the many logging options and log providers (see BIDS menu SSIS / Logging).
- Transactions: Start looking in the BOL; search for transaction [Integration Services].
- Checkpoints: You can configure Integration Services packages to restart from a point of failure, instead of rerunning the entire package, by setting the properties that apply to checkpoints. Insert a checkpoint after the task Load Input Files from Legacy. Start looking in the BOL; search for checkpoints [Integration Services].

Are you looking for more information?


Try these books, web sites etc.:
- Microsoft's SSIS product site: http://www.microsoft.com/sql/technologies/integration/default.mspx
- Project REAL: Business Intelligence ETL Design Practices: http://www.microsoft.com/technet/prodtechnol/sql/2005/realetldp.mspx
- The ultimate SSIS book (in my opinion), from the Development Manager on the Integration Services team (you will find it in the bbv library): HASELDEN, Kirk: Microsoft SQL Server 2005 Integration Services. Sams Publishing 2006.

Appendix A Code for Script Component Add Country


    ' Microsoft SQL Server Integration Services user script component
    ' This is your new script component in Microsoft Visual Basic .NET
    ' ScriptMain is the entrypoint class for script components

    Imports System
    Imports System.Data
    Imports System.Math
    Imports System.Text
    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
    Imports Microsoft.SqlServer.Dts.Runtime

    Public Class ScriptMain
        Inherits UserComponent

    #Region "Private declarations..."
        Private httpConnection As Microsoft.SqlServer.Dts.Runtime.HttpClientConnection
    #End Region

        Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
            Dim xmlResult As Xml.XmlDocument = New Xml.XmlDocument()
            Dim node As Xml.XmlNode
            Dim URL As String

            'change URL according to the actual row parameters and load page
            'Remark: Because you can change DTS variables only in 'PreExecute' and 'PostExecute'
            '        we can't use an expression for the 'geonames.org' connection string.
            '        So we do it this way.
            URL = Me.Connections.Geonames.ConnectionString.Replace("{Zip}", Uri.EscapeUriString(Row.Zip))
            URL = URL.Replace("{City}", Uri.EscapeUriString(Row.City))
            httpConnection.ServerURL = URL
            xmlResult.LoadXml(Encoding.Default.GetString(httpConnection.DownloadData()))

            'xmlResult example:
            '<?xml version="1.0" encoding="UTF-8" ?>
            '<geonames>
            '  <totalResultsCount>1</totalResultsCount>
            '  <code>
            '    <postalcode>9011</postalcode>
            '    <name>Irnsum (Jirnsum)</name>
            '    <countryCode>NL</countryCode>
            '    <lat>53.09166665</lat>
            '    <lng>5.75</lng>
            '  </code>
            '</geonames>

            'parse result and set country
            node = xmlResult.DocumentElement.SelectSingleNode("/geonames/code/countryCode")
            'IIf() evaluates both branches, so node.InnerText would fail when
            'node is Nothing; a plain If/Else avoids that.
            If node Is Nothing Then
                Row.Country = "<unknown>"
            Else
                Row.Country = node.InnerText
            End If
        End Sub

        Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
            MyBase.AcquireConnections(Transaction)
            httpConnection = New HttpClientConnection(Me.Connections.Geonames.AcquireConnection(Nothing))
        End Sub

        Public Overrides Sub ReleaseConnections()
            MyBase.ReleaseConnections()
            httpConnection = Nothing
        End Sub
    End Class

Appendix B Addresses Table Definition
