
Getting started with SSIS - Part 1: Introduction to SSIS

Feb 2 2012 12:00AM by Sudeep Raj

SSIS is a tool used for ETL. Before we talk about SSIS, let me walk you through what ETL is. ETL stands for Extract, Transform and Load. These are simple words we use in our daily lives. The figure below depicts ETL in a real-world scenario.

E - Extract Data: Extract data from various homogeneous or heterogeneous source systems. Data could be stored in any of the following forms, though not limited to them: flat file, database, XML, web queries, etc. When we can have sources of such variety, the job of extraction is to fetch the data from these sources and make it available for the next step.

T - Transform Data: As already discussed, the data comes from various sources, and we cannot assume that it is structured the same way across all of them. Therefore, we need to transform the data to a common format so that further transformations can be done on it. Once we have the data, we need to perform various activities like:

- Data cleansing
- Mandatory checks
- Data type checks
- Checks for foreign key constraints
- Checking for and applying business rules
- Creation of surrogate keys
- Sorting the data
- Aggregating the data
- Transposing the data
- Trimming the data to remove blanks

The list can go on, as business requirements get more complex day by day and hence the transformations get more complex. While the transformations are running, we need to log the anomalies in the data for reporting, so that corrective action can be taken.
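Although SSIS performs these transformations through drag-and-drop components rather than hand-written code, the ideas are easy to sketch. Below is a minimal, purely illustrative Python sketch (the record layout and the rules are hypothetical, not SSIS code) showing trimming, a mandatory check, a data type check, anomaly logging and sorting:

```python
# Conceptual sketch of typical ETL transformations (not SSIS code).
# The (Name, Age, Sex) layout is a hypothetical example.

def transform(records):
    clean, anomalies = [], []
    for rec in records:
        # Trim the data to remove blanks
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Mandatory check: Name must be present
        if not rec.get("Name"):
            anomalies.append((rec, "missing Name"))
            continue
        # Data type check: Age must be an integer
        try:
            rec["Age"] = int(rec["Age"])
        except (TypeError, ValueError):
            anomalies.append((rec, "bad Age"))
            continue
        clean.append(rec)
    # Sort the data (by Name) before loading
    return sorted(clean, key=lambda r: r["Name"]), anomalies

rows = [{"Name": " Sam ", "Age": "28", "Sex": "M"},
        {"Name": "Rita", "Age": "x", "Sex": "F"}]
good, bad = transform(rows)
```

In SSIS, each of those commented steps would typically be a separate transform (Derived Column, Conditional Split, Sort, etc.) rather than lines of code.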

L - Load Data: Once the transformations are done and the data takes the form the requirement demands, we have to load the data to the destination systems. The destinations can be as varied as the sources. Once the data reaches the destination, it is consumed by other systems, which either store it as historical data, generate reports out of it, or build models to take business decisions.

SSIS stands for SQL Server Integration Services. Microsoft introduced the Business Intelligence Suite, which includes SSIS, SSAS (SQL Server Analysis Services) and SSRS (SQL Server Reporting Services). Now what's this Business Intelligence (BI)? Let me take some time to explain that. As the name suggests, it helps businesses run across the globe. It provides the business with data, and with ways to look into that data and make decisions to improve the business.

So, how do the 3 products work in the BI world, or how are they organized? To start any business analysis we need data, and as I explained earlier, ETL is used to get the data from varied sources and put it into tables or create cubes for a data warehouse. To do this, we make use of SSIS. Once we have the data with us, SSAS comes into the picture to organize the data and store it in cubes. Next, we need to report on the data so that it makes sense to the end user. This is where SSRS comes into the picture for report generation. The order of SSIS and SSAS could change, as either can come first. Having said this, SSIS forms the backbone of this entire domain, as all the data is assembled using SSIS.

Now we dive into what exactly SSIS is. Is it another coding language? The answer is NO. SSIS is not another coding language. In fact, very little coding is required in SSIS, and that too in very few cases. SSIS is a tool used for ETL. It is based on a paradigm shift where we focus more on the business requirement and less on the actual coding needed to achieve the goals.
SSIS is a visual tool with a drag-and-drop feature, which enables us to create SSIS packages to perform ETL in a very short amount of time. The development time is greatly reduced compared to legacy systems where each aspect of ETL had to be coded and then, of course, tested. Once the package is created, the visual layout is good enough to give us an idea of what the ETL is doing. SSIS provides us many tasks and transforms which help us build our ETL packages. A few examples would be the Execute SQL Task, which enables us to execute any T-SQL script from SSIS, and the Send Mail Task, which enables us to send mail to any recipient. Likewise, there are lots of tasks and transforms available in SSIS to take care of most of the ETL scenarios that you can think of. Note: I will explain what Tasks and Transforms are in later chapters.

However, what if I have a scenario that cannot be accommodated by the existing tasks and transforms? SSIS has an answer to that too. In that case, you can use the entire .NET library and code in either C# or VB to achieve your requirement. Such scenarios would be very few, and even then the code will be of a very basic nature, where you need to know only the basics of any programming language to get going. Let's say that you are not satisfied with the logging provided by SSIS; you can always go ahead and write code to log the errors or activities the way you choose. Taking this a step further, if you want this new type of logging in many packages, you could convert your code into a custom component and add it to all your SSIS packages. It will then work like any other task or transform for you. Is that not great!!! Note: C# is available only in SSIS 2008, not 2005. Let us peep into a simple SSIS package and get to know how SSIS looks:

This is how an SSIS package looks and feels. I will get into the details of what you see above in the coming chapters.

Getting started with SSIS - Part 2: BIDS Overview


Feb 7 2012 12:00AM by Sudeep Raj

What is BIDS? Why do I need it? Where can I find it? Three questions are asked here, so I will provide the three answers.

What is BIDS? BIDS stands for Business Intelligence Development Studio. And this is what MSDN says about BIDS: Business Intelligence Development Studio is Microsoft Visual Studio 2008 with additional project types that are specific to SQL Server business intelligence. Business Intelligence Development Studio is the primary environment that you will use to develop business solutions that include Analysis Services, Integration Services, and Reporting Services projects. Each project type supplies templates for creating the objects required for Business Intelligence Solutions, and provides a variety of designers, tools, and wizards to work with the objects.

This definition from MSDN lays down the basics of what BIDS is. BIDS is an IDE (Integrated Development Environment) with a look and feel very similar to Visual Studio. It has been customized for building Business Intelligence solutions with the MS BI Suite (SSAS, SSRS and SSIS). It does not have tools like textboxes or buttons as you would see in Visual Studio; instead, it provides MS BI specific tasks, which enable us to create packages.

Why do I need it? BIDS facilitates the entire development of the solutions mentioned above. All you need to know is the business requirement and the basics of the BI suite. The development effort is highly reduced, as a developer's job can be summarized as DRAG, DROP, LINK & CONFIGURE the various tasks or transforms, as the case may be (I will elaborate on them later). Talking specifically about SSIS, BIDS is the only place where you develop and test SSIS packages. An SSIS package is nothing but a complex XML file. To create this XML file (package) and to modify the complex XML, we need BIDS. BIDS has lots of other features as well, which we will be covering in the next chapter.

Where can I find BIDS? You get BIDS while installing MS SQL Server on your machine. During the SQL Server installation, follow the steps below and you will have BIDS on your machine. Step 1

Step 2

Step 3

Step 4

Step 5

Step 6

Step 7: Put a valid product key for your SQL server.

Step 8: Accept the license terms

Step 9: Setup the role as per your need

Step 10: While selecting the features, select SQL Server Integration Services (you do not see it here as it is already installed on my machine) and then, from the shared features, select Business Intelligence Development Studio.

After this step, you can proceed with your SQL Server installation. Once installed, you can start BIDS from Start -> MS SQL Server -> SQL Server Business Intelligence Development Studio. When you open BIDS, it appears just like Visual Studio.

The difference comes when you try creating a new project ( File->New->Project).

Here you will find Business Intelligence Projects in the left pane, with the various templates available for MS BI development. For now, we will be concentrating on Integration Services Projects. So, that sums up how to get BIDS on your machine. We will talk about how to use BIDS in the next article. Previous: Getting started with SSIS - Part 1: Introduction to SSIS Next: Getting started with SSIS - Part 3: First SSIS Package

Getting started with SSIS - Part 4: Control Flow

What is Control Flow? Let me start by saying: no package in SSIS can be made without a Control Flow. Think of the Control Flow as a big container that holds a group of tasks to achieve the final goal. The Control Flow is also known as the place for data preparation and final clean-up. In the Control Flow, we set up the flow of jobs in an ETL process, like fetching files from an FTP location to a local folder and making them ready for ETL, or, at times, clearing the staging tables before the ETL process. All such tasks are done before the ETL commences. Once the ETL completes, we use the Control Flow for tasks like sending mail updates, archiving files, etc. The Control Flow primarily contains various tasks plus precedence constraints, which connect these tasks in a logical sequence. This creates the flow of control among the tasks.
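To make this flow of control concrete, here is a small, purely conceptual Python sketch (not SSIS code; the task names are hypothetical): each task runs only after every task linked into it has completed, which is exactly what a logical sequence of tasks and constraints enforces.

```python
# Conceptual sketch of a Control Flow: a task runs only after all
# tasks pointing to it have completed. Assumes the graph is acyclic.

def run(tasks, constraints):
    """tasks: {name: callable}; constraints: list of (from_task, to_task)."""
    order, done = [], set()
    while len(done) < len(tasks):
        progressed = False
        for name in tasks:
            preds = {f for f, t in constraints if t == name}
            if name not in done and preds <= done:
                tasks[name]()          # execute the task
                done.add(name)
                order.append(name)
                progressed = True
        if not progressed:
            raise ValueError("cycle in the constraints")
    return order

log = []
tasks = {
    "Clear Staging": lambda: log.append("cleared"),
    "Data Flow": lambda: log.append("loaded"),
    "Send Mail": lambda: log.append("mailed"),
}
# Clear Staging -> Data Flow -> Send Mail (two arrows between three tasks)
order = run(tasks, [("Clear Staging", "Data Flow"), ("Data Flow", "Send Mail")])
```

In a real package, you never write this loop; the SSIS runtime schedules the tasks from the arrows you draw on the designer.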

A few of the tasks available in the Control Flow are:

- For Loop
- For Each Loop
- Script Task
- Send Mail Task
- Execute SQL Task
- Data Flow Task
- FTP Task
- ActiveX Script Task
- Execute DTS 2000 Package Task
- Execute Package Task
- Execute Process Task
- XML Task
- Transfer Database Task

...and so on. In most cases, the names are self-explanatory; the others we will discuss in detail.

A meaningful package contains at least one task. In most cases, we have a number of the above tasks in a package, with a particular task used more than once as the requirement demands. The tasks can be linked to each other, each task can be independent of the others, or one group of tasks can follow a sequence while another group follows its own sequence, with the two groups mutually exclusive and not dependent on each other. In such cases, we need to join the two groups of tasks.

(Figures: single flow in a package; multiple groups which run in parallel; a precedence constraint.)

Now, how do we create a flow between two tasks, or link two tasks? When you drag a particular task from the Tasks pane onto the designer, you will see a GREEN arrow coming from the task. All you need to do is drag this arrow and connect it to another task on the designer. The flow then runs from the task which had the arrow initially to the task it is connected to. This green arrow is called the PRECEDENCE CONSTRAINT, and it comes in very handy while designing packages. One task can link to multiple tasks; similarly, multiple tasks can lead to one task. A precedence constraint need not always be green: it can be green, blue or red. Nor need it always be a continuous line; it can also be a dotted line. We will talk about all this later in detail. Note: in most of the images, you will see a red dot on the tasks. This is because the tasks are not yet configured, and SSIS shows the red mark to highlight that they need attention. Next, we will talk about the Data Flow Task. Previous: Getting started with SSIS - Part 3: First SSIS Package Next: Getting started with SSIS - Part 5: Data Flow Task


Getting started with SSIS - Part 5: Data Flow Task

Now we come to the heart and soul of SSIS: the DATA FLOW TASK, aka DFT. As the name suggests, it is a task. What kind of task? A Data Flow Task, which literally means a task where DATA flows. That's exactly what ETL does: Extract, Transform & Load of data. Most of the ETL is done in the Data Flow Task. A Data Flow Task has to be part of a Control Flow. One Control Flow can have a number of DFTs in it, as the Control Flow is the place where we decide how the multiple DFTs are arranged in our solution. Each DFT has to have a Source; it may or may not have a Transform, and may or may not have a Destination. However, a DFT needs at least a Transform or a Destination following the Source. A Source cannot just get data from somewhere and do nothing with it; if that is the case, the DFT will fail. Ideally, a DFT has a Source, a Transform (optional) and a Destination. The Source is used to extract the data, the Transform is used to transform the data, and the Destination is used to load the data to a destination. That is how ETL is implemented in a DFT. The way the Control Flow contains tasks, the DFT contains Sources, Transforms and Destinations, and they are not called tasks as many people call them. Let me list a few examples of each:

Sources:

- OLEDB Source
- Flat File Source
- XML Source
- Excel Source
- ADO NET Source

Transformations:

- Aggregate
- Conditional Split
- Derived Column
- Fuzzy Lookup
- Merge
- Merge Join
- Lookup
- Row Count
- Sort
- Union All
- Script Component
- OLEDB Command

Destinations:

- OLEDB Destination
- Flat File Destination
- Excel Destination
- Recordset Destination
- Raw File Destination

As you can see from the list above, there are many sources and transforms which can help you achieve your requirement. I have selected just a few elements of the DFT; there are others as well. We will talk about the DFT and parallel execution later, in the advanced section where we discuss the SSIS execution tree. In the figure below, you can have a look at how a DFT looks once the various Sources, Transforms and Destinations are attached. You will notice the red circle on each transform; it is there because they are not yet configured, and SSIS is showing that they will error out if executed now.

Just remember the way we connect the tasks in the Control Flow with precedence constraints. We use arrows in the DFT similarly, but here they are not termed precedence constraints, and they come in just two colours: green for valid records and red in case the data is incorrect as per our setup. Unlike the Control Flow, in the Data Flow the various components cannot have any number of inputs or outputs; these are predefined, barring the Multicast transform, which can have any number of outputs configured. The OLEDB Source takes no input and gives one success (green) output and one error (red) output, and there is a similar rule for each source, transform and destination.
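The green/red routing can be pictured with a tiny conceptual sketch (again, plain Python rather than SSIS; the data and the validation rule are made up): rows that pass the check continue down the green path, while failing rows go down the red path.

```python
# Conceptual sketch of a Data Flow: rows from a source pass through a
# check; valid rows take the green path, bad rows take the red path.

source_rows = [("Sam", 28), ("Naom", 23), ("Rita", -1)]   # hypothetical data

green_output, red_output = [], []           # destination buffers
for name, age in source_rows:               # the "flow" of data
    if age > 0:                             # validate the row
        green_output.append((name, age))    # valid records (green arrow)
    else:
        red_output.append((name, age))      # error records (red arrow)
```

In SSIS, the same split would come from a transform's error output or a Conditional Split, configured visually rather than written as an `if` statement.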

Hope that gave you an understanding of the basics of the Data Flow Task. We will talk about the details later. Previous: Getting started with SSIS - Part 4: Control Flow Next: Getting started with SSIS - Part 6: Import Export Wizard Part 1

Getting started with SSIS - Part 6: Import Export Wizard Part 1

As I mentioned in the last chapter, we will look into our very first SSIS package. There are multiple ways to build an SSIS package; the simplest is the Import Export Wizard. You can start the SQL Server Import & Export Wizard in various ways:

- In Business Intelligence Development Studio, right-click the SSIS Packages folder, and then click SSIS Import and Export Wizard.
- In Business Intelligence Development Studio, on the Project menu, click SSIS Import and Export Wizard.
- In SQL Server Management Studio, connect to the Database Engine server type, expand Databases, right-click a database, point to Tasks, and then click Import Data or Export Data.
- In a command prompt window, run DTSWizard.exe, located in C:\Program Files\Microsoft SQL Server\90\DTS\Binn.

I will be using the 3rd way to start the Import & Export Wizard. All the methods show you the same wizard, so you could try any of them. Let us use the Import & Export Wizard to load the following data into a SQL Server table:

Name,Age,Sex
Sam,28,M
Naom,23,M
Rita,26,F

Let's now start with the Import & Export Wizard. I will be using screenshots along the way. In SQL Server Management Studio, connect to the Database Engine server type, expand Databases, right-click a database, point to Tasks, and then click Import Data. See the figure below:

On doing the above, the Import and Export Wizard will open. On the welcome screen you get a brief description about the wizard. You could choose to not see this message when you open it the next time. Just check the checkbox towards the bottom. Once done, click Next.

The next screen of the wizard is where you provide the details of the Data Source (Fig. 3). There are various options for the Data Source, from Flat File Source (chosen for my example) to Excel Source, Access Source, OLEDB Source, etc. You need to select the appropriate source as per your need. Since I have selected Flat File Source as the data source, I get the following fields; had I chosen OLEDB Source, it would have asked me for the server details, the database to fetch the data from, and so on. On the General tab, you see the features mentioned below:

- In File name, click the browse button and select the input file you wish to import into the SQL table.
- If you are working in a different locale, you could set that; otherwise, let it remain the default.
- Depending on the format of your input data, you could select Delimited or Fixed width. I am using a comma-delimited file, hence I selected Delimited here.
- If you have a text qualifier, you could select it from the drop-down list, which has the most commonly used text qualifiers. In our example, we do not have a text qualifier, so we leave it blank.

The next item is the Header row delimiter. In case your header row has a different delimiter from the detail records, this setting is useful. The wizard sets all these defaults after reading your file, so mostly you do not need to change them. Now let's say we need to ignore the first record in the data because it is not relevant: here we set Header rows to skip to 1. If you have the column names in the first row, you need to check the checkbox stating Column names in the first data row. What this does is take the column names from that row instead of treating it as a data record. If you do not do this, then in our example Name, Age, Sex will also be treated as a data record (as I have not set Header rows to skip either). This is a very useful feature, because if the column names are there, the wizard reads them and creates the metadata accordingly; otherwise you will need to enter them manually for easy maintenance and development.
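The effect of the Column names in the first data row option is easy to demonstrate outside the wizard. The sketch below uses Python's csv module with the sample data from this article; it is an analogy for the option's behaviour, not what SSIS does internally.

```python
import csv
import io

# The sample file from this article. Checking "Column names in the
# first data row" makes the first line metadata instead of data.
data = "Name,Age,Sex\nSam,28,M\nNaom,23,M\nRita,26,F\n"

with_header = list(csv.DictReader(io.StringIO(data)))   # first line = column names
without = list(csv.reader(io.StringIO(data)))           # first line = a data record
```

With the header honoured you get 3 records with named columns; without it you get 4 raw records, the first of which is `Name,Age,Sex` treated as data.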

Fig. 3

Once you have the General settings for the flat file, click the next tab, Columns (Fig. 4). As the name suggests, here we provide the basic details of the columns, and you get to see a preview of the data.

You do not need to change the Data source, as it was set to Flat File Source on the last tab itself. The Row delimiter is set to {CR}{LF} by default. CR is carriage return, while LF is line feed, terms derived from typewriter days. Windows uses both for a new line, while UNIX uses just LF. The Column delimiter is set automatically if it is among the ones available in the drop-down list; otherwise, you need to use the advanced settings, which I will explain later. Other commonly used column delimiters are semicolon {;}, pipe or vertical bar {|}, colon {:}, tab, etc. After these basic settings, you can view the preview of the file. Note: had we not checked the box stating Column names in the first data row, we would not get the column names as you see now (Fig. 4). For column headers, you would have seen Column1, Column2 & Column3, the current column headers would be treated as a data record, and you would have 4 records in your data.
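The {CR}{LF} vs {LF} point can be verified quickly. The sketch below (plain Python, for illustration only) shows that the same two records sit behind both row delimiters:

```python
# Windows ends each row with CR+LF ("\r\n"); UNIX uses just LF ("\n").
windows_file = "Name,Age,Sex\r\nSam,28,M\r\n"
unix_file = "Name,Age,Sex\nSam,28,M\n"

# splitlines() recognises both row delimiters, yielding identical rows
win_rows = windows_file.splitlines()
unix_rows = unix_file.splitlines()
```

This is why the wizard can usually detect the row delimiter for you: the records are the same, only the line ending differs.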

Fig. 4

Let us move on to the next tab, Advanced (Fig. 5). As you see here, we have the list of all columns. As you click each column, in the right pane you can see the column's various properties. Primarily, you set the data type of each column, and its length in case it is of string type; if it is numeric, you set the precision and scale. Note: by default, all the columns are assigned the string data type ([DT_STR]) with OutputColumnWidth (length) set to 50. You need to take care of this, or else there might be truncation, and your package will fail if the data in any column is longer than 50 characters. Conversely, if all the data in a particular column is at most 10 characters, you would be unnecessarily wasting memory, and your package performance would be affected. Note: had we not checked the box stating Column names in the first data row, we would not get the column names you see now (Fig. 5); all you would see is Column1, Column2, Column3, one below the other. That is not an issue if the number of columns is 5 or fewer, but imagine if you have more than 10 columns: it would get very difficult to track which column represents what data, and this leads to errors, delays in development, and a package that is MOST difficult to maintain later.
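The truncation warning above can be illustrated with a quick width check. The sketch below is hypothetical Python, not SSIS; it simply measures the longest value in a column against the default DT_STR width of 50:

```python
# Sketch of the truncation risk: a value longer than the declared
# OutputColumnWidth would fail the load, so measure the real widths first.

rows = [("Sam", "M"), ("A" * 60, "F")]      # hypothetical: one 60-char name

declared_width = 50                          # the wizard's default for DT_STR
max_name = max(len(name) for name, _ in rows)
would_truncate = max_name > declared_width
```

Running a check like this against a sample of your file tells you whether to raise the width on the Advanced tab, or shrink it to save memory.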

Fig. 5

After the Advanced tab, click the Preview tab, and you will see a preview similar to the one you saw in Fig. 4. In case you made some changes on the Advanced tab that were not there on the Columns tab, you will see differences between the 2 previews; otherwise they will be the same. Up to this point, we have set up the source of our ETL. Once you are satisfied that all the columns are well mapped, click the Next button, and you will be in the next window (Fig. 6). Now we need to configure the destination. As mentioned at the beginning, we need to get the data from a flat file to a table in a SQL Server database. Let the Destination be set to the default, SQL Server Native Client 10.0. In Server Name, provide the server name. I have provided a period {.}, which represents localhost. In case you have a named SQL Server instance, you would need to include that as well.

Select the Authentication mode by choosing the appropriate radio button. Next, select the database name from the drop-down list (provided you have given valid credentials above). You could also create a new database if you wish to send the data to one: all you need to do is click the New button and provide the database name. Once completed, click the Next button.

Fig. 6

The next window has the option to select the source and the destinations configured earlier (in our case, we have configured just one; had the source been SQL Server, we would have had to select the tables we would like to export). Here we just need to select the checkboxes you see below (Fig. 7). In the Destination, just key in the name of the table you wish to load the data into. You are now almost done; click the Next button.

Fig. 7

The primary setting for your 1st SSIS package is done. The next window has the options to run and save the SSIS package (Fig. 8). You can run the package immediately and/or save the SSIS package for later execution. I select the 2nd option; you could select either option or both. I will just save the package now and execute it later. Once you select the option Save SSIS Package, you need to choose between 2 options:

1. SQL Server: the package is stored in SQL Server, and you can easily execute it from SSMS.
2. File System: the SSIS package is saved on your system. In addition, here you could open the package and edit it using BIDS if you feel the need, then execute it with the SSIS Package Execution Utility.

We will talk in detail later about the next property that is set on this screen, the package protection level. You could leave it at the default value or set it to Do not save sensitive data, as I have done. Click Next after this is done.

On the next screen, you see very basic settings for the package, like providing a name and description for the package, and finally providing the path where we need to save it. That's it!! Click Next to go to the next screen :)

The next screen shows a brief summary of the package: primarily the source and destination details, along with the package name and location. It also tells you whether the package will execute on clicking the Next button or not. If you remember, I had selected the option to not run the package immediately; hence you see the last line on this screen as The package will not run immediately. Once all looks fine, click the Next button.

On clicking the Next button, you see the package being validated for all the settings, and finally it is saved. It would have run if we had chosen the option Run immediately. See the image below.

You can go to the path you provided and check for the file with the .dtsx extension. That is your SSIS package. That is it for this chapter; we will dissect the package in the next chapter. Take a break and come back soon :)

Welcome Back.... Hope you have checked that the SSIS package was created in the folder you had mentioned in the last chapter. Let us understand what this package is doing and how. We will not get into all the details right now. Slowly, as we progress, you will get to understand what is happening in the package. Here, we discuss just the high-level working of the package. After this chapter, you would have a better understanding of BIDS and SSIS Package.

So, before executing the package, let us recap what we are trying to achieve: we are loading data from a text file into a SQL Server database table named Input. Before starting, let's see what we get when we run SELECT * FROM dbo.INPUT. Below is the output. As you can see, we get an error message telling us that the object dbo.INPUT does not exist, so we are now sure that the destination table did not exist before the package execution. Let's now start with the package.

Go to the folder where you saved the SSIS package created in the last chapter. Right-click the .dtsx file (the SSIS package) and go to Open with. In case you see just one option (SQL Server 2008 (or 2005) Integration Services Package Execution Utility), you need to choose a default program, select Visual Studio, and then click Ok.

Or you could directly open Visual Studio and go to File -> Open -> File, browse to the SSIS package and open it. Once the package is opened, you will see the below (Fig. 3). Remember BIDS? This is BIDS with the package you created. Pay attention to the central part of the figure, that is, the designer. If you look at the top RED box I have highlighted, it says Control Flow (though not completely visible here). In the Control Flow, we have 2 tasks created by the Import & Export Wizard based on our settings: Preparation SQL Task 1 and Data Flow Task 1.

Another important feature I would like to draw your attention to is at the bottom of Fig. 3: the Connection Managers. You see that we have 2 connection managers, namely DestinationConnectionOLEDB and SourceConnectionFlatFile. I will explain them once we discuss the above tasks; just remember that connection managers, as the name suggests, provide connections to the various data sources we have.

Fig. 3

Let us see what is happening in the first task in the Control Flow, Preparation SQL Task 1. This is the name given to the task in this package; the task is actually an Execute SQL Task, renamed to describe its action. Double-click the task and you will see the Execute SQL Task Editor. For now, just concentrate on the parts that I have highlighted. The first thing is the Connection property, which has been set to DestinationConnectionOLEDB. Next, check the value of the SQLStatement property: go to the adjoining text box and you will see an ellipsis button; click it and a small window appears with a SQL query in it. In this case, it holds the following query:

CREATE TABLE [dbo].[Input] (
    [Name] varchar(50),
    [Age] int,
    [Sex] varchar(50)
)
GO

This query tells you what is going on in this task: we are simply creating the destination (target) table where the data needs to be loaded. Now click Cancel and come back out to the package designer. Go to the connection managers and double-click the connection DestinationConnectionOLEDB. Are the settings not exactly what you entered in the wizard earlier? This is the connection used by the above Execute SQL Task: before the task executes, this connection manager creates the connection between the package and the server on which it has to execute the SQL query (create the table, in our case). Let's now move on to the next task in the Control Flow.

Double-click the next task, Data Flow Task 1. As the name suggests, it is the place where the data actually flows. The first thing to note is that the view has moved from Control Flow to Data Flow; look at the first thing on top that I have highlighted. We are now working in the Data Flow. Next, see the components being used here: we have 2 of them, namely Source - Input_csv and Destination - Input. Notice that we have the same connections available in the Data Flow as well.

Source - Input_csv: this is a Flat File Source and is used to read data from flat files, be it .txt, .csv or any flat file where the data has been laid out in a specific format. The Flat File Source can read delimited files (the columns in each record are separated by a specific delimiter) or fixed-width files (files that have no delimiters, but give each column a fixed width, so data is extracted based on the precise position of the column). We have already seen the settings in the last chapter; you can explore them by yourself.

Destination - Input: This is primarily an OLEDB Destination and is used to send the final transformed data to an OLEDB destination like a SQL Server database, an Access database, an Excel file, etc. You could explore this further. Both these transforms will be discussed in detail later.

At this point of time, I will not talk about any other feature that you see in the package. You are free to explore. Now is the time to execute the package. Go back to the folder where the file is saved. Right click the .dtsx file again. Click SQL Server 2008 Integration Services Package Execution Utility, and a wizard will open.

Just click the Execute button on the Execute Package Utility (as in the figure below). You will notice that another small window pops up where you see something like logs (Fig. 8) being written. Scroll down and you will notice that all the tasks that we have just seen are mentioned in the log, along with the connection strings. Once the logs are written and the Close button gets enabled, you know that the execution is over. You can close the small window and the Execute Package Utility.

Now all that is left is to verify whether the package has done its work or not. See the figure below.

When I execute the same query executed before the package execution, I get the following result and no error :) Hence we know that the package has worked and we have the data imported from a flat file into a new table in SQL Server.
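The exact verification query is shown in the figure, but it is essentially a simple SELECT against the table created by Preparation SQL Task 1. A minimal sketch (the table and column names are taken from the CREATE TABLE statement we saw earlier; run it in the destination database before and after package execution):

```sql
-- Table and column names come from the CREATE TABLE in Preparation SQL Task 1.
SELECT [Name], [Age], [Sex]
FROM [dbo].[Input];

-- Or just count the rows that were imported:
SELECT COUNT(*) AS ImportedRows
FROM [dbo].[Input];
```

Before the package runs, the first query returns an empty result set; afterwards, it returns the rows loaded from the flat file.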

Food for thought: BIDS too provides a way to execute the package. You can click the green play button below the menu items or press the F5 key. But here, in this case, you will see that the above button is disabled and F5 does not work. I leave it to you to explore why this is happening. We will talk about this in a later chapter.

Note that this was not true ETL. If you remember, ETL has 3 components: Extract, Transform and Load. We have seen the data being extracted from a flat file and getting loaded to the destination. There has been no transformation done on the data. Hence I say that this is not complete ETL. We need to learn about the various transformations available in SSIS. Food for thought: This package will definitely fail if you execute it again. Why??
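Once you have worked out why the re-run fails, one common way to make the preparation step re-runnable is to drop the table first if it already exists. This is only a sketch; whether you actually want to destroy and reload the target table on every run depends on your requirement:

```sql
-- Drop the target table if it already exists, then recreate it,
-- so the Execute SQL Task succeeds on repeated executions.
IF OBJECT_ID(N'[dbo].[Input]', N'U') IS NOT NULL
    DROP TABLE [dbo].[Input];
GO

CREATE TABLE [dbo].[Input] (
    [Name] varchar(50),
    [Age]  int,
    [Sex]  varchar(50)
);
GO
```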

Now that we have learnt how to create a simple SSIS Package, let us see how we can make it adaptive. What I mean by adaptive is: how can we maintain the package with minimal changes in the future? Before we talk about how we handle this, let us talk about the scenarios where we might want to change the package:

The package needs to be moved from one environment to another
The password needs to be changed for SQL Server
You wish to send a mail with the number of records processed/errored
You wish to send a mail to a new recipient
The FTP location has changed
The logic of a query has changed (this needs more thinking, as it might also need a change in the package logic)

There can be a lot more scenarios than these.

So now the question is, how do we achieve this? Folks who have done development in any other language or T-SQL should have guessed the answer. Yes, we make use of variables in SSIS. How do we set up variables in SSIS? When you open an SSIS package, you may see the Variables panel to the left of the Control Flow panel, as in the figure below:

In case you are unable to view this, you need to go to Menu -> View -> Other Windows -> Variables, as shown in the figure below:

Another quick way is to click anywhere on the blank area of the control flow or data flow and select Variables from the shortcut menu. You will then see the Variables panel on the left hand side, as shown in the figure below.

In the Variables panel, you will see a blank pane with a few buttons on top. By default, you will have 5 buttons, as shown in the figure below. Each button is described below with reference to its position from the left.

1. New Variable: Click this to create a new variable.
2. Delete Variable: This button is enabled only when you have an existing user variable. Select a variable that you have created and click this button to delete it.
3. Show System Variables: This button is used to toggle the view between the list of system variables and user defined variables. You can click this and look at the number of system variables that exist in the package before you do anything in the package.
4. Show All Variables: The Variables pane displays variables based on which container last had focus in the control flow or the data flow. In case you are in the data flow, only the variables in the data flow scope will be visible to you. If you want to view all the variables, you need to click this button, or go to the control flow, click outside the containers in blank space and then go back to the Variables pane. I guess clicking the button is simpler.
5. Choose Variable Columns: Click this to open the Choose Variable Columns dialog box, where you can change the column selection. Use it to select the columns whose properties will display in the Variables window.

Once you create a new variable, you will see the panel like the one in Fig 5. Here you see 4 columns by default, namely:

Name: As the name suggests, this shows the name of the variable; it is an editable field.
Scope: This is set automatically and you cannot change the scope of a variable once created.

Data Type: This is an editable column with the default value Int32. It is a drop down with all possible data types in SSIS. Change this as per your need.
Value: This will contain the value of the variable. It may be predefined, set by expressions (we will talk about this in the next section), or set by SSIS configurations, based on the requirement.

Note: The scope of the variable is set based on the container that has focus before you click the New Variable button. However, BIDS Helper can help you change the scope of a variable very easily; that's the 6th button it provides on top of the Variables pane.

Where to use variables in an SSIS package? You can use variables in SSIS in the following places:

Setting up any connection string
Setting up the Row Count transform (one of the available transforms, used to get the number of rows of incoming data) in the Data Flow Task
In the For Each Loop, the For Loop, and parameter mapping in the Execute SQL Task
In the Script Task

There are many more places where you can make use of variables, and you will come to know them as and when you require them.
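As an illustration of the first item, a connection manager's ConnectionString property can be built from variables through an expression. The variable names here (User::ServerName, User::DatabaseName) are hypothetical; you would create them in your own package first. A sketch of such an expression, using a provider string typical for SQL Server 2008:

```
"Data Source=" + @[User::ServerName] +
";Initial Catalog=" + @[User::DatabaseName] +
";Provider=SQLNCLI10.1;Integrated Security=SSPI;"
```

Attach this to the connection manager's Expressions collection for the ConnectionString property, and moving the package between environments becomes a matter of changing two variable values instead of editing the package.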

How do we set variable values dynamically? Having created variables in our package and used them, how can we set the variables' values dynamically? For example, we need to get the file names depending on the date of the package execution. Let us take a scenario: the package is scheduled to execute daily with no manual intervention. If the date of package execution is odd, pick file one; else, pick file two. To handle this situation we would have a variable store the date of package execution, but can we provide the value of the variable manually? No: as the package is automated and scheduled to run, it will execute automatically every day. Hence we need to use expressions here to set the value of the variable on the fly; when the variable is used, its value is evaluated at that moment. This solves our first problem, getting the date on the fly. How we select the file names based on the date will be taken up later. To set the value of a variable by expressions, follow the steps below:

Select the variable you want to set up dynamically, right click and click Properties, OR directly select the variable and press the F4 key on your keyboard. On completing the above step, a properties window will open on the right side of your BIDS. Here select the property EvaluateAsExpression and set its value to True. Next, go to the property Expression and click the small button towards the end, the one highlighted in the figure below.

On clicking this button, an Expression Builder window will pop up, as shown in the figure below.

Expand the Variables section to view all the variables in the package; you could use them to create your expression. On the top right hand side you can see the various functions available for use. We need to get today's date, so we go to the Date/Time functions, select GETDATE() and drag and drop it into the Expression section below. Next, click the Evaluate Expression button to validate that the expression is returning valid data. If the date returned is fine, click the OK button to close this window. Now if you go back to the variable's properties -> Expression, you will notice that it has the value GETDATE(). When the package executes, this value will be calculated and set, and we can use it as per our needs in the package.
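Coming back to our odd/even scenario, the same expression mechanism can pick the file name directly. A sketch, assuming a string variable (say, User::FileName) with EvaluateAsExpression set to True, and hypothetical file names:

```
DAY(GETDATE()) % 2 == 1 ? "FileOne.csv" : "FileTwo.csv"
```

DAY() extracts the day of the month from GETDATE(), % 2 tests whether it is odd, and the conditional operator ? : returns the first file name on odd dates and the second otherwise. The variable can then be used, for example, in the Flat File connection manager's ConnectionString expression.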

For a detailed explanation of SSIS expressions, please read my blog post SSIS - Expressions.

Points to note while using variables:

Check the scope of the variable
Do not use excessive variables; delete the unwanted or unused variables
The number of variables is inversely proportional to the manageability of the package :)
While using a template or copy-pasting a package, remove the unwanted variables
Use BIDS Helper for tracking variables with expressions (it helps save a lot of time while debugging)

That would be all we have to say on variables for now. I will talk about SSIS debugging in the next section.
