
SSIS Non-blocking, Semi-blocking and Fully-blocking components

How can you recognize these three component types, how do they work internally, and do they acquire new buffers and/or threads?

Synchronous vs Asynchronous

The SSIS dataflow contains three types of transformations: non-blocking, semi-blocking and fully-blocking. Before I explain how you can recognize these types and what their properties are, it's important to know that all dataflow components can be categorized as either synchronous or asynchronous.

Synchronous components
The output of a synchronous component uses the same buffer as the input. Reusing the input buffer is possible because the output of a synchronous component always contains exactly the same number of records as the input. Number of records IN == Number of records OUT.

Asynchronous components
The output of an asynchronous component uses a new buffer. It's not possible to reuse the input buffer because an asynchronous component can have more or fewer output records than input records.

The only thing you need to remember is that synchronous components reuse buffers and are therefore generally faster than asynchronous components, which need a new buffer. All source adapters are asynchronous: they create two buffers, one for the success output and one for the error output. All destination adapters, on the other hand, are synchronous.

Non-blocking, Semi-blocking and Fully-blocking

The table below summarizes the differences between the three transformation types. As you can see, it's not that hard to identify the three types. There are a lot of large and complicated articles about this subject on the internet, but I think it's enough to look at the core differences between the three types to understand how they work and their (dis)advantages:

                                             Non-blocking   Semi-blocking   Fully-blocking
Synchronous or asynchronous                  Synchronous    Asynchronous    Asynchronous
Number of rows in == number of rows out      True           Usually False   Usually False
Must read all input before they can output   False          False           True
New buffer created?                          False          True            True
New thread created?                          False          Usually True    True

All SSIS transformations categorized:

Non-blocking transformations: Audit, Character Map, Conditional Split, Copy Column, Data Conversion, Derived Column, Lookup, Multicast, Percent Sampling, Row Count, Script Component, Export Column, Import Column, Slowly Changing Dimension, OLE DB Command

Semi-blocking transformations: Data Mining Query, Merge, Merge Join, Pivot, Unpivot, Term Lookup, Union All

Blocking transformations: Aggregate, Fuzzy Grouping, Fuzzy Lookup, Row Sampling, Sort, Term Extraction

SSIS Execute SQL Task

The Execute SQL task runs SQL statements or stored procedures from a package. The task can contain either a single SQL statement or multiple SQL statements that run sequentially. You can use the Execute SQL task for the following purposes:

- Truncate a table or view in preparation for inserting data.
- Create, alter, and drop database objects such as tables and views.
- Re-create fact and dimension tables before loading data into them.
- Run stored procedures.
- Save the rowset returned from a query into a variable.
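As a rough sketch of the kind of T-SQL you might place in a single Execute SQL task for these purposes (the object names staging.FactSales, dbo.DimDate and dbo.usp_LoadDimCustomer are hypothetical, not from the text above):

    -- Truncate a staging table in preparation for inserting data
    TRUNCATE TABLE staging.FactSales;

    -- Re-create a dimension table before loading data into it
    IF OBJECT_ID('dbo.DimDate', 'U') IS NOT NULL
        DROP TABLE dbo.DimDate;
    CREATE TABLE dbo.DimDate (DateKey INT NOT NULL PRIMARY KEY, CalendarDate DATE NOT NULL);

    -- Run a stored procedure
    EXEC dbo.usp_LoadDimCustomer;

The last purpose in the list, saving a rowset into a variable, is configured on the task itself by setting the ResultSet property to Full result set and mapping the result to an Object-type variable.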

You can configure the Execute SQL task in the following ways:

- Specify the type of connection manager to use to connect to a database.
- Specify the type of result set that the SQL statement returns.
- Specify a time-out for the SQL statements.
- Specify the source of the SQL statement.
- Indicate whether the task skips the prepare phase for the SQL statement.

If you use the ADO connection type, you must indicate whether the SQL statement is a stored procedure. For other connection types, this property is read-only and its value is always false.
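As an illustration of that property, and assuming a hypothetical procedure dbo.usp_UpdateLoadStatus with two parameters, the SQLStatement would look roughly like this (the exact parameter-marker syntax, ? versus named parameters, depends on the connection type):

    -- Property set to False (or a connection type that does not support it):
    -- write the call yourself, with parameter markers in the statement
    EXEC dbo.usp_UpdateLoadStatus ?, ?;

    -- Property set to True (ADO or ADO.NET connection):
    -- supply only the procedure name and let the task build the call from the parameter mappings
    -- dbo.usp_UpdateLoadStatus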

The Execute SQL task can be used in combination with the Foreach Loop and For Loop containers to run multiple SQL statements. These containers implement repeating control flows in a package and they can run the Execute SQL task repeatedly. For example, using the Foreach Loop container, a package can enumerate files in a folder and run an Execute SQL task repeatedly to execute the SQL statement stored in each file.

Connecting to a Data Source from the Execute SQL Task

The Execute SQL task can use different types of connection managers to connect to the data source where it runs the SQL statement or stored procedure. The task can use the connection types listed in the following table.

Connection type   Connection manager
EXCEL             Excel Connection Manager
OLE DB            OLE DB Connection Manager
ODBC              ODBC Connection Manager
ADO               ADO Connection Manager
ADO.NET           ADO.NET Connection Manager
SQLMOBILE         SQL Server Compact Edition Connection Manager

Creating SQL Statements used by the Execute SQL Task

The source of the SQL statements used by this task can be a task property that contains a statement, a connection to a file that contains one or multiple statements, or the name of a variable that contains a statement. The SQL statements must be written in the dialect of the source database management system (DBMS). For more information, see Using Queries in Packages.

If the SQL statements are stored in a file, the task uses a File connection manager to connect to the file. For more information, see File Connection Manager.

In SSIS Designer, you can use the Execute SQL Task Editor dialog box to type SQL statements, or use Query Builder, a graphical user interface for creating SQL queries. For more information, see Execute SQL Task Editor (General Page) and Query Builder.

Note: Valid SQL statements written outside the Execute SQL task may not be parsed successfully by the Execute SQL task.

Sending Multiple Statements in a Batch using the Execute SQL Task

If you include multiple statements in an Execute SQL task, you can group them and run them as a batch. To signal the end of a batch, use the GO command. All the SQL statements between two GO commands are sent in a batch to the OLE DB provider to be run. The SQL command can include multiple batches separated by GO commands. There are restrictions on the kinds of SQL statements that you can group in a batch. For more information, see Batches of Statements.

If the Execute SQL task runs a batch of SQL statements, the following rules apply to the batch:
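A minimal sketch of such a SQLStatement, assuming a hypothetical dbo.FactSales table; the GO on its own line splits the text into two batches that are sent to the provider one after the other:

    -- Batch 1: a statement that returns a result set
    SELECT COUNT(*) AS RowsBefore FROM dbo.FactSales;
    GO

    -- Batch 2: a data modification statement
    UPDATE dbo.FactSales
    SET    load_flag = 0
    WHERE  load_flag IS NULL;
    GO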

- Only one statement can return a result set, and it must be the first statement in the batch.
- If the result set uses result bindings, the queries must return the same number of columns. If the queries return a different number of columns, the task fails. However, even if the task fails, the queries that it runs, such as DELETE or INSERT queries, may succeed.
- If the result bindings use column names, the query must return columns that have the same names as the result set names that are used in the task. If the columns are missing, the task fails.
- If the task uses parameter binding, all the queries in the batch must have the same number and types of parameters.
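For example, a single batch that follows those rules might look like the sketch below (the object names are hypothetical): the only statement that returns rows comes first, and because parameter binding is used, both statements take the same single parameter.

    SELECT CustomerID, CustomerName          -- the one result-returning statement, first in the batch
    FROM   dbo.DimCustomer
    WHERE  RegionID = ?;

    DELETE FROM staging.DimCustomer          -- later statements should not return result sets
    WHERE  RegionID = ?;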

Running Parameterized SQL Commands using the Execute SQL Task

SQL statements and stored procedures frequently use input parameters, output parameters, and return codes. The Execute SQL task supports the Input, Output, and ReturnValue parameter types. You use the Input type for input parameters, Output for output parameters, and ReturnValue for return codes.

Note: You can use parameters in an Execute SQL task only if the data provider supports them.

For information on using parameters and return codes in the Execute SQL task, see Working with Parameters and Return Codes in the Execute SQL Task.

Specifying a Result Set Type for the Execute SQL Task

Depending on the type of SQL command, a result set may or may not be returned to the Execute SQL task. For example, a SELECT statement typically returns a result set, but an INSERT statement does not. The result set from a SELECT statement can contain zero rows, one row, or many rows. Stored procedures can also return an integer value, called a return code, that indicates the execution status of the procedure. In that case, the result set consists of a single row. For information on retrieving result sets from SQL commands in the Execute SQL task, see Working with Result Sets in the Execute SQL Task.
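As a sketch of how the three parameter types and a return code come together with an OLE DB connection (where ? is the parameter marker and dbo.usp_LoadFactSales is a hypothetical procedure), the statement might be written as:

    -- Parameter mapping (hypothetical): parameter 0 = ReturnValue, parameter 1 = Input, parameter 2 = Output
    EXEC ? = dbo.usp_LoadFactSales ?, ? OUTPUT;

If the procedure also returned a single-row result set, you would additionally set the task's ResultSet property to Single row and bind the returned columns to variables.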

Custom Log Entries Available on the Execute SQL Task

The following table describes the custom log entry for the Execute SQL task. For more information, see Implementing Logging in Packages and Custom Messages for Logging.

Log entry: ExecuteSQLExecutingQuery
Description: Provides information about the execution phases of the SQL statement. Log entries are written when the task acquires a connection to the database, when the task starts to prepare the SQL statement, and after the execution of the SQL statement is completed. The log entry for the prepare phase includes the SQL statement that the task uses.

Troubleshooting the Execute SQL Task

You can log the calls that the Execute SQL task makes to external data providers. You can use this logging capability to troubleshoot the SQL commands that the Execute SQL task runs. To log the calls that the Execute SQL task makes to external data providers, enable package logging and select the Diagnostic event at the package level. For more information, see Troubleshooting Package Execution.

Sometimes a SQL command or stored procedure returns multiple result sets. These result sets include not only row sets that are the result of SELECT queries, but also single values that result from RAISERROR or PRINT statements. Whether the task ignores errors in result sets that occur after the first result set depends on the type of connection manager that is used:

- When you use OLE DB and ADO connection managers, the task ignores the result sets that occur after the first result set. Therefore, with these connection managers, the task ignores an error returned by an SQL command or a stored procedure when the error is not part of the first result set.
- When you use ODBC and ADO.NET connection managers, the task does not ignore result sets that occur after the first result set. With these connection managers, the task will fail with an error when a result set other than the first result set contains an error.
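To make that concrete, a command such as the sketch below (the table name is hypothetical) returns an ordinary row set first and then an error in a later result set; per the rules above, OLE DB and ADO connection managers would ignore the error, while ODBC and ADO.NET connection managers would fail the task.

    SELECT TOP (10) SalesOrderID FROM dbo.FactSales;   -- first result set: ordinary rows

    RAISERROR ('Row count check failed.', 16, 1);      -- error that surfaces after the first result set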

Derived Column Transformation


The Derived Column transformation creates new column values by applying expressions to transformation input columns. An expression can contain any combination of variables, functions, operators, and columns from the transformation input. The result can be added as a new column or inserted into an existing column as a replacement value. The Derived Column transformation can define multiple derived columns, and any variable or input column can appear in multiple expressions. You can use this transformation to perform the following tasks:

- Concatenate data from different columns into a derived column. For example, you can combine values from the FirstName and LastName columns into a single derived column named FullName, by using the expression FirstName + " " + LastName.
- Extract characters from string data by using functions such as SUBSTRING, and then store the result in a derived column. For example, you can extract a person's initial from the FirstName column, by using the expression SUBSTRING(FirstName, 1, 1).
- Apply mathematical functions to numeric data and store the result in a derived column. For example, you can change the length and precision of a numeric column, SalesTax, to a number with two decimal places, by using the expression ROUND(SalesTax, 2).
- Create expressions that compare input columns and variables. For example, you can compare the variable Version against the data in the column ProductVersion, and depending on the comparison result, use the value of either Version or ProductVersion, by using the expression ProductVersion == @Version ? ProductVersion : @Version.
- Extract parts of a datetime value. For example, you can use the GETDATE and DATEPART functions to extract the current year, by using the expression DATEPART("year", GETDATE()).

SSIS supports numerous transformations that allow you to combine data originating from multiple sources, cleanse the data and give it the shape your data destination expects. Then you can import the data into a single destination or multiple destinations. The list below gives each transformation, its description and examples of when it would be used.

Aggregate
Description: Calculates aggregations such as SUM, COUNT, AVG, MIN and MAX based on the values of a given numeric column. This transformation produces additional output records.
Examples of use: Adding aggregated information to your output. This can be useful for adding totals and sub-totals to your output.

Audit
Description: Includes auditing information, such as the computer name where the package runs, package version ID, task name, etc., in the data flow.
Examples of use: Creating advanced logs which indicate where and when the package was executed, how long it took to run the package and the outcome of execution.

Character Map
Description: Performs minor manipulations on string columns. Converts all letters to uppercase, lowercase, reverse bytes, etc.
Examples of use: Applying string manipulations prior to loading data into the data warehouse. You can also apply the same manipulations to the data while it is being loaded into the warehouse.

Conditional Split
Description: Accepts an input and determines which destination to pipe the data into based on the result of an expression.
Examples of use: Cleansing the data to extract specific rows from the source. If a specific column does not conform to the predefined format (perhaps it has leading spaces or zeros), move such records to the error file.

Copy Column
Description: Makes a copy of a single column or multiple columns which will be further transformed by subsequent tasks in the package.
Examples of use: Extracting columns that need to be cleansed of leading / trailing spaces, applying the Character Map transformation to uppercase all data and then loading it into the table.

Data Conversion
Description: Converts input columns from one data type to another.
Examples of use: Converting columns extracted from the data source to the proper data type expected by the data warehouse. Having such transformation options allows us the freedom of moving data directly from its source into the destination without having an intermediary staging database.

Data Mining Query
Description: Queries a data mining model. Includes a query builder to assist you with development of Data Mining eXpressions (DMX) prediction queries.
Examples of use: Evaluating the input data set against a data mining model developed with Analysis Services.

Derived Column
Description: Calculates a new column value based on an existing column or multiple columns.
Examples of use: Removing leading and trailing spaces from a column. Adding a title of courtesy (Mr., Mrs., Dr., etc.) to the name.

Export Column
Description: Exports contents of large columns (TEXT, NTEXT, IMAGE data types) into files.
Examples of use: Saving large strings or images into files while moving the rest of the columns into a transactional database or data warehouse.

Fuzzy Grouping
Description: Finds close or exact matches between multiple rows in the data source. Adds columns to the output including the values and similarity scores.
Examples of use: Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr", "Dr.", "doctor" and "M.D." should all be considered equivalent.

Fuzzy Lookup
Description: Compares values in the input data source rows to values in the lookup table. Finds the exact matches as well as those values that are similar.
Examples of use: Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr", "Dr.", "doctor" and "M.D." should all be considered equivalent.

Import Column
Description: Imports contents of a file and appends them to the output. Can be used to append TEXT, NTEXT and IMAGE data columns to the input obtained from a separate data source.
Examples of use: This transformation could be useful for web content developers. For example, suppose you offer college courses online. Normalized course meta-data, such as course_id, name and description, is stored in a typical relational table. Unstructured course meta-data, on the other hand, is stored in XML files. You can use the Import Column transformation to add XML meta-data to a text column in your course table.

Lookup
Description: Joins the input data set to the reference table, view or row set created by a SQL statement to look up corresponding values. If some rows in the input data do not have corresponding rows in the lookup table, you must redirect such rows to a different output.
Examples of use: Obtaining additional data columns. For example, the majority of employee demographic information might be available in a flat file, but other data, such as the department where each employee works, their employment start date and job grade, might be available from a table in a relational database.

Merge
Description: Merges two sorted inputs into a single output based on the values of the key columns in each data set. Merged columns must have either identical or compatible data types. For example, you can merge VARCHAR(30) and VARCHAR(50) columns. You cannot merge INT and DATETIME columns.
Examples of use: Combining the columns from multiple data sources into a single row set prior to populating a dimension table in a data warehouse. Using the Merge transformation saves the step of having a temporary staging area. With prior versions of SQL Server you had to populate the staging area first if your data warehouse had multiple transactional data sources.

Merge Join
Description: Joins two sorted inputs using an INNER JOIN, LEFT OUTER JOIN or FULL OUTER JOIN algorithm. You can specify the columns used for joining the inputs.
Examples of use: Combining the columns from multiple data sources into a single row set prior to populating a dimension table in a data warehouse. Using the Merge Join transformation saves the step of having a temporary staging area. With prior versions of SQL Server you had to populate the staging area first if your data warehouse had multiple transactional data sources. Note that the Merge and Merge Join transformations can only combine two data sets at a time. However, you could use multiple Merge Join transformations to include additional data sets.

Multicast
Description: Similar to the Conditional Split transformation, but the entire data set is piped to multiple destinations.
Examples of use: Populating the relational warehouse as well as the source file with the output of a Derived Column transformation.

OLE DB Command
Description: Runs a SQL command for each input data row. Normally your SQL statement will include a parameter (denoted by the question mark), for example: UPDATE employee_source SET has_been_loaded=1 WHERE employee_id=?
Examples of use: Setting the value of a column with the BIT data type (perhaps called "has_been_loaded") to 1 after the data row has been loaded into the warehouse. This way the subsequent loads will only attempt importing the rows that haven't made it to the warehouse as of yet.

Percentage Sampling
Description: Loads only a subset of your data, defined as the percentage of all rows in the data source. Note that rows are chosen randomly.
Examples of use: Limiting the data set during development phases of your project. Your data sources might contain billions of rows. Processing cubes against the entire data set can be prohibitively lengthy. If you're simply trying to ensure that your warehouse functions properly and data values on transactional reports match the values obtained from your Analysis Services cubes, you might wish to only load a subset of data into your cubes.

Pivot
Description: Pivots the normalized data set by a certain column to create a more easily readable output. Similar to the PIVOT command in Transact-SQL. You can think of this transformation as converting rows into columns. For example, if your input rows have customer, account number and account balance columns, the output will have the customer and one column for each account.
Examples of use: Creating a row set that displays the table data in a more user-friendly format. The data set could be consumed by a web service or could be distributed to users through email.

Row Count
Description: Counts the number of transformed rows and stores the result in a variable.
Examples of use: Determining the total size of your data set. You could also execute a different set of tasks based on the number of rows you have transformed. For example, if you increase the number of rows in your fact table by 5% you could perform no maintenance. If you increase the size of the table by 50% you might wish to rebuild the clustered index.

Row Sampling
Description: Loads only a subset of your data, defined as the number of rows. Note that rows are chosen randomly.
Examples of use: Limiting the data set during development phases of your project. Your data warehouse might contain billions of rows. Processing cubes against the entire data set can be prohibitively lengthy. If you're simply trying to ensure that your warehouse functions properly and data values on transactional reports match the values obtained from your Analysis Services cubes, you might wish to only load a subset of data into your cubes.

Script Component
Description: Every data flow consists of three main components: source, destination and transformation. The Script Component allows you to write transformations for otherwise unsupported source and destination file formats. The Script Component also allows you to perform transformations not directly available through the built-in transformation algorithms. Much like the Script Task, the Script Component transformation must be written using Visual Basic .NET.
Examples of use: Custom transformations can call functions in managed assemblies, including the .NET Framework. This type of transformation can be used when the data source (or destination) file format cannot be managed by typical connection managers. For example, some log files might not have tabular data structures. At times you might also need to parse strings one character at a time to import only the needed data elements.

Slowly Changing Dimension
Description: Maintains historical values of the dimension members when new members are introduced.
Examples of use: Useful for maintaining dimension tables in a data warehouse when maintaining historical dimension member values is necessary.

Sort
Description: Sorts input by column values. You can sort the input by multiple columns in either ascending or descending order. The transformation also allows you to specify the precedence of columns used for sorting. This transformation could also discard the rows with duplicate sort values.
Examples of use: Ordering the data prior to loading it into a data warehouse. This could be useful if you're ordering your dimension by member name values as opposed to sorting by member keys. You can also use the Sort transformation prior to feeding the data as the input to the Merge Join or Merge transformation.

Term Extraction
Description: Extracts terms (nouns and noun phrases) from the input text into the transformation output column.
Examples of use: Processing large text data and extracting main concepts. For example, you could extract the primary terms used in this section of SQLServerPedia by feeding the Term Extraction transformation the text column containing the entire section.

Term Lookup
Description: Extracts terms from the input column with the TEXT data type and matches them with the same or similar terms found in the lookup table. Each term found in the lookup table is scanned for in the input column. If the term is found, the transformation returns the value as well as the number of times it occurs in the row. You can configure this transformation to perform a case-sensitive search.
Examples of use: Analyzing large textual data for specific terms. For example, suppose you accept email feedback for the latest version of your software. You might not have time to read through every single email message that comes to the generic inbox. Instead you could use this task to look for specific terms of interest.

Union All
Description: Combines multiple inputs into a single output. Rows are sorted in the order they're added to the transformation. You can ignore some columns from each output, but each output column must be mapped to at least one input column. Unlike the Merge and Merge Join transformations, Union All can accept more than two inputs.
Examples of use: Importing data from multiple disparate data sources into a single destination. For example, you could extract data from a mail system, text file, Excel spreadsheet and Access database and populate a SQL Server table.

Unpivot
Description: The opposite of the Pivot transformation, Unpivot converts columns into rows. It normalizes an input data set that has many duplicate values in multiple columns by creating multiple rows that have the same value in a single column. For example, if your input has a customer name and a separate column for checking and savings accounts, Unpivot can transform it into a row set that has customer, account and account balance columns.
Examples of use: Massaging a semi-structured input data file and converting it into a normalized input prior to loading data into a warehouse.
