2.3 Go to the Child Package and create a variable with the same name as your parent package variable.
2.4 Add Package Configurations.
2.5 Check "Enable Package Configurations", choose the Configuration type "Parent Package Variable" and type the name of the variable.
2.6 Click the 'Next' button and select the 'Value' property of the child package variable.
How to Implement?
Designed SSIS package like: (package design screenshot not included)
How to Implement?
IMPORT COLUMN - Reads data from files (e.g. image files whose paths are supplied in a column) and adds it as a column in the data flow.
LOOKUP - Performs the lookup (searching) of a given reference object set against a data
source. It is used to find exact matches only.
MERGE - Merges two sorted data sets of same column structure into a single output.
MERGE JOIN - Merges two sorted data sets into a single dataset using a join.
MULTI CAST - is used to create/distribute exact copies of the source dataset to one or
more destination datasets.
ROW COUNT - Stores the resulting row count from the data flow / transformation into a
variable.
ROW SAMPLING - Captures sample data from the data flow by specifying the number of rows to sample.
UNION ALL - Merges multiple data sets into a single dataset.
PIVOT - Used for normalization of data sources to reduce anomalies by converting rows into columns.
UNPIVOT - Used for denormalizing the data structure by converting columns into rows, e.g. when building data warehouses.
8. What are the different types of Transaction Options
Required: If a transaction already exists at the upper level, the current executable joins that transaction. If there is no transaction at the upper level, a new transaction is created automatically.
Supported: If there is a transaction at the upper level, the executable joins it; otherwise it does not create a new transaction.
Not Supported: The executable does not honour any transaction, i.e., it neither joins an existing transaction nor creates a new one.
9. Explain about Checkpoints with properties
Checkpoint is used to restart the package execution from the point of failure rather than
from initial start.
Set the following Properties:
CheckpointFileName: Specifies the name of the checkpoint file.
CheckpointUsage: Never, IfExists, Always
SaveCheckpoints: indicates whether the package needs to save checkpoints. This
property must be set to True to restart a package from a point of failure.
FailPackageOnFailure: must be set to True on each task or container that should participate in checkpoints.
Checkpoint mechanism uses a Text File to mark the point of package failure.
These checkpoint files are automatically created at a given location upon the package
failure and automatically deleted once the package ends up with success.
Select the 'Load to Sql Table' Data Flow Task. Navigate to the 'Event Handlers' tab.
Drag and drop an 'Execute Sql Task'. Open the Execute Sql Task Editor and, in the 'Parameter
Mapping' section, select the system variables as follows:
Create a table in Sql Server Database with Columns as: PackageID, PackageName,
TaskID, TaskName, ErrorCode, ErrorDescription.
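A sketch of what the logging table and the Execute Sql Task statement could look like (the table name is hypothetical; with an OLE DB connection the ? markers are mapped on the Parameter Mapping page to System::PackageID, System::PackageName, System::SourceID, System::SourceName, System::ErrorCode and System::ErrorDescription):

CREATE TABLE dbo.SSISErrorLog (
    PackageID        uniqueidentifier,
    PackageName      nvarchar(200),
    TaskID           uniqueidentifier,
    TaskName         nvarchar(200),
    ErrorCode        int,
    ErrorDescription nvarchar(4000)
);

-- SQLStatement of the Execute Sql Task inside the OnError event handler
INSERT INTO dbo.SSISErrorLog
    (PackageID, PackageName, TaskID, TaskName, ErrorCode, ErrorDescription)
VALUES (?, ?, ?, ?, ?, ?);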
Foreach ADO:
The ADO Enumerator enumerates rows in a table. For example, we can get the rows in an ADO recordset. The variable must be of the Object data type.
Foreach ADO.NET Schema Rowset:
The ADO.Net Enumerator enumerates the schema information. For example, we can get
the table from the database.
Foreach File:
The File Enumerator enumerates files in a folder. For example, we can get all the files
which have the *.txt extension in a windows folder and its sub folders.
Foreach From Variable:
The Variable Enumerator enumerates objects that specified variables contain. Here
enumerator objects are nothing but an array or data table.
Foreach Item:
The Item Enumerator enumerates the collections. For example, we can enumerate the
names of executables and working directories that an Execute Process task uses.
Foreach Nodelist:
The Node List Enumerator enumerates the result of an XPath expression.
Foreach SMO:
The SMO Enumerator enumerates SQL Server Management Objects (SMO). For example,
we can get the list of functions or views in a SQL Server database.
29. How to execute the package from .NET?
We need a reference to Microsoft.SqlServer.ManagedDts.dll to call a package.

using Microsoft.SqlServer.Dts.Runtime;

Application app = new Application();
Package package = null;
package = app.LoadPackage(@"C:\Program Files\Microsoft SQL Server\100\DTS\Packages\Integration Services Project2\Package.dtsx", null);
Microsoft.SqlServer.Dts.Runtime.DTSExecResult results = package.Execute();
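If you also need to react to the outcome, the returned DTSExecResult can be inspected; a minimal sketch continuing the code above:

if (results == DTSExecResult.Failure)
{
    // Package.Errors holds the errors raised during execution.
    foreach (DtsError error in package.Errors)
    {
        Console.WriteLine(error.Description);
    }
}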
30. How to schedule a package (Role of Sql Server Agent)
In order for the job to run successfully, the SQL Server agent should be running on the
target machine.
We can start the SQL Server Agent Services in numerous ways like:
Containers: Foreach Loop Container, For Loop Container, Sequence Container.
memory as a result of this your extraction will fail. So it is recommended to set these
values to an optimum value based on your environment.
#7 - DefaultBufferSize and DefaultBufferMaxRows :
The execution tree creates buffers for storing incoming rows and performing
transformations.
The number of buffers created depends on how many rows fit into a buffer, and how many
rows fit into a buffer depends on a few other factors. The first consideration is the
estimated row size, which is the sum of the maximum sizes of all the columns from the
incoming records. The second consideration is the DefaultBufferMaxSize property of the
data flow task. This property specifies the default maximum size of a buffer. The default
value is 10 MB and its upper and lower boundaries are constrained by two internal
properties of SSIS which are MaxBufferSize (100MB) and MinBufferSize (64 KB). It
means the size of a buffer can be as small as 64 KB and as large as 100 MB. The third
factor is, DefaultBufferMaxRows which is again a property of data flow task which
specifies the default number of rows in a buffer. Its default value is 10000.
If the size exceeds the DefaultBufferMaxSize then it reduces the rows in the buffer. For
better buffer performance you can do two things.
First you can remove unwanted columns from the source and set data type in each
column appropriately, especially if your source is flat file. This will enable you to
accommodate as many rows as possible in the buffer.
Second, if your system has sufficient memory available, you can tune these properties to
have a small number of large buffers, which could improve performance. Beware if you
change the values of these properties to a point where page spooling (see Best Practices
#8) begins, it adversely impacts performance. So before you set a value for these
properties, first test them thoroughly in your environment and set the values appropriately.
#8 - How DelayValidation property can help you
SSIS uses two types of validation.
First is package validation (early validation) which validates the package and all its
components before starting the execution of the package.
Second, SSIS uses component validation (late validation), which validates the components of the package once package execution has started.
Let's consider a scenario where the first component of the package creates an object i.e.
a temporary table, which is being referenced by the second component of the package.
During package validation, the first component has not yet executed, so no object has
been created causing a package validation failure when validating the second
component. SSIS will throw a validation exception and will not start the package
execution. So how will you get this package running in this common scenario?
To help you in this scenario, every component has a DelayValidation property (default = FALSE). If you set it to TRUE, early validation will be skipped and the component will be validated only at the component level (late validation), which happens during package execution.
9. Better performance with parallel execution
10. When to use events logging and when to avoid.
The first step to setting up the proxy is to create a credential (alternatively you could use
an existing credential). Navigate to Security then Credentials in SSMS Object Explorer
and right click to create a new credential
Navigate to SQL Server Agent then Proxies in SSMS Object Explorer and right click to
create a new proxy
38. How to execute a Stored Procedure from SSIS
Using the Execute SQL Task.
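For example, with an OLE DB connection the SQLStatement property of the Execute SQL Task could be set as below (procedure name is hypothetical), with the ? marker mapped to a package variable on the Parameter Mapping page:

EXEC dbo.usp_LoadDailySales ?;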
39. How to deploy packages from one server to another server
1.To copy the deployment bundle
Locate the deployment bundle on the first server.
If you used the default location, the deployment bundle is the Bin\Deployment folder.
Right-click the Deployment folder and click Copy.
Locate the public share to which you want to copy the folder on the target computer and
click Paste.
2: Running the Package Installation Wizard
1. On the destination computer, locate the deployment bundle.
2. In the Deployment folder, double-click the manifest file,
Project1.SSISDeploymentManifest.
3. On the Welcome page of the Package Installation Wizard, click Next.
4. On the Deploy SSIS Packages page, select either the File system or SQL Server
deployment option, select the "Validate packages after installation" check box, and
then click Next.
5. On the Specify Target SQL Server page, specify (local), in the Server name box.
6. If the instance of SQL Server supports Windows Authentication, select Use Windows
Authentication; otherwise, select Use SQL Server Authentication and provide a user
name and a password.
7. Verify that the "Rely on server storage for encryption" check box is cleared.
Click Next.
8. On the Select Installation Folder page, click Browse.
9. On the Confirm Installation page, click Next.
10. The wizard installs the packages. After installation is completed, the Configure
Packages page opens.
Right-click the Solution in Solution Explorer and choose Properties from the menu.
This screen lets you select where the packages shall be deployed. As mentioned in the dialog box, deploying to SQL Server is more secure, since SQL Server stores the packages internally, compared to the File System, where additional security measures need to be taken to secure the physical files.
44. Difference between Merge and UnionAll Transformations
The Union All transformation combines multiple inputs into one output. The
transformation inputs are added to the transformation output one after the other; no
reordering of rows occurs.
The Merge transformation combines two sorted data sets of the same column structure into a single output. The rows from each dataset are inserted into the output based on values in their key columns.
The Merge transformation is similar to the Union All transformation. Use the Union All transformation instead of the Merge transformation in the following situations:
- The source input rows do not need to be sorted.
- The combined output does not need to be sorted.
At run time, the FTP task connects to a server by using an FTP connection manager. The
FTP connection manager includes the server settings, the credentials for accessing the
FTP server, and options such as the time-out and the number of retries for connecting to
the server.
The FTP connection manager supports only anonymous authentication and basic
authentication. It does not support Windows Authentication.
Predefined FTP Operations:
Send Files,
Receive File,
Create Local directory,
Remove Local Directory,
Create Remote Directory, Remove Remote Directory
Delete Local Files,
Delete Remote File
Custom Log Entries available on the FTP Task:
FTPConnectingToServer
FTPOperation
TOKEN: This function allows you to return a substring by using delimiters to separate a
string into tokens and then specifying which occurrence to
return: TOKEN(character_expression, delimiter_string, occurrence)
TOKENCOUNT: This function uses delimiters to separate a string into tokens and then
returns the count of tokens found within the string: TOKENCOUNT(character_expression,
delimiter_string)
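As an illustration (SSIS expression language, arbitrary sample string):

TOKEN("2012-10-25", "-", 2) returns "10" (the occurrence argument is 1-based)
TOKENCOUNT("2012-10-25", "-") returns 3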
6. Easy Column Remapping in Data Flow (Mapping Data Flow Columns) -When modifying
a data flow, column remapping is sometimes needed -SSIS 2012 maps columns on name
instead of id -It also has an improved remapping dialog
7. Shared Connection Managers: To create connection managers at the project level that can be shared by multiple packages in the project. The connection manager you create at the project level is automatically visible in the Connection Managers tab of the SSIS Designer window for all packages. -When converting shared connection managers back to regular (package) connection managers, they disappear from all other packages.
8. Scripting Enhancements: The Script Task and Script Component now support .NET Framework 4.0. Breakpoints are supported in the Script Component.
9. ODBC Source and Destination - -ODBC was not natively supported in 2008 -SSIS
2012 has ODBC source & destination -SSIS 2008 could access ODBC via ADO.NET
10. Reduced Memory Usage by the Merge and Merge Join Transformations The old
SSIS Merge and Merge Join transformations, although helpful, used a lot of system
resources and could be a memory hog. In 2012 these tasks are much more robust and
reliable. Most importantly, they will not consume excessive memory when the multiple
inputs produce data at uneven rates.
11. Undo/Redo: One thing that annoyed users in SSIS before 2012 was the lack of support for Undo and Redo. Once you performed an operation, you could not undo it. Now in SSIS 2012, we have undo/redo support.
52. Difference between Script Task and Script Component in SSIS.
The Script Task is used in the Control Flow; the Script Component is used in the Data Flow.
Purpose: A Script Task can accomplish almost any general-purpose task. With the Script Component you must specify whether you want to create a source, transformation, or destination.
Interaction with the Package: In the code written for a Script Task, you use the Dts property to access other features of the package. The Dts property is a member of the ScriptMain class.
Using Connections: The Script Task uses the Connections property of the Dts object to access connection managers defined in the package. For example:
string myFlatFileConnection;
myFlatFileConnection = (Dts.Connections["Test Flat File Connection"].AcquireConnection(Dts.Transaction) as String);
Other areas of difference include Raising Results, Raising Events, Execution, the Editor, and Using Variables.
ADO enumerator saves the value from each column of the current row into a separate
package variable. Then, the tasks that you configure inside the Foreach Loop container
read those values from the variables and perform some action with them.
61. Delay Validation, Forced Execution
Delay Validation: validation takes place during package execution.
Early Validation: validation takes place just before package execution starts.
62. Transfer Database Task
used to move a database to another SQL Server instance or create a copy on the same
instance (with different database name). This task works in two modes: Offline, Online.
Offline: In this mode, the source database is detached from the source server after
putting it in single user mode, copies of the mdf, ndf and ldf files are moved to specified
network location. On the destination server the copies are taken from the network
location to the destination server and then finally both databases are attached on the
source and destination servers. This mode is faster, but a disadvantage of this mode is that the source database will not be available during the copy and move operation. Also, the person executing the package with this mode must be a sysadmin on both the source and destination instances.
Online: The task uses SMO to transfer the database objects to the destination server. In
this mode, the database is online during the copy and move operation, but it will take
longer as it has to copy each object from the database individually. Someone executing
the package with this mode must
be either sysadmin or database owner of the specified databases.
63. Transfer SQL Server Object Task
Used to transfer one or more SQL Server objects to a different database, either on the
same or another SQL Server instance. You can transfer tables, views, Indexes, stored
procedures, User defined functions, Users, Roles etc.
1. Set the RetainSameConnection property of the connection manager to TRUE so that a temporary table created in one Control Flow task can be retained in another task. RetainSameConnection means that the temp table will not be deleted when the task is completed.
2. Create a data-flow task that consumes your global temp table in an OLE DB Source
component.
3. Set DelayValidation=TRUE on the data-flow task, which means that the task will not check whether the table exists until the task actually runs (see the sketch below).
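A sketch of the pattern (table and column names are hypothetical): an Execute SQL Task running on the RetainSameConnection=TRUE connection manager first creates the table, and the OLE DB Source in the downstream data flow then reads from it.

-- Execute SQL Task
CREATE TABLE ##StagingOrders (OrderID int, Amount money);

-- OLE DB Source query in the data flow (same connection manager)
SELECT OrderID, Amount FROM ##StagingOrders;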
68. How to Lock a variable in Script Task?
public void Main()
{
    // Lock the variable for read access; the dispenser fills the Variables collection.
    Variables vars = null;
    Dts.VariableDispenser.LockOneForRead("varName", ref vars);

    // Do something with the value...
    object value = vars["varName"].Value;

    // Always release the lock when done.
    vars.Unlock();

    Dts.TaskResult = (int)ScriptResults.Success;
}
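To lock several variables in one go, the dispenser can queue multiple locks before acquiring them; a minimal sketch (variable names are hypothetical):

public void Main()
{
    Variables vars = null;

    // Queue the locks first, then acquire them all at once.
    Dts.VariableDispenser.LockForRead("User::SourceFolder");
    Dts.VariableDispenser.LockForWrite("User::RowCount");
    Dts.VariableDispenser.GetVariables(ref vars);

    // Read one variable and write another.
    string folder = vars["User::SourceFolder"].Value.ToString();
    vars["User::RowCount"].Value = 100;

    vars.Unlock();
    Dts.TaskResult = (int)ScriptResults.Success;
}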
69. How to pass property value at Run time?
A property value like connection string for a Connection Manager can be passed to the
package using package configurations.
70. How to skip first 5 lines in each Input flat file?
In the Flat File Connection Manager Editor, set the 'Header rows to skip' property to 5.
71. Parallel processing in SSIS
To support parallel execution of different tasks in the package, SSIS uses 2 properties:
1. MaxConcurrentExecutables: defines how many tasks can run simultaneously, by specifying the maximum number of SSIS threads that can execute in parallel per package. The default is -1, which equates to the number of (logical) processors plus 2.
2. EngineThreads: is property of each DataFlow task. This property defines how many
threads the data flow engine can create and run in parallel. The EngineThreads property
applies equally to both the source threads that the data flow engine creates for sources
and the worker threads that the engine creates for transformations and destinations.
Therefore, setting EngineThreads to 10 means that the engine can create up to ten
source threads and up to ten worker threads.
72. How do we convert data type in SSIS?
The Data Conversion Transformation in SSIS converts the data type of an input column
to a different data type.
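A Derived Column transformation can achieve the same thing with an explicit cast in the SSIS expression language; for example (column name is hypothetical):

(DT_WSTR, 50)[SourceColumn]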
3. Enter the following:
="javascript:void(window.open('http://servername?%2freportserver%2fpathto
%2freport&rs:Command=Render'))"
4. Parameterized Solution
Assume you have a field called ProductCode. Normally, you might hard code that like
this:
http://servername/reportserver?%2fpathto
%2freport&rs:Command=Render&ProductCode=123
In this case, you want to pass variables dynamically, using an available value from the
source dataset. You can think of it like this:
http://servername/reportserver?%2fpathto
%2freport&rs:Command=Render&ProductCode=Fields!ProductCode.Value
The exact syntax in the "Jump to URL" (Fx) expression window will be:
="javascript:void(window.open('http://servername/reportserver?%2fpathto
%2freport&rs:Command=Render&ProductCode="+Fields!ProductCode.Value+"'))"
4. How to pass parameter from chart to Table in same report?
5. How to apply custom Colors of chart report?
STEP 1:
Create your custom color palette in the report using custom code. To do so, click Report => Report Properties => Code and copy the code below:
1. The total time to generate a report (RDL) can be divided into 3 elements:
Time to retrieve the data (TimeDataRetrieval).
Time to process the report (TimeProcessing)
Time to render the report (TimeRendering)
Total time = (TimeDataRetrieval) + (TimeProcessing) + (TimeRendering)
These 3 performance components are logged every time a deployed report is executed. This information can be found in the ExecutionLogStorage table in the ReportServer database.
SELECT TOP 10 Itempath, parameters,
TimeDataRetrieval + TimeProcessing + TimeRendering as [total time],
TimeDataRetrieval, TimeProcessing, TimeRendering,
ByteCount, [RowCount],Source, AdditionalInfo
FROM ExecutionLogStorage
ORDER BY Timestart DESC
2. Use the SQL Profiler to see which queries are executed when the report is generated.
Sometimes you will see more queries being executed than you expected. Every dataset
in the report will be executed. A lot of times new datasets are added during building of
reports. Check if all datasets are still being used. For instance, datasets for available
parameter values. Remove all datasets which are not used anymore.
3. Sometimes a dataset contains more columns than used in the Tablix\list. Use only
required columns in the Dataset.
4. The ORDER BY in the dataset differs from the ORDER BY in the Tablix/list. You need to decide where the data will be sorted: it can be done within SQL Server with an ORDER BY clause, or by the Reporting Services engine. It is not useful to do it in both. If an index is available, use the ORDER BY in your dataset.
5. Use the SQL Profiler to measure the performance of all datasets (Reads, CPU and
Duration). Use the SQL Server Management Studio (SSMS) to analyze the execution plan
of every dataset.
6. Avoid datasets with large result sets, e.g. more than 1000 records. A lot of times data is GROUPED in the report without a drill-down option. In that scenario, do the GROUP BY already in your dataset. This reduces the data transferred from SQL Server and saves the reporting engine from having to group the result set.
7. Rendering of the report can take a while if the result set is very big. Consider carefully whether such a big result set is necessary. If details are used in only 5% of the situations, create another report to display the details. This avoids retrieving all the details in 95% of the situations.
12. I have a 'State' column in the report; display the States in bold whose State name starts with the letter 'A' (e.g., Andhra Pradesh and Assam should be in bold).
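One way (a sketch, assuming the field is called State) is to set the FontWeight property of the State textbox with an expression:

=IIF(Left(Fields!State.Value, 1) = "A", "Bold", "Normal")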
13. In which scenario you used Matrix Report
Use a matrix to display aggregated data summaries, grouped in rows and columns,
similar to a PivotTable or crosstab. The number of rows and columns for groups is
determined by the number of unique values for each row and column groups.
14. Image control in SSRS
An image is a report item that contains a reference to an image that is stored on the
report server, embedded within the report, or stored in a database.
Image Source : Embedded
Local report images are embedded in the report and then referenced. When you embed
an image, Report Designer MIME-encodes the image and stores it as text in the report
definition.
When to Use:
When image is embedded locally within the report.
When you are required to store all images within the report definition.
To create a shared dataset, you must use an application that creates a shared dataset
definition file (.rsd). You can use one of the following applications to create a shared
dataset:
1. Report Builder: Use shared dataset design mode and save the shared dataset to a
report server or SharePoint site.
2. Report Designer in BIDS: Create shared datasets under the Shared Dataset folder in
Solution Explorer. To publish a shared dataset, deploy it to a report server or SharePoint
site.
Upload a shared dataset definition (.rsd) file. You can upload a file to the report server or
SharePoint site. On a SharePoint site, an uploaded file is not validated against the
schema until the shared dataset is cached or used in a report.
The shared dataset definition includes a query, dataset parameters including default
values, data options such as case sensitivity, and dataset filters.
18. How do you display partial text in bold format in a textbox in the report? (e.g. FirstName LastName, where "FirstName" should be in bold font and "LastName" should be in normal font.)
Use a Placeholder.
19. How to Keep Headers Visible When Scrolling Through a Report?
1. Right-click the row, column, or corner handle of a tablix data region, and then click
Tablix Properties.
2. On the General tab, under Row Headers or Column Headers, select Header should
remain visible while scrolling.
3. Click OK.
21. A main report contain subreport also. Can we export both main report and
subreport to Excel?
Yes. The exported report contains both the main report and the subreport.
22. how to convert PDF report from Portrait to Landscape format?
In Report Properties -->
Set the width of the report to the landscape size of your A4 paper: 29.7 cm
Set the height of the report to 21 cm.
To avoid extra blank pages during export, the size of the body should be less or equal to
the size of the report - margins.
Set the width of the body to 26.7 cm (29.7 -1.5 - 1.5)
Set the height of the body to 18 cm (21 - 1.5 -1.5)
23. Error handling in Report
Step 1: All the datasets of the report should contain one additional input parameter which passes a unique value for every request (for every click of the View Report button) made by the user.
Step 2: Need to implement TRY CATCH blocks for all the Stored procedures used in the
SSRS reports through datasets. The CATCH section of every procedure should have the
provision to save the error details into DB table, if any error occurred while execution of
that procedure.
Step 3: Add one more dataset with the name "ErrorInfo" which calls the stored procedure (USP_ERROR_INFO). This procedure accepts a unique value. This unique value should be passed to all the datasets for every click of the 'View Report' button made by the user. This dataset will return the error information available in the database table by matching records on the unique id that was passed as the input parameter.
Step 4: Enable the "Use Single Transaction When Processing Queries" option in the data source properties, which makes all the query executions run in a single transaction.
Step 5: After successful completion of all the above mentioned steps, insert new table on
SSRS report with custom error information which will be shown to the report user if the
user gets any error during execution of the report.
3. There are 2 options for deploying the reports that you create with Report Builder 3.0:
1. Report Manager
2. SharePoint document library
26. Difference between Cached Report and Snapshot Report
Cached Report is a saved copy of processed report.
The first time a user clicks the link for a report configured to cache, the report execution
process is similar to the on-demand process. The intermediate format is cached and
stored in ReportServerTempDB Database until the cache expiry time.
If a user requests a different set of parameter values for a cached report, the report processor treats the request as a new report executing on demand, but flags it as a second cached instance.
Report snapshot contains the Query and Layout information retrieved at specific point of
time. It executes the query and produces the intermediate format. The intermediate
format of the report has no expiration time like a cached instance, and is stored in
ReportServer Database.
27. Subscription. Different types of Subscriptions?
Subscriptions are used to deliver the reports to either File Share or Email in response to
Report Level or Server Level Schedule.
There are 2 types of subscriptions:
1. Standard Subscription: Static properties are set for Report Delivery.
2. Data Driven Subscription: Dynamic Runtime properties are set for Subscriptions
28. SSRS Architecture
29. How to deploy Reports from one server to other server
30. Different life cycles of Report
1. Report authoring:
This stage involves the creation of reports that are published using the Report Definition Language (RDL). RDL is an XML-based industry standard for defining reports.
Report Designer is a full-featured report authoring tool that runs in Business Intelligence Development Studio, and Report Builder.
2. Report management:
This involves managing the published reports as a part of the webservice. The reports
are cached for consistency and performance. They can be executed whenever demanded
or can be scheduled and executed.
In short Report Management includes:
- Organizing reports and data sources,
- Scheduling report execution and delivery
- Tracking reporting history.
3. Report delivery:
Reports can be delivered to the consumers either on their demand or based on an event. They can then view them in a web-based format.
Web based delivery via Report Manager web site
Subscriptions allow for automated report delivery
URL Access, Web Services and Report Viewer control
4.Report security:
control, because the Report Viewer control does not perform any data processing in local processing mode, but uses data that the host application supplies.
35. Difference between Sorting and Interactive Sorting?
To control the sort order of data in a report, you must set the sort expression on the data region or group; the user does not have control over sorting.
You can provide control to the user by adding Interactive Sort buttons to toggle between
ascending and descending order for rows in a table or for rows and columns in a matrix.
The most common use of interactive sort is to add a sort button to every column header.
The user can then choose which column to sort by.
36. What is Report Builder
A Windows WinForms application for end users to build ad-hoc reports with the help of report models.
37. Difference between Table report and Matrix Report
A Table Report has a fixed number of columns and dynamic rows.
A Matrix Report has dynamic rows and dynamic columns.
38. When to use Table, Matrix and List
1. Use a Table to display detail data, organize the data in row groups, or both.
2. Use a matrix to display aggregated data summaries, grouped in rows and columns,
similar to a PivotTable or crosstab. The number of rows and columns for groups is
determined by the number of unique values for each row and column groups.
3. Use a list to create a free-form layout. You are not limited to a grid layout, but can
place fields freely inside the list. You can use a list to design a form for displaying many
dataset fields or as a container to display multiple data regions side by side for grouped
data. For example, you can define a group for a list; add a table, chart, and image; and
display values in table and graphic form for each group value
39. Report Server Configuration Files
1. RSReportServer.config:
Stores configuration settings for feature areas of the Report Server service: Report
Manager, the Report Server Web service, and background processing.
2. RSSrvPolicy.config
Stores the code access security policies for the server extensions.
3. RSMgrPolicy.config
Stores the code access security policies for Report Manager.
4. Web.config for the Report Server Web service
Includes only those settings that are required for ASP.NET.
5. ReportingServicesService.exe.config
6. Registry settings
7. Web.config for Report Manager
Includes only those settings that are required for ASP.NET
8. RSReportDesigner.config
9. RSPreviewPolicy.config
40. Difference between a Report and adhoc Report
Ad hoc reporting allows end users to design and create reports on their own, provided they are given the data models.
An adhoc report is created from an existing report model using Report Builder.
The order of the dataset execution sequence is determined by the top-down order in which the datasets appear in the RDL file, which also corresponds to the order shown in Report Designer.
49. ReportServer and ReportServerTempDB Databases
ReportServer: hosts the report catalog and metadata.
For eg: keeps the catalog items in the Catalog table, the data source information in the
Data-Source table of ReportServer Database.
ReportServerTempDB: used by RS for caching purposes.
For eg: once the report is executed, the Report Server saves a copy of the report in the
ReportServerTempDB database.
In MOLAP, the aggregation structure along with the data values are stored in multidimensional format; it takes more space but less time for data analysis compared to ROLAP.
MOLAP offers faster query response and processing times, but has higher latency and requires an average amount of storage space. This storage mode leads to duplication of data, as the detail data is present in both the relational and the multidimensional storage.
In HOLAP, detail data is stored in the relational model and aggregations are stored in the multidimensional model, which provides optimal usage of space.
This storage mode offers optimal storage space, query response time, latency and fast processing times.
The default storage setting is MOLAP.
3. Types of Dimensions
Dimension types include Regular, Promotion, and others (see question 35 below for the full list).
4. Types of Measures
Fully Additive Facts: These are facts which can be added across all the associated
dimensions. For example, sales amount is a fact which can be summed across different
dimensions like customer, geography, date, product, and so on.
Semi-Additive Facts: These are facts which can be added across only few dimensions
rather than all dimensions. For example, bank balance is a fact which can be summed
across the customer dimension (i.e. the total balance of all the customers in a bank at
the end of a particular quarter). However, the same fact cannot be added across the
date dimension (i.e. the total balance at the end of quarter 1 is $X million and $Y million
at the end of quarter 2, so at the end of quarter 2, the total balance is only $Y million
and not $X+$Y).
Non-Additive Facts: These are facts which cannot be added across any of the dimensions
in the cube. For example, profit margin is a fact which cannot be added across any of the
dimensions. For example, if product P1 has a 10% profit and product P2 has a 10%
profit then your net profit is still 10% and not 20%. We cannot add profit margins
across product dimensions. Similarly, if your profit margin is 10% on Day1 and 10% on
Day2, then your net Profit Margin at the end of Day2 is still 10% and not 20%.
Derived Facts: Derived facts are the facts which are calculated from one or more base
facts, often by applying additional criteria. Often these are not stored in the cube and
are calculated on the fly at the time of accessing them. For example, profit margin.
Factless Facts: A factless fact table is one which only has references (Foreign Keys) to
the dimensions and it does not contain any measures. These types of fact tables are
often used to capture events (valid transactions without a net change in a measure
value). For example, a balance enquiry at an automated teller machine (ATM). Though
there is no change in the account balance, this transaction is still important for analysis
purposes.
Textual Facts: Textual facts refer to the textual data present in the fact table, which is
not measurable (non-additive), but is important for analysis purposes. For example,
codes (i.e. product codes), flags (i.e. status flag), etc.
5. Types of relationships between dimensions and measuregroups.
No relationship: The dimension and measure group are not related.
Regular: The dimension table is joined directly to the fact table.
Referenced: The dimension table is joined to an intermediate table, which, in turn, is joined to the fact table.
Many to many: The dimension table is joined to an intermediate fact table; the intermediate fact table is joined, in turn, to an intermediate dimension table to which the fact table is joined.
Data mining:The target dimension is based on a mining model built from the source
dimension. The source dimension must also be included in the cube.
Fact table: The dimension table is the fact table.
6. Proactive caching
Proactive caching can be configured to refresh the cache (MOLAP cache) either on a predefined schedule or in response to an event (change in the data) from the underlying
relational database. Proactive caching settings also determine whether the data is
queried from the underlying relational database (ROLAP) or is read from the outdated
MOLAP cache, while the MOLAP cache is rebuilt.
Proactive caching helps in minimizing latency and achieving high performance.
It enables a cube to reflect the most recent data present in the underlying database by
automatically refreshing the cube based on the predefined settings.
Lazy aggregations:
When we reprocess an SSAS cube, it brings new/changed relational data into the cube by reprocessing dimensions and measures. Partition indexes and aggregations might be dropped due to changes in related dimension data, so the aggregations and partition indexes need to be reprocessed, which can take additional time.
If you want to bring the cube online sooner, without waiting for the partition indexes and aggregations to be rebuilt, the lazy processing option can be chosen. Lazy processing brings the SSAS cube online as soon as the dimensions and measures are processed; partition indexes and aggregations are built later as a background job.
Advantage: Lazy processing saves processing time, as it brings the cube online as soon as measure and dimension data is ready.
Disadvantage: Users will see a performance hit while aggregations are being built in the background.
7. Partition processing options
Process Default: SSAS dynamically chooses from one of the following process options.
Process Full: Drops all object stores and rebuilds the objects. This option is used when a structural change has been made to an object, for example, when an attribute hierarchy is added, deleted, or renamed.
Process Update: Forces a re-read of data and an update of dimension attributes. Flexible
aggregations and indexes on related partitions will be dropped.
Process Add: For dimensions, adds new members and updates dimension attribute
captions and descriptions.
Process Data: Processes data only, without building aggregations or indexes. If there is data in the partitions, it will be dropped before re-populating the partition with source data.
Process Index: Creates or rebuilds indexes and aggregations for all processed partitions.
For unprocessed objects, this option generates an error.
Unprocess: Delete data from the object.
Process Structure: Drop the data and perform process default on all dimensions.
Process Clear: Drops the data in the object specified and any lower-level constituent
objects. After the data is dropped, it is not reloaded.
Process Clear Structure: Removes all training data from a mining structure.
8. Difference between attribute hierarchy and user hierarchy
An Attribute Hierarchy is created by SSAS for every Attribute in a Dimension by default.
An Attribute by default contains only two levels - An "All" level and a
"Detail" level which is nothing but the Dimension Members.
A User Defined Hierarchy is defined explicitly by the user/developer and often contains
multiple levels. For example, a Calendar Hierarchy contains Year,
Quarter, Month, and Date as its levels.
Some of the highlights/differences of Attribute and User Defined Hierarchies:
1. Attribute Hierarchies are always Two-Level (Unless All Level is suppressed) whereas
User Defined Hierarchies are often Multi-Level.
2. By default, Every Attribute in a Dimension has an Attribute Hierarchy whereas User
Defined Hierarchies have to be explicitly defined by the user/developer.
3. Every Dimension has at least one Attribute Hierarchy by default whereas every
Dimension does not necessarily contain a User Defined Hierarchy. In essence, a
Dimension can contain zero, one, or more User Defined Hierarchies.
4. Attribute Hierarchies can be enabled or disabled. Disable the Attribute Hierarchy for
those attributes which are commonly not used to slice and dice the data during analysis,
like Address, Phone Number, and Unit Price etc. Doing this will improve the cube
processing performance and also reduces the size of the cube as those attributes are not
considered for performing aggregations.
5. Attribute Hierarchies can be made visible or hidden. When an Attribute Hierarchy is
hidden, it will not be visible to the client application while browsing the Dimension/Cube.
Attribute Hierarchies for those attributes which are part of the User Defined Hierarchies,
like Day, Month, Quarter, and Year, which are part of the Calendar Hierarchy, can be
hidden, since the attribute is available to the end users through the User Defined
Hierarchy and helps eliminate the confusion/redundancy for end users.
9. Dimension, Hierarchy, Level, and Members
Dimensions in Analysis Services contain attributes that correspond to columns in
dimension tables. These attributes appear as attribute hierarchies and can
be organized into user-defined hierarchies, or can be defined as parent-child hierarchies
based on columns in the underlying dimension table. Hierarchies are used to organize
measures that are contained in a cube.
Hierarchy: is the relation between attributes in a dimension.
Level: refers to individual attribute within the Hierarchy.
10. Difference between database dimension and cube dimension
When you create a dimension using the dimension wizard in BIDS, you are creating a Database dimension in your AS database. Database dimensions are independent of cubes and can be processed on their own.
When you build a cube, and you add dimensions to that cube, you create cube
dimensions: cube dimensions are instances of a database dimension inside a cube.
A database dimension can be used in multiple cubes, and multiple cube dimensions can
be based on a single database dimension
The Database dimension has only Name and ID properties, whereas a Cube dimension has several more properties.
A Database dimension is created once, whereas a Cube dimension is a reference to a database dimension.
A Database dimension exists only once, whereas more than one Cube dimension can be created from it using the role-playing dimensions concept.
11. Importance of CALCULATE keyword in MDX script, data pass and limiting
cube space
12. Effect of materialize
When setting up a dimension with a Reference relationship type, we have the option of "materializing" the dimension.
Select it to store the attribute member in the intermediate dimension that links the attribute in the reference dimension to the fact table in the MOLAP structure. This improves the query performance, but increases the processing time and storage space.
If the option is not selected, only the relationship between the fact records and the intermediate dimension is stored in the cube. This means that Analysis Services has to derive the aggregated values for the members of the referenced dimension when a query is executed, resulting in slower query performance.
13. Partition processing and Aggregation Usage Wizard
14. Perspectives, Translations, Linked Object Wizard
15. Handling late arriving dimensions / early arriving facts
16. Role playing Dimensions, Junk Dimensions, Conformed Dimensions, SCD
and other types of dimensions
Role-Playing Dimension:
A Role-Playing Dimension is a Dimension which is connected to the same Fact Table
multiple times using different Foreign Keys.
eg: Consider a Time Dimension which is joined to the same Fact Table (Say FactSales)
multiple times, each time using a different Foreign Key in the Fact
Table like Order Date, Due Date, Ship Date, Delivery Date, etc
Steps:
In Cube Designer, click the Dimension Usage tab.
Either click the 'Add Cube Dimension' button, or right-click anywhere on the work surface
and then click Add Cube Dimension.
In the Add Cube Dimension dialog box, select the dimension that you want to add, and
then click OK.
A Conformed Dimension is a Dimension which connects to multiple Fact Tables across
one or more Data Marts (cubes). Conformed Dimensions are exactly the same structure,
attributes, values (dimension members), meaning and definition.
Example: A Date Dimension has exactly the same set of attributes, same members and
same meaning irrespective of which Fact Table it is connected to
A linked dimension is based on a dimension that is stored in a separate Analysis
Services Database which may or may not be on the same server. You can create and
maintain a dimension in just one database and then reuse that dimension by creating
linked dimensions for use in multiple databases.
Linked Dimensions can be used when the exact same dimension can be used across multiple cubes within an organization, like a Time dimension or a Geography dimension.
28. What do you understand by attribute relationship? what are the main
advantages in using attribute relationship?
An Attribute Relationship is a relationship between various attributes within a Dimension.
By default, every Attribute in a Dimension is related to the Key
Attribute.
There are basically 2 types of Attribute Relationships: Rigid, Flexible
29. What is natural hierarchy and how will you create it?
Natural hierarchies, where each attribute is related either directly or indirectly to all
other attributes in the same hierarchy, as in product category - product
subcategory - product name
30. What do you understand by rigid and flexible relationship? Which one is
better from performance perspective?
Rigid: Attribute Relationship should be set to Rigid when the relationship between those
attributes is not going to change over time. For example,
relationship between a Month and a Date is Rigid since a particular Date always belongs
to a particular Month like 1st Feb 2012 always belongs to Feb
Month of 2012. Try to set the relationship to Rigid wherever possible.
Flexible: Attribute Relationship should be set to Flexible when the relationship between
those attributes is going to change over time. For example, relationship between an
Employee and a Manager is Flexible since a particular Employee might work under one
manager during this year (time period) and under a different manager during next year
(another time period).
31. In which scenario, you would like to go for materializing dimension?
Reference dimensions let you create a relationship between a measure group and a
dimension using an intermediate dimension to act as a bridge between
them.
32. In dimension usage tab, how many types of joins are possible to form
relationship between measure group and dimension?
33. What is deploy, process and build?
Build: Verifies the project files and creates several local files.
Deploy: Deploys the structure of the cube (skeleton) to the server.
Process: Reads the data from the source and builds the dimensions and cube structures.
34. Can you create server time dimension in analysis services(Server time
dimension)?
35. How many types of dimension are possible in SSAS?
Account
Bill of Materials
Currency
Channel
Customer
Geography
Organizations
Products
promotion
Regular
Scenario
Time
Unary
36. What is time intelligence? How will you implement in SSAS?
37. What do you understand by linked cube or linked object feature in SSAS?
38. How will you write back to dimension using excel or any other client tool?
39. What do you understand by dynamic named set (SSAS 2008)? How is it different from a static named set?
40. In Process Update, which relationship will be better(Rigid and Flexible
relationship)?
43. What are different storage mode option in SQL server analysis services and
which scenario, they will be useful?
44. How will you implement data security for given scenario in analysis service
data?
"I have 4 cubes and 20 dimension. I need to give access to CEO, Operation
managers and Sales managers and employee.
1) CEO can see all the data of all 4 cubes.
2) Operation Managers can see only data related to their cube. There are four
operation managers.
3) Employees can see only certain dimension and measure groups data. (200
Employees) "
45. What are the options to deploy SSAS cube in production?
Right click on Project in Solution Explorer -> Properties
Build -> Select ' Output Path'
Deployment ->
Processing Option - Default, Full, Do Not Process
Transactional Deployment - False, True
Deployment Mode - Deploy All, Deploy Changes only
1.BIDS
In BIDS from the build menu select the build option (or right click on the project in the
solution explorer).
The build process will create four xml files in the bin subfolder of the project folder
.asdatabase - is the main object definition file
.configsettings
.deploymentoptions
.deploymenttargets
2. Deploy
Deployment via BIDS will overwrite the destination database management settings, so it is not recommended for production deployment.
A more controllable option is the Deployment wizard, available in interactive or command
line mode.
Run the wizard from Start -> All Programs ->Microsoft Sql Server -> Analysis Services
-> deployment wizard
1. Browse to the .asdatabase file created by the build
2. connect to the target server
3. Configure how partitions and roles should be deployed
4. specify how configuration settings are deployed
5. Specify Processing options:
Default processing allows SSAS to decide what needs to be done; Full processing can be
used to process all objects. You can also choose not to process at all.
6. choose whether to deploy instantly or to create an XMLA command script for later
deployment. The script will be created in the same location as the
.asdatabase file.
46. What are the options available to incrementally load relational data into
SSAS cube?
Use Slowly Changing Dimension.
47. Why will you use aggregation at remote server?
48. What are different ways to create aggregations in SSAS?
49. What do you understand by Usage based optimization?
50. Can we use different aggregation scheme for different partitions?
51. Why will you use perspective in SSAS?
52. What are KPIs? How will you create KPIs in SSAS?
53. What are the main feature differences in SSAS 2005 and SSAS 2008 from
developer point of view?
54.What are the aggregate functions available for measure in SSAS?
Sum, Min, Max, Count, and Distinct Count
55. What are the processing modes available for measure group? What do you
understand by lazy aggregation?
56. How can you improve dimension design?
1: Limit the Number of Dimensions Per Measure Group and Number of Attributes Per
Dimension.
AttributeHierarchyOptimizedState: Determines the level of optimization applied to the
attribute hierarchy. By default, an attribute hierarchy is FullyOptimized, which means
that Analysis Services builds indexes for the attribute hierarchy to improve query
performance. The other option, NotOptimized, means that no indexes are built for the
attribute hierarchy.
2: Use Dimension Properties Effectively
For large dimensions that expose millions of rows and have a large number of attributes,
pay particular attention to the ProcessingGroup property. By
default, this property is assigned a value of ByAttribute.
3: Use Regular Dimension Relationship Whenever Possible
4: Use Integer Data Type for Attribute Keys If at All Possible
5: Use Natural Hierarchies When Possible
57. What are the performance issues with parent child hierarchy?
In parent-child hierarchies, aggregations are created only for the key attribute and the
top attribute, i.e., the All attribute unless it is disabled.
58. What do you understand by formula engine and storage engine?
Formula Engine is single-threaded, Storage Engine (SE) is multi-threaded.
The Query Processor Cache/Formula Engine Cache caches the calculation results whereas
the Storage Engine Cache caches aggregated/fact data being
queried.
59. How can you improve overall cube performance?
1. Partitioning the cube can help to reduce the processing time. The benefit of partitioning is that it allows processing multiple partitions in parallel on a server with multiple processors.
MDX
1. Explain the structure of MDX query?
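At its simplest, an MDX query is a SELECT that places sets on axes, a FROM that names the cube, and an optional WHERE slicer; a minimal sketch (cube, hierarchy and measure names are assumptions in the Adventure Works style):

SELECT
    { [Measures].[Internet Sales Amount] } ON COLUMNS,
    { [Date].[Calendar Year].Members } ON ROWS
FROM [Adventure Works]
WHERE ( [Product].[Category].[Bikes] );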
2. MDX functions?
MDX KPI Functions:
KPICurrentTimeMember, KPIGoal, KPIStatus, KPITrend
KPIValue, KPIWeight
MDX Metadata Functions:
Axis, Count (Dimension), Count (Hierarchy Levels), Count (Tuple)
Hierarchy, Level, Levels, Name,Ordinal, UniqueName
MDX Navigation Functions:
Ancestor, Ancestors, Ascendants, Children
Cousin, Current, CurrentMember, CurrentOrdinal
DataMember, DefaultMember, FirstChild, FirstSibling
IsAncestor, IsGeneration, IsLeaf, IsSibling
Lag, LastChild, LastSibling, Lead
LinkMember, LookupCube, NextMember, Parent
PrevMember, Properties, Siblings, UnknownMember
MDX Other Functions:
CalculationCurrentPass, CalculationPassValue, CustomData, Dimension
Dimensions, Error, Item (Member), Item (Set)
Members (String), Predict, SetToArray
MDX Set Functions:
AddCalculatedMembers, AllMembers, BottomCount, BottomPercent
BottomSum, Crossjoin, Descendants, Distinct
Except, Exists, Extract, Filter
Generate, Head, Hierarchize, Intersect
} ON ROWS
FROM [Blog Statistics];
NONEMPTY():
The NonEmpty() returns the set of tuples that are not empty from a specified set, based
on the cross product of the specified set with a second set. Suppose we want to see all
the measures related to countries which have a non-null value for Subscribers
SELECT
{
[Measures].[Hits]
,[Measures].[Subscribers]
,[Measures].[Spam]
} ON COLUMNS
,{
NonEmpty
(
[Geography].[Country].Children
,[Measures].[Subscribers]
)
} ON ROWS
FROM [Blog Statistics];
11. Functions used commonly in MDX like Filter, Descendants, BAsc and others
12. Difference between NON EMPTY keyword and function,
NON_EMPTY_BEHAVIOR, ParallelPeriod, AUTOEXISTS
13. Difference between static and dynamic set
CREATE DYNAMIC SET MySet AS SomeSetExpression
or
CREATE STATIC SET MySet AS SomeSetExpression
A Dynamic Named Set respects the context of a query's subcube and the query's WHERE clause and is evaluated at the time the query is executed.
A Static Named Set is evaluated at the time the cube is processed and does not respect any subcube context or slicers in the WHERE clause.
14. Difference between natural and unnatural hierarchy, attribute relationships
15. Difference between rigid and flexible relationships
16. Write MDX for retrieving top 3 customers based on internet sales amount?
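A sketch using TopCount against an Adventure Works-style cube (hierarchy and measure names are assumptions):

SELECT
    { [Measures].[Internet Sales Amount] } ON COLUMNS,
    TopCount(
        [Customer].[Customer].[Customer].Members,
        3,
        [Measures].[Internet Sales Amount]
    ) ON ROWS
FROM [Adventure Works];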
17. Write MDX to find current month's start and end date?
18. Write MDX to compare current month's revenue with last year same month
revenue?
19. Write MDX to find MTD(month to date), QTD(quarter to date) and YTD(year
to date) internet sales amount for top 5 products?
1) OLE DB connection: Used to connect to any data source requiring an OLE DB connection (e.g., SQL Server 2000).
2) Flat file connection: Used to make a connection to a single file in the file system. Required for reading information from a file system flat file.
3) ADO.NET connection: Uses the .NET provider to make a connection to SQL Server 2005, or another connection exposed through managed code (like C#) in a custom task.
4) Analysis Services connection: Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and the Analysis Services Processing Task.
5) File connection: Used to reference a file or folder. The options are to either use or create a file or folder.
6) Excel connection: Used to connect to an Excel workbook file.
What is the use of Bulk Insert Task in SSIS?
Bulk Insert Task is used to upload large amount of data from flat files into Sql Server. It supports
only OLE DB connections for destination database.
What is Conditional Split transformation in SSIS?
This is just like an IF condition which checks the given condition and, based on the condition evaluation, sends the row to the appropriate OUTPUT path. It has ONE input and MANY outputs. The Conditional Split transformation is used to route rows to different outputs based on conditions. For example, we can route the students in a class who have marks greater than or equal to 40 to one path and the students who score less than 40 to another path (see the sketch below).
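As a sketch, the expressions entered per output in the Conditional Split editor might be (column name is hypothetical):

Pass : [Marks] >= 40
Fail : [Marks] < 40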
How do you eliminate quotes from being uploaded from a flat file
to SQL Server?
This can be done using TEXT QUALIFIER property. In the SSIS package on the Flat File
Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to
ensure the quotes are not included.
Can you explain how to setup a checkpoint file in SSIS?
The following items need to be configured on the Properties tab for the SSIS package:
CheckpointFileName: Specify the full path to the checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path, it's a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage: Determines if/how checkpoints are used. Choose from these options: Never (default), IfExists, or Always. Never indicates that you are not using checkpoints. IfExists is the typical setting and implements the restart-at-the-point-of-failure behavior. If a checkpoint file is found, it is used to restore package variable values and restart at the point of failure. If a checkpoint file is not found, the package starts execution with the first task. The Always choice raises an error if the checkpoint file does not exist.
SaveCheckpoints: Choose from these options: True or False (default). You must select True to implement the checkpoint behavior.
What are the different values you can set for CheckpointUsage
property ?
There are three values, which describe how a checkpoint file is used during package execution:
1) Never: The package will not use a checkpoint file and therefore will never restart.
2) IfExists: If a checkpoint file exists in the place you specified for the CheckpointFileName
property, then it will be used, and the package will restart according to the checkpoints written.
3) Always: The package will always use a checkpoint file to restart, and if one does not exist,
the package will fail.
What is the ONLY Property you need to set on TASKS in order to
configure CHECKPOINTS to RESTART package from failure?
The one property you have to set on the task is FailPackageOnFailure. This must be set
for each task or container that you want to be the point for a checkpoint and restart. If you do not
set this property to true and the task fails, no file will be written, and the next time you invoke the
package, it will start from the beginning again.
Where can we set the CHECKPOINTS, in DataFlow or ControlFlow ?
Checkpoints only happen at the Control Flow; it is not possible to checkpoint transformations or
restart inside a Data Flow. The Data Flow Task can be a checkpoint, but it is treated as any other
task.
Can you explain different options for dynamic configurations in
SSIS?
1) XML file
2) custom variables
3) Database per environment with the variables
4) Use a centralized database with all variables
What is the use of Percentage Sampling transformation in SSIS?
Percentage Sampling transformation is generally used for data mining. This transformation builds
a random sample of output rows by choosing a specified percentage of input rows. For
example, if the input has 1000 rows and I specify 10 as the percentage sample, then the
transformation returns roughly 100 randomly chosen rows (10% of the input).
What is the use of Term Extraction transformation in SSIS?
Term Extraction transformation is used to extract nouns, noun phrases, or both, and it works on
English text only. It extracts terms from text in a transformation input column and
then writes the terms to a transformation output column. It can also be used to discover the
content of a dataset.
What is Data Viewer and what are the different types of Data
Viewers in SSIS?
A Data Viewer allows you to view the data flowing down a path at run time. If a data viewer is placed before
and after the Aggregate transform, we can see the data flowing into the transformation at run time
and what it looks like after the transformation has occurred. The different types of data viewers are:
1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart.
What is Ignore Failure option in SSIS?
With the Ignore Failure option, the error will be ignored and the data row will be allowed to continue on to
the next transformation. Let's say you have some JUNK data (wrong type of data) flowing from
the source; with the related Redirect Row option we can instead REDIRECT the junk data records to
another output instead of FAILING the package. This helps to MOVE only valid data to the
destination while the JUNK can be captured into a separate file.
Which are the different types of Control Flow components in SSIS?
The different types of Control Flow components are: Data Flow Tasks, SQL Server Tasks, Data
Preparation Tasks, Work flow Tasks, Scripting Tasks, Analysis Services Tasks, Maintenance
Tasks, Containers.
What are containers? What are the different types of containers in
SSIS?
Containers are objects that provide structures to packages and extra functionality to tasks. There
are four types of containers in SSIS, they are: Foreach Loop Container, For Loop Container,
Sequence Container and Task Host Container.
What are the different types of Data flow components in SSIS?
There are 3 data flow components in SSIS.
1. Sources
2. Transformations
3. Destinations
The Character Map transformation applies string operations such as:
1. Uppercase
2. Lowercase
3. Byte reversal (reverses the byte order of the characters)
4. Full width
5. Half width
6. Hiragana/Katakana/Traditional Chinese/Simplified Chinese
7. Linguistic casing
Explain Conditional split Transformation ?
It functions as an if-then-else construct. It sends each input row to the conditional
branch whose condition it satisfies. For example, you may want to split rows on product quantity: less than 500
versus greater than or equal to 500. You can give each condition a name that easily identifies its purpose; the
else part is covered by the Default Output name.
After you configure the component and connect it to a subsequent transformation/destination,
a dialog box pops up to let you choose which conditional output should feed that
downstream transformation/destination.
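For instance, assuming a hypothetical input column named Quantity, the split could be configured roughly like this:

Output name        Condition (SSIS expression)
LessThan500        Quantity < 500
Default output     GreaterOrEqual500 (rows that satisfy no condition)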
Explain Copy column Transformation?
This component simply copies a column to another new column, just like an aliased column in T-SQL.
Explain Data conversion Transformation?
This component converts data types, similar to the T-SQL CAST or CONVERT functions. If
you wish to convert the data from one type to another then this is the best bet. But please make
sure that you have COMPATIBLE data in the column.
Explain Data Mining query Transformation?
This component performs prediction on the data or fills gaps in it. Some good scenarios for this
component are:
1. Take input columns such as number of children, household income, and marital status to
predict whether someone owns a house or not.
2. Predict what a customer would buy based on an analysis of the buying pattern in their shopping
cart.
3. Fill in blank data or default values when a customer doesn't fill in some items in the questionnaire.
Explain Derived column Transformation?
Derived Column creates a new column, or puts the result of manipulating one or more columns into a new column.
You can directly copy an existing column, or create a new column from more than one column as well.
Explain Merge Transformation?
Merge transformation merges two paths into a single path. It is useful when you want to break
data out into a path that handles errors and, after the errors are handled, merge the data back into the
downstream flow, or when you want to merge 2 data sources. It is similar to the Union All transformation, but
Merge has some restrictions:
1. Data should be in sorted order
2. Data type, data length and other metadata attributes must be similar before being merged.
Explain Merge Join Transformation?
Merge Join transformation merges the output from 2 inputs by performing an INNER or OUTER join on
the data. But if the data comes from a single OLE DB data source, it is better to join it through a SQL
query rather than using the Merge Join transformation. Merge Join is intended to join 2 different data
sources.
Explain Multicast Transformation?
This transformation sends output to multiple output paths with no condition, unlike the Conditional Split.
It takes ONE input, makes a COPY of the data, and passes the same data through many
outputs. In simple terms: give one input and take many outputs of the same data.
Explain Percentage and row sampling Transformations?
This transformation takes data from the source and samples it randomly. It gives you 2
outputs: the first is the selected data and the second is the unselected data. It is used in situations where you
train a data mining model. These two are used to take a SAMPLE of data from the input data.
Explain Sort Transformation?
This component will sort data, similar to the T-SQL ORDER BY clause. Some transformations
need sorted data.
Explain Union all Transformation?
It works in the opposite way to the Multicast transformation: it can take output from 2 or more input
paths and combine them into a single output path. Unlike Merge, it does not require sorted input.
What are the possible locations to save an SSIS package?
You can save a package wherever you want.
SQL Server
Package Store
File System
1. What is a package?
a).a discrete executable unit of work composed of a collection of control
flow and other objects, including data sources, transformations, process
sequence, and rules, errors and event handling, and data destinations.
2. What is a workflow in SSIS?
a) A workflow is a set of instructions on how to execute tasks.
(It is a set of instructions on how to execute tasks such as sessions, emails
and shell commands; a workflow is created from the workflow manager.)
3. What is the Difference between control flow Items and data
flow Items?
a) The control flow is the highest level control process. It allows you to
manage the run-time activities of data flow
and other processes within a package.
When you want to extract, transform and load data within a package, you
add an SSIS Data Flow Task to the package control flow.
4. What are the main components of SSIS (project-architecture)?
A) SSIS architecture has 4 main components:
1. SSIS service
2. SSIS runtime engine & runtime executables
3. SSIS dataflow engine & dataflow components
4. SSIS clients
5. Different components in an SSIS package?
1. Control flow
2. Data flow
3. Event handler
4. Package explorer
Containers: provide structure and scope to your package
Types of containers:
i. Task host container: the Taskhost container services a single Task.
ii. Sequence container: It can handle the flow of a subset of a package
and can help you divide a package into smaller, more manageable pieces.
Uses: 1. Grouping tasks so that you can disable a part of the package that is no
longer needed.
2. Narrowing the scope of a variable to a container.
3. Managing the properties of multiple tasks in one step by setting the
properties of the container.
iii. For loop container: evaluates an expression and repeats its workflow
until the expression evaluates to false.
iv. For each loop container: defines a control flow repeatedly by using
an enumerator.
For each loop container repeats the control flow for each member of a
specified enumerator.
Tasks: A task provides the functionality to your package.
It is an individual unit of work.
Event handler: It responds to raised events in your package.
Precedence constraints: They provide the ordering relationships between the various
items in your package.
6. How to deploy the package?
To deploy the package, first we need to configure some properties.
Go to the project's Properties -> we get a window; configure
CreateDeploymentUtility as "True".
Specify the path (DeploymentOutputPath) as "bin\Deployment".
7. Connection manager:
a) It is a bridge between a package object and physical data. It provides a logical
representation of a connection at design time; the properties of the
connection manager describe the physical connection that Integration
Services creates when the package is run.
8. Which utilities can execute (run) the package?
a) In BIDS, a package can be executed in debug mode by using the
Debug menu or toolbar, or from Solution Explorer.
In production, the package can be executed from the command line, or
from a Microsoft Windows utility, or it can be scheduled for automated
execution by using SQL Server Agent.
i) Go to the Debug menu and select the Start Debugging button.
ii) Press the F5 key.
iii) Right click the package and choose Execute Package.
iv) Command prompt utilities
a) DTExecUI
1. To open: command prompt -> Run -> type dtexecui -> press Enter.
2. The Execute Package Utility dialog box opens.
3. In that, click Execute to run the package.
Wait until the package has executed successfully.
b) DTExec Utility
1. Open the command prompt window.
2. In the command prompt window, type dtexec / followed by the DTS, SQL, or
File option and the package path, including the package name.
3. If the package encryption level is EncryptSensitiveWithPassword or
EncryptAllWithPassword, use the /Decrypt option to provide the password.
If no password is included, dtexec will prompt you for the password.
4. Optionally, provide additional command-line options.
5. Press Enter.
6. Optionally, view logging and reporting information before closing the
command prompt window.
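For example (the package path, password, and server name below are hypothetical):

dtexec /FILE "C:\SSIS\Packages\LoadSales.dtsx" /DECRYPT "MyP@ssw0rd"
dtexec /SQL "\LoadSales" /SERVER "MYSERVER"

The first command runs a password-protected package stored on the file system; the second runs a package stored in the msdb database on the named server.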
v) Using SQL Server Management Studio to execute a package
1. In SSMS, right click a package, and then click Run Package.
The Execute Package Utility opens.
2. Execute the package as described previously.
9. How can you design an SCD in SSIS?
a) Def: SCD (Slowly Changing Dimension) describes how to capture changes to dimension data over a period of time.
Type 1: It keeps only the most recent values in the target. It does not maintain
the history.
Type 2: It keeps the full history in the target database. For every update in
the source a new record is inserted in the target.
Type 3: It keeps the current and previous values in the target.
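As a rough T-SQL sketch of the Type 2 pattern (the table and column names are hypothetical), an incoming change expires the current row and inserts a new version:

-- Incoming change for one business key (values are illustrative)
DECLARE @CustomerBK int = 1001, @NewCity nvarchar(50) = N'Bristol';

-- Expire the current version of the row
UPDATE dbo.DimCustomer
SET    EndDate = GETDATE(), IsCurrent = 0
WHERE  CustomerBK = @CustomerBK AND IsCurrent = 1;

-- Insert the new version, preserving the full history
INSERT INTO dbo.DimCustomer (CustomerBK, City, StartDate, EndDate, IsCurrent)
VALUES (@CustomerBK, @NewCity, GETDATE(), NULL, 1);

In SSIS the same behavior can be produced with the Slowly Changing Dimension wizard or with a Lookup plus staged UPDATE/INSERT design.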
10. How can you handle errors with the help of logging in
SSIS?
a) Create an OnError event handler and add to it an Execute SQL Task
that logs the error.
11. What is a log file and how do you send a log file to the manager?
a) It is especially useful when the package has been deployed to the
production environment, and you cannot use BIDS and VSA to debug the
package.
SSIS enables you to implement logging code through the Dts.Log method.
When the Dts.Log method is called in the script, the SSIS engine will route
the message to the log providers that are configured in the containing
package.
12. What is environment variable in SSIS?
a) An environment variable configuration sets a package property equal to
the value in an environment variable.
Environmental configurations are useful for configuring properties that are
dependent on the computer that is executing the package.
13. What about multiple configurations?
a) Multiple configuration types can be combined, including the XML configuration file, environment variable,
registry entry, parent package variable, SQL Server table, and direct and indirect
configuration types.
14. How to provide security to packages?
a) In two ways
1. Package encryption
2. Password protection.
15. For error handling in a transformation, which option gives the better
performance: Fail Component, Redirect Row, or Ignore Failure?
a) Redirect Row provides better performance for error handling.
16. Staging area??
a) It is a temporary data storage location where various data transformation
activities take place.
The staging area is the kitchen of the data warehouse.
17. Task??
a) An individual unit of work.
Question: What is your approach for ETL with data warehouses (how
many packages do you develop during a typical load, etc.)?
Comment: This is a rather generic question. A typical approach (for me) when
building ETL is to have a package that extracts data per source, with extract-specific
transformations (lookups, business rules, cleaning), and loads the data into a staging
table. Then either a package does a simple merge from staging to the data warehouse
(stored procedure), or a package takes data from staging and performs extra
work before loading it to the data warehouse. I prefer the first one, and due to this
approach I occasionally consider having an extract stage (as well as a stage phase),
which gives me more flexibility with transformations (per source) and makes it
simpler to follow (not everything in one go). So to summarize, you usually have a
package per source and one package per data warehouse table destination.
There might be other valid approaches as well, so ask for the reasons.
Q: What logging options does SSIS provide?
Integration Services includes logging features that write log entries when run-time events occur and can
also write custom messages. This is not enabled by default. Integration Services
supports a diverse set of log providers, and gives you the ability to create custom log
providers. The Integration Services log providers can write log entries to text files, SQL
Server Profiler, SQL Server, Windows Event Log, or XML files. Logs are associated with
packages and are configured at the package level. Each task or container in a package
can log information to any package log. The tasks and containers in a package can be
enabled for logging even if the package itself is not.
Q: How do you deploy SSIS packages.
Building an SSIS project provides a Deployment Manifest file. We need to run the
manifest file and decide whether to deploy this onto the File System or onto SQL Server
[msdb]. SQL Server deployment is faster and more secure than File System
deployment. Alternatively, we can also import the package from SSMS from the File
System or SQL Server.
Q: What are variables and what is variable scope ?
Variables store values that a SSIS package and its containers, tasks, and event handlers
can use at run time. The scripts in the Script task and the Script component can also use
variables. The precedence constraints that sequence tasks and containers into a
workflow can use variables when their constraint definitions include expressions.
Integration Services supports two types of variables: user-defined variables and system
variables. User-defined variables are defined by package developers, and system
variables are defined by Integration Services. You can create as many user-defined
variables as a package requires, but you cannot create additional system variables.
Q: Can you name five of the Perfmon counters for SSIS and the value they
provide?
SQLServer:SSIS Service
SQLServer:SSIS Pipeline
Buffer memory
Buffers in use
Buffers spooled
Rows read
Rows written
Q: SSIS Blocking and Non-blocking transformations.
Data flow transformations in SSIS use memory/buffers in different ways. The way a
transformation uses memory can dramatically impact the performance of your package.
Transformation buffer usage can be classified into 3 categories: Non Blocking, Partially
Blocking, and (Full) Blocking.
If you picture a data flow as a river, and transformation buffer usage as a dam in that
river, here is the impact of your transformation on your data flow.
A Non Blocking transformation is a dam that just lets the water spill over the top.
Other than perhaps a bit of a slowdown, the water (your data) proceeds on its way with
very little delay.
A Partially Blocking transformation is a dam that holds the water back until it
reaches a certain volume, and then releases that volume of water downstream and then
completely blocks the flow until that volume is reached again. Your data in this case
will stop, then start, then stop, then start over and over until all the data has moved
through the transformation. The downstream transformations end up starved for data
during certain periods, and then flooded with data during other periods. Clearly your
downstream transformations will not be able to work as efficiently when this happens,
and your entire package will slow down as a result.
A Blocking transformation is a dam that lets nothing through until the entire
volume of the river has flowed into the dam. Nothing is left to flow from upstream, and
nothing has been passed downstream. Then once the transformation is finished, it
releases all the data downstream. Clearly for a large dataset this can be extremely
memory intensive. Additionally, if all the transforms in your package are just waiting for
data, your package is going to run much more slowly.
Generally speaking, if you can avoid Blocking and Partially Blocking transformations, your
package will simply perform better. If you think about it a bit, you will probably be able to
figure out which transformations fall into which category. Here is a quick list for your
reference:
Non Blocking
Audit
Character Map
Conditional Split
Copy Column
Data Conversion
Derived Column
Import Column
Lookup
Multicast
Percentage sampling
Row count
Row sampling
Script component
Partially Blocking
Data mining
Merge
Merge Join
Pivot/Unpivot
Term Extraction
Term Lookup
Union All
Blocking
Aggregate
Fuzzy Grouping
Fuzzy Lookup
Sort
Facts:
Sort is a fully blocking transformation.
A Merge transform requires a Sort, but a Union All does not; use a Union All when you
can.
Q. Difference between synchronous and asynchronous transformations
To understand the difference between a synchronous and an asynchronous
transformation in Integration Services, it is easiest to start with an understanding of a
synchronous transformation. If a synchronous transformation does not meet your needs,
your design might require an asynchronous transformation. A transformation is asynchronous when, for example:
The component has to acquire multiple buffers of data before it can perform its
processing. An example is the Sort transformation, where the component has to process
the complete set of rows in a single operation.
The component has to combine rows from multiple inputs. An example is the
Merge transformation, where the component has to examine multiple rows from each
input and then merge them in sorted order.
There is no one-to-one correspondence between input rows and output rows. An
example is the Aggregate transformation, where the component has to add a row to the
output to hold the computed aggregate values.
In Integration Services scripting and programming, you specify an asynchronous
transformation by assigning a value of 0 to the SynchronousInputID property of the
component's outputs. This tells the data flow engine not to send each row automatically
to the outputs. Then you must write code to send each row explicitly to the appropriate
output by adding it to the new output buffer that is created for the output of an
asynchronous transformation.
Note
Since a source component must also explicitly add each row that it reads from
the data source to its output buffers, a source resembles a transformation with
asynchronous outputs.
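As a rough illustration (not from this document), inside an SSIS Script Component whose output has SynchronousInputID = 0, rows are pushed to the output buffer explicitly; Input0Buffer/Output0Buffer are the designer-generated buffer classes and the column name is hypothetical:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Nothing flows through automatically on an asynchronous output,
    // so every output row has to be added to the buffer explicitly.
    Output0Buffer.AddRow();
    Output0Buffer.CustomerName = Row.CustomerName;  // hypothetical column
}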
When you build the project, existing files in the deployment folder that have the same names as the files created by the
build process will be deleted. For example, package configuration files saved to the
deployment folders will be deleted.
To create a package deployment utility
1. In SQL Server Data Tools (SSDT), open the solution that contains the Integration
Services project for which you want to create a package deployment utility.
2. Right-click the project and click Properties.
3. In the Property Pages dialog box, click Deployment Utility.
4. To update package configurations when packages are deployed, set
AllowConfigurationChanges to True.
5. To create a deployment utility when the project is built, set CreateDeploymentUtility to True.
6. Optionally, update the location of the deployment utility by modifying the
DeploymentOutputPath property.
7. Click OK.
8. In Solution Explorer, right-click the project, and then click Build.
9. View the build progress and build errors in the Output window.
After you've gone through these steps, the next time you build your project it will create
the file (YourProjectName).SSISDeploymentManifest. This file is located in the same
folder as your packages, in the bin\Deployment folder.
If you run this file it will open the Package Installation Wizard that will allow you to deploy
all your packages that were located in the project to a desired location.
By default, Integration Services considers the following information sensitive:
The password part of a connection string. However, if you select an option that
encrypts everything, the whole connection string will be considered sensitive.
The task-generated XML nodes that are tagged as sensitive. The tagging of XML
nodes is controlled by Integration Services and cannot be changed by users.
Any variable that is marked as sensitive. The marking of variables is controlled by
Integration Services.
Whether Integration Services considers a property sensitive depends on whether the
developer of the Integration Services component, such as a connection manager or task,
has designated the property as sensitive. Users cannot add properties to, nor can they
remove properties from, the list of properties that are considered sensitive.
Using Encryption
Protection Levels
1. Do not save sensitive (DontSaveSensitive)
Sensitive information is not saved with the package; when the package is opened or run, the
user must supply the sensitive values again.
When used with the dtutil utility, this protection level corresponds to the value of 0.
2. Encrypt all with password (EncryptAllWithPassword)
Uses a password to encrypt the whole package. To open the package in SSIS Designer or run
it with the dtexec command prompt utility, the user must provide the package password.
When used with the dtutil utility, this protection level corresponds to the value of 3.
3. Encrypt all with user key (EncryptAllWithUserKey)
Uses a key that is based on the current user profile to encrypt the whole package. Only
the user who created or exported the package can open the package in SSIS Designer
or run the package by using the dtexec command prompt utility.
When used with the dtutil utility, this protection level corresponds to the value of 4.
Note
For protection levels that use a user key, Integration Services uses DPAPI standards.
For more information about DPAPI, see the MSDN Library at
http://msdn.microsoft.com/library.
4. Encrypt sensitive with password (EncryptSensitiveWithPassword)
Uses a password to encrypt only the values of sensitive properties in the package.
DPAPI is used for this encryption. Sensitive data is saved as a part of the package, but
that data is encrypted by using a password that the current user supplies when the
package is created or exported. To open the package in SSIS Designer, the user must
provide the package password. If the password is not provided, the package opens
without the sensitive data and the current user must provide new values for sensitive
data. If the user tries to execute the package without providing the password, package
execution fails. For more information about passwords and command line execution, see
dtexec Utility (SSIS Tool).
When used with the dtutil utility, this protection level corresponds to the value of 2.
5. Encrypt sensitive with user key (EncryptSensitiveWithUserKey)
Uses a key that is based on the current user profile to encrypt only the values of
sensitive properties in the package. Only the same user who uses the same profile can
load the package. If a different user opens the package, the sensitive information is
replaced with blanks and the current user must provide new values for the sensitive
data. If the user attempts to execute the package, package execution fails. DPAPI is
used for this encryption.
When used with the dtutil utility, this protection level corresponds to the value of 1.
Note For protection levels that use a user key, Integration Services uses DPAPI
standards. For more information about DPAPI, see the MSDN Library at
http://msdn.microsoft.com/library.
6. Rely on server storage for encryption (ServerStorage)
Protects the whole package using SQL Server database roles. This option is supported
only when a package is saved to the SQL Server msdb database. It is not supported
when a package is saved to the file system from Business Intelligence Development
Studio.
To maintain security when deploying packages, you typically change the protection level as listed in the following steps:
1. During development, leave the protection level of packages set to the default value,
EncryptSensitiveWithUserKey. This setting helps ensure that only the developer sees
sensitive values in the package. Or, you can consider using EncryptAllWithUserKey, or
DontSaveSensitive.
2. When it is time to deploy the packages, you have to change the protection level to
one that does not depend on the developer's user key. Therefore you typically have to
select EncryptSensitiveWithPassword, or EncryptAllWithPassword. Encrypt the
packages by assigning a temporary strong password that is also known to the
operations team in the production environment.
3. After the packages have been deployed to the production environment, the
operations team can re-encrypt the deployed packages by assigning a strong password
that is known only to them. Or, they can encrypt the deployed packages by selecting
EncryptSensitiveWithUserKey or EncryptAllWithUserKey, and using the local credentials
of the account that will run the packages.
What are the possible evaluation values for a precedence constraint?
o Success (next task will be executed only when the last task completed successfully), or
o Failure (next task will be executed only when the last task failed), or
o Complete (next task will be executed no matter whether the last task completed or failed).
What is a container and how many types of containers are there?
A container is a logical grouping of tasks which allows you to manage the scope of the
tasks together.
These are the types of containers in SSIS:
o Sequence Container - Used for grouping logically related tasks together
o For Loop Container - Used when you want to have a repeating flow in a package
o For Each Loop Container - Used for enumerating each object in a collection; for
example a record set or a list of files.
Apart from the above mentioned containers, there is one more container called the
Task Host Container which is not visible from the IDE, but every task is contained in it
(the default container for all the tasks).
What are variables and what is variable scope?
A variable is used to store values. There are basically two types of variables, System
Variable (like ErrorCode, ErrorDescription, PackageName etc) whose values you can
use but cannot change and User Variable which you create, assign values and read as
needed. A variable can hold a value of the data type you have chosen when you defined
the variable.
Variables can have a different scope depending on where they are defined. For example,
you can have package level variables which are accessible to all the tasks in the
package, and there could also be container level variables which are accessible only to
those tasks that are within the container.
What are SSIS Connection Managers?
When we talk of integrating data, we are actually pulling data from different sources and
writing it to a destination. But how do you get connected to the source and destination
systems? This is where the connection managers come into the picture. A connection
manager represents a connection to a system, which includes data provider information,
the server name, database name, authentication mechanism, and so on.
How do you enable logging in an SSIS package?
In BIDS/SSDT, select Logging from the SSIS menu; in the Configure SSIS Logs dialog, check the
containers in the tree on the left to choose which tasks you want to enable logging for. On the right side you will notice two
tabs; on the Providers and Logs tab you specify where you want to write the logs, you
can write it to one or more log providers together. On the Details tab you can specify
what events do you want to log for the selected task.
Please note, enabling event logging is immensely helpful when you are troubleshooting a
package, but also incurs additional overhead on SSIS in order to log the events and
information. Hence you should only enable event logging when needed and only
choose the events which you want to log. Avoid logging all the events unnecessarily.
What is the LoggingMode property?
SSIS packages and all of the associated tasks or components have a property called
LoggingMode. This property accepts three possible values: Enabled to enable logging
of that component, Disabled to disable logging of that component and
UseParentSetting to use the parent's setting for that component to decide whether or not to
log the data.
What is the transaction support feature in SSIS?
When you execute a package, every task of the package executes in its own transaction.
What if you want to execute two or more tasks in a single transaction? This is where the
transaction support feature helps. You can group all your logically related tasks in single
group. Next you can set the transaction property appropriately to enable a transaction so
that all the tasks of the package run in a single transaction. This way you can ensure
either all of the tasks complete successfully or, if any of them fails, the transaction gets
rolled back.
What properties do you need to configure in order to use the
transaction feature in SSIS?
Suppose you want to execute 5 tasks in a single transaction, in this case you can place
all 5 tasks in a Sequence Container and set the TransactionOption and IsolationLevel
properties appropriately.
o The TransactionOption property expects one of these three values:
Supported - The container/task does not create a separate transaction, but if the
parent object has already initiated a transaction then it participates in it
Required - The container/task creates a new transaction irrespective of any
transaction initiated by the parent object
NotSupported - The container/task neither creates a new transaction nor participates in
an existing one
What is the DelayValidation property and why would you use it?
Suppose you are creating a table in the first task and referring to the same table in the second task. When early validation
starts, it will not be able to validate the second task as the dependent table has not been
created yet. Keep in mind that early validation is performed before the package
execution starts. So what should we do in this case? How can we ensure the package is
executed successfully and the logical flow of the package is correct? This is where you
can use the DelayValidation property. In the above scenario you should set the
DelayValidation property of the second task to TRUE in which case early validation i.e.
package level validation is skipped for that task and that task would only be validated
during late validation i.e. component level validation. Please note using the
DelayValidation property you can only skip early validation for that specific task, there is
no way to skip late or component level validation.
What are the different components in the SSIS architecture?
The SSIS architecture comprises of four main components:
o The SSIS runtime engine manages the workflow of the package
o The data flow pipeline engine manages the flow of data from source to destination and
in-memory transformations
o The SSIS object model is used for programmatically creating, managing and
monitoring SSIS packages
o The SSIS windows service allows managing and monitoring packages
How is SSIS runtime engine different from the SSIS dataflow pipeline
engine?
The SSIS Runtime Engine manages the workflow of the packages during runtime, which
means its role is to execute the tasks in a defined sequence. As you know, you can
define the sequence using precedence constraints. This engine is also responsible for
providing support for event logging, breakpoints in the BIDS designer, package
configuration, transactions and connections. The SSIS Runtime engine has been
designed to support concurrent/parallel execution of tasks in the package.
The Dataflow Pipeline Engine is responsible for executing the data flow tasks of the
package. It creates a dataflow pipeline by allocating in-memory structure for storing data
in-transit. This means, the engine pulls data from source, stores it in memory, executes
the required transformation in the data stored in memory and finally loads the data to the
destination. Like the SSIS runtime engine, the Dataflow pipeline has been designed to
do its work in parallel by creating multiple threads and enabling them to run multiple
execution trees/units in parallel.
How is a synchronous (non-blocking) transformation different from
an asynchronous (blocking) transformation in SQL Server Integration
Services?
A transformation changes the data in the required format before loading it to the
destination or passing the data down the path. The transformation can be categorized in
Synchronous and Asynchronous transformation.
A transformation is called synchronous when it processes each incoming row (modify
the data in required format in place only so that the layout of the result-set remains
same) and passes them down the hierarchy/path. It means, output rows are
synchronous with the input rows (1:1 relationship between input and output rows) and
hence it uses the same allocated buffer set/memory and does not require additional
memory. Please note, these kinds of transformations have lower memory requirements
as they work on a row-by-row basis (and hence run quite faster) and do not block the
data flow in the pipeline. Some of the examples are : Lookup, Derived Columns, Data
Conversion, Copy column, Multicast, Row count transformations, etc.
A transformation is called Asynchronous when it requires all incoming rows to be stored
locally in the memory before it can start producing output rows. For example, with an
Aggregate Transformation, it requires all the rows to be loaded and stored in memory
before it can aggregate and produce the output rows. This way you can see input rows
are not in sync with output rows and more memory is required to store the whole set of
data (no memory reuse) for both the data input and output. These kinds of
transformations have higher memory requirements (and there is a high chance of buffers
spooling to disk if insufficient memory is available) and generally run slower. The
asynchronous transformations are also called blocking transformations because of their
nature of blocking the output rows until all input rows are read into memory.
What is the difference between a partially blocking transformation
versus a fully blocking transformation in SQL Server Integration
Services?
Asynchronous transformations, as discussed in last question, can be further divided in
two categories depending on their blocking behavior:
o Partially Blocking Transformations do not block the output until a full read of the inputs
occur. However, they require new buffers/memory to be allocated to store the newly
created result-set because the output from these kind of transformations differs from the
input set. For example, Merge Join transformation joins two sorted inputs and produces
a merged output. In this case if you notice, the data flow pipeline engine creates two
input sets of memory, but the merged output from the transformation requires another
set of output buffers as structure of the output rows which are different from the input
rows. It means the memory requirement for this type of transformations is higher than
synchronous transformations where the transformation is completed in place.
o Full Blocking Transformations, apart from requiring an additional set of output buffers,
also blocks the output completely unless the whole input set is read. For example, the
Sort Transformation requires all input rows to be available before it can start sorting and
pass down the rows to the output path. These kinds of transformations are the most
expensive and should be used only as needed. For example, if you can get sorted data
from the source system, use that logic instead of using a Sort transformation to sort the
data in transit/memory.
What is an SSIS execution tree and how can I analyze the execution
trees of a data flow task?
The work to be done in the data flow task is divided into multiple chunks, which are
called execution units, by the dataflow pipeline engine. Each represents a group of
transformations. The individual execution unit is called an execution tree, which can be
executed by a separate thread along with other execution trees in a parallel manner. The
memory structure is also called a data buffer, which gets created by the data flow
pipeline engine and has the scope of each individual execution tree. An execution tree
normally starts at either the source or an asynchronous transformation and ends at the
first asynchronous transformation or a destination. During execution of the execution
tree, the source reads the data, then stores the data to a buffer, executes the
transformation in the buffer and passes the buffer to the next execution tree in the path
by passing the pointers to the buffers.
To see how many execution trees are getting created and how many rows are getting
stored in each buffer for an individual data flow task, you can enable logging of these
data flow task events: PipelineExecutionTrees, PipelineComponentTime,
PipelineInitialization, BufferSizeTuning, etc.
How can an SSIS package be scheduled to execute at a defined time
or at a defined interval per day?
You can configure a SQL Server Agent job with a job step type of SQL Server
Integration Services Package; the job invokes the dtexec command line utility internally
to execute the package. You can run the job (and in turn the SSIS package) on demand,
or you can create a schedule for a one-time need or on a recurring basis.
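A rough T-SQL sketch of such a job (the job name, package path, and 2 AM schedule are hypothetical; the SSIS step is shown as a CmdExec call to dtexec):

USE msdb;
EXEC dbo.sp_add_job         @job_name = N'Load Sales DW';
EXEC dbo.sp_add_jobstep     @job_name = N'Load Sales DW',
                            @step_name = N'Run SSIS package',
                            @subsystem = N'CmdExec',
                            @command   = N'dtexec /FILE "C:\SSIS\Packages\LoadSales.dtsx"';
EXEC dbo.sp_add_jobschedule @job_name = N'Load Sales DW',
                            @name = N'Nightly 2 AM',
                            @freq_type = 4,            -- daily
                            @freq_interval = 1,        -- every day
                            @active_start_time = 020000;
EXEC dbo.sp_add_jobserver   @job_name = N'Load Sales DW';

In practice the same job is usually created from the SQL Server Agent GUI by picking the SQL Server Integration Services Package step type.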
What is an SSIS Proxy account and why would you create it?
When we try to execute an SSIS package from a SQL Server Agent job, it fails with the
message "Non-SysAdmins have been denied permission to run DTS Execution job steps
without a proxy account". This error message is generated when the job owner is not a
sysadmin on the instance and the job step is not set to run under a proxy account
associated with the SSIS subsystem. Creating a proxy account with the required
credentials allows such jobs to run the package under that account.
How can you configure your SSIS package to run in 32-bit mode on
64-bit machine when using some data providers which are not
available on the 64-bit platform?
In order to run an SSIS package in 32-bit mode the SSIS project property
Run64BitRuntime needs to be set to False. The default configuration for this property is
True. This configuration is an instruction to load the 32-bit runtime environment rather
than 64-bit, and your packages will still run without any additional changes. The property
can be found under SSIS Project Property Pages -> Configuration Properties ->
Debugging.
Q19 How to achieve parallelism in SSIS?
Parallelism is achieved using the MaxConcurrentExecutables property of the package. Its default is
-1, which means the number of processors plus 2.
-More questions added-Sept 2011
Q20 How do you do incremental load?
The fastest way to do an incremental load is to use a Timestamp column in the source table and store the
last ETL timestamp. In the ETL process, pick all the rows having a Timestamp greater than the stored
Timestamp so as to pick only new and updated records.
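A minimal T-SQL sketch of the idea (the table, column, and watermark names are hypothetical):

-- Timestamp stored by the previous successful ETL run
DECLARE @LastETL datetime2 =
       (SELECT LastLoadTimestamp FROM etl.LoadWatermark WHERE TableName = N'Sales');

-- Pick only the rows that are new or changed since the last load
SELECT  s.*
FROM    dbo.Sales AS s
WHERE   s.ModifiedDate > @LastETL;

-- After a successful load, advance the watermark
UPDATE  etl.LoadWatermark
SET     LastLoadTimestamp = SYSDATETIME()
WHERE   TableName = N'Sales';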
Q21 How to handle Late Arriving Dimension or Early Arriving Facts.
Late arriving dimensions are sometimes unavoidable because of a delay or error in the dimension ETL, or
due to the logic of the ETL. To handle late arriving facts, we can create a dummy dimension row with the
natural/business key and keep the rest of the attributes as null or default. As soon as the actual
dimension row arrives, the dummy row is updated with a Type 1 change. These rows are also known as
inferred members (inferred dimensions).