SSIS Interview Questions
1. In BIDS, an SSIS project contains 10 packages. How do you deploy only 5
packages to the destination machine, even though the manifest file contains all 10
packages after the build?
- Open the manifest file in any editor such as BIDS or Notepad, keep the required 5
packages, and remove the remaining 5.
- Save and close the manifest file.
- Double-click the manifest file to deploy the required 5 packages.
2.6 Click the 'Next' button and select the 'Value' property of the child package variable.
2.7 Click the 'Next' and 'OK' buttons.
2.8 To test the package, I added a sample Script Task with a message box to show the
value of the parent package variable.
3. My source table data is as follows:
The FailPackageOnFailure property needs to be set to True to enable a task to
participate in checkpoints.
The checkpoint mechanism uses a text file to mark the point of package failure.
These checkpoint files are automatically created at a given location upon package
failure and are automatically deleted once the package completes successfully.
10. How to execute an SSIS package from a stored procedure?
Use the xp_cmdshell extended stored procedure to invoke dtexec.
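A minimal T-SQL sketch, assuming xp_cmdshell has been enabled (it is off by default)
and the package is stored on the file system; the package path is illustrative:

-- Enable xp_cmdshell (requires sysadmin rights; disabled by default).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

-- Run a file-system package via dtexec (path is illustrative).
EXEC master.dbo.xp_cmdshell 'dtexec /F "C:\SSIS\Packages\MyPackage.dtsx"';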
11. Parallel processing in SSIS
To support parallel execution of different tasks in a package, SSIS uses 2 properties:
1. MaxConcurrentExecutables: defines how many tasks can run simultaneously by
specifying the maximum number of SSIS threads that can execute in parallel per
package. The default is -1, which equates to the number of logical processors plus 2.
2. EngineThreads: a property of each Data Flow task. It defines how many
threads the data flow engine can create and run in parallel. The EngineThreads property
applies equally to both the source threads that the data flow engine creates for sources
and the worker threads that the engine creates for transformations and destinations.
Therefore, setting EngineThreads to 10 means that the engine can create up to ten
source threads and up to ten worker threads.
Select the 'Load to SQL Table' Data Flow task and navigate to the 'Event Handlers' tab.
Drag and drop an 'Execute SQL Task'. Open the Execute SQL Task Editor and, in the
'Parameter Mapping' section, map the system variables as follows:
Create a table in a SQL Server database with the columns: PackageID, PackageName,
TaskID, TaskName, ErrorCode, ErrorDescription.
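A minimal sketch of that log table and of the parameterized INSERT the Execute SQL
Task could run from the OnError event handler; the table name and the exact variable
mapping are assumptions:

CREATE TABLE dbo.SSISErrorLog
(
    PackageID        UNIQUEIDENTIFIER,
    PackageName      NVARCHAR(260),
    TaskID           UNIQUEIDENTIFIER,
    TaskName         NVARCHAR(260),
    ErrorCode        INT,
    ErrorDescription NVARCHAR(2000)
);

-- SQLStatement of the Execute SQL Task; each ? placeholder is mapped in the
-- Parameter Mapping section to a system variable such as System::PackageID,
-- System::PackageName, System::SourceID, System::SourceName,
-- System::ErrorCode and System::ErrorDescription.
INSERT INTO dbo.SSISErrorLog
    (PackageID, PackageName, TaskID, TaskName, ErrorCode, ErrorDescription)
VALUES (?, ?, ?, ?, ?, ?);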
The Merge transformation combines two sorted datasets into a single dataset. The rows
from each dataset are inserted into the output based on values in their key columns.
The Merge transformation is similar to the Union All transformation. Use the Union All
transformation instead of the Merge transformation in the following situations:
- the transformation inputs are not sorted;
- the combined output does not need to be sorted;
- the transformation has more than two inputs.
The Multicast transformation generates exact copies of the source data, so each
recipient has the same number of records as the source, whereas the Conditional Split
transformation divides the source data based on defined conditions; rows that match
none of the defined conditions are routed to the default output.
The Bulk Insert task is used to copy large volumes of data from a text file to a SQL
Server destination.
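Under the covers the task issues a T-SQL BULK INSERT statement; a minimal sketch,
with an illustrative file path, table name, and delimiter options:

BULK INSERT dbo.StagingTable
FROM 'C:\Data\input.txt'
WITH (
    FIELDTERMINATOR = ',',   -- column delimiter
    ROWTERMINATOR   = '\n',  -- row delimiter
    FIRSTROW        = 2      -- skip the header row
);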
19. Incremental Load in SSIS
Using Slowly Changing Dimension
Using Lookup and Cache Transformation
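The Lookup-based pattern boils down to inserting only those source rows that find no
match in the destination; expressed in T-SQL for comparison (table and column names
are illustrative):

-- Insert rows from staging that do not yet exist in the dimension.
INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName)
SELECT s.CustomerKey, s.CustomerName
FROM staging.Customer AS s
LEFT JOIN dbo.DimCustomer AS d
    ON d.CustomerKey = s.CustomerKey
WHERE d.CustomerKey IS NULL;  -- the "no match" output of a Lookup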
20. How to migrate a SQL Server 2005 package to the 2008 version
1. In BIDS, right-click the "SSIS Packages" folder of an SSIS project and
select "Upgrade All Packages".
2. Run "SSISUpgrade.exe" from the command line (default physical location: C:\
Program Files\Microsoft SQL Server\100\DTS\Binn).
3. If you open an SSIS 2005 project in BIDS 2008, it automatically launches the SSIS
package upgrade wizard.
21. Difference between synchronous and asynchronous transformations
A synchronous transformation processes input rows and passes them on in the data
flow one row at a time, reusing the input buffer.
A transformation is asynchronous when its output requires a new buffer: the output
buffer or output rows are not in sync with the input buffer.
22. What are row transformations, partially blocking transformations, and fully
blocking transformations? Give examples.
In a row transformation, each value is manipulated individually. These transformations
reuse buffers rather than requiring new ones, so the same buffers can flow between:
- OLE DB sources and OLE DB destinations
- other row transformations within the package
- other partially blocking transformations within the package
Examples of row transformations: Copy Column, Audit, Character Map.
Partially blocking transformations:
These can reuse the buffer space allocated for earlier row transformations, but also get
new buffer space allocated exclusively for the transformation.
Examples: Merge, Merge Join, Union All.
Fully blocking transformations:
These use their own reserved buffers and do not share buffer space with other
transformations or connection managers.
Examples: Sort, Aggregate, Cache Transformation.
Foreach ADO:
The ADO enumerator enumerates rows in a table; for example, we can iterate over the
rows in an ADO recordset. The variable must be of the Object data type.
Foreach ADO.NET Schema Rowset:
The ADO.NET enumerator enumerates schema information; for example, we can get the
list of tables in a database.
Foreach File:
The file enumerator enumerates files in a folder; for example, we can get all the files
with the *.txt extension in a Windows folder and its subfolders.
Foreach From Variable:
The variable enumerator enumerates the objects that a specified variable contains,
such as an array or a data table.
Foreach Item:
The item enumerator enumerates collections; for example, we can enumerate the
names of executables and working directories that an Execute Process task uses.
Foreach NodeList:
The node list enumerator enumerates the result set of an XPath expression.
Foreach SMO:
The SMO enumerator enumerates SQL Server Management Objects (SMO); for example,
we can get the list of functions or views in a SQL Server database.
Container Type: Foreach Loop Container
Description: This container runs a control flow repeatedly using an enumerator.
Purpose: To repeat tasks for each element in a collection, for example retrieving files
from a folder, running T-SQL statements that reside in multiple files, or running a
command for multiple objects.

Container Type: For Loop Container
Description: This container runs a control flow repeatedly by checking a conditional
expression (same as a For loop in a programming language).
Purpose: To repeat tasks until a specified expression evaluates to false. For example,
a package can send a different e-mail message seven times, one time for every day of
the week.

Container Type: Sequence Container
Description: Groups tasks as well as containers into control flows that are subsets of
the package control flow.
Purpose: To group tasks and containers that must succeed or fail as a unit. For
example, a package can group tasks that delete and add rows in a database table, and
then commit or roll back all the tasks when one fails.
Success – Workflow will proceed when the preceding container executes successfully.
Indicated in control flow by a solid green line.
Failure – Workflow will proceed when the preceding container’s execution results in a
failure. Indicated in control flow by a solid red line.
Completion – Workflow will proceed when the preceding container’s execution
completes, regardless of success or failure. Indicated in control flow by a solid blue line.
Expression/Constraint with Logical AND – Workflow will proceed when both the specified
expression and the constraint evaluate to true. Indicated in control flow by a solid
colored line with a small ‘fx’ icon next to it. The color of the line depends on the logical
constraint chosen (e.g. success=green, completion=blue).
Expression/Constraint with Logical OR – Workflow will proceed when either the specified
expression or the logical constraint (success/failure/completion) evaluates to true.
Indicated in control flow by a dotted colored line with a small ‘fx’ icon next to it. The
color of the line depends on the logical constraint chosen (e.g. success=green,
completion=blue).
Keep Identity – By default this setting is unchecked, which means the destination table
(if it has an identity column) will generate identity values on its own. If you check this
setting, the data flow engine preserves the source identity values and inserts them into
the destination table.
Keep Nulls – Again, by default this setting is unchecked, which means that when a
NULL arrives from the source for a column, the default value is inserted into the
destination table (if a default constraint is defined on the target column). If you check
this option, the default constraint on the destination column is ignored and the NULL
from the source is preserved and inserted into the destination.
Table Lock – By default this setting is checked, and the recommendation is to leave it
checked unless the same table is being used by some other process at the same time.
It specifies that a table lock will be acquired on the destination table instead of multiple
row-level locks, which could escalate into lock-escalation problems.
Check Constraints – Again, by default this setting is checked, and the recommendation
is to uncheck it if you are sure that the incoming data will not violate the constraints of
the destination table. This setting makes the data flow pipeline engine validate the
incoming data against the constraints of the target table; unchecking it improves the
performance of the data load.
#5 - Effect of Rows Per Batch and Maximum Insert Commit Size settings:
Rows per batch:
The default value for this setting is -1, which means all incoming rows are treated as a
single batch. You can change this default behavior and break the incoming rows into
multiple batches; the only allowed value is a positive integer, which specifies the
maximum number of rows per batch.
Maximum insert commit size:
The default value for this setting is 2147483647 (the largest value for a 4-byte integer),
which means all incoming rows are committed once on successful completion. You can
specify a positive value to commit after that many records instead. Committing several
times does put some overhead on the data flow engine, but at the same time it relieves
pressure on the transaction log and tempdb, which would otherwise grow considerably
during high-volume data transfers.
These two settings are important mainly for controlling tempdb and transaction log
growth. For example, if you leave 'Maximum insert commit size' at its default, the
transaction log and tempdb keep growing during the extraction, and with a high volume
of data tempdb will soon run out of space, causing the extraction to fail. So it is
recommended to set these values to an optimum level based on your environment.
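For intuition, these fast-load options map onto the knobs of a T-SQL bulk load; a
hedged sketch with illustrative values, not the literal statement SSIS generates:

BULK INSERT dbo.StagingTable
FROM 'C:\Data\input.txt'
WITH (
    TABLOCK,             -- Table Lock
    CHECK_CONSTRAINTS,   -- Check Constraints
    KEEPIDENTITY,        -- Keep Identity
    KEEPNULLS,           -- Keep Nulls
    BATCHSIZE = 50000    -- roughly analogous to Maximum insert commit size
    -- ROWS_PER_BATCH is the optimizer hint analogous to Rows per batch;
    -- it is normally used instead of BATCHSIZE rather than together with it.
);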
#7 - DefaultBufferSize and DefaultBufferMaxRows:
The execution tree creates buffers for storing incoming rows and performing
transformations.
The number of buffers created depends on how many rows fit into a buffer, which in
turn depends on a few factors. The first is the estimated row size, the sum of the
maximum sizes of all the columns in the incoming records. The second is the
DefaultBufferMaxSize property of the data flow task, which specifies the default
maximum size of a buffer. Its default value is 10 MB, and its upper and lower
boundaries are constrained by two internal SSIS properties, MaxBufferSize (100 MB)
and MinBufferSize (64 KB); that is, a buffer can be as small as 64 KB and as large as
100 MB. The third factor is DefaultBufferMaxRows, also a property of the data flow
task, which specifies the default number of rows in a buffer; its default value is 10000.
If the estimated size exceeds DefaultBufferMaxSize, the engine reduces the number of
rows in the buffer. For better buffer performance you can do two things.
First, remove unwanted columns from the source and set the data type of each column
appropriately, especially if your source is a flat file. This lets you accommodate as
many rows as possible in each buffer.
Second, if your system has sufficient memory available, you can tune these properties
to have a small number of large buffers, which can improve performance. Beware: if
you change these properties to the point where page spooling begins (see Best
Practice #8), performance suffers. So before setting values for these properties, test
thoroughly in your environment and set the values appropriately.
Let's consider a scenario where the first component of the package creates an object,
e.g. a temporary table, which is referenced by the second component of the package.
During package validation the first component has not yet executed, so the object does
not exist, causing a validation failure on the second component: SSIS throws a
validation exception and does not start package execution. So how do you get this
package running in this common scenario? Set the DelayValidation property to True on
the second component (or on the task/container that holds it), so its validation is
postponed until execution time, after the first component has created the object.
SSIS provides a set of performance counters. Among them, the following are helpful
when you tune or debug your package:
Buffers in use
Flat buffers in use
Private buffers in use
Buffers spooled
Rows read
Rows written
“Buffers in use”, “Flat buffers in use” and “Private buffers in use” are useful for
discovering leaks. During package execution these counters fluctuate, but once the
package finishes, their values should return to what they were before execution;
otherwise, buffers have been leaked.
“Buffers spooled” has an initial value of 0. When it goes above 0, it indicates that the
engine has started swapping buffers to disk. In a case like this, set the Data Flow task
properties BLOBTempStoragePath and BufferTempStoragePath appropriately for
maximal I/O bandwidth.
Buffers Spooled: The number of buffers currently written to the disk. If the data flow
engine runs low on physical memory, buffers not currently used are written to disk and
then reloaded when needed.
“Rows read” and “Rows written” show how many rows the entire Data Flow has
processed.
12. FastParse property
The Fast Parse option in SSIS can be used for very fast loading of flat file data. It
speeds up parsing of integer, date, and time types when the conversion does not have
to be locale-sensitive. The option is set on a per-column basis using the Advanced
Editor for the Flat File source.
13. The Checkpoint feature helps with package restarting.
34. Upgrade a DTS package to SSIS
1. In BIDS, from the Project menu, select 'Migrate DTS 2000 Package'.
2. In the Package Migration Wizard, choose the source, the SQL Server 2000 server
name, and the destination folder.
3. Select the list of packages that need to be upgraded to SSIS.
4. Specify the log file for the package migration.
1. A data flow consists of the sources and destinations that extract and load data, the
transformations that modify and extend data, and the paths that link sources,
transformations, and destinations. The Data Flow task is the executable within the SSIS
package that creates, orders, and runs the data flow. Data sources, transformations,
and data destinations are the three important categories in the data flow.
2. Data flows move data, but they are also tasks in the control flow; as such, their
success or failure affects how your control flow operates.
3. Data is moved and manipulated through transformations.
4. Data is passed between each component in the data flow.
37. Different ways to execute SSIS package
1. Using the Execute Package Utility (DTEXECUI.EXE) graphical interface, one can
execute an SSIS package that is stored in the file system, SQL Server, or the SSIS
Package Store.
DTEXECUI provides a graphical user interface that can be used to specify the various
options to be set when executing an SSIS package. You can launch DTEXECUI by
double-clicking an SSIS package file (.dtsx), or launch it from a command prompt and
then specify the package to execute.
2. Using the DTEXEC.EXE command-line utility, one can execute an SSIS package
that is stored in the file system, SQL Server, or the SSIS Package Store. The syntax to
execute an SSIS package stored in the file system is shown below.
DTEXEC.EXE /F "C:\BulkInsert\BulkInsertTask.dtsx"
3. Test the SSIS package execution by running the package from BIDS:
- In Solution Explorer, right-click the SSIS project folder that contains the package you
want to run, and then click Properties.
- In the SSIS Property Pages dialog box, select the Build option under the
Configuration Properties node and, in the right-hand panel, provide the folder location
where you want the SSIS package to be deployed in OutputPath. Click OK to save the
changes in the property page.
- Right-click the package within Solution Explorer and select the Execute Package
option from the context menu.
The first step in setting up the proxy is to create a credential (alternatively you could
use an existing credential). Navigate to Security > Credentials in SSMS Object
Explorer and right-click to create a new credential.
Then navigate to SQL Server Agent > Proxies in SSMS Object Explorer and right-click
to create a new proxy.
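The same credential and proxy can be scripted in T-SQL; a hedged sketch in which the
credential, account, and proxy names are all illustrative:

-- Create a credential that maps to a Windows account.
CREATE CREDENTIAL SSISCredential
WITH IDENTITY = 'DOMAIN\SSISUser', SECRET = 'StrongPassword1!';

-- Create an Agent proxy on that credential and grant it the SSIS
-- package-execution subsystem (named 'Dts' in msdb).
EXEC msdb.dbo.sp_add_proxy
    @proxy_name = N'SSISProxy',
    @credential_name = N'SSISCredential',
    @enabled = 1;

EXEC msdb.dbo.sp_grant_proxy_to_subsystem
    @proxy_name = N'SSISProxy',
    @subsystem_name = N'Dts';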
The SSIS Package Store is nothing but a combination of SQL Server and file system
deployment, as you can see when you connect to SSIS through SSMS: it looks like a
store that has its contents (packages) categorized to the taste of its manager (which is
you, the package developer). So don't mistake it for something different from the two
types of package deployment.
48. How to provide security to packages?
We can provide security to packages in two ways:
1. Package encryption
2. Password protection
At run time, the FTP task connects to a server by using an FTP connection manager. The
FTP connection manager includes the server settings, the credentials for accessing the
FTP server, and options such as the time-out and the number of retries for connecting to
the server.
The FTP connection manager supports only anonymous authentication and basic
authentication. It does not support Windows Authentication.
Predefined FTP operations:
Send Files, Receive Files,
Create Local Directory, Remove Local Directory,
Create Remote Directory, Remove Remote Directory,
Delete Local Files, Delete Remote Files
Custom log entries available on the FTP task:
FTPConnectingToServer
FTPOperation
3. Flat File Connection Manager changes - The Flat File connection manager now
supports parsing files with embedded qualifiers. The connection manager also, by
default, always checks for row delimiters to enable correct parsing of files with rows
that are missing column fields. The Flat File source now supports a varying number of
columns and embedded qualifiers.
REPLACENULL: You can use this function to replace NULL values in the first argument
with the expression specified in the second argument. It is the equivalent of ISNULL in
T-SQL: REPLACENULL(expression, expression)
TOKEN: This function returns a substring by using delimiters to separate a string into
tokens and then selecting which occurrence to
return: TOKEN(character_expression, delimiter_string, occurrence). For example,
TOKEN("one;two;three", ";", 2) returns "two".
TOKENCOUNT: This function uses delimiters to separate a string into tokens and then
returns the count of tokens found within the string: TOKENCOUNT(character_expression,
delimiter_string). For example, TOKENCOUNT("one;two;three", ";") returns 3.
6. Easy column remapping in the data flow (mapping data flow columns) - When
modifying a data flow, column remapping is sometimes needed. SSIS 2012 maps
columns by name instead of by ID, and it has an improved remapping dialog.
7. Shared connection managers: you can create connection managers at the project
level that can be shared by multiple packages in the project. A connection manager
created at the project level is automatically visible in the Connection Managers tab of
the SSIS Designer window for all packages. When converting shared connection
managers back to regular (package) connection managers, they disappear from all
other packages.
8. Scripting enhancements: the Script task and Script component now support .NET
Framework 4.0, and breakpoints are supported in the Script component.
9. ODBC source and destination - ODBC was not natively supported in 2008; SSIS
2012 has an ODBC source and destination (SSIS 2008 could access ODBC via
ADO.NET).
10. Reduced memory usage by the Merge and Merge Join transformations - The old
SSIS Merge and Merge Join transformations, although helpful, used a lot of system
resources and could be memory hogs. In 2012 these tasks are much more robust and
reliable; most importantly, they no longer consume excessive memory when their
multiple inputs produce data at uneven rates.
11. Undo/Redo: one thing that annoyed users of SSIS before 2012 was the lack of
Undo and Redo; once you performed an operation, you couldn't undo it. SSIS 2012
adds undo/redo support.
Control Flow / Data Flow:
Script task - configured on the Control Flow tab of the designer and runs outside the
data flow of the package.
Script component - configured on the Data Flow page of the designer and represents a
source, transformation, or destination in the Data Flow task.

Purpose:
Script task - can accomplish almost any general-purpose task.
Script component - you must specify whether you want to create a source,
transformation, or destination.

Raising Results:
Script task - uses both the TaskResult property and the optional ExecutionValue
property of the Dts object to notify the runtime of its results.
Script component - runs as a part of the Data Flow task and does not report results
using either of these properties.

Raising Events:
Script task - uses the Events property of the Dts object to raise events. For example:
Dts.Events.FireError(0, "Event Snippet", ex.Message & ControlChars.CrLf & ex.StackTrace, "", 0)
Script component - raises errors, warnings, and informational messages using the
methods of the IDTSComponentMetaData100 interface returned by the
ComponentMetaData property. For example:
Dim myMetaData As IDTSComponentMetaData100
myMetaData = Me.ComponentMetaData
myMetaData.FireError(...)

Execution:
Script task - runs custom code at some point in the package workflow. Unless you put
it in a loop container or an event handler, it only runs once.
Script component - also runs once, but typically runs its main processing routine once
for each row of data in the data flow.

Editor:
Script task - the Script Task Editor has three pages: General, Script, and Expressions.
Only the ReadOnlyVariables, ReadWriteVariables, and ScriptLanguage properties
directly affect the code that you can write.
Script component - the Script Transformation Editor has up to four pages: Input
Columns, Inputs and Outputs, Script, and Connection Managers. The metadata and
properties that you configure on each of these pages determine the members of the
base classes that are autogenerated for your use in coding.

Interaction with the Package:
Script task - in the code, you use the Dts property to access other features of the
package. The Dts property is a member of the ScriptMain class.
Script component - you use typed accessor properties to access certain package
features such as variables and connection managers. The PreExecute method can
access only read-only variables. The PostExecute method can access both read-only
and read/write variables.

Using Variables:
Script task - uses the Variables property of the Dts object to access variables that are
available through the task's ReadOnlyVariables and ReadWriteVariables properties.
For example:
string myVar;
myVar = Dts.Variables["MyStringVariable"].Value.ToString();
Script component - uses typed accessor properties of the autogenerated base class,
created from the component's ReadOnlyVariables and ReadWriteVariables properties.
For example:
string myVar;
myVar = this.Variables.MyStringVariable;

Using Connections:
Script task - uses the Connections property of the Dts object to access connection
managers defined in the package. For example:
string myFlatFileConnection;
myFlatFileConnection = (Dts.Connections["Test Flat File
Connection"].AcquireConnection(Dts.Transaction) as String);
Script component - uses typed accessor properties of the autogenerated base class,
created from the list of connection managers entered by the user on the Connection
Managers page of the editor. For example:
IDTSConnectionManager100 connMgr;
connMgr = this.Connections.MyADONETConnection;
3. The Bulk Insert task uses the T-SQL BULK INSERT statement for speed when loading
large amounts of data.
58. Which services are installed during SQL Server installation?
SSIS
SSAS
SSRS
SQL Server (MSSQLSERVER)
SQL Server Agent Service
SQL Server Browser
SQL Full-Text
Offline: In this mode, the source database is detached from the source server after
being put in single-user mode; copies of the .mdf, .ndf, and .ldf files are moved to a
specified network location; on the destination server the copies are taken from the
network location, and finally both databases are attached on the source and destination
servers. This mode is faster, but a disadvantage is that the source database is not
available during the copy-and-move operation. Also, the person executing the package
in this mode must be a sysadmin on both the source and destination instances.
Online: The task uses SMO to transfer the database objects to the destination server.
In this mode, the database remains online during the copy-and-move operation, but the
operation takes longer because each object in the database is copied individually.
Someone executing the package in this mode must be either a sysadmin or the
database owner of the specified databases.
69. How to pass a property value at run time?
A property value, such as the connection string of a connection manager, can be
passed to the package using package configurations.
70. How to skip the first 5 lines in each input flat file?
In the Flat File Connection Manager Editor, set the 'Header rows to skip' property to 5.
71. Parallel processing in SSIS
See question 11: SSIS uses the MaxConcurrentExecutables and EngineThreads
properties to control parallel execution.
72. How do we convert data types in SSIS?
The Data Conversion transformation in SSIS converts the data type of an input column
to a different data type.
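For comparison, the transformation plays the role that CAST/CONVERT play in T-SQL;
an illustrative analogy in which the table and columns are assumptions:

SELECT CAST(OrderDate AS DATE)          AS OrderDate,
       CONVERT(DECIMAL(10, 2), Amount)  AS Amount
FROM dbo.Orders;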
73. One Excel file contains 10 rows and a second Excel file contains 10 rows, with 5
matching rows between them. How do you find the non-matching rows from both Excel
files and store them in an output Excel file?
Sort both Excel sources on the key column, combine them with a Merge Join
transformation using a full outer join, and then use a Conditional Split to route rows
where either side's key is NULL to the output Excel destination. (A T-SQL sketch of the
same logic follows.)
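The same set logic expressed in T-SQL for clarity (the table names are illustrative):

SELECT COALESCE(a.ID, b.ID) AS ID
FROM Excel1 AS a
FULL OUTER JOIN Excel2 AS b
    ON a.ID = b.ID
WHERE a.ID IS NULL    -- row exists only in the second file
   OR b.ID IS NULL;   -- row exists only in the first file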
78. Before you create your SSIS package and load data into the destination, you
want to analyze your data. Which task helps you achieve that?
The Data Profiling task.
79. In the Merge Join transformation we can use an inner join, left join, or full
outer join. Which transformation is used to perform a cross join?
Cons:
- You need to specify the configuration file to use when the package is triggered with
DTExec (the /Conf switch).
- If multiple layers of packages are used (parent/child packages), configured values
must be transferred from the parent to the child package using parent package
variables, which can be tricky (if one parent variable is missing, the rest of the parent
package configurations (parameters) will not be transferred).
Indirect configuration
Pros:
- All packages can reference the configuration file(s) via an environment variable.
- Packages can be deployed simply using copy/paste or xcopy; there is no need to
mess with the SSIS deployment utility.
- Packages and applications do not depend on configuration switches when triggered
with the DTExec utility (the command line is much simpler).
Cons:
- Requires environment variables to be created.
- Does not easily support multiple databases (e.g. TEST and Pre-Prod) on the same
server.
81. We get data from a flat file; how do we remove leading zeros, trailing zeros,
or both before inserting into the destination?
Use the Derived Column transformation to remove leading zeros, trailing zeros, or both
from the string. After removing the zeros you can cast to any data type you want, such
as numeric, int, or float.
Leading zeros: (DT_WSTR,50)(DT_I8)[YourInputColumn]
Trailing zeros: REVERSE((DT_WSTR,50)(DT_I8)REVERSE([YourInputColumn]))
Leading and trailing zeros:
REVERSE((DT_WSTR,50)(DT_I8)REVERSE((DT_WSTR,50)(DT_I8)[YourInputColumn]))
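The leading-zeros expression works by casting the string to an integer and back; the
same trick in T-SQL, shown with an illustrative literal:

SELECT CAST(CAST(N'000123' AS BIGINT) AS NVARCHAR(50));  -- returns '123'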
82. Import Data in SSMS:
We can't apply transformations on the source data with "Import Data" / "Export Data".
Import and Export Wizard in SSIS:
We can apply transformations on the source data.
83. A flat file contains the following records:
ID
1
2
3
4
In the Conditional Split transformation, I specified the following 2 conditions:
condition 1: ID <= 3
condition 2: ID >= 3
What is the output from each condition?
condition 1: 1, 2, 3
condition 2: 4
A row is routed to the output of the first condition it satisfies, so ID 3, which matches
both conditions, goes only to condition 1's output. (A T-SQL illustration of this
first-match behavior follows.)
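The first-match behavior can be illustrated with a T-SQL CASE expression, which also
evaluates its branches in order; a sketch over the sample values:

SELECT ID,
       CASE
           WHEN ID <= 3 THEN 'Condition 1'   -- the first matching branch wins
           WHEN ID >= 3 THEN 'Condition 2'
           ELSE 'Default output'
       END AS OutputName
FROM (VALUES (1), (2), (3), (4)) AS t(ID);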