Q1 Explain Architecture of SSIS?
1. What is a package?
a) A discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence and rules, error and event handling, and data destinations.
7. Connection manager?
a) It is a bridge between a package object and physical data. It provides a logical representation of a connection at design time; the properties of the connection manager describe the physical connection that Integration Services creates when the package is run.
a) DTExecUI
1. Open the command prompt (Run -> type dtexecui -> press Enter).
2. The Execute Package Utility dialog box opens.
3. In the dialog, click Execute to run the package.
Wait until the package has executed successfully.
b) DTExec utility
1. Open the command prompt window.
2. In the command prompt window, type dtexec / followed by the DTS, SQL, or File option and the package path, including the package name.
3. If the package encryption level is EncryptSensitiveWithPassword or EncryptAllWithPassword, use the /Decrypt option to provide the password. If no password is included, dtexec will prompt you for the password.
4. Optionally, provide additional command-line options.
5. Press Enter.
6. Optionally, view logging and reporting information before closing the command prompt window.
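For illustration, a minimal sample invocation (the file path and password here are hypothetical):
dtexec /FILE "C:\Packages\MyPackage.dtsx" /DECRYPT myPassword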
Slowly Changing Dimension (SCD) types:
Type 1: It keeps only the most recent values in the target. It does not maintain history.
Type 2: It keeps the full history in the target database. For every update in the source, a new record is inserted in the target.
Type 3: It keeps current and previous information in the target.
In SSIS:
Type 1 changes may require re-creating any aggregation that would be affected by the change. The Slowly Changing Dimension transformation handles:
Matching incoming rows with rows in the lookup table to identify new and existing rows.
Identifying incoming rows that contain changes when changes are not permitted.
Identifying incoming rows that contain historical changes that require insertion of new records
and the updating of expired records.
Detecting incoming rows that contain changes that require the updating of existing records,
including expired ones.
10. How can you handle errors with the help of logging in SSIS?
a) Create an OnError event handler and add to it an Execute SQL Task that logs the error.
11. What is a log file, and how do you send a log file to a manager?
a) Logging is especially useful when the package has been deployed to the production environment and you cannot use BIDS and VSA to debug the package.
SSIS enables you to implement logging code through the Dts.Log method. When the Dts.Log method is called in a script, the SSIS engine routes the message to the log providers that are configured in the containing package.
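A minimal sketch of calling Dts.Log from a Script Task (assuming C# scripting; the ScriptMain class and the Dts object are generated by the Script Task designer, and the message text is illustrative):

public void Main()
{
    // Route a custom message to every log provider enabled for this package.
    byte[] emptyBytes = new byte[0];
    Dts.Log("Custom message: row processing started", 0, emptyBytes);

    Dts.TaskResult = (int)ScriptResults.Success;
}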
An environment variable configuration sets a package property to the value stored in an environment variable.
Environment variable configurations are useful for configuring properties that depend on the particular computer that is executing the package.
15. For error handling in a transformation, which option gives better performance: Fail Component, Redirect Row, or Ignore Failure?
Redirect Row provides better performance for error handling.
17. What is a task?
a) An individual unit of work. The available tasks include:
1. ActiveX Script Task
2. Analysis Services Execute DDL Task
3. Analysis Services Processing Task
4. Bulk Insert Task *
5. Data Flow Task *
6. Data Mining Query Task
7. Execute DTS 2000 Package Task
8. Execute Package Task *
9. Execute Process Task
10. Execute SQL Task *
11. File System Task
12. FTP Task
13. Message Queue Task
14. Script Task *
15. Send Mail Task *
16. Web Service Task
17. WMI Data Reader Task
18. WMI Event Watcher Task
19. XML Task
20. Solution Explorer?
a) After creating a project, Solution Explorer shows the project name with these folders:
- Data Sources
- Data Source Views
- SSIS Packages
- Miscellaneous
23. TRANSFORMATIONS?
A transformation is an object that generates, modifies, or passes data.
1. AGGREGATE T/R: It applies an aggregate function to grouped records and produces new output records from the aggregated results.
2. AUDIT T/R: This t/r adds the value of a system variable, such as machine name or execution instance GUID, to a new output column.
3. CHARACTER MAP T/R: This t/r makes string data changes, such as converting data from lower case to upper case.
4. CONDITIONAL SPLIT: It separates input rows into separate output data pipelines based on the Boolean expressions configured for each output.
5. COPY COLUMN: Adds a copy of a column to the t/r output; we can later transform the copy, keeping the original for auditing purposes.
6. DATA CONVERSION: Converts a column's data type to another data type.
7. DATA MINING QUERY: Performs a data mining query against Analysis Services.
8. DERIVED COLUMN: Creates a new derived column calculated from an expression.
9. EXPORT COLUMN: It allows you to export a column from the data flow to a file.
10. FUZZY GROUPING: Performs data cleansing by finding rows that are likely duplicates.
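For instance (using the SSIS expression language, with hypothetical column names): a Conditional Split output condition might be [City] == "London", and a Derived Column expression might be UPPER([FirstName]).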
24. Batch?
a) A batch is defined as a group of sessions. There are 2 types:
1. Parallel batch processing
2. Sequential batch processing
Q4 How do you pass a property value at run time? How do you implement Package Configurations?
A property value, like the connection string for a Connection Manager, can be passed to the package using package configurations. Package Configurations provide different options, like XML file, environment variable, SQL Server table, registry value, or parent package variable.
Q10 What are the points to keep in mind for performance improvement of the package?
http://technet.microsoft.com/en-us/library/cc966529.aspx
Q11 You may get a question stating a scenario and then asking how you would create a package for it, e.g. how would you configure a data flow task so that it can transfer data to different tables based on the city name in a source table column? (A Conditional Split transformation can route rows to different destinations based on expressions over the city column.)
b) (Merge vs. Union All:) Data has to be sorted before the Merge transformation, whereas Union All doesn't have any condition like that.
Q14 You may get questions regarding what transformation X does. Lookup, Fuzzy Lookup, and Fuzzy Grouping transformations are my favorites.
Q15 How would you restart a package from a previous failure point? What are Checkpoints and how can we implement them in SSIS?
When a package is configured to use checkpoints, information about package execution is written to a
checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from
the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created
the next time that the package is run.
Q: What is the fastest way to do an incremental load?
The fastest way is to use a Timestamp column in the source table and store the last ETL timestamp; in the ETL process, pick all the rows having a Timestamp greater than the stored Timestamp, so as to pick only new and updated records.
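A minimal T-SQL sketch of the idea (the table, column, and variable names are hypothetical):

-- @LastETLTimestamp holds the timestamp stored by the previous ETL run.
DECLARE @LastETLTimestamp DATETIME = '2012-01-01';

SELECT *
FROM dbo.SourceTable
WHERE ModifiedDate > @LastETLTimestamp;  -- only new and updated rows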
4) Sequence Container?
The Sequence container defines a control flow that is a subset of the package control flow. Sequence
containers group the package into multiple separate control flows, each containing one or more tasks
and containers that run within the overall package control flow.
(For the For Loop container:) An evaluation expression that contains the expression used to test whether the loop should stop or continue.
IV. Create the job, schedule the job, and run the job
In SQL Server Management Studio, highlight SQL Server Agent -> Start. Highlight Jobs -> New Job…, and name it myJob.
Under Steps, choose New Step and name it Step1, with:
Type: SQL Server Integration Services Package
Run as: myProxy
Package source: File System
Browse to select your package file xxx.dtsx
Click OK
Schedule your job and enable it.
Now you can run your job.
8) Script Task?
The Script Task runs custom .NET code (VB.NET, or C# from SSIS 2008 onward) in the control flow, for work that the built-in tasks do not cover.
12) Merge Join?
You can configure the Merge Join transformation in the following ways:
- Specify whether the join is an inner join, left outer join, or full outer join.
- Specify the columns the join uses.
- Specify whether the transformation handles null values as equal to other nulls.
Q4: SSIS includes logging features that write log entries when run-time events occur and can also write custom messages.
Integration Services supports a diverse set of log providers, and gives you the ability to create custom
log providers. The Integration Services log providers can write log entries to text files, SQL Server
Profiler, SQL Server, Windows Event Log, or XML files.
Logs are associated with packages and are configured at the package level. Each task or container in a
package can log information to any package log. The tasks and containers in a package can be enabled
for logging even if the package itself is not.
To customize the logging of an event or custom message, Integration Services provides a schema of
commonly logged information to include in log entries. The Integration Services log schema defines the
information that you can log. You can select elements from the log schema for each log entry.
SQL Server 2005 Integration Services (SSIS) makes it simple to deploy packages to any computer.
There are two steps in the package deployment process:
-The first step is to build the Integration Services project to create a package deployment utility.
-The second step is to copy the deployment folder that was created when you built the Integration
Services project to the target computer, and then run the Package Installation Wizard to install the
packages.
Q9: What are variables and what is variable scope?
Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run
time. The scripts in the Script task and the Script component can also use variables. The precedence
constraints that sequence tasks and containers into a workflow can use variables when their constraint
definitions include expressions.
Integration Services supports two types of variables: user-defined variables and system variables. User-
defined variables are defined by package developers, and system variables are defined by Integration
Services. You can create as many user-defined variables as a package requires, but you cannot create
additional system variables.
Scope :
A variable is created within the scope of a package or within the scope of a container, task, or event
handler in the package. Because the package container is at the top of the container hierarchy, variables
with package scope function like global variables and can be used by all containers in the package.
Similarly, variables defined within the scope of a container such as a For Loop container can be used by
all tasks or containers within the For Loop container.
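A minimal sketch of reading and writing variables from a Script Task (assuming C# scripting and two hypothetical user variables: User::ConnString listed under ReadOnlyVariables and User::RowCount under ReadWriteVariables):

public void Main()
{
    // Read a user-defined variable exposed to the script.
    string connStr = Dts.Variables["User::ConnString"].Value.ToString();

    // Write back to a variable the script is allowed to change.
    Dts.Variables["User::RowCount"].Value = 42;

    Dts.TaskResult = (int)ScriptResults.Success;
}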
Question 5 - Can you name some of the core SSIS components in the Business Intelligence
Development Studio you work with on a regular basis when building an SSIS package?
Connection Managers
Control Flow
Data Flow
Event Handlers
Variables window
Toolbox window
Output window
Logging
Package Configurations
Question 3 - Can you name 5 or more of the native SSIS connection managers?
OLEDB connection - Used to connect to any data source requiring an OLEDB connection (e.g., SQL Server 2000)
Flat file connection - Used to make a connection to a single file in the File System. Required for reading
information from a File System flat file
ADO.Net connection - Uses the .Net Provider to make a connection to SQL Server 2005 or other
connection exposed through managed code (like C#) in a custom task
Analysis Services connection - Used to make a connection to an Analysis Services database or project.
Required for the Analysis Services DDL Task and Analysis Services Processing Task
File connection - Used to reference a file or folder. The options are to either use or create a file or folder
Excel
FTP
HTTP
MSMQ
SMO
SMTP
SQLMobile
WMI
Question 4 - How do you eliminate quotes from being uploaded from a flat file to SQL Server?
In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier
field then preview the data to ensure the quotes are not included.
Additional information: How to strip out double quotes from an import file in SQL Server Integration
Services
Question 5 - Can you name 5 or more of the main SSIS tool box widgets and their functionality?
For Loop Container
Foreach Loop Container
Sequence Container
ActiveX Script Task
Analysis Services Execute DDL Task
Analysis Services Processing Task
Bulk Insert Task
Data Flow Task
Data Mining Query Task
Execute DTS 2000 Package Task
Execute Package Task
Execute Process Task
Execute SQL Task
etc.
Question 2 - Can you explain how to setup a checkpoint file in SSIS?
The following items need to be configured on the properties tab for the SSIS package:
CheckpointFileName - Specify the full path to the checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path, it's a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage - Determines if/how checkpoints are used. Choose from these options: Never
(default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical
setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used
to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the
package starts execution with the first task. The Always choice raises an error if the Checkpoint file
does not exist.
SaveCheckpoints - Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior. In addition, each task or container that should serve as a restart point must have its FailPackageOnFailure property set to True.
Question 3 - Can you explain different options for dynamic configurations in SSIS?
Use an XML file
Use custom variables
Use a database per environment with the variables
Use a centralized database with all variables
Question 5 - Can you name five of the Perfmon counters for SSIS and the value they provide?
SQLServer:SSIS Service
SSIS Package Instances - Total number of simultaneous SSIS Packages running
SQLServer:SSIS Pipeline
BLOB bytes read - Total bytes read from binary large objects during the monitoring period.
BLOB bytes written - Total bytes written to binary large objects during the monitoring period.
BLOB files in use - Number of binary large objects files used during the data flow task during the
monitoring period.
Buffer memory - The amount of physical or virtual memory used by the data flow task during the
monitoring period.
Buffers in use - The number of buffers in use during the data flow task during the monitoring period.
Buffers spooled - The number of buffers written to disk during the data flow task during the monitoring
period.
Flat buffer memory - The total number of blocks of memory in use by the data flow task during the
monitoring period.
Flat buffers in use - The number of blocks of memory in use by the data flow task at a point in time.
Private buffer memory - The total amount of physical or virtual memory used by data transformation
tasks in the data flow engine during the monitoring period.
Private buffers in use - The number of blocks of memory in use by the transformations in the data flow
task at a point in time.
Rows read - Total number of rows read from all data sources during the monitoring period.
Rows written - Total number of rows written to all data destinations during the monitoring period.
New improvements / features in SSIS 2008
With the release of SQL Server 2008 comes an improved SSIS 2008. I will try to list the improved and new features in SSIS 2008.
The biggest performance improvement in SSIS 2008 is the incorporation of parallelism in the processing of execution trees. In SSIS 2005, each execution tree used a single thread, whereas in SSIS 2008 the data flow engine is redesigned to utilize multiple threads and take advantage of dynamic scheduling to execute multiple components in parallel, including components within the same execution tree.
SSIS 2008 incorporates the new Visual Studio Tools for Applications (VSTA) scripting engine. The advantage of VSTA is that it enables users to use any .NET language for scripting.
SSIS 2008 gets new Source and Destination components for ADO.NET record sets.
In SSIS 2008, the Lookup Transformation has faster cache loading and lookup operations. It has new caching options, including the ability for the reference dataset to use a cache file (.caw) accessed by the Cache Connection Manager. In addition, the same cache can be shared between multiple Lookup Transformations.
SSIS 2008 has a new debugging aid, the Data Profiling Task, that can help users analyze the data flows occurring in the package. In many cases, execution errors are caused by unexpected variations in the data that is being transferred. The Data Profiling Task can help users discover the source of these errors by giving better visibility into the data flow.
One of the main usability enhancements in SSIS 2008 is the new Connections Project Wizard. The Connections Project Wizard guides the user through the steps required to create sources and destinations.
Q: What are variables and what is variable scope?
See Q9 above.
Q: Can you name five of the Perfmon counters for SSIS and the value they provide?
See the Perfmon counters question (Question 5) above.
1. ODBC Support
ODBC support is becoming first class now, I guess because of the future full integration with Hadoop and an increased demand to integrate more easily with various open source platforms. So I guess the days when you will be able to easily connect to a Linux machine from SQL Server are coming. Attunity connectors are also becoming more readily available and covering more vendors.
2. Change Data Capture (CDC) Support
Now with CDC one can easily capture the changes in data sources and provide them for reporting, data analysis, or feeding into the Data Warehouse.
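On the SQL Server side, CDC is enabled per database and then per table; a minimal T-SQL sketch (the schema and table names are hypothetical):

-- Enable CDC on the current database.
EXEC sys.sp_cdc_enable_db;

-- Start tracking changes on one source table.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;  -- no gating role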
4. Revamped Configurations
This is another big improvement.
Did you ever wonder why you deployed a package and it took the design-time parameters? Did you struggle to deploy your config files or a database along with the package?
No longer! You can now have several configurations, for Dev and Prod, no problem. If you envied your fellow C# or VB .NET developer being able to store parameters right in Visual Studio, no more, now you can too. As an aside, there is no more BIDS; there is the new Data Tools, but to me it is Visual Studio 2010, I just develop special projects in it, and it is a 1st class tool! And how about this: you can even add parameters after the package has been deployed. Do you feel as thrilled as me? Not yet? Then how about the possibility of sharing parameters across many packages within a project?
5. Script Component Debugging
Remember? No? I do, I remember how I needed to build a console app till 10 PM just to solve the mystery of why the values were wrong, sitting alone in the office biting my nails because at midnight a package just had to load the latest flight data. I wish I could just debug the mysterious component with 400 lines of code. Sigh and smile, now I will.
Better yet, all my runtime values are captured. Did I say it is a Visual Studio?
6. SSIS Package Format Changed and the Specs are Open Source!
Bye-bye lineage IDs and cryptic, long XML! Hello comparable, mergeable packages!
Easily compare packages with diff tools now! Full specs are at: http://msdn.microsoft.com/en-us/library/gg587140.aspx
7. Built-in Reporting
Yes, there will be three canned reports provided for you, dear developer, to benchmark, troubleshoot, and just better support a live implementation.
8. Data Taps
This is totally new: have you ever been asked to fix a package with no rights to access the data source? I had such an “opportunity”; their DBA just shrugged off my requests to provide a read-only account. But now you are more in control: you can turn small data dumps to a CSV file on and off for ad-hoc analysis. Those, most often, are instrumental in finding metadata differences and thus allow a real quick fix to many issues. More on this topic is here: http://goo.gl/AUBP5
So what is different? Actually, everything is simpler: you just deploy with a right-click on the project, no more fiddling around with the deployment manifest or manual copy-and-paste, import, etc.
There are also APIs to validate, configure, and deploy a package:
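A minimal sketch using the classic SSIS runtime API (Microsoft.SqlServer.Dts.Runtime; the package path is hypothetical, and the 2012 project deployment model adds its own catalog APIs beyond this):

using Microsoft.SqlServer.Dts.Runtime;

class Program
{
    static void Main()
    {
        Application app = new Application();

        // Load a package from the file system.
        Package pkg = app.LoadPackage(@"C:\Packages\MyPackage.dtsx", null);

        // Validate before executing.
        DTSExecResult valid = pkg.Validate(null, null, null, null);

        if (valid == DTSExecResult.Success)
        {
            DTSExecResult result = pkg.Execute();
        }
    }
}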
Oh, I have already covered 10 improvements, but wait, there are more:
Undo and redo are now possible (I can hear the wow!);
New designer surface (AKA canvas) with adorners;
Shared (across a project) Connection Managers (no more click and copy-paste)!
Shared (across packages in a project) Cache Managers;
Do you remember the dreaded errors all over the package after some metadata changed? Now you can resolve them all up the stream with a single click!
Group items to reduce clutter without resorting to sequence containers.
OK for now!
I hope I whet your appetite enough to go and explore the features yourself. And to stay always tuned, do not forget to bookmark the aggregated SSIS Resources page: http://goo.gl/2WZxp
Star schema vs. snowflake schema:
In a star schema, all dimensions link directly with the fact table (1-N relationships), we can easily retrieve data by parsing the query against the facts and dimensions, and queries involve fewer joins.
In a snowflake schema, dimensions link with each other or directly with the fact table, and it is more difficult to retrieve the data.
4. Data warehouse?
a) A data warehouse is a collection of data marts representing historical data from different operational data sources (OLTP).
The data from these OLTP sources is structured and optimized for querying and data analysis in a data warehouse.
5. Data mart?
a) A data mart is a subset of a data warehouse that can provide data for reporting and analysis on a section, unit, or department, like the sales dept or HR dept.
6. What is OLAP?
a) OLAP stands for online analytical processing. It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis, and querying of large amounts of data.
7. What is OLTP?
a) OLTP stands for online transactional processing. Except for data warehouse databases, the other databases are OLTP.
These OLTP databases use a normalized schema structure.
These OLTP databases are designed for recording the daily operations and transactions of a business.
SQL SERVER 2005:
1. Surrogate key?
a) It is an artificial or synthetic key that is used as a substitute for a natural key.
It is just a unique identifier or number for each row that can be used as the primary key of the table.
(It is a sequence-generated key which is assigned to be the primary key in the system/table.)
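A minimal T-SQL illustration of a surrogate key using an IDENTITY column (the table and column names are hypothetical):

CREATE TABLE dbo.DimCustomer
(
    CustomerKey    INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    CustomerAltKey NVARCHAR(20) NOT NULL,          -- natural/business key
    CustomerName   NVARCHAR(100) NOT NULL
);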
Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing, etc. stored in the data warehouse, and each department may have separate data marts. These data marts can be built on top of the data warehouse.
What is the ER model?
The ER model, or entity-relationship model, is a particular methodology of data modeling wherein the goal of modeling is to normalize the data by reducing redundancy. This is different from dimensional modeling, where the main goal is to improve the data retrieval mechanism.
The dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from dimension tables that qualify the data. The goal of the dimensional model is not to achieve a high degree of normalization but to facilitate easy and faster data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modeling technique which is
often used in many enterprise level data warehouses.
If you want to read a quick and simple guide on dimensional modeling, please check our Guide to
dimensional modeling.
What is a dimension?
For example, consider this: if I just say "20kg", it does not mean anything. But if I say, "20kg of Rice (product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer, and date are some dimensions that qualify the measure, 20kg.
Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
What is Fact?
A fact is something that is quantifiable (or measurable). Facts are typically (but not always) numerical values that can be aggregated.
Non-additive Measures
Non-additive measures are those which cannot be used inside any numeric aggregation function (e.g. SUM(), AVG(), etc.). One example of a non-additive fact is any kind of ratio or percentage, e.g. a 5% profit margin or a revenue-to-asset ratio. Non-numerical data can also be a non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.
Semi-additive Measures
Semi-additive measures are those where only a subset of aggregation functions can be applied. Let's say account balance: a SUM() over balance does not give a useful result, but MAX() or MIN() balance might be useful. Consider a price rate or currency rate: SUM is meaningless on a rate; however, an average might be useful.
Additive Measures
Additive measures can be used with any aggregation function like SUM(), AVG(), etc. An example is Sales Quantity.
SSAS
A: Simply using BOTTOMCOUNT will return customers with null sales. You will have to combine it with NONEMPTY or FILTER.
By default, Analysis Services returns members in the order specified during attribute design. The attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Let's say we want to see order counts for each year; in Adventure Works, the MDX query would be:
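A sketch of such a query (the measure and hierarchy names are assumed from the standard Adventure Works sample cube):

SELECT [Measures].[Order Count] ON COLUMNS,
       [Date].[Calendar Year].[Calendar Year].Members ON ROWS
FROM [Adventure Works]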
1. In the parent package, create a variable (e.g. "ParentVar").
2. In the child package, create variables to correspond to the ones you created in the parent package (e.g. "ChildVar" to correspond to "ParentVar"). Important: assign these variables valid default values, otherwise you will not be able to run the child package standalone.
3. In the child package, open the package Configurations. Add a new configuration and select "Parent package variable". Enter the variable name you added to the parent package (in our example "ParentVar"). Note that variable names are case sensitive. Assign this to the variable of the child package ("ChildVar").
On the SSIS menu, click Log Events. You can optionally display the Log Events window by
mapping the View.LogEvents command to a key combination of your choosing on the Keyboard page
of the Options dialog box.
As the runtime encounters the events and custom messages that are enabled for logging, log entries for
each event or message are written to the Log Events window.
The log entries remain available in the Log Events window until you rerun the package, run a different
package, or close SQL Server Data Tools.
Optionally, select the log entries to copy, right-click, and then click Copy.
Optionally, double-click a log entry, and in the Log Entry dialog box, view the details for a single log entry.
In the Log Entry dialog box, click the up and down arrows to display the previous or next log entry, and click the copy icon to copy the log entry.
Open a text editor, paste, and then save the log entry to a text file.