DataStage Interview Questions
DataStage is basically a tool that is used to design, develop, and execute
applications that populate tables in a data warehouse or data marts. It is a
program for Windows servers that extracts data from databases and loads it
into data warehouses. It has become an essential part of the IBM WebSphere
Data Integration suite.
We can populate a source file in many ways, such as by creating a SQL query
in Oracle or by using the Row Generator stage.
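As a rough illustration of the second approach, the C++ sketch below
synthesizes rows and writes them to a flat source file, much as the Row
Generator stage does when no real source is available; the file name and
columns are invented for the example.

    // Generate a small synthetic source file (name and columns invented).
    #include <fstream>

    int main() {
        std::ofstream out("source_file.txt");
        out << "id,name,amount\n";                 // header row
        for (int i = 1; i <= 10; ++i)              // ten synthetic rows
            out << i << ",customer_" << i << "," << i * 100 << "\n";
        return 0;
    }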
3) Name the command line functions to import and export the DS jobs?
In Datastage, dsimport.exe is used to import the DS jobs and dsexport.exe is
used to export the DS jobs.
In Datastage 7.5, many new stages were added for more robustness and smoother
performance, such as the Procedure Stage, the Command Stage, and Generate Report.
6) Define Merge?
Merge means to join two or more tables. The tables are joined on the basis
of the primary key columns in both tables.
We can write parallel routines in C or C++. Such routines are also
created in the DS Manager and can be called from a Transformer stage.
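As an illustration, the sketch below shows the general shape of a C++ function
that could be compiled into a shared library and registered as a parallel
routine; the routine name, argument, and behavior are invented for the
example, not a fixed DataStage API.

    #include <cctype>

    // A routine with C linkage, so the engine can resolve the symbol
    // from the compiled shared library. It upper-cases a column value
    // in place and returns it.
    extern "C" char* to_upper(char* value) {
        for (char* p = value; *p != '\0'; ++p)
            *p = static_cast<char>(std::toupper(static_cast<unsigned char>(*p)));
        return value;
    }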
Duplicates can be removed by using the Sort stage with the option
Allow Duplicates set to false.
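The underlying idea is generic: sort on the key, then drop adjacent
duplicates. A minimal C++ sketch of that idea, with invented key values:

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> keys = {"B101", "A100", "B101", "C102"};
        std::sort(keys.begin(), keys.end());               // sort on the key
        keys.erase(std::unique(keys.begin(), keys.end()),  // drop adjacent duplicates
                   keys.end());
        for (const auto& k : keys)
            std::cout << k << "\n";                        // A100 B101 C102
        return 0;
    }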
Join, Merge, and Lookup differ in how they use memory, the input requirements
they impose, and how they treat various records. Join and Merge need less
memory than the Lookup stage, because a Lookup holds the whole reference table
in memory while Join and Merge work through sorted inputs.
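The C++ sketch below illustrates that memory difference: a merge walks two
key-sorted streams in step with almost no extra memory, while a lookup first
loads the entire reference table into an in-memory map. The tables and keys
are invented for the example.

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    using Row = std::pair<int, std::string>; // (key, value)

    // Merge-style: both inputs sorted by key, O(1) extra memory.
    void merge_sorted(const std::vector<Row>& left, const std::vector<Row>& right) {
        std::size_t i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (left[i].first < right[j].first)      ++i;
            else if (left[i].first > right[j].first) ++j;
            else {
                std::cout << left[i].first << ": " << left[i].second
                          << " / " << right[j].second << "\n";
                ++i; ++j;
            }
        }
    }

    // Lookup-style: the whole reference table is held in memory.
    void lookup(const std::vector<Row>& input, const std::vector<Row>& reference) {
        std::unordered_map<int, std::string> ref(reference.begin(), reference.end());
        for (const auto& row : input) {
            auto hit = ref.find(row.first);
            if (hit != ref.end())
                std::cout << row.first << ": " << row.second
                          << " / " << hit->second << "\n";
        }
    }

    int main() {
        std::vector<Row> a = {{1, "alpha"}, {2, "beta"}};
        std::vector<Row> b = {{1, "one"},   {3, "three"}};
        merge_sorted(a, b);
        lookup(a, b);
        return 0;
    }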
Job control is best performed by using the Job Control Language (JCL). This tool
is used to execute multiple jobs simultaneously, without using any kind of loop.
17) What are the steps required to kill the job in Datastage?
To kill a job in Datastage, we have to kill its respective processing ID.
All the stages after the Exception activity in Datastage are executed if any
unknown error occurs while executing the job sequencer.
APT_CONFIG_FILE is the environment variable that is used to identify the *.apt
configuration file in Datastage. This file stores the node information, disk
storage information, and scratch disk information.
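For illustration, a minimal configuration file of this kind typically looks
like the sketch below; the node name, host name, and paths are invented.

    {
        node "node1"
        {
            fastname "etl_host"
            pools ""
            resource disk "/data/datasets" {pools ""}
            resource scratchdisk "/data/scratch" {pools ""}
        }
    }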
There are two types of Lookups in Datastage, i.e. Normal lkp and Sparse lkp. In
a Normal lkp, the reference data is first loaded into memory and the lookup is
then performed there. In a Sparse lkp, the lookup is fired directly against the
database for each input row. A Sparse lkp is therefore faster than a Normal lkp
when the reference table is very large compared with the number of input rows.
We can convert a server job into a parallel job by using the IPC stage and the
Link Collector.
In Datastage, the OConv() and IConv() functions are used to convert data from
one format to another, i.e. conversions of Roman numerals, time, date, radix,
numeral ASCII, etc. IConv() is basically used to convert external formats into
a form the system understands, while OConv() is used to convert internal
formats into a form users can understand. For example, IConv() with a "D" date
conversion code turns a display date such as "31/12/2023" into Datastage's
internal day number, and OConv() applies the same kind of code in reverse to
format that number for display.
26) Explain Usage Analysis in Datastage?
In Datastage, Usage Analysis can be performed in a few clicks: launch Datastage
Manager, right-click the job, and select Usage Analysis.
To find the number of rows in a sequential file, we can use the system variable @INROWNUM.
The only difference between a Hash file and a Sequential file is that a Hash
file saves data using a hashing algorithm and a hash key value, while a
sequential file has no key value for saving data. Because of this hash key
feature, searching in a Hash file is faster than in a sequential file.
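A generic C++ sketch of why keyed access wins: the hashed structure finds a
record in roughly constant time, while unkeyed data has to be scanned row by
row. Keys and values here are invented for the example.

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    int main() {
        // "Hash file": records indexed by a hash key.
        std::unordered_map<std::string, std::string> hashed = {
            {"C102", "Widget"}, {"A100", "Bolt"}, {"B101", "Nut"}};
        std::cout << hashed.at("B101") << "\n";    // direct, keyed access

        // "Sequential file": no key, so we must scan in order.
        std::vector<std::pair<std::string, std::string>> sequential = {
            {"C102", "Widget"}, {"A100", "Bolt"}, {"B101", "Nut"}};
        for (const auto& rec : sequential)
            if (rec.first == "B101") { std::cout << rec.second << "\n"; break; }
        return 0;
    }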
In Datastage, routines are of two types i.e. Before Sub Routines and After Sub
Routines. We can call a routine from the transformer stage in Datastage.
We can say that an ODS is a mini data warehouse. An ODS doesn't contain
information for more than one year, while a data warehouse contains detailed
information regarding the entire business.
In Datastage, we can drop the index before loading the data into the target by
using the Direct Load functionality of the SQL*Loader utility.
37) Name the third party tools that can be used in Datastage?
The third party tools that can be used in Datastage are Autosys, TNG, and Event
Co-ordinator. I have worked with these tools and have hands-on experience of
working with them.
There are two types of hash files in DataStage, i.e. the Static Hash File and
the Dynamic Hash File. The static hash file is used when a limited amount of
data is to be loaded into the target database. The dynamic hash file is used
when we don't know the amount of data coming from the source file.
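The distinction can be illustrated generically in C++: a static structure sizes
its buckets up front for a known volume, while a dynamic one adds buckets as
data arrives. This models the concept only, not DataStage's on-disk format.

    #include <iostream>
    #include <string>
    #include <unordered_map>

    int main() {
        // Dynamic: the table rehashes (adds buckets) as the volume grows,
        // which suits an unknown amount of source data.
        std::unordered_map<int, std::string> dynamic_file;
        std::cout << "buckets before: " << dynamic_file.bucket_count() << "\n";
        for (int i = 0; i < 1000; ++i) dynamic_file[i] = "row";
        std::cout << "buckets after: " << dynamic_file.bucket_count() << "\n";

        // Static: capacity fixed up front for a known, limited volume,
        // so no rehashing happens during the load.
        std::unordered_map<int, std::string> static_file;
        static_file.reserve(1000);
        std::size_t fixed = static_file.bucket_count();
        for (int i = 0; i < 1000; ++i) static_file[i] = "row";
        std::cout << "bucket count unchanged: "
                  << (fixed == static_file.bucket_count()) << "\n";
        return 0;
    }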
40) Define MetaStage?
In Datastage, MetaStage is used to save metadata that is helpful for data lineage
and data analysis.
41) Have you ever worked in a UNIX environment, and why is it useful in
Datastage?
Yes, I have worked in a UNIX environment. That knowledge is useful in Datastage
because we sometimes have to write UNIX programs, such as batch programs that
invoke batch processing.
Datastage is an ETL (Extract, Transform and Load) tool, while Datastage TX is
an EAI (Enterprise Application Integration) tool.
Transaction size means the number of rows written before committing the records
to a table. Array size means the number of rows written to or read from the
table in one operation.
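A short C++ sketch of how the two settings interact during a load;
execute_batch() and commit() below are hypothetical stand-ins for a database
client API, and the sizes are invented.

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for a database client API.
    void execute_batch(const std::vector<std::string>& rows) { /* send rows */ }
    void commit() { std::cout << "commit\n"; }

    int main() {
        const std::size_t array_size = 100;         // rows sent per round trip
        const std::size_t transaction_size = 1000;  // rows written per commit

        std::vector<std::string> buffer;
        std::size_t rows_since_commit = 0;
        for (std::size_t i = 0; i < 2500; ++i) {
            buffer.push_back("row " + std::to_string(i));
            if (buffer.size() == array_size) {           // array size reached
                execute_batch(buffer);
                rows_since_commit += buffer.size();
                buffer.clear();
            }
            if (rows_since_commit >= transaction_size) { // transaction size reached
                commit();
                rows_since_commit = 0;
            }
        }
        if (!buffer.empty()) { execute_batch(buffer); rows_since_commit += buffer.size(); }
        if (rows_since_commit > 0) commit();             // final partial transaction
        return 0;
    }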
There are three types of views in the Datastage Director, i.e. the Job View,
the Log View, and the Status View.
The DRS stage is faster than the ODBC stage because it uses native database
connectivity.
The Orabulk stage is used to load large amounts of data into one target table
of an Oracle database, while the BCP stage is used to load large amounts of
data into one target table of Microsoft SQL Server.
The DS Designer is used to design the work area and add various links to it.
In Datastage, the Link Partitioner is used to divide data into different parts
using certain partitioning methods, and the Link Collector is used to gather
the data from the various partitions/segments back into a single stream and
save it in the target table.
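A generic C++ sketch of that partition-then-collect flow, using a hash of the
key as one common partitioning method; the rows and partition count are
invented for the example.

    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        const std::size_t partitions = 3;
        std::vector<std::string> rows = {"A100", "B101", "C102", "D103", "E104"};

        // Link Partitioner: route each row to a partition by hash of its key.
        std::vector<std::vector<std::string>> parts(partitions);
        std::hash<std::string> h;
        for (const auto& row : rows)
            parts[h(row) % partitions].push_back(row);

        // Link Collector: gather the partitions back into one output link.
        std::vector<std::string> collected;
        for (const auto& part : parts)
            collected.insert(collected.end(), part.begin(), part.end());

        for (const auto& row : collected)
            std::cout << row << "\n";
        return 0;
    }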