DataStage

DataStage is an ETL tool used to build, manage, and expand data marts and data warehouses. It allows users to design jobs that extract, transform, load, and integrate data. Jobs are run, monitored, and scheduled after being built in DataStage. Information Server is a suite of applications that share a common repository and services. It includes applications such as DataStage, Business Glossary, and Information Analyzer. The backbone is provided by a WebSphere Application Server instance. DataStage has Administrator, Designer, and Director clients. It uses parallel and server engines, with job types including server jobs, parallel jobs, and job sequences.


1) What is DataStage?

DataStage is an ETL tool for easy creation and maintenance of data marts
and data warehouses. It provides the tools you need to build,
manage, and expand them.
With DataStage you can design jobs that extract, integrate, aggregate, load,
and transform the data for your data warehouse or data mart.
After building a DataStage job you can run, monitor, and schedule it.

2) What is Information Server?


Information Server is a suite of applications that all share the same
repository and the same backbone of services and functionality.
It is managed using web console clients; individual applications are managed
using their own set of clients.
The backbone of services is provided by a WebSphere Application
Server (WAS) instance, which by default is named server1.

Information Services Director


Business Glossary
Information Analyzer
DataStage
QualityStage
Metadata Workbench

METADATA SERVER

INFORMATION SERVER WEB CONSOLE

REPOSITORY

3) What are the DataStage clients?


Administrator
Designer
Director

4) What are the DataStage engines?


Parallel engine: runs parallel jobs
Server engine: runs server jobs and job sequences

5) What types of jobs are available in DataStage?


1. Server Job
2. Parallel Job
3. Job Sequence

6) What is a server job?


It is executed by the server engine.
It is compiled into BASIC code.

7) What is a parallel job?


It is executed by the DataStage parallel engine.
Jobs are compiled into Orchestrate shell script (OSH), with Transformer
stages compiled by a C++ compiler.
It has built-in functionality called pipeline and partition parallelism.

8) What do you mean by pipeline parallelism?


Downstream stages start processing as soon as upstream stages produce data,
rather than waiting for the whole source to be read.
In other words, the extract, transform, clean, and load processes execute
simultaneously on different rows.
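The idea can be sketched with Python generators (illustrative only, not DataStage code): each stage consumes rows as soon as the upstream stage yields them.

```python
# Illustrative sketch of pipeline parallelism using Python generators:
# the transform and load steps begin as soon as extract yields its
# first row, rather than waiting for the whole source to be read.

def extract():
    for row in [" alice ", " bob ", " carol "]:
        yield row                      # source rows become available one at a time

def transform(rows):
    for row in rows:
        yield row.strip().upper()      # downstream cleaning starts while upstream still runs

def load(rows):
    return list(rows)                  # in a real job this would write to the target

result = load(transform(extract()))
print(result)                          # ['ALICE', 'BOB', 'CAROL']
```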

9) What is partition parallelism?


The incoming stream of data is divided into subsets called partitions
(nodes), each processed independently.
The number of partitions is determined by the configuration file.
This facilitates near-linear scalability.

10) What partitioning types are available?


There are two types of partitioning:
1. Keyless partitioning: rows are distributed independently of the data values.
   Examples: round robin, random, entire.
2. Key-based partitioning: rows are distributed based on a key column.
   Examples: hash, modulus, DB2, range.
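The difference can be sketched in Python (illustrative, not DataStage code): round robin ignores the data values entirely, while modulus routes each row according to its key, so rows with the same key always land on the same node.

```python
# Illustrative sketch: keyless (round robin) vs. key-based (modulus)
# partitioning of rows across a fixed number of nodes.

def round_robin(rows, nodes):
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)          # position, not content, picks the node
    return parts

def modulus(rows, key, nodes):
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[row[key] % nodes].append(row)   # same key value -> same node, always
    return parts

rows = [{"id": n} for n in range(6)]
print(round_robin(rows, 3))    # ids 0,3 / 1,4 / 2,5 dealt out in turn
print(modulus(rows, "id", 3))  # here the modulus layout happens to match
```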

11) What are the partitioning algorithms?


Round robin
Random
Hash
Modulus
Entire
Same
Auto

12) What are the collecting algorithms?


Round robin
Auto
Sort merge
Ordered - reads all records from the first partition, then the second, and so on
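A sort merge collector can be sketched with Python's heapq.merge (illustrative, not DataStage code): given partitions that are already sorted on the collecting key, it interleaves them into one globally sorted output stream.

```python
import heapq

# Illustrative sketch of sort merge collecting: each partition is
# already sorted on the collecting key, and the collector merges
# them into a single ordered output stream.

part0 = [1, 4, 7]
part1 = [2, 5, 8]
part2 = [3, 6, 9]

merged = list(heapq.merge(part0, part1, part2))
print(merged)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```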

13) What is the configuration file?


It determines the number of nodes (partitions) a job runs on and specifies
the resources that can be used by individual nodes for temporary storage
and dataset storage. It is referenced through the $APT_CONFIG_FILE
environment variable.

14) What are the components of the configuration file?

Fastname: the node name on a fast network.
Pools: allow us to group processing nodes based on their functions
and characteristics.
Resource disk: specifies the location on your server where the
processing node will write all dataset files.
Resource scratchdisk: the location of temporary files created during
DataStage processing.
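As a sketch, a two-node configuration file might look like the following (the host name and directory paths are placeholders for illustration, not values from this document):

```
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/datasets" {pools ""}
    resource scratchdisk "/data/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/datasets" {pools ""}
    resource scratchdisk "/data/scratch" {pools ""}
  }
}
```

Each node block defines one logical partition; adding node blocks increases the degree of partition parallelism without changing the job design.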

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------
15) What is the conductor node?

Conductor Node (one per job):
The main process, used to:

1. Start up jobs
2. Determine resource assignments
3. Create Section Leader processes on one or more processing nodes
   (Section Leaders create and manage the player processes that perform
   the actual job execution)
4. Act as a single coordinator for status and error messages
5. Manage orderly shutdown when processing completes or in the event
   of a fatal error

The conductor node is run from the primary server.

Section Leaders (one per logical processing node):


Used to create and manage player processes, which perform the actual job
execution. The Section Leaders also manage communication between the
individual player processes and the master Conductor Node.

Players:
One or more logical groups of processes used to execute the data flow logic.
All players are created as groups on the same server as their managing
Section Leader process.
-----------------------------------------------------------------------------------
-------------------------------------------------------------------------
