Design InfoSphere DataStage jobs for optimum lineage
Design your IBM InfoSphere DataStage jobs to ensure that complete metadata is
available for lineage reports in IBM InfoSphere Metadata Workbench.
Information that flows across InfoSphere DataStage and QualityStage jobs is called
design lineage. The data output of one job can be the data source of another job. In
this case, the data source is shared between the two jobs. If a source of the job is
not imported into the metadata repository, the design lineage metadata is used to
infer the relationship with other jobs. This relationship is based on the shared
usage of the referenced data source.
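
The inference described above can be pictured with a small sketch. It is only an illustration of the idea of linking jobs through a shared data source, not the product's implementation; the job names, data source names, and the `infer_design_lineage` function are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical job descriptions: each job lists the data sources it reads and writes.
jobs = {
    "LoadCustomers": {"reads": {"crm.csv"}, "writes": {"STAGING.CUSTOMER"}},
    "BuildWarehouse": {"reads": {"STAGING.CUSTOMER"}, "writes": {"DW.DIM_CUSTOMER"}},
    "ReportSales": {"reads": {"DW.DIM_CUSTOMER"}, "writes": {"sales_report.txt"}},
}

def infer_design_lineage(jobs):
    """Return sorted (upstream_job, shared_source, downstream_job) triples."""
    writers = defaultdict(set)
    readers = defaultdict(set)
    for name, io in jobs.items():
        for source in io["writes"]:
            writers[source].add(name)
        for source in io["reads"]:
            readers[source].add(name)
    links = set()
    # A data source that one job writes and another job reads links the two jobs.
    for source in writers.keys() & readers.keys():
        for up, down in product(writers[source], readers[source]):
            if up != down:
                links.add((up, source, down))
    return sorted(links)

links = infer_design_lineage(jobs)
for up, source, down in links:
    print(f"{up} -> [{source}] -> {down}")
```

In this sketch the link depends only on both jobs naming the same data source, which mirrors the point that the relationship is based on shared usage rather than on the source being present in the metadata repository.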
Use the following table of actions to ensure that your job design gives complete
metadata for best lineage results.
Table 1. Actions to ensure complete job design metadata for data lineage

Action: Use Connector stages
Description: Connector stages give the maximum amount of metadata about the job design. Therefore, use Connector stages instead of equivalent generic stages. For example, use the ODBC Connector stage rather than the ODBC Enterprise stage.
How this action affects lineage: The Manage Lineage utility reads the design lineage metadata from the stages of the job. The Manage Lineage utility then infers the database or data file assets that the job reads from or writes to. Connector stages provide more information to enhance the utility.
Additional information: For a list of job stages with their description, see Alphabetical list of stages. Whether a particular stage is displayed on the InfoSphere DataStage Designer client palette depends on the type of job and the installed products and add-ons.

Action: Use environment variables and job parameters
Description: You can define variables and parameters to reuse across all jobs of a project by using environment variables and job parameters. Wherever possible, use parameters and parameter sets as common references across all jobs in a project.
How this action affects lineage: The use of variables reduces error and promotes data reuse in job development.
Additional information: For more information about how to set up job parameters and parameter sets, see Making your jobs adaptable. For general information about setting environment variables, see Guide to setting environment variables.
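
As a rough illustration of why common parameters reduce error, the following sketch resolves data source references that are written with placeholders. Everything in it is an assumption made for the example: the parameter names, the values, the `#NAME#` placeholder convention as modeled here, and the `resolve` function. It does not call any InfoSphere DataStage API.

```python
import re

# Hypothetical parameter values shared by every job in a project.
project_parameters = {"DB_SERVER": "dbhost01", "DB_NAME": "SALES", "SCHEMA": "STAGING"}

# Assumed placeholder convention for this sketch: #NAME#.
PARAM_REF = re.compile(r"#([A-Za-z_][A-Za-z0-9_.]*)#")

def resolve(value, parameters):
    """Replace each #NAME# reference with its value; fail on undefined names."""
    def substitute(match):
        name = match.group(1)
        if name not in parameters:
            raise ValueError(f"undefined parameter: {name}")
        return parameters[name]
    return PARAM_REF.sub(substitute, value)

template = "#DB_SERVER#/#DB_NAME#/#SCHEMA#.CUSTOMER"

# Two jobs that use the same parameters resolve to the same data source name.
writer_target = resolve(template, project_parameters)
reader_source = resolve(template, project_parameters)

# A hard-coded reference with a typing slip no longer matches.
hard_coded_source = "dbhost1/SALES/STAGING.CUSTOMER"

print(writer_target == reader_source)      # the parameterized references agree
print(writer_target == hard_coded_source)  # the hard-coded reference does not
```

Because both jobs draw the server, database, and schema names from one place, their references cannot drift apart, which is the kind of error that the table entry says variables help to avoid.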
After you complete these actions, you are ready to set up InfoSphere Metadata
Workbench to analyze metadata for lineage. Follow these steps:
1. Run the Manage Lineage utility.
   This utility automatically runs the Manual Binding and Map Database Alias
   utilities.
2. To identify schemas that are identical, run the Data Source Identity utility.
   If two schemas are identified as identical, the database tables and database
   columns contained by the schemas are also marked as identical when their
   names match. This might be necessary when the same data source is imported
   into the repository by different means, such as by a connector and a bridge.
3. Run the data lineage report.
   The data lineage report shows the movement of data within a job or through
   multiple jobs. The report can also show the order of activities in a run of a job.
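
The name-matching rule in step 2 can be sketched as follows. This is a simplified model, not the Data Source Identity utility itself: the schema contents and the `match_identical_assets` function are hypothetical, and exact, case-sensitive name comparison is an assumption of the example.

```python
# Hypothetical metadata for one schema imported twice, by a connector and by a bridge.
connector_schema = {
    "CUSTOMER": ["ID", "NAME", "EMAIL"],
    "ORDERS": ["ID", "CUSTOMER_ID", "TOTAL"],
}
bridge_schema = {
    "CUSTOMER": ["ID", "NAME", "PHONE"],
    "INVOICES": ["ID", "AMOUNT"],
}

def match_identical_assets(schema_a, schema_b):
    """For two schemas already identified as identical, pair tables and columns by name."""
    # Tables are marked identical only when the same name appears in both schemas.
    tables = sorted(schema_a.keys() & schema_b.keys())
    # Columns are compared only within tables that were matched.
    columns = sorted(
        (table, column)
        for table in tables
        for column in set(schema_a[table]) & set(schema_b[table])
    )
    return tables, columns

tables, columns = match_identical_assets(connector_schema, bridge_schema)
print("Identical tables:", tables)
print("Identical columns:", columns)
```

Tables and columns whose names appear in only one import, such as ORDERS or EMAIL in this example, are left unmatched.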