
Course Objectives

 Understand how to use Informatica PowerCenter 8 components for development


 Be able to build basic ETL mappings
 Be able to create, run and monitor workflows
 Understand available options for loading target data
 Reusing and Sharing Designer objects
 Using Shared Objects; Mapplets
 Mapping Parameters and Variables
 Advanced topics on Workflows
 Repository Manager

1
Introduction and Product Overview

Chapter 1
PowerCenter 8 Architecture
PowerCenter 8.x has a service-oriented architecture that provides the ability to
scale services and share resources across multiple machines.

3
PowerCenter 8 Architecture

4
PowerCenter 8 Server Connectivity

 For a session, the PowerCenter Server holds the connection as long as it needs to read data
from source tables or write data to target tables.

5
Designer Overview

Chapter 2
Designer Interface

 Designer Windows:
 Navigator
 Workspace
 Status bar
 Output
 Overview
 Instance Data
 Target Data

7
Designer Interface

 Designer Tools: The Designer provides the following tools:


 Source Analyzer: Use to import or create source definitions for flat file, XML, COBOL,
Application, and relational sources
 Warehouse Designer: Use to import or create target definitions
 Transformation Developer: Use to create reusable transformations
 Mapplet Designer: Use to create mapplets
 Mapping Designer: Use to create mappings
 Navigator: Use to connect to and work in multiple repositories and folders. You can also copy
and delete objects and create shortcuts using the Navigator.
 Workspace: Use to view or edit sources, targets, mapplets, transformations, and mappings.
You can work with a single tool at a time in the workspace. You can use the workspace in
default or workbook format.

8
Designer Interface
 Status bar: Displays the status of the operation you perform.
 Output: Provides details when you perform certain tasks, such as saving your work or validating a
mapping. Right-click the Output window to access window options, such as printing output text,
saving text to file, and changing the font size.
 Overview: An optional window to simplify viewing workbooks containing large mappings or a large
number of objects. Outlines the visible area in the workspace and highlights selected objects in
color. To open the Overview window, choose View-Overview Window.
 Instance Data: View transformation data while you run the Debugger to debug a mapping.
 Target Data: View target data while you run the Debugger to debug a mapping. You can view a list
of open windows and switch from one window to another in the Designer.

9
Lab 1 - Setting Connections

 This is a demonstration to be followed by the participants.


 This lab covers connections to the Informatica clients and other necessary configurations

10
Naming Conventions

Chapter 3
Naming Conventions

 Good Practice to Follow Naming Conventions


 Can be project specific:-
 Workflow: wfl_ followed by workflow functionality
 Session: s_ followed by mapping name
 Mapping: m_ followed by mapping functionality
 Source: Table/File name
 Target: Table/File name
 Ports:
» Input & Output :- Column Names
» Variable:- v_ followed by functionality

12
Naming Conventions - Transformations:

Source Qualifier: sql_(followed by Source Name)


Stored Procedure: sp_(followed by purpose of transformation)
Sequence Generator: seq_
Expression: exp_
Joiner: jnr_
Lookup: lkp_
Filter: fil_
Rank: rnk_
Router: rtr_
Update Strategy: upd_
Aggregator: agg_
Normalizer: nrm_
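
For example, a flow that loads a customer dimension might be named as follows (hypothetical names, applying the conventions above):

Workflow: wfl_load_customer_dim
Session: s_m_load_customer_dim
Mapping: m_load_customer_dim
Expression: exp_trim_names
Lookup: lkp_dim_customer
Variable port: v_full_name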

13
Working With Sources and Targets

Chapter 4
Design Process Overview

Create Source definition(s)


Create Target definition(s)
Create a Mapping
Create a Session Task
Create a Workflow with Task components
Run the Workflow and verify the results

15
Methods of Analyzing Sources

 To extract data from a source, you must first define sources in the repository.
 You can import or create the following types of source definitions in the Source Analyzer:
– Relational database
– Flat file
– COBOL file
– XML object

16
Working with Relational Sources

 You can add and maintain relational source definitions for tables, views, and synonyms:
 Import source definitions: Import source definitions into the Source Analyzer.
 Update source definitions: Update source definitions either manually, or by re-importing
the definition.

17
Importing Relational Source Definitions
 You can import relational source definitions from database tables, views, and synonyms.
 When you import a source definition, you import the following source metadata:
 Source name
 Database location
 Column names
 Datatypes
 Key constraints
Note: When you import a source definition from a synonym, you might need
to manually define the constraints in the definition.
 To import a source definition, you must be able to connect to the source database from the client
machine using a properly configured ODBC data source or gateway. You may also require read
permission on the database object.
 You can also manually define key relationships, which can be logical relationships created in the
repository that do not exist in the database.

18
Importing Relational Source Definitions
To import a source definition:
1. In Source Analyzer, choose Sources-Import from Database.

19
Importing Relational Source Definitions

If no table names appear or if the table you want to import does not
appear, click All.

20
Importing Relational Source Definitions

6. Click OK.

21
Importing Relational Source Definitions

7. Choose Repository-Save

22
Creating Target Definitions

 You can create the following types of target definitions in the Warehouse Designer:
 Relational: You can create a relational target for a particular database platform. Create
a relational target definition when you want to use an external loader to the target
database.
 Flat File: You can create fixed-width and delimited flat file target definitions.
 XML File: You can create an XML target definition to output data to an XML file.

23
Importing a Relational Target Definition

 When you import a target definition from a relational table, the Designer imports the following
target details:
 Target name.
 Database location.
 Column names.
 Datatypes.
 Key constraints.

 Key relationships.

24
Automatic Target Creation
 Drag-and-drop a Source Definition into the Warehouse Designer Workspace

25
Target Definition properties

26
Target Definition properties

27
Data Previewer

 Preview data in
 Relational Sources
 Flat File Sources
 Relational Targets
 Flat File Targets
 Data Preview Option is available in
 Source Analyzer
 Warehouse Designer
 Mapping Designer
 Mapplet Designer

28
Data Previewer Source Analyzer

From the Source Analyzer, select the Source drop-down menu, then choose Preview Data

29
Data Previewer Source Analyzer

A right mouse click can also be used to preview data

30
LAB 2 - Creating Source Definitions

 Connect to Oracle Database using the Train_Ora_Src connection


 Import the following Source Tables
 Employees
 Customers
 EmployeeTerritories
 Territories
 Region
 Orders
 OrderDetails

31
LAB 3 - Creating Target Definitions

 Connect to Oracle Database using the Train_Ora_Tgt connection


 Import the following Target Tables:-
 DIM_EMPLOYEE
 DIM_CUSTOMER
 FACT_ORDERS

32
DAY 2
Mappings Overview

Chapter 5
Overview
 “A mapping is a set of source and target definitions linked by transformation objects that
define the rules for data transformation.”
 Mappings represent the data flow between sources and targets.
 When the PowerCenter Server runs a session, it uses the instructions configured in the mapping to
read, transform, and write data.
 Every mapping must contain the following components:
 Source instance: Describes the characteristics of a source table or file.
 Transformation: Modifies data before writing it to targets. Use different transformation objects to
perform different functions.
 Target instance: Defines the target table or file.
 Links: Connect sources, targets, and transformations so the PowerCenter Server can move the
data as it transforms it.
Note:
– A mapping can also contain one or more Mapplets. A mapplet is a set of
transformations that you build in the Mapplet Designer and can use in multiple
mappings.

35
Sample Mapping

36
Developing a Mapping

 When you develop a mapping, use the following procedure as a guideline:


 1. Create all source, target, and reusable objects.
 2. Create a new mapping.
 3. Add sources and targets.
 4. Add transformations and transformation logic.
 5. Connect the mapping.
 6. Validate the mapping.
 7. Save the mapping.

37
Mapping Validation
 Mappings must
 Be valid for a session to run
 Be end-to-end complete and contain valid expressions
 Pass all data flow rules
 Mappings are always validated when saved; can be validated without saving
 The Output window will display the reason for invalidity

38
Transformation Concepts

Chapter 6
Transformation Concepts

 “A Transformation is a repository object that generates, modifies, or passes data.”


 The Designer provides a set of transformations that perform specific functions.
 Transformations can be active or passive.
 Transformations can be connected to the data flow, or they can be unconnected.
 An Unconnected transformation is called within another transformation, and returns a value to
that transformation.
 Transformations in a mapping represent the operations the PowerCenter Server performs on
the data.
 Data passes into and out of transformations through ports that you link in a mapping or
mapplet.

40
Active Vs Passive Transformation

Active:
 Number of rows input may not equal number of rows output
 Can operate on groups of data rows
 May not be re-linked into another data stream (except into a sorted join where both flows arise from the same Source Qualifier)
 e.g. Aggregator, Filter, Joiner, Rank, Normalizer, Source Qualifier, Update Strategy, Custom

Passive:
 Number of rows input always equals number of rows output
 Operates on one row at a time
 May be re-linked into another data stream
 e.g. Expression, Lookup, External Procedure, Sequence Generator, Stored Procedure

41
Transformation Views

A transformation has three views :


 Iconized
 Normal
 Edit

Iconized: shows the transformation in relation to the rest of the mapping

42
Transformation Views

Normal: shows the flow of data through the transformation

Edit: shows the transformation ports and the properties; allows editing

43
Ports & Expressions

 Ports are present in each transformation and are used to propagate the field values from the
source to the target via the transformations.
 Ports are of three types:-
 Input
 Output
 Variable
 Ports evaluation follows the Top-Down Approach
 An Expression is a calculation or conditional statement added to a transformation.
 An Expression can be composed of Ports, Functions, operators, variables, literals, return
values & constants.

44
Ports - Evaluation
 Best practice recommends the following approach for port evaluation:
 Input Ports:
 Should be evaluated first
 There is no evaluation ordering among input ports (as they do not depend on any other ports)
 Variable Ports:
 Should be evaluated after all input ports are evaluated (as variable ports can reference any
input port)
 Variable ports can reference other variable ports also but not any output ports.
 Ordering of variables is also very important as they can reference each other’s values.

45
Ports - Evaluation

 Output Ports:
 Should be evaluated last
 They can reference any input port or any variable port.
 There is no ordered evaluation of output ports (as they cannot reference each other)

46
Using Variable Ports
 Also known as Local variables.
 Used for temporary storage
 Used to simplify complex expressions
 E.g. – create and store a depreciation formula to be referenced more than once
 Used in another variable port or output port expression
 A variable port cannot also be an input or output port.
 Available in the Expression, Aggregator and Rank.
 Variable ports are NOT visible in Normal view, only in Edit view

47
Using Variable Ports
 The scope of variable ports is limited to a single transformation.
 Variable ports are initialized to either ‘zero’ (for numeric values) or ‘empty string’ (for character & date
variables) when the Mapping logic is processed.
 They are not initialized to ‘Null’
 Variable ports can remember values across rows (useful for comparing values) & they retain their
values until the next evaluation of the variable expression.
 Thus we can effectively use the order of variable ports to do procedural computation.
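
As a minimal sketch of such procedural computation (hypothetical port names; assumes rows arrive sorted by CUSTOMER_ID), an Expression transformation can flag the first row of each customer. Because ports are evaluated top-down, v_IS_NEW reads the value v_PREV_CUST_ID retained from the previous row before it is overwritten:

v_IS_NEW (variable) = IIF(CUSTOMER_ID != v_PREV_CUST_ID, 'Y', 'N')
v_PREV_CUST_ID (variable) = CUSTOMER_ID
o_IS_NEW (output) = v_IS_NEW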

48
Default Values – Two Usages

 For Input and I/O ports


– Used to replace null values

 For Output ports


– Used to handle transformation calculation errors (not-null handling)

49
Expressions

 Expressions can be entered at the port level or at the transformation level


 Expressions can be used in the following transformations:-
– Expression: - Output Port Level
– Aggregator - Output Port Level
– Rank - Output Port Level
– Filter - Transformation Level
– Router - Transformation Level
– Update Strategy - Transformation Level
– Transaction Control - Transformation Level

50
Informatica Data Types

Native Data Types:
 Specific to the source and target database types
 Display in source and target tables within Mapping Designer

Transformation Data Types:
 PowerCenter internal data types based on ANSI SQL-92
 Display in transformations within Mapping Designer

Note:
a) Transformation data types allow mix-n-match of source and target
database types
b) When connecting ports, native and transformation data types must
be either compatible or explicitly converted
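
For instance, a string column from a flat file can be explicitly converted before it is connected to a date or numeric port (a sketch using standard transformation-language functions; port names are hypothetical):

o_ORDER_DATE = TO_DATE(in_ORDER_DATE_STR, 'YYYYMMDD')
o_UNIT_PRICE = TO_DECIMAL(in_UNIT_PRICE_STR)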

51
Source Qualifier Transformation

Chapter 7
What does it do?

 When you add a relational or a flat file source definition to a mapping, you need to connect it to
a Source Qualifier transformation.
 The Source Qualifier transformation represents the rows that the PowerCenter Server reads
when it runs a session.
 You cannot directly connect sources to targets.
 You need to connect them through a Source Qualifier transformation for relational and flat file
sources, or through a Normalizer transformation for COBOL sources.
 Can also be used for Homogeneous Joins.

53
Source Qualifier Transformation
 Active Transformation
 Connected
 Port
 All Input/Output
 Usage (only applicable for relational sources)
 Modify SQL statements
 User defined Join
 Source Filter
 Sorted ports
 Select Distinct
 Pre/Post SQL
 Convert Data Types

54
Source Qualifier Transformation

Represents the source record set queried by the server.


Mandatory in Mappings using relational or flat file sources

55
Default Query
 For relational sources, the PowerCenter Server generates a query for each Source Qualifier
transformation when it runs a session.
 The default query is a SELECT statement for each source column used in the mapping. Thus, the
PowerCenter Server reads only the columns that are connected to another transformation.

 Although there are many columns in the source definition, only three columns are connected to
another transformation.
 In this case, the PowerCenter Server generates a default query that selects only those three
columns:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.FIRST_NAME FROM CUSTOMERS

56
Joining Multiple sources
 You can use one Source Qualifier transformation to join data from multiple relational tables.
 These tables must be accessible from the same instance or database server.
 When a mapping uses related relational sources, you can join both sources in one Source Qualifier
transformation.
 Default join is inner equi-join (where Src1.col_nm = Src2.col_nm) if the relationship between the
tables is defined in the Source Analyzer
 This can increase performance when source tables are indexed.
 Tip: Use the Joiner transformation for heterogeneous sources and to join flat files.
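
As an illustration (table and column names are hypothetical), the join condition can be entered in the Source Qualifier's User Defined Join property, and the server folds it into the generated SELECT:

User Defined Join: ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID

Generated query (approximately):
SELECT ORDERS.ORDER_ID, ORDERS.ORDER_DATE, CUSTOMERS.COMPANY
FROM ORDERS, CUSTOMERS
WHERE ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID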

57
Joining Multiple sources

58
LAB 4 - Source Qualifier (Simple Mapping)

 Create a Mapping using Employees as the Source and Employees as the Target instance
 No other transformations are required.
 Ensure target name is user specific (e.g.: Participant user1 should use user1.Employees)

59
Workflows - I

Chapter 8
Workflow Manager Tools

Workflow Designer
 Maps the execution order and dependencies of Sessions, Tasks & Worklets, for the
Informatica Server

Task Developer
 Create Session, Shell Command and Email Tasks
 Tasks created in the Task Developer are reusable

Worklet Designer
 Creates objects that represent a set of tasks
 Worklet objects are reusable

61
Workflow Manager Interface

e.g. The simplest Workflow


62
Workflow - Overview

 A workflow is a set of instructions that describes how and when to run tasks related to
extracting, transforming, and loading data.
 The PowerCenter Server runs workflow tasks according to the conditional links connecting the tasks.
 Workflow Manager is used to develop and manage workflows.
 Workflow Monitor is used to monitor workflows and stop the PowerCenter Server.
 When a workflow starts, the PowerCenter Server retrieves mapping, workflow, and session metadata
from the repository to extract data from the source, transform it, and load it into the target.
 You can run as many sessions in a workflow as you need.
 You can run the Session tasks sequentially or concurrently, depending on your needs.

63
Session Overview
 A session is a set of instructions that tells the PowerCenter Server how and when to move
data from sources to targets.
 A mapping is a set of source and target definitions linked by transformation objects that
define the rules for data transformation.
 To run a session, you must first create a workflow to contain the Session task.

64
Link Task

 Required to connect Workflow Tasks


 Can be used to create branches in a Workflow
 All links are executed, unless a link condition evaluates to false
 Links connecting the tasks in a workflow are not allowed to form a closed loop

65
Session Task

Chapter 9
Session Task

 Server Instructions to run the logic of ONE specific Mapping


 e.g. source and target data location specifications, memory allocation, optional
Mapping overrides, scheduling, processing and load instructions
 Becomes a component of a Workflow or Worklet
 If configured in the Task Developer, the Session Task is reusable
 When a session is to be created, valid mappings are displayed in the dialog box

67
Session Task
 Session Task Tabs :
 General
 Properties
 Config Object
 Mapping
 Components
 Metadata Extensions

68
Session Task

69
Session Task

70
Workflows Monitor Overview

Chapter 10
Monitor Workflows

 The Workflow Monitor is the tool for monitoring Workflows and Tasks
 Review details about a Workflow or Tasks in two views:
 Gantt Chart view
 Task view

 The Workflow Monitor displays Workflows that have been run at least once

72
Gantt Chart View

73
Task View

74
Monitoring Workflows
 Perform operations in the Workflow Monitor
 Restart: restart a Task, Workflow or Worklet
 Stop: stop a Task, Workflow or Worklet
 Abort: abort a Task, Workflow or Worklet
 Resume: resume a suspended Workflow after a failed Task is corrected
 View Session and Workflow logs
 Abort has a 60 second timeout
 If the Server has not completed processing and committing data during the timeout period,
the threads and processes associated with the Session are killed.

75
Sequence Generator Transformation

Chapter 11
What does it do?

 The Sequence Generator transformation generates numeric values.


 You can perform the following tasks with a Sequence Generator transformation:
 Create keys.
 Replace missing values.
 Cycle through a sequential range of numbers.

77
Sequence Generator Transformation

 Generates unique keys for any port on a row


 Passive Transformation / Connected
 Ports
 Two predefined output ports
– NEXTVAL
– CURRVAL
 No input ports allowed
 Usage
 Generate Sequence numbers
 Shareable across mappings

78
Sequence Generator Transformation

Connecting CURRVAL and NEXTVAL Ports to a Target

79
Example

Input Output

The rows in the Source have to be loaded into the target with a Unique
ID generated for each record.
Here the Sequence generator helps in creating the IDs for each record
in the target.

80
Sequence Generator Properties

 Properties
 Start value
 End Value
 Increment By
 Number of cached values
 Reset
 Cycle
 Design tip: Set Reset property and Increment by 1. Use in conjunction with lookup. Lookup to
get max(value) from target. Add NextVal to it to get the new ID.

81
LAB 5 - Sequence Generator (1)

 Create copy of mapping created in LAB 4


 Use Sequence Generator to Populate Employee_wk
 Set the Reset and Cycle properties; the range should be 1 to 100

82
LAB 6 - Sequence Generator (2)

 Use Employees as the source.


 Get the distinct values of Country and load them into Country.
 Use Employees as the source again.
 Get the distinct values of City and load them into City.
 For both these tables, generate the ID values using a sequence generator where ID values
start from 1 for both these tables.
 Use a Sequence Generator
 Start value = 1
 Increment value = 1

83
Expression Transformation

Chapter 12
What does it do?

 You can use the Expression transformation to calculate values in a single row before you write
to the target.
 For example, you might need to adjust employee salaries, concatenate first and last
names, or convert strings to numbers.
 You can use the Expression transformation to perform any non-aggregate calculations.
 You can also use the Expression transformation to test conditional statements before you
output the results to target tables or other transformations.

85
Expression Transformation
 Passive Transformation
 Connected
 Ports
– Mixed
– Variables allowed
 Create expression in output or variable port
 Used to perform majority of data manipulation

86
Expression Transformation

Perform calculations using non-aggregate functions (row level)

87
Expression Editor
 An expression formula is a calculation or conditional statement for a specific port in a
transformation
 Performs calculation based on ports, functions, operators, variables, constants, and return values
from other transformations

88
Expression Editor

89
Example

Source Target

We are making use of an Expression transformation to concatenate the
“First_Name” and “Last_Name” fields of the source into a single
“Full_Name” field in the target.
We use the available functions in the Expression Editor to get the required
output.
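
A possible expression for the Full_Name output port (a sketch; trimming guards against padded source fields):

FULL_NAME = LTRIM(RTRIM(FIRST_NAME)) || ' ' || LTRIM(RTRIM(LAST_NAME))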

90
LAB 7 - Expression Transformation (1)

 Create a mapping using the Employee flat file as source, DIM_EMPLOYEE as the target
 Concatenate First Name and Last Name to get Employee Name
 Ensure all leading and trailing spaces are removed for character columns
 Use NEXTVAL of Sequence Generator transformation to connect to Employee_wk
 Target load will be truncate / load.
 Do not connect geography_wk, region_nk, region_name and direct_report_wk

91
LAB 8 - Expression Transformation (2)

 Create a mapping using DIM_EMPLOYEE as source and a flat file as target


 The target definition should have only 5 fields
 First_Name
 Last_Name
 Fname_size
 Lname_size
 Title
 Employee Name has to be split into “First Name” and “Last Name”
 Calculate length of “First_Name” and “Last_Name”
 Directly connect the Title to the target

92
LAB 9 - Expression Transformation (3)

 Source:- Employees
 Target:- Employee_LAB_9 File
 In the target file we need
 First name should have the first letter in upper case and the rest in lower case
 Last name should be in upper case
 Also compute the employee's age in years
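
One way to express these rules in an Expression transformation (a sketch; assumes the Employees source carries a BirthDate column):

o_FIRST_NAME = INITCAP(FIRST_NAME)
o_LAST_NAME = UPPER(LAST_NAME)
o_AGE_YEARS = TRUNC(DATE_DIFF(SYSDATE, BIRTHDATE, 'YY'))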

93
LAB 10 - Expression Transformation (4)

 Source:- Orders
 Target:- lab_10_order_dates (Flat file)
 For each order id, find the day, month, year and quarter from the order date.
 In addition, find the 1st day of the quarter and the last day of the quarter using an expression
transformation only.

94
DAY 3
Filter Transformation

Chapter 13
What does it do?

 The Filter transformation allows you to filter rows in a mapping.


 You pass all the rows from a source transformation through the Filter transformation, and then
enter a filter condition for the transformation.
 All ports in a Filter transformation are input / output, and only rows that meet the condition pass
through the Filter transformation.

97
Filter Transformation

 Active Transformation
 Connected


 Ports
 All Input/Output
 Usage
 Filter rows from mapping/mapplet pipeline

98
Filter Transformation

Drops rows conditionally

Use of logical operators makes the filter very effective


(e.g. SALARY > 30000 AND SALARY < 100000)
99
Filter Transformation in a Mapping

100
Example

Input Output

We are making use of a Filter transformation with a condition to pass only the records with salary > 2500

101
LAB 11 - Filter Transformation (1)

 Create copy of LAB 7 mapping


 Add a Filter Transformation to the mapping to filter out all records having Region as NULL, set
audit_id = 0
 Target load will be truncate / load

102
LAB 12 - Filter Transformation (2)

 Create copy of LAB 11 mapping


 Filter out all records having either
 Region as NULL
 TitleOfCourtesy = “Dr.”
 Set audit_id = 0
 Target load will be truncate load

103
LAB 13 - Filter Transformation (3)

 Source:- Orders
 Load data into the 3 target tables, which should contain orders only for the months of October,
November and December respectively
 Orders_Oct
 Orders_Nov
 Orders_Dec

104
Router Transformation

Chapter 14
What does it do?

 A Router transformation is similar to a Filter transformation because both transformations allow


you to use a condition to test data.
 A Filter transformation tests data for one condition and drops the rows of data that do not meet
the condition.
 However, a Router transformation tests data for one or more conditions and gives you the
option to route rows of data that do not meet any of the conditions to a default output group.
 If you need to test the same input data based on multiple conditions, use a Router
transformation in a mapping instead of creating multiple Filter transformations to perform the
same task.

106
Router Transformation

Rows sent to multiple filter conditions

 Active Transformation
 Connected


 Ports
 All input/output
 Specify filter conditions for each Group
 Used to link source data in one pass to multiple filter conditions

107
Router Groups
 Input group (always one)
 User-defined groups
 Each group has one condition
 All group conditions are evaluated for each row
 One row can pass multiple conditions
 Unlinked group outputs are ignored
 Default group (always one) can capture rows that fail all
Group conditions
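
For example, the Lab 14 routing could use two user-defined groups plus the default group (COUNTRY is a hypothetical port name):

Group USA: COUNTRY = 'USA'
Group GERMANY: COUNTRY = 'Germany'
DEFAULT group: captures all remaining rows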

108
Router Group Filter Conditions

109
Using Router in a mapping

110
Example

Source Targets

We are making use of a Router transformation to separate records into 2 groups:-
1. The records with Salary > 2500
2. The records with Salary <= 2500 AND Salary > 1250

111
Filter Vs Router

112
LAB 14 - Router Transformation (1)

 Create a mapping using Customer as Source


 Add a Router to group customers based on Country = ‘USA’, Country = ‘Germany’
and all others
 Load to 3 instances of the DIM_CUSTOMER table

113
LAB 15 - Router Transformation (2)

 Create a copy of LAB 13


 Source:- Orders
 Load data into the 3 target tables, which should contain orders only for the months of October,
November and December respectively
 Orders_Oct
 Orders_Nov
 Orders_Dec
 Make use of Router transformation

114
Joiner Transformation

Chapter 15
What does it do?

 You can use the Joiner transformation to join source data from two related heterogeneous
sources residing in different locations or file systems.
 The Joiner transformation joins two sources with at least one matching port.
 The Joiner transformation uses a condition that matches one or more pairs of ports between
the two sources.
 If you need to join more than two sources, you can add more Joiner transformations to the
mapping.
 The Joiner transformation requires input from two separate pipelines or two branches from one
pipeline.

116
Joiner Transformation

 Active/Connected
 Ports
 Input
 Output
 Master

117
Joins Types

Homogeneous Joins
 Joins that can be performed with a SQL SELECT statement
 Source Qualifier contains a SQL join
 Tables on same database server(or are synonyms)
 Database server does the join “work”
 Multiple Homogeneous joins can be joined
Heterogeneous Joins
 Examples of joins that cannot be done with an SQL statement :
 An Oracle table and a DB2 table
 Two flat files
 A flat file and a database table

118
Heterogeneous Joins

119
Joiner Properties
 Join Types:
 Normal (inner)
 Master Outer
 Detail Outer
 Full Outer

 Joiner can accept sorted data (configure the join condition to use the sort origin ports)
 Joiner Conditions & Nested Joins:
 Multiple Join conditions are supported
 Used to join three or more heterogeneous sources

120
Join Types – 1. Normal Join

With a Normal join, the PowerCenter Server discards all rows of data from
the master and detail source that do not match, based on the condition.

Source tables

Result after Normal Join

121
Join Types – 2. Master Outer Join

A master outer join keeps all rows of data from the detail source and the
matching rows from the master source. (It discards the unmatched rows
from the master source.)
Source tables

Result after Master Outer Join

122
Join Types – 3. Detail Outer Join

A detail outer join keeps all rows of data from the master source and the
matching rows from the detail source. (It discards the unmatched rows
from the detail source.)

Source tables

Result after Detail Outer Join

123
Join Types – 4. Full Outer Join

A full outer join keeps all rows of data from both the master and detail
sources.

Source tables

Result after Full Outer Join

124
LAB 16 - Joiner Transformation (1)

 Use Employee table created in LAB 4 as source.


 Add another source that is a combination of EmployeeTerritories, Territories, Region tables
 Join to the Employee table by doing an inner join on EmployeeID to get RegionID,
RegionDescription, TerritoryID and TerritoryDescription from the tables
 Target:- Emp_Details

125
LAB 17 - Joiner Transformation (2)

 Sources:- Oracle.Part_Type, Oracle.Part_Color tables


 Target:- Part_Details
 Join the tables using a joiner on column part_id and load to target.

126
LAB 18 - Joiner Transformation (3)

 Sources:- Oracle.Part_Type, Oracle.Part_Color tables


 Target:- Oracle.Part_Details
 Join the tables using a joiner on column part_id and load to target.
 All records from Oracle.Part_Type should be present in the target. (Use an appropriate join
type)

127
LAB 19 - Joiner Transformation (4)

 Sources:- Oracle.Part_Type, Oracle.Part_Color tables


 Target:- Oracle.Part_Details
 Join the tables using a joiner on column part_id and load to target.
 All records from Oracle.Part_Color should be present in the target. (Use an appropriate join
type)

128
Aggregator Transformation

Chapter 16
What does it do?

 The Aggregator transformation allows you to perform aggregate calculations, such as averages
and sums.
 The Aggregator transformation is unlike the Expression transformation, in that you can use it to
perform calculations on groups.
 The Expression transformation permits you to perform calculations on a row-by-row basis only.
 When using the transformation language to create aggregate expressions, you can use
conditional clauses to filter rows, providing more flexibility than SQL language.
 The PowerCenter Server performs aggregate calculations as it reads, and stores necessary
data group and row data in an aggregate cache.
 The PowerCenter Server typically returns the last row’s value for all the non-aggregated fields
with the result of the aggregation.

130
Example

Input records:- Group by STORE_ID and ITEM

Output records:- Calculate Total Sales per store

131
Aggregator Transformation

 Active Transformation
 Connected


 Ports
 Mixed
 Variables allowed
 Group by allowed

 Used for Standard aggregations


 Can also be used to get distinct records

132
Aggregator Transformation

Performs aggregate calculations

133
Aggregate Expressions
Aggregate functions are supported only in the
Aggregator Transformation

Conditional Aggregate Expressions are supported


Ex : Conditional SUM format
SUM (value, condition)
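
For instance (hypothetical ports), the following sums revenue only for rows carrying no discount:

SUM(QUANTITY * UNITPRICE, DISCOUNT = 0)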
134
Aggregator Transformation
 Aggregate Functions
 Return summary values for non-null data in selected ports
 Used only in Aggregator Transformations
 Used only in Output ports
 Calculate a single value (and row) for all records in a group
 Nested aggregate functions are allowed
 Ex : AVG(), COUNT(), MAX(), SUM()
 Conditional statements can be used with these functions

135
Aggregate Properties
 Sorted Data (can be aggregated more efficiently)
 The Aggregator can handle sorted or unsorted data
 The Server will cache data from each group and release the cached data upon reaching the
first record of the next group
 Data must be sorted according to the order of the Aggregator “Group By” ports
 Performance gain will depend upon varying factors
 Sorted Input property
 Instructs the Aggregator to expect the data to be sorted
 If you use sorted input and do not presort data correctly, you receive unexpected results.

136
Sorted Vs Unsorted Input

Unsorted data – Group by Store, Department, Date

No rows are released


from aggregator until all
rows are aggregated

Sorted data – Group by Store, Department, Date

For each separate group,


one row is released as
soon as the last row in the
group is aggregated

137
LAB 20 - Aggregator Transformation (1)
 Create a mapping with Sources as Orders, OrderDetails
 Source & Target connection is Train_Ora_Tgt
 Target is Fact_Orders
 Aggregate at Order_ID level
 Formulae:
lead_time_days = requireddate - orderdate,
internal_response_time_days = shippeddate - orderdate,
external_response_time_days = requireddate - shippeddate
total_order_item_count = SUM(Quantity)
total_order_discount_dollars = SUM((Quantity * UnitPrice) * Discount)
total_order_dollars = SUM((Quantity * UnitPrice) - ((Quantity * UnitPrice) * Discount))
 DEFAULT to -1 for customer_wk, employee_wk, order_date_wk, required_date_wk,
shipped_date_wk, ship_to_geography_wk, shipper_wk

138
LAB 21 - Aggregator Transformation (2)

 Source:- Products table


 Target:- Category_details
 Find the total UnitsInStock for each category.
 In addition, for each category find the number of products whose price is greater than $20 as
well as the total number of products.
 Do the above problem with just one aggregator
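
A possible set of output-port expressions for that single aggregator, grouping by CategoryID (a sketch; the conditional COUNT follows the COUNT(value, condition) form):

TOTAL_UNITS_IN_STOCK = SUM(UNITSINSTOCK)
EXPENSIVE_PRODUCTS = COUNT(PRODUCTID, UNITPRICE > 20)
TOTAL_PRODUCTS = COUNT(PRODUCTID)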

139
Using Shared Objects: Mapplets

Chapter 17
Overview

 Useful for repetitive tasks / logic


 Represents a set of transformations
 Mapplets are reusable (Use an “instance” of a Mapplet in a Mapping)
 Changes to a Mapplet are inherited by all instances
 Server expands the Mapplet at runtime
 Mapplets help simplify mappings in the following ways:
 Include source definitions
 Accept data from sources in a mapping
 Include multiple transformations
 Pass data to multiple transformations
 Contain unused ports

141
Components

 A mapplet has the following components:


 Mapplet input:
– You can pass data into a mapplet using source definitions and / or Input
transformations.
– When you use an Input transformation, you connect it to the source pipeline in the
mapping.
 Mapplet output:
– Each mapplet must contain one or more Output transformations to pass data from the
mapplet into the mapping.
 Mapplet ports:
– Mapplet ports display only in the Mapping Designer.
– Mapplet ports consist of input ports from Input transformations and output ports from
Output transformations.
– If a mapplet uses source definitions rather than Input transformations for input, it does
not contain any input ports in the mapping.

142
Example: Viewing Mapplet Input and Output

143
Example - Sample Mapplet in a Mapping

144
Mapplet Input Transformation

 Used for data sources outside a Mapplet


 Passive Transformation
 Connected
 Ports
 Output ports only
 Usage
 Only those ports connected from an Input transformation to another transformation will
display in the resulting Mapplet
 Connecting the same port to more than one transformation is disallowed
 Pass to an Expression transformation first

145
Mapplet Output Transformation

 Used to contain the results of a mapplet pipeline.


 Multiple Output transformations are allowed.
 Passive Transformation
 Connected
 Ports
 Input ports only
 Usage
 Only these ports connected to an Output transformation (from another transformation) will
display in the resulting Mapplet
 One (or more) Mapplet Output transformations are required in every Mapplet

146
Example - Sample Mapplet in a Mapping

147
Mapplet Source Options
 Internal Sources
 One or more Source definitions / Source Qualifiers within the Mapplet
 External Sources
 Mapplet contains a Mapplet Input Transformation
 Receives data from the Mapping it is used in
 Mixed Sources
 Mapplet contains one or more Mapplet Input transformations AND one or more
Source Qualifiers
 Receives data from the Mapping it is used in, AND from the Mapplet

148
Mapplet Data Sources

Data Source Outside a Mapplet


 Source data is defined OUTSIDE the Mapplet logic
 Resulting Mapplet HAS input ports
 When used in a Mapping, the mapplet may occur at any point in mid-flow
Data Source Inside a Mapplet
 Source data is defined WITHIN the Mapplet logic
 No Input transformation is required (or allowed)
 Use a Source Qualifier instead
 Resulting Mapplet has NO input ports
 When used in a Mapping, the Mapplet is the first object in the data flow

149
Mapplet with Multiple Output Groups

 Can output to multiple instances of the same target table

150
Unmapped Mapplet Output Groups

 Disallowed:
 Mapplet Output Group NOT linked
 Link at least one port

151
Unsupported Transformations

 Use any transformation in a Mapplet except:


 XML source definitions
 COBOL source definitions
 Normalizer
 Pre- and Post-Session stored procedures
 Target definitions
 Other Mapplets
 Sequence Generator transformations must be reusable in mapplets.

152
Active and Passive Mapplets

 Passive Mapplets contain only passive transformations


 Active Mapplets contain one or more active transformations
CAUTION: Changing a passive Mapplet into an Active Mapplet may invalidate Mappings which
use that Mapplet
 Do an impact analysis in Repository Manager first
Using Active and Passive Mapplets
 Multiple Passive Mapplets can populate the same target instance
 Multiple Active Mapplets or Active and Passive Mapplets cannot populate the same target
instance

153
LAB A - Creating a Mapplet

 Create a Mapplet in the Mapplet Designer


 The mapplet should have the following Input Port:-
 Designation
 The mapplet should return the following Output Ports:-
 Designation
 Count (of Designations)
 The mapplet should find the counts of the designations
 If this count is < 2, discard; else, pass

154
LAB B - Using a Mapplet in a Mapping

 Use Customers, Suppliers, Employees tables as relational source


 Use Cust_Counts, Emp_Counts, Sup_Counts files as target files
 Use the available Mapplet for calculating the count of Designations
 ContactTitle for Customers & Suppliers
 Title for Employees
 We need to have 3 disconnected Flows in the same Mapping.

155
DAY 4
File List Option

Chapter 18
File List Basics

 You can create a session to run multiple source files for one source instance in the mapping.
 You might use this feature if, for example, your company collects data at several locations
which you then want to move through the same session.
 When you create a mapping to use multiple source files for one source instance, the properties
of all files must exactly match the source definition.
 To use multiple source files, you create a file containing the names and directories of each
source file you want the PowerCenter Server to use.
 This file is referred to as a “File list”.
 When the session starts, the PowerCenter Server reads the file list, then locates and reads the
first file source in the list.
 After the PowerCenter Server reads the first file, it locates and reads the next file in the list.
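
A file list is itself a plain text file naming one source file per line, for example (hypothetical paths):

/data/output/Cust_Counts.out
/data/output/Emp_Counts.out
/data/output/Sup_Counts.out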

158
File List - Session Properties

159
LAB C - Using similar input files in a single Mapping

 Use the 3 files generated as output from the Mapplet exercise as sources
 Use the All_Counts file as target
 Use the filelist option in the session properties to read data from all 3 files and specify
“All_Counts” as the single target

160
Lookup Transformation

Chapter 19
What does it do?

 Use a Lookup transformation in a mapping to look up data in a flat file or a relational table,
view, or synonym.
 You can import a lookup definition from any flat file or relational database to which both the
PowerCenter Client and Server can connect.
 You can use multiple Lookup transformations in a mapping.
 The PowerCenter Server queries the lookup source based on the lookup ports in the
transformation.
 It compares Lookup transformation port values to lookup source column values based on the
lookup condition and passes the result (of the lookup) to other transformations and a target.

162
Example

Input Lookup Table

How to find the department name for each employee by using a Lookup
transformation?
This is determined by matching the Dept# from Input & Lookup tables
Output

163
Lookup Transformation

 Passive Transformation
 Connected/Unconnected
 Ports
 Mixed
 “L” indicates Lookup port
 “R” indicates port used as a return value
 Usage
 Get related values
 Verify if records exist or if data has changed
 Multiple conditions are supported
 Lookup SQL override is allowed

164
Lookup Transformation

165
Lookup Transformation

Looks up values in a database table and provides data to other components in a mapping

166
Lookup Properties
 Lookup conditions
 Lookup Table Name
 Lookup SQL
 Native Database connection
Object name

167
How a Lookup Transformation works
 For each mapping row, one or more port values are looked up in a database table
 If a match is found, one or more table values are returned to the mapping. If no match is found,
default value is returned

168
Lookup Caching
 Caching can significantly impact performance
 Cached
– Lookup table data is cached locally on the server
– Mapping rows are looked up against the cache
– Only one SQL SELECT is needed
– Cache is indexed based on the order by clause
 Uncached
– Each Mapping row needs one SQL SELECT
 If the data does not fit in the memory cache, the PowerCenter Server stores the overflow values
in the cache files.
 When the session completes, the PowerCenter Server releases cache memory and deletes the
cache files unless you configure the Lookup transformation to use a persistent cache.
Rule of thumb: Cache if the number (and size) of records in the lookup table is
small relative to the number of mapping rows requiring the lookup

169
Lookup Caches

 When configuring a lookup cache, you can specify any of the following options:
 Static cache
 Dynamic cache
 Persistent cache
 Shared cache

170
Lookup Policy on Multiple Match

Options are
 Use first value
 Use last value
 Report error

Note: When Dynamic Cache is enabled, Multiple Match will report an error.

171
LAB 22 - Using Connected Lookup (1)

 Source:- Emp table


 Lookup:- Dept Table
 Target:- Emp_Dept table
 Add a Lookup transformation for Dept
 Lookup the Dept table to get the Dname for each employee’s Dept_id

172
LAB 23 - Using Connected Lookup (2)

 Use Mapping in LAB 20


 Add 2 Lookup transformation for DIM_EMPLOYEE and DIM_SHIPPER
 Target: Fact_Orders
 Populate using lookups with natural keys, default = -1
 employee_wk:

Orders.EmployeeID = DIM_EMPLOYEE.employee_nk

 shipper_wk:

Orders.ShipVia = Dim_Shipper.Shipper_nk

 Populate the other keys with Default = -1

173
LAB 24 - Using Connected Lookup (3)

 Source:- Employees
 Target:- Emp_Manager
 Use a lookup transformation to find the manager for each employee
 Load the employee name and manager name into the target
 If a person does not have a manager then do not load that record to the target

174
Unconnected Lookups

Chapter 20
Unconnected Lookup

 Will be physically “unconnected” from other transformations


 There can be NO data flow arrows leading to or from an unconnected Lookup
 Lookup function can be set within any transformation that supports expressions

176
Conditional Lookup Technique

Two requirements:
1. Must be an Unconnected (or “function mode”) Lookup
2. Lookup function used within a conditional statement
e.g. IIF(ISNULL(cust_id), :lkp.MYLOOKUP(order_no))
Conditional statement is evaluated for each row
Lookup function is called only under the pre-defined condition

177
Conditional Lookup Advantage
 Data lookup is performed only for those rows which require it. Substantial performance gains can be
achieved.
e.g. A Mapping will process 500,000 rows. For two percent of those rows (10,000) the item_id
value is NULL. Item_id can be derived from the SKU_NUMB:
IIF(ISNULL(item_id), :lkp.MYLOOKUP(sku_numb))
Net savings = 490,000 lookups

178
Unconnected Lookup Functionality
 One Lookup port value (Return Port) may be returned for each Lookup
WARNING:
If the Return port is not defined, you may get unexpected results.

179
Connected Vs Unconnected Lookups

Connected LOOKUP:
 Part of the mapping data flow
 Returns multiple values (by linking output ports to another transformation)
 Executed for every record passing through the transformation
 More visible, shows where the lookup values are used
 Default values are used

Unconnected LOOKUP:
 Separate from the mapping data flow
 Returns one value (by checking the Return port option for the output port that provides the return value)
 Only executed when the lookup function is called
 Less visible, as the lookup is called from an expression within another transformation
 Default values are ignored

180
LAB 25 - Using Unconnected Lookup (1)

 Use the Mapping in LAB 23


 Add two Unconnected Lookups for
 DIM_CUSTOMER
 DIM_CALENDER (Empty table)
 The unconnected lookups can be called from the aggregator

181
LAB 26 - Using Unconnected Lookup (2)

 Use LAB 22 and do the same assignment by replacing the connected lookup with unconnected
lookup.
 Return dept_name alone from the Unconnected lookup.

182
Target Instances

Chapter 21
Target Instances

 A single mapping can have more than one instance of the same
target
 The data would be loaded into the instances in a pipeline
 Usage of multiple instances of the same target for loading is
dependent on the RDBMS in use. Multiple instances may not be used
if the underlying database locks the entire table while inserting
records

184
Target Instances - example

185
Update Strategy Transformation

Chapter 22
What does it do?

 Update Strategy transformations are essential if you want to flag rows destined for the same
target for different database operations (insert / update / delete), or if you want to reject rows.
 In PowerCenter, you set your update strategy at two different levels:
 Within a mapping: Within a mapping, you use the Update Strategy transformation to flag
rows for insert, delete, update, or reject.
 Within a session: When you configure a session, you can instruct the PowerCenter
Server to either treat all rows in the same way (for example, treat all rows as inserts), or
use instructions coded into the session mapping to flag rows for different database
operations.

187
Update Strategy Transformation

 Active Transformation
 Connected


 Ports
 All input/output
 Usage
 To mark a record for insert / update / delete or rejection
 IIF or DECODE logic determines how to handle the record

188
Update Strategy Transformation

Specifies how each individual row will be used to update target tables (insert, update, delete, reject)

189
Update Strategy expressions

Operations Constant Numeric Value


INSERT DD_INSERT 0
UPDATE DD_UPDATE 1
DELETE DD_DELETE 2
REJECT DD_REJECT 3

 IIF ( score>69, DD_INSERT, DD_DELETE)


 Expression is evaluated for each row
 Rows are “tagged” according to the logic of the expression
 Appropriate SQL (DML) is submitted to the target database:
insert, delete or update
 DD_REJECT means the row will not have SQL written for it.
 Target will not “see” the row
 “Rejected” rows may be forwarded through Mapping to a reject file
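
A common pattern (hypothetical port name) flags a row for insert when a lookup on the target found no match, and for update otherwise; this is also how Lab 28 can collapse the router and two Update Strategies into one:

IIF(ISNULL(lkp_employee_wk), DD_INSERT, DD_UPDATE)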

190
LAB 27 - Update Strategy (1)
 Add Employee as Source. Add 2 instances of DIM_EMPLOYEE as Target
 Add a Lookup Transformation (LKP_Target) to get employee_wk from DIM_EMPLOYEE.
 Add Expression Transformation for trimming string columns and getting values from LKP_Target
 Add Router Transformation to separate the data flow for New and Existing Records
 Add 2 Update Strategy Transformations to Flag for Insert and Update
 Add a Sequence Generator for populating the employee_wk for insert rows.
 Add an Unconnected Lookup to retrieve the Max_value of employee_wk from the Target
 Add an Expression Transformation (EXP_MAX_SEQ - between the Update Strategy for insert and the
Target instance for insert) to call the unconnected lookup
 Note: Run LAB 11 first (some rows are filtered), then run this workflow

191
LAB 28 - Update Strategy (2)

 Use problem statement in LAB 27 and solve the same by replacing the router and 2 update
strategy with a single update strategy.

192
Workflows – Additional Tasks

Chapter 23
Additional Workflow Tasks

 Eight additional Tasks are available in the Workflow Designer


– Command
– Email
– Decision
– Assignment
– Timer
– Control
– Event Wait
– Event Raise

194
Reusable Tasks

 Three types of reusable tasks


 Session: Set of instructions to execute a specific Mapping
 Command: Specific shell commands to run during any Workflow
 Email: Sends email during the Workflow
 Use the Task Developer to create reusable tasks
 These tasks will then appear in the Navigator and can be dragged & dropped into any workflow

195
Command Task

 Specify one or more Unix shell or DOS commands to run during the Workflow
 Runs in the Informatica Server (Unix or Windows) environment
 Shell command status (successful completion or failure) is held in the pre-defined variable
“$command_task_name.STATUS”
 Each command Task shell command can execute before the Session begins or after the
Informatica Server executes a Session
 Specify one or more Unix shell or DOS (NT, Win2000) commands to run at a specific point in
the Workflow
 Becomes a component of a Workflow (or Worklet)

196
Command Task

 If configured in the Task Developer, the Command Task is reusable (optional)


 You can use a Command task in the following ways:
 Standalone Command task.
 Pre- and post-session shell command.
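
For example (hypothetical task name and paths), a standalone Command task might archive a session's output file, and a downstream link condition can test its status:

Command: cp /data/output/employees.out /data/archive/employees.out
Link condition: $cmd_archive_file.STATUS = SUCCEEDED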

197
LAB 29 - Using Command Task

 Copy the workflow of LAB 4 for this lab.


 Add a command task which copies the output file of session task to another directory.

198
Email Task

 Configure to have the Informatica Server to send email at any point in the Workflow
 Becomes a component in a Workflow (or Worklet)
 If configured in the Task Developer, the Email Task is reusable (optional)

199
LAB 30 - Using Email Task

 Copy the workflow of LAB 4 for this lab.


 Configure an email task after the session, to inform successful completion.

200
Non-reusable Tasks

 Six additional Tasks are available in the Workflow Designer


 Decision

 Assignment

 Timer

 Control

 Event Wait

 Event Raise

201
Decision Task

 Specifies a condition to be evaluated in the Workflow


 Use the Decision Task in branches of a Workflow
 Provides additional functionality over a Link

202
Decision Task

 Example Workflow without a Decision Task

203
Assignment Task

 Assigns a value to a Workflow variable


 Variables are defined in the Workflow object

204
Timer Task

 Waits for a specified period of time to execute the next Task


 Absolute Time
 Datetime variable
 Relative Time

205
LAB 31 - Using Timer Task

 Copy the workflow of LAB 29 for this Lab.


 Include a Timer task after the session and configure it so that the command task runs after 1
minute.

206
Control Task

 Used to stop, abort, or fail the top-level workflow or the parent workflow based on an input link
condition.
 A parent workflow or worklet is the workflow or worklet that contains the Control task.

207
Event Wait Task

 Waits for a User-defined or a Pre-defined event to occur


 Once the Event occurs, the Informatica Server completes the rest of the Workflow
 Used with the Event Raise Task
 Events can be a file watch (indicator file) or User-defined
 User-defined events are defined in the Workflow itself

208
Event Raise Task

 Represents the location of a User-defined event


 The Event Raise Task triggers the User-defined event when the Informatica Server executes
the Event Raise Task

209
Sorter Transformation

Chapter 24
What does it do?

 The Sorter transformation allows you to sort data.


 You can sort data in ascending or descending order according to a specified sort key.
 You can also configure the Sorter transformation for case-sensitive sorting, and specify
whether the output rows should be distinct.
 You can sort data from relational or flat file sources.
 You can specify more than one port as part of the sort key.
 When you specify multiple ports for the sort key, the PowerCenter Server sorts each port
sequentially.

211
Sorter Transformation

 Active Transformation
 Connected


 Ports
 Input / Output
 Define one or more sort keys
 Define sort order for each key
 Usage
 Sort data in mapping / mapplet pipeline
 Before Aggregator

212
Sorter Transformation

 Can sort data from relational tables or flat files


 Sort takes place on the Informatica Server machine
 Multiple sort keys are supported

213
Example

Input

We need to sort by ORDER_ID asc and ITEM_ID asc

Output

214
Sorter Transformation

 Sorter Properties
 Cache size
– Can be adjusted. [Default is 8MB]
– Server uses twice the cache listed
– If cache size is unavailable, Session Task will fail

215
Rank Transformation

Chapter 25
What does it do?

 The Rank transformation allows you to select only the top or bottom rank of data.
 You can use a Rank transformation
 to return the largest or smallest numeric value in a port or group.
 to return the strings at the top or the bottom of a session sort order.
 During the session, the PowerCenter Server caches input data until it can perform the rank
calculations.
 The Rank transformation differs from the transformation functions MAX and MIN, in that it
allows you to select a group of top or bottom values, not just one value.
 While the SQL language provides many functions designed to handle groups of data,
identifying top or bottom strata within a set of rows is not possible using standard SQL
functions.
 You can also write expressions to transform data or perform calculations.

217
Rank Transformation

 Filters the top or bottom range of records for selection.


 Active Transformation
 Connected
 Ports
 Mixed
 One pre-defined output port RANK INDEX
 Variable allowed
 Group By allowed
 Usage
 Select top/bottom
 Number of records

218
Overview
 You can use a Rank transformation to:-
 Return the largest / smallest numeric value in a port or group.
 Return the strings at the top / bottom of a session sort order.

219
Overview
 Rank transformation allows you to group information (like the Aggregator), create local variables, and
write non-aggregate expressions.
 The Rank transformation differs from the transformation functions MAX and MIN, in that it allows
you to select a group of top or bottom values, not just one value.
 You can connect ports from only one transformation to the Rank transformation.
 The Rank transformation includes input or input / output ports connected to another transformation
in the mapping.
 It also includes variable ports and one rank port.
 Use the rank port to specify the column you want to rank.

220
Rank Index

 The Designer automatically creates a RANKINDEX port for each Rank transformation.
 The PowerCenter Server uses the Rank Index port to store the ranking position for each row in
a group.
 For example, if you create a Rank transformation that ranks the top three salespersons for each
quarter, the rank index numbers the salespeople from 1 to 3:
RANKINDEX   SALES_PERSON   SALES
1           Sam            10,000
2           Mary            9,000
3           Alice           8,000

 The RANKINDEX is an output port only.


 You can pass the rank index to another transformation in the mapping or directly to a target.

221
Rank Index

 If two rank values match, they receive the same value in the rank index and the transformation
skips the next value.
 For example, if you want to see the top five retail stores in the country and two stores have the
same sales, the return data might look similar to the following:
RANKINDEX   SALES    STORE
1           10,000   Orange
1           10,000   Brea
3            9,000   Los Angeles
4            8,000   Ventura

222
DAY 5
Mapping Parameters and Variables

Chapter 26
Mapping Parameters & Variables

 Defined under the Mappings-Parameters & variables menu option.


 A parameter or variable defined in a mapplet is not visible in any parent mapping.
 A parameter or variable defined in a mapping is not visible in any child mapplet.
 When you use a mapping parameter, you declare and use the parameter in a mapping or
mapplet.
 Then define the value of the parameter in a parameter file.
 During the session, the PowerCenter Server resolves all references to the parameter to the
value defined in the parameter file.
 Applies to all transformations within one Mapping / Mapplet.
 Format is $$VariableName or $$ParameterName.
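 As an illustration (all names here are hypothetical), a parameter might appear in a Source
Qualifier SQL override or in an Expression transformation port:

SELECT * FROM CUSTOMERS WHERE STATE = '$$State'    -- SQL override
UNIT_PRICE * QUANTITY * $$USD_to_EUR               -- Expression port formula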

225
Mapping Parameters & Variables
 Variables can change in value during run-time.
 Parameters remain constant during run-time.
 Provides increased developmental flexibility.
 If you declare mapping parameters and variables in a mapping, you can reuse a mapping by altering
the parameter and variable values of the mapping in the session. (This can reduce the overhead of
creating multiple mappings when only certain attributes of a mapping need to be changed)

226
Mapping Variables

 When you use a mapping variable, you declare the variable in the mapping or mapplet, and
then use a variable function in the mapping to change the value of the variable.
 At the beginning of a session, the PowerCenter Server evaluates references to a variable to its
start value.
 At the end of a successful session, the PowerCenter Server saves the final value of the
variable to the repository.
 The next time you run the session, the PowerCenter Server evaluates references to the
variable to the saved value.
 You can override the saved value by defining the start value of the variable in a parameter file.

227
Declarations
 Declare Variables / Parameters in the Mappings / Mapplets menu
 Properties that can be set :
 User-defined Names
 Appropriate aggregation type (Count, Max, or Min)
 Optional initial value
 Apply Parameter / Variable in formula

228
System Variables
 SYSDATE: Provides current datetime on the Informatica Server machine
 Not a static value
 $$$SessStartTime: Returns the system date value as a String. Uses system clock on Informatica
Server machine.
 String format is database type dependent
 Used in SQL override
 Has a constant value
 SESSSTARTTIME: Returns the system date value on the Informatica Server
 Used with any function that accepts transformation date / time data types
 Not to be used in SQL override
 Has a constant value
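 A short sketch of where each fits (table and port names are hypothetical):

-- Source Qualifier SQL override ($$$SessStartTime is constant for the whole session):
SELECT * FROM ORDERS WHERE UPDATED_AT > '$$$SessStartTime'

-- Expression transformation (SESSSTARTTIME is a date/time value):
DATE_DIFF(SESSSTARTTIME, ORDER_DATE, 'DD')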

229
Functions to Set Mapping Variables

SetCountVariable:
 Counts the number of evaluated rows and increments or decrements a mapping variable
for each row
SetMaxVariable:
 Evaluates the value of a mapping variable to the higher of two values
SetMinVariable:
 Evaluates the value of a mapping variable to the lower of two values
SetVariable:
 Sets the value of a mapping variable to a specified value
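 For example, in an Expression transformation (variable names are hypothetical):

SETCOUNTVARIABLE($$Row_Count)                  -- increments the count for each row
SETMAXVARIABLE($$Max_Order_Date, ORDER_DATE)   -- keeps the higher of the two values
SETMINVARIABLE($$Min_Order_Date, ORDER_DATE)   -- keeps the lower of the two values
SETVARIABLE($$Last_Run, SESSSTARTTIME)         -- sets the variable to a specified value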

230
Using in mappings & mapplets

 When you create a reusable transformation in the Transformation Developer, you can use any
mapping parameter or variable.
 The Designer validates the usage of any mapping parameter or variable in the expressions of
reusable transformation.
 When you use the reusable transformation in a mapplet or mapping, the Designer validates the
expression again.
 When the Designer validates a mapping variable in a reusable transformation, it treats the variable
as an Integer datatype.
 You cannot use mapping parameters and variables interchangeably between a mapplet and a
mapping.
 Mapping parameters and variables declared for a mapping cannot be used within a mapplet.
 Similarly, you cannot use a mapping parameter or variable declared for a mapplet in a
mapping.

231
Session and workflow parameter files

 You can use a parameter file to define the values for parameters and variables used in a workflow,
worklet, or session.
 You can create a parameter file using a text editor.
 You list the parameters or variables & their values in the parameter file.
 Parameter files can contain the following types of parameters and variables:
 Workflow variables
 Worklet variables
 Session parameters
 Mapping parameters and variables.
 When you use parameters or variables in a workflow, worklet, or session, the PowerCenter Server
checks the parameter file to determine the start value of the parameter or variable.
 You can use a parameter file to initialize workflow variables, worklet variables, mapping parameters,
and mapping variables.

232
Session and workflow parameter files

 You can place parameter files on the PowerCenter Server machine or on a local machine.
 You can include parameter or variable information for more than one workflow, worklet, or
session in a single parameter file by creating separate sections for each object within the
parameter file.
 You can also create multiple parameter files for a single workflow, worklet, or session and
change the file that these tasks use as needed.

233
Parameter File Format

 When you enter values in a parameter file, you must precede the entries with a heading that
identifies the workflow, worklet, or session whose parameters and variables you want to
assign.
 You assign individual parameters and variables directly below this heading, entering each
parameter or variable on a new line.
 You can list parameters and variables in any order for each task.

234
Parameter File Format

 You can define the following heading formats:


 Workflow variables:
[folder name.WF:workflow name]
 Worklet variables:
[folder name.WF:workflow name.WT:worklet name]
 Worklet variables in nested worklets:
[folder name.WF:workflow name.WT:worklet name.WT:worklet name...]
 Session parameters, plus mapping parameters and variables:
[folder name.WF:workflow name.ST:session name]
or
[folder name.session name]
or
[session name]

235
Parameter File Format

 Below each heading, you define parameter and variable values as follows:
parameter1_name=value1
parameter2_name=value2
variable1_name=value1
variable2_name=value2
 The parameter file for the session includes the folder and session name, as well as each parameter
and variable:
[Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
$DBConnection_target=sales
$PMSessionLogFile=D:/session logs/firstrun.txt

236
Sample Parameter File

 The following text is an excerpt from a parameter file:


[HET_TGTS.WF:wf_TCOMMIT_INST_ALIAS]
$$platform=unix
[HET_TGTS.WF:wf_TGTS_ASC_ORDR.ST:s_TGTS_ASC_ORDR]
$$platform=unix
$DBConnection_ora=qasrvrk2_hp817
[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1]
$$DT_WL_lvl_1=02/01/2000 00:00:00
$$Double_WL_lvl_1=2.2
[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1.WT:NWL_PARAM_Lvl_2]
$$DT_WL_lvl_2=03/01/2000 00:00:00
$$Int_WL_lvl_2=3
$$String_WL_lvl_2=ccccc

237
LAB 33 - Using Mapping parameters

 Use Order_Details (from MIKONTR db) as the source


 Use Sales as your target file
 Use the following formula for calculating Sales revenue:-
 Sales revenue = UnitPrice * Quantity * (1 - Discount)
 Declare a mapping parameter (USD_to_EUR) which represents the USD/EUR conversion rate
 Specify the value for this mapping parameter in your parameter file
 Use this value to calculate the Sales revenue in Euros
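 A minimal parameter file entry for this lab might look like the following (the folder, workflow,
and session names are placeholders, and 0.92 is an example rate):

[YourFolder.WF:wfl_sales_revenue.ST:s_m_sales_revenue]
$$USD_to_EUR=0.92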

238
LAB 34 - Using Mapping Variables

 Use the mapping in Lab 33


 Define two mapping variables “Total_revenue” and “Total_orders” in the mapping
 Initialize these variables to zero in your parameter file
 Formulae:-
 Total_revenue = ∑ Sales_revenue for each transaction
 Total_orders = COUNT of all distinct orders
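 One possible sketch (not the only solution): update both variables with variable functions in an
Expression transformation, assuming a SALES_REVENUE port and rows already de-duplicated
on order ID:

SETVARIABLE($$Total_revenue, $$Total_revenue + SALES_REVENUE)   -- running sum
SETCOUNTVARIABLE($$Total_orders)                                -- one count per row

and initialize them in the parameter file:

[YourFolder.WF:wfl_sales_revenue.ST:s_m_sales_revenue]
$$Total_revenue=0
$$Total_orders=0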

239
Designer Features

Chapter 27
Arranging Workspace

241
Propagating Changed Attributes

242
Link Paths

243
Exporting Objects to XML

244
Importing Objects from XML

245
Comparing Objects

246
Other Available Transformations [1/2]
Application Source Qualifier:
Active/Connected
Reads ERP object sources
Custom:
[Active or Passive]/Connected
Calls a procedure in a shared library or DLL.
External Procedure:
Passive/[Connected or Unconnected]
Calls a procedure in a shared library / the COM layer of Windows.
Normalizer:
Active/Connected
Reorganizes records from VSAM, relational, and flat file sources
Transaction Control:
Active/Connected
Defines commit and rollback transactions.

247
Other Available Transformations [2/2]
Union:
Active/Connected
Merges data from different databases or flat file systems.
XML Generator:
Active/Connected
Reads data from one or more input ports & outputs XML through a
single output port.
XML Parser:
Active/Connected
Reads XML from one input port and outputs data to one or more output
ports.
XML Source Qualifier:
Active/Connected
Represents the rows that the PowerCenter Server reads from an XML
source when it runs a session

248
Worklets

Chapter 28
Worklets

 A worklet is an object that represents a set of tasks.


 It can contain any task available in the Workflow Manager.
 Use the Worklet Designer to create and edit Worklets.
 You can run Worklets inside a workflow.
 The workflow that contains the worklet is called the parent workflow.
 You can also nest a worklet in another worklet.
 Create a worklet when you want to reuse a set of workflow logic in several workflows.
 The worklet does not contain any scheduling or server information.
 To run a worklet, include the worklet in a workflow.
 The PowerCenter Server writes information about worklet execution in the workflow log.

250
Creating a Reusable Worklet
1. In the Worklet Designer, choose Worklets-Create. The Create Worklet
dialog box appears.

2. Enter a name for the worklet.


3. Click OK.
The Worklet Designer creates a Start task in the worklet.

251
Creating a Non-Reusable Worklet
 You can create non-reusable Worklets in the Workflow Designer as you develop the workflow.
 Non-reusable Worklets only exist in the workflow.
1. In the Workflow Designer, open a workflow.
2. Choose Tasks-Create.
3. Select Worklet for the Task type.
4. Enter a name for the worklet.
5. Click Create.
The Workflow Designer creates the worklet and adds it to the workspace.
6. Click Done.

252
Nesting Worklets

 You can nest a worklet within another worklet.


 When you run a workflow containing nested worklets, the PowerCenter Server runs the nested
worklet from within the parent worklet.
 You might choose to nest worklets to group several worklets together by function, or to
simplify the design of a complex workflow.
 You might choose to nest worklets to load data to fact and dimension tables.
 Create a nested worklet to load fact and dimension data into a staging area.
 Then, create a nested worklet to load the fact and dimension data from the staging area to
the data warehouse.

253
Multiple Worklets Vs Nesting Worklets

254
Repository Manager

Chapter 29
Repository
 PowerCenter includes the following types of repositories:
 Standalone repository
 Global repository
 Local repository
 Versioned repository

256
Repository
Standalone repository
 A repository that functions individually, unrelated and unconnected to other repositories.
Global repository
 The centralized repository in a domain, a group of connected repositories.
 Each domain can contain one global repository.
 The global repository can contain common objects to be shared throughout the domain through
global shortcuts.

Local repository
 A repository within a domain that is not the global repository.
 Each local repository in the domain can connect to the global repository and use objects in its
shared folders.

257
Repository
Versioned repository
 A global or local repository that you enable for version control.
 A versioned repository can store multiple copies / versions, of an object.
 Each version is a separate object with unique properties.
 Version control features allow you to efficiently develop, test, and deploy metadata into production.
Notes:
 You cannot change a global repository to a local repository, or a versioned repository to a
non-versioned repository.
 However, you can promote an existing local repository to a global repository, and a non-versioned
repository to a versioned repository.
Warning:
 The Informatica repository tables have an open architecture. (Although you can view the repository
tables, Informatica strongly advises against altering the tables or data within the tables)

258
Interface

259
Metadata extensions

 Informatica allows end users and partners to extend the metadata stored in the repository by
associating information with individual objects in the repository.
 PowerCenter Client applications can contain the following types of metadata extensions:
 Vendor-defined.
 User-defined.
 All metadata extensions exist within a domain.

260
Metadata extensions

 Both vendor and user-defined metadata extensions can exist for the following repository
objects:
 Source definitions
 Target definitions
 Transformations
 Mappings
 Mapplets
 Sessions
 Tasks
 Workflows
 Worklets

261
Understanding Workflows

Chapter 30
Running a Workflow

 The PowerCenter Server uses the Load Manager (LM) process and the Data Transformation
Manager Process (DTM) to run the workflow and carry out workflow tasks.
 The Load Manager is the primary PowerCenter Server process.
 It accepts requests from the PowerCenter Client and from pmcmd.
 When the workflow reaches a session, the Load Manager starts the DTM process.
 The DTM process is the process associated with the session task.
 The Load Manager creates one DTM process for each session in the workflow.

263
Running a Workflow

When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:

1. Locks the workflow and reads workflow properties.


2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.

264
Running a Workflow

 When the PowerCenter Server runs a session, the DTM performs the
following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled.
Checks query conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and transformation threads to
extract, transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.

265
Performance Tuning

Chapter 31
Looking for bottlenecks in mapping design

 The goal of performance tuning is to optimize session performance by eliminating performance


bottlenecks.
 To tune the performance of a session, first you identify a performance bottleneck, eliminate it,
and then identify the next performance bottleneck until you are satisfied with the session
performance.
 You can use the test load option to run sessions when you tune session performance.
 Performance bottlenecks can occur in the source and target databases, the mapping, the
session, and the system.
 You can identify performance bottlenecks by the following methods:
 Running test sessions.
 Studying performance details.
 Monitoring system performance.

267
Looking for bottlenecks in mapping design
 Once you determine the location of a performance bottleneck, you can eliminate the bottleneck by
following these guidelines:
 Eliminate source and target database bottlenecks: Have the database administrator
optimize database performance by optimizing the query, increasing the database network
packet size, or configuring index and key constraints.
 Eliminate mapping bottlenecks: Fine tune the pipeline logic and transformation settings and
options in mappings to eliminate mapping bottlenecks.
 Eliminate session bottlenecks: You can optimize the session strategy and use performance
details to help tune session configuration.
 Eliminate system bottlenecks: Have the system administrator analyze information from
system monitoring tools and improve CPU and network performance.

268
Cache management

 The PowerCenter Server creates index and data caches in memory for Aggregator, Rank,
Joiner & Lookup transformations in a mapping.
 The PowerCenter Server stores key values in the index cache and output values in the data
cache.
 You configure memory parameters for the index and data cache in the transformation or
session properties.
 If the PowerCenter Server requires more memory, it stores overflow values in cache files.
 When the session completes, the PowerCenter Server releases cache memory, and in most
circumstances, it deletes the cache files.

269
Caching Storage Overview

270
Memory Cache

 The PowerCenter Server creates a memory cache based on the size configured in the session
properties.
 When you create a mapping, you specify the index and data cache size for each transformation
instance.
 When you create a session, you can override the index and data cache size for each
transformation instance in the session properties.
 When you configure a session, you calculate the amount of memory the PowerCenter Server
needs to process the session.
 Calculate requirements based on factors such as processing overhead and column size for key
and output columns.
 By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and
2,000,000 bytes to the data cache for each transformation instance.
 If the PowerCenter Server cannot allocate the configured amount of cache memory, it cannot
initialize the session and the session fails.
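 For example, with these defaults a mapping containing one Aggregator and one Lookup would
reserve 2 × (1,000,000 + 2,000,000) = 6,000,000 bytes of cache before any overrides are applied.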

271
The Golden Rules of Informatica Design

Chapter 32
Why Use The Golden Rules?

To show how the use of best practice benefits a development team
There are many ways to produce the same results – some are more efficient than others.
What might work well in one scenario may be inappropriate in another.
A common standard means all members of a team can follow the logic
Can produce reusable components
Can ensure on a shared platform that the different teams do not impact each other
Prevent outages – for example out of disk space errors
Learn from others' mistakes
Reduced rework
Complete projects on time and within budget

273
Golden Rule No. 1 – Set Out Standards

In other words... Create a great team!

Naming standards
 Everyone knows what it does
 Easy to pick up someone else’s work
 Don’t end up with 100 connection objects to the same database
 Don’t end up with 100 lookups to the same table
Development Standards
 Annotate Objects clearly
 Audit trail
Shared Object Policy
 Object Stewardship
 Single version of the truth
 Use your shared folder!

274
Golden Rule No. 2 – Know Your Data!

In other words... talk to the data owners AND see for yourself!

Analyse, analyse and analyse again


 Don’t always accept the statement “there are no errors in this data”
Verify actual values against permitted values
Devise an error handling strategy
 Implement in all mappings
 Don’t rely on the .bad files
Verify the business rules (and get sign off!)
Design and unit test with real data – or at least as realistic as possible
Will design of source and target help or hinder performance?
 Indexes
 Constraints

275
Golden Rule No. 3 – Plan Your Flows

In other words... Know where you're going!

Plan for reuse


 Common lookups
 Common expressions (e.g. date conversions)
 Mapplets
Use appropriate Transformations
 Routers vs multiple filters
 SQ Override vs Joiner/Union
 Expression vs aggregator
Make the most of the resources
 Push processing back to the database
 Why not stage files?
Design for rerun

276
Golden Rule No. 4 – Reduce Data ASAP

In other words... Trash what you don't need!

Only connect ports that you need


 Reduces data passing through transformations
Remove bad data early
 Route to your error handling
 Avoids excessive processing on bad data
 Complex Transforms only on verified data
Filter / Aggregate ASAP
 If possible filter data in source qualifier
 Consider aggregation in SQL override
Minimise Transformations
 Avoid using one Expression transformation per calculation
 You'll need more DTM buffer memory if you have excessive transformations

277
Golden Rule No. 5 – Avoid Big Caches

In other words... Don't be a waste of space!

Pre Sort Joiners & Aggregators


 Use the order by in the source qualifier (see the sketch at the end of this slide)
Only Use necessary ports
 Avoid caching large text ports if the data isn’t needed later
Filter ASAP
 Don’t aggregate / join data you’ll trash later
Utilise the Master / Detail properties
 Make the master the set with the fewest records
Avoid the sorter - order by instead
Need to sort a text file?
 Consider staging it instead!
Size your caches for production
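 A sketch of pre-sorting in the Source Qualifier SQL override (table and port names are
hypothetical); with this in place, the downstream Joiner or Aggregator can be configured for
sorted input:

SELECT CUSTOMER_ID, ORDER_ID, AMOUNT
FROM ORDERS
ORDER BY CUSTOMER_ID, ORDER_ID    -- must match the Joiner/Aggregator key order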

278
Golden Rule No. 6 – Remember It’s Shared

In other words... Be a good neighbour!

Use Fewer concurrent sessions


 Running more may cause other teams to go into wait state (and in production this means
missed deadlines)
Use performance stats only when tuning
 The session requires twice the memory, which may cause other sessions to fail

279
Don'ts - avoid where possible:

SQL override
– Use source qualifier properties where possible

Multiple filters
– Try a router instead!

Sorters – Very inefficient


Field level stored procedures
– Get called for every row

Extracting more data than you need


– Filter in the source qualifier
– Only map required ports

Aggregator
– Try a SQL override
– Use sorted data where possible
– Create running sums in an expression instead

 Using complex rules in filters and routers


 Hardcoding values
 Giving connections environment-specific names

280
The Good, the Bad & the Ugly

Informatica Design Examples


The Good...

282
The Bad...

283
The Ugly...

284
WORLDWIDE HEADQUARTERS: 6400 SHAFER COURT I ROSEMONT, ILLINOIS USA 60018
TEL. 847.384.6100 I FAX 847.384.0500 I WWW.KANBAY.COM

Thank You!
