Informatica Training
1
Introduction and Product Overview
Chapter 1
PowerCenter 8 Architecture
PowerCenter 8.x has a service-oriented architecture that provides the ability to
scale services and share resources across multiple machines.
3
PowerCenter 8 Architecture
4
PowerCenter 8 Server Connectivity
For a session, the PowerCenter Server holds the connection as long as it needs to read data
from source tables or write data to target tables.
5
Designer Overview
Chapter 2
Designer Interface
Designer Windows:
Navigator
Workspace
Status bar
Output
Overview
Instance Data
Target Data
7
Designer Interface
8
Designer Interface
Status bar: Displays the status of the operation you perform.
Output: Provides details when you perform certain tasks, such as saving your work or validating a
mapping. Right-click the Output window to access window options, such as printing output text,
saving text to file, and changing the font size.
Overview: An optional window to simplify viewing workbooks containing large mappings or a large
number of objects. Outlines the visible area in the workspace and highlights selected objects in
color. To open the Overview window, choose View-Overview Window.
Instance Data: View transformation data while you run the Debugger to debug a mapping.
Target Data: View target data while you run the Debugger to debug a mapping. You can view a list
of open windows and switch from one window to another in the Designer.
9
Lab 1 - Setting Connections
10
Naming Conventions
Chapter 3
Naming Conventions
12
Naming Conventions - Transformations:
13
Working With Sources and Targets
Chapter 4
Design Process Overview
15
Methods of Analyzing Sources
To extract data from a source, you must first define sources in the repository.
You can import or create the following types of source definitions in the Source Analyzer:
– Relational database
– Flat file
– COBOL file
– XML object
16
Working with Relational Sources
You can add and maintain relational source definitions for tables, views, and synonyms:
Import source definitions: Import source definitions into the Source Analyzer.
Update source definitions: Update source definitions either manually, or by re-importing
the definition.
17
Importing Relational Source Definitions
You can import relational source definitions from database tables, views, and synonyms.
When you import a source definition, you import the following source metadata:
Source name
Database location
Column names
Datatypes
Key constraints
Note: When you import a source definition from a synonym, you might need
to manually define the constraints in the definition.
To import a source definition, you must be able to connect to the source database from the client
machine using a properly configured ODBC data source or gateway. You may also require read
permission on the database object.
You can also manually define key relationships, which can be logical relationships created in the
repository that do not exist in the database.
18
Importing Relational Source Definitions
To import a source definition:
1. In Source Analyzer, choose Sources-Import from Database.
19
Importing Relational Source Definitions
If no table names appear or if the table you want to import does not
appear, click All.
20
Importing Relational Source Definitions
6. Click OK.
21
Importing Relational Source Definitions
7. Choose Repository-Save
22
Creating Target Definitions
You can create the following types of target definitions in the Warehouse Designer:
Relational: You can create a relational target for a particular database platform. Create
a relational target definition when you want to use an external loader to the target
database.
Flat File: You can create fixed-width and delimited flat file target definitions.
XML File: You can create an XML target definition to output data to an XML file.
23
Importing a Relational Target Definition
When you import a target definition from a relational table, the Designer imports the following
target details:
Target name.
Database location.
Column names.
Datatypes.
Key constraints.
Key relationships.
24
Automatic Target Creation
Drag-and-drop a Source Definition into the Warehouse Designer Workspace
25
Target Definition properties
26
Target Definition properties
27
Data Previewer
Preview data in
Relational Sources
Flat File Sources
Relational Targets
Flat File Targets
Data Preview Option is available in
Source Analyzer
Warehouse Designer
Mapping Designer
Mapplet Designer
28
Data Previewer Source Analyzer
29
Data Previewer Source Analyzer
30
LAB 2 - Creating Source Definitions
31
LAB 3 - Creating Target Definitions
32
DAY 2
Mappings Overview
Chapter 5
Overview
“A mapping is a set of source and target definitions linked by transformation objects that
define the rules for data transformation.”
Mappings represent the data flow between sources and targets.
When the PowerCenter Server runs a session, it uses the instructions configured in the mapping to
read, transform, and write data.
Every mapping must contain the following components:
Source instance: Describes the characteristics of a source table or file.
Transformation: Modifies data before writing it to targets. Use different transformation objects to
perform different functions.
Target instance: Defines the target table or file.
Links: Connect sources, targets, and transformations so the PowerCenter Server can move the
data as it transforms it.
Note:
– A mapping can also contain one or more Mapplets. A mapplet is a set of
transformations that you build in the Mapplet Designer and can use in multiple
mappings.
35
Sample Mapping
36
Developing a Mapping
37
Mapping Validation
Mappings must
Be valid for a session to run
Be end-to-end complete and contain valid expressions
Pass all data flow rules
Mappings are always validated when saved; can be validated without saving
Output window will always display reason for invalidity
38
Transformation Concepts
Chapter 6
Transformation Concepts
40
Active Vs Passive Transformation
Active:
Number of rows input may not equal number of rows output
Can operate on groups of data rows
May not be re-linked into another data stream (except into a sorted join where both flows arise from the same source qualifier)
e.g. Aggregator, Filter, Joiner, Rank, Normalizer, Source Qualifier, Update Strategy, Custom
Passive:
Number of rows input always equals number of rows output
Operates on one row at a time
May be re-linked into another data stream
e.g. Expression, Lookup, External Procedure, Sequence Generator, Stored Procedure
41
Transformation Views
42
Transformation Views
43
Ports & Expressions
Ports are present in each transformation and are used to propagate the field values from the
source to the target via the transformations.
Ports are of three basic types:
Input
Output
Variable
Port evaluation follows a top-down approach
An Expression is a calculation or conditional statement added to a transformation.
An Expression can be composed of ports, functions, operators, variables, literals, return
values and constants.
44
Ports - Evaluation
The best practice recommends the following approach for port evaluation
Input Ports:
Should be evaluated first
There is no evaluation ordering among input ports (as they do not depend on any other ports)
Variable Ports:
Should be evaluated after all input ports are evaluated (as variable ports can reference any
input port)
Variable ports can reference other variable ports also but not any output ports.
Ordering of variables is also very important as they can reference each other’s values.
45
Ports - Evaluation
Output Ports:
Should be evaluated last
They can reference any input port or any variable port.
There is no ordered evaluation of output ports (as they cannot reference each other)
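The evaluation order above can be sketched as a small Python simulation. PowerCenter ports are configured in the Designer, not coded, so this is only an illustration; all port names (`salary`, `bonus`, `v_total`, `v_tax`, `net_pay`) are hypothetical:

```python
def evaluate_row(row, variables):
    """Evaluate one row the way the text describes: input ports first,
    then variable ports in declared order, then output ports."""
    # Input ports: taken directly from the source row (no ordering needed,
    # since they do not depend on any other ports).
    in_salary = row["salary"]
    in_bonus = row["bonus"]

    # Variable ports: evaluated after all input ports, in declared order;
    # they may reference input ports and earlier variable ports.
    variables["v_total"] = in_salary + in_bonus
    variables["v_tax"] = variables["v_total"] * 0.2  # references v_total

    # Output ports: evaluated last; they may reference input and variable
    # ports, but never each other.
    out_net = variables["v_total"] - variables["v_tax"]
    return {"net_pay": out_net}

print(evaluate_row({"salary": 1000, "bonus": 200}, {}))
```

Reordering the two variable assignments would break `v_tax`, which is why the declared order of variable ports matters.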
46
Using Variable Ports
Also known as Local variables.
Used for temporary storage
Used to simplify complex expressions
E.g. – create and store a depreciation formula to be referenced more than once
Used in another variable port or output port expression
A variable port cannot also be an input or output port.
Available in the Expression, Aggregator and Rank.
Variable ports are NOT visible in Normal view, only in Edit view
47
Using Variable Ports
The scope of variable ports is limited to a single transformation.
Variable ports are initialized to either ‘zero’ (for numeric values) or ‘empty string’ (for character & date
variables) when the Mapping logic is processed.
They are not initialized to ‘Null’
Variable ports can remember values across rows (useful for comparing values) & they retain their
values until the next evaluation of the variable expression.
Thus we can effectively use the order of variable ports to do procedural computation.
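A minimal sketch of that cross-row behavior, illustrative only: the Python variable `v_prev` plays the role of a variable port that keeps its prior-row value until it is re-evaluated.

```python
def running_flags(values):
    """Flag whether each value is greater than the previous row's value.
    A variable port retains its value between rows, so an output
    expression can compare the current row with the prior one."""
    v_prev = 0  # numeric variable ports start at zero, not NULL
    out = []
    for v in values:
        out.append(v > v_prev)   # output sees the value from the LAST row
        v_prev = v               # variable re-evaluated for the next row
    return out

print(running_flags([10, 5, 7]))  # [True, False, True]
```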
48
Default Values – Two Usages
49
Expressions
50
Informatica Data Types
Note:
a) Transformation data types allow mix-n-match of source and target
database types
b) When connecting ports, native and transformation data types must
be either compatible or explicitly converted
51
Source Qualifier Transformation
Chapter 7
What does it do?
When you add a relational or a flat file source definition to a mapping, you need to connect it to
a Source Qualifier transformation.
The Source Qualifier transformation represents the rows that the PowerCenter Server reads
when it runs a session.
You cannot directly connect sources to targets.
You need to connect them through a Source Qualifier transformation for relational and flat file
sources, or through a Normalizer transformation for COBOL sources.
Can also be used for Homogeneous Joins.
53
Source Qualifier Transformation
Active Transformation
Connected
Port
All Input/Output
Usage ( only applicable for relational sources)
Modify SQL statements
User defined Join
Source Filter
Sorted ports
Select Distinct
Pre/Post SQL
Convert Data Types
54
Source Qualifier Transformation
55
Default Query
For relational sources, the PowerCenter Server generates a query for each Source Qualifier
transformation when it runs a session.
The default query is a SELECT statement for each source column used in the mapping. Thus, the
PowerCenter Server reads only the columns that are connected to another transformation.
Although there are many columns in the source definition, only three columns are connected to
another transformation.
In this case, the PowerCenter Server generates a default query that selects only those three
columns:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.FIRST_NAME FROM CUSTOMERS
56
Joining Multiple sources
You can use one Source Qualifier transformation to join data from multiple relational tables.
These tables must be accessible from the same instance or database server.
When a mapping uses related relational sources, you can join both sources in one Source Qualifier
transformation.
Default join is inner equi-join (where Src1.col_nm = Src2.col_nm) if the relationship between the
tables is defined in the Source Analyzer
This can increase performance when source tables are indexed.
Tip: Use the Joiner transformation for heterogeneous sources and to join flat files.
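The default join the Source Qualifier generates (WHERE Src1.col_nm = Src2.col_nm) behaves like an inner equi-join. A Python sketch of that semantics, with hypothetical ORDERS/CUSTOMERS rows; the real join runs as SQL on the database server:

```python
def sq_default_join(orders, customers):
    """Inner equi-join on customer_id: unmatched rows are dropped,
    matching rows are combined, mirroring the generated WHERE clause."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**o, "company": by_id[o["customer_id"]]["company"]}
        for o in orders
        if o["customer_id"] in by_id  # non-matching rows are dropped
    ]

orders = [{"order_id": 1, "customer_id": "A"}, {"order_id": 2, "customer_id": "Z"}]
customers = [{"customer_id": "A", "company": "Acme"}]
print(sq_default_join(orders, customers))  # only order 1 survives
```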
57
Joining Multiple sources
58
LAB 4 - Source Qualifier (Simple Mapping)
Create a Mapping using Employees as the Source and Employees as the Target instance
No other transformations are required.
Ensure target name is user specific (e.g.: Participant user1 should use user1.Employees)
59
Workflows- I
Chapter 8
Workflow Manager Tools
Workflow Designer
Maps the execution order and dependencies of Sessions, Tasks & Worklets, for the
Informatica Server
Task Developer
Create Session, Shell Command and Email Tasks
Tasks created in the Task Developer are reusable
Worklet Designer
Creates objects that represent a set of tasks
Worklet objects are reusable
61
Workflow Manager Interface
A workflow is a set of instructions that describes how and when to run tasks related to
extracting, transforming, and loading data.
The PowerCenter Server runs workflow tasks according to the conditional links connecting the tasks.
Workflow Manager is used to develop and manage workflows.
Workflow Monitor is used to monitor workflows and stop the PowerCenter Server.
When a workflow starts, the PowerCenter Server retrieves mapping, workflow, and session metadata
from the repository to extract data from the source, transform it, and load it into the target.
You can run as many sessions in a workflow as you need.
You can run the Session tasks sequentially or concurrently, depending on your needs.
63
Session Overview
A session is a set of instructions that tells the PowerCenter Server how and when to move
data from sources to targets.
A mapping is a set of source and target definitions linked by transformation objects that
define the rules for data transformation.
To run a session, you must first create a workflow to contain the Session task.
64
Link Task
65
Session Task
Chapter 9
Session Task
67
Session Task
Session Task Tabs :
General
Properties
Config Object
Mapping
Components
Metadata Extensions
68
Session Task
69
Session Task
70
Workflows Monitor Overview
Chapter 10
Monitor Workflows
The Workflow Monitor is the tool for monitoring Workflows and Tasks
Review details about a Workflow or Tasks in two views:
Gantt Chart view
Task view
The Workflow Monitor displays Workflows that have been run at least once
72
Gantt Chart View
73
Task View
74
Monitoring Workflows
Perform operations in the Workflow Monitor
Restart: restart a Task, Workflow or Worklet
Stop: stop a Task, Workflow or Worklet
Abort: abort a Task, Workflow or Worklet
Resume: resume a suspended Workflow after a failed Task is corrected
View Session and Workflow logs
Abort has a 60 second timeout
If the Server has not completed processing and committing data during the timeout period,
the threads and processes associated with the Session are killed.
75
Sequence Generator Transformation
Chapter 11
What does it do?
77
Sequence Generator Transformation
78
Sequence Generator Transformation
79
Example
Input Output
The rows in the Source have to be loaded into the target with a Unique
ID generated for each record.
Here the Sequence generator helps in creating the IDs for each record
in the target.
80
Sequence Generator Properties
Properties
Start value
End Value
Increment By
Number of cached values
Reset
Cycle
Design tip: Set the Reset property and set Increment By to 1. Use in conjunction with a Lookup that
gets MAX(value) from the target; add NEXTVAL to it to get the new ID.
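The tip above can be sketched in Python. This is an illustration of the arithmetic only (NEXTVAL restarting at 1 each run, added to the looked-up maximum); the column name `employee_wk` and the starting maximum are assumptions:

```python
def assign_surrogate_keys(rows, current_max):
    """Add a restarting NEXTVAL (Reset set, Increment By = 1) to the
    MAX(key) looked up from the target to build new surrogate keys."""
    nextval = 0
    for row in rows:
        nextval += 1                       # NEXTVAL: 1, 2, 3, ...
        row["employee_wk"] = current_max + nextval
    return rows

rows = assign_surrogate_keys([{"name": "a"}, {"name": "b"}], current_max=100)
print([r["employee_wk"] for r in rows])  # [101, 102]
```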
81
LAB 5 - Sequence Generator (1)
82
LAB 6 - Sequence Generator (2)
83
Expression Transformation
Chapter 12
What does it do?
You can use the Expression transformation to calculate values in a single row before you write
to the target.
For example, you might need to adjust employee salaries, concatenate first and last
names, or convert strings to numbers.
You can use the Expression transformation to perform any non-aggregate calculations.
You can also use the Expression transformation to test conditional statements before you
output the results to target tables or other transformations.
85
Expression Transformation
Passive Transformation
Connected
Ports
– Mixed
– Variables allowed
Create expression in output or variable port
Used to perform majority of data manipulation
86
Expression Transformation
87
Expression Editor
An expression formula is a calculation or conditional statement for a specific port in a
transformation
Performs calculation based on ports, functions, operators, variables, constants, and return values
from other transformations
88
Expression Editor
89
Example
Source Target
90
LAB 7 - Expression Transformation (1)
Create a mapping using the Employee flat file as source, DIM_EMPLOYEE as the target
Concatenate First Name and Last Name to get Employee Name
Ensure all leading and trailing spaces are removed for character columns
Use NEXTVAL of Sequence Generator transformation to connect to Employee_wk
Target load will be truncate / load.
Do not connect geography_wk, region_nk, region_name and direct_report_wk
91
LAB 8 - Expression Transformation (2)
92
LAB 9 - Expression Transformation (3)
Source:- Employees
Target:- Employee_LAB_9 File
In the target file we need
First name should have the first alphabet in upper case and the rest in lower case
Last name should be in upper case
Also compute the employees Age in years
93
LAB 10 - Expression Transformation (4)
Source:- Orders
Target:- lab_10_order_dates (Flat file)
For each order id, find the day, month, year and quarter from the order date.
In addition, find the 1st day of the quarter and the last day of the quarter using an expression
transformation only.
94
DAY 3
Filter Transformation
Chapter 13
What does it do?
97
Filter Transformation
98
Filter Transformation
100
Example
Input Output
101
LAB 11 - Filter Transformation (1)
102
LAB 12 - Filter Transformation (2)
103
LAB 13 - Filter Transformation (3)
Source:- Orders
Load data to the 3 target tables which should contain only for the months of October,
November and December respectively
Orders_Oct
Orders_Nov
Orders_Dec
104
Router Transformation
Chapter 14
What does it do?
106
Router Transformation
107
Router Groups
Input group (always one)
User-defined groups
Each group has one condition
All group conditions are evaluated for each row
One row can pass multiple conditions
Unlinked group outputs are ignored
Default group (always one) can capture rows that fail all group conditions
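The group rules above can be sketched as follows; the group names and conditions are invented for illustration, and the Router itself is configured graphically rather than coded:

```python
def route(row, groups, linked):
    """Evaluate every group condition for the row (a row can pass more
    than one); rows failing all conditions fall to the DEFAULT group.
    Output groups that are not linked downstream are simply ignored."""
    hits = [name for name, cond in groups.items() if cond(row)]
    if not hits:
        hits = ["DEFAULT"]
    return [g for g in hits if g in linked]

groups = {"HIGH": lambda r: r["amt"] > 100, "USA": lambda r: r["ctry"] == "US"}
print(route({"amt": 150, "ctry": "US"}, groups, linked={"HIGH", "USA", "DEFAULT"}))
print(route({"amt": 10, "ctry": "IN"}, groups, linked={"HIGH", "USA", "DEFAULT"}))
```

The first row lands in both user-defined groups; the second fails both and is caught by the default group.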
108
Router Group Filter Conditions
109
Using Router in a mapping
110
Example
Source Targets
111
Filter Vs Router
112
LAB 14 - Router Transformation (1)
113
LAB 15 - Router Transformation (2)
114
Joiner Transformation
Chapter 15
What does it do?
You can use the Joiner transformation to join source data from two related heterogeneous
sources residing in different locations or file systems.
The Joiner transformation joins two sources with at least one matching port.
The Joiner transformation uses a condition that matches one or more pairs of ports between
the two sources.
If you need to join more than two sources, you can add more Joiner transformations to the
mapping.
The Joiner transformation requires input from two separate pipelines or two branches from one
pipeline.
116
Joiner Transformation
Active/Connected
Ports
Input
Output
Master
117
Join Types
Homogeneous Joins
Joins that can be performed with a SQL SELECT statement
Source Qualifier contains a SQL join
Tables on the same database server (or are synonyms)
Database server does the join “work”
Multiple homogeneous joins can be combined
Heterogeneous Joins
Examples of joins that cannot be done with an SQL statement:
An Oracle table and a DB2 table
Two flat files
A flat file and a database table
118
Heterogeneous Joins
119
Joiner Properties
Join Types:
Normal (inner)
Master Outer
Detail Outer
Full Outer
Joiner can accept sorted data (configure the join condition to use the sort origin ports)
Joiner Conditions & Nested Joins:
Multiple Join conditions are supported
Used to join three or more heterogeneous sources
120
Join Types – 1. Normal Join
With a Normal join, the PowerCenter Server discards all rows of data from
the master and detail source that do not match, based on the condition.
Source tables
121
Join Types – 2. Master Outer Join
A master outer join keeps all rows of data from the detail source and the
matching rows from the master source. (It discards the unmatched rows
from the master source.)
Source tables
122
Join Types – 3. Detail Outer Join
A detail outer join keeps all rows of data from the master source and the
matching rows from the detail source. (It discards the unmatched rows
from the detail source.)
Source tables
123
Join Types – 4. Full Outer Join
A full outer join keeps all rows of data from both the master and detail
sources.
Source tables
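The four join types can be sketched in Python against two tiny hypothetical inputs. Note the deck's convention: "master outer" keeps all detail rows, "detail outer" keeps all master rows. This is a semantic illustration, not how the Joiner is implemented:

```python
def joiner(master, detail, key, join_type):
    """Cache the master rows, then match each detail row against them,
    keeping unmatched rows according to the join type."""
    m_by_key = {}
    for m in master:
        m_by_key.setdefault(m[key], []).append(m)
    out, matched = [], set()
    for d in detail:
        if d[key] in m_by_key:
            matched.add(d[key])
            out.extend({**m, **d} for m in m_by_key[d[key]])
        elif join_type in ("master outer", "full outer"):
            out.append(d)              # keep unmatched detail rows
    if join_type in ("detail outer", "full outer"):
        out.extend(m for k, rows in m_by_key.items() if k not in matched
                   for m in rows)      # keep unmatched master rows
    return out

master = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
detail = [{"id": 1, "amt": 10}, {"id": 3, "amt": 30}]
print(len(joiner(master, detail, "id", "normal")))        # 1
print(len(joiner(master, detail, "id", "master outer")))  # 2
print(len(joiner(master, detail, "id", "detail outer")))  # 2
print(len(joiner(master, detail, "id", "full outer")))    # 3
```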
124
LAB 16 - Joiner Transformation (1)
125
LAB 17 - Joiner Transformation (2)
126
LAB 18 - Joiner Transformation (3)
127
LAB 19 - Joiner Transformation (4)
128
Aggregator Transformation
Chapter 16
What does it do?
The Aggregator transformation allows you to perform aggregate calculations, such as averages
and sums.
The Aggregator transformation is unlike the Expression transformation, in that you can use it to
perform calculations on groups.
The Expression transformation permits you to perform calculations on a row-by-row basis only.
When using the transformation language to create aggregate expressions, you can use
conditional clauses to filter rows, providing more flexibility than SQL language.
The PowerCenter Server performs aggregate calculations as it reads, and stores necessary
data group and row data in an aggregate cache.
The PowerCenter Server typically returns the last row’s value for all the non-aggregated fields
with the result of the aggregation.
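Two of those points — conditional clauses inside an aggregate, and the last-row value for non-aggregated ports — can be sketched in Python. Column names are illustrative:

```python
from collections import defaultdict

def aggregate(rows):
    """Per-order totals with a conditional clause: only non-discounted
    line amounts are summed. The last row seen for each group supplies
    the values for any non-aggregated ports."""
    totals = defaultdict(float)
    last_row = {}
    for r in rows:
        amt = r["qty"] * r["price"]
        # conditional clause: include the amount only when not discounted
        totals[r["order_id"]] += amt if r["discount"] == 0 else 0.0
        last_row[r["order_id"]] = r  # non-aggregated fields: last row wins
    return totals, last_row

rows = [
    {"order_id": 1, "qty": 2, "price": 5.0, "discount": 0},
    {"order_id": 1, "qty": 1, "price": 4.0, "discount": 0.1},
]
totals, _ = aggregate(rows)
print(totals[1])  # 10.0 — the discounted line is excluded
```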
130
Example
131
Aggregator Transformation
132
Aggregator Transformation
133
Aggregate Expressions
Aggregate functions are supported only in the
Aggregator Transformation
135
Aggregate Properties
Sorted Data (can be aggregated more efficiently)
The Aggregator can handle sorted or unsorted data
The Server will cache data from each group and release the cached data upon reaching the
first record of the next group
Data must be sorted according to the order of the Aggregator “Group By” ports
Performance gain will depend upon varying factors
Sorted Input property
Instructs the Aggregator to expect the data to be sorted
If you use sorted input and do not presort data correctly, you receive unexpected results.
136
Sorted Vs Unsorted Input
137
LAB 20 - Aggregator Transformation (1)
Create a mapping with Sources as Orders, OrderDetails
Source & Target connection is Train_Ora_Tgt
Target is Fact_Orders
Aggregate at Order_ID level
Formulae:
lead_time_days = requireddate - orderdate,
internal_response_time_days = shippeddate - orderdate,
external_response_time_days = requireddate - shippeddate
total_order_item_count = SUM(Quantity)
total_order_discount_dollars = SUM((Quantity * UnitPrice) * Discount)
total_order_dollars = SUM((Quantity * UnitPrice) - ((Quantity * UnitPrice) * Discount))
DEFAULT to -1 for customer_wk, employee_wk, order_date_wk, required_date_wk,
shipped_date_wk, ship_to_geography_wk, shipper_wk
138
LAB 21 - Aggregator Transformation (2)
139
Using Shared Objects: Mapplets
Chapter 17
Overview
141
Components
142
Example: Viewing Mapplet Input and Output
143
Example - Sample Mapplet in a Mapping
144
Mapplet Input Transformation
145
Mapplet Output Transformation
146
Example - Sample Mapplet in a Mapping
147
Mapplet Source Options
Internal Sources
One or more Source definitions / Source Qualifiers within the Mapplet
External Sources
Mapplet contains a Mapplet Input Transformation
Receives data from the Mapping it is used in
Mixed Sources
Mapplet contains one or more of either of a Mapplet Input transformations AND one or more of
Source Qualifiers
Receives data from the Mapping it is used in, AND from the Mapplet
148
Mapplet Data Sources
149
Mapplet with Multiple Output Groups
150
Unmapped Mapplet Output Groups
Disallowed:
Mapplet Output Group NOT linked
Link at least one port
151
Unsupported Transformations
152
Active and Passive Mapplets
153
LAB A - Creating a Mapplet
154
LAB B - Using a Mapplet in a Mapping
155
DAY 4
File List Option
Chapter 18
File List Basics
You can create a session to run multiple source files for one source instance in the mapping.
You might use this feature if, for example, your company collects data at several locations
which you then want to move through the same session.
When you create a mapping to use multiple source files for one source instance, the properties
of all files must exactly match the source definition.
To use multiple source files, you create a file containing the names and directories of each
source file you want the PowerCenter Server to use.
This file is referred to as a “File list”.
When the session starts, the PowerCenter Server reads the file list, then locates and reads the
first file source in the list.
After the PowerCenter Server reads the first file, it locates and reads the next file in the list.
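The read order described above can be sketched in Python: one list file naming the sources, read in sequence through a single source instance. File names and contents are throwaway examples:

```python
import os, tempfile

def read_file_list(list_path):
    """Read each file named in the list, in order, as one logical source.
    (All files must match the same source definition.)"""
    rows = []
    with open(list_path) as f:
        for name in (line.strip() for line in f if line.strip()):
            with open(name) as src:
                rows.extend(src.read().splitlines())
    return rows

# tiny demo with throwaway files
tmp = tempfile.mkdtemp()
for i in (1, 2):
    with open(os.path.join(tmp, f"part{i}.txt"), "w") as f:
        f.write(f"row{i}\n")
list_path = os.path.join(tmp, "filelist.txt")
with open(list_path, "w") as f:
    f.write("\n".join(os.path.join(tmp, f"part{i}.txt") for i in (1, 2)))
print(read_file_list(list_path))  # ['row1', 'row2']
```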
158
File List - Session Properties
159
LAB C - Using similar input files in a single Mapping
Use the 3 files generated as output from the Mapplet exercise as sources
Use the All_Counts file as target
Use the filelist option in the session properties to read data from all 3 files and specify
“All_Counts” as the single target
160
Lookup Transformation
Chapter 19
What does it do?
Use a Lookup transformation in a mapping to look up data in a flat file or a relational table,
view, or synonym.
You can import a lookup definition from any flat file or relational database to which both the
PowerCenter Client and Server can connect.
You can use multiple Lookup transformations in a mapping.
The PowerCenter Server queries the lookup source based on the lookup ports in the
transformation.
It compares Lookup transformation port values to lookup source column values based on the
lookup condition and passes the result (of the lookup) to other transformations and a target.
162
Example
How to find the department name for each employee by using a Lookup
transformation?
This is determined by matching the Dept# from Input & Lookup tables
Output
163
Lookup Transformation
Passive Transformation
Connected/Unconnected
Ports
Mixed
“L” indicates Lookup port
“R” indicates port used as a return value
Usage
Get related values
Verify if records exist or if data has changed
Multiple conditions are supported
Lookup SQL override is allowed
164
Lookup Transformation
165
Lookup Transformation
166
Lookup Properties
Lookup conditions
Lookup Table Name
Lookup SQL
Native Database connection
Object name
167
How a Lookup Transformation works
For each mapping row, one or more port values are looked up in a database table
If a match is found, one or more table values are returned to the mapping. If no match is found,
default value is returned
168
Lookup Caching
Caching can significantly impact performance
Cached
– Lookup table data is cached locally on the server
– Mapping rows are looked up against the cache
– Only one SQL SELECT is needed
– Cache is indexed based on the order by clause
Uncached
– Each Mapping row needs one SQL SELECT
If the data does not fit in the memory cache, the PowerCenter Server stores the overflow values
in the cache files.
When the session completes, the PowerCenter Server releases cache memory and deletes the
cache files unless you configure the Lookup transformation to use a persistent cache.
Rule of thumb: Cache if the number (and size) of records in the lookup table is
small relative to the number of mapping rows requiring the lookup
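The cached/uncached trade-off can be sketched as follows — one query to build a local cache versus one query per mapping row. The default value -1 and the table contents are assumptions:

```python
def lookup_cached(keys, query_table):
    """Cached lookup: one pass over the lookup source builds a local
    cache; every mapping row probes the cache, not the database."""
    cache = dict(query_table())              # one SELECT, cached locally
    return [cache.get(k, -1) for k in keys]  # -1 plays the default value

def lookup_uncached(keys, query_row):
    """Uncached lookup: one query per mapping row."""
    return [query_row(k) for k in keys]

table = {"A": 10, "B": 20}
print(lookup_cached(["A", "B", "C"], lambda: table.items()))
print(lookup_uncached(["A", "C"], lambda k: table.get(k, -1)))
```

With a small lookup table and many mapping rows, the cached form issues one query where the uncached form issues one per row, which is the rule of thumb above.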
169
Lookup Caches
When configuring a lookup cache, you can specify any of the following options:
Static cache:
Dynamic cache:
Persistent cache:
Shared cache:
170
Lookup Policy on Multiple Match
Options are
Use first value
Use last value
Report error
171
LAB 22 - Using Connected Lookup (1)
172
LAB 23 - Using Connected Lookup (2)
Orders.EmployeeID = DIM_EMPLOYEE.employee_nk
shipper_wk:
Orders.ShipVia = Dim_Shipper.Shipper_nk
173
LAB 24 - Using Connected Lookup (3)
Source:- Employees
Target:- Emp_Manager
Use a lookup transformation to find the manager for each employee
Load the employee name and manager name into the target
If a person does not have a manager then do not load that record to the target
174
Unconnected Lookups
Chapter 20
Unconnected Lookup
176
Conditional Lookup Technique
Two requirements:
1. Must be an Unconnected (or “function mode”) Lookup
2. Lookup function used within a conditional statement
E.g. IIF(ISNULL(cust_id), :lkp.MYLOOKUP(order_no))
The conditional statement is evaluated for each row
The Lookup function is called only under the pre-defined condition
177
Conditional Lookup Advantage
Data lookup is performed only for those rows which require it. Substantial performance gains
are possible.
E.g. A Mapping will process 500,000 rows. For two percent of those rows (10,000) the item_id
value is NULL. Item_id can be derived from the SKU_NUMB.
IIF(ISNULL(item_id), :lkp.MYLOOKUP(sku_numb))
Net savings = 490,000 lookups
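The saving can be sketched in Python — the lookup is only invoked when the condition holds, so the call count reflects only the rows that needed it. Row shapes and the lookup table are illustrative:

```python
def conditional_lookup(rows, lkp):
    """The IIF(ISNULL(item_id), :lkp.MYLOOKUP(sku_numb)) pattern:
    the lookup fires only for rows where item_id is NULL."""
    calls = 0
    def counted(sku):
        nonlocal calls
        calls += 1          # count how many lookups actually happen
        return lkp[sku]
    for r in rows:
        if r["item_id"] is None:
            r["item_id"] = counted(r["sku"])
    return rows, calls

rows = [{"item_id": 7, "sku": "s1"}, {"item_id": None, "sku": "s2"}]
fixed, calls = conditional_lookup(rows, {"s2": 42})
print(calls, fixed[1]["item_id"])  # 1 42
```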
178
Unconnected Lookup Functionality
One Lookup port value (Return Port) may be returned for each Lookup
WARNING:
If the Return port is not defined, you may get unexpected results.
179
Connected Vs Unconnected Lookups
Connected: returns multiple values (by linking output ports to another transformation)
Unconnected: returns one value (by checking the Return port option for the output port that
provides the return value)
180
LAB 25 - Using Unconnected Lookup (1)
181
LAB 26 - Using Unconnected Lookup (2)
Use LAB 22 and do the same assignment by replacing the connected lookup with unconnected
lookup.
Return dept_name alone from the Unconnected lookup.
182
Target Instances
Chapter 21
Target Instances
A single mapping can have more than one instance of the same
target
The data would be loaded into the instances in a pipeline
Usage of multiple instances of the same target for loading is
dependent on the RDBMS in use. Multiple instances may not be used
if the underlying database locks the entire table while inserting
records
184
Target Instances - example
185
Update Strategy Transformation
Chapter 22
What does it do?
Update Strategy transformations are essential if you want to flag rows destined for the same
target for different database operations (insert / update / delete), or if you want to reject rows.
In PowerCenter, you set your update strategy at two different levels:
Within a mapping: Within a mapping, you use the Update Strategy transformation to flag
rows for insert, delete, update, or reject.
Within a session: When you configure a session, you can instruct the PowerCenter
Server to either treat all rows in the same way (for example, treat all rows as inserts), or
use instructions coded into the session mapping to flag rows for different database
operations.
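Within a mapping, the flagging can be sketched as follows. The constant values mirror the DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT row flags (0–3); the flagging rule itself (null key rejects, known key updates, new key inserts) is an invented example:

```python
# Row-flagging constants mirroring DD_INSERT..DD_REJECT (0..3).
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_rows(rows, existing_keys):
    """Flag each row for a database operation, as an Update Strategy
    expression would: bad rows reject, known keys update, new keys insert."""
    flagged = []
    for r in rows:
        if r["key"] is None:
            flagged.append((DD_REJECT, r))
        elif r["key"] in existing_keys:
            flagged.append((DD_UPDATE, r))
        else:
            flagged.append((DD_INSERT, r))
    return flagged

rows = [{"key": 1}, {"key": 9}, {"key": None}]
print([op for op, _ in flag_rows(rows, existing_keys={1})])  # [1, 0, 3]
```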
187
Update Strategy Transformation
188
Update Strategy Transformation
189
Update Strategy expressions
190
LAB 27 - Update Strategy (1)
Add Employee as Source. Add 2 instances of DIM_EMPLOYEE as Target
Add a Lookup Transformation (LKP_Target) to get employee_wk from DIM_EMPLOYEE.
Add Expression Transformation for trimming string columns and getting values from LKP_Target
Add Router Transformation to separate the data flow for New and Existing Records
Add 2 Update Strategy Transformations to Flag for Insert and Update
Add a Sequence Generator for populating the employee_wk for insert rows.
Add an Unconnected Lookup to retrieve the Max_value of employee_wk from the Target
Add an Expression Transformation (EXP_MAX_SEQ - between the Update Strategy for insert and the
Target instance for insert) to call the unconnected lookup
Note: Run LAB 11 first. Some rows are filtered & then run this workflow
191
LAB 28 - Update Strategy (2)
Use problem statement in LAB 27 and solve the same by replacing the router and 2 update
strategy with a single update strategy.
192
Workflows – Additional Tasks
Chapter 23
Additional Workflow Tasks
194
Reusable Tasks
195
Command Task
Specify one or more Unix shell or DOS (NT, Win2000) commands to run at specific points in
the Workflow
Runs in the Informatica Server (Unix or Windows) environment
Shell command status (successful completion or failure) is held in the pre-defined variable
“$command_task_name.STATUS”
Each Command Task shell command can execute before the Session begins or after the
Informatica Server executes a Session
Becomes a component of a Workflow (or Worklet)
196
Command Task
197
LAB 29 - Using Command Task
198
Email Task
Configure to have the Informatica Server to send email at any point in the Workflow
Becomes a component in a Workflow (or Worklet)
If configured in the Task Developer, the Email Task is reusable (optional)
199
LAB 30 - Using Email Task
200
Non-reusable Tasks
Assignment
Timer
Control
Event Wait
Event Raise
201
Decision Task
202
Decision Task
203
Assignment Task
204
Timer Task
205
LAB 31 - Using Timer Task
206
Control Task
Used to stop, abort, or fail the top-level workflow or the parent workflow based on an input link
condition.
A parent workflow or worklet is the workflow or worklet that contains the Control task.
207
Event Wait Task
208
Event Raise Task
209
Sorter Transformation
Chapter 24
What does it do?
211
Sorter Transformation
212
Sorter Transformation
213
Example
Input
Output
214
Sorter Transformation
Sorter Properties
Cache size
– Can be adjusted [default is 8 MB]
– The server uses up to twice the listed cache size
– If the cache size is unavailable, the Session Task will fail
215
Rank Transformation
Chapter 25
What does it do?
The Rank transformation allows you to select only the top or bottom rank of data.
You can use a Rank transformation
to return the largest or smallest numeric value in a port or group.
to return the strings at the top or the bottom of a session sort order.
During the session, the PowerCenter Server caches input data until it can perform the rank
calculations.
The Rank transformation differs from the transformation functions MAX and MIN in that it
allows you to select a group of top or bottom values, not just one value.
While the SQL language provides many functions designed to handle groups of data,
identifying top or bottom strata within a set of rows is not possible using standard SQL
functions.
You can also write expressions to transform data or perform calculations.
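The top/bottom-of-group behaviour described above can be sketched in plain Python. This is an illustrative sketch only, not Informatica code: the function name, column names, and sample rows are all made up for the example, standing in for a Rank transformation's group-by port and rank port.

```python
# Illustrative sketch (not Informatica code): selecting the top N rows
# per group, analogous to a Rank transformation with a group-by port
# (quarter) and a rank port (sales). All names here are hypothetical.

from itertools import groupby

def top_n_per_group(rows, group_key, rank_key, n, bottom=False):
    """Return the top (or bottom) n rows of each group, ranked by rank_key."""
    rows = sorted(rows, key=group_key)          # bring group members together
    result = []
    for _, members in groupby(rows, key=group_key):
        ranked = sorted(members, key=rank_key, reverse=not bottom)
        result.extend(ranked[:n])               # keep only the top/bottom n
    return result

sales = [
    {"quarter": "Q1", "rep": "Sam",   "sales": 10000},
    {"quarter": "Q1", "rep": "Mary",  "sales": 9000},
    {"quarter": "Q1", "rep": "Alice", "sales": 8000},
    {"quarter": "Q1", "rep": "Bob",   "sales": 7000},
    {"quarter": "Q2", "rep": "Sam",   "sales": 6000},
    {"quarter": "Q2", "rep": "Mary",  "sales": 9500},
]

# Top 2 salespersons per quarter - the kind of result a Rank
# transformation produces but standard SQL aggregates (MAX/MIN) cannot.
top2 = top_n_per_group(sales, lambda r: r["quarter"], lambda r: r["sales"], 2)
```

Unlike MAX/MIN, the result is a set of rows per group, which is the point of the Rank transformation.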
217
Rank Transformation
218
Overview
You can use a Rank transformation to:
Return the largest / smallest numeric value in a port or group.
Return the strings at the top / bottom of a session sort order.
219
Overview
The Rank transformation allows you to group information (like the Aggregator), create local
variables, and write non-aggregate expressions.
The Rank transformation differs from the transformation functions MAX and MIN in that it allows
you to select a group of top or bottom values, not just one value.
You can connect ports from only one transformation to the Rank transformation.
The Rank transformation includes input or input / output ports connected to another transformation
in the mapping.
It also includes variable ports and one rank port.
Use the rank port to specify the column you want to rank.
220
Rank Index
The Designer automatically creates a RANKINDEX port for each Rank transformation.
The PowerCenter Server uses the Rank Index port to store the ranking position for each row in
a group.
For example, if you create a Rank transformation that ranks the top three salespersons for each
quarter, the rank index numbers the salespeople from 1 to 3:
RANKINDEX SALES_PERSON SALES
1 Sam 10,000
2 Mary 9,000
3 Alice 8,000
221
Rank Index
If two rank values match, they receive the same value in the rank index and the transformation
skips the next value.
For example, if you want to see the top five retail stores in the country and two stores have the
same sales, the return data might look similar to the following:
RANKINDEX SALES STORE
1 10000 Orange
1 10000 Brea
3 9000 Los Angeles
4 8000 Ventura
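The tie-handling rule above is standard competition ranking: equal values share a rank, and the next distinct value skips ahead. A minimal Python sketch (not Informatica code; the function name is made up) of how RANKINDEX assigns positions:

```python
# Illustrative sketch (not Informatica internals) of RANKINDEX tie handling:
# equal values receive the same rank, and the rank after a tie is skipped
# ("competition ranking": 1, 1, 3, 4, ...).

def rank_index(values):
    """Assign competition-style ranks to values sorted in descending order."""
    ranks = []
    for pos, value in enumerate(sorted(values, reverse=True), start=1):
        if ranks and value == ranks[-1][1]:
            ranks.append((ranks[-1][0], value))   # tie: reuse previous rank
        else:
            ranks.append((pos, value))            # no tie: rank = position
    return ranks

result = rank_index([10000, 10000, 9000, 8000])
# the two tied stores share rank 1, and rank 2 is skipped
```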
222
DAY 5
Mapping Parameters and Variables
Chapter 26
Mapping Parameters & Variables
225
Mapping Parameters & Variables
Variables can change in value during run-time.
Parameters remain constant during run-time.
Provides increased developmental flexibility.
If you declare mapping parameters and variables in a mapping, you can reuse a mapping by altering
the parameter and variable values of the mapping in the session. (This can reduce the overhead of
creating multiple mappings when only certain attributes of a mapping need to be changed)
226
Mapping Variables
When you use a mapping variable, you declare the variable in the mapping or mapplet, and
then use a variable function in the mapping to change the value of the variable.
At the beginning of a session, the PowerCenter Server evaluates references to a variable to its
start value.
At the end of a successful session, the PowerCenter Server saves the final value of the
variable to the repository.
The next time you run the session, the PowerCenter Server evaluates references to the
variable to the saved value.
You can override the saved value by defining the start value of the variable in a parameter file.
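The life cycle described above can be sketched in Python. This is an analogy only, not PowerCenter internals: the "repository" is just a dict, the variable name is made up, and a real parameter file is assumed rather than shown.

```python
# Illustrative sketch (not PowerCenter internals) of a mapping variable's
# life cycle across session runs: start value resolution, per-row updates
# (here a SetMaxVariable-style update), and persisting the final value.

repository = {}         # stands in for the repository's saved final values
parameter_file = {}     # optional start-value overrides

def start_value(name, initial):
    """Parameter file overrides the saved value; saved value overrides initial."""
    if name in parameter_file:
        return parameter_file[name]
    return repository.get(name, initial)

def run_session(name, initial, rows):
    value = start_value(name, initial)
    for row in rows:                    # e.g. SetMaxVariable($$MaxId, row)
        value = max(value, row)
    repository[name] = value            # saved only on successful completion
    return value

run_session("$$MaxId", 0, [3, 7, 5])        # first run starts from the initial 0
second = run_session("$$MaxId", 0, [6, 2])  # second run starts from the saved 7
```

On the second run the saved value (7) survives because no incoming row exceeds it, mirroring how the server resumes from the persisted value rather than the initial one.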
227
Declarations
Declare Variables / Parameters in the Mappings / Mapplets menu
Properties that can be set :
User-defined Names
Appropriate aggregation type (Count, Min, or Max)
Optional initial value
Apply Parameter / Variable in formula
228
System Variables
SYSDATE: Provides current datetime on the Informatica Server machine
Not a static value
$$$SessStartTime: Returns the system date value as a String. Uses system clock on Informatica
Server machine.
String format is database type dependent
Used in SQL override
Has a constant value
SESSSTARTTIME: Returns the system date value on the Informatica Server
Used with any function that accepts transformation date / time data types
Not to be used in SQL override
Has a constant value
229
Functions to Set Mapping Variables
SetCountVariable:
Counts the number of evaluated rows and increments or decrements a mapping variable
for each row
SetMaxVariable:
Evaluates the value of a mapping variable to the higher of two values
SetMinVariable:
Evaluates the value of a mapping variable to the lower of two values
SetVariable:
Sets the value of a mapping variable to a specified value
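The aggregation behaviour of the four functions can be sketched as plain Python over a row stream. A sketch only: these helpers mimic the semantics described above, not the actual Informatica implementations.

```python
# Illustrative sketch (not Informatica code) of the four variable
# functions' per-row aggregation behaviour.

def set_count(current, delta=1):      # SetCountVariable: +1, -1, or 0 per row
    return current + delta

def set_max(current, value):          # SetMaxVariable: keep the higher value
    return max(current, value)

def set_min(current, value):          # SetMinVariable: keep the lower value
    return min(current, value)

def set_variable(current, value):     # SetVariable: last assigned value wins
    return value

count, hi, lo, last = 0, float("-inf"), float("inf"), None
for row in [5, 12, 3, 9]:
    count = set_count(count)
    hi = set_max(hi, row)
    lo = set_min(lo, row)
    last = set_variable(last, row)
```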
230
Using in mappings & mapplets
When you create a reusable transformation in the Transformation Developer, you can use any
mapping parameter or variable.
The Designer validates the usage of any mapping parameter or variable in the expressions of
reusable transformation.
When you use the reusable transformation in a mapplet or mapping, the Designer validates the
expression again.
When the Designer validates a mapping variable in a reusable transformation, it treats the variable
as an Integer datatype.
You cannot use mapping parameters and variables interchangeably between a mapplet and a
mapping.
Mapping parameters and variables declared for a mapping cannot be used within a mapplet.
Similarly, you cannot use a mapping parameter or variable declared for a mapplet in a
mapping.
231
Session and workflow parameter files
You can use a parameter file to define the values for parameters and variables used in a workflow,
worklet, or session.
You can create a parameter file using a text editor.
You list the parameters or variables & their values in the parameter file.
Parameter files can contain the following types of parameters and variables:
Workflow variables
Worklet variables
Session parameters
Mapping parameters and variables.
When you use parameters or variables in a workflow, worklet, or session, the PowerCenter Server
checks the parameter file to determine the start value of the parameter or variable.
You can use a parameter file to initialize workflow variables, worklet variables, mapping parameters,
and mapping variables.
232
Session and workflow parameter files
You can place parameter files on the PowerCenter Server machine or on a local machine.
You can include parameter or variable information for more than one workflow, worklet, or
session in a single parameter file by creating separate sections for each object within the
parameter file.
You can also create multiple parameter files for a single workflow, worklet, or session and
change the file that these tasks use as needed.
233
Parameter File Format
When you enter values in a parameter file, you must precede the entries with a heading that
identifies the workflow, worklet, or session whose parameters and variables you want to
assign.
You assign individual parameters and variables directly below this heading, entering each
parameter or variable on a new line.
You can list parameters and variables in any order for each task.
234
Parameter File Format
235
Parameter File Format
Below each heading, you define parameter and variable values as follows:
parameter1_name=value
parameter2_name=value
variable1_name=value
variable2_name=value
The parameter file for the session includes the folder and session name, as well as each parameter
and variable:
[Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
$DBConnection_target=sales
$PMSessionLogFile=D:/session logs/firstrun.txt
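A file in the format shown above is easy to read programmatically. The following is a minimal Python sketch, not PowerCenter code: it parses `[Folder.SessionName]` headings with `name=value` lines beneath them, with the sample contents inlined for the example.

```python
# Illustrative sketch (not PowerCenter code): parsing a parameter file of
# the format shown above - [heading] sections with name=value lines.

def parse_parameter_file(text):
    """Return {heading: {name: value}} from parameter-file text."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]                  # new section heading
            sections[current] = {}
        elif "=" in line and current is not None:
            name, _, value = line.partition("=")  # split on the first '='
            sections[current][name] = value
    return sections

sample = """\
[Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
"""

params = parse_parameter_file(sample)
```

Splitting on the first `=` matters because values (such as the datetime above) may themselves be arbitrary text.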
236
Sample Parameter File
237
LAB 33 - Using Mapping parameters
238
LAB 34 - Using Mapping Variables
239
Designer Features
Chapter 27
Arranging Workspace
241
Propagating Changed Attributes
242
Link Paths
243
Exporting Objects to XML
244
Importing Objects from XML
245
Comparing Objects
246
Other Available Transformations [1/2]
Application Source Qualifier:
Active/Connected
Reads ERP object sources
Custom:
[Active or Passive]/Connected
Calls a procedure in a shared library or DLL.
External Procedure:
Passive/[Connected or Unconnected]
Calls a procedure in a shared library / the COM layer of Windows.
Normalizer:
Active/Connected
Reorganizes records from VSAM, relational, and flat file sources
Transaction Control:
Active/Connected
Defines commit and rollback transactions.
247
Other Available Transformations [2/2]
Union:
Active/Connected
Merges data from different databases or flat file systems.
XML Generator:
Active/Connected
Reads data from one or more input ports & outputs XML through a
single output port.
XML Parser:
Active/Connected
Reads XML from one input port and outputs data to one or more output
ports.
XML Source Qualifier:
Active/Connected
Represents the rows that the PowerCenter Server reads from an XML
source when it runs a session
248
Worklets
Chapter 28
Worklets
250
Creating a Reusable Worklet
1. In the Worklet Designer, choose Worklets-Create. The Create Worklet
dialog box appears.
251
Creating a Non-Reusable Worklet
You can create non-reusable Worklets in the Workflow Designer as you develop the workflow.
Non-reusable Worklets only exist in the workflow.
1. In the Workflow Designer, open a workflow.
2. Choose Tasks-Create.
3. Select Worklet for the Task type.
4. Enter a name for the worklet.
5. Click Create.
The Workflow Designer creates the worklet and adds it to the workspace.
6. Click Done.
252
Nesting Worklets
253
Multiple Worklets Vs Nesting Worklets
254
Repository Manager
Chapter 29
Repository
PowerCenter includes the following types of repositories:
Standalone repository
Global repository
Local repository
Versioned repository
256
Repository
Standalone repository
A repository that functions individually, unrelated and unconnected to other repositories.
Global repository
The centralized repository in a domain, a group of connected repositories.
Each domain can contain one global repository.
The global repository can contain common objects to be shared throughout the domain through
global shortcuts.
Local repository
A repository within a domain that is not the global repository.
Each local repository in the domain can connect to the global repository and use objects in its
shared folders.
257
Repository
Versioned repository
A global or local repository that allows you to enable version control for the repository.
A versioned repository can store multiple copies, or versions, of an object.
Each version is a separate object with unique properties.
Version control features allow you to efficiently develop, test, and deploy metadata into production.
Notes:
You cannot change a global repository to a local repository, or a versioned repository to a non-
versioned repository.
However, you can promote an existing local repository to a global repository, and a non-versioned
repository to a versioned repository.
Warning:
The Informatica repository tables have an open architecture. (Although you can view the repository
tables, Informatica strongly advises against altering the tables or data within the tables)
258
Interface
259
Metadata extensions
Informatica allows end users and partners to extend the metadata stored in the repository by
associating information with individual objects in the repository.
PowerCenter Client applications can contain the following types of metadata extensions:
Vendor-defined.
User-defined.
All metadata extensions exist within a domain.
260
Metadata extensions
Both vendor and user-defined metadata extensions can exist for the following repository
objects:
Source definitions
Target definitions
Transformations
Mappings
Mapplets
Sessions
Tasks
Workflows
Worklets
261
Understanding Workflows
Chapter 30
Running a Workflow
The PowerCenter Server uses the Load Manager (LM) process and the Data Transformation
Manager Process (DTM) to run the workflow and carry out workflow tasks.
The Load Manager is the primary PowerCenter Server process.
It accepts requests from the PowerCenter Client and from pmcmd.
When the workflow reaches a session, the Load Manager starts the DTM process.
The DTM process is the process associated with the session task.
The Load Manager creates one DTM process for each session in the workflow.
263
Running a Workflow
When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:
264
Running a Workflow
When the PowerCenter Server runs a session, the DTM performs the
following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled.
Checks query conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and transformation threads to
extract, transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
265
Performance Tuning
Chapter 31
Looking for bottlenecks in mapping design
267
Looking for bottlenecks in mapping design
Once you determine the location of a performance bottleneck, you can eliminate the bottleneck by
following these guidelines:
Eliminate source and target database bottlenecks: Have the database administrator
optimize database performance by optimizing the query, increasing the database network
packet size, or configuring index and key constraints.
Eliminate mapping bottlenecks: Fine tune the pipeline logic and transformation settings and
options in mappings to eliminate mapping bottlenecks.
Eliminate session bottlenecks: You can optimize the session strategy and use performance
details to help tune session configuration.
Eliminate system bottlenecks: Have the system administrator analyze information from
system monitoring tools and improve CPU and network performance.
268
Cache management
The PowerCenter Server creates index and data caches in memory for Aggregator, Rank,
Joiner & Lookup transformations in a mapping.
The PowerCenter Server stores key values in the index cache and output values in the data
cache.
You configure memory parameters for the index and data cache in the transformation or
session properties.
If the PowerCenter Server requires more memory, it stores overflow values in cache files.
When the session completes, the PowerCenter Server releases cache memory, and in most
circumstances, it deletes the cache files.
269
Caching Storage Overview
270
Memory Cache
The PowerCenter Server creates a memory cache based on the size configured in the session
properties.
When you create a mapping, you specify the index and data cache size for each transformation
instance.
When you create a session, you can override the index and data cache size for each
transformation instance in the session properties.
When you configure a session, you calculate the amount of memory the PowerCenter Server
needs to process the session.
Calculate requirements based on factors such as processing overhead and column size for key
and output columns.
By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and
2,000,000 bytes to the data cache for each transformation instance.
If the PowerCenter Server cannot allocate the configured amount of cache memory, it cannot
initialize the session and the session fails.
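The default figures above make the memory arithmetic easy to check. A quick sketch (the default cache sizes come from the text; the number of caching transformations and available memory are made-up inputs, and this is not the server's actual allocation logic):

```python
# Illustrative sketch (not the server's algorithm): estimating the default
# cache memory a session needs and whether it can be allocated.
# Default sizes are those stated in the text; inputs are hypothetical.

DEFAULT_INDEX_CACHE = 1_000_000   # bytes per transformation instance
DEFAULT_DATA_CACHE = 2_000_000    # bytes per transformation instance

def total_cache_needed(num_caching_transformations):
    """Total default cache across all caching transformation instances."""
    return num_caching_transformations * (DEFAULT_INDEX_CACHE + DEFAULT_DATA_CACHE)

def can_initialize(num_caching_transformations, available_memory):
    """The session fails to initialize if the configured cache cannot be allocated."""
    return total_cache_needed(num_caching_transformations) <= available_memory

# e.g. a mapping with an Aggregator, a Rank, and a Lookup: 3 caching instances
needed = total_cache_needed(3)
```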
271
The Golden Rules of Informatica Design
Chapter 32
Why Use The Golden Rules?
To show how the use of best practice benefits a development team
There are many ways to produce the same results – some are more efficient than others.
What might work well in one scenario may be inappropriate in another.
A common standard means all members of a team can follow the logic
Can produce reusable components
Can ensure on a shared platform that the different teams do not impact each other
Prevent outages – for example out of disk space errors
Learn from others’ mistakes
Reduced rework
Complete projects on time and within budget
273
Golden Rule No. 1 – Set Out Standards
Naming standards
Everyone knows what it does
Easy to pick up someone else’s work
Don’t end up with 100 connection objects to the same database
Don’t end up with 100 lookups to the same table
Development Standards
Annotate Objects clearly
Audit trail
Shared Object Policy
Object Stewardship
Single version of the truth
Use your shared folder!
274
Golden Rule No. 2 – Know Your Data!
275
Golden Rule No. 3 – Plan Your Flows
276
Golden Rule No. 4 – Reduce Data ASAP
277
Golden Rule No. 5 – Avoid Big Caches
278
Golden Rule No. 6 – Remember It’s Shared
279
Don'ts - avoid where possible:
SQL override
– Use source qualifier properties where possible
Multiple filters
– Try a router instead!
Aggregator
– Try a SQL override
– Use sorted data where possible
– Create running sums in an expression instead
280
The Good, the Bad & the Ugly
282
The Bad………..
283
The Ugly…….
284
WORLDWIDE HEADQUARTERS: 6400 SHAFER COURT I ROSEMONT, ILLINOIS USA 60018
TEL. 847.384.6100 I FAX 847.384.0500 I WWW.KANBAY.COM
Thank You!