100% found this document useful (1 vote)

16K views

Informatica - The Basics: Trainer: Muhammed Naufal

The document provides an agenda and overview for an Informatica training. It details the various components and tools of Informatica including the Designer, Workflow Manager, sources, targets, transformations, mappings and workflows. It outlines the topics and labs that will be covered over the 9 day training period.

Uploaded by

Sumanth

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

16K views

Informatica - The Basics: Trainer: Muhammed Naufal

Uploaded by

Sumanth

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 254

Informatica – the

basics

Trainer: Muhammed Naufal

© 2004 Hewlett-Packard Development Company, L.P.

The information contained herein is subject to change without notice
Purpose of the training
• The training is designed to have you start using
PowerCenter, not to make you experts
• You’ll know how to:
− Create logical and physical data flow
− Design some simple transformations
− Choose appropriate transformation for your processing
− Schedule and execute jobs
− Examine runtime log files
− Debug transformations and solve data quality issues
− Run Informatica logic from command line
− Manage security in Powercenter

2
Agenda – Day 1
• Introduction
• Software Installation and Configuration
• DW/BI Basics – Database, Data Warehouse, OLAP/BI,
Enterprise Data Warehouse Architecture, ETL & Data
Integration, Fitment - Informatica ETL in Data warehouse
architecture.
• Informatica Architecture
• Brief Description of Informatica Tool & Components
(Repository Administrator, Repository Manager, Workflow
Manager, Workflow Monitor, Designers, Repository
Server, Repository Agent, Informatica Server)
• LAB: Informatica Installation & Configuration

3
Agenda – Day 2 & 3
• ETL Components
− Designer (Source Analyzer, Workflow Designer, Task
Designer, Mapplet Designer & Mapping Designer)
− Workflow Manager (Task Designer, Worklet Designer,
Workflow Designer, Workflow Monitor)
− Repository Manager & Server
• Informatica Administration – Basics
• LAB: Informatica Administration

4
Agenda – Day 3 & 4
• Transformations
• Classifications
− Active/Passive
− Re-usable
− Connected/Un-Connected
• Transformations and Properties
− Source Qualifier , Expressions, Lookup, Router,
SeqGen, Update Strategy, Targets, Joiner, Filter,
Aggregator, Sorter
• LAB: Transformations - Demo

5
Agenda – Day 5
• Mapplets and Mappings
− Mapping Design
− Mapping Development
− Mapping Parameters and variables
− Incorporating Shortcuts
− Using Debugger
− Re-usable transformations & Mapplets
− Importing Sources & Targets
− Versioning – Overview
• LAB: Mapplet & Mapping Designing

6
Agenda – Day 6
• Development of Mappings – Sample mappings
• LAB: Mapping designing for Flat files loading, DB
table loading, etc

7
Agenda – Day 7
• Workflow and Properties
− Workflow Manager
− Tasks (Assignment, Command, Control, Decision, Event, Timer)
− Session (Re-usable or Local)
− Worklets
− Workflow Design
− Session Configuration (Pre/Post Sessions, Parameters &
Variables, Emailing, Source/Target Connections, Memory
Properties, Files & Directories, Log/Error Handling, Override/revert
properties)
− Workflow Execution & Monitoring
− Workflow recovery principles
− Task recovery strategy
− Workflow recovery options
− Command line execution
• LAB: Workflow Designing, Worklet Designing, Task
Designing, Scheduling, Sample Workflows, etc.
8
Agenda – Day 8
• Advanced Topics
− Revisit Informatica Architecture
− Server Configurations (pmServer/pmRepServer),
− Memory Management
− Caches (Lookups, Aggregators, Types of Caches)
− Performance Tuning
− Release Management & Versioning
− Repository Metadata Overview
• LAB: Performance Tuning Demo, Release
Management Demo, Metadata querying, etc.

9
Agenda – Day 9
• GMAS – ETL Process & Development Methodology
− Design/Development Guidelines, Checklists, etc.
• Best Practices & references
− my.Informatica.com, Informatica discussion groups
etc
• Question & Clarification sessions
• LAB Sessions – Sample mappings/workflows

10
Overview, Software setup &
Configuration
Informatica PowerCenter
• PowerCenter is an ETL tool: Extract, Transform,
Load
• A number of Connectivity options (DB-Specific,
ODBC, Flat, XML, other)
• Metadata Repository/Versioning built in
• Integrated scheduler (possibility to use external)
• Number of cool features – XML Imports/Exports,
integrity reports, Save As Picture etc

12
PowerCenter Concepts
• Repository: stores all information about definition
of processes and execution flow
• Repository server: provides information from the
Repository
• Server: executes operations on the data
− Must have access to sources/targets
− Has memory allocated for cache, processing
• Clients (Designer, workflow manager etc):
manage the Repository
• Sources, Targets can be over any network (local,
FTP, ODBC, other)

13
PowerCenter Concepts II
Sources
Client Rep Server Server(s
s )

Repositor Targets
y

14
Software installation
• Copy the folder Pcenter 7.1.1 from Share??
• Install the Client. Do not install the servers or
ODBC
• After the installation you may delete your local
“Pcenter 7.1.1” folder

15
Registering the Repository
• You need to tell your client tools where is the
Repository Sever
• Go to the Repository Manager,
• Choose Repository->Add Repository .
Repository is <Server Name>, user is your Name

16
Registering the Repository II
• Right-click on the newly created repository,
choose Connect
• Fill in additional details. Password, port, etc.

17
Register the Oracle database
Instance
• Add the connection information to tnsnames.ora
on your machine
• Verify the connection using SQLPLUS

18
Define the ODBC connection to
waretl
• This connection will be used to connect from
Client tools directly to a database (e.g. for
imports)
• Go to Control Panel -> Administrative Tools ->
Data Sources (ODBC)
• On the “System DSN” tab click “Add” and choose
“Microsoft ODBC for Oracle”. Fill in details for
your TNumberOL

19
Environment – misc info
• Login to Informatica with your User/Pwd
• Login to Oracle with User/Pwd
• Work in your INFA folder
− First folder you open becomes your working folder.

20
STOP
Wait for the class to finish the
connectivity setup
Sources, Targets
Client tools - overview
• Designer – defines Sources, Targets,
Transformations and their groups (Mapplets
/Mappings)
• Workflow Manager – defines Tasks and
Workflows (+scheduling)
• Workflow Monitor – monitors job execution
• Repository Manager – defines connection to
repository
• Server Administration Console – for Admins only.
• Must have English locale settings on PC
23
Transformation
The Designer tool selector

Tool Selector

Navigator
Workspace

Messages

24
Definition of the data flow
• The data flow structure is defined in the Designer
client tool
• Following concepts are used:
− Sources: structure of the source data
− Targets: structure of the destination data
− Transformation: an operation on the data
− Mapplet: reusable combination of Transformations
− Mapping: complete data flow from Source to Target

25
Mapping: definition of an e2e flow

Target(s)

Source(s)

Transformation(s)
26
Sources - overview
• Sources define structure of the source data (not
where the data is).
• Source + Connection = complete information
• It is only the internal Informatica information, not
e.g. physical structure in the database.
• Sources are created using the Source Analyzer in
the Designer client tool
• You can either create or import Source definitions

27
Sources – how to create a DB
source
• Go to Tools -> Source Analyzer
• Choose “Import from the database”
• Choose the waretl ODBC connection and fill in remaining
details

• If you get “unable to resolve TNS name” error make sure

you have added the waretl server to all tnsnames.ora files
28
Sources – how to create a DB
source
• Choose table SRC_TNumber and press OK

29
Sources – Creating cont.
• A Source is
created
• You can see
its definition on
the Workspace

30
Editing Source properties
• To Edit a Source
doubleclick on it in
the Workspace
• You can manually
add/delete/edit
columns
• You can define
PK/FK/other
relations in INFA
• You can load based
on PK (Unit). See
the “Update
Strategy”
Transformation

31
Sources: Exercise 1
• Create a comma-delimited file
− Define header columns: DATE_SHIPPED, CUST,
PROD, SALES.
− Have at least 10 data rows. Date format: dd-mmm-yyyy
− Use REAL IDs for CUST and PROD that exist in some
hierarchies. CUST and PROD tables available on
waretl
• Define this comma-delimited source in Source
Analyzer. Use name SRC_TNumber_Flat
• Preview data in your file in the Source Analyzer.

32
STOP
Wait for the rest of the class to finish
If you finish early, read the Designer Guide
-> Chapters 2 and 3
Targets – overview
• Targets define the structure of the destination
objects
• Target + Connection = complete information
• It is only the internal Informatica information, not
e.g. physical structure in the database.
• Targets are created using the Warehouse
Designer in the Designer client tool
• You can either create or import Target definitions
• Defining Targets is works the same way as
defining Sources
34
Targets - Columns

35
Targets: Exercise I
• Import table TGT_TNumber_1 to Warehouse
Designer using “Import from Database” function
• Compare your Target with the Flat File Source
SRC_TNumber_Flat
• Modify the your Target to be able to load all data
from your Flat File. Remember to have your
Oracle and INFA definitions synchronized!
− This can be done literally <1 minute :)

36
Targets: Exercise II
• Define a new Target called TGT_TNumber_2
• Define all columns from our Flat File source plus
new ones:
− FACT_ID (PK)
− GEO_ID
• Create the Target in Oracle too! (and don’t forget
grants to dev_rol1..)
− This too can be done <1 minute :>

37
STOP
Wait for the rest of the class to finish
If you finish early, read the Designer Guide
-> Chapter 4
Transformations, Mappings
Transformations - Overview
• Transformations modify the data during
processing. They are internal INFA objects only
• Transformations can be reusable
• Large number of transformations available
− E.g. Lookup, Filter, Aggregator..
− Every transformation has different configuration options
• If none applicable, write your own
− PL/SQL execution straight from INFA
− Can write custom COM transformations
− “Custom” transformation executes C code

40
Transformations – Overview II
• Transformations can be created in:
− Transformation Developer: reusable
− Mapplet/Mapping Designer: processing-specific
• Usually you can Override the default SQL
generated by a Transformation
− E.g. in Lookup transformation…
• Very good online help available – Transformation
Guide. Use it for self-study!

41
Transformations - Concepts
• Transformations work on Ports:
− Input, Output, Variable, other (Lookup, Group, Sort..)

42
Transformations – Concepts II
• Transformations are configured using Properties
Tab. That’s THE job!
− HUGE number of properties in total…

43
Mappings – Overview
• Mapping is a definition of an actual end-to-end
processing (from Source to Target)
• You connect Transformations by Ports – defining
a data flow (SH30->CHAIN_TYPE/CHAIN)
• The data flows internally in INFA on the Server
• You can visualize the data flowing row by row
from a Source to a Target
− Exceptions: Sorters, Aggregators etc
• A number of things can make the Mapping
invalid…

44
Mappings - Creating
• Choose Mapping Designer, Mappings->Create
• Use name m_TNumberLoad

• Now we need to define the data flow in the

Mapping

45
Transformations: Source Qualifier
• Begins the process flow from physical Source
onwards
• Can select only a couple of columns from Source
(will SELECT only those!)
• Can Join multiple sources (PK/FK relationship
inside INFA, not Oracle)
• For relational sources:
− You can override the “where” condition
− Can Sort the input data
− Can Filter the input data

46
Transformations: SQ II
• For relational sources you can completely
Override the SQL statement
• Is created by default when you drag a Source to
the Mapping Designer
• Standard naming convention is SQ
• Some options are available only for Relational
sources (e.g. sorting, distinct etc)
• As usually – more info in the Transformation
Guide
− Self-study on overwriting the default SQL ad Where
conditions
47
SQ: Creating
• Having Mapping Designer open, drag your Flat
File SRC_TNumber_Flat onto Workspace
• SQ is created automatically for you
• SQ is often a non-reusable component

48
SQ: Ports + Workspace operations
• Sources have only Output ports
• SQ has I/O ports
• Experiment: drag around your Source/SQ
• Experiment: select, delete and connect Port
connectors
• Right-click on the Workspace, see the
Arrange/Arrange Iconic options…

49
Workspace
• Objects are named and color-coded
• Iconic view very useful for large flows. Use Zoom!

50
Exercise: SQ
• Delete the automatically created SQ
• Create manually an SQ for your Source, the ports
and links should be connected automatically
• Hint: there’s a Transformations menu available
when in Mapping designer
• Hint: you can drag Ports to create them in the
target Transformation
• Save your work – the Mapping is Invalid. Why?

51
Exercise: a complete Mapping
• Having Mapping Designer open, drag the
TGT_TNumber_1 Target onto Workspace
• Connect appropriate Ports from SQ to Target

• Save - the Mapping is now Valid! :)

• Our Mapping is actually a SQL-Loader equivalent

52
STOP
Wait for the rest of the class to finish
Connections, Sessions
Execution of a Mapping
• To execute a Mapping you need to specify
WHERE it runs
− Remember, Sources/Targets define only the structure
• A single Mapping can be executed over different
connections
• An executable instance of a Mapping is called a
Session
• A series of Sessions is a Workflow
• Workflows are defined in Workflow Designer

55
Workflow Designer

56
Workflows
• Workflows are series of Tasks linked together
• Workflows can be executed on demand or
scheduled
• There are multiple workflow variables defined for
every server (Server->Server Configuration)
− E.g. $PMSessionLogDir, $PMSourceFileDir etc.
• The parameters are used to define physical
locations for files, logs etc
• They are relative to Server, not your local PC!
• Directories must be accessible from Server
57
Connections
• Connections define WHERE to connect for a
number of reasons (Sources, Targets, Lookups..)
• There are many Connections types available (any
relational, local/flat file, FTP..)
• Connections are defined in the Workflow
Manager
• Connections have their own permissions!
(owner/group/others)

58
Connections: Exercise
• Define a Shared connection to your own Oracle
schema on <Schema_Name>
− In the Workflow Manager click on Create/Edit
Relational Connection:

− Choose “New” Connection of type Oracle and fill in the

necessary info

59
Tasks
• Tasks are definitions of actual actions executed
by Informatica Server
− Sessions (instances of Mappings)
− Email
− Command
− More Tasks available within a Workflow (e.g. Event
Raise, Event Wait, Timer etc.)
• Tasks have huge number of attributes that define
WHERE and HOW the task is executed
• Remember, online manual is your friend :)

60
Sessions : Creating
• Set the Workspace to Task Developer: Go to
Tools->Task Developer
• Go to Tasks->Create, choose Session as a Task
Type
• Create a task called tskTNumber_Load_1

61
Sessions : Creating II
• Choose your Mapping

62
Sessions: important parameters
• Access the Task properties by double-clicking on
it
• General tab
− Name of your Task
• Properties tab:
− Session log file directory (default $PMSessionLogDir\)
and file name
− $Source and $Target variables: for Sources/Targets,
Lookups etc. Use Connections here (also variables :) )
− Treat Source Rows As: defines how the data is loaded.
• Read about the “Update Strategy” Transformation to
understand how to use this property. This is very useful
property, e.g. able to substitute concept of a Unit
63
Sessions : important parameters II
• Config Object tab
− Constraint based load ordering
− Cache LOOKUP() function
• Multiple error handling options
• Error Log File/Directory ($PMBadFileDir)
• All options in the Config Object tab are based on
the Session Configuration (a set of predefined
options). To predefine go to Tasks>Session
Configuration

64
Sessions : important parameters III
• Mapping tab – defines the WHERE –
Connections, Logs…
• For every Source and Target define type of
connection (File Writer, Relational..) and details

65
Sessions: relational connections
• For relational Sources and Targets you can/must
define owner (schema)
• Click on the Source/Target on the Connection tab
• The Attribute is :
− “Owner Name” for Sources
− “Table Name Prefix” for Targets
• If you use a shared Connection and access
private objects (without public synonyms) you
MUST populate this attribute

66
Sessions : important parameters IV
• Components tab
− Pre, post session commands
− Email settings

67
Sessions : Summary
• Lot of options – again, read the online manual
• Most importantly, you define all “Where”s:
$Source, $Target, Connections for all Sources,
Targets, Lookups etc
• Here’s also definition of locations of flat files
• Define error handling/log locations for every Task
• Use Session Configs
• Majority of the Session options can be overwritten
in Workflows :)
− This allows e.g. to execute the same Session over
different Connections!

68
Sessions: Exercise
• For your tskTNumber_Load_1 session define:
− $Source, $Target variables
− Source and Target locations
− Remember, your Source is a flat file and Target is
Oracle.
− Source filetype must be set to “Direct”. This means that
the file contains actual data. “Indirect”= ? :>
− Enable the “Truncate target table” option” for your
relational Target. This purges the target table before
every load
− During all Exercises use only Normal load mode (NOT
Bulk)
− $Source variable is a tricky one :>
69
STOP
Wait for the class to finish
Workflow execution
Workflows: Creating
• Choose Tools-> Workflow Designer
• Choose Workflows->Create
• Create a Workflow wrkTNumber_Load_1

72
Workflows: Properties
• Available when creating a Workflow or using
Workflows->Edit
• General tab:
− Name of the workflow
− Server where the workflow will run
• Also avilable from Server->Assign Server

73
Workflows: Properties II
• Properties tab:
− Parameter filename: holds list of Parameters used in
Mappings. See online manual
− Workflow log (different than Session logs!)

74
Workflows: Properties III
• Scheduler tab:
− Allows to define complete reusable calendars
− Explore on your own! First read the online manual, then
schedule some jobs and see what happens :)

75
Workflows: Properties IV
• Variables tab
− Variables are also used during Mapping execution
− Quick task: Find what is the difference between a
Parameter and a Variable
• Events tab
− Add user-defined Events here
− These events are used later on to Raise or Wait for a
signal (Event)

76
Workflow: Adding tasks
• With your Workflow open drag the
tskTNumber_Load_1 Session onto the
Workspace
• Go to Tasks and choose Link Tasks
• Link the Start Task with tskTNumber_Load_1
− The Start Task does not have any interesting properties

77
Workflow: editing Tasks
• You can edit Task properties in a Workflow the
same way as you do it for a single Task
− Editing the task properties in Workflow overwrites the
default Task properties
− Overwrite only to change the default Task behavior
− Use system variables if a Task will be executed e.g.
every time on a different instance

78
Workflows: Running
• You can run a Workflow automatically
(scheduled) or on-demand
− We’ll run on-demand only in this course
• Before you run a Workflow, run the Workflow
Monitor and connect to Server first!

79
Workflows: Running II
• In Workflow Designer
right-click on the
Workflow
wrkTNumber_Load_1
and choose “Start
Workflow”
• Go to Workflow Monitor
to monitor your Workflow

80
Workflows: Running II
• Workflow Monitor displays Workflow status by
Repository/Server/Workflow/Session
• Two views available: GANTT view and TASK view

81
Workflows: Logs
• You can get the Workflow/Session log by right-
clicking on Workflow/Session and choosing the
log
− Remember, Session log is different than Workflow log!

82
Workflows: Logs II
• Most interesting information is in the Session logs
(e.g. Oracle errors etc)

• Exercise:
− Where do you define location of Where are the
Session/Workflow logs?
− Manually locate and open the logs for your Workflow
run
− Find out why your Workflow has failed :>

83
Logs: Session Log missing for a
failed Session
• Why would you get an error like that?:

84
Workflows: Restarting
• Restart your Workflow using “Restart Workflow
from Task”
• (More about job restarting: Workflow
Administration Guide -> Chapter 14 ->Working
with Tasks and Workflows )
• Debug until your Workflow finishes OK

85
Workflow: verifying
• Check that the data is in your Target table. The
table will be empty initially – why?
• Why there’s an Oracle error in the Session Log
about Truncating the Target table?
• Hint: when you modify a Mapping that is already
used in a Session, you need to refresh and save
the Session.
• Warning! PowerCenter has problems refreshing
objects between tools! Use Task->Edit or File-
>”Close All Tools”

86
E2E flow: Exercise
• Create a new Mapping m_TNumberLoad_2 that
will move all data from TGT_TNumber_1 to
TGT_TNumber_2
• Create a new session tskTNumber_Load_2 for
this Mapping. Define connection information to
waretl, check “Truncate target” option and use
Normal load mode (NOT Bulk)
• Create a new Workflow wrkTNumber_Load_2
that will run tskTNumber_Load_2
• Run your Workflow, make sure it finishes OK
• Check that the data is in TGT_TNumber_2 table.
87
STOP
Wait for the class to finish
Summary: what you have learned so
far
• Now you can:
• Define structures of Sources and Targets
• Define where the Sources and Targets are
• Create a simple Workflow
• Run a Workflow
• Debug the Workflow when it fails

• You have all basic skills to learn further by

yourself!
Transformations

Expressions, Sequences
Transformations – what can you do?
• We’ll be going through a number of
Transformations
• Only some (important) properties will be
mentioned
• Read the Transformation Guide to learn more

91
Transformations: Expression (EXP)
• Expression modifies ports’ values
• This is a pure Server Transformation
• Remember Ports? Input, Output, I/O, Variables
− Convention: name ports IN_ and OUT_
• The only Property of EXP is Tracing Level

92
Expression Editor
• Available in almost every Transformation
• Allows to easily access Ports and Functions

93
EXP: Example
• Let’s create an non-reusable Expression that will
change customer “Bolek” into customer “Lolek”
• You can drag ports to copy them
• IIF function – similar to DECODE

94
Transformations: Sequence
Generator
• SEQ can generate a sequence of numbers
• Very similar to Oracle sequence
• Can generate batches of sequences
• Each target is populated from a batch of cached
values

95
SEQ: Ports
• Nextval
• Currval = Nextval + IncrementBy. No clue how
this is useful

96
SEQ: Properties
• Start Value
• Increment By The difference between two consecutive
values from the NEXTVAL port.
• End Value: The maximum value the PowerCenter Server
generates.
• Current Value: The current value of the sequence.
• Cycle
• Number of Cached Values: The number of sequential
values cached at a time. Use when multiple sessions use
the same reusable SEQ at the same time to ensure each
session receives unique values. 0 = caching disabled
• Reset: Rewind to Start Value every time a Session runs
Disabled for reusable SEQs

97
SEQ: Batch processing
• Guess what happens here?: Start = 1, Increment
By = 2, Cache = 1000?

98
Transformations: Exercise
• Create a Target called TGT_TNumber_Tmp that
will hold FACT_ID, SALES and SALESx2
columns
• Create appropriate table in Oracle, (remember:
grants, synonyms if needed..)
• Add this Target to your Mapping
m_TNumberLoad_2 (so, you should have two
Targets). Save. The Mapping is valid even though
the Target is not connected– why?
• In the Workflow define Connection for this Target
(remember, use Normal load mode). Choose
“Truncate Target” option
99
EXP: Exercise
• Create a reusable expression called EXP_X2 that
will multiply an integer number by two.
• The input number must be accessible after the
transformation
• Use this EXP to multiply the SALES field when
passing it to TGT_TNumber_Tmp. The original
SALES field goes to SALES and the multiplied
one to SALESx2

100
SEQ: Exercise
• Create a reusable SEQ called
SEQ_TNumber_FACT_ID. Start = 1, Increment
By = 6, Cache = 2000
• Populate the FACT_ID field in both targets to
have the same value from the sequence
(parent/detail).

101
Exercise: verify
• Run your modified Mapping m_TNumberLoad_2
• Remember to refresh your Task:
− Try Task Editor -> Edit
− If you get an Error, go to Repository -> “Close All Tools”
• You may need to modify the Workflow before you
run it - what information needs to be added?
• Verify that the data is in both TGT_TNumber_2
and TGT_TNumber_Tmp, the SALES_x2 column
equals SALES*2 and the same FACT_ID is used
for the same row in _2 and _TMP table.
• Rerun the workflow a couple of times. What
happens with FACT_ID field on every run?
102
Solutions:
• Wrong! SEQ initialized for every target. _TMP
and _2 tables will have different FACT_IDs

103
Solutions:
• Wrong! SEQ initialized for every target. _TMP
and _2 tables will have different FACT_IDs

104
The correct solution
• SEQ initialized once (one ID for every row)

105
STOP
Wait for the class to finish
The Debugger
The Debugger
• Debugger allows you to see every row that
passes through a Mapping
− Every transformation can be debugged in your Mapping
− Nice feature available: breakpoint
− The Debugger is available from the Mapping Designer
• To start the debugger, open your Mapping in
Mapping Designer and go to Mappings ->
Debugger -> Start Debugger
• We’ll be debugging the m_TNumberLoad_2
Mapping
• Warning: the Debugger uses a lot of server
resources!
108
The Debugger: Setting up
• You need a Session definition that holds a
Mapping to run Debugger
− Best way is to use an existing Session
− You may create the temporary Debug session. This
limits debug capabilities

109
The Debugger: Setting up II
• Select the Session you want to use
• All properties of this session (Connections etc)
will be used in the debug session

110
The Debugger: Setting up III
• Select the Targets you want to debug. This
defines flow in the Debugger
• You have an option to discard the loaded data

111
The Debugger: running
• The Mapping Designer is now in debug mode

Flow Monitor

Transformation
you’re debugging Target data

112
The Debugger: Running II
• Select EXP_x2 as the
Transformation you’ll
be monitoring

• Remember, you run the Debug session for one

Target!
• Most optimal operation with the Debuger:
− Choose your mapping on the Workspace
− Choose Debugger -> Step to Instance. Goes directly to
your Transformation
− Or, Choose Debugger -> Next instance. This is actually
step-by!

113
The Debugger: View on the data
• You can view individual Transformations in the
Instance window. The data is shown with regard
to given data row:

• “Continue” = run the Session (if there are no

Breakpoints the Session with run until OK/Failure
• When the whole Source is read, the Debugger
finishes
114
The Debugger: Breakpoints
• You can add breakpoints to have the Debugger
stop on given condition
− Mappings ->
Debugger ->
Edit Breakpoints

115
Breakpoints: Setting up
• Global breakpoints vs. Instance breakpoints
• Error breakpoints vs. Data breakpoints
• A number of conditions available..

116
The Debugger: Exercise
• Experiment on your own :)
• Check the difference between Step Into and Go
To
• Setup breakpoints
• You can actually change the data in Debugger
flow! Check out this feature, really useful.
− A number of restrictions apply: usually can modify only
output ports and group conditions

117
STOP
Wait for the class to finish
Transformations

Lookups
Transformations: Lookups (LKP)
• Lookups find values
• Connected and Unconnected
• Cached and Uncached
− PowerCenter has a number of ways to cache values
− Cache can be static or dynamic, persistent or not
− Lookup caching is a very large subject. To learn more
about caching read the Transformation Guide

• Question: where is the lookup cache file created?

120
Lookups: Connected vs.
Unconnected
• Connected lookups receive data from the flow while
Unconnected are “inline” functions
• Unconnected lookup has one port of type “R” = Return

Connected Unconnected
Default values Supported Not supported
(possible using NVL)
Caching Any type Static only
#returned columns Multiple Only one

121
Lookups: Creating
• First, import the definition on Lookup table as
either Source or Target
− REF_CTRL is owned by user infa_dev
• While in the Transformation Designer choose
Transformations -> Create and create lkpRefCtrl

122
Lookups: Creating II
• Choose the Table for your Lookup
− Only one table allowed

123
Lookups: Creating III
• Delete the columns not used in Lookup
− If this is a reusable lookup, be careful when deleting..

124
Lookups: defining lookup ports and
condition
• There are two important types of ports: Input and
Lookup
− Combinations of I/O and O/L ports allowed
• Input ports define comparison columns in the
INFA data flow
• Lookup ports are columns in the Lookup table
• For unconnected Lookup there one “R” port – the
return column

125
Lookups: Creating join ports
• Create two port groups:
− IN_ ports for data in the stream (“I”nput)
− Comparison in the physical table (“L”ookup)
− CTRL_PERD will be the “R”eturn column

126
Lookups: Creating join conditions
• Match the “I”input ports with “L”ookup ports on
the Condition tab

127
Lookup: important Properties
• Lookup Sql Override: you can actually override
the Join. More on Overriding in the Designer
Guide
• Lookup table name: Join table
• Lookup caching enabled: disable for our Exercise
• Lookup policy on multiple match: first or last
• Connection Information: very important! Use
$Source, $Target or other Connection information

128
Lookups: using Unconnected
• We’ve created an Unconnected lookup
− How can you tell?
• Use Unconnected lookup in an Expression as “O” port
using syntax: :LKP.lookup_name(parameters)
• For example: :LKP.lkpRefCtrl('BS',31,IN_CUST_STRCT,1)
− Parameters can be of course port names
− You must put the Unconnected lookup in the Mapping (floating)

129
Lookups: using Connected
• Connected Lookups don’t have “R”eturn port
(they use “O”utput ports)
• Data for the lookup comes from the pipeline

130
Lookups: Exercise
• Modify Mapping m_TNumberLoad_2 to find a
ISO_CNTRY_NUM for every customer.
• Column GEO_ID should be populated with
ISO_CNTRY_NUM.
• Use default values if CUST_ID not found
• Choose Connected or Unconnected Lookup
• Run your modified Workflow
• Verify that when your Workflow finishes OK all
rows have GEO_ID populated

131
STOP
Wait for the class to finish
Transformations

Stored Procedures
Transformations: Stored Procedure
• Used to execute an external (database)
procedure
• Naming convention is name of the Procedure
• Huge number of different configurations possible
− Connected, Unconnected (same as Lookups)
− Pre- and post-session/load
− Returned parameters
• We’ll do an example of a simple stored procedure
• Possible applications: e.g. Oracle Partitioning for
promotions in Informatica (post-session)
− PowerCenter has hard time running inlines!
134
Stored Procedures: Creating
• The easiest way is to Import the procedure from
DB: Transformation -> Import SP

135
Stored Procedures: Importing
• Importing creates required Ports for you..

• .. and uses correct function name :)

136
Stored Procedures: watchouts
• A number of watchouts must be taken into
account when using SP Transformation
− Pre/post SPs require unconnected setup
− When more than one value is returned use Mapping
Variables
− Datatypes of return values must match Ports
• Transformation Guide is your friend

137
Stored Procedure
• Create a procedure in Oracle
• Import the procedure in Designer
• Use this procedure as Connected transformation.
• Run your Workflow
• What does this function do? :>

138
STOP
Wait for the class to finish
Transformations Overview

Aggregator, Filter, Router, Joiner,

Update Strategy, Transaction Control,
Sorter, Variables
Overviews
• Number of interesting Transformations and
techniques – outside of the scope of this training
• Overview gives you an idea that a possibility
exists to do something
• If you want to learn more – self study: read the
Designer Guide and the Transformation Guide

141
Overview: Aggregation (AGG)
• Similar to Oracle’s group-by, functions available
• Active transformation – changes #rows
− A number of restrictions apply
• A number of caching mechanisms available

142
Overview: Filter (FL)
• Allows to reject rows that don’t meet specified criteria.
Rows filtered are not in the reject file.
− Active transformation
• Use as early in the flow as possible

143
Overview: Router (RTR)
• Used to sent data to different Targets
− Active transformation
− Don’t split processing using Router and then join back!
− Often used with Update Strategy preceding
• Typical usage:

144
Router: configuring
• Ports are only Input (in reality I/O)

• Define condition groups (a row is tested against

all groups)

145
Router: using in Mapping
• Router receives the whole stream and sends it
different way depending on conditions

146
Overview: Joiner (JNR)
• Joins pipelines on master/detail basis
− Special Port available that marks one of the pipeline sources as
Master
− Joiner reads ALL (including duplicate) rows for Master and then
looks up the detail rows.
• Outer joins available (including full outer)
• Caching mechanisms available
− Sorted input speeds up processing
• Restrictions:
− Can’t use is either input pipeline contains an Update Strategy
transformation
− Can’t use if one connects a Sequence Generator transformation
directly before the Joiner transformation
− Allows to join two pipelines. If more joins needed, use consecutive
JNRs

147
Joiner: Example
• Joining QDF with CUST_ASSOC_DNORM
before an aggregation

148
Joiner: Ports
• One pipeline is master, the other one is detail.
“M” port denotes which one is which

149
Joiner: Properties
• Join Condition tab defines the join ;)

• Properties tab lets you define join type (amongst

other properties)

150
Overview: Update Strategy (UPD)
• This transformation lets you mark a row as
Update, Insert, Delete or Reject
• You do it by a conditional expression in the
Properties tab of UPD
− E.g. :IIF( ( SALES_DATE > TODAY), DD_REJECT,
DD_UPDATE )
• UPD can replace the Unit concept from S1..

151
Update Strategy: setting up
• To set up, create I/O pass-through ports
• Then enter the conditional expression into
Properties tab of UPD. Use variables:
− Insert DD_INSERT 0
− Update DD_UPDATE 1
− Delete DD_DELETE 2
− Reject DD_REJECT 3
• You must set properly Session properties. Read
more in the Transformation Guide
− For example you must select “Treat Source Rows As”
Session Property option to “Data Driven”

152
Overview: Transaction Control (TC)
• This transformation defines commit points in the
pipeline
• To setup, populate the conditional clause in
Properties
− For example IIF(value = 1, TC_COMMIT_BEFORE,
TC_CONTINUE_TRANSACTION)

153
TC: Defining commit points
• Use following system variables:
− TC_CONTINUE_TRANSACTION.
− TC_COMMIT_BEFORE
− TC_COMMIT_AFTER
− TC_ROLLBACK_BEFORE
− TC_ROLLBACK_AFTER
• There’s “transformation scope” in majority of
transformations
• Transaction control must be effective for every
source, otherwise the mapping is invalid
• Read more in the Transformation Guide
154
Overview: Sorter (SRT)
• Allows to sort on multiple keys. Has only I/O Ports
− Option to output only distinct data (all Ports become
Keys)
− Option for case sensitive sorts
− Option to treat Null as High/Low
− Caching mechanism available
• Sorter speeds up some Transformations (AGG,
JNR..)

155
Variable ports
• You can use Variable ports to:
− Store interim results of complex transformations
− Capture multiple return values from Stored Procedures
− Store values from previous rows
• Remember, Ports are evaluated in order of
dependency:
− Input
− Variable
− Output

156
Variables: storing values from
previous rows
• Useful when e.g. running inlines for distinct
values
• You need to create two variable Ports to store
values from previous rows (why? :> )

157
Transformations: Exercise
• Modify the m_TNumberLoad_2 mapping to use JNR
transformation for geography lookup (instead of LKP)
• Count the number of distinct customers in the pipeline
(modify the Target table to have CNT column). Use
Variables
• Run the modified workflow
• Verify that GEO_ID is derived (from
CUST.ISO_CNTRU_NUM) and loaded into the Target
table
• Once you finish, modify the Transformation to complete
the task without Variables
− Change back your mapping to use Lookup instead of Joiner

158
Transformations: Exercise II
• Build a Loader mapping just with the following objects:
Source, SQ, UPD, Target
• Add a PK on your Source (e.g. TRANX_ID)
• Add TRANX_ID to your flat file
• Use the UPD strategy to insert new rows and update
already existing rows (based on TRANX_ID field)
− Remember, set correct Session parameters
• Load your flat file
• Verify: add and modify some rows in the flat file. Load
again, check that the Target is updated as needed (rows
are added/modified/deleted)

159
STOP
Wait for the class to finish
Mapplets, Worklets
Mapplets: Overview
• Mapplets are reusable sets of Transformations
put together into a logical unit
• Can contain Sources and some Transformations.
• Can’t contain Targets or other Mapplets
• Special Transformations available
− Input
− Output

162
Mapplets in Mapping
• Use Mapplet Input and Output ports
• Connect at least one I and one O port

163
Worklets: Overview
• Worklets are sets of Tasks connected into a
logical unit
− Can be nested
− Can be reusable
− Can pass on persistent variables
• Runtime restrictions apply:
− You cannot run two instances of the same Worklet
concurrently in the same workflow.
− You cannot run two instances of the same Worklet
concurrently across two different workflows.

164
Mapplets: Exercise
• Create a Mapplet mplt_EMP_NAME that:
− Has Input ports of EMPNO and MGR
− Looks up the ENAME and DEPTNO fields from EMP table for the
input EMPNO
− Filters out all rows that have the MGR<=0
• The Mapplet should have three Output ports:
− EMPNO, MGR and DEPTNO concatenated with ENAME as
DEPT_NAME
• Create a table called EMP_DEPT which has
EMPNO,MGR, DEPT_NAME as fields (Take the structure
of the fields from EMP table)
• Create a mapping map_EMP_NAME which has EMP1 as
Source and EMP_DEPT as target and use the above
mapplet inside this mapping.
• Run the Mapping, verify results
165
STOP
Wait for the class to finish
Advanced Scheduling
Advanced Scheduling
• When you build Workflows in the Workflow
Designer you can use a number of non-reusable
components
• Different control techniques available
• To use the non-reusable components go to Tasks
-> Create when in Workflow Designer

168
Workflows: Links
• Links have conditions that set them to True or
False
− Double-click on a Link to get its properties
− If the Link condition evaluated to True (default) the Link
executes its target Task
• Use Expression Editor to modify Link conditions
− Access properties of Tasks, e.g. Start Time
− You can access the Workflow Variables here!
• Number of predefined Workflow Variables are available
• You can create Variables persistent between Workflow runs!
• Workflow Variables must be predefined for the Workflow
(Workflow -> Edit and then the Variables tab)

• Task properties:
169
Expression Editor
• Links (and some Tasks) can use the Expression
Editor for Workflow events

170
Tasks: Command
• “Command” Task executes any script on the
Informatica Server
• It can be reusable
• Property “Run if previous completed” controls
execution flow when more than one script is
defined

171
Tasks: Decision
• The Decision task sets a Workflow variable
(“$Decision_task_name.Condition”)
− On the Properties tab edit the “Decision Name”
parameter. You can use Expression Editor here
• You can use this variable later in a link
− Of course you can evaluate the Decision condition
directly in a Link, but use Decision for Workflow clarity

172
Workflow: Variables
• Variables are integers available from Workflow
• They need to be defined upfront in Workflow
definition
− Go to Workflows - > Edit
• Variables can be persistent between Workflow
runs

173
Workflow: Variables II
• Persistent variables are saved in the repository
• You can check the value of a Variable in the
Workflow log
− Not in the session log!

174
Tasks: Assignment
• The Assignment task sets the Workflow variables
• You can use the Expression Editor
• One Assignment task can set multiple variables

175
Tasks: Email
• Use to … send emails! ;) Can be reusable
• The waretl server is not set up to send emails

• You can use all Workflow Variables and Session

Properties
− Including $PMSuccessEmailUser or
$PMFailureEmailUser server variables
176
Emails: Advanced
• Every Session
can send an
email on success
or failure
− Additional Email
options available!
Go to Edit Email -
> Email Subject
and click on the
small arrow

177
Emails: Additional options
• Additional options are available for Email body
when using from within a Session

178
Events: Overview
• Events are a way of sending a signal from one
Task to another
• Two types of Events supported:
− User-defined (define in Workfow->Events)
− Filewatcher event
• Two Event Tasks available:
− Event Raise
− Event Wait
• Actually you can use Links to do the same..

179
Events: User-defined
• Create user-defined Events in Workflow
properties
− Workflows -> Edit
• Use them later in Event Raise/Wait Tasks

180
Events: Example
• Sending Events = Links (in a way..)

181
Tasks: Event Raise
• Raises a user-defined Event (sends a signal)
− Remember, the Even must be predefined for a
Workflow
• Only one Event can be Raised

182
Tasks: Event Wait
• Waits for an Event to be
raised
− User-defined Event (signal),
or
− Filewatcher event
• Properties available
− Enable Past Events!

183
Event Wait: filewatcher
• The filewatcher event is designed to wait for a
marker file (e.g. *.end)
− There’s an option to delete the file immediately after the
filewatcher kicks in
− No wildcards are allowed
• Discussion: how to emulate S1 filewatcher, waiting
and loading for multiple files?

184
Tasks: Control
• The Control Task fails unconditionally
• The abort command can be send to different
levels
• Read more in the Workflow Administration Guide

185
Tasks: Timer
• The Timer Task executes
− On a date (absolute time)
− After relative waiting time

186
Tasks: Exercise
• Create a new Workflow that will use two
Sessions: ..Load_1 and ..Load_2
• Run the ..Load_2 Session after every 3 runs of
..Load_1
− Don’t cycle the Sessions, rerun the whole Workflow
− How will you verify success?
• Obligatory tasks:
− Decision
− Event Raise/Wait

187
STOP
Wait for the class to finish
Command line execution
pmcmd - overview
• A command line tool to execute commands
directly on the Informatica server
• Useful e.g. to:
− Schedule Powercenter tasks using an external
scheduler
− Get status of the server and its tasks
• Big number of commands available – see online
manual:
− Workflow Administration Guide -> Chapter 23: using
pmcmd

pmcmd getserverdetails -s
waretl.emea.cpqcorp.net:4001 -u bartek -p
mypassword
191
pmrep - overview
• Updates session-related parameters and security
information on the Repository
− E.g. create new user, create the Connection,
import/export objects..
• Very useful for e.g. bulk operations (create 10
users)
• Usage: pmrep command_name [-option1]
argument_1 [-option2] argument_2...

192
pmrep - usage
• The first pmrep command must be “connect”
Pmrep connect -r repository_name -n
repository_username <-x repository_password | -
X repository_password_environment_variable> -h
repserver_host_name -o repserver_port_number

• The last command must be “exit”

pmrep exit
• Full list of commands in the Repository Guide ->
Chapter 16: using pmrep
193
pmcmd: exercise
• Run the workflow wrkTNumber_Load_2 from command line:
pmcmd startworkflow
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
[<-startfrom> taskInstancePath]
[-recovery]
[-paramfile paramfile]
[<-localparamfile|-lpf> localparamfile]
[-wait|-nowait]
workflow

194
STOP
Wait for the class to finish
Parameters and Variables
Parameters and Variables
• Parameters and Variables are used to make
Mappings/Workflows/Sessions more flexible
− Example of a Session variable: name of a file to load
− We had an exercise for Workflow variables already
− Don’t confuse with port variables!
• Parameters don’t change but Variables change
between Session runs. The changes are
persistent
• Both Variables and Parameters can be defined in
the Parameter File
− Except port variables
− Variables can initialize without being defined upfront
197
Mapping Parameters and Variables
• Used inside a Mapping/Mapplet
− E.g. to load data incrementally (a week at a time)
− Can’t mix Mapplet and Mapping parameters
• Described in the Designer Guide -> Chapter 8
• Use them inside transformations in regular
expressions
− E.g. in SQs (WHERE), Filters, Routers…

198
Mapping Parameters and Variables
II
• To use a Mapping parameter or variable:
− Declare them in Mappings -> Declare Parameters and
Variables
− If required define parameters and variables in the
Parameter file (discussed later)
− For variables set the Aggregation type to define
partitioning handling
− Change the values of variables using special functions
• SetVariable, SetMaxVariable…

199
Session Parameters
• Very useful! Can be used to have the same Session work
on different files/connections
• Must be defined in the Parameter File
• Conventions:
Parameter Type Naming Convention
Database Connection $DBConnectionName
Source File $InputFileName
Target File $OutputFileName
Lookup File $LookupFileName
Reject File $BadFileName

200
Session Parameters - usage
• You can replace majority of the Session attributes
with Parameters
• Described in detail in the Workflow Administration
Guide -> Chapter 18: Session Parameters

201
Parameter File
• Parameter file is used to define values for:
− Workflow variables
− Worklet variables
− Session parameters
− Mapping/Mapplet parameters and variables
• The variable values in the file take precedence
over the values saved in the Repository
− This means that if a Variable is defined in a Parameter
File, the change of its value in Mapping will have no
effect when the Session runs again!
• Described in detail in the Workflow Administration
Guide -> Chapter 19: Parameter Files
202
Parameter Files II
• Parameter Files can be put on the Informatica
Server machine or on a local machine
− Local files only for pmcmd use
• Parameter files can be defined in two places:
− Session Properties for Session/Mapping parameters
− Workflow properties
− Don’t know why there are two places…
• A single parameter file can have sections to hold
ALL parameters and variables

203
Parameter File Format
• You define headers for different sections of your
parameter file:
− Workflow variables: [folder name.WF:workflow name]
− Worklet variables: [folder name.WF:workflow name.WT:worklet
name]
− Worklet variables in nested worklets: [folder name.WF:workflow
name.WT:worklet name.WT:worklet name...]
− Session parameters, plus mapping parameters and variables:
[folder name.WF:workflow name.ST:session name]
− or
− [folder name.session name]
− or
− [session name]
• Values are defined as:
− name=value

204
Parameter File Example
[folder_Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
$DBConnection_target=sales

[folder_Test.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales_test_file.txt
$DBConnection_target=sales_test_conn

205
Exercise: Mapping Variables &
Parameters
• Modify the map_EMP_NAME Mapping to load
only one Employee specified by a Parameter
• Remember to define the Parameter in the Parameter File

• Modify the Mapping to store the SAL as

SAL+(SAL*30/100) (increase salary by 30%) -
Use Mapping Variables
• Test!

206
Exercise: Session Parameters
• Modify the S_EMP_NAME Mapping to use a
Parameter for the file name to be loaded
− Remember to define the Parameter in the Parameter
File

• How would you load e.g. 10 files one after

another using the same Session?

207
STOP
Wait for the class to finish
Security overview
Security in PowerCenter
• Security topics are described in Repository Guide
-> Chapter 5; Repository Security
• PowerCenter manages privileges internally
− Repository privileges (individual or group)
− Folder permissions
− Connection privileges
• Authentication can be either internal or using
LDAP
• Security is managed through the Repository
Manager
− You need to have appropriate privileges to manage
security! :)
210
Users, Groups
• Individual (User) and Group privileges are
combined to get the overall view on someone’s
permissions
• The group Administrators has all possible privs

211
Repository privileges
• Repository privileges are granted to Groups and Users
• The Repository privileges work on Objects!
• Detailed description of Repository privileges is in the
Repository Guide -> Chapter 5 -> Repository Privileges

212
Object permissions
• Object
permissions
apply in
conjunction
with Repository
privileges
− Folders
− Connections
− Other..

213
Performance tuning (basics)
Performance tuning
• Workflow performance depends on a number of
things:
− Mapping performance
• Database performance
• Lookups
• Complex transformations (aggregator, sorters)
− Source/Target performance
− Power of the Informatica Server/Repository Server
machines
• Good overview in the Workflow Administration
Guide -> Chapter 25: Performance Tuning

215
Performance: What can we tune
• Eliminate source and target database bottlenecks
− Database/remote system throughout put
− Lookup logic
• Eliminate mapping bottlenecks
− Transformation logic
• Eliminate session bottlenecks
− Performance-relates Session parameters
− Increase #partitions
• Eliminate system bottlenecks
− Increase #CPUs, memory
• Evaluate bottlenecks in this order!

216
Bottlenecks: Identifying
• Target bottlenecks
− If Target is relational or remote location, change to local
Flat file and compare run time
• Source bottlenecks - Usually only if relational or
remote source.
− Use Filter directly after the SQ
− Run the SQ query manually and direct output to
/dev/null
• LAN speed can affect the performance
dramatically for remote Sources/Targets
− Query remotely/locally to identify LAN problems

217
Bottlenecks: Identifying Mapping
• Mapping bottlenecks
− Put Filters just before Targets: if the run time about the
same you may have a Mapping bottleneck
− Some transformations are obvious candidates
• Lookups
− Multiple Transformation Errors slow down
transformations
− Use Performance Details file

218
Performance Detail File
• Enable in Session Properties

• The Performance Detail File has very useful

information about every single transformation
− File is created in the SessionLog directory
− Big number of performance statistics available
− Workflow Administration Guide -> Chapter 14
Monitoring Workflows -> Creating and Viewing
Performance Details
219
Performance File - example

Transformation Name Counter Name Counter Value

LKP_CUST_GENERIC Lookup_inputrows 107295
Lookup_outputrows 214590
Lookup_rowsinlookupcache 1239356

220
Bottlenecks: Identifying Session
• Usually related to insufficient cache or buffer
sizes
• Use the Performance File
− Any value other than zero in the readfromdisk and
writetodisk counters for Aggregator, Joiner, or Rank
transformations indicate a session bottleneck.

221
Allocating Buffer Memory
By default, a session has enough buffer blocks for 83 sources
and targets. If you run a session that has more than 83 sources
and targets, you can increase the number of available memory
blocks by adjusting the following session parameters:
♦ DTM Buffer Size. Increase the DTM buffer size found in the
Performance settings of the Properties tab. The default setting is
12,000,000 bytes.
♦ Default Buffer Block Size. Decrease the buffer block size found
in the Advanced settings of the Config Object tab. The default
setting is 64,000 bytes.
To configure these settings, first determine the number of
memory blocks the PowerCenter Server requires to initialize the
session. Then, based on default settings, you can calculate the
buffer size and/or the buffer block size to create the required
number of session blocks. 222
Example - Buffer Size/Buffer Block
For example, you create a session that contains a single partition using a mapping
that contains 50 sources and 50 targets.
1. You determine that the session requires 200 memory blocks:
[(total number of sources + total number of targets)* 2] = (session buffer
blocks)
100 * 2 = 200
2. Next, based on default settings, you determine that you can change the DTM
Buffer Size to 15,000,000, or you can change the Default Buffer Block Size to
54,000:
(session Buffer Blocks) = (.9) * (DTM Buffer Size) / (Default Buffer Block
Size) * (number of partitions)
200 = .9 * 14222222 / 64000 * 1
or
200 = .9 * 12000000 / 54000 * 1
223
Bottlenecks: Identifying System
• Obvious to spot on the hardware:
− 100% CPU
− High paging/second (low physical memory)
− High physical disk reads/writes

224
A balanced Session
• The Session Log has statistics on
Reader/Transformation/Writer threads (at the end of the
file)

• MASTER> PETL_24018 Thread [READER_1_1_1]

created Total Run Time = [595.053326] secs, Total Idle
Time = [319.658512] secs, Busy Percentage =
[46.280695].
• MASTER> PETL_24019 Thread [TRANSF_1_1_1]
created Total Run Time = [592.979465] secs, Total Idle
Time = [248.725231] secs, Busy Percentage =
[58.055001].
• MASTER> PETL_24022 Thread [WRITER_1_*_1]
created Total Run Time = [535.331108] secs, Total Idle
225
Increasing performance
• A huge subject in itself
• For every bottleneck there is a number of
optimization techniques available
• Think creatively, having the overall architecture in
mind
− Relational databases (Sources, Targets, Lookups..)
− Informatica server

226
Tuning Sources/Targets
• Increase the database throughout put
• Limit SQs
− Limit incoming data (# rows)
− Tune SQ queries
− Prepare the data on the source side (if possible)
• For Targets use Bulk Loading and avoid
PKs/Indexes
• Increase LAN speed for remote connections

227
Tuning Transformations
• Tune Lookups with regard to DB performance
• Use appropriate caching techniques
− For Lookups: static vs dynamic, persistent
• If possible use sorted transformations
− Aggregator, Joiner
• Use Filters as early in the pipeline as possible
• Use port variables for complex calculations
(factor out common logic)
• Use single-pass reading

228
Optimizing Sessions/System
• Increase physical servers capacity
− #CPUs
− Memory
− LAN
− HDD speed
• Use appropriate buffer sizes
− Big number of options available
• Use bigger number of machines
− Informatica Grids
− Oracle’s RACs

229
Pipeline Partitioning - Overview
• Pipeline Partitioning is a way to split a single
pipeline into multiple processing threads

• Workflow Administration Guide -> Chapter 13:

Pipeline Partitioning

230
Default Partition Points

231
Pipeline Partitioning
• In a way one partition is a portion of the data
− Partition point is where you create “boundaries” between threads
− Different partition points can have different #partitions
• This means that there can be multiple
− Reader threads
− Transformation threads
− Writer threads
• This requires multi-CPU machines and relational
databases with parallel options enables
• HUGE performance benefits can be achieved
− If you know what you’re doing, otherwise you may actually lower
system performance!

232
Understanding Pipeline Flow
• Pipeline partitions are added in the Mapping tab
of Session properties (Workflow Manager)

233
Partitioning Limitations
• You need to have a streamlined data flow to add
partition points

Can’t add partition points to these

transformations because not all
the columns flow through this
part of the pipeline

234
Partition Types
♦ Round-robin. The PowerCenter Server distributes data evenly among all
partitions. Use round-robin partitioning where you want each partition to
process approximately the same number of rows.
♦ Hash. The PowerCenter Server applies a hash function to a partition key
to group data among partitions. If you select hash auto-keys, the
PowerCenter Server uses all grouped or sorted ports as the partition key.
If you select hash user keys, you specify a number of ports to form the
partition key. Use hash partitioning where you want to ensure that the
PowerCenter Server processes groups of rows with the same partition
key in the same partition.
♦ Key range. You specify one or more ports to form a compound partition
key. The PowerCenter Server passes data to each partition depending
on the ranges you specify for each port. Use key range partitioning where
the sources or targets in the pipeline are partitioned by key range.
♦ Pass-through. The PowerCenter Server passes all rows at one partition
point to the next partition point without redistributing them. Choose pass-
through partitioning where you want to create an additional pipeline stage
to improve performance, but do not want to change the distribution of
data across partitions.
♦ Database partitioning. The PowerCenter Server queries the IBM DB2
system for table partition information and loads partitioned data to the
corresponding nodes in the target database. Use database partitioning
with IBM DB2 targets stored on a multi-node tablespace.
For more information, refer Workflow Administration Guide 235
Migration strategies
Migration Strategies
• There’s always a need to migrate objects
between stages
− E.g. Test -> QA -> Prod
• Usual problems with object synchronization
• There are two main types of migration
− Repository per Stage
− Folder per stage

237
Folder Migrations
• One folder per stage
− Complex directory structure
(multiple stages per project folder)
• Not allowed to nest directories
− Lower server requirements (one
repository)
− Easier security management (one
user login)
− Folders are created and managed
in the Repository Manager
• You need to have appropriate privs

238
Repository migrations
• In this case you have a separate Repository (not
necessarily Repository Server) per Stage
• Reduces the Repository size/complexity
• Streamlines folder structure Test repository

Prod repository

239
Copy Wizard
• Copy Wizard assists you to copy Folders or Deployment
Groups
− Use Edit -> Copy (..) Edit -> Paste

240
Copy Wizard II
• You can copy between repositories or within the
same repository
• The Wizard helps you to resolve conflicts
− Connections
− Variables
− Folder names
− Other

241
XML Exports/Imports
• If not possible to copy between folders or
repositories (e.g. no access at all for Dev group to
QA repository), one can use XML Export/Import

242
XML Imports/Exports II
• You can Export/Import any type of object
• When Exporting/Importing there are all
dependencies exported (e.g. Sources for
Mappings)
• When Importing an Import Wizard will help you to
resolve any possible conflicts
− Different folder names
− Existing objects
− other

243
XML – other use
• How can one use XML data imports?
− Transfer of objects between repositories
− Automatic Transformation creation from existing
processes
− Quick import of Source/Target definitions from different
format
− Backup of PowerCenter objects

244
Deployment Groups
• For versioned Repositories you can group objects
into Deployment Groups
• Greater flexibility and reduced migration effort
• You can define whole application or just a part of
it
• No need to have one folder per application
• A deployment Group can be Static or Dynamic
• Additional complexity (dependant child objects)
• Read more in the Repository Guide -> Chapter 9:
Grouping Versioned Objects
245
Exercise: Copy Wizard
• Create a new folder TNumber_PROD
• Copy your entire folder TNumber to folder
TNumber_PROD
• Modify the m_TNumberLoad_1 Mapping back to
use hardcoded file name (instead of a Parameter)
− In the TNumber folder
• Migrate your change to TNumber_PROD folder
• Use “Advanced” options

246
The Aggregator Test
The Test Objectives
• This test checks some skills you should have
learned during the course
• It’s supposed to prove your knowledge, not your
colleagues or mine
• It’s close to real life development work
• The test requires from you
− Application of gained knowledge – use training
materials and online guides!
− Creativity
− Persistance

248
The Test Description
• Your task is to load some data into target
database, manipulating it on the way
− So, it’s a typical ETL process
• You’ll have to:
− Define and create Informatica and Oracle objects
− Modify the source information
− Run Informatica workflows
− Verify that the data has been correctly loaded and
transformed

249
The Test : Workflow I
1. Define a workflow that will load the data from
file agg_src_file_1.txt to Oracle
• Create your own target Oracle table for called
ODS_TNUMBER
• If a numerical value is not numerical then load the
row anyway, using 0 for numerical value
• Use an Update Strategy transformation based on the
original transaction ID (ORIG_TRX_ID) to insert new
rows and update existing rows
• Verify:
• #rows in = #rows out
• Sum(values) in the source file = sum(vaules) in the target

250
The Test: Workflow II
• Move all the data from table ODS_TNUMBER to
table QDF_TNUMBER, adding following columns
on the way:
− TRADE_CHANL_ID from CUST table
− GEO_NAME from GEO.NAME table, linking via
CUST.ISO_CNTRY_NUM
• Filter out all rows with sales values <=0
• Create your QDF table

251
The Test: Workflow III
• Create a report (Oracle table) that will give
information how much was daily sales in each
Sector
− Sector is level 2 in the 710 hierarchy
− Use DNORM table to get SECTOR information
− Use most recent CTRL_PERD
− Create appropriate Oracle report table

252
Test rules
• No data row can be dropped
− #rows in the source file = #rows in the target file, unless
a source file row is supposed to update already loaded
row
• If an ID is not known it is supposed to be replaced
with a replacement code
− Product ID replacement key: ‘82100000’
− Customer ID replacement key: '9900000003’
• Don’t change any Oracle or source data, however
you may create your own objects

253
Task hints
• Some values may be a “bit” different than other –
try to fix as many data issues as possible
• Remember about performance! Large lookups,
aggregations, joins…
• Use log files and the Debugger
• Use reusable components if feasible

254

Informatica IICS Interview Questions
100% (1)
Informatica IICS Interview Questions
33 pages
Notes Informatica
100% (3)
Notes Informatica
121 pages
Complete Informatica
100% (2)
Complete Informatica
321 pages
SAP Security
No ratings yet
SAP Security
4 pages
Informatica Power Center 9
100% (1)
Informatica Power Center 9
166 pages
Informatica Powercenter Course
No ratings yet
Informatica Powercenter Course
8 pages
Informatica Certification
No ratings yet
Informatica Certification
7 pages
Informatica Tutorial
100% (2)
Informatica Tutorial
91 pages
SCD Type 1 Implementation Using Informatica PowerCenter
No ratings yet
SCD Type 1 Implementation Using Informatica PowerCenter
7 pages
Informatica PowerCenter Tips
No ratings yet
Informatica PowerCenter Tips
6 pages
SCD Type-2 Using Dynamic Lookup
100% (2)
SCD Type-2 Using Dynamic Lookup
12 pages
SnapLogic Second Edition
From Everand
SnapLogic Second Edition
Gerardus Blokdyk
No ratings yet
SAP Security00002
No ratings yet
SAP Security00002
3 pages
My Interview Questions and Answers For Job
No ratings yet
My Interview Questions and Answers For Job
78 pages
Informatica
100% (1)
Informatica
289 pages
Powercenter Informatica Tutorials
100% (50)
Powercenter Informatica Tutorials
8 pages
Informatica Cloud Enterprise Labs
No ratings yet
Informatica Cloud Enterprise Labs
90 pages
Informatica Material
No ratings yet
Informatica Material
115 pages
Informatica Tutorials
No ratings yet
Informatica Tutorials
2 pages
Informatica Lab
100% (2)
Informatica Lab
34 pages
CD I Bootcamp Consolidated Day 11650307222139
No ratings yet
CD I Bootcamp Consolidated Day 11650307222139
91 pages
Informatica Mapping Scenarios
No ratings yet
Informatica Mapping Scenarios
81 pages
Informatica PowerCenter Version 9
No ratings yet
Informatica PowerCenter Version 9
3 pages
Basic Informatica PowerCenter Case Study
100% (1)
Basic Informatica PowerCenter Case Study
28 pages
Informatica Material Beginner
No ratings yet
Informatica Material Beginner
29 pages
Of PSK Real Time Training - 9739096158
No ratings yet
Of PSK Real Time Training - 9739096158
111 pages
PC8x90L1D StudentGuide 20081124GV9
100% (2)
PC8x90L1D StudentGuide 20081124GV9
272 pages
Powercenter 9x Developer Level 1 Ondemand Lab Guide635845966401197748 5 PDF Free
No ratings yet
Powercenter 9x Developer Level 1 Ondemand Lab Guide635845966401197748 5 PDF Free
293 pages
Informatica Guide
No ratings yet
Informatica Guide
159 pages
Informatica Unix Training Course Outline
No ratings yet
Informatica Unix Training Course Outline
14 pages
A Certification Questions
100% (2)
A Certification Questions
67 pages
Informatica Bhaskar20161012
No ratings yet
Informatica Bhaskar20161012
90 pages
Informatica MDM Training 2
No ratings yet
Informatica MDM Training 2
80 pages
CE Student Guide PC - HandsOnWorkshop
No ratings yet
CE Student Guide PC - HandsOnWorkshop
328 pages
ETL Tools: Basic Details About Informatica
No ratings yet
ETL Tools: Basic Details About Informatica
121 pages
Informatica
0% (1)
Informatica
32 pages
Informatica, Datawarehouse, Oracle, Unix - FINAL INTERVIEW QUESTIONS (ETL - INFORMATICA)
No ratings yet
Informatica, Datawarehouse, Oracle, Unix - FINAL INTERVIEW QUESTIONS (ETL - INFORMATICA)
63 pages
Informatica Interview Questions Scenario Based PDF
No ratings yet
Informatica Interview Questions Scenario Based PDF
14 pages
1.01 Aggregation Using Sorted Input: Informatica Mappings
No ratings yet
1.01 Aggregation Using Sorted Input: Informatica Mappings
64 pages
Best Informatica Interview Questions
67% (3)
Best Informatica Interview Questions
38 pages
Informatica Interview Questioner-Ambarish
No ratings yet
Informatica Interview Questioner-Ambarish
211 pages
Informatica PowerCenter Scenario-II
No ratings yet
Informatica PowerCenter Scenario-II
9 pages
IICS Student Guide
No ratings yet
IICS Student Guide
587 pages
Informatica Interview Questions and Answers
No ratings yet
Informatica Interview Questions and Answers
5 pages
PowerCenter Pushdown
No ratings yet
PowerCenter Pushdown
25 pages
Info777 Resume77
No ratings yet
Info777 Resume77
4 pages
Instant Pentaho Data Integration Kitchen
From Everand
Instant Pentaho Data Integration Kitchen
Sergio Ramazzina
No ratings yet
Basic DBA Query v.1: Oracle Database
From Everand
Basic DBA Query v.1: Oracle Database
Oraclesql-plsql
5/5 (1)
Informatica Powercenter 7 Level I Developer: Education Services
100% (1)
Informatica Powercenter 7 Level I Developer: Education Services
289 pages
INFORMATICA Online Training With 100% Placement Assistance
No ratings yet
INFORMATICA Online Training With 100% Placement Assistance
9 pages
Introduction To Datawarehousing
No ratings yet
Introduction To Datawarehousing
5 pages
Informatica Powercenter 9 Training Detail Day Wise Content
No ratings yet
Informatica Powercenter 9 Training Detail Day Wise Content
12 pages
Informatica Training
No ratings yet
Informatica Training
285 pages
Informatica
No ratings yet
Informatica
7 pages
Informatica Basic Study
No ratings yet
Informatica Basic Study
286 pages
Complete Reference To Informatica PDF
100% (3)
Complete Reference To Informatica PDF
52 pages
Complete Reference To Informatica
100% (1)
Complete Reference To Informatica
52 pages
Informatica Powercenter 7.1 Basics: Education Services
No ratings yet
Informatica Powercenter 7.1 Basics: Education Services
287 pages
Oracle BI Applications 7.9: Develop A Data Warehouse: D48127GC10 Edition 1.0 June 2007 D51352
No ratings yet
Oracle BI Applications 7.9: Develop A Data Warehouse: D48127GC10 Edition 1.0 June 2007 D51352
14 pages
Informatica Training - Presentation Transcript
No ratings yet
Informatica Training - Presentation Transcript
10 pages
New Informatica Concepts - Day
100% (1)
New Informatica Concepts - Day
98 pages
Informatica Power Center 9
No ratings yet
Informatica Power Center 9
166 pages
Powercenter 8 Level I Developer: Education Services
No ratings yet
Powercenter 8 Level I Developer: Education Services
18 pages
SAP Security 00006
No ratings yet
SAP Security 00006
9 pages
SAP Security
No ratings yet
SAP Security
6 pages
SAP Security00001
No ratings yet
SAP Security00001
2 pages
Quaid Saifee WIT 2012 MISA Presentation
No ratings yet
Quaid Saifee WIT 2012 MISA Presentation
21 pages
Informatica MDM - Multidomain: Hemang Desai Softpath
No ratings yet
Informatica MDM - Multidomain: Hemang Desai Softpath
68 pages
Thesis Statement For Merchant of Venice Essay
100% (3)
Thesis Statement For Merchant of Venice Essay
6 pages
Triple Integrals in Rectangular Coordinates PDF
No ratings yet
Triple Integrals in Rectangular Coordinates PDF
62 pages
Tahoul - BBP - MM - External Services Management - MM-05 - V3.0
No ratings yet
Tahoul - BBP - MM - External Services Management - MM-05 - V3.0
24 pages
Radhika Patel CV
No ratings yet
Radhika Patel CV
1 page
Maths
No ratings yet
Maths
127 pages
Alcris
No ratings yet
Alcris
5 pages
Celebrating 20 Years of TACI and Kwar Adhola Foreword by Owor Jad Adrian
No ratings yet
Celebrating 20 Years of TACI and Kwar Adhola Foreword by Owor Jad Adrian
1 page
Seidensticker - Comic Elements in Euripides' Bacchae
No ratings yet
Seidensticker - Comic Elements in Euripides' Bacchae
19 pages
A Primer On Elliptic Functions With Applications in Classical Mechanics (Alan Blaisard)
No ratings yet
A Primer On Elliptic Functions With Applications in Classical Mechanics (Alan Blaisard)
23 pages
Kaaraka - Introduction
100% (1)
Kaaraka - Introduction
2 pages
Quiz Game Project
No ratings yet
Quiz Game Project
37 pages
Reference Manual: Date: 10/17/08
No ratings yet
Reference Manual: Date: 10/17/08
31 pages
Lucy Tunic Template 108
100% (1)
Lucy Tunic Template 108
30 pages
Twelfth Night Act II Quiz
No ratings yet
Twelfth Night Act II Quiz
1 page
The Fundamental Wisdom of The Middle Way - Nagarjuna
100% (3)
The Fundamental Wisdom of The Middle Way - Nagarjuna
394 pages
Revalidated - MATH - GR8 - QTR1-MODULE-3 - (28 Pages)
No ratings yet
Revalidated - MATH - GR8 - QTR1-MODULE-3 - (28 Pages)
28 pages
KTG - Physics Presentation
No ratings yet
KTG - Physics Presentation
12 pages
K To 12 DLL
No ratings yet
K To 12 DLL
8 pages
CS3691 ESIOT Soft Copy Notes (2)
No ratings yet
CS3691 ESIOT Soft Copy Notes (2)
125 pages
As IT Notes Shahd PDF
No ratings yet
As IT Notes Shahd PDF
27 pages
El111 Midterm Sample
No ratings yet
El111 Midterm Sample
8 pages
Algebra Word Problems Practice Workbook With Full Solutions
100% (1)
Algebra Word Problems Practice Workbook With Full Solutions
343 pages
Atin Cu Pung Singsing Critique Paper
100% (1)
Atin Cu Pung Singsing Critique Paper
2 pages
Problem Set 2
No ratings yet
Problem Set 2
12 pages
Hadoop - Project 5th Sem - 1
No ratings yet
Hadoop - Project 5th Sem - 1
62 pages
(Template) Detailed Lesson Plan Template For Final Demonstration Teaching (AP)
No ratings yet
(Template) Detailed Lesson Plan Template For Final Demonstration Teaching (AP)
4 pages
Lesson 3 Living and Nonliving Things
No ratings yet
Lesson 3 Living and Nonliving Things
5 pages
Annual Sample Paper - VI - Maths
No ratings yet
Annual Sample Paper - VI - Maths
5 pages
Operating Systems: - Chapter 3
No ratings yet
Operating Systems: - Chapter 3
3 pages

Informatica - The Basics: Trainer: Muhammed Naufal

Uploaded by

Informatica - The Basics: Trainer: Muhammed Naufal

Uploaded by

Informatica – the

Trainer: Muhammed Naufal

© 2004 Hewlett-Packard Development Company, L.P.

• If you get “unable to resolve TNS name” error make sure

• Now we need to define the data flow in the

• Save - the Mapping is now Valid! :)

− Choose “New” Connection of type Oracle and fill in the

• You have all basic skills to learn further by

• Remember, you run the Debug session for one

• “Continue” = run the Session (if there are no

• Question: where is the lookup cache file created?

• .. and uses correct function name :)

Aggregator, Filter, Router, Joiner,

• Define condition groups (a row is tested against

• Properties tab lets you define join type (amongst

• You can use all Workflow Variables and Session

• The last command must be “exit”

• Modify the Mapping to store the SAL as

• How would you load e.g. 10 files one after

• The Performance Detail File has very useful

Transformation Name Counter Name Counter Value

• MASTER> PETL_24018 Thread [READER_1_1_1]

• Workflow Administration Guide -> Chapter 13:

Can’t add partition points to these

You might also like