Informatica Power Center Best Practices

Uploaded by

rishabh_200

Informatica PowerCenter Development Best Practices

TABLE OF CONTENTS

Abstract
Content overview
1. Lookup - Performance considerations
1.1. Unwanted columns
1.2. Size of the source versus size of lookup
1.3. JOIN instead of Lookup
1.4. Conditional call of lookup
1.5. SQL query
1.6. Increase cache
1.7. Cachefile file-system
1.8. Useful cache utilities
2. Workflow performance basic considerations
2.1. SQL tuning
3. Pre/Post-Session command - Uses
4. Sequence generator design considerations
5. FTP Connection object platform independence

Abstract
This article explains a few important PowerCenter development best practices, covering lookups, workflow performance, and related topics.

Content overview
Lookup - Performance considerations
Workflow performance basic considerations
Pre/Post-Session commands - Uses
Sequence generator design considerations
FTP Connection object platform independence

1. Lookup - Performance considerations

What is a Lookup transformation? It is not just another transformation that fetches data to look up against source data. A Lookup is an important and useful transformation when used effectively, but used improperly it will severely impair the performance of your mapping. Let us look at the scenarios where a Lookup can cause problems and how to tackle them.

1.1. Unwanted columns

By default, when you create a lookup on a table, PowerCenter gives you all the columns in the table. If some columns are not required for the lookup condition or the return value, delete them from the transformation. Leaving unwanted columns in place increases the cache size.

1.2. Size of the source versus size of lookup

Say you have 10 rows in the source and one of the columns has to be checked against a big table (1 million rows). PowerCenter builds the cache for the lookup table and then checks the 10 source rows against the cache. Building a cache of 1 million rows takes more time than going to the database 10 times and looking up against the table directly. When the number of source rows is much smaller than the lookup table, use an uncached lookup instead of building a static cache.
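The trade-off above can be sketched as a rough break-even check. This is only an illustration: the function name and both per-row cost figures are hypothetical placeholders, not PowerCenter measurements, and real numbers would have to come from your own environment.

```python
# Rough cost model; every number here is a hypothetical placeholder.

def cheaper_lookup_mode(source_rows, lookup_rows,
                        cache_build_cost_per_row=0.00001,
                        db_round_trip_cost=0.005):
    """Return 'uncached' or 'cached' based on estimated seconds of work."""
    # A cached lookup reads the whole lookup table once to build the cache.
    cached_cost = lookup_rows * cache_build_cost_per_row
    # An uncached lookup issues one query per source row.
    uncached_cost = source_rows * db_round_trip_cost
    return "uncached" if uncached_cost < cached_cost else "cached"

small_source = cheaper_lookup_mode(source_rows=10, lookup_rows=1_000_000)
large_source = cheaper_lookup_mode(source_rows=500_000, lookup_rows=1_000_000)
print(small_source, large_source)
```

With these assumed costs, the 10-row source favors the uncached lookup and the 500,000-row source favors the cache, which matches the reasoning in the paragraph above.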

1.3. JOIN instead of Lookup

In the same context as above, if the Lookup transformation comes right after the Source Qualifier with no active transformation in between, you can instead use the SQL override of the Source Qualifier and join to the lookup table with an ordinary database join, provided both tables are in the same database and schema.
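As a minimal sketch of the kind of query such an override could carry, the example below uses an in-memory SQLite database driven from Python. The table and column names (orders, customers, customer_name) are hypothetical, and SQLite merely stands in for whatever database hosts both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 2, 40.0), (102, 3, 15.0);
""")

# The kind of query a Source Qualifier SQL override could carry.
# LEFT JOIN keeps source rows with no match, as a lookup returning NULL would.
override_sql = """
    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
"""
joined_rows = conn.execute(override_sql).fetchall()
print(joined_rows)
```

One design point to keep in mind: a join returns every matching row, so this substitution is only equivalent to a lookup when the join key is unique in the lookup table, as the primary key ensures here.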

1.4. Conditional call of lookup

Instead of using connected lookups with filters to make a lookup call conditional, use an unconnected lookup. If the single return column is a limitation, change the SQL override to concatenate the required columns into one big column, then split it back into individual columns on the calling side.
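The concatenate-and-split idea can be sketched as follows. This is not PowerCenter expression syntax; it is a Python and SQLite stand-in, and the delimiter, table, and function names are all hypothetical.

```python
import sqlite3

DELIM = "|"  # hypothetical delimiter; pick one that cannot occur in the data

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, city TEXT);
    INSERT INTO customers VALUES (1, 'Acme', 'Pune'), (2, 'Globex', 'Chennai');
""")

def lookup_packed(customer_id):
    """Mimic an unconnected lookup: one key in, one packed column out."""
    row = conn.execute(
        "SELECT customer_name || ? || city FROM customers WHERE customer_id = ?",
        (DELIM, customer_id),
    ).fetchone()
    return row[0] if row else None

def unpack(packed):
    """Split the packed value back into its columns on the calling side."""
    if packed is None:
        return (None, None)
    name, city = packed.split(DELIM)
    return (name, city)

def enrich(order):
    # Call the lookup only when the condition holds; otherwise skip it entirely.
    if order["needs_customer"]:
        return unpack(lookup_packed(order["customer_id"]))
    return (None, None)

hit = enrich({"customer_id": 2, "needs_customer": True})
skipped = enrich({"customer_id": 1, "needs_customer": False})
print(hit, skipped)
```

Note that in SQL a concatenation involving a NULL column yields NULL, so nullable columns would need a default (for example via COALESCE) before being packed.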

1.5. SQL query

Find the execution plan of the Lookup SQL and see whether adding indexes or hints to the query makes it fetch data faster. If you are not a SQL specialist yourself, you may need the help of a database developer.
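To make the idea concrete, here is a small sketch that reads a query plan before and after adding an index. It uses SQLite's EXPLAIN QUERY PLAN from Python purely as a stand-in; your lookup database will have its own plan syntax and tooling, and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"name_{i}") for i in range(1000)],
)

lookup_sql = "SELECT customer_name FROM customers WHERE customer_id = ?"

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return [row[-1] for row in rows]

plan_before = plan(lookup_sql)  # no index yet: expect a table scan
conn.execute("CREATE INDEX idx_customers_id ON customers (customer_id)")
plan_after = plan(lookup_sql)   # the index on the lookup key is now available

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch prints it rather than assuming a fixed string.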

1.6. Increase cache

If none of the above options improves performance, the problem may lie with the cache: the cache assigned to the lookup may not be large enough to hold the data or index of the lookup. Whatever data does not fit into the cache is spilled into the cache files designated in $PMCacheDir. When PowerCenter does not find the data it is looking for in the cache, it swaps data from the file into the cache and keeps doing so until the data is found. This is expensive because the operation is very I/O intensive. To prevent it, increase the size of the cache so that the entire data set resides in memory. When increasing the cache, be aware of the system constraints: if the cache size is greater than the resources available, the session will fail for lack of resources.

1.7. Cachefile file-system

In many cases, if the cache directory sits on a different file system from that of the hosting server, writing the cache files can take time and add latency. Work with your system administrators to look into this aspect as well.

1.8. Useful cache utilities

If the same lookup SQL is used by another lookup, use a shared cache or a reusable lookup. Also, if the data in a table does not change often, you can use the persistent cache option to build the cache once and reuse it across consecutive flows.

2. Workflow performance basic considerations

Though performance tuning has long been the most feared part of development, it is the easiest once the intricacies are known. Each new version of PowerCenter gives the developer more flexibility to build better-performing workflows. The major bottlenecks are the design of the mapping and, if databases are involved, SQL tuning. Regarding the design of the mapping, I have a few basic considerations. Please note that these are not rules of thumb, but they will help you act sensibly in different scenarios.

1. I would always suggest that you think twice before using an Update Strategy, though it adds a certain level of flexibility to the mapping. If you have a straight-through mapping that takes data from the source and directly inserts all the records into the target, you do not need an Update Strategy.
2. Use a pre-SQL delete statement if you wish to delete specific rows from the target before loading. Use the truncate option in the session properties if you wish to clean the table before loading. I would avoid a separate pipeline in the mapping that runs before the load with an Update Strategy transformation.
3. Suppose you have 3 sources and 3 targets with one-to-one mapping. If the loads are independent according to the business requirement, I would create 3 different mappings and 3 different session instances and run them all in parallel in the workflow after the Start task. I have observed that the workflow runtime comes down to between 30% and 60% of the serial processing time.
4. PowerCenter is built to work on high volumes of data, so let the server be completely busy. Introduce parallelism into the mapping or workflow as far as possible.
5. If you use a transformation such as a Joiner or Aggregator, sort the data on the join keys or group-by columns before these transformations to decrease the processing time.
6. Filtering should be done at the database level instead of within the mapping. The database engine is much more efficient at filtering than PowerCenter.
The above examples are just some things to consider when tuning a mapping.
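The parallel-sessions idea from point 3 can be illustrated outside PowerCenter with a small Python sketch. The session names are hypothetical, and the sleep call only simulates load time; in a real workflow the parallelism comes from linking the sessions directly to the Start task.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_session(name):
    """Stand-in for one independent session; the sleep simulates load time."""
    time.sleep(0.1)
    return f"{name}: loaded"

sessions = ["s_load_customers", "s_load_orders", "s_load_products"]  # hypothetical names

# Serial: each session starts only after the previous one finishes.
serial_results = [run_session(s) for s in sessions]

# Parallel: all three sessions branch off the start point together.
with ThreadPoolExecutor(max_workers=len(sessions)) as pool:
    parallel_results = list(pool.map(run_session, sessions))

print(parallel_results)
```

Both runs produce the same results; only the elapsed time differs, and only as long as the loads really are independent of each other.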

2.1. SQL tuning

SQL queries or actions occur in PowerCenter in one of the following places:

- Relational Source Qualifier
- Lookup SQL Override
- Stored Procedures
- Relational Target

Using the execution plan to tune a query is the best way to understand how the database will process the data. Some things to keep in mind when reading the execution plan include: "Full Table Scans are not evil", "Indexes are not always fast", and "Indexes can be slow too". Analyse the table data to see whether picking up 20 records out of 20 million is best done with an index or with a table scan. Likewise, ask whether fetching 10 records out of 15 is faster through an index or simpler with a full table scan.

Relational target indexes often create performance problems when loading records into the target. If the indexes are needed for other purposes, it is suggested to drop them at load time and then rebuild them in post-SQL. When dropping indexes on a target, consider integrity constraints and the time it takes to rebuild the index after the load versus the actual load time.
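The drop-load-rebuild sequence can be sketched as below, again with SQLite from Python as a stand-in for the target database. The table and index names are hypothetical; in PowerCenter the first and last statements would sit in the session's pre-SQL and post-SQL, and the insert is the session's own load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (sale_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE INDEX idx_sales_customer ON sales_fact (customer_id);
""")

def index_names():
    """List the indexes currently defined on the target table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'sales_fact'"
    ).fetchall()
    return [r[0] for r in rows]

# Pre-SQL: drop the index so the load does not maintain it row by row.
conn.execute("DROP INDEX idx_sales_customer")
indexes_during_load = index_names()

# The load itself (the session's job in PowerCenter).
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(i, i % 50, float(i)) for i in range(10_000)],
)

# Post-SQL: rebuild the index once, over the fully loaded table.
conn.execute("CREATE INDEX idx_sales_customer ON sales_fact (customer_id)")
conn.commit()

indexes_after_load = index_names()
row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(indexes_during_load, indexes_after_load, row_count)
```

Whether this pays off depends on the rebuild time relative to the load time, which is the comparison the paragraph above asks you to make.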

3. Pre/Post-Session command - Uses

It is a very good practice to email the success or failure status of a task once it is done. In the same way, when a business requirement calls for it, use the Post Session Success and Failure email for proper communication. The built-in feature offers more flexibility, with session logs as attachments and other run-time data such as the workflow run instance ID. Any archiving of source and target flat files can be managed within the session using the session properties for flat file command support, which is new in PowerCenter v8.6. For example, after writing the flat file target, you can set up a command to zip the file to save space. If the target flat files need any editing that your mapping could not accommodate, write a shell or batch command or script and call it in the Post-Session command task. In these scenarios I prefer to weigh the trade-offs between PowerCenter capabilities and OS capabilities.
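As one possible shape for such a post-session script, the sketch below compresses a target flat file with Python's standard gzip module. The function name and file name are hypothetical, and a one-line OS zip command would serve equally well; the demo writes a throwaway file in a temporary directory to stand in for the session's target.

```python
import gzip
import os
import shutil
import tempfile

def compress_target_file(path):
    """Gzip a flat file and remove the original; returns the new path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path

# Demo with a throwaway file standing in for the session's target.
work_dir = tempfile.mkdtemp()
target_file = os.path.join(work_dir, "customers_out.csv")  # hypothetical file name
with open(target_file, "w") as fh:
    fh.write("customer_id,customer_name\n1,Acme\n")

archived = compress_target_file(target_file)
with gzip.open(archived, "rt") as fh:
    restored = fh.read()
print(archived)
```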

4. Sequence generator design considerations

In most cases, I would advise you to avoid the Sequence Generator transformation when populating an ID column in a relational target table. I suggest that you instead create a sequence on the target database and enable a trigger on that table to fetch the value from the database sequence. There are many advantages to using a database sequence generator:

- Fewer PowerCenter objects will be present in a mapping, which reduces development time and maintenance effort.
- ID generation is independent of PowerCenter if a different application is used in future to populate the target.
- Migration between environments is simplified because there is no additional overhead of considering the persistent values of the sequence generator from the repository database.

In all of the above cases, a sequence created in the target database makes life a lot easier for table data maintenance and for PowerCenter development. In fact, databases have dedicated mechanisms for handling sequences, so this approach amounts to implementing push-down optimization by hand in your PowerCenter mapping design. DBAs will always complain about triggers on their databases, but I would still insist on the sequence-trigger combination, even for huge volumes of data.
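The effect of database-side ID generation can be shown with a short sketch. SQLite has no CREATE SEQUENCE, so an AUTOINCREMENT key is swapped in here for the sequence-plus-trigger pair described above; on a database such as Oracle you would create the sequence and trigger themselves. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT stands in for a database sequence plus trigger (assumption:
# SQLite is used only for illustration and has no CREATE SEQUENCE).
conn.execute("""
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_name TEXT
    )
""")

# The loader supplies business columns only; it never sends customer_key.
conn.executemany(
    "INSERT INTO customer_dim (customer_name) VALUES (?)",
    [("Acme",), ("Globex",), ("Initech",)],
)

generated = conn.execute(
    "SELECT customer_key, customer_name FROM customer_dim ORDER BY customer_key"
).fetchall()
print(generated)
```

Because the key never passes through the loading tool, any application that inserts into the table gets consistent IDs, which is the independence argument made in the list above.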

5. FTP Connection object platform independence

If you have files to be read as a source from a Windows server while your PowerCenter server is hosted on UNIX/Linux, set up FTP users on the Windows server and use the File Reader with an FTP Connection object. This connection object can be added like any other connection. It gives you platform independence and also removes the overhead of maintaining Samba mounts on the Informatica boxes.
