Informatica PowerCenter Best Practices
TABLE OF CONTENTS
Abstract
Content overview
1. Lookup - Performance considerations
    1.1. Unwanted columns
    1.2. Size of the source versus size of lookup
    1.3. JOIN instead of Lookup
    1.4. Conditional call of lookup
    1.5. SQL query
    1.6. Increase cache
    1.7. Cachefile file-system
    1.8. Useful cache utilities
2. Workflow performance - basic considerations
    2.1. SQL tuning
3. Pre/Post-Session command - Uses
4. Sequence generator - design considerations
5. FTP Connection object - platform independence
Abstract
This article describes several important PowerCenter development best practices, covering lookups, workflow performance, pre/post-session commands, sequence generators, and FTP connection objects.
Content overview
- Lookup - Performance considerations
- Workflow performance - basic considerations
- Pre/Post-Session commands - Uses
- Sequence generator - design considerations
- FTP Connection object - platform independence
1. Lookup - Performance considerations
If the same lookup SQL is used by more than one lookup, use a shared cache or a reusable lookup. Also, if the data in the lookup table does not change often, use the persistent cache option to build the cache once and reuse it across consecutive flows.
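As a rough illustration of the build-once, reuse-many idea behind a persistent cache, the following Python sketch writes lookup data to a file on the first run and reads it back on later runs without querying the table again. It is not PowerCenter code: the customers table, the cache file name, and the helper load_lookup_cache are all hypothetical, and an in-memory SQLite database stands in for the lookup source.

```python
import os
import pickle
import sqlite3
import tempfile


def load_lookup_cache(conn, cache_path):
    """Return (cache, origin). Query the table only if no cache file exists yet."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh), "file"
    rows = conn.execute("SELECT customer_id, name FROM customers").fetchall()
    cache = dict(rows)  # {customer_id: name}
    with open(cache_path, "wb") as fh:
        pickle.dump(cache, fh)
    return cache, "database"


# Hypothetical lookup table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

cache_path = os.path.join(tempfile.mkdtemp(), "customers_lookup.cache")

first_cache, first_origin = load_lookup_cache(conn, cache_path)    # builds the cache
second_cache, second_origin = load_lookup_cache(conn, cache_path)  # reuses the file
```

The same trade-off applies as with the real option: a cache built once is only safe to reuse while the underlying table stays unchanged.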
2. Workflow performance - basic considerations
1. I would always suggest thinking twice before using an Update Strategy transformation, even though it adds a certain level of flexibility to the mapping. A straight-through mapping that takes data from the source and inserts every record directly into the target does not need an Update Strategy.
2. Use a pre-SQL delete statement if you wish to delete specific rows from the target before loading. Use the truncate option in the session properties if you wish to empty the table before loading. I would avoid a separate pipeline in the mapping that runs before the load with an Update Strategy transformation.
3. Suppose you have 3 sources and 3 targets with a one-to-one mapping. If the loads are independent according to the business requirement, I would create 3 different mappings and 3 different session instances and run them all in parallel in the workflow after the Start task. I've observed that the workflow runtime comes down to between 30% and 60% of the serial processing time.
4. PowerCenter is built to work on high volumes of data, so keep the server fully busy. Introduce parallelism into the mapping/workflow wherever possible.
5. If using a transformation like a Joiner or Aggregator, sort the data on the join keys or group-by columns before these transformations to decrease the processing time.
6. Filtering should be done at the database level instead of within the mapping. The database engine is much more efficient at filtering than PowerCenter.
The above examples are just some things to consider when tuning a mapping.
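To see why sorted input helps an aggregation (point 5), consider the following Python sketch. When rows arrive sorted on the group-by column, each group can be totalled and emitted as soon as the key changes, so only one group is held at a time. This is an illustration of the general principle, not of PowerCenter internals; the sample rows and the helper aggregate_sorted are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (region, amount) rows arriving in arbitrary order.
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]


def aggregate_sorted(sorted_rows):
    """Sum amounts per region, assuming rows are already sorted by region.

    groupby only groups adjacent rows, so each total is emitted as soon as
    the key changes and no table of all groups is kept in memory.
    """
    for region, group in groupby(sorted_rows, key=itemgetter(0)):
        yield region, sum(amount for _, amount in group)


# The sort stands in for an ORDER BY in the source query or an upstream sort step.
sorted_rows = sorted(rows, key=itemgetter(0))
totals = list(aggregate_sorted(sorted_rows))
```

Without the sort, the same function would emit partial totals for "east" and "west", which is why the sort key must match the group-by columns exactly.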
2.1. SQL tuning
- Relational Source Qualifier
- Lookup SQL Override
- Stored Procedures
- Relational Target
Using the execution plan to tune a query is the best way to understand how the database will process the data. Some things to keep in mind when reading the execution plan include: "Full table scans are not evil", "Indexes are not always fast", and "Indexes can be slow too". Analyse the table data to decide whether picking up 20 records out of 20 million is best done through an index or a table scan, and likewise whether fetching 10 records out of 15 is faster through an index or a full table scan.
Relational target indexes often create performance problems when loading records into the target. If the indexes are needed for other purposes, drop them before the load and rebuild them in post-SQL. When dropping indexes on a target, consider integrity constraints and the time it takes to rebuild the indexes after the load versus the actual load time.
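The drop-and-rebuild pattern around a load can be sketched as follows. This is a minimal Python example against an in-memory SQLite database, not a PowerCenter session: the sales table, the index idx_sales_region, and the helper index_names are hypothetical, and the statements marked pre-SQL and post-SQL only indicate where the equivalent statements would sit in a session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")


def index_names(conn, table):
    """List the indexes currently defined on `table` (SQLite catalog query)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    ).fetchall()
    return [name for (name,) in rows]


# Pre-SQL: drop the index so the load does not maintain it row by row.
conn.execute("DROP INDEX IF EXISTS idx_sales_region")
indexes_during_load = index_names(conn, "sales")

# The load itself: 1000 hypothetical rows.
batch = [(i, "east" if i % 2 else "west", float(i)) for i in range(1, 1001)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", batch)

# Post-SQL: rebuild the index once, over the fully loaded table.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
conn.commit()

indexes_after_load = index_names(conn, "sales")
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Whether this pays off depends on the ratio of load time to rebuild time, so it is worth timing both variants on realistic volumes before adopting it.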
3. Pre/Post-Session command - Uses
It is very good practice to email the success or failure status of a task once it is done. In the same way, when a business requirement calls for it, use the Post Session Success and Failure email for proper communication. The built-in feature offers more flexibility, with session logs as attachments and other run-time data such as the workflow run instance ID.
Any archiving activities around the source and target flat files can be easily managed within the session using the session properties for flat file command support, which is new in PowerCenter v8.6. For example, after writing the flat file target, you can set up a command to zip the file to save space.
If the target flat files need any editing that your mapping could not accommodate, write a shell/batch command or script and call it in the Post-Session command task. I prefer to weigh PowerCenter capabilities against OS capabilities in these scenarios.
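A script of the kind a post-session command could call to compress a target flat file might look like the sketch below. It is written in Python for illustration; the file name customers_out.csv and the helper compress_target_file are hypothetical, and how the script is wired into the session is left to the session configuration.

```python
import gzip
import os
import shutil
import tempfile


def compress_target_file(path):
    """Gzip `path` to `path + '.gz'`, remove the original, and return the new path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the file, so large targets are fine
    os.remove(path)
    return gz_path


# Demo with a small hypothetical target file in a temporary directory.
target_dir = tempfile.mkdtemp()
target_file = os.path.join(target_dir, "customers_out.csv")
with open(target_file, "w") as fh:
    fh.write("id,name\n1,Acme\n2,Globex\n")

archived = compress_target_file(target_file)

# Read the archive back to confirm nothing was lost.
with gzip.open(archived, "rt") as fh:
    restored = fh.read()
```

Removing the original only after the compressed copy has been written keeps the data recoverable if the compression step fails partway.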
4. Sequence generator - design considerations
- Fewer PowerCenter objects will be present in a mapping, which reduces development time and also maintenance effort.
- ID generation is independent of PowerCenter if a different application is used in future to populate the target.
- Migration between environments is simplified because there is no additional overhead of considering the persistent values of the sequence generator from the repository database.
In all of the above cases, a sequence created in the target database would make life a lot easier for table data maintenance and also for PowerCenter development. In fact, databases have dedicated mechanisms for dealing with sequences, so in effect you implement manual pushdown optimization in your own PowerCenter mapping design. DBAs will always complain about triggers on the databases, but I would still insist on using the sequence-trigger combination, even for huge volumes of data.
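The sequence-trigger idea can be sketched with SQLite from Python, with one important caveat: SQLite has no CREATE SEQUENCE, so a one-row counter table (id_sequence) stands in for the sequence here. In a database with native sequences, the trigger would draw from the sequence object instead. The table customer_dim, the trigger name, and the sample names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (customer_key INTEGER UNIQUE, name TEXT NOT NULL);

    -- Stand-in for a database sequence: one counter row per target table.
    CREATE TABLE id_sequence (seq_name TEXT PRIMARY KEY, next_val INTEGER NOT NULL);
    INSERT INTO id_sequence VALUES ('customer_dim', 1);

    -- Assign the key inside the database whenever a row arrives without one.
    CREATE TRIGGER trg_customer_dim_key
    AFTER INSERT ON customer_dim
    WHEN NEW.customer_key IS NULL
    BEGIN
        UPDATE customer_dim
           SET customer_key = (SELECT next_val FROM id_sequence
                                WHERE seq_name = 'customer_dim')
         WHERE rowid = NEW.rowid;
        UPDATE id_sequence
           SET next_val = next_val + 1
         WHERE seq_name = 'customer_dim';
    END;
""")

# The loader supplies only business columns; it never generates IDs itself.
conn.executemany(
    "INSERT INTO customer_dim (name) VALUES (?)",
    [("Acme",), ("Globex",), ("Initech",)],
)
conn.commit()

keys = conn.execute(
    "SELECT customer_key, name FROM customer_dim ORDER BY customer_key"
).fetchall()
```

Because the insert statement carries no key column, any tool that writes to the table gets consistent IDs, which is the PowerCenter independence described in the list above.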