Informatica® PowerCenter®
Performance Tuning Guide
Version 8.6.1
December 2008
Preface
The PowerCenter Performance Tuning Guide is written for PowerCenter administrators and developers, network
administrators, and database administrators who are interested in improving PowerCenter performance. This
guide assumes you have knowledge of your operating systems, networks, PowerCenter, relational database
concepts, and flat files in your environment. For more information about database performance tuning not
covered in this guide, see the documentation accompanying your database products.
Informatica Resources
Informatica Customer Portal
As an Informatica customer, you can access the Informatica Customer Portal site at http://my.informatica.com.
The site contains product information, user group information, newsletters, access to the Informatica customer
support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base,
Informatica Documentation Center, and access to the Informatica user community.
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have
questions, comments, or ideas about this documentation, contact the Informatica Documentation team
through email at [email protected]. We will use your feedback to improve our
documentation. Let us know if we can contact you regarding your comments.
Informatica Knowledge Base
As an Informatica customer, you can access the Informatica Knowledge Base at http://my.informatica.com. Use
the Knowledge Base to search for documented solutions to known technical issues about Informatica products.
You can also find answers to frequently asked questions, technical white papers, and technical tips.
CHAPTER 1
Performance Tuning Overview
Overview
The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To
tune session performance, first identify a performance bottleneck, eliminate it, and then identify the next
performance bottleneck until you are satisfied with the session performance. You can use the test load option to
run sessions when you tune session performance.
If you tune all the bottlenecks, you can further optimize session performance by increasing the number of
pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system
hardware while processing the session.
Because determining the best way to improve performance can be complex, change one variable at a time, and
time the session both before and after the change. If session performance does not improve, you might want to
return to the original configuration.
Complete the following tasks to improve session performance:
1. Optimize the target. Enables the Integration Service to write to the targets efficiently. For more
information, see “Optimizing the Target” on page 11.
2. Optimize the source. Enables the Integration Service to read source data efficiently. For more information,
see “Optimizing the Source” on page 15.
3. Optimize the mapping. Enables the Integration Service to transform and move data efficiently. For more
information, see “Optimizing Mappings” on page 19.
4. Optimize the transformation. Enables the Integration Service to process transformations in a mapping
efficiently. For more information, see “Optimizing Transformations” on page 25.
5. Optimize the session. Enables the Integration Service to run the session more quickly. For more
information, see “Optimizing Sessions” on page 33.
6. Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal
performance. For more information, see “Optimizing Grid Deployments” on page 41.
7. Optimize the PowerCenter components. Enables the Integration Service and Repository Service to
function optimally. For more information, see “Optimizing the PowerCenter Components” on page 47.
8. Optimize the system. Enables PowerCenter service processes to run more quickly. For more information,
see “Optimizing the System” on page 51.
CHAPTER 2
Bottlenecks
This chapter includes the following topics:
♦ Overview, 3
♦ Using Thread Statistics, 4
♦ Target Bottlenecks, 5
♦ Source Bottlenecks, 5
♦ Mapping Bottlenecks, 6
♦ Session Bottlenecks, 7
♦ System Bottlenecks, 7
Overview
The first step in performance tuning is to identify performance bottlenecks. Performance bottlenecks can occur
in the source and target databases, the mapping, the session, and the system. The strategy is to identify a
performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied
with the performance.
Look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Session
5. System
Use the following methods to identify performance bottlenecks:
♦ Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target
to identify source and target bottlenecks.
♦ Analyze performance details. Analyze performance details, such as performance counters, to determine
where session performance decreases.
♦ Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.
♦ Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O
waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system
resource usage.
Using Thread Statistics
You can use thread statistics in the session log to identify source, target, or transformation bottlenecks. By
default, the Integration Service uses one reader thread, one transformation thread, and one writer thread to
process a session. The thread with the highest busy percentage identifies the bottleneck in the session.
The session log provides the following thread statistics:
♦ Run time. Amount of time the thread runs.
♦ Idle time. Amount of time the thread is idle. It includes the time the thread waits for other thread processing
within the application. Idle time includes the time the thread is blocked by the Integration Service, but not
the time the thread is blocked by the operating system.
♦ Busy time. Percentage of the run time the thread is busy, according to the following formula:
(run time - idle time) / run time x 100
You can ignore high busy percentages when the total run time is short, such as under 60 seconds. This does
not necessarily indicate a bottleneck.
♦ Thread work time. The percentage of time the Integration Service takes to process each transformation in a
thread. The session log shows the following information for the transformation thread work time:
Thread work time breakdown:
<transformation name>: <number> percent
<transformation name>: <number> percent
<transformation name>: <number> percent
If a transformation takes a small amount of time, the session log does not include it. If a thread does not
have accurate statistics, because the session ran for a short period of time, the session log reports that the
statistics are not accurate.
Example
When you run a session, the session log lists run information and thread statistics similar to the following text:
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_two_gig_file_32B_rows] has
completed.
Total Run Time = [505.871140] secs
Total Idle Time = [457.038313] secs
Busy Percentage = [9.653215]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_two_gig_file_32B_rows]
has completed.
Total Run Time = [506.230461] secs
Total Idle Time = [1.390318] secs
Busy Percentage = [99.725359]
Thread work time breakdown:
LKP_ADDRESS: 25.000000 percent
SRT_ADDRESS: 21.551724 percent
RTR_ZIP_CODE: 53.448276 percent
Thread [WRITER_1_*_1] created for [the write stage] of partition point [scratch_out_32B] has completed.
Total Run Time = [507.027212] secs
Total Idle Time = [384.632435] secs
Busy Percentage = [24.139686]
In this session log, the total run time for the transformation thread is 506 seconds and the busy percentage is
99.7%. This means the transformation thread was almost never idle during the 506 seconds. The reader and writer
busy percentages were significantly smaller, about 9.6% and 24%. In this session, the transformation thread is the
bottleneck in the mapping.
To determine which transformation in the transformation thread is the bottleneck, view the busy percentage of
each transformation in the thread work time breakdown. In this session log, the transformation
RTR_ZIP_CODE had a busy percentage of 53%.
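The busy percentages above follow directly from the run and idle times in the log. The following is a minimal sketch, a hypothetical helper outside PowerCenter, that applies the busy-time formula to the values from this session log:

def busy_percentage(run_time_secs, idle_time_secs):
    # Thread-statistics formula: (run time - idle time) / run time x 100.
    return (run_time_secs - idle_time_secs) / run_time_secs * 100

# Run and idle times taken from the session log excerpt above.
threads = {
    "READER_1_1_1": (505.871140, 457.038313),
    "TRANSF_1_1_1": (506.230461, 1.390318),
    "WRITER_1_*_1": (507.027212, 384.632435),
}
for name, (run, idle) in threads.items():
    print(f"{name}: {busy_percentage(run, idle):.2f}% busy")
# TRANSF_1_1_1 prints ~99.73% busy, confirming the transformation thread
# as the session bottleneck.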
Target Bottlenecks
The most common performance bottleneck occurs when the Integration Service writes to a target database.
Small checkpoint intervals, small database network packet sizes, or problems during heavy loading operations
can cause target bottlenecks.
RELATED TOPICS:
“Optimizing the Target” on page 11
Source Bottlenecks
Performance bottlenecks can occur when the Integration Service reads from a source database. An inefficient query
or small database network packet sizes can cause source bottlenecks.
If the session reads from a flat file source, you probably do not have a source bottleneck.
RELATED TOPICS:
“Optimizing the Source” on page 15
Mapping Bottlenecks
If you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.
Identifying Mapping Bottlenecks
To identify mapping bottlenecks, complete the following tasks:
♦ Read the thread statistics and work time statistics in the session log. When the Integration Service spends
more time on the transformation thread than the writer or reader threads, you have a transformation
bottleneck. When the Integration Service spends more time on one transformation, it is the bottleneck in
the transformation thread.
♦ Analyze performance counters. High errorrows and rowsinlookupcache counters indicate a mapping
bottleneck.
♦ Add a Filter transformation before each target definition. Set the filter condition to false so that no data is
loaded into the target tables. If the time it takes to run the new session is the same as the original session,
you have a mapping bottleneck.
RELATED TOPICS:
“Optimizing Mappings” on page 19
Session Bottlenecks
If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. Small cache size,
low buffer memory, and small commit intervals can cause session bottlenecks.
RELATED TOPICS:
“Optimizing Sessions” on page 33
System Bottlenecks
After you tune the source, target, mapping, and session, consider tuning the system to prevent system
bottlenecks. The Integration Service uses system resources to process transformations, run sessions, and read
and write data. The Integration Service also uses system memory to create cache files for transformations, such
as Aggregator, Joiner, Lookup, Sorter, XML, and Rank.
Using the Workflow Monitor to Identify System Bottlenecks
You can view the Integration Service properties in the Workflow Monitor to see CPU, memory, and swap usage
of the system when you are running task processes on the Integration Service. Use the following Integration
Service properties to identify performance issues:
♦ CPU%. The percentage of CPU usage includes other external tasks running on the system.
♦ Memory usage. The percentage of memory usage includes other external tasks running on the system. If the
memory usage is close to 95%, check if the tasks running on the system are using the amount indicated in
the Workflow Monitor or if there is a memory leak. To troubleshoot, use system tools to check the memory
usage before and after running the session and then compare the results to the memory usage while running
the session.
♦ Swap usage. Swap usage is a result of paging due to possible memory leaks or a high number of concurrent
tasks.
♦ If disk I/O is slow, consider adding another disk device or upgrading to a faster disk device. You can also use a
separate disk for each partition in the session.
♦ If physical disk queue length is greater than two, consider adding another disk device or upgrading the disk
device. You also can use separate disks for the reader, writer, and transformation threads.
♦ Consider improving network bandwidth.
♦ When you tune UNIX systems, tune the server for a major database system.
♦ If the percent time spent waiting on I/O (%wio) is high, consider using other under-utilized disks. For
example, if the source data, target data, lookup, rank, and aggregate cache files are all on the same disk,
consider putting them on different disks.
RELATED TOPICS:
“Reducing Paging” on page 52
“Optimizing the System” on page 51
CHAPTER 3
Optimizing the Target
Increasing Database Checkpoint Intervals
The Integration Service performance slows each time it waits for the database to perform a checkpoint. To
decrease the number of checkpoints and increase performance, increase the checkpoint interval in the database.
Note: Although you gain performance when you reduce the number of checkpoints, you also increase the
recovery time if the database shuts down unexpectedly.
RELATED TOPICS:
“Target-Based Commit” on page 37
CHAPTER 4
Optimizing the Source
Using Conditional Filters
A simple source filter on the source database can sometimes negatively impact performance because of the lack
of indexes. You can use the PowerCenter conditional filter in the Source Qualifier to improve performance.
Whether you should use the PowerCenter conditional filter to improve performance depends on the session.
For example, if multiple sessions read from the same source simultaneously, the PowerCenter conditional filter
may improve performance.
However, some sessions may perform faster if you filter the source data on the source database. You can test the
session with both the database filter and the PowerCenter filter to determine which method improves
performance.
CHAPTER 5
Optimizing Mappings
This chapter includes the following topics:
♦ Overview, 19
♦ Optimizing Flat File Sources, 19
♦ Configuring Single-Pass Reading, 20
♦ Optimizing Pass-Through Mappings, 20
♦ Optimizing Filters, 21
♦ Optimizing Datatype Conversions, 21
♦ Optimizing Expressions, 21
♦ Optimizing External Procedures, 23
Overview
Mapping-level optimization may take time to implement, but it can significantly boost session performance.
Focus on mapping-level optimization after you optimize the targets and sources.
Generally, you reduce the number of transformations in the mapping and delete unnecessary links between
transformations to optimize the mapping. Configure the mapping with the least number of transformations and
expressions to do the most amount of work possible. Delete unnecessary links between transformations to
minimize the amount of data moved.
Optimizing the Line Sequential Buffer Length
If the session reads from a flat file source, you can improve session performance by setting the number of bytes the
Integration Service reads per line. By default, the Integration Service reads 1024 bytes per line. If the largest line
in the source file is less than the default setting, you can decrease the line sequential buffer length in the session
properties.
Optimizing Filters
Use one of the following transformations to filter data:
♦ Source Qualifier transformation. The Source Qualifier transformation filters rows from relational sources.
♦ Filter transformation. The Filter transformation filters data within a mapping. The Filter transformation
filters rows from any type of source.
If you filter rows from the mapping, you can improve efficiency by filtering early in the data flow. Use a filter in
the Source Qualifier transformation to remove the rows at the source. The Source Qualifier transformation
limits the row set extracted from a relational source.
If you cannot use a filter in the Source Qualifier transformation, use a Filter transformation and move it as close
to the Source Qualifier transformation as possible to remove unnecessary data early in the data flow. The Filter
transformation limits the row set sent to a target.
Avoid using complex expressions in filter conditions. To optimize Filter transformations, use simple integer or
true/false expressions in the filter condition.
Note: You can also use a Filter or Router transformation to drop rejected rows from an Update Strategy
transformation if you do not need to keep rejected rows.
Optimizing Expressions
You can also optimize the expressions used in the transformations. When possible, isolate slow expressions and
simplify them.
Complete the following tasks to isolate the slow expressions:
1. Remove the expressions one-by-one from the mapping.
2. Run the mapping to determine the time it takes to run the mapping without the transformation.
If there is a significant difference in session run time, look for ways to optimize the slow expression.
Factoring Out Common Logic
If the mapping performs the same task in multiple places, reduce the number of times the mapping performs
the task by moving the task earlier in the mapping. For example, you have a mapping with five target tables.
Each target requires a Social Security number lookup. Instead of performing the lookup five times, place the
Lookup transformation in the mapping before the data flow splits. Next, pass the lookup results to all five
targets.
Minimizing Aggregate Function Calls
When possible, factor out aggregate function calls from expressions. Each aggregate function call requires the
Integration Service to search and group the data, so the expression SUM(COLUMN_A) + SUM(COLUMN_B) performs two
aggregations. If you factor out the aggregate function call, as below, the Integration Service adds COLUMN_A to
COLUMN_B, then finds the sum of both:
SUM(COLUMN_A + COLUMN_B)
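The two forms return the same result; only the amount of aggregation work differs. A quick illustrative check, outside PowerCenter, with made-up non-null column values:

column_a = [10, 20, 30]
column_b = [1, 2, 3]

# Two aggregate calls: the data is searched and grouped twice.
two_calls = sum(column_a) + sum(column_b)

# One aggregate call over the row-wise addition, as in SUM(COLUMN_A + COLUMN_B).
one_call = sum(a + b for a, b in zip(column_a, column_b))

assert two_calls == one_call == 66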
Evaluating Expressions
If you are not sure which expressions slow performance, evaluate the expression performance to isolate the
problem.
Complete the following steps to evaluate expression performance:
1. Time the session with the original expressions.
2. Copy the mapping and replace half of the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.
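A small illustrative calculation of what the timings from these steps tell you; the numbers below are made up:

original_secs = 600           # step 1: session with original expressions
first_half_const_secs = 420   # step 3: first half replaced by constants
second_half_const_secs = 580  # step 5: second half replaced by constants

# Time attributable to each half of the complex expressions:
first_half_cost = original_secs - first_half_const_secs    # 180 secs
second_half_cost = original_secs - second_half_const_secs  # 20 secs

# The first half dominates, so focus optimization there.
print(first_half_cost, second_half_cost)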
CHAPTER 6
Optimizing Transformations
This chapter includes the following topics:
♦ Optimizing Aggregator Transformations, 25
♦ Optimizing Custom Transformations, 26
♦ Optimizing Joiner Transformations, 27
♦ Optimizing Lookup Transformations, 27
♦ Optimizing Sequence Generator Transformations, 30
♦ Optimizing Sorter Transformations, 30
♦ Optimizing Source Qualifier Transformations, 31
♦ Optimizing SQL Transformations, 31
♦ Eliminating Transformation Errors, 32
Using Sorted Input
To increase session performance, sort data for the Aggregator transformation. Use the Sorted Input option to
sort data.
The Sorted Input option decreases the use of aggregate caches. When you use the Sorted Input option, the
Integration Service assumes all data is sorted by group. As the Integration Service reads rows for a group, it
performs aggregate calculations. When necessary, it stores group information in memory.
The Sorted Input option reduces the amount of data cached during the session and improves performance. Use
this option with the Source Qualifier Number of Sorted Ports option or a Sorter transformation to pass sorted
data to the Aggregator transformation.
You can increase performance when you use the Sorted Input option in sessions with multiple partitions.
RELATED TOPICS:
“Increasing the Cache Sizes” on page 37
RELATED TOPICS:
“Caches” on page 36
Types of Caches
Use the following types of caches to increase performance:
♦ Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed
cache between transformations in the same mapping. You can share a named cache between transformations
in the same or different mappings.
♦ Persistent cache. To save and reuse the cache files, you can configure the transformation to use a persistent
cache. Use this feature when you know the lookup table does not change between session runs. Using a
persistent cache can improve performance because the Integration Service builds the memory cache from the
cache files instead of from the database.
The Lookup transformation includes three lookup ports used in the mapping, ITEM_ID, ITEM_NAME, and
PRICE. When you enter the ORDER BY statement, enter the columns in the same order as the ports in the
lookup condition. You must also enclose all database reserved words in quotes.
Enter the following lookup query in the lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY
ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
Place two dashes (--) at the end of the override to suppress the ORDER BY clause that the Integration Service
generates.
RELATED TOPICS:
“Optimizing Expressions” on page 21
RELATED TOPICS:
“Optimizing Sequence Generator Transformations” on page 45
Optimizing Sorter Transformations
Allocating Memory
For optimal performance, configure the Sorter cache size with a value less than or equal to the amount of
available physical RAM on the Integration Service node. Allocate at least 16 MB of physical memory to sort
data using the Sorter transformation. The Sorter cache size is set to 16,777,216 bytes by default. If the
Integration Service cannot allocate enough memory to sort data, it fails the session.
RELATED TOPICS:
“Error Tracing” on page 38
CHAPTER 7
Optimizing Sessions
This chapter includes the following topics:
♦ Grid, 33
♦ Pushdown Optimization, 34
♦ Concurrent Sessions and Workflows, 34
♦ Buffer Memory, 34
♦ Caches, 36
♦ Target-Based Commit, 37
♦ Real-time Processing, 38
♦ Staging Areas, 38
♦ Log Files, 38
♦ Error Tracing, 38
♦ Post-Session Emails, 39
Grid
You can use a grid to increase session and workflow performance. A grid is an alias assigned to a group of nodes
that allows you to automate the distribution of workflows and sessions across nodes.
When you use a grid, the Integration Service distributes workflow tasks and session threads across multiple
nodes. A Load Balancer distributes tasks to nodes without overloading any node. Running workflows and
sessions on the nodes of a grid provides the following performance gains:
♦ Balances the Integration Service workload.
♦ Processes concurrent sessions faster.
♦ Processes partitions faster.
The Integration Service requires CPU resources for parsing input data and formatting the output data. A grid
can improve performance when you have a performance bottleneck in the extract and load steps of a session.
A grid can improve performance when memory or temporary storage is a performance bottleneck. When a
PowerCenter mapping contains a transformation that has cache memory, deploying adequate memory and
separate disk storage for each cache instance improves performance.
Running a session on a grid can improve throughput because the grid provides more resources to run the
session. Performance improves when you run a few sessions on the grid at a time. Running a session on a grid is
more efficient than running a workflow over a grid if the number of concurrent session partitions is less than
the number of nodes.
When you run multiple sessions on a grid, session subtasks share node resources with subtasks of other
concurrent sessions. Running a session on a grid requires coordination between processes running on different
nodes. For some mappings, running a session on a grid requires additional overhead to move data from one
node to another node. In addition to loading the memory and CPU resources on each node, running multiple
sessions on a grid adds to network traffic.
When you run a workflow on a grid, the Integration Service loads memory and CPU resources on nodes
without requiring coordination between the nodes.
RELATED TOPICS:
“Optimizing Grid Deployments” on page 41
Pushdown Optimization
To increase session performance, push transformation logic to the source or target database. Based on the
mapping and session configuration, the Integration Service executes SQL against the source or target database
instead of processing the transformation logic within the Integration Service.
Buffer Memory
When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data.
The Integration Service allocates at least two blocks for each source and target partition. Sessions that use a
large number of sources and targets might require additional memory blocks. If the Integration Service cannot
allocate enough memory blocks to hold the data, it fails the session.
You can configure the amount of buffer memory, or you can configure the Integration Service to calculate
buffer settings at run time. For information, see the PowerCenter Advanced Workflow Guide.
To increase the number of available memory blocks, adjust the following session properties:
♦ DTM Buffer Size. Increase the DTM buffer size on the Properties tab in the session properties.
♦ Default Buffer Block Size. Decrease the buffer block size on the Config Object tab in the session properties.
Before you configure these settings, determine the number of memory blocks the Integration Service requires to
initialize the session. Then, based on default settings, calculate the buffer size and the buffer block size to create
the required number of session blocks.
If you have XML sources or targets in a mapping, use the number of groups in the XML source or target in the
calculation for the total number of sources and targets.
For example, you create a session that contains a single partition, using a mapping that contains 50 sources and
50 targets. You make the following calculations:
1. You determine that the session requires a minimum of 200 memory blocks:
[(total number of sources + total number of targets) * 2] = (session buffer blocks)
100 * 2 = 200
2. Based on default settings, you determine that you can change the DTM Buffer Size to 15,000,000, or you
can change the Default Buffer Block Size to 54,000:
(session buffer blocks) = (0.9) * (DTM Buffer Size) / (Default Buffer Block Size) *
(number of partitions)
200 = 0.9 * 14222222 / 64000 * 1
or
200 = 0.9 * 12000000 / 54000 * 1
Note: For a session that contains n partitions, set the DTM Buffer Size to at least n times the value for the
session with one partition. The Log Manager writes a warning message in the session log if the number of
memory blocks is so small that it causes performance degradation. The Log Manager writes this warning
message even if the number of memory blocks is enough for the session to run successfully. The warning
message also gives a suggestion for the proper value.
If you modify the DTM Buffer Size, increase the property by multiples of the buffer block size.
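A minimal sketch of the step 2 arithmetic, assuming the 0.9 factor and the default 64,000-byte block size shown above; the helper is illustrative, not a PowerCenter API:

DEFAULT_BLOCK_SIZE = 64000  # bytes

def required_dtm_buffer_size(required_blocks, block_size=DEFAULT_BLOCK_SIZE, partitions=1):
    # Invert: (session buffer blocks) = 0.9 * (DTM Buffer Size) / (block size) * (partitions).
    return required_blocks * block_size / (0.9 * partitions)

# The example session needs 200 blocks with one partition:
print(required_dtm_buffer_size(200))  # 14222222.2 -> round up to, say, 15,000,000
print(0.9 * 14222222 / 64000 * 1)     # ~200 blocks, matching the formula above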
Optimizing the Buffer Block Size
If you are manipulating unusually large rows of data, increase the buffer block size to improve performance. To
determine the size of the largest row, calculate the total precision of the sources and targets:
1. In the Mapping Designer, open the mapping for the session.
2. Open the target instance.
3. Click the Ports tab.
4. Add the precision for all columns in the target.
5. If you have more than one target in the mapping, repeat steps 2 to 4 for each additional target to calculate
the precision for each target.
6. Repeat steps 2 to 5 for each source definition in the mapping.
7. Choose the largest precision of all the source and target precisions for the total precision in the buffer block
size calculation.
The total precision represents the total bytes needed to move the largest row of data. For example, if the total
precision equals 33,000, then the Integration Service requires 33,000 bytes in the buffers to move that row. If
the buffer block size is 64,000 bytes, the Integration Service can move only one row at a time.
Ideally, a buffer accommodates at least 100 rows at a time. So if the total precision is greater than 32,000,
increase the size of the buffers to improve performance.
To increase the buffer block size, open the session properties and click the Config Object tab. Edit the Default
Buffer Block Size property in the Advanced settings.
Increase the DTM buffer block setting in relation to the size of the rows. As with DTM buffer memory
allocation, increasing buffer block size should improve performance. If you do not see an increase, buffer block
size is not a factor in session performance.
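To see why a larger block helps, compare the rows that fit in one block before and after resizing. A small illustrative sketch using the 33,000-byte row from the example above:

def rows_per_block(block_size_bytes, total_precision_bytes):
    # Rows the Integration Service can move per buffer block.
    return block_size_bytes // total_precision_bytes

total_precision = 33000  # largest row, in bytes

print(rows_per_block(64000, total_precision))                  # 1 row per default-sized block
print(rows_per_block(100 * total_precision, total_precision))  # 100 rows per block

# A block of 100 * 33,000 = 3,300,000 bytes holds the recommended 100 rows.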
Caches
The Integration Service uses the index and data caches for XML targets and Aggregator, Rank, Lookup, and
Joiner transformations. The Integration Service stores transformed data in the data cache before returning it to
the pipeline. It stores group information in the index cache. Also, the Integration Service uses a cache to store
data for Sorter transformations.
To configure the amount of cache memory, use the cache calculator or specify the cache size. You can also
configure the Integration Service to calculate cache memory settings at run time. For more information, see the
PowerCenter Advanced Workflow Guide.
If the allocated cache is not large enough to store the data, the Integration Service stores the data in a temporary
disk file, a cache file, as it processes the session data. Performance slows each time the Integration Service pages
to a temporary file. Examine the performance counters to determine how often the Integration Service pages to
a file.
Perform the following tasks to optimize caches:
♦ Limit the number of connected input/output and output-only ports.
♦ Select the optimal cache directory location.
♦ Increase the cache sizes.
♦ Use the 64-bit version of PowerCenter to run large cache sessions.
Target-Based Commit
The commit interval setting determines the point at which the Integration Service commits data to the targets.
Each time the Integration Service commits, performance slows. Therefore, the smaller the commit interval, the
more often the Integration Service writes to the target database, and the slower the overall performance.
If you increase the commit interval, the number of times the Integration Service commits decreases and
performance improves.
When you increase the commit interval, consider the log file limits in the target database. If the commit interval
is too high, the Integration Service may fill the database log file and cause the session to fail.
Therefore, weigh the benefit of increasing the commit interval against the additional time you would spend
recovering a failed session.
Click the General Options settings in the session properties to review and adjust the commit interval.
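For a sense of scale, consider a hypothetical ten-million-row load; the figures below are illustrative, not from the guide:

rows = 10_000_000

for interval in (10_000, 50_000, 100_000):
    commits = rows // interval
    print(f"commit interval {interval}: about {commits} commits")

# Raising the interval from 10,000 to 100,000 cuts the commits from 1,000 to 100,
# at the cost of more database log space per commit and a longer recovery time.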
Real-time Processing
Flush Latency
Flush latency determines how often the Integration Service flushes real-time data from the source. The lower
you set the flush latency interval, the more frequently the Integration Service commits messages to the target.
Each time the Integration Service commits messages to the target, the session consumes more resources and
throughput drops.
Increase the flush latency to improve throughput. Throughput increases as you increase the flush latency up to
a certain threshold, depending on the hardware and available resources.
Source-Based Commit
Source-based commit interval determines how often the Integration Service commits real-time data to the
target. To obtain the fastest latency, set the source-based commit to 1.
Staging Areas
When you use a staging area, the Integration Service performs multiple passes on the data. When possible,
remove staging areas to improve performance. The Integration Service can read multiple sources with a single
pass, which can alleviate the need for staging areas.
RELATED TOPICS:
“Configuring Single-Pass Reading” on page 20
Log Files
A workflow runs faster when you do not configure it to write session and workflow log files. Workflows and
sessions always create binary logs. When you configure a session or workflow to write a log file, the Integration
Service writes logging events twice. You can access the binary session and workflow logs in the Administration
Console.
Error Tracing
To improve performance, reduce the number of log events generated by the Integration Service when it runs the
session. If a session contains a large number of transformation errors, and you do not need to correct them, set
the session tracing level to Terse. At this tracing level, the Integration Service does not write error messages or
row-level information for reject data.
If you need to debug the mapping and you set the tracing level to Verbose, you may experience significant
performance degradation when you run the session. Do not use Verbose tracing when you tune performance.
Post-Session Emails
When you attach the session log to a post-session email, enable flat file logging. If you enable flat file logging,
the Integration Service gets the session log file from disk. If you do not enable flat file logging, the Integration
Service gets the log events from the Log Manager and generates the session log file to attach to the email. When
the Integration Service retrieves the session log from the log service, workflow performance slows, especially
when the session log file is large and the log service runs on a different node than the master DTM. For optimal
performance, configure the session to write to a log file when you configure post-session email to attach a session
log.
CHAPTER 8
Optimizing Grid Deployments
Overview
When you run PowerCenter on a grid, you can configure the grid, sessions, and workflows to use resources
efficiently and maximize scalability.
To improve PowerCenter performance on a grid, complete the following tasks:
♦ Add nodes to the grid.
♦ Increase storage capacity and bandwidth.
♦ Use shared file systems.
♦ Use a high-throughput network when you complete the following tasks:
− Access sources and targets over the network.
− Transfer data between nodes of a grid when using the Session on Grid option.
Storing Files
When you configure PowerCenter to run on a grid, you specify the storage location for different types of session
files, such as source files, log files, and cache files. To improve performance, store files in optimal locations. For
example, store persistent cache files on a high-bandwidth shared file system. Different types of files have
different storage requirements.
You can store files in the following types of locations:
♦ Shared file systems. Store files on a shared file system to enable all Integration Service processes to access the
same files. You can store files on low-bandwidth and high-bandwidth shared file systems.
♦ Local. Store files on the local machine running the Integration Service process when the files do not have to
be accessed by other Integration Service processes.
Proper configuration and tuning can be critical for small grid performance. You can also configure mappings
and sessions to avoid the intrinsic limitations of shared file systems.
RELATED TOPICS:
“Distributing Files Across File Systems” on page 44
Examples
In the following excerpt of a raw parameter file, the placeholder “{fs}” represents the file system where the
directory is located and must be assigned by a script before being used:
[SessionFFSrc_FFTgt_CA]
$InputFile_driverInfo_CA={fs}/driverinfo_ca.dat
$SubDir_processed_CA={fs}
# Session has Output file directory set to:
# $PmTargetFileDir/$SubDir_processed_CA
# This file is the input of SessionFFSrc_DBTgt_CA.
$SubDir_RecordLkup_Cache_CA={fs}
# This session builds this persistent lookup cache to be used
# by SessionFFSrc_DBTgt_CA.
# The Lookup cache directory name in the session is set to:
# $PmCacheDir/$SubDir_RecordLkup_Cache_CA
[SessionFFSrc_FFTgt_NY]
$InputFile_driverInfo_NY={fs}/driverinfo_ny.dat
$SubDir_processed_NY={fs}
[SessionFFSrc_DBTgt_CA]
$SubDir_processed_CA={fs}
# session has Source file directory set to:
# $PmTargetFileDir/$SubDir_processed_CA
$SubDir_RecordLkup_Cache_CA={fs}
# Use the persistent lookup cache built in SessionFFSrc_FFTgt_CA.
In the following parameter file excerpt, a script has replaced the placeholder with the appropriate file system
names, such as file_system_1 and file_system_2:
[SessionFFSrc_FFTgt_CA]
$InputFile_driverInfo_CA=file_system_1/driverinfo_ca.dat
$SubDir_processed_CA=file_system_2
# Session has Output file directory set to:
# $PmTargetFileDir/$SubDir_processed_CA
# This file is the input of SessionFFSrc_DBTgt_CA.
$SubDir_RecordLkup_Cache_CA=file_system_1
# This session builds this persistent lookup cache to be used
# by SessionFFSrc_DBTgt_CA.
# The Lookup cache directory name in the session is set to:
# $PmCacheDir/$SubDir_RecordLkup_Cache_CA
[SessionFFSrc_FFTgt_NY]
$InputFile_driverInfo_NY=file_system_2/driverinfo_ny.dat
$SubDir_processed_NY=file_system_1
[SessionFFSrc_DBTgt_CA]
$SubDir_processed_CA=file_system_2
# session has Source file directory set to:
# $PmTargetFileDir/$SubDir_processed_CA
$SubDir_RecordLkup_Cache_CA=file_system_1
# Use the persistent lookup cache built in SessionFFSrc_FFTgt_CA.
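The guide does not show the assignment script itself. The following is a minimal sketch of one possible script, assuming two file system names and a round-robin assignment per parameter name, so that a parameter such as $SubDir_processed_CA resolves to the same file system in every session section that shares it:

import itertools

FILE_SYSTEMS = ["file_system_1", "file_system_2"]  # assumed names

def assign_file_systems(raw_text):
    # Resolve each "{fs}" placeholder in a raw parameter file.
    fs_cycle = itertools.cycle(FILE_SYSTEMS)
    assigned = {}  # parameter name -> file system
    resolved_lines = []
    for line in raw_text.splitlines():
        if "{fs}" in line and "=" in line:
            name = line.split("=", 1)[0]
            if name not in assigned:
                assigned[name] = next(fs_cycle)
            line = line.replace("{fs}", assigned[name])
        resolved_lines.append(line)
    return "\n".join(resolved_lines)

# Hypothetical input and output paths:
with open("raw_param_file.txt") as f:
    resolved = assign_file_systems(f.read())
with open("param_file.txt", "w") as f:
    f.write(resolved)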
CHAPTER 9
Optimizing the PowerCenter Components
Overview
You can optimize performance of the following PowerCenter components:
♦ PowerCenter repository
♦ Integration Service
If you run PowerCenter on multiple machines, run the Repository Service and Integration Service on different
machines. To load large amounts of data, run the Integration Service on the machine with more processing power.
Also, run the Repository Service on the machine hosting the PowerCenter repository.
Location of the Repository Service Process and Repository
You can optimize the performance of a Repository Service that you configured without the high availability
option. To optimize performance, ensure that the Repository Service process runs on the same machine where
the repository database resides.
CHAPTER 10
Optimizing the System
Overview
Often performance slows because the session relies on inefficient connections or an overloaded Integration
Service process system. System delays can also be caused by routers, switches, network protocols, and usage by
many users.
Slow disk access on source and target databases, source and target file systems, and nodes in the domain can
slow session performance. Have the system administrator evaluate the hard disks on the machines.
After you determine from the system monitoring tools that you have a system bottleneck, make the following
global changes to improve the performance of all sessions:
♦ Improve network speed. Slow network connections can slow session performance. Have the system
administrator determine if the network runs at an optimal speed. Decrease the number of network hops
between the Integration Service process and databases. For more information, see “Improving Network
Speed” on page 52.
♦ Use multiple CPUs. You can use multiple CPUs to run multiple sessions in parallel and run multiple
pipeline partitions in parallel. For more information, see “Using Multiple CPUs” on page 52.
♦ Reduce paging. When an operating system runs out of physical memory, it starts paging to disk to free
physical memory. Configure the physical memory for the Integration Service process machine to minimize
paging to disk. For more information, see “Reducing Paging” on page 52.
♦ Use processor binding. In a multi-processor UNIX environment, the Integration Service may use a large
amount of system resources. Use processor binding to control processor usage by the Integration Service
process. Also, if the source and target database are on the same machine, use processor binding to limit the
resources used by the database. For more information, see “Using Processor Binding” on page 52.
Improving Network Speed
The performance of the Integration Service is related to network connections. A local disk can move data 5 to
20 times faster than a network. Consider the following options to minimize network activity and to improve
Integration Service performance.
If you use flat file as a source or target in a session and the Integration Service runs on a single node, store the
files on the same machine as the Integration Service to improve performance. When you store flat files on a
machine other than the Integration Service, session performance becomes dependent on the performance of the
network connections. Moving the files onto the Integration Service process system and adding disk space might
improve performance.
If you use relational source or target databases, try to minimize the number of network hops between the source
and target databases and the Integration Service process. Moving the target database onto a server system might
improve Integration Service performance.
When you run sessions that contain multiple partitions, have the network administrator analyze the network
and make sure it has enough bandwidth to handle the data moving across the network from all partitions.
Reducing Paging
Paging occurs when the Integration Service process operating system runs out of memory for a particular
operation and uses the local disk for memory. You can free up more memory or increase physical memory to
reduce paging and the slow performance that results from paging. Monitor paging activity using system tools.
You might want to increase system memory in the following circumstances:
♦ You run a session that uses large cached lookups.
♦ You run a session with many partitions.
If you cannot free up memory, you might want to add memory to the system.
CHAPTER 11
Using Pipeline Partitions
Overview
After you tune the application, databases, and system for maximum single-partition performance, you may find
that the system is under-utilized. At this point, you can configure the session to have two or more partitions.
You can use pipeline partitioning to improve session performance. Increasing the number of partitions or
partition points increases the number of threads. Therefore, increasing the number of partitions or partition
points also increases the load on the nodes in the Integration Service. If the Integration Service node or nodes
contain ample CPU bandwidth, processing rows of data in a session concurrently can increase session
performance.
Note: If you use a single-node Integration Service and you create a large number of partitions or partition points
in a session that processes large amounts of data, you can overload the system.
If you have the partitioning option, perform the following tasks to manually set up partitions:
♦ Increase the number of partitions.
♦ Select the best performing partition types at particular points in a pipeline.
♦ Use multiple CPUs.
Use the following tips when you add partitions to a session:
♦ Add one partition at a time. To best monitor performance, add one partition at a time, and note the session
settings before you add each partition.
♦ Set DTM Buffer Memory. When you increase the number of partitions, increase the DTM buffer size. If
the session contains n partitions, increase the DTM buffer size to at least n times the value for the session
with one partition, as worked through in the sketch after this list. For more information about DTM buffer
memory, see “Buffer Memory” on page 34.
♦ Set cached values for Sequence Generator. If a session has n partitions, you should not need to use the
“Number of Cached Values” property for the Sequence Generator transformation. If you do set this property
to a value greater than 0, make sure it is at least n times the original value for the session with one partition.
♦ Partition the source data evenly. Configure each partition to extract the same number of rows.
♦ Monitor the system while running the session. If CPU cycles are available, you can add a partition to
improve performance. For example, you may have CPU cycles available if the system has 20 percent idle
time.
♦ Monitor the system after adding a partition. If the CPU utilization does not go up, the wait for I/O time
goes up, or the total data transformation rate goes down, then there is probably a hardware or software
bottleneck. If the wait for I/O time goes up by a significant amount, then check the system for hardware
bottlenecks. Otherwise, check the database configuration.
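The following minimal sketch works through the DTM buffer and Sequence Generator scale-up rules from the
tips above, using hypothetical baseline values measured from the single-partition session.

    partitions = 4                      # the session now has n = 4 partitions

    baseline_dtm_buffer_mb = 12         # DTM buffer size with one partition
    baseline_cached_values = 1000       # Number of Cached Values with one partition

    # Increase the DTM buffer size to at least n times the single-partition value.
    dtm_buffer_mb = partitions * baseline_dtm_buffer_mb

    # If Number of Cached Values is set above 0, make it at least n times the
    # single-partition value as well.
    cached_values = partitions * baseline_cached_values

    print(dtm_buffer_mb)    # 48 MB
    print(cached_values)    # 4000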
Performance Counters
This appendix includes the following topics:
♦ Overview, 59
♦ Errorrows Counter, 59
♦ Readfromdisk and Writetodisk Counters, 60
♦ Rowsinlookupcache Counter, 61
Overview
All transformations have counters. The Integration Service tracks the number of input rows, output rows, and
error rows for each transformation. Some transformations have performance counters. You can use the
following performance counters to increase session performance:
♦ Errorrows
♦ Readfromcache and Writetocache
♦ Readfromdisk and Writetodisk
♦ Rowsinlookupcache
Errorrows Counter
Transformation errors impact session performance. If a transformation has large numbers of error rows in any
of the Transformation_errorrows counters, you can eliminate the errors to improve performance.
RELATED TOPICS:
“Eliminating Transformation Errors” on page 32
Readfromdisk and Writetodisk Counters
If a session contains Aggregator, Rank, or Joiner transformations, examine the Transformation_readfromdisk and
Transformation_writetodisk counters to analyze how the Integration Service reads from or writes to disk. To
view the session performance details while the session runs, right-click the session in the Workflow Monitor and
choose Properties. Click the Properties tab in the details dialog box.
To analyze the disk access, first calculate the hit or miss ratio. The hit ratio reflects how many read and write
operations the Integration Service performs against the memory cache; the miss ratio reflects how many it
performs against the disk.
Use the following formula to calculate the cache miss ratio:
[(# of reads from disk) + (# of writes to disk)]/[(# of reads from memory cache) + (# of
writes to memory cache)]
To minimize reads and writes to disk, increase the cache size. The optimal miss ratio is 0, which means the
Integration Service performs all cache reads and writes in memory.
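As an illustration, the sketch below applies the formula to hypothetical counter values copied from a session's
performance details. The 0.1 threshold is an arbitrary example, not a product default.

    def cache_miss_ratio(reads_disk, writes_disk, reads_cache, writes_cache):
        # Disk operations divided by memory-cache operations, per the formula above.
        # Lower is better; 0 means every read and write was served from memory.
        return (reads_disk + writes_disk) / (reads_cache + writes_cache)

    ratio = cache_miss_ratio(reads_disk=1500, writes_disk=500,
                             reads_cache=90000, writes_cache=10000)
    print(round(ratio, 3))          # 0.02, about two disk operations per hundred

    if ratio > 0.1:                 # arbitrary example threshold
        print("Consider increasing the cache size for this transformation.")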
Rowsinlookupcache Counter
Multiple lookups can decrease session performance. To improve session performance, tune the lookup
expressions for the larger lookup tables.
RELATED TOPICS:
“Optimizing Multiple Lookups” on page 30
NOTICES
This Informatica product (the “Software”) includes certain drivers (the “DataDirect Drivers”) from DataDirect Technologies, an operating company of Progress Software Corporation (“DataDirect”)
which are subject to the following terms and conditions:
1. THE DATADIRECT DRIVERS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN
ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY,
NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.