Informatica Performance Optimization Techniques.

Step 1. Identifying the Bottlenecks


Performance bottlenecks can occur in the source and target, the mapping, the
session, and the system. Identifying and eliminating bottlenecks is an iterative
process that should be repeated until an acceptable throughput is achieved.

When a PowerCenter session is triggered, the Integration Service starts the Data
Transformation Manager (DTM), which is responsible for starting the reader,
transformation, and writer threads. The thread statistics in the session log give
run-time information for all three threads and provide enough detail to understand
and pinpoint the performance bottleneck.

- Run Time: Amount of time the thread runs.
- Idle Time: Amount of time the thread is idle, including time spent waiting for
other threads to finish processing.
- Busy Time: Percentage of the run time: (run time - idle time) / run time x 100.
- Thread Work Time: The percentage of time taken to process each transformation in
a thread.

The thread with the highest busy percentage is the bottleneck.
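
As a hypothetical illustration, if the session log shows the reader thread 90%
idle, the transformation thread 95% busy, and the writer thread 85% idle, the
transformation thread is the bottleneck, and tuning effort should focus on the
transformations in the mapping.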

All transformations have counters to help measure and improve their performance.
Analyzing these performance details can help to identify session bottlenecks. The
Integration Service tracks the number of input rows, output rows, and error rows
for each transformation.
One can set up the session to gather performance counters in the Workflow Manager.

To increase session performance, one can use the following performance counters:

Readfromcache and Writetocache Counters & Readfromdisk and Writetodisk Counters:


If a session contains Aggregator, Rank, or Joiner transformations, examine the
Transformation_readfromcache and Transformation_writetocache counters along with
the Transformation_readfromdisk and Transformation_writetodisk counters to analyze
how the Integration Service reads from or writes to disk. If these counters display
any number other than zero, you can increase the cache sizes to improve session
performance.
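
As a hypothetical illustration, if the performance details for an Aggregator
transformation named AGGTRANS show nonzero AGGTRANS_readfromdisk and
AGGTRANS_writetodisk values, the aggregate cache is too small to hold the data in
memory and is spilling to disk; increasing the aggregator cache size in the session
properties should bring these counters back to zero.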

To view the performance details file:


1. Locate the performance details file. The Integration Service names the file
session_name.perf and stores it in the same directory as the session log.
2. Open the file in any text editor.

You can view the Integration Service properties in the Workflow Monitor to see
CPU, memory, and swap system usage when you are running task processes on the
Integration Service. Use the following Integration Service properties to identify
performance issues:

CPU%: The percentage of CPU usage includes other external tasks running on the
system. High CPU usage indicates that the server needs additional processing power.

Memory Usage: The percentage of memory usage includes other external tasks running
on the system. If the memory usage is close to 95%, check whether the tasks running
on the system are using the amount indicated in the Workflow Monitor, or whether
there is a memory leak. To troubleshoot, use system tools to check the memory usage
before and after running the session, and then compare the results to the memory
usage while the session runs.

Swap Usage: Swap usage is a result of paging, due to possible memory leaks or a
high number of concurrent tasks.
------------------------------------------------------------------------------------

Step 2. Resolving the Performance Bottlenecks

1. Optimizing Sources and Targets

To achieve good performance, we first need to ensure that there are no bottlenecks
at the source or target. To make sure we are utilizing the database to an optimum
level, consider the following points:
Load Balancing:
Perform reading/writing/sorting/grouping/filtering of data in the database. Use
Informatica for the more complex logic: outer joins, data integration, multiple
source feeds, etc. The balancing act is difficult without DBA knowledge. In order
to reach a balance, you must be able to recognize which operations are best
performed in the database and which are best performed in Informatica. This does
not degrade the use of the ETL tool, but enhances it - it's a must if you are
performance tuning for high-volume throughput.
Estimating size of data:
Don't be afraid to estimate: small, medium, large, and extra-large source data set
sizes (in terms of number of rows and average number of bytes per row), expected
throughput for each, and turnaround time for the load. Give this information to
your DBAs and ask them to tune the database for the "worst case". Help them assess
which tables are expected to be high read/high write, and which operations will
sort, order, etc. Moving disks or assigning the right table to the right disk space
could make all the difference. Utilize a Perl script to generate "fake" data for
small, medium, large, and extra-large data sets. Run each of these through your
mappings - this way, the DBA can watch or monitor throughput as a real load size
occurs.
Optimizing query:
If a session joins multiple source tables in one Source Qualifier, you might be
able to improve performance by tuning the query with optimizer hints. The database
optimizer usually determines the most efficient way to process the source data, but
you might know properties of the source tables that the optimizer does not. The
database administrator can create optimizer hints to tell the database how to
execute the query for a particular set of source tables.
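
As a minimal sketch, assuming two hypothetical source tables ORDERS and CUSTOMERS
joined in a single Source Qualifier, an Oracle-style hint placed in the SQL
override could look like this:

-- SQL override for the Source Qualifier; the /*+ ... */ comment is an
-- Oracle optimizer hint asking for a hash join between the two tables.
SELECT /*+ USE_HASH(o c) */
       o.order_id,
       o.order_date,
       c.customer_name
FROM   orders o
JOIN   customers c ON o.customer_id = c.customer_id;

Whether a hint actually helps depends on the data volumes and indexes involved, so
verify the execution plan with your DBA before relying on it.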
Using Bulk Loads:
Use the Bulk Load session property to insert large amounts of data into a DB2,
Sybase ASE, Oracle, or Microsoft SQL Server database. Bulk loading bypasses the
database log, which speeds up performance.
Using External Loaders :
To increase session performance, configure PowerCenter to use an external loader
utility.
External loaders can be used for Oracle, DB2, Sybase and Teradata.
Dropping Indexes and Key Constraints:
When you define key constraints or indexes on target tables, you slow down the
loading of data into those tables. To improve performance, drop the indexes and
key constraints before you run the session, and rebuild them after the session
completes, for example in the Post SQL of the session.
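
A minimal sketch, assuming a hypothetical target table SALES_FACT with one index
IDX_SALES_CUST:

-- Pre SQL: drop the index so the load is not slowed down by index maintenance.
DROP INDEX idx_sales_cust;

-- Post SQL: rebuild the index once the session has finished loading.
CREATE INDEX idx_sales_cust ON sales_fact (customer_id);

Key constraints can be handled the same way with ALTER TABLE ... DROP CONSTRAINT
and ALTER TABLE ... ADD CONSTRAINT statements.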
Increasing Database Network Packet Size:
If you read from Oracle, Sybase ASE, or Microsoft SQL Server sources, you can
improve
performance by increasing the network packet size. Increase the network packet size
to allow
larger packets of data to cross the network at the same time.
Localization:
Localize stored procedures, functions, views, and sequences in the source database,
and avoid synonyms. This can affect performance by a factor of three or more. If
you are reading from a flat file, make sure the file is copied to the Informatica
Server machine before reading data from it.
Avoiding DB Level Sequences:
Avoid database-level sequence generators. If you absolutely must have a shared
sequence generator, build a staging table from the flat file, add a SEQUENCE ID
column, and call a post target load stored procedure to populate that column.
Place the post target load procedure in the flat-file-to-staging-table load
mapping. A single call into the database, followed by a batch operation to assign
sequences, is the fastest method for utilizing shared sequence generators.
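
As a minimal sketch, assuming a hypothetical Oracle staging table STG_ORDERS with
a SEQ_ID column and a shared sequence ORDERS_SEQ, the post target load procedure
could look like this:

-- Called once from the session's post target load; assigns sequence
-- values to all newly loaded rows in a single batch operation.
CREATE OR REPLACE PROCEDURE assign_order_seq AS
BEGIN
  UPDATE stg_orders
     SET seq_id = orders_seq.NEXTVAL
   WHERE seq_id IS NULL;
END;
/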
Minimizing Deadlocks:
Encountering deadlocks can slow session performance. You can increase the number of
target
connection groups in a session to avoid deadlocks. To use a different target
connection group
for each target in a session, use a different database connection name for each
target instance.
Increasing Database Checkpoint Intervals:
Integration Service performance slows down each time the service waits for the
database to perform a checkpoint. To decrease the number of checkpoints and
increase performance, increase the checkpoint interval in the database.
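
As a hypothetical Oracle-flavored example (parameter names and units vary by
database, so confirm them with your DBA):

-- Checkpoint only after 500,000 OS blocks of redo have been written;
-- fewer checkpoints means less waiting during heavy loads.
ALTER SYSTEM SET LOG_CHECKPOINT_INTERVAL = 500000;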
2. Optimizing Buffer Block Size

When the Integration Service initializes a session, it allocates blocks of memory
to hold source and target data. Sessions that use a large number of sources and
targets might require additional memory blocks. Not having enough buffer memory
for DTM processes can slow down reading, transforming, or writing, and cause large
fluctuations in performance; adding extra memory blocks can keep the threads busy
and improve session performance. You can do this by adjusting the buffer block
size and the DTM Buffer size.

The DTM Buffer is the temporary storage area for data, divided into blocks. Buffer
size and block size are tunable session properties; the default value for both is
Auto.
To identify the optimal buffer block size, sum up the precision of the individual
columns of each source and target to get its row precision. The largest row
precision among them is the space one row needs in a buffer block. Ideally, a
buffer block should accommodate at least 100 rows at a time:
Buffer Block Size = Largest Row Precision * 100
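As a hypothetical worked example, if the widest source or target row has a
precision of 1,024 bytes, the buffer block size would be 1,024 * 100 = 102,400
bytes (roughly 100 KB).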
You can change the buffer block size in the session configuration.

When you increase the DTM Buffer memory, the Integration Service creates more
buffer blocks,
which improves performance. You can identify the required DTM Buffer size based on
this
calculation:
Session Buffer Blocks = (total number of sources + total number of targets) * 2
DTM Buffer size = Session Buffer Blocks * Buffer Block size / 0.9
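Continuing the hypothetical example above, a mapping with two sources and two
targets would need (2 + 2) * 2 = 8 session buffer blocks, giving a DTM Buffer size
of 8 * 102,400 / 0.9, or roughly 910 KB.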
You can change the DTM Buffer size in the session configuration.
