Performance Tuning Techniques For Handling High Volume of Data in Informatica
Whitepaper
Table of Contents

1. Overview
2. Scenario
3. Architectural Overview
4. Steps involved in Performance Tuning
   4.1 Database Level Tuning
   4.2 Informatica Level Tuning
   4.3 Balancing the Constraints
5. Iterative Testing
   5.1 Test Results
1. Overview
Data is - and has been from the beginning - created, stored and retrieved by disparate, incompatible systems. Between 30% and 35% of all the data in the industry is still on mainframes, in languages and data structures that are archaic and generally unavailable. The wave of specialty applications such as HR, Sales, Accounting, ERP and Manufacturing has contributed its share to the chaos. The latest (and perhaps most disruptive) development is the growth in outsourcing in all its kinds and flavors, including business process, IT management and geographic outsourcing, which adds even more complexity to the mix.
When it comes to data integration, there are three perspectives that matter. First, business users need an accurate and holistic view of all their information. Second, IT management needs to do more with less even though data volumes are increasing dramatically. Finally, IT developers need to reduce time to results.
Informatica provides the market's leading data integration platform. ETL mappings are designed to load data into the data warehouse environment for better reporting, which in turn helps the business understand its trends much more clearly. The major problem faced by anyone working with Informatica ETL is designing mappings that do not compromise on performance. Most of the time we end up creating a mapping that achieves the required functionality but suffers in terms of performance.
This paper walks you through the process of achieving good performance when designing new Informatica mappings and when fine-tuning existing Informatica ETL loads. After going through this paper you will have a fair knowledge of the performance break points to consider while designing a mapping or fine-tuning an existing one.
In general, Informatica works in synchronization with its sources. In order to achieve good performance, tuning has to be carried out at two levels, namely the DB level and the Informatica level.

DB Level Tuning
To achieve good performance, we first need to ensure that all the bottlenecks on the DB side are removed, so that the sources stay in sync and the source system is fully utilized. The table below (Table 1) gives the details of how to overcome the DB-side bottlenecks and the results achieved.
Table 1

S.No: 1
How: Removal of unwanted fields
Why: The Source Qualifier query should have a SELECT statement with only the fields that are required and which get loaded into the target table.
Results: This will bring down the running time of the SQ queries, which means the data will be fetched quickly from the database.
We also need to establish a balance between these two levels by shifting constraints from one level to the other as needed, and to adopt the right method for testing the performance. Here we adopt the Rapid Evaluation and Iterative Testing (RITE) method for testing the performance improvement.
2. Scenario
This whitepaper takes as its example an application called Service Contracts. This application had severe performance issues, with its load running for over 48 hours and still not completing. The idea was to improve the performance of the load by fine-tuning the mappings and the SQ queries involved in the load, thereby removing any bottlenecks present, so that the load completes smoothly in an optimal time.

This was achieved by trying various combinations of DB and transformation changes made to the mapping and by iterative testing. The performance tuning methods mentioned below can be applied to virtually any application that suffers from performance issues.
3. Architectural Overview
The architecture of the Load process is shown below
4. Steps involved in Performance Tuning

4.1 Database Level Tuning
One DB-side bottleneck involved the use of BITMAP indexes. Internally, the database engine (Oracle, for example) maps each bit position in a bitmap index to a distinct column value. Several bitmap indexes can be used together, since the database can merge them, which can improve response time.
When to use it?
Low cardinality
The BITMAP index works best when there is a low cardinality of data.
If the number of distinct values of a column is less than 1% of the number of rows in the table or if the values in a column are repeated more
than 100 times, then the column is a candidate for a bitmap index.
B-tree indexes are most effective for high-cardinality data: i.e., data with many possible values, such as CUSTOMER_NAME or
PHONE_NUMBER.
No or little insert/update
Updating bitmap indexes takes a lot of resources. Building and maintaining an index structure can be expensive and can consume resources such as disk space, CPU, and I/O capacity. Designers must ensure that the benefits of any index outweigh the negatives of index maintenance.
Use this simple estimation guide for the cost of index maintenance: each index maintained by an INSERT, DELETE, or UPDATE of the
indexed keys requires about three times as much resource as the actual DML operation on the table. What this means is that if you INSERT
into a table with three indexes, then it will be approximately 10 times slower than an INSERT into a table with no indexes. For DML and
particularly for INSERT-heavy applications, the index design should be seriously reviewed, which might require a compromise between the
query and INSERT performance.
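As an illustration only (the table and column names below are assumed, not taken from the Service Contracts application), a bitmap index suits a low-cardinality column that is rarely updated, and the optimizer can merge several bitmap indexes to answer a multi-column filter:

-- Hypothetical low-cardinality columns on a large table
CREATE BITMAP INDEX bix_orders_status ON orders (order_status);
CREATE BITMAP INDEX bix_orders_region ON orders (region_code);

-- Oracle can merge both bitmap indexes to resolve this filter
SELECT COUNT(*)
  FROM orders
 WHERE order_status = 'CLOSED'
   AND region_code = 'EAST';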
In the current scenario there was a BITMAP index on the DW_PROC_STATUS column in one of the tables. This field stores only 'G' or 'B', and the value is updated based on the validity of the data. Although the cardinality of the column was low relative to the huge volume of data, the volume of inserts/updates on the column was very large, and this brought performance down drastically. A BITMAP index works better for retrieving data from the DB than for a frequently updated field.

To overcome this, the index was dropped from the DW_PROC_STATUS column and a normal (B-tree) index was applied on all key columns, which significantly improved the performance of the ETL load.
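The change can be sketched as follows; the index and key column names are assumed for illustration, and only DW_PROC_STATUS comes from the actual scenario:

-- Drop the bitmap index on the frequently updated status column
DROP INDEX bix_dw_proc_status;

-- Create normal (B-tree) indexes on the key columns instead
CREATE INDEX ix_svc_contracts_key ON service_contracts (contract_key);
CREATE INDEX ix_svc_contract_lines_key ON service_contracts (contract_line_key);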
In the current scenario, the execution plans of all the SQ queries were checked and the queries were modified to improve performance.
Figure 4 - Target
The solution to improve performance in this scenario was to remove from the SQ query all the unwanted fields that did not need to be fetched and that were not loaded into the target table. This reduced the running time of the query and removed the bottleneck on its performance. (Refer Figures 5 and 6.)
Figure 6
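A minimal sketch of this change, with assumed table and column names since the actual SQ query appears only in the figures:

-- Before: the SQ query fetches every column, including fields that
-- are never loaded into the target table
SELECT *
  FROM service_contract_lines;

-- After: only the fields mapped to the target table are selected,
-- which reduces the query running time and the data pulled from the DB
SELECT contract_id,
       contract_line_id,
       status,
       start_date,
       end_date
  FROM service_contract_lines;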
Consider a query that selects rows from the emp table using a WHERE deptno IN (subquery on the dept table) clause. Such a query will run a full table scan on both the emp and dept tables; even if there is an index on the deptno column of the emp table, the subquery prevents the index from being used, so performance suffers.

We can rewrite the query using EXISTS. The rewritten query uses the existing deptno index on the emp table, making it much faster. Thus, wherever possible, use EXISTS in place of an IN clause in a query.
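Since the actual queries appear only in the figures, the following is an illustrative reconstruction of the pattern described above; the filter on the dept table is assumed:

-- Using IN: tends to force full table scans of emp and dept
SELECT *
  FROM emp
 WHERE deptno IN (SELECT deptno
                    FROM dept
                   WHERE dname = 'SALES');

-- Using EXISTS: the correlated subquery lets the optimizer use the
-- existing deptno index
SELECT *
  FROM emp e
 WHERE EXISTS (SELECT 1
                 FROM dept d
                WHERE d.deptno = e.deptno
                  AND d.dname = 'SALES');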
In the current scenario, all the SQ queries that performed poorly because of a WHERE ... IN clause (refer Figure 7) have been modified to use a WHERE EXISTS clause (refer Figure 8). Below is one of the scenarios in which we modified the query.
Figure 7
Figure 8
Figure 10
Figure 11
These transformations were added wherever necessary without disturbing the existing functionality, and the SQ queries in the corresponding pipelines were modified accordingly. (Refer Figures 12, 13, 14 and 15.)
Figure 12
Figure 13
Figure 14
Figure 15
From the above figure you can note that the lookup conditions that compare on key columns have an index on those columns.
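For illustration (the lookup table and column names are assumed), the columns used in the lookup condition are the ones that should carry an index on the database side, for example:

-- Index on the key column used in the lookup condition
CREATE INDEX ix_contract_dim_contract_id ON contract_dim (contract_id);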
4.3 Balancing the Constraints
Figure 17
To overcome the bottleneck caused by the SORTER transformation, it is better to avoid the SORTER altogether and shift this constraint to the database level. The SORTER transformation is removed and an ORDER BY clause is added to the SQ query, so the sorting of the data happens at the source itself. This improved the overall performance of data loading into the target tables. (Refer Figures 18 and 19.)
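A sketch of this shift, again with assumed object names: the SORTER transformation is dropped and the sort is pushed into the SQ query with an ORDER BY clause.

-- SQ query after removing the SORTER transformation:
-- the database sorts the rows before Informatica reads them
SELECT contract_id,
       contract_line_id,
       status,
       start_date
  FROM service_contract_lines
 ORDER BY contract_id, contract_line_id;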
Figure 18
Figure 19
5. Iterative Testing
Rapid Evaluation and Iterative Testing (RITE) is used here as the testing method to see whether performance has improved. Initially the load is tested with minor changes to check whether there is any significant change in its performance. With this method we can revert, make the next set of changes, and test the load again more quickly than with other testing methods. (Refer Figure 20.)
Figure 20
This significant improvement in performance was achieved by following the performance tuning methods described above. These methods can be followed for any mapping design.
Address
1095 Cranbury South River Road, Suite 10, Jamesburg, NJ 08831. Main: 609-409-6950 | Fax: 609-409-6910
Safe Harbor
Certain statements on this whitepaper concerning our future growth prospects are forward-looking statements, which involve a number of risks, and uncertainties that could cause
actual results to differ materially from those in such forward-looking statements. The risks and uncertainties relating to these statements include, but are not limited to, risks and
uncertainties regarding fluctuations in earnings, our ability to manage growth, intense competition in IT services including those factors which may affect our cost advantage, wage
increases in India, our ability to attract and retain highly skilled professionals, time and cost overruns on fixed-price, fixed-time frame contracts, client concentration, restrictions on
immigration, our ability to manage our international operations, reduced demand for technology in our key focus areas, disruptions in telecommunication networks, our ability to
successfully complete and integrate potential acquisitions, liability for damages on our service contracts, the success of the companies in which Hexaware has made strategic
investments, withdrawal of governmental fiscal incentives, political instability, legal restrictions on raising capital or acquiring companies outside India, and unauthorized use of our
intellectual property and general economic conditions affecting our industry.
www.hexaware.com