
INTRODUCTION TO INFORMATICA 9.x

Informatica PowerCenter is an ETL tool used to extract data from one or more source systems, transform the data by applying business rules in the form of transformations (resolving the inconsistencies present across the source systems), and finally load the required data into the target system (the data warehouse engine).


Dataflow diagram: source systems, both RDBMS (Oracle, SQL Server, DB2, MS Access, etc.) and non-RDBMS (flat files, XML files, COBOL files, CSV files), feed Informatica [ETL logic: extraction, data processing, loading], which loads the target system (data warehouse).
There are three ways to use Informatica PowerCenter:
1. Data migration
2. Data integration
3. Data conversion

1. Data migration: migrate data from source to target without modifications. Here the target is a staging database that always maintains the most recent data. Use bulk load for the staging database.

2. Data integration: if the required information is available in multiple source systems, integrate it and pass that data to a single target system. Here the target is a working database (a data mart or an ODS).

3. Data conversion: convert input data into the different formats the target requires, for example
a. string to number
b. string to date
c. detail to summary, etc.
Note: data integration is also called data merging, and data conversion is also called data cleansing and/or data scrubbing.
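
A minimal SQL sketch of these conversions, assuming an Oracle-style database and a hypothetical staging table stg_sales with string columns amount_str and sale_date_str:

    -- a. string to number and b. string to date during extraction
    SELECT TO_NUMBER(amount_str)                AS amount,
           TO_DATE(sale_date_str, 'YYYY-MM-DD') AS sale_date
    FROM   stg_sales;

    -- c. detail to summary: roll detail rows up to one row per product
    SELECT product_id,
           SUM(TO_NUMBER(amount_str)) AS total_amount
    FROM   stg_sales
    GROUP BY product_id;
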
Informatica is an integrated tool, made up of the following components:
1. Informatica PowerCenter client
2. Informatica PowerCenter server
3. Informatica PowerCenter repository
4. Informatica PowerCenter repository server
Integration Services take care of integrating all of these components at run time.
1. Informatica PowerCenter client: it is used to design the required ETL logic using various objects; the combination of source definitions, target definitions, and transformations is called a mapping.
It is the combination of the following tools used to build ETL logic:
1. PowerCenter Designer
2. PowerCenter Workflow Manager
3. PowerCenter Workflow Monitor
4. PowerCenter Repository Manager
5. Repository Server Administrative Console (in 8.x it is a web-based application tool)

Mapping pipeline: source definition stage --> transformation stage --> target definition stage.
Informatica PowerCenter Server: it is used to execute the ETL logic, physically extracting data from the different source systems, transforming the data using the various transformations, and loading the data into the target system.

Informatica server architecture: source systems --> PowerCenter server --> target system; the PowerCenter server communicates with the PowerCenter repository through the PowerCenter repository server.


Informatica PowerCenter Repository: it is a relational database used exclusively to maintain ETL metadata contents such as source definitions, target definitions, transformations, mapplets, mappings, etc. It is persistent in nature. It is created and administered through the PowerCenter administrative console.

Informatica PowerCenter Repository Server: it is a mediator between the PowerCenter repository and the PowerCenter server. It accesses the repository to find the ETL metadata contents required for a particular ETL logic and provides services to the Informatica server during execution.
Informatica PowerCenter Designer: it is used to create ETL metadata contents using the following tools:
1. Source Analyzer
2. Target Designer
3. Transformation Developer
4. Mapplet Designer
5. Mapping Designer
Informatica PowerCenter Workflow Manager: it is used to create sets of workflows for the different ETL logics, validate them, and manage them.
It contains the following tools:
1. Task Developer
2. Worklet Designer
3. Workflow Designer

Informatica PowerCenter Workflow Monitor: it is used to display session instance information (start and end time of the execution) and status information (succeeded or failed).
PowerCenter Repository Manager: it is used to manage the ETL logic available in the multiple repositories created by the administrator.

Example: the Repository Manager shows repositories such as RP_sales, RP_shares, and RP_finance, each containing multiple folders (folder1, folder2, etc.).
PowerCenter Administrative Console:
It is used to create multiple repositories and administer them; these repositories are then managed through the Repository Manager.
As a PowerCenter administrator you can create new repositories, delete existing repositories, and modify existing repositories.
As an administrator you can grant permissions to users and revoke permissions from users.
As an administrator you can promote a local repository to a global repository, but you cannot demote a global repository back to a local one.
Source Analyzer: it is used to import the required source tables from the available source systems, whether RDBMS or non-RDBMS. It is used to analyze the structure of the tables and their relationships.

Target Designer: it is used to import the required target tables from the target system, or to create target definitions directly in the Target Designer.

Transformation Developer: it is used to design the transformations required by the business; transformations built here are called reusable transformations.

Mapplet Designer: it is used to construct a set of transformation logic as a reusable object called a mapplet.

Mapping Designer: it is used to build the required ETL logic from source definitions, transformations, and target definitions; this combination is called a mapping. Transformations constructed in the Mapping Designer itself are called non-reusable transformations.
Difference between PowerCenter 6.x, 7.x, 8.x, and 9.x

PowerCenter 6.x: it allows different users to read, write, and execute an ETL logic at the same time. Advanced transformations such as flat-file lookup, union, update strategy, and transaction control are not available in the 6.x version. Memory allocation for buffer and cache memory is done manually by the developer.

PowerCenter 7.x: it also allows different users to read, write, and execute an ETL logic at the same time. New transformations were introduced in the 7.x version, such as flat-file lookup, union, update strategy, and transaction control, but some advanced transformations were only added in the 8.x version. Memory allocation for buffer and cache memory is still done manually by the developer.

PowerCenter 8.x: it also allows different users to read, write, and execute an ETL logic at the same time. Some advanced transformations were added in the 8.x version, such as Java, SQL, HTTP, and SAP R/3. Memory allocation for buffer and cache memory can be done automatically by the server.
Transformations available in Informatica PowerCenter

1. Source Qualifier
2. Joiner
3. Lookup
   a. connected lookup
   b. unconnected lookup
4. Expression
5. Aggregator
6. Union
7. Filter
8. Router
9. Update Strategy
10. Transaction Control
11. Stored Procedure
    a. connected
    b. unconnected
12. Sequence Generator
13. Normalizer
14. Custom
15. Mapplet Input
16. Mapplet Output
17. Rank
18. Sorter
19. Java
20. SQL
21. HTTP
22. SAP R/3
23. Data Masking
24. Unstructured Data
Transformations are of two types:
1. Active transformations: an active transformation can change the number of records passing through it, so the number of output records may differ from the number of input records.
2. Passive transformations: a passive transformation cannot change the number of records passing through it, so the number of output records equals the number of input records.

1. Source Qualifier: it is a dependent transformation, created automatically along with the source definition. It is an active transformation used to extract data from the source tables. It can also define a user-defined join condition when the data comes from the same RDBMS with the same username and password. To improve performance you can generate a custom SQL query; the SQL override mechanism makes query optimization possible inside the source qualifier itself.

Note: if the source data is available in non-RDBMS form, it is not possible to generate SQL queries in the source qualifier.
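
A minimal sketch of what a source qualifier SQL override might look like, assuming two hypothetical tables EMP and DEPT residing in the same source database:

    -- user-defined join and filter pushed down to the source database in one custom query
    SELECT e.empno, e.ename, e.sal, d.dname
    FROM   emp e
    JOIN   dept d ON e.deptno = d.deptno
    WHERE  e.sal >= 2000;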

2. Expression Transformation: it is used to define arithmetic operations at a very detailed, row-by-row level. It is very useful for data cleansing and scrubbing, converting data into the required format. It is a passive transformation, so the number of input records equals the number of output records.
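
A minimal SQL sketch of the kind of row-level cleansing an expression transformation performs, assuming a hypothetical EMP source with ename, sal, and comm columns (Oracle-style functions):

    -- trim and upper-case the name, replace a null commission with 0, derive total pay
    SELECT UPPER(TRIM(ename)) AS ename,
           NVL(comm, 0)       AS comm,
           sal + NVL(comm, 0) AS total_pay
    FROM   emp;
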
3. Aggregator: it is used to convert detailed input data into a summarized format using aggregate functions such as SUM, AVG, MIN, MAX, and COUNT. It is an active transformation, so the number of input records need not equal the number of output records.
For an unsorted aggregator, three steps are performed: a) sorting, b) classification, to divide the data into groups, and c) aggregation of the data using the aggregate functions, based on the group-by ports.
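
A minimal SQL equivalent of an aggregator grouped on deptno, assuming the same hypothetical EMP table:

    -- one output row per group-by value; detail rows are collapsed into aggregates
    SELECT deptno,
           COUNT(*) AS emp_count,
           SUM(sal) AS total_sal,
           AVG(sal) AS avg_sal,
           MIN(sal) AS min_sal,
           MAX(sal) AS max_sal
    FROM   emp
    GROUP BY deptno;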

Definition of a port: a port is a connector used to connect a transformation to either the upstream or the downstream side of the pipeline.
1. Input port: it is used to connect to the upstream side of the pipeline and is not available to the downstream side of the pipeline.
2. Output port: it is used to connect to the downstream side of the pipeline; it is created inside the transformation with reference to an input or variable port.
3. Variable port: it is created inside the transformation itself and is not available to either the input or the output pipeline. It holds data from an input port and passes that value to an output port.
4. Joiner Transformation: it allows you to define a join condition to merge data coming from two homogeneous or heterogeneous source systems. It is an independent transformation that can be used at any level of the ETL logic. It is an active transformation: if a record does not meet the join condition, the Informatica server rejects that record inside the joiner itself.
It supports four join types:
1. Normal join (default): selects only matching records.
2. Master outer join: keeps all records from the detail source and only the matching records from the master source.
3. Detail outer join: keeps all records from the master source and only the matching records from the detail source.
4. Full outer join: selects both matching and non-matching records from both sources.
Note: it allows only equi-joins, and a single joiner cannot take more than two sources, so it cannot define a complex join directly.
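
A rough SQL analogy for the four join types, assuming hypothetical DEPT as the master source and EMP as the detail source (the master/detail terminology does not map exactly onto SQL, so treat this as a sketch):

    -- normal join: matching records only
    SELECT e.empno, d.dname FROM emp e JOIN dept d ON e.deptno = d.deptno;

    -- master outer join: all detail (EMP) records plus matching master (DEPT) records
    SELECT e.empno, d.dname FROM emp e LEFT JOIN dept d ON e.deptno = d.deptno;

    -- detail outer join: all master (DEPT) records plus matching detail (EMP) records
    SELECT e.empno, d.dname FROM emp e RIGHT JOIN dept d ON e.deptno = d.deptno;

    -- full outer join: matching and non-matching records from both sources
    SELECT e.empno, d.dname FROM emp e FULL OUTER JOIN dept d ON e.deptno = d.deptno;
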
5. Lookup Transformation: it internally contains a lookup table defined on either the source or the target, or defined directly from an external data source. It is mainly used for reference in the middle of the transformation stage, through a lookup condition.
It is a passive transformation: if the lookup condition is satisfied it passes the looked-up (non-null) values to the downstream side of the pipeline; otherwise it passes null values downstream.
Lookup transformations are divided into two types:
1. Connected lookup: it participates in the pipeline and works like a procedure, returning more than one port value at a time.
2. Unconnected lookup: it does not participate in the pipeline and works like a function, returning only one port value at a time.
Note: a connected lookup transformation is used in only one location of the pipeline at a time, but an unconnected lookup transformation can be called from multiple pipelines at a time within a single mapping.

Lookup transformations are used in two ways:
1. Static lookup transformation: it supports both RDBMS and non-RDBMS (flat file) sources to define the lookup table, and it can be used as either a connected or an unconnected lookup transformation.
2. Dynamic lookup transformation: it supports only RDBMS sources and can be used only as a connected lookup transformation.
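
A minimal SQL sketch of the connected-lookup behavior described above, assuming a hypothetical lookup table DEPT keyed on deptno: when the lookup condition matches, the looked-up values flow downstream; when it does not, nulls flow downstream.

    -- lookup condition: emp.deptno = dept.deptno
    SELECT e.empno,
           e.deptno,
           d.dname,  -- NULL when the lookup condition finds no match
           d.loc     -- NULL when the lookup condition finds no match
    FROM   emp e
    LEFT JOIN dept d ON e.deptno = d.deptno;
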
6. Sorter Transformation: it is used to sort the incoming data in either ascending or descending order. It is an active transformation; it can change the order of the records, and with the Select Distinct property it can also delete duplicate records.

7. Rank Transformation: it assigns a rank to a particular value with respect to a group-by port. By default it selects the highest ranks, and it can select either the top or the bottom ranks within each group-by value. It is an active transformation because it passes on only the required ranks. The rank transformation itself contains a rank index port, which generates the rank values for the port selected as the rank port.
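
A rough SQL analogy for a rank transformation that keeps the top three salaries per department, assuming the same hypothetical EMP table and a database with analytic functions:

    -- rank_index plays the role of the rank index port; sal is the rank port, deptno the group-by port
    SELECT empno, deptno, sal, rank_index
    FROM (
        SELECT empno, deptno, sal,
               RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rank_index
        FROM   emp
    ) ranked
    WHERE rank_index <= 3;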

8. Transaction Control Transformation: it is used to control the incoming transactional data in order to manage buffer memory during loading, using commands such as TC_COMMIT_BEFORE, TC_COMMIT_AFTER, TC_ROLLBACK_BEFORE, and TC_ROLLBACK_AFTER. It is an active transformation; it can arrange the data into bunches and commit that data to the target in order.

9. Union Transformation: it is used to merge data of the same type coming from multiple source systems (it behaves as UNION ALL by default). It is an active transformation; it combines data from multiple pipelines and passes that data to a single pipeline.
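
A minimal SQL sketch of the default UNION ALL behavior, assuming two hypothetical source tables with identical structures (duplicates are kept):

    SELECT empno, ename, sal FROM emp_region_east
    UNION ALL
    SELECT empno, ename, sal FROM emp_region_west;
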
10. Update Strategy Transformation: it is used to flag the incoming data so that the Informatica server treats each record individually as an insert, update, delete, or reject against the target table. It is an active transformation; it rejects or selects incoming data depending on the update strategy expression and forwards rejected records to the reject file. It was introduced in the 7.x version.

11. Filter Transformation: target requirements often call for condition-based data, selecting the required records and rejecting the unwanted ones. The filter transformation is used to define a filter condition that selects the required data.
Example: deptno=10 and sal>=2000
It allows only one condition at a time.

12. Router Transformation: it is used to define multiple filter conditions to split the incoming data and pass it to multiple target tables through multiple pipelines, as in the sketch below.
Condition 1: deptno=10 and sal>2000
Condition 2: deptno=20 and sal>2500
Condition 3: deptno=30 and sal>2000
Mechanism: each incoming record is tested against the conditions; records satisfying condition 1 are passed to the first target table, records satisfying condition 2 to the second, and so on. Records that satisfy none of the conditions are passed to the default group.
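
A rough SQL sketch of the routing above, assuming hypothetical target tables target_dept10, target_dept20, target_dept30, and target_default:

    INSERT INTO target_dept10 SELECT * FROM emp WHERE deptno = 10 AND sal > 2000;
    INSERT INTO target_dept20 SELECT * FROM emp WHERE deptno = 20 AND sal > 2500;
    INSERT INTO target_dept30 SELECT * FROM emp WHERE deptno = 30 AND sal > 2000;

    -- default group: rows that satisfied none of the three conditions
    INSERT INTO target_default
    SELECT * FROM emp
    WHERE NOT (deptno = 10 AND sal > 2000)
      AND NOT (deptno = 20 AND sal > 2500)
      AND NOT (deptno = 30 AND sal > 2000);
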
13. Stored Procedure Transformation:
It is created from procedures or functions that already exist in the database. It maintains the pre-compiled metadata: the Informatica server initiates the stored procedure, the database server executes it, and the output data is passed back to the Informatica server.
It is used in two ways:
1. Connected
2. Unconnected
A connected stored procedure transformation is used in only one pipeline, whereas an unconnected stored procedure transformation can be used in multiple pipelines within a single mapping.
It is a passive transformation; it generates the same number of records when the Informatica server passes an input signal to the stored procedure transformation.
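
A minimal sketch of a database-side function such a transformation could call, assuming an Oracle-style PL/SQL function with a hypothetical name and tax rule:

    -- pre-compiled in the database; the transformation only passes inputs and receives the output
    CREATE OR REPLACE FUNCTION get_tax(p_sal IN NUMBER) RETURN NUMBER IS
    BEGIN
        RETURN ROUND(p_sal * 0.15, 2);  -- hypothetical flat 15% tax rule
    END;
    /
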
14. Sequence Generator Transformation: it is used to generate a sequence of unique values, for example as primary keys for target tables. It has two predefined output ports, NEXTVAL and CURRVAL. Connect NEXTVAL first to one target table and connect CURRVAL to another target table. Define the start value and the current value (default 1) as required, and define the increment-by value. Enable the reset property to restart the sequence from the first value.
Diagram: sequence generator NEXTVAL --> target 1 (empno); CURRVAL --> target 2 (empno).
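
A minimal SQL analogy using a database sequence, assuming an Oracle-style database and a hypothetical sequence emp_seq (the transformation generates values inside the mapping, but NEXTVAL/CURRVAL behave similarly):

    CREATE SEQUENCE emp_seq START WITH 1 INCREMENT BY 1;

    -- NEXTVAL advances the sequence and supplies the surrogate key value
    INSERT INTO target1 (empno, ename)
    SELECT emp_seq.NEXTVAL, ename FROM stg_emp;
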
Mapping Parameters and Variables: these are useful for extracting specific required data from the input source.
Parameter: a parameter is assigned a value that cannot be changed during execution (the initial value equals the final value).
Example: month=$$month (initial value $$month=1)
Variable: a variable is assigned a value that can be changed during execution until it reaches the maximum value; that is, the initial value is not equal to the final value.
Example: day=$$day (initial value $$day=1, incremented by a specified value until it reaches the maximum value)
Mapping parameters and variables are used for incremental loading of fact tables, as sketched below.
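
A minimal sketch of how the parameter and variable above might appear in a source-qualifier filter for incremental extraction; the table and column names are hypothetical, and the $$ names are resolved by Informatica before the query runs:

    -- parameter: only the rows for the supplied month are extracted ($$month is fixed for the run)
    SELECT * FROM sales_stage WHERE sale_month = $$month;

    -- variable: each run continues from where the previous run stopped ($$day moves forward)
    SELECT * FROM sales_stage WHERE sale_day > $$day;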

Target load plan: if a single mapping is constructed with multiple independent pipelines, the Informatica server may load data into the target tables in any order. As the designer, you can specify the order in which data is loaded into the target tables using the target load plan.

Example: source1 --> transformation stage --> target1; source2 --> target2; source3 --> target3; source4 --> target4. To load the data in a specified order (target2, target4, target1, target3), use the target load plan.


Target load order group: if target tables of the same type are connected within a single pipeline and loaded with the same data, the Informatica server may load them in any order. As the designer, define a primary-key-to-foreign-key relationship between the tables; the Informatica server then loads data first into the target table that defines the primary key, and with reference to that primary key it loads data into the other target tables defined with the foreign key. This loading is called constraint (condition) based loading.

Example: source --> transformation stage --> target1 (primary key), target2 (primary key / foreign key), target3 (foreign key).
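
A minimal SQL sketch (Oracle-style types) of the key relationships that drive constraint-based loading, with hypothetical tables: target2 references target1 and target3 references target2, so target1 must be loaded first.

    CREATE TABLE target1 (
        deptno NUMBER PRIMARY KEY,
        dname  VARCHAR2(30)
    );

    CREATE TABLE target2 (
        empno  NUMBER PRIMARY KEY,
        deptno NUMBER REFERENCES target1 (deptno)  -- primary key here, foreign key to target1
    );

    CREATE TABLE target3 (
        empno  NUMBER REFERENCES target2 (empno),  -- foreign key only
        sal    NUMBER
    );
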
Target load types:
1. Bulk load: it is useful for loading a large amount of data into the target when the target table is not defined with a primary key or index. Bulk load is used for full loading of a staging database, without any modifications.
2. Normal load: it is useful for loading cleansed data into the target when the target table is defined with a primary key or index. Normal load is used for incremental loading of a working database.

Example: source --> data migration --> target uses bulk load; source --> data conversion --> target uses normal load.
Process memories:
1. Buffer memory
2. Cache memory
Buffer memory is temporary memory created during execution; the default size is 12 MB. It is created by the DTM manager and divided into buffer blocks, each 64 KB by default. The Informatica server extracts data from the source, stores it in the buffer memory blocks, and transfers it to the target through the buffer blocks. It is user-defined memory that can be increased or decreased depending on the input volume of data.
Cache memory is high-speed temporary memory allocated by the DTM manager and created by the Informatica server at the transformation stage in order to improve performance.
Types of cache memories:
1. Index cache (default size 1 MB)
2. Data cache (default size 2 MB)
3. In-memory cache (default size 8 MB)
4. Shared cache
5. Scheduled cache
The index cache maintains key column values, whereas the data cache maintains the output data related to the key columns.
The Informatica server creates cache memory for various transformations:

1. Joiner
2. Aggregator
3. Lookup
4. Rank
5. Sorter
1. Joiner: the Informatica server creates two types of cache memory in the joiner: a) joiner index cache and b) joiner data cache. The joiner index cache maintains the join-condition values of the master source, whereas the data cache maintains the master-source data related to the join condition.

2. Aggregator: it contains two cache memories: a) index cache and b) data cache. The aggregator index cache maintains the group-by port values, whereas the data cache maintains the output aggregated values.

3. Lookup: it contains two cache memories: a) index cache and b) data cache. The lookup index cache maintains the lookup-condition values, whereas the data cache maintains the lookup values related to the lookup condition.

4. Rank: it contains two cache memories: a) index cache and b) data cache. The rank index cache maintains the group-by port values, whereas the data cache maintains the rank-port values.

5. Sorter: it contains an in-memory cache that maintains the sorted data with respect to the key column, in either ascending or descending order.
Note: in Informatica PowerCenter 8.x, all the process memories related to mappings are automatic in nature; they are increased automatically by the PowerCenter server at run time, based on the input data.
Load Manager (LM): it creates cache memories such as the shared cache and the scheduled cache, loads session-instance information into the shared memory and scheduling information into the scheduled memory, and then initiates the DTM manager.

DTM manager: it is also called the master thread, and it creates different types of child threads such as reader, writer, transformation, pre-session, and post-session threads.
A thread is a task; the Informatica server executes the ETL logic through these threads.
a. Reader thread: used for data extraction.
b. Transformation thread: used for data transformation.
c. Writer thread: used for data loading.
d. Pre-session thread: used to execute SQL statements before the pipeline runs.
e. Post-session thread: used to execute SQL statements after the pipeline runs.
The DTM manager creates threads according to the partition points: the number of threads increases in direct proportion to the number of partition points, so increasing the partition points increases performance and reduces execution time.

Diagram: source --> transformation --> target; the number of threads grows with the number of partition points.
Working with the Workflow Manager:

It is used to create multiple workflows that the Informatica PowerCenter server executes to run the ETL logic.
A workflow is a set of tasks linked together to give instructions to the Informatica server at run time: where to find the source and target locations, and how to execute based on the threads created by the DTM manager.

Available tasks to create workflows:

1. Start task
2. Session task
3. Timer task
4. Event-wait task
5. Event-raise task
6. Decision task
7. Control task
8. Email task
9. Command task
There are three ways to create workflows (batches):
1. Sequential batch
2. Concurrent batch
3. Sequential-to-concurrent batch

1. A sequential batch is created with multiple tasks connected in serial. In this batch, if a parent task fails, all of its child tasks automatically fail, because a dependency exists between the parent task and the child tasks; the Informatica server is not responsible for executing the remaining tasks.

2. A concurrent batch is created with multiple tasks connected in parallel. In this batch, even if the first task fails, the PowerCenter server still executes the other tasks until the final task completes, because no dependency exists between the tasks.

3. A sequential-to-concurrent batch is created with tasks in both serial and parallel modes.

A concurrent batch is typically created for a STAR SCHEMA, whereas a sequential-to-concurrent batch is created for a SNOWFLAKE SCHEMA.

Note: for all batches, the start task is the parent task, where the Informatica server starts the execution.
Working with the Workflow Monitor:

It allows multiple users to view the status of execution by the Informatica server. The Informatica server allows ten tasks at a time in running mode; the remaining tasks stay in waiting mode. The Informatica server executes all the sessions in round-robin fashion.

It displays run-time errors such as:

1. Database errors
2. Fatal errors
3. Transformation errors, etc.

Common errors raised by the Informatica server are: a. file not found at the specified location, b. table or view does not exist, c. data type incompatibility, and d. session task execution terminated unexpectedly.
