[Dataflow diagram: source systems (RDBMS and non-RDBMS flat files such as XML, COBOL and CSV files) feed Informatica, which performs extraction, data processing and loading into the target system.]
There are three ways to use Informatica PowerCenter:
1. Data migration
2. Data integration
3. Data conversion
1. Data migration: migrate data from the source to the target without modifications; here the target is a staging database, which always maintains the most recent data. Use bulk load for the staging database.
2. Data integration: merge data coming from multiple homogeneous or heterogeneous source systems into a single target.
3. Data conversion: convert input data into the different formats the target requires, such as
a. string to number
b. string to date
c. detail to summary, etc.
Note: here data integration is also called data merging, and data conversion is also called data cleansing and/or data scrubbing.
Informatica is an integrated tool, made up of the following components:
1. Informatica PowerCenter client
2. Informatica PowerCenter server
3. Informatica PowerCenter repository
4. Informatica PowerCenter repository server
The Integration Service takes care of integrating all of these components during run time.
1. Informatica PowerCenter client: It is used to design the required ETL logic using various objects (source definitions, target definitions and transformations); this design is called a mapping. It is the combination of the following tools used to build the ETL logic:
1. PowerCenter Designer
2. PowerCenter Workflow Manager
3. PowerCenter Workflow Monitor
4. PowerCenter Repository Manager
5. Repository Server Administrative Console (in 8.x it is a web-based application tool)
[Diagram: the PowerCenter repository server manages the PowerCenter repository through the Repository Manager; repositories such as RP_sales, RP_shares and RP_finance each contain folder1, folder2, etc.]
PowerCenter Administrative Console:
It is used to create multiple repositories and administer those repositories. These repositories are managed by the Repository Manager.
As a PowerCenter administrator you create new repositories, delete existing repositories and modify existing repositories.
As an administrator you grant permissions to users and revoke permissions from users.
As an administrator you can promote a local repository to a global repository, but you cannot demote a global repository back to a local one.
Source Analyzer: It is used to import the required source tables from the available source systems, whether RDBMS or non-RDBMS. It is used to analyze the structure of the tables and their relations.
Target Designer: It is used to import the required target tables from the target system, or to create them directly in the Target Designer.
Note: if the source data is available in non-RDBMS form, it is not possible to generate SQL queries in the source qualifier.
Definition of a port: A port is a connector used to connect a transformation with either the upstream or the downstream side of the pipeline.
1. Input port: it is used to connect with the upstream of the pipeline and is not available to the downstream of the pipeline.
2. Output port: it is used to connect with the downstream of the pipeline and is created inside the transformation with reference to an input or variable port.
3. Variable port: it is created inside the transformation itself and is not available to either the input or the output side of the pipeline. It holds data from input ports and passes the value on to output ports. A small sketch of the three port roles follows.
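As a rough Python analogy only (hypothetical port names, not Informatica syntax), an expression-style transformation can be pictured as a function whose arguments act as input ports, whose local variable acts as a variable port, and whose return value acts as an output port:

    # Hypothetical sketch of port roles in an expression-style transformation.
    def expression_transform(in_sal, in_comm):   # input ports: fed by the upstream
        v_total = in_sal + (in_comm or 0)        # variable port: internal only
        o_annual_sal = v_total * 12              # output port: derived from the variable port
        return o_annual_sal                      # feeds the downstream of the pipeline

    print(expression_transform(2000, 300))       # 27600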
4. Joiner transformation: It allows you to define a join condition to merge data coming from two homogeneous or heterogeneous source systems. It is an independent transformation that can be used at any level of the ETL logic. It is an active transformation: if a record does not meet the join condition, the Informatica server rejects that record inside the joiner itself.
It supports four join types:
1. Normal join (default): selects only the matching records.
2. Master outer join: selects all records from the detail source and only the matching records from the master source.
3. Detail outer join: selects all records from the master source and only the matching records from the detail source.
4. Full outer join: selects both matching and non-matching records from both sources.
Note: It allows only an equi-join to be defined, and it does not allow more than two tables in a single joiner for defining a complex join. The four join types are sketched below.
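As a hedged illustration only (Informatica joins are configured in the Designer, not written as code), the four join types map onto pandas merges; the deptno column and the sample rows are hypothetical, with master as the smaller, cached source and detail as the streamed one:

    import pandas as pd

    # Hypothetical master (cached) and detail (streamed) sources.
    master = pd.DataFrame({"deptno": [10, 20, 30],
                           "dname": ["ACCOUNTING", "RESEARCH", "SALES"]})
    detail = pd.DataFrame({"empno": [7369, 7499, 7521],
                           "deptno": [20, 30, 40]})

    # 1. Normal join (default): only records matching the equi-join condition.
    normal = detail.merge(master, on="deptno", how="inner")

    # 2. Master outer join: all detail records plus matching master records.
    master_outer = detail.merge(master, on="deptno", how="left")

    # 3. Detail outer join: all master records plus matching detail records.
    detail_outer = detail.merge(master, on="deptno", how="right")

    # 4. Full outer join: matching and non-matching records from both sources.
    full_outer = detail.merge(master, on="deptno", how="outer")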
5. Lookup transformation: It internally contains a lookup table defined on either the source or the target, or defined directly from an external data source. It is mainly used for reference in the middle of the transformation stage, via a lookup condition.
It is a passive transformation: if the lookup condition is satisfied it passes the looked-up (non-null) values to the downstream of the pipeline; otherwise it passes null values to the downstream of the pipeline.
Lookup transformations are divided into two types:
1. Connected lookup: It participates in the pipeline and works like a procedure, returning more than one port value at a time.
2. Unconnected lookup: It does not participate in the pipeline and works like a function, meaning it returns only one port value at a time.
Note: a connected lookup transformation is used in only one location of the pipeline at a time, but an unconnected lookup transformation can be used in multiple pipelines at a time within a single mapping. The sketch below contrasts the two.
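A minimal Python sketch of the difference (the dept data is hypothetical; real lookups are configured in the Designer): the connected form behaves like a procedure returning several port values, the unconnected form like a function returning one, as a :LKP expression call does:

    # Hypothetical lookup table keyed on the lookup condition (deptno).
    dept_lookup = {10: ("ACCOUNTING", "NEW YORK"), 20: ("RESEARCH", "DALLAS")}

    def connected_lookup(deptno):
        # Procedure-like: returns several port values; a miss yields nulls,
        # never a dropped row (passive behaviour).
        return dept_lookup.get(deptno, (None, None))

    def unconnected_lookup(deptno):
        # Function-like: returns exactly one port value, callable from
        # many places within a single mapping.
        row = dept_lookup.get(deptno)
        return row[0] if row else None

    for deptno in (10, 20, 99):
        dname, loc = connected_lookup(deptno)
        print(deptno, dname, loc, unconnected_lookup(deptno))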
7. Rank transformation: It assigns a rank to a particular port value with respect to the group-by port. By default it selects the highest ranks. It can select either the top or the bottom ranks per group-by column. It is an active transformation because it passes on only the required ranks. The rank transformation itself contains a rank index port that generates rank values for the port selected as the rank port, as sketched below.
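A rough Python equivalent (hypothetical employee rows): group by deptno, rank on sal, keep the top 2 per group, and emit a rank index alongside the rank port:

    from collections import defaultdict

    # Hypothetical input rows.
    rows = [
        {"deptno": 10, "ename": "KING",   "sal": 5000},
        {"deptno": 10, "ename": "CLARK",  "sal": 2450},
        {"deptno": 10, "ename": "MILLER", "sal": 1300},
        {"deptno": 20, "ename": "FORD",   "sal": 3000},
        {"deptno": 20, "ename": "SMITH",  "sal": 800},
    ]

    groups = defaultdict(list)
    for row in rows:
        groups[row["deptno"]].append(row)   # group-by port: deptno

    TOP_N = 2
    for deptno, members in sorted(groups.items()):
        ranked = sorted(members, key=lambda r: r["sal"], reverse=True)[:TOP_N]
        for rank_index, row in enumerate(ranked, start=1):
            # rank_index plays the role of the generated rank index port.
            print(deptno, rank_index, row["ename"], row["sal"])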
11. Filter transformation: Condition-based data is often needed for target requirements, selecting the required data and rejecting the unwanted data. The filter transformation is used to define a filter condition that selects the required data.
Example: deptno=10 AND sal>=2000
It allows only one condition at a time (see the sketch below).
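Sketched in Python with hypothetical rows, the single condition keeps matching rows and silently drops the rest, which is what makes the transformation active:

    rows = [
        {"deptno": 10, "sal": 3000},   # passes
        {"deptno": 10, "sal": 1500},   # fails sal >= 2000
        {"deptno": 20, "sal": 2500},   # fails deptno = 10
    ]

    # One filter condition at a time: deptno=10 AND sal>=2000.
    passed = [r for r in rows if r["deptno"] == 10 and r["sal"] >= 2000]
    print(passed)   # [{'deptno': 10, 'sal': 3000}]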
[Diagram: a sequence generator's NEXTVAL and CURRVAL ports feeding key columns such as empno in two targets.]
Mapping parameters and variables: These are useful to extract specific required data from the input source.
Parameter: a parameter is assigned a value that cannot be changed during execution (initial value equals final value).
Example: month=$$month (initial value $$month=1)
Variable: a variable is assigned a value that can be changed during execution until it reaches its max value; that means the initial value is not equal to the final value.
Example: day=$$day (initial value $$day=1, incremented by the specified value until it reaches the max value)
Mapping parameters and variables are used for incremental loading of fact tables, as in the sketch below.
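A minimal sketch of the semantics in Python (the $$month/$$day names mirror the examples above; the persistence step is hypothetical): the parameter stays fixed for the run, while the variable advances and its final value seeds the next run:

    MONTH = 1       # $$month: parameter, fixed for the whole run

    day = 1         # $$day: variable, initial value
    MAX_DAY = 31

    extracted = []
    while day <= MAX_DAY:
        extracted.append((MONTH, day))   # extract only the rows for this day
        day += 1                         # variable incremented during the run

    # The final value (32 here) would be persisted in the repository, so the
    # next run starts where this one stopped; that is the incremental load.
    print(len(extracted), day)           # 31 32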
Target load plan: If a single mapping is constructed with multiple individual pipelines, the Informatica server may load data into the target tables in any order. As designers we can therefore specify the order in which data is loaded into the target tables, as we need; a small sketch follows the diagram below.
[Diagram: one source and transformation stage feeding target1 (primary key), target2 (primary key/foreign key) and target3 (foreign key); the load plan runs target1 before target2 before target3.]
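Why the order matters can be shown with a small, hypothetical SQLite sketch: the primary-key target must be loaded before the target holding the foreign key, or the child rows have no parent:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")   # enforce FK constraints
    con.execute("CREATE TABLE target1 (deptno INTEGER PRIMARY KEY)")
    con.execute(
        "CREATE TABLE target2 (empno INTEGER PRIMARY KEY,"
        " deptno INTEGER REFERENCES target1(deptno))"
    )

    # Load plan: primary-key target first, foreign-key target second.
    con.execute("INSERT INTO target1 VALUES (10)")
    con.execute("INSERT INTO target2 VALUES (7369, 10)")  # parent already exists

    # Reversing the order would raise sqlite3.IntegrityError on the target2 insert.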
Target load types:
1. Bulk load: It is useful for loading a bulk amount of data into the target when the target table is not defined with a primary key or index. Bulk load is used for full loading of the staging database without any modifications.
2. Normal load: It is useful for loading cleansed data into the target table when the target table is defined with a primary key or index. Normal load is used for incremental loading of the working database.
Process memories:
1. Buffer memory
2. Cache memory
Buffer memory is a temporary memory created during execution. The default size is 12 MB. It is created by the DTM manager and divided into buffer blocks; each block size is 64 KB by default. The Informatica server extracts data from the source, stores that data in the buffer memory blocks, and transfers it to the target through the buffer blocks. It is a user-defined memory that can be increased or decreased depending on the input volume of data.
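As a quick check of those defaults: 12 MB of buffer memory split into 64 KB blocks gives 12 × 1024 / 64 = 192 buffer blocks.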
Cache memory is a high-speed temporary memory allocated by the DTM manager and created by the Informatica server at the transformation stage in order to improve performance.
Types of cache memories:
1. Index cache (default size 1 MB)
2. Data cache (default size 2 MB)
3. In-memory cache (default size 8 MB)
4. Shared cache
5. Scheduled cache
The index cache maintains the key column values, whereas the data cache maintains the output data related to those key columns.
The Informatica server creates cache memory for various transformations:
1. Joiner
2. Aggregator
3. Lookup
4. Rank
5. Sorter
1. Joiner: The Informatica server creates two types of cache memories in the joiner: a) joiner index cache, b) joiner data cache. The joiner index cache maintains the join-condition values of the master source, whereas the data cache maintains the data of the master source related to the join condition.
2. Aggregator: It contains two cache memories: a) index cache, b) data cache. The aggregator index cache maintains the group-by port values, whereas the data cache maintains the output aggregated values.
3. Lookup: It contains two cache memories: a) index cache, b) data cache. The lookup index cache maintains the lookup-condition values, whereas the data cache maintains the lookup values related to the lookup condition.
4. Rank: It contains two cache memories: a) index cache, b) data cache. The rank index cache maintains the group-by port values, whereas the data cache maintains the rank port values.
5. Sorter: It contains an in-memory cache to maintain the sorted data with respect to the key column, in either ascending or descending order. A sketch of the index/data cache split follows.
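A small, purely illustrative Python sketch of the index-cache/data-cache split for a joiner's master source (the dept rows are hypothetical):

    # Hypothetical master source rows: (join-condition value, payload).
    master_rows = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]

    index_cache = set()   # key (join-condition) values of the master source
    data_cache = {}       # master data related to each join-condition value
    for deptno, dname in master_rows:
        index_cache.add(deptno)
        data_cache[deptno] = dname

    # A detail row probes the index cache first, then fetches from the data cache.
    detail_deptno = 20
    if detail_deptno in index_cache:
        print(data_cache[detail_deptno])   # RESEARCH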
Note: in Informatica PowerCenter 8.x, all process memories related to mappings are automatic in nature. They are automatically increased by the PowerCenter server during run time based on the input data.
Load manager (LM): It is used to create various cache memories, such as the shared cache and the scheduled cache, to load session instance information into the shared cache and scheduling information into the scheduled cache, and then to initiate the DTM manager.
DTM manager: It is also called the master thread, and it creates different types of child threads: reader, writer, transformation, pre-session and post-session threads.
A thread is a task; the Informatica server executes the ETL logic through threads.
a. Reader thread: used for data extraction.
b. Transformation thread: used for data transformation.
c. Writer thread: used for data loading.
d. Pre-session thread: used to execute SQL statements before the pipeline runs.
e. Post-session thread: used to execute SQL statements after the pipeline runs.
The DTM manager creates a number of threads depending on the partitioning points; the thread count increases in direct proportion to the number of partitioning points. If the number of partitioning points is increased, performance increases and the execution time is reduced. A thread-pipeline sketch follows the diagram below.
[Diagram: a source → transformation → target pipeline, with the number of threads growing with the number of partition points.]
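A toy Python sketch of the reader/transformation/writer split (the queues stand in for buffer blocks; everything here is illustrative, not PowerCenter internals):

    import queue
    import threading

    SENTINEL = None  # end-of-data marker

    def reader(out_q):
        # Reader thread: data extraction from the source.
        for row in range(5):
            out_q.put(row)
        out_q.put(SENTINEL)

    def transformer(in_q, out_q):
        # Transformation thread: apply the mapping logic row by row.
        while (row := in_q.get()) is not SENTINEL:
            out_q.put(row * 10)
        out_q.put(SENTINEL)

    def writer(in_q, target):
        # Writer thread: data loading into the target.
        while (row := in_q.get()) is not SENTINEL:
            target.append(row)

    target, q1, q2 = [], queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=reader, args=(q1,)),
        threading.Thread(target=transformer, args=(q1, q2)),
        threading.Thread(target=writer, args=(q2, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(target)   # [0, 10, 20, 30, 40]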
Working with the Workflow Manager:
Note: for all batches, the start task is the parent task, where the Informatica server starts the execution.
Working with the Workflow Monitor: