• Informatica: Reusable Tasks

Reusable Tasks
 Three types of reusable Tasks
Session – Set of instructions to execute a specific Mapping
Command – Specific shell commands to run during any Workflow
Email – Sends email during the Workflow
Reusable Tasks
 Use the Task Developer to create reusable tasks
 These tasks will then appear in the Navigator and can be dragged and dropped into any workflow
Reusable Tasks in a Workflow
 In a workflow, a reusable task is represented with a special symbol
Reusable
Non-reusable
Session Task
 Server instructions to run the logic of ONE specific Mapping
e.g. source and target data location specifications, memory allocation, optional Mapping overrides, scheduling, processing and load instructions
 Becomes a component of a Workflow (or Worklet)
 If configured in the Task Developer, the Session Task is reusable (optional)
Session Task – Properties and Parameters
Session Task
Session parameter
Properties Tab
Parameter file
Command Task
 Specify one (or more) Unix shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow
 Runs in the Informatica Server (UNIX or Windows) environment
 Becomes a component of a Workflow (or Worklet)
 If created in the Task Developer, the Command Task is reusable; if created in the Workflow Designer, it is not reusable
 Commands can also be referenced in a Session through the Session “Components” tab, as Pre- or Post-Session commands
 Each Command Task shell command can execute before the Session begins or after the Informatica Server executes a Session
 Command Task status (successful completion or failure) is held in the pre-defined task variable $command_task_name.STATUS
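For example, a link leaving a Command Task can test this pre-defined status variable before letting the flow continue. A minimal sketch of a link condition (the task name cmd_ArchiveFiles is hypothetical):

$cmd_ArchiveFiles.STATUS = SUCCEEDED

If a shell command fails, the condition evaluates to false and the downstream task does not start.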
Command Task
Command Task (cont’d)
Add Cmd
Remove Cmd
Email Task
 Configure to have the Informatica Server send email at any point in the Workflow
 Becomes a component in a Workflow (or Worklet)
 If configured in the Task Developer, the Email Task is reusable (optional)
 Emails can also be invoked under the Components tab of a Session Task, to run pre- or post-session
 Email can also be configured in a Session, to be sent (only) at the completion of the Session
Email Task (cont’d)
Non-Reusable Tasks
Workflow Tasks
 Command. Specifies a shell command run during the workflow.
 Control. Stops or aborts the workflow.
 Decision. Specifies a condition to evaluate.
 Email. Sends email during the workflow.
 Event-Raise. Notifies the Event-Wait task that an event has occurred.
 Event-Wait. Waits for an event to occur before executing the next task.
 Session. Runs a mapping you create in the Designer.
 Assignment. Assigns a value to a workflow variable.
 Timer. Waits for a timed event to trigger.
Non-Reusable Tasks

Additional Workflow Tasks
 Six additional Tasks are available in the Workflow Designer
 All are Workflow-specific only
• Decision
• Assignment
• Timer
• Control
• Event Wait
• Event Raise
Decision Task
 Specifies a condition to be evaluated in the Workflow
 Use the Decision Task in branches of a Workflow
 Use link conditions downstream to control execution flow by testing the Decision result
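A minimal sketch (task and port names hypothetical): a Decision Task named Dec_CheckLoad could evaluate a condition such as

$s_LoadOrders.TgtSuccessRows > 0

and the outgoing links would then test the pre-defined Decision result:

success branch:  $Dec_CheckLoad.Condition = TRUE
recovery branch: $Dec_CheckLoad.Condition = FALSE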
Assignment Task
 Assigns a value to a Workflow Variable
 Variables are defined in the Workflow object
Expressions Tab
General Tab
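A minimal sketch, assuming a user-defined Workflow variable $$RunCount has been declared in the Workflow object; on the Expressions tab, the Assignment Task would set

$$RunCount = $$RunCount + 1

Downstream link conditions or Decision Tasks can then test $$RunCount.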
Timer Task
 Waits for a specified period of time before executing the next Task
General Tab
• Absolute Time
• Datetime Variable
• Relative Time
Timer Tab
Control Task
 Stop or ABORT the Workflow
General Tab
Properties Tab
Event Wait Task
 Pauses processing of the pipeline until a specified event occurs
 Events can be:
 Pre-defined – file watch
 User-defined – created by an Event Raise task elsewhere in the workflow
Event Wait Task (cont’d)
General Tab
Properties Tab
Event Wait Task (cont’d)
Events Tab
User-defined event configured in the Workflow object
Event Raise Task
 Represents the location of a user-defined event
 The Event Raise Task triggers the user-defined event when the Informatica Server executes it
 Used with the Event Wait Task
General Tab Properties Tab
Workflow Scheduler Objects
 Set up reusable schedules to associate with multiple Workflows
 Used in Workflows and Session Tasks

Workflow Scheduler
Set and customize a workflow-specific schedule
Step 5 - Scheduler
 Availability
 Reusable, as a special object
 Non-reusable, under Workflows → Edit → Scheduler
Non-Reusable
Reusable
Step 5 - Scheduler
 Basic Properties
• Start when server starts
• Default mode, not scheduled
• Run again as soon as the previous run is completed
• Calendar-based run windows
• Runs every 15 minutes, starting 3/7/03 15:21 and ending 3/24/03
Step 5 - Scheduler
 Custom Repeats
• Repeat frequency
• Repeat any day of the month, or several days a month
• Here, repeats every last Saturday of the month
• Repeat any day of the week, or several days a week
Step 5 - Scheduler
 Custom Repeats
 The scheduler cannot specify a time window within a day (i.e. run every day between 8 PM and 11 PM)
 For this, use a link condition between the Start task and the next task, and schedule the Workflow to run continuously or every (n) minutes, as sketched below
Runs if workflow started between 8 and 10:59 PM
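A minimal sketch of such a link condition, using the pre-defined WORKFLOWSTARTTIME variable (hour bounds taken from the slide above):

TO_CHAR(WORKFLOWSTARTTIME, 'HH24') >= '20' AND TO_CHAR(WORKFLOWSTARTTIME, 'HH24') <= '22'

Runs started between 8:00 PM and 10:59 PM pass the condition; all other scheduled runs stop at the Start task.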
Workflow Server Connection
 Configure Server data access connections
- Used in Session Tasks
Configure:
1. Relational
2. Queue
3. FTP
4. Application
5. Loader

Relational Connection (Native)
 Create a relational (database) connection
 Instructions to the Server to locate relational tables
 Used in Session Tasks
Normalizer Transformation
Normalizer Transformations
Active Transformation
Connected
Ports
• Input/output or output
Usage
• Required for VSAM Source definitions
• Normalize flat file or relational source definitions
• Generate multiple records from one record
• The Normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data.
• The transformation processes multiple-occurring columns, or multiple-occurring groups of columns, in each source row.
Turn one row
YEAR,ACCOUNT,MONTH1,MONTH2,MONTH3, … MONTH12
1997,Salaries,21000,21000,22000,19000,23000,26000,29000,29000,34000,34000,40000,45000
1997,Benefits,4200,4200,4400,3800,4600,5200,5800,5800,6800,6800,8000,9000
1997,Expenses,10500,4000,5000,6500,3000,7000,9000,4500,7500,8000,8500,8250
Into multiple rows
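Illustratively (output column names assumed; the Normalizer also emits a generated column ID, the GCID, numbering each occurrence), the Salaries row above fans out into twelve rows of the form:

YEAR, ACCOUNT, GCID_MONTH, AMOUNT
1997, Salaries, 1, 21000
1997, Salaries, 2, 21000
1997, Salaries, 3, 22000
…
1997, Salaries, 12, 45000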
Normalizer Transformations
Generated Column ID
Normalizer Transformations
Debugger
By the end of this section you will be familiar with:
 Creating a Debug Session
 Debugger windows and indicators
 Debugger functionality and options
 Viewing data with the Debugger
 Setting and using Breakpoints
 Tips for using the Debugger
Debugger Features
 Wizard-driven tool that runs a test session
 View source / target data
 View transformation data
 Set breakpoints and evaluate expressions
 Initialize variables
 Manually change variable values
 Data can be loaded or discarded
 Debug environment can be saved for later use
Debugger Interface
Target Instance window
Transformation Instance Data window
Flashing yellow SQL indicator
Debugger Mode indicator
Solid yellow arrow is the current transformation indicator
Output Window – Debugger Log
Edit Breakpoints
Set Breakpoints
1. Edit breakpoint
2. Choose global or specific transformation
3. Choose to break on data condition or error. Optionally skip rows.
4. Add breakpoint(s)
5. Add data conditions
6. Continue (to next breakpoint)
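As an illustration of step 5 (transformation, port and value hypothetical), a data breakpoint condition on an Expression transformation might be

SALARY > 100000

so the Debugger pauses each time a row satisfying the condition passes through that transformation.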
 Server must be running before starting a Debug Session
 When the Debugger is started, a spinning icon displays. Spinning stops when the Debugger Server is ready
 The flashing yellow/green arrow points to the current active Source Qualifier. The solid yellow arrow points to the current Transformation instance
Debugger Tips
Next Instance – proceeds a single step at a time; one row moves from transformation to transformation
Step to Instance – examines one transformation at a time, following successive rows through the same transformation
Partition Methods
 You can change the partitioning method at each partition point to redistribute data between stage threads more efficiently
 All methods but pass-through come at the cost of some performance
 Round Robin
 Distributes data evenly between stage threads
 Use in the transformation stage, when reading from unevenly partitioned sources
 Hash Key
 Keeps data belonging to the same group in the same partition so the data is aggregated or sorted properly
 Use with Aggregator, Sorter, Joiner and Rank transformations
 Hash auto keys – hash keys are generated by the server engine, based on ‘group by’ and ‘order by’ ports in transformations
 Hash user keys – define the ports you want to group by
Copyright ©2004
Performance & Tuning
Copyright ©2004
Informatica Tuning 101
 Collect base performance data
 Establish reference points for your particular system
- Your goal is to measure optimal I/O performance on your system
- Create pass-through mappings for each main source/target combination
- Make notes of the read and write throughput counters in the session statistics
- Time these sessions and compute MB/hour or GB/hour numbers
- Do this for various combinations of file and relational sources and targets
- Try to have the system to yourself when you run your benchmarks
 Collect performance data for your existing mappings
 Before tuning them
- Collect read and write throughput data
- Collect MB/hour or GB/hour data
 Identify and remove the bottlenecks in your mappings
 Keep notes of what you do and how it affects the performance
 Go after one problem at a time and re-check performance after each change
 If a fix does not provide a speed improvement, revert to your previous configuration
Collecting Reference Data
 Use a pass-through mapping
 a source definition
 a source qualifier
 a target definition
 No transformations
 no transformation thread
 best possible engine performance for this source and target combination
Identifying Bottlenecks
1 - Writing to a slow target?
2 - Reading from a slow source?
3 - Transformation inefficiencies?
4 - Session inefficiencies?
5 - System not optimized?

Target Bottleneck
Change the session’s writer to a flat file writer
Target Bottleneck
 Common sources of problems
 Indexes or key constraints
 Database commit points too high or too low
 Common Solutions
 Drop indexes and key constraints before loading, rebuild after loading
 Use bulk loading or external loaders when practical
 Experiment with the frequency of database commit points
Source Bottleneck
 Common sources of problems
 Inefficient SQL query
 Table partitioning does not fit the query
 Common Solutions
 Analyze the query issued by the Source Qualifier. It appears in the session log. Most SQL interpreter tools allow you to view an execution plan for your query.
 Consider using database optimizer hints to make sure the correct indexes are used
 Consider indexing tables when you have order by or group by clauses
 Try database parallel queries if supported
 Try partitioning the session if appropriate
 If you have table partitioning, make sure your query does not pull data across partition lines
 If you have a query filter on non-indexed columns, try moving the filter outside of the query, into a Filter Transformation (see the sketch below)
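A sketch of that last point (table and column names hypothetical): rather than adding WHERE ORDER_STATUS = 'OPEN' to the Source Qualifier query against a non-indexed column, let the database return all rows and apply a Filter Transformation whose condition is

ORDER_STATUS = 'OPEN'

so the database does a plain read and the filtering work moves into the Informatica engine.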
Mapping Bottleneck
Under Properties -> Performance
 Common sources of problems
 Too many transformations
 Unused links between ports
 Too many input/output or output ports connected out of Aggregator, Rank and Lookup transformations
 Unnecessary data-type conversions
 Common solutions
 Eliminate transformation errors
 If several mappings read from the same source, try single-pass reading
 Optimize datatypes; use integers for comparisons
 Don’t convert back and forth between datatypes
 Optimize lookups and lookup tables, using caches and indexing tables
 Put your filters early in the data flow, and use a simple filter condition
 For aggregators, use sorted input, group by integer columns and simplify expressions
 If you use reusable sequence generators, increase the number of cached values
 If you use the same logic in different data streams, apply it before the streams branch off
 Optimize expressions:
- isolate slow and complex expressions
- reduce or simplify aggregate functions
- use local variables to encapsulate repeated computations
- integer computations are faster than character computations
- use operators rather than the equivalent function: ‘||’ is faster than CONCAT() (see the sketch below)
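For example, in an Expression transformation (port names hypothetical), the operator form

FIRST_NAME || ' ' || LAST_NAME

should outperform the equivalent nested function calls

CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)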
Session Bottleneck
 Common sources of problems
 Inappropriate memory allocation settings
 Under-utilized or over-utilized resources (CPU and RAM)
 Error tracing override set to a high level
 Common solutions
 Experiment with the DTM buffer pool and buffer block size
- A good starting point is 25MB for the DTM buffer and 64K for the buffer block size (see the worked example after this list)
 Make sure to keep data caches and indexes in memory
- Avoid paging to disk, but be aware of your RAM limits
 Run sessions in parallel, in parallel workflow execution paths, whenever possible
- Here also, be cautious not to hit your glass ceiling
 If your mapping allows it, use partitioning
 Experiment with the database commit interval
 Turn off decimal arithmetic (it is off by default)
 Use the debugger rather than high error tracing; reduce your tracing level for production runs
- Create a reusable session configuration object to store the tracing level and buffer block size
 Don’t stage your data if you can avoid it; read directly from the original sources
 Look at the performance of your session components (run each separately)
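To make the starting-point figures concrete (a back-of-the-envelope sketch, not a sizing rule): a 64K buffer block holds roughly 65,536 / 655 ≈ 100 rows when the average row is 655 bytes wide, and a 25MB DTM buffer provides roughly 25 × 1,024 / 64 = 400 such blocks to share between the reader, transformation and writer stages.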
System Bottleneck
 Common sources of problems
 Slow network connections
 Overloaded or under-powered servers
 Slow disk performance
 Common Solutions
 Get the best machines to run your server. Better yet, use several servers against the same repository (PowerCenter only).
 Use multiple CPUs and session partitioning
 Make sure you have good network connections between the Informatica server and the database servers
 Locate the Repository database on the Informatica server machine
 Shut down unneeded processes or network services on your servers
 Use 7-bit ASCII data movement (the default) if you don’t need Unicode
 Evaluate hard disk performance; try locating sources and targets on different drives
 Use different drives for transformation caches, if they don’t fit in memory
 Get as much RAM as you can for your servers
Using Statistics Counters
 View Session statistics through the Workflow Monitor
Select the Transformation Statistics tab
Number of input rows for each source file
Number of output rows for the relational target table. Load is spread evenly across partitions
Output rows sent to the flat file top 10 target, confined to one partition
These numbers are available in real-time; they are updated every few seconds.
Using Performance Counters
 Turning it on
 In the Workflow Manager, edit the session
 Collecting performance data requires an additional 200K of memory per session
1 - Select the Properties tab
2 - Select the Performance section
3 - Check the ‘Collect Performance Data’ box for a test run to see how your partitioning strategy is performing
Using Performance Counters
 Monitor Session performance through the Workflow Monitor
Select the Performance tab, only visible while the session is running and until you close the window. These numbers are saved in the ‘.perf’ file
Input rows and output rows counters for each transformation
Error rows counters for each transformation
Read from disk/cache and write to disk/cache counters for ranks, aggregators and joiners
Using Performance Counters
 How to use the counters
 Input & output rows, to verify
 data integrity
 row repartition at a partition point
 Error rows
 Did you expect this transformation to reject rows due to error?
 Read/Write to disk
 If the counters have non-zero values, your transformation is paging to disk
 Read/Write to cache
 Use in conjunction with read/write to disk to estimate the size of the cache needed to hold everything within RAM (see the worked example below)
 New group key
 Aggregator and ranker
 Number of groups created
 Does this number seem right? If not, your grouping condition may be wrong
 Old group key
 Aggregator and ranker
 Number of times a group was reused
 Rows in Lookup Cache
 Lookup only
 Use to estimate the total cache size
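A worked example of that last counter (numbers illustrative): if Rows in Lookup Cache reports 500,000 rows and the average cached row is about 200 bytes wide, the lookup cache needs on the order of 500,000 × 200 bytes ≈ 100 MB to stay entirely in RAM.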
Using Run Info Counters
 Using the Session log’s Run Info
 Only available when the session is finished
 One entry per stage per partition
 Counters:
 Run time, total run time for the thread
 Idle time, total time the thread spent doing nothing (included in total run time)
 Busy percentage, a function of the two counters above (see the formula below)
 Replaces the V5 buffer efficiency counters
Scroll down to the Run Info section
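The relationship, as commonly documented, is: Busy % = 100 × (run time − idle time) / run time. A thread that spends most of its run time waiting on the other stages therefore shows a low busy percentage.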
Using Run Info Counters
 Run Info Busy Percentage
 You need to compare the values for each stage to properly evaluate where the bottleneck may be
 You want to look for a high value (busy) that stands out. This indicates a problem area.
 High values across the board are indicators of an optimized session
Bottleneck patterns (the high value marks the bottleneck stage):
Reader   Transform   Writer
High %   Low %       Low %
Low %    High %      Low %
Low %    Low %       High %
Review Quiz
1. What is a benefit of buffered processing stages?
a) Safety net against network errors
b) Lower memory requirements
c) Overlapping data processing
2. How do you identify a target bottleneck?
a) By changing the output of the session to point to a flat file instead of a relational target
b) By reading the Run Info section of the session log and looking for a low busy percentage at the writer stage
c) By replacing the mapping with a pass-through mapping connected to the same target
3. Is the ‘Collect Performance Data’ option enabled by default?
a) No, never
b) Yes, always
c) No, unless you run a debugging session
4. You have shared session memory set to 25MB and a buffer block size set to 64K. How many rows of data can the server move to memory in a single operation?
a) 40,000 rows if the average row size is 655 bytes
b) 100 rows if the average row size is 655 bytes
c) 2,500 rows if the average row size is 64K
5. The Aggregator Transformation’s ‘Write To Cache’ counter gives the number of rows written to the disk cache?
a) TRUE
b) FALSE
Editor's Notes

  • #21: The Decision task itself does not stop the flow; it merely sets a flag that can then be tested by a link condition. Only a link condition (or a Control task) can halt the flow.
  • #33: To use the reusable option, you need to create a reusable scheduler object first. Note that, when a scheduled workflow fails, it is taken out of its schedule by design. You will have to correct the error(s) and manually re-schedule the workflow. Schedule changes kick in as soon as you save your workflow. When a workflow is scheduled, you can see a blue line in the Workflow Monitor’s Gantt chart view set at the workflow’s next scheduled start time. You can also look at the workflow properties. They should have a ‘scheduled’ run type and tell you the next scheduled run time
  • #34: Beware of the ‘Run on server init’ default choice. If your server has to be restarted, all the workflows that use that option will start. The next run will then be according to the given schedule. Also, be careful to set the schedule option to ‘Run on demand’ when you back up a workflow to a new folder. This will prevent the archived workflow(s) from being run at server start-up. When using an external scheduler, set all workflows to run on demand and let the external scheduler start your workflows.
  • #36: Q: How do you make your workflow run on the first BUSINESS day of the month? A: in the scheduler, select repeat every month and select the 1st, 2nd and 3rd day of the month. Then, in the Workflow Manager, set a link condition between your Start task and the first workflow task (if you have multiple branches out of the Start task, you need some mods). The link condition would check: if this is the first of the month, the workflow can run if this is a week day; if this is the second or the third, the workflow can run only if this is a Monday: (TO_CHAR(WORKFLOWSTARTTIME, ‘DD’) = ‘01’ AND TO_CHAR(WORKFLOWSTARTTIME, ‘D’) != ‘1’ AND TO_CHAR(WORKFLOWSTARTTIME, ‘D’) != ‘7’) OR (TO_CHAR(WORKFLOWSTARTTIME, ‘DD’) = ‘02’ AND TO_CHAR(WORKFLOWSTARTTIME, ‘D’) = ‘2’) OR (TO_CHAR(WORKFLOWSTARTTIME, ‘DD’) = ‘03’ AND TO_CHAR(WORKFLOWSTARTTIME, ‘D’) = ‘2’)
  • #50: Start the Debugger. Choose an existing session or define a one-time debug session. Monitor the Debugger. While you run the Debugger, the Designer displays the following windows: Debug log. View messages from the Debugger. Session log. View session log. Target window. View target data. Instance window. View transformation data. 4. Move through the mapping session: Next Instance – The Debugger continues running until it reaches the next transformation or until it encounters a break. Step to Instance – The Debugger continues running until it reaches the selected transformation in the mapping or until it encounters a break. Show current instance – The Debugger shows the current instance in the Instance window. Continue – The Debugger continues running until it encounters the next break. Break now – The Debugger pauses wherever it is currently processing. 5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations and targets as the data moves through the pipeline. You can also modify breakpoint information.
  • #52: Additional notes: When debugging a mapplet within a mapping, the Debugger will expand the mapplet into its individual transformations.
  • #53: The overall goal in choosing a partition method is to have each partition handle roughly the same amount of data. Explain how round-robin distributes data. Q: what is a hash function ? A: a function that takes a key as input (a string) and returns a positive integer within a given range. The same input always returns the same integer. Do on-screen demos showing how to select keys and ranges for a key range partition point and hash keys for a hash user key method. For shops using DB2 targets, a new method is available at the target: ‘database partitioning’. It will read table partitioning information from the system table and channel data to each table partition accordingly.
  • #55: To compute MB/hour, use the average width of your source rows and the number of rows extracted.
  • #59: Warning: switching to a flat file writer will always improve performance. You have to compare the numbers with your baseline. Also, this is valid for Inserts only, you can’t update or delete on a flat file target
  • #60: On Oracle for star schema DW applications: Drop Index before loading, then Rebuild indexes with NO LOGGING + GATHER STATISTICS (analyze indexes as they are being built) Run DBMS stats without the CASCADING option (since indexes are already analyzed) Also helps to enable parallelism on the tables and tablespaces
  • #61: The goal here is to get an idea of the time it takes to just get the rows from the source, to isolate the reader stage. Can also look at the session log’s Run Info section (covered later). You may improve performance with flat file sources by using fixed-length format instead of delimited (it takes more resources to parse a delimited file). Also, if your row width is higher than the default 1K, set it to a number that is close to the actual row width. You can generate the SQL in the Source Qualifier and export it to another tool (Toad, SQL*Plus or Query Analyzer). You can also time the query in the debugger or from the log files.
  • #63: Counters to look for: RowsInLookupCache -> to optimize the largest lookup first ErrorRows -> of course, all errors should be removed from the mapping Input and Output rows -> if partitioning, to ensure data is well balanced between partitions Also, look at Run Info section in session log
  • #64: Single pass reading: instead of reading the same sources multiple times in multiple mappings, have a single mapping do the job and point to multiple targets If you use the same lookup more than once in a workflow, make it persistent. Set the first session that uses the lookup to refresh it from the database. Use log file of performance counters to find your most expensive lookup and work on optimizing the lookup query or lookup condition on that lookup first. Log file will mark the time when it get its first row back from the lookup query and mark the time when it finishes the building of the cache
  • #73: New group key/Old group key: for instance an AGG without sorted input, new group key is incremented every time the first row of a new group comes in. When another row comes in that belongs to an existing group, the old group key is incremented. Performance numbers are saved in a file after the session run. Look in the session log directory for files named <session_name>.perf. If you specify a session log directory for the session, the performance file will be written there.
  • #76: The ideal mapping would be busy 100% for all 3 stages, meaning the reader, transformation and writer stages never had to wait for each other. The data just kept moving through.
  • #77: 1 - c. 2 - a (or use the Run Info section and look for a high busy % at the writer stage and a low busy % at the transformation stage). 3 - a.
  • #78: 4 - b; the 25MB does not enter into this computation: one 64K block holds 65,536 / 655 ≈ 100 rows. 5 - b; it is the Write To Disk counter that indicates the number of rows written to the disk.