• Informatica: Reusable Tasks

Reusable Tasks
 Three types of reusable Tasks
Session – Set of instructions to execute a specific Mapping
Command – Specific shell commands to run during any Workflow
Email – Sends email during the Workflow
Reusable Tasks
 Use the Task Developer to create reusable tasks
 These tasks will then appear in the Navigator and can be dragged and dropped into any workflow
Reusable Tasks in a Workflow
 In a workflow, a reusable task is represented with a special symbol
Reusable
Non-reusable
Session Task
 Server instructions to run the logic of ONE specific Mapping
e.g. source and target data location specifications, memory allocation, optional Mapping overrides, scheduling, processing and load instructions
 Becomes a component of a Workflow (or Worklet)
 If configured in the Task Developer, the Session Task is reusable (optional)
Session Task – Properties and Parameters
Session Task
Session parameter
Properties Tab
Parameter file
Command Task
 Specify one (or more) Unix shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow
 Runs in the Informatica Server (UNIX or Windows) environment
 Becomes a component of a Workflow (or Worklet)
 If created in the Task Developer, the Command Task is reusable; if created in the Workflow Designer, it is not reusable
 Commands can also be referenced in a Session through the Session “Components” tab, as Pre- or Post-Session commands
 Each Command Task shell command can execute before the Session begins or after the Informatica Server executes a Session
 Command Task status (successful completion or failure) is held in the pre-defined task variable $command_task_name.STATUS
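For example, a link leaving a Command Task can test this pre-defined status variable before letting the flow continue. A minimal sketch of a link condition (the task name cmd_ArchiveFiles is hypothetical):

$cmd_ArchiveFiles.STATUS = SUCCEEDED

If a shell command fails, the condition evaluates to false and the downstream task does not start.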
Command Task
Command Task (cont’d)
Add Cmd
Remove Cmd
Email Task
 Configure to have the Informatica Server send email at any point in the Workflow
 Becomes a component in a Workflow (or Worklet)
 If configured in the Task Developer, the Email Task is reusable (optional)
 Emails can also be invoked under the Components tab of a Session Task, to run pre- or post-session
 Email can also be configured in a Session, to be sent (only) at the completion of the Session
Email Task (cont’d)
Non-Reusable Tasks
Workflow Tasks
 Command. Specifies a shell command run during the workflow.
 Control. Stops or aborts the workflow.
 Decision. Specifies a condition to evaluate.
 Email. Sends email during the workflow.
 Event-Raise. Notifies the Event-Wait task that an event has occurred.
 Event-Wait. Waits for an event to occur before executing the next task.
 Session. Runs a mapping you create in the Designer.
 Assignment. Assigns a value to a workflow variable.
 Timer. Waits for a timed event to trigger.
Non-Reusable Tasks

Additional Workflow Tasks
 Six additional Tasks are available in the Workflow Designer
 All are Workflow-specific only
• Decision
• Assignment
• Timer
• Control
• Event Wait
• Event Raise
Decision Task
 Specifies a condition to be evaluated in the Workflow
 Use the Decision Task in branches of a Workflow
 Use link conditions downstream to control execution flow by testing the Decision result
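A minimal sketch (task and port names hypothetical): a Decision Task named Dec_CheckLoad could evaluate a condition such as

$s_LoadOrders.TgtSuccessRows > 0

and the outgoing links would then test the pre-defined Decision result:

success branch:  $Dec_CheckLoad.Condition = TRUE
recovery branch: $Dec_CheckLoad.Condition = FALSE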
Assignment Task
 Assigns a value to a Workflow Variable
 Variables are defined in the Workflow object
Expressions Tab
General Tab
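A minimal sketch, assuming a user-defined Workflow variable $$RunCount has been declared in the Workflow object; on the Expressions tab, the Assignment Task would set

$$RunCount = $$RunCount + 1

Downstream link conditions or Decision Tasks can then test $$RunCount.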
Timer Task
 Waits for a specified period of time before executing the next Task
General Tab
• Absolute Time
• Datetime Variable
• Relative Time
Timer Tab
Control Task
 Stop or ABORT the Workflow
General Tab
Properties Tab
Event Wait Task
 Pauses processing of the pipeline until a specified event occurs
 Events can be:
 Pre-defined – file watch
 User-defined – created by an Event Raise task elsewhere in the workflow
Event Wait Task (cont’d)
General Tab
Properties Tab
Event Wait Task (cont’d)
Events Tab
User-defined event configured in the Workflow object
Event Raise Task
 Represents the location of a user-defined event
 The Event Raise Task triggers the user-defined event when the Informatica Server executes it
 Used with the Event Wait Task
General Tab Properties Tab
Workflow Scheduler Objects
 Set up reusable schedules to associate with multiple Workflows
 Used in Workflows and Session Tasks

Workflow Scheduler
Set and customize a workflow-specific schedule
Step 5 - Scheduler
 Availability
 Reusable, as a special object
 Non-reusable, under Workflows → Edit → Scheduler
Non-Reusable
Reusable
Step 5 - Scheduler
 Basic Properties
• Start when server starts
• Default mode, not scheduled
• Run again as soon as the previous run is completed
• Calendar-based run windows
• Runs every 15 minutes, starting 3/7/03 15:21 and ending 3/24/03
Step 5 - Scheduler
 Custom Repeats
• Repeat frequency
• Repeat any day of the month, or several days a month
• Here, repeats every last Saturday of the month
• Repeat any day of the week, or several days a week
Step 5 - Scheduler
 Custom Repeats
 The scheduler cannot specify a time window within a day (i.e. run every day between 8 PM and 11 PM)
 For this, use a link condition between the Start task and the next task, and schedule the Workflow to run continuously or every (n) minutes, as sketched below
Runs if workflow started between 8 and 10:59 PM
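A minimal sketch of such a link condition, using the pre-defined WORKFLOWSTARTTIME variable (hour bounds taken from the slide above):

TO_CHAR(WORKFLOWSTARTTIME, 'HH24') >= '20' AND TO_CHAR(WORKFLOWSTARTTIME, 'HH24') <= '22'

Runs started between 8:00 PM and 10:59 PM pass the condition; all other scheduled runs stop at the Start task.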
Workflow Server Connection
 Configure Server data access connections
- Used in Session Tasks
Configure:
1. Relational
2. Queue
3. FTP
4. Application
5. Loader

Relational Connection (Native)
 Create a relational (database) connection
 Instructions to the Server to locate relational tables
 Used in Session Tasks
Normalizer Transformation
Normalizer Transformations
Active Transformation
Connected
Ports
• Input/output or output
Usage
• Required for VSAM Source definitions
• Normalize flat file or relational source definitions
• Generate multiple records from one record
• The Normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data.
• The transformation processes multiple-occurring columns, or multiple-occurring groups of columns, in each source row.
Turn one row
YEAR,ACCOUNT,MONTH1,MONTH2,MONTH3, … MONTH12
1997,Salaries,21000,21000,22000,19000,23000,26000,29000,29000,34000,34000,40000,45000
1997,Benefits,4200,4200,4400,3800,4600,5200,5800,5800,6800,6800,8000,9000
1997,Expenses,10500,4000,5000,6500,3000,7000,9000,4500,7500,8000,8500,8250
Into multiple rows
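Illustratively (output column names assumed; the Normalizer also emits a generated column ID, the GCID, numbering each occurrence), the Salaries row above fans out into twelve rows of the form:

YEAR, ACCOUNT, GCID_MONTH, AMOUNT
1997, Salaries, 1, 21000
1997, Salaries, 2, 21000
1997, Salaries, 3, 22000
…
1997, Salaries, 12, 45000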
Normalizer Transformations
Generated Column ID
Normalizer Transformations
Debugger
By the end of this section you will be familiar with:
 Creating a Debug Session
 Debugger windows and indicators
 Debugger functionality and options
 Viewing data with the Debugger
 Setting and using Breakpoints
 Tips for using the Debugger
Debugger Features
 Wizard-driven tool that runs a test session
 View source / target data
 View transformation data
 Set breakpoints and evaluate expressions
 Initialize variables
 Manually change variable values
 Data can be loaded or discarded
 Debug environment can be saved for later use
Debugger Interface
Target Instance window
Transformation Instance Data window
Flashing yellow SQL indicator
Debugger Mode indicator
Solid yellow arrow is the current transformation indicator
Output Window – Debugger Log
Edit Breakpoints
Set Breakpoints
1. Edit breakpoint
2. Choose global or specific transformation
3. Choose to break on data condition or error. Optionally skip rows.
4. Add breakpoint(s)
5. Add data conditions
6. Continue (to next breakpoint)
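As an illustration of step 5 (transformation, port and value hypothetical), a data breakpoint condition on an Expression transformation might be

SALARY > 100000

so the Debugger pauses each time a row satisfying the condition passes through that transformation.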
 Server must be running before starting a Debug Session
 When the Debugger is started, a spinning icon displays. Spinning stops when the Debugger Server is ready
 The flashing yellow/green arrow points to the current active Source Qualifier. The solid yellow arrow points to the current Transformation instance
Debugger Tips
Next Instance – proceeds a single step at a time; one row moves from transformation to transformation
Step to Instance – examines one transformation at a time, following successive rows through the same transformation
Partition Methods
 You can change the partitioning method at each partition point to redistribute data between stage threads more efficiently
 All methods but pass-through come at the cost of some performance
 Round Robin
 Distributes data evenly between stage threads
 Use in the transformation stage, when reading from unevenly partitioned sources
 Hash Key
 Keeps data belonging to the same group in the same partition so the data is aggregated or sorted properly
 Use with Aggregator, Sorter, Joiner and Rank transformations
 Hash auto keys – hash keys are generated by the server engine, based on ‘group by’ and ‘order by’ ports in transformations
 Hash user keys – define the ports you want to group by
Copyright ©2004
Performance & Tuning
Copyright ©2004
Informatica Tuning 101
 Collect base performance data
 Establish reference points for your particular system
- Your goal is to measure optimal I/O performance on your system
- Create pass-through mappings for each main source/target combination
- Make notes of the read and write throughput counters in the session statistics
- Time these sessions and compute MB/hour or GB/hour numbers
- Do this for various combinations of file and relational sources and targets
- Try to have the system to yourself when you run your benchmarks
 Collect performance data for your existing mappings
 Before tuning them
- Collect read and write throughput data
- Collect MB/hour or GB/hour data
 Identify and remove the bottlenecks in your mappings
 Keep notes of what you do and how it affects the performance
 Go after one problem at a time and re-check performance after each change
 If a fix does not provide a speed improvement, revert to your previous configuration
Collecting Reference Data
 Use a pass-through mapping
 a source definition
 a source qualifier
 a target definition
 No transformations
 no transformation thread
 best possible engine performance for this source and target combination
Identifying Bottlenecks
1 - Writing to a slow target?
2 - Reading from a slow source?
3 - Transformation inefficiencies?
4 - Session inefficiencies?
5 - System not optimized?

Target Bottleneck
Change the session’s writer to a flat file writer
Target Bottleneck
 Common sources of problems
 Indexes or key constraints
 Database commit points too high or too low
 Common Solutions
 Drop indexes and key constraints before loading, rebuild after loading
 Use bulk loading or external loaders when practical
 Experiment with the frequency of database commit points
Source Bottleneck
 Common sources of problems
 Inefficient SQL query
 Table partitioning does not fit the query
 Common Solutions
 Analyze the query issued by the Source Qualifier. It appears in the session log. Most SQL interpreter tools allow you to view an execution plan for your query.
 Consider using database optimizer hints to make sure the correct indexes are used
 Consider indexing tables when you have order by or group by clauses
 Try database parallel queries if supported
 Try partitioning the session if appropriate
 If you have table partitioning, make sure your query does not pull data across partition lines
 If you have a query filter on non-indexed columns, try moving the filter outside of the query, into a Filter Transformation (see the sketch below)
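A sketch of that last point (table and column names hypothetical): rather than adding WHERE ORDER_STATUS = 'OPEN' to the Source Qualifier query against a non-indexed column, let the database return all rows and apply a Filter Transformation whose condition is

ORDER_STATUS = 'OPEN'

so the database does a plain read and the filtering work moves into the Informatica engine.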
Mapping Bottleneck
Under Properties -> Performance
 Common sources of problems
 Too many transformations
 Unused links between ports
 Too many input/output or output ports connected out of Aggregator, Rank and Lookup transformations
 Unnecessary data-type conversions
 Common solutions
 Eliminate transformation errors
 If several mappings read from the same source, try single-pass reading
 Optimize datatypes; use integers for comparisons
 Don’t convert back and forth between datatypes
 Optimize lookups and lookup tables, using caches and indexing tables
 Put your filters early in the data flow, and use a simple filter condition
 For aggregators, use sorted input, group by integer columns and simplify expressions
 If you use reusable sequence generators, increase the number of cached values
 If you use the same logic in different data streams, apply it before the streams branch off
 Optimize expressions:
- isolate slow and complex expressions
- reduce or simplify aggregate functions
- use local variables to encapsulate repeated computations
- integer computations are faster than character computations
- use operators rather than the equivalent function: ‘||’ is faster than CONCAT() (see the sketch below)
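For example, in an Expression transformation (port names hypothetical), the operator form

FIRST_NAME || ' ' || LAST_NAME

should outperform the equivalent nested function calls

CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)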
Session Bottleneck
 Common sources of problems
 Inappropriate memory allocation settings
 Under-utilized or over-utilized resources (CPU and RAM)
 Error tracing override set to a high level
 Common solutions
 Experiment with the DTM buffer pool and buffer block size
- A good starting point is 25MB for the DTM buffer and 64K for the buffer block size (see the worked example after this list)
 Make sure to keep data caches and indexes in memory
- Avoid paging to disk, but be aware of your RAM limits
 Run sessions in parallel, in parallel workflow execution paths, whenever possible
- Here also, be cautious not to hit your glass ceiling
 If your mapping allows it, use partitioning
 Experiment with the database commit interval
 Turn off decimal arithmetic (it is off by default)
 Use the debugger rather than high error tracing; reduce your tracing level for production runs
- Create a reusable session configuration object to store the tracing level and buffer block size
 Don’t stage your data if you can avoid it; read directly from the original sources
 Look at the performance of your session components (run each separately)
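To make the starting-point figures concrete (a back-of-the-envelope sketch, not a sizing rule): a 64K buffer block holds roughly 65,536 / 655 ≈ 100 rows when the average row is 655 bytes wide, and a 25MB DTM buffer provides roughly 25 × 1,024 / 64 = 400 such blocks to share between the reader, transformation and writer stages.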
System Bottleneck
 Common sources of problems
 Slow network connections
 Overloaded or under-powered servers
 Slow disk performance
 Common Solutions
 Get the best machines to run your server. Better yet, use several servers against the same repository (PowerCenter only).
 Use multiple CPUs and session partitioning
 Make sure you have good network connections between the Informatica server and the database servers
 Locate the Repository database on the Informatica server machine
 Shut down unneeded processes or network services on your servers
 Use 7-bit ASCII data movement (the default) if you don’t need Unicode
 Evaluate hard disk performance; try locating sources and targets on different drives
 Use different drives for transformation caches, if they don’t fit in memory
 Get as much RAM as you can for your servers
Using Statistics Counters
 View Session statistics through the Workflow Monitor
Select the Transformation Statistics tab
Number of input rows for each source file
Number of output rows for the relational target table. Load is spread evenly across partitions
Output rows sent to the flat file top 10 target, confined to one partition
These numbers are available in real-time; they are updated every few seconds.
Using Performance Counters
 Turning it on
 In the Workflow Manager, edit the session
 Collecting performance data requires an additional 200K of memory per session
1 - Select the Properties tab
2 - Select the Performance section
3 - Check the ‘Collect Performance Data’ box for a test run to see how your partitioning strategy is performing
Using Performance Counters
 Monitor Session performance through the Workflow Monitor
Select the Performance tab, only visible while the session is running and until you close the window. These numbers are saved in the ‘.perf’ file
Input rows and output rows counters for each transformation
Error rows counters for each transformation
Read from disk/cache and write to disk/cache counters for ranks, aggregators and joiners
Using Performance Counters
 How to use the counters
 Input & output rows, to verify
 data integrity
 row repartition at a partition point
 Error rows
 Did you expect this transformation to reject rows due to error?
 Read/Write to disk
 If the counters have non-zero values, your transformation is paging to disk
 Read/Write to cache
 Use in conjunction with read/write to disk to estimate the size of the cache needed to hold everything within RAM (see the worked example below)
 New group key
 Aggregator and ranker
 Number of groups created
 Does this number seem right? If not, your grouping condition may be wrong
 Old group key
 Aggregator and ranker
 Number of times a group was reused
 Rows in Lookup Cache
 Lookup only
 Use to estimate the total cache size
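A worked example of that last counter (numbers illustrative): if Rows in Lookup Cache reports 500,000 rows and the average cached row is about 200 bytes wide, the lookup cache needs on the order of 500,000 × 200 bytes ≈ 100 MB to stay entirely in RAM.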
Using Run Info Counters
 Using the Session log’s Run Info
 Only available when the session is finished
 One entry per stage per partition
 Counters:
 Run time, total run time for the thread
 Idle time, total time the thread spent doing nothing (included in total run time)
 Busy percentage, a function of the two counters above (see the formula below)
 Replaces the V5 buffer efficiency counters
Scroll down to the Run Info section
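The relationship, as commonly documented, is: Busy % = 100 × (run time − idle time) / run time. A thread that spends most of its run time waiting on the other stages therefore shows a low busy percentage.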
Using Run Info Counters
 Run Info Busy Percentage
 You need to compare the values for each stage to properly evaluate where the bottleneck may be
 You want to look for a high value (busy) that stands out. This indicates a problem area.
 High values across the board are indicators of an optimized session
Bottleneck patterns (the high value marks the bottleneck stage):
Reader   Transform   Writer
High %   Low %       Low %
Low %    High %      Low %
Low %    Low %       High %
Review Quiz
1. What is a benefit of buffered processing stages?
a) Safety net against network errors
b) Lower memory requirements
c) Overlapping data processing
2. How do you identify a target bottleneck?
a) By changing the output of the session to point to a flat file instead of a relational target
b) By reading the Run Info section of the session log and looking for a low busy percentage at the writer stage
c) By replacing the mapping with a pass-through mapping connected to the same target
3. Is the ‘Collect Performance Data’ option enabled by default?
a) No, never
b) Yes, always
c) No, unless you run a debugging session
4. You have shared session memory set to 25MB and a buffer block size set to 64K. How many rows of data can the server move to memory in a single operation?
a) 40,000 rows if the average row size is 655 bytes
b) 100 rows if the average row size is 655 bytes
c) 2,500 rows if the average row size is 64K
5. The Aggregator Transformation’s ‘Write To Cache’ counter gives the number of rows written to the disk cache?
a) TRUE
b) FALSE
Editor's Notes

  • #21: The Decision task itself does not stop the flow; it merely sets a flag that can then be tested by a link condition. Only a link condition (or a Control task) can halt the flow.
  • #33: To use the reusable option, you need to create a reusable scheduler object first. Note that, when a scheduled workflow fails, it is taken out of its schedule by design. You will have to correct the error(s) and manually re-schedule the workflow. Schedule changes kick in as soon as you save your workflow. When a workflow is scheduled, you can see a blue line in the Workflow Monitor’s Gantt chart view set at the workflow’s next scheduled start time. You can also look at the workflow properties. They should have a ‘scheduled’ run type and tell you the next scheduled run time
  • #34: Beware of the ‘Run on server init’ default choice. If your server has to be restarted, all the workflows that use that option will start. The next run will then be according to the given schedule. Also, be careful to set the schedule option to ‘Run on demand’ when you back up a workflow to a new folder. This will prevent the archived workflow(s) from being run at server start-up. When using an external scheduler, set all workflows to run on demand and let the external scheduler start your workflows.
  • #36: Q: How do you make your workflow run on the first BUSINESS day of the month? A: in the scheduler, select repeat every month and select the 1st, 2nd and 3rd day of the month. Then, in the Workflow Manager, set a link condition between your Start task and the first workflow task (if you have multiple branches out of the Start task, you need some mods). The link condition would check: if this is the first of the month, the workflow can run if this is a week day; if this is the second or the third, the workflow can run only if this is a Monday: (TO_CHAR(WORKFLOWSTARTTIME, ‘DD’) = ‘01’ AND TO_CHAR(WORKFLOWSTARTTIME, ‘D’) != ‘1’ AND TO_CHAR(WORKFLOWSTARTTIME, ‘D’) != ‘7’) OR (TO_CHAR(WORKFLOWSTARTTIME, ‘DD’) = ‘02’ AND TO_CHAR(WORKFLOWSTARTTIME, ‘D’) = ‘2’) OR (TO_CHAR(WORKFLOWSTARTTIME, ‘DD’) = ‘03’ AND TO_CHAR(WORKFLOWSTARTTIME, ‘D’) = ‘2’)
  • #50: Start the Debugger. Choose an existing session or define a one-time debug session. Monitor the Debugger. While you run the Debugger, the Designer displays the following windows: Debug log. View messages from the Debugger. Session log. View session log. Target window. View target data. Instance window. View transformation data. 4. Move through the mapping session: Next Instance – The Debugger continues running until it reaches the next transformation or until it encounters a break. Step to Instance – The Debugger continues running until it reaches the selected transformation in the mapping or until it encounters a break. Show current instance – The Debugger shows the current instance in the Instance window. Continue – The Debugger continues running until it encounters the next break. Break now – The Debugger pauses wherever it is currently processing. 5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations and targets as the data moves through the pipeline. You can also modify breakpoint information.
  • #52: Additional notes: When debugging a mapplet within a mapping, the Debugger will expand the mapplet into its individual transformations.
  • #53: The overall goal in choosing a partition method is to have each partition handle roughly the same amount of data. Explain how round-robin distributes data. Q: what is a hash function ? A: a function that takes a key as input (a string) and returns a positive integer within a given range. The same input always returns the same integer. Do on-screen demos showing how to select keys and ranges for a key range partition point and hash keys for a hash user key method. For shops using DB2 targets, a new method is available at the target: ‘database partitioning’. It will read table partitioning information from the system table and channel data to each table partition accordingly.
  • #55: To compute MB/hour, use the average width of your source rows and the number of rows extracted.
  • #59: Warning: switching to a flat file writer will always improve performance. You have to compare the numbers with your baseline. Also, this is valid for Inserts only, you can’t update or delete on a flat file target
  • #60: On Oracle for star schema DW applications: Drop Index before loading, then Rebuild indexes with NO LOGGING + GATHER STATISTICS (analyze indexes as they are being built) Run DBMS stats without the CASCADING option (since indexes are already analyzed) Also helps to enable parallelism on the tables and tablespaces
  • #61: The goal here is to get an idea of the time it takes to just get the rows from the source, to isolate the reader stage. Can also look at the session log’s Run Info section (covered later). You may improve performance with flat file sources by using fixed-length format instead of delimited (it takes more resources to parse a delimited file). Also, if your row width is higher than the default 1K, set it to a number that is close to the actual row width. You can generate the SQL in the Source Qualifier and export it to another tool (Toad, SQL*Plus or Query Analyzer). You can also time the query in the debugger or from the log files.
  • #63: Counters to look for: RowsInLookupCache -> to optimize the largest lookup first ErrorRows -> of course, all errors should be removed from the mapping Input and Output rows -> if partitioning, to ensure data is well balanced between partitions Also, look at Run Info section in session log
  • #64: Single pass reading: instead of reading the same sources multiple times in multiple mappings, have a single mapping do the job and point to multiple targets If you use the same lookup more than once in a workflow, make it persistent. Set the first session that uses the lookup to refresh it from the database. Use log file of performance counters to find your most expensive lookup and work on optimizing the lookup query or lookup condition on that lookup first. Log file will mark the time when it get its first row back from the lookup query and mark the time when it finishes the building of the cache
  • #73: New group key/Old group key: for instance an AGG without sorted input, new group key is incremented every time the first row of a new group comes in. When another row comes in that belongs to an existing group, the old group key is incremented. Performance numbers are saved in a file after the session run. Look in the session log directory for files named <session_name>.perf. If you specify a session log directory for the session, the performance file will be written there.
  • #76: The ideal mapping would be busy 100% for all 3 stages, meaning the reader, transformation and writer stages never had to wait for each other. The data just kept moving through.
  • #77: 1 - c. 2 - a (or use the Run Info section and look for a high busy % at the writer stage and a low busy % at the transformation stage). 3 - a.
  • #78: 4 - b; the 25MB does not enter into this computation: one 64K block holds 65,536 / 655 ≈ 100 rows. 5 - b; it is the Write To Disk counter that indicates the number of rows written to the disk.