
Grid Management

The document outlines the configuration and management of grid systems and load balancing in PowerCenter Integration Services. It details how to create grids, assign nodes, configure integration services to run on grids, and manage resources and dispatch modes for task execution. Additionally, it explains the role of the Load Balancer in distributing tasks based on resource availability and service levels to optimize performance and scalability.

Uploaded by

premkumarb17

Grid Management/Load Balancer

Confidential
© 2009 VMware Inc. All rights reserved
Grid Management

Grid
• A grid is an alias assigned to a group of nodes that run sessions and
workflows.
• When a workflow runs on a grid, scalability and performance improve because
Session and Command tasks are distributed to service processes running on
nodes in the grid.

Grid Management
• Create a grid and assign nodes to the grid.
• Configure the PowerCenter Integration Service to run on a grid.
• Assign resources to nodes.

Create a grid and assign nodes to the grid

• In the Administrator tool, select Create > Grid.
The Create Grid window appears.
• Edit the following properties:

Configure Integration service to run on a grid

The PowerCenter Integration Service can be configured to run on a grid as
follows:
• In the Administrator tool, select the PowerCenter Integration Service Properties tab.
• Edit the grid and node assignments, and select Grid.
• Select the grid you want to assign to the PowerCenter Integration Service.
Note: This property can also be configured when the Integration Service is created.

Points to keep in mind when the Integration Service is configured to run
on a grid:
• When a session or a workflow runs on a grid, a service process runs on each node in
the grid.
Example: Each Informatica session run appears as a pmdtm process on the server.
• The service processes running on the nodes must be compatible or configured the same.
• Service processes must also have access to the directories and input files used by the
PowerCenter Integration Service.

Configure Integration service to run on a grid

Example: A workflow has two sessions.
Session-1: Reads data from an Oracle table and loads it into a flat file.
Session-2: Reads the flat file generated by Session-1 and loads the data into a
MySQL database.
If the two sessions run on different nodes, Session-2 can read the flat file only
if it is written to storage that both nodes can access.

Shared Storage
• Shared storage must be set up between the nodes in the grid.
• Verify that the shared storage location is accessible to each node in the grid.
• Configure $PMRootDir to the shared location on each node in the grid.
• Configure service process variables with identical absolute paths to the shared
directories on each node in the grid.
• If the PowerCenter Integration Service uses operating system profiles, the
operating system user must have access to the shared storage location.
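The accessibility check above can be sketched as a small script that is run on each node under the operating system user that owns the service process. This is only an illustration: the function name check_shared_dir and the path /shared/infa are hypothetical, and the script does not call any Informatica tool.

```python
import os


def check_shared_dir(path: str) -> dict:
    """Report whether the current OS user can see, read and write a directory."""
    return {
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Hypothetical shared location; replace with the value used for $PMRootDir.
    print(check_shared_dir("/shared/infa"))
```

Running the same script on every node and comparing the results gives a quick first check before the service process variables are configured.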

Configure Integration service to run on a grid

To configure the service processes to access the shared storage:
• Select the PowerCenter Integration Service in the Navigator.
• Click the Processes tab. This tab displays the service process for each node
assigned to the grid.
• Configure $PMRootDir to point to the shared location.
• Configure the following service process settings for each node in the grid:
▪ Code pages. For accurate data movement and transformation, verify that the code
pages are compatible for each service process. Use the same code page for each
node where possible.
▪ Service process variables. Configure the service process variables the same for each
service process.
- For example, the settings for $PMTgtDir, $PMSrcDir, and $PMCacheDir must be identical
on each node in the grid.
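The requirement that service process variables match on every node can be checked with a short comparison. The sketch below assumes the settings have already been collected by hand into a dictionary per node; the function name, the node names, and the paths are hypothetical.

```python
from typing import Dict, List


def find_mismatched_variables(node_settings: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the variables whose value differs (or is missing) on some node."""
    all_names = set()
    for settings in node_settings.values():
        all_names.update(settings)
    mismatched = []
    for name in sorted(all_names):
        # A missing variable shows up as None and therefore counts as a mismatch.
        values = {settings.get(name) for settings in node_settings.values()}
        if len(values) > 1:
            mismatched.append(name)
    return mismatched


# Hypothetical values copied from the Processes tab of each node.
settings = {
    "Node_1": {
        "$PMRootDir": "/shared/infa",
        "$PMSrcDir": "/shared/infa/SrcFiles",
        "$PMTgtDir": "/shared/infa/TgtFiles",
        "$PMCacheDir": "/shared/infa/Cache",
    },
    "Node_2": {
        "$PMRootDir": "/shared/infa",
        "$PMSrcDir": "/shared/infa/SrcFiles",
        "$PMTgtDir": "/shared/infa/TgtFiles",
        "$PMCacheDir": "/local/infa/Cache",
    },
}
print(find_mismatched_variables(settings))  # prints ['$PMCacheDir']
```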

Configure resources

• Informatica resources are the database connections, files, directories, node
names, and operating system types required by a task.
• The PowerCenter Integration Service can be configured to check resources by
clearing the IgnoreResourceRequirements property on the Advanced Properties
tab of the Integration Service.
• When the Integration Service does not ignore resource requirements, the
Load Balancer matches the resources available to nodes in the grid with
the resources required by the workflow.
• The Integration Service dispatches tasks in the workflow to nodes where the
required resources are available.
• If the PowerCenter Integration Service is not configured to run on a grid, the
Load Balancer ignores resource requirements.

Configure Resources

Example:
• A multi-node grid (Node_1 and Node_2) uses a session parameter file called
sales1.txt, which exists on Node_1.
• Create a file resource for it named sessionparamfile_sales1 on Node_1.
• A workflow developer creates a session that uses the parameter file and
assigns the sessionparamfile_sales1 file resource to the session.
• When the PowerCenter Integration Service runs the workflow on the grid, the
Load Balancer dispatches the session assigned the sessionparamfile_sales1
resource only to nodes that have the resource defined.
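The matching step in this example can be pictured as a subset test: a node is eligible when every resource the task requires is defined on it. The sketch below is a simplified model, not the Load Balancer's actual implementation; the function name eligible_nodes is hypothetical.

```python
from typing import Dict, List, Set


def eligible_nodes(required: Set[str], node_resources: Dict[str, Set[str]]) -> List[str]:
    """Return the nodes that define every resource required by the task."""
    return sorted(
        node for node, available in node_resources.items() if required <= available
    )


grid = {"Node_1": {"sessionparamfile_sales1"}, "Node_2": set()}

# The session that needs the parameter file can only go to Node_1.
print(eligible_nodes({"sessionparamfile_sales1"}, grid))  # prints ['Node_1']

# A task with no resource requirements can go to either node.
print(eligible_nodes(set(), grid))  # prints ['Node_1', 'Node_2']
```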

Load Balancer

• The Load Balancer is a component of the PowerCenter Integration Service that
dispatches tasks to PowerCenter Integration Service processes running on
nodes in a grid.
• The Load Balancer matches task requirements with resource availability to identify
the best PowerCenter Integration Service process to run a task. It can dispatch
tasks on a single node or across nodes.
The following domain properties determine how the Load Balancer
dispatches tasks:
▪ Dispatch mode: The dispatch mode determines how the Load Balancer dispatches
tasks.
▪ Service level: Service levels establish dispatch priority among tasks that are waiting to
be dispatched. You can create different service levels that a workflow developer can
assign to workflows.

Resource Provision Threshold

Maximum Memory Percentage: The maximum percentage of virtual
memory allocated on the node relative to the total physical memory size.
Example: If the threshold is set to 120% on a node and virtual
memory usage on the node is above 120% of the physical memory, the
Load Balancer does not dispatch new tasks to the node.
The Load Balancer uses this threshold in metric-based and adaptive
dispatch modes.
Maximum Processes: The maximum number of running processes
allowed for each PowerCenter Integration Service process that runs on
the node. This threshold specifies the maximum number of running
Session or Command tasks allowed for each PowerCenter Integration
Service process that runs on the node.
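The two thresholds can be read as a simple acceptance test on a node. The sketch below works through the 120% example; it is a simplified model with hypothetical names and figures, and it assumes a node is excluded when one more task would push the process count over Maximum Processes.

```python
def node_accepts_task(
    virtual_used_mb: float,
    physical_total_mb: float,
    max_memory_pct: float,
    running_tasks: int,
    max_processes: int,
) -> bool:
    """Return True if neither threshold blocks a new task on the node."""
    memory_pct = virtual_used_mb / physical_total_mb * 100
    if memory_pct > max_memory_pct:
        return False  # virtual memory usage is above the allowed percentage
    # One more Session or Command task must still fit under Maximum Processes.
    return running_tasks + 1 <= max_processes


# 20480 MB of virtual memory on a 16384 MB node is 125%, above the 120% limit.
print(node_accepts_task(20480, 16384, 120, 3, 10))  # prints False
# 16384 MB used is 100%, and a fourth task fits under a limit of 10.
print(node_accepts_task(16384, 16384, 120, 3, 10))  # prints True
```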

Resource Provision Threshold

Calculate CPU Profile
• The CPU profile is an index of the processing power of a node compared to a
baseline system. The baseline system is a Pentium 2.4 GHz computer running
Windows 2000.
Example: If a SPARC 480 MHz computer is 0.28 times as fast as the baseline
computer, set the CPU profile for the SPARC computer to 0.28.
• In adaptive dispatch mode, the Load Balancer uses the CPU profile to rank the
computing throughput of each CPU and bus architecture in a grid.
• This ensures that nodes with higher processing power get precedence for
dispatch.
• This value is not used in round-robin or metric-based dispatch modes.
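As a worked illustration of the index, the sketch below derives a profile as the ratio of a node's benchmark score to the baseline score and then ranks nodes by it. The benchmark scores, node names, and function names are hypothetical; how the product itself measures the profile is not covered here.

```python
from typing import Dict, List


def cpu_profile(node_score: float, baseline_score: float) -> float:
    """Processing power of a node relative to the baseline system."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return round(node_score / baseline_score, 2)


def rank_by_profile(profiles: Dict[str, float]) -> List[str]:
    """Order nodes from highest to lowest CPU profile."""
    return sorted(profiles, key=lambda node: profiles[node], reverse=True)


# A node that scores 28 against a baseline of 100 gets a profile of 0.28.
print(cpu_profile(28.0, 100.0))  # prints 0.28

profiles = {"sparc_node": 0.28, "baseline_node": 1.0, "fast_node": 1.5}
print(rank_by_profile(profiles))  # prints ['fast_node', 'baseline_node', 'sparc_node']
```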

Configure Dispatch Mode

• The Load Balancer uses the dispatch mode to select a node to run a task.
• The dispatch mode is configured in the domain properties. Therefore, all PowerCenter
Integration Services in a domain use the same dispatch mode.
• After the dispatch mode is changed in the domain, the Integration Service must be
restarted for the change to take effect (a domain restart is not needed).

The Informatica Load Balancer uses the following dispatch modes:
▪ Round-robin
▪ Metric-based
▪ Adaptive

Configure Dispatch mode

Round-robin dispatch mode:
• The Load Balancer dispatches tasks to available nodes in a round-robin
fashion.
• The Load Balancer checks the Maximum Processes threshold on each available
node and excludes a node if dispatching a task causes the threshold to be
exceeded.
• This mode is the least compute-intensive and is useful when the load on the
grid is even and the tasks to dispatch have similar computing requirements.
• This dispatch mode does not consider how the node is performing before a
task is dispatched; it checks only whether the Maximum Processes threshold
for the node would be exceeded.
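Round-robin dispatch with the Maximum Processes check can be modeled in a few lines. The class below is a toy model under stated assumptions (tasks never finish, and the rotation resumes after the node that received the last task); the class and node names are hypothetical.

```python
from typing import Dict, Optional


class RoundRobinDispatcher:
    """Toy model: rotate over nodes, skipping any node at Maximum Processes."""

    def __init__(self, max_processes: Dict[str, int]) -> None:
        self.nodes = list(max_processes)
        self.max_processes = dict(max_processes)
        self.running = {node: 0 for node in self.nodes}
        self._next = 0  # index of the node to try first

    def dispatch(self) -> Optional[str]:
        """Return the node chosen for the next task, or None if all are full."""
        for offset in range(len(self.nodes)):
            index = (self._next + offset) % len(self.nodes)
            node = self.nodes[index]
            if self.running[node] + 1 <= self.max_processes[node]:
                self.running[node] += 1
                self._next = (index + 1) % len(self.nodes)
                return node
        return None


dispatcher = RoundRobinDispatcher({"Node_1": 2, "Node_2": 1})
print([dispatcher.dispatch() for _ in range(4)])
# prints ['Node_1', 'Node_2', 'Node_1', None]
```

The fourth task is not dispatched because both nodes have reached their limit; no CPU or memory figure is consulted at any point.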

Configure Dispatch mode

Metric-based dispatch mode:
• The Load Balancer evaluates nodes in a round-robin fashion.
• It checks all resource provision thresholds on each available node and excludes a node if
dispatching a task causes the thresholds to be exceeded or if the node is out of free swap space.
• The Load Balancer continues to evaluate nodes until it finds a node that can accept the task.
• To determine whether a task can run on a particular node, the Load Balancer collects and stores
statistics from the last three runs of the task. It compares these statistics with the resource
provision thresholds defined for the node.
• If no statistics exist in the repository, the Load Balancer uses the following default values:
▪ 40 MB memory
▪ 15% CPU
• The Load Balancer dispatches tasks for execution in the order the Workflow Manager or
scheduler submits them.
• The Load Balancer does not bypass any tasks in the dispatch queue. Therefore, if a
resource-intensive task is first in the dispatch queue, all other tasks with the same service
level must wait in the queue until the Load Balancer dispatches the resource-intensive task.
• This mode prevents overloading nodes when tasks have uneven computing requirements.
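Two points from this slide lend themselves to a short sketch: estimating a task from its last three runs (with the 40 MB and 15% CPU defaults), and dispatching strictly in queue order. The code is a simplified model: averaging the three runs is an assumption, since the slide does not say how the statistics are combined, and the function names and the fits callback are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Callable, Deque, List, Sequence, Tuple

DEFAULT_MEMORY_MB = 40.0
DEFAULT_CPU_PCT = 15.0


def estimate_task(history: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Estimate (memory_mb, cpu_pct) from the last three runs, oldest first.

    Averaging is an assumption made for this sketch.
    """
    recent = list(history)[-3:]
    if not recent:
        return DEFAULT_MEMORY_MB, DEFAULT_CPU_PCT
    return mean(m for m, _ in recent), mean(c for _, c in recent)


def dispatch_in_order(queue: Deque[str], fits: Callable[[str], bool]) -> List[str]:
    """Dispatch from the head of the queue and stop at the first task that
    cannot be placed; later tasks are never moved ahead of it."""
    dispatched = []
    while queue and fits(queue[0]):
        dispatched.append(queue.popleft())
    return dispatched


print(estimate_task([]))  # prints (40.0, 15.0)

waiting = deque(["small_1", "big", "small_2"])
# Hypothetical rule: only the task named "big" does not fit on any node.
print(dispatch_in_order(waiting, lambda task: task != "big"))  # prints ['small_1']
print(list(waiting))  # prints ['big', 'small_2']
```

In the last call, small_2 stays queued behind big even though it would fit, which is the waiting behavior described in the bullets above.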

Configure Dispatch mode

Adaptive dispatch mode:
• The Load Balancer ranks nodes according to current CPU availability.
• It checks all resource provision thresholds on each available node and
excludes a node if dispatching a task causes the thresholds to be exceeded.
• The Load Balancer can use the CPU profile to rank nodes according to the amount
of computing resources on the node.
• To determine whether a task can run on a particular node, the Load Balancer
collects and stores statistics from the last three runs of the task. It compares
these statistics with the resource provision thresholds defined for the node.
• If no statistics exist in the repository, the Load Balancer uses the following
default values:
▪ 40 MB memory
▪ 15% CPU

Configure Dispatch mode
• The order in which the Load Balancer dispatches tasks from the dispatch queue
depends on the task requirements and dispatch priority.
Example:
▪ Multiple tasks with the same service level are waiting in the dispatch queue, and adequate
computing resources are not available to run a resource-intensive task.
▪ The Load Balancer reserves a node for the resource-intensive task and keeps dispatching less
intensive tasks to other nodes.

• Adaptive dispatch mode uses the following properties to determine how a
node is used:
▪ Maximum CPU Run Queue Length
▪ Maximum Memory %
▪ Maximum Processes
▪ CPU Profile
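The adaptive behavior can be sketched as two steps: rank the nodes, then walk the queue, reserving a node for a task that cannot run yet while smaller tasks continue to other nodes. This is a toy model, not the product's algorithm: weighting idle CPU by the CPU profile and using free memory as the only fit criterion are assumptions, and all names and figures are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_nodes(cpu_idle_pct: Dict[str, float], cpu_profile: Dict[str, float]) -> List[str]:
    """Rank nodes by idle CPU weighted by CPU profile (an assumed scoring rule)."""
    return sorted(
        cpu_idle_pct,
        key=lambda node: cpu_idle_pct[node] * cpu_profile.get(node, 1.0),
        reverse=True,
    )


def dispatch_adaptive(
    tasks: List[Tuple[str, float]],
    ranked_nodes: List[str],
    free_memory_mb: Dict[str, float],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (assignments, reservations) for tasks given in queue order."""
    free = dict(free_memory_mb)
    assignments: Dict[str, str] = {}
    reservations: Dict[str, str] = {}
    for name, needed_mb in tasks:
        # Reserved nodes take no further tasks in this pass.
        open_nodes = [n for n in ranked_nodes if n not in reservations.values()]
        target = next((n for n in open_nodes if free[n] >= needed_mb), None)
        if target is not None:
            free[target] -= needed_mb
            assignments[name] = target
        elif open_nodes:
            # No node can run the task now: hold the best-ranked open node for it.
            reservations[name] = open_nodes[0]
    return assignments, reservations


ranked = rank_nodes({"Node_1": 50.0, "Node_2": 80.0}, {"Node_1": 1.0, "Node_2": 0.28})
queue = [("big", 500.0), ("small_1", 40.0), ("small_2", 40.0)]
assignments, reservations = dispatch_adaptive(
    queue, ranked, {"Node_1": 100.0, "Node_2": 100.0}
)
print(ranked)        # prints ['Node_1', 'Node_2']
print(reservations)  # prints {'big': 'Node_1'}
print(assignments)   # prints {'small_1': 'Node_2', 'small_2': 'Node_2'}
```

Unlike the metric-based mode, the two small tasks are dispatched even though the resource-intensive task ahead of them is still waiting.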

Service levels

• Service levels establish priorities among tasks that are waiting to be
dispatched.
• When the Load Balancer has more tasks to dispatch than the PowerCenter
Integration Service can run at the time, the Load Balancer places those tasks
in the dispatch queue.
• When multiple tasks are waiting in the dispatch queue, the Load Balancer uses
service levels to determine the order in which to dispatch tasks from the
queue.
• After a service level is created in Informatica Administrator, a workflow
developer can assign it to a workflow in the Workflow Manager. All tasks in a
workflow have the same service level.
For example, you create two service levels:
▪ Service level “Low” has dispatch priority 10 and maximum dispatch wait time 7,200
seconds.
▪ Service level “High” has dispatch priority 2 and maximum dispatch wait time 1,800
seconds.
Because a lower number means a higher dispatch priority, tasks with service level
“High” are dispatched before tasks with service level “Low”.
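The effect of the two service levels on queue order can be sketched as a sort on an effective priority. The model assumes that a lower number means a higher priority and that a task which has waited for its maximum dispatch wait time is raised to the highest priority; the slide does not spell out either rule, so treat both as assumptions. The class and task names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

HIGHEST_PRIORITY = 1  # assumed: lower number means higher priority


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    dispatch_priority: int
    max_wait_seconds: int


@dataclass
class QueuedTask:
    name: str
    level: ServiceLevel
    waited_seconds: int


def effective_priority(task: QueuedTask) -> int:
    """Escalate a task once it has waited for its maximum dispatch wait time."""
    if task.waited_seconds >= task.level.max_wait_seconds:
        return HIGHEST_PRIORITY
    return task.level.dispatch_priority


def dispatch_order(queue: List[QueuedTask]) -> List[str]:
    """Order tasks by effective priority; ties keep submission order
    because sorted() is stable."""
    return [task.name for task in sorted(queue, key=effective_priority)]


LOW = ServiceLevel("Low", dispatch_priority=10, max_wait_seconds=7200)
HIGH = ServiceLevel("High", dispatch_priority=2, max_wait_seconds=1800)

queue = [
    QueuedTask("low_new", LOW, waited_seconds=60),
    QueuedTask("high_new", HIGH, waited_seconds=60),
    QueuedTask("low_old", LOW, waited_seconds=7200),
]
print(dispatch_order(queue))  # prints ['low_old', 'high_new', 'low_new']
```

The task low_old jumps ahead of high_new only because it has reached the 7,200-second limit; a freshly queued “Low” task still waits behind “High” tasks.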
