
Tuning RAC

LMON monitors the entire cluster to manage global enqueues and resources. It handles instance and process expirations and recovery for the Global Cache Service. The LMD process manages enqueue requests and deadlock detection for the Global Cache Service. The LMSn processes handle Global Cache Service messaging and tasks like resource requests and lock validation. Together these processes coordinate access to shared resources across RAC instances.

Uploaded by Maliha Khan
Copyright: © All Rights Reserved

====================== RAC Instances and Processes ======================

LMON The Global Enqueue Service Monitor (LMON) monitors the entire cluster
to manage global enqueues and resources. LMON manages instance and process
expirations and the associated recovery for the Global Cache Service.

LMD The Global Enqueue Service Daemon (LMD) is the lock agent process that
manages enqueue manager service requests for Global Cache Service enqueues to
control access to global enqueues and resources. The LMD process also
handles deadlock detection and remote enqueue requests.

LMSn The Global Cache Service processes (LMSn) do the work of the Global
Cache Service (GCS). Early RAC releases provided for up to ten Global Cache
Service processes; later releases allow more. The number of LMSn processes
varies with the amount of messaging traffic among nodes in the cluster. The
LMSn processes do the following:

Handle blocking interrupts from the remote instance for Global Cache Service resources.
Manage resource requests and cross-instance call operations for shared resources.
Build a list of invalid lock elements and validate lock elements during recovery.
Handle global lock deadlock detection and monitor lock conversion timeouts.

LCK The LCK process manages global enqueue requests and cross-instance broadcasts.


DIAG The Diagnosability Daemon monitors the health of the instance. It captures
data for instance process failures.

======================= Global Cache Service (GCS) and Global Enqueue Service (GES) ====================

GCS and GES play the key role. They are services carried out by the RAC
background processes described above (LMSn for GCS, LMD for GES). GCS ensures a
single system image of the data even though the data is accessed by multiple
instances. GCS and GES are integrated components of Real Application Clusters
that coordinate simultaneous access to the shared database and to shared
resources within the database and database cache. Together, GES and GCS
maintain a Global Resource Directory (GRD) that records information about
resources and enqueues. The GRD resides in memory and is distributed across all
the instances: each instance manages a portion of the directory. This
distributed nature is a key point for the fault tolerance of RAC. Coordination
of concurrent tasks within a shared cache server is called synchronization.
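To make "each instance manages a portion of the directory" concrete, here is a toy Python sketch of splitting directory entries across instances. The hashing rule (CRC-32 of the block address modulo the instance count) and the function names `directory_instance` and `directory_share` are hypothetical choices for illustration; Oracle's real resource-mastering scheme is internal and is not reproduced here.

```python
import zlib
from collections import Counter


def directory_instance(file_no: int, block_no: int, n_instances: int) -> int:
    """Pick the instance that manages the directory entry for one block.

    Hypothetical rule for illustration only: a stable CRC-32 hash of the
    block address, taken modulo the number of instances. Instance numbers
    start at 1, as INST_ID does in the GV$ views.
    """
    key = f"{file_no}.{block_no}".encode("ascii")
    return zlib.crc32(key) % n_instances + 1


def directory_share(blocks, n_instances: int) -> Counter:
    """Count how many of the given (file#, block#) pairs each instance manages."""
    return Counter(directory_instance(f, b, n_instances) for f, b in blocks)


if __name__ == "__main__":
    sample = [(9, b) for b in range(1, 1001)]
    print(directory_share(sample, 3))
```

The point of the sketch is only that every instance owns a deterministic slice of the entries, so no single instance holds the whole directory.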

Synchronization uses the private interconnect and involves heavy message
traffic. The following types of resources require synchronization: data blocks
and enqueues. GCS maintains the modes for blocks in the global role and is
responsible for block transfers between the instances. LMS processes handle the
GCS messages and do the bulk of the GCS processing.

An enqueue is a shared memory structure that serializes access to database
resources. It can be local or global. Oracle uses enqueues in three modes:
Null (N), Share (S), and Exclusive (X). Blocks are the primary structures for
reading and writing into and out of buffers, and they are often the most
requested resource.
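The three modes interact through the usual compatibility rule: Null can coexist with any mode, Share with Null and Share, and Exclusive only with Null. A minimal Python sketch of that rule follows; the names `Mode`, `compatible`, and `can_grant` are illustrative and not an Oracle API.

```python
from enum import Enum


class Mode(Enum):
    N = "Null"
    S = "Share"
    X = "Exclusive"


# (held, requested) pairs that may coexist on one resource:
# Null is compatible with everything, Share with Null and Share,
# and Exclusive only with Null.
_COMPATIBLE = {
    (Mode.N, Mode.N), (Mode.N, Mode.S), (Mode.N, Mode.X),
    (Mode.S, Mode.N), (Mode.S, Mode.S),
    (Mode.X, Mode.N),
}


def compatible(held: Mode, requested: Mode) -> bool:
    return (held, requested) in _COMPATIBLE


def can_grant(holders: list, requested: Mode) -> bool:
    """True if `requested` is compatible with every mode currently held."""
    return all(compatible(h, requested) for h in holders)
```

For example, `can_grant([Mode.S, Mode.S], Mode.S)` is true (many readers), while `can_grant([Mode.S], Mode.X)` is false: a writer must wait until the shared holders release or downgrade.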

GES handles the synchronization of the dictionary cache, library cache,
transaction locks, and DDL locks. In other words, GES manages enqueues other
than data blocks. To synchronize access to the data dictionary cache, latches
are used in exclusive mode and in single-node cluster databases; global
enqueues are used in cluster database mode.

===================== Cache Fusion and Resource Coordination =======================

Since each node in Real Application Clusters has its own memory (cache) that is
not shared with other nodes, RAC must coordinate the buffer caches of the
different nodes while minimizing additional disk I/O that could reduce
performance. Cache Fusion is the technology that uses high-speed interconnects
to provide cache-to-cache transfers of data blocks between instances in a
cluster. Cache Fusion allows direct memory writes of dirty blocks, which removes
the need to force a disk write and re-read (or ping) the committed blocks.
However, this is not to say that disk writes do not occur: they are still
required for cache replacement and when a checkpoint occurs. Cache Fusion
addresses the following concurrency cases between instances: concurrent reads
on multiple nodes, concurrent reads and writes on different nodes, and
concurrent writes on different nodes.

Resource mode The modes are null, shared, and exclusive. The block can be held
in different modes, depending on whether a resource holder intends to modify data
or merely read them.

Resource role The roles are locally managed and globally managed. The Global
Resource Directory (GRD) is not a database. It is a collection of internal
structures used to find the current status of the data blocks. Whenever a block
is transferred out of a local cache to another instance's cache, the GRD is
updated. The following information about a resource is available in the GRD:

Data block addresses (DBA)
Location of the most current versions
Modes of the data blocks (N, S, X)
Roles of the data blocks (local or global)
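The four items above can be pictured as one record per block that is updated on every cache-to-cache transfer. The Python sketch below is a toy model under stated assumptions: the class name `GrdEntry`, its field names, and the transfer rules (an X request leaves other instances in Null, an S request downgrades an X holder to S, the role becomes global once more than one instance has held the block) are simplifications chosen for illustration, not Oracle's internal GRD layout or exact role rules.

```python
from dataclasses import dataclass, field


@dataclass
class GrdEntry:
    """Toy GRD record for one data block (field names are illustrative)."""
    dba: tuple            # data block address, here as (file#, block#)
    current_at: set       # instances holding the most current version
    modes: dict = field(default_factory=dict)  # instance -> "N" | "S" | "X"
    role: str = "local"   # "local" or "global"

    def transfer(self, to_instance: int, mode: str) -> None:
        """Record the block being shipped to another instance's cache."""
        for inst, held in self.modes.items():
            if mode == "X":
                self.modes[inst] = "N"   # assumed: an exclusive request leaves others in Null
            elif held == "X":
                self.modes[inst] = "S"   # assumed: a shared request downgrades an X holder
        self.modes[to_instance] = mode
        if mode == "X":
            self.current_at = {to_instance}
        else:
            self.current_at = self.current_at | {to_instance}
        # Simplification: treat the block as globally managed once it has
        # been in more than one instance's cache.
        if len(self.modes) > 1:
            self.role = "global"
```

Usage: start with `GrdEntry(dba=(9, 150), current_at={1}, modes={1: "X"})`, call `transfer(2, "S")`, and the entry records both instances in S mode with the role now global.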

=============================== Interconnect Traffic – Sessions Waiting ==================================
Wait event              Description
----------------------  ---------------------------------------------------------
global cache busy       A session has to wait for an ongoing operation on the
                        resource to complete.
gc buffer busy          A process has to wait for a block to become available
                        because another process is obtaining a resource for
                        this block.
buffer busy global CR   A session waits on a consistent read (a block needed for
                        reading) via the global cache.

The top global cache (gc) waits to look out for include:

gc current block busy Happens when an instance requests a current (CURRENT)
data block because it wants to perform DML, and the block to be transferred is
in use.

gc buffer busy A wait event that occurs whenever a session has to wait for an
ongoing operation on the resource to complete because the block is in use. The
process has to wait for a block to become available because another process is
obtaining a resource for this block.

gc cr request This happens when one instance is waiting for blocks from another
instance's cache (sent via the interconnect): the instance has requested a CR
data block and the block to be transferred has not yet arrived at the
requesting instance. The wait means that the current instance cannot find a
consistent read (CR) version of the block in its local cache. If the block is
not in the remote cache either, a db file sequential read wait will also
follow this one. For this wait, P1, P2, and P3 are the file, block, and lenum
(look in V$LOCK_ELEMENT for the row whose LOCK_ELEMENT_ADDR has the same value
as lenum).

This is the one I see the most, and it's usually because the SQL is poorly
tuned and many index blocks are being moved back and forth between instances.
To reduce it:

Tune the SQL that is causing large amounts of reads that get moved from node to node.
Try to put users that are using the same blocks on the same instance so that blocks are not moved from instance to instance.
Some non-Oracle application servers move the same process from node to node looking for the fastest node (unaware that they are moving the same blocks from node to node). Pin these long processes to the same node.
Potentially increase the size of the local cache if slow I/O combined with a small cache is the problem.
Monitor V$CR_BLOCK_SERVER to see if there is an issue such as reading UNDO segments.

SELECT inst_id, event, p1 FILE_NUMBER, p2 BLOCK_NUMBER, WAIT_TIME
FROM gv$session_wait
WHERE event IN ('buffer busy global cr', 'global cache busy',
                'buffer busy global cache');

The output from this query should look something like this:

   INST_ID EVENT                FILE_NUMBER BLOCK_NUMBER  WAIT_TIME
---------- -------------------- ----------- ------------ ----------
         1 global cache busy              9          150         15
         2 global cache busy              9          150         10

Run the following query to identify the object that corresponds to each
file_number/block_number combination returned, that is, the object causing
contention for these sessions (this query is a bit slower):

SELECT owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = 9
AND 150 BETWEEN block_id AND block_id + blocks - 1;

The output will be similar to:

OWNER     SEGMENT_NAME                 SEGMENT_TYPE
--------- ---------------------------- ---------------
SYSTEM    MOD_TEST_IND                 INDEX
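The two steps above (collect the distinct file/block pairs from the wait query, then find the extent whose block range contains each block) can be sketched in Python as follows. `Extent`, `find_segment`, and `hot_segments` are hypothetical names, and the extent in the usage example (file 9, blocks 145 to 152) is made-up sample data; the containment test is the same `block_id <= block <= block_id + blocks - 1` check the SQL uses.

```python
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    """One row of DBA_EXTENTS (only the columns the lookup needs)."""
    owner: str
    segment_name: str
    segment_type: str
    file_id: int
    block_id: int   # first block of the extent
    blocks: int     # number of blocks in the extent


def find_segment(extents, file_id: int, block: int) -> Optional[Extent]:
    """Same test as the SQL: block BETWEEN block_id AND block_id + blocks - 1."""
    for e in extents:
        if e.file_id == file_id and e.block_id <= block <= e.block_id + e.blocks - 1:
            return e
    return None


def hot_segments(wait_rows, extents):
    """Map each distinct (file#, block#) from the wait query to its extent."""
    pairs = sorted({(file_no, block_no) for file_no, block_no in wait_rows})
    return {pair: find_segment(extents, *pair) for pair in pairs}


if __name__ == "__main__":
    # Hypothetical extent covering blocks 145-152 of file 9.
    extents = [Extent("SYSTEM", "MOD_TEST_IND", "INDEX", 9, 145, 8)]
    waits = [(9, 150), (9, 150)]  # FILE_NUMBER, BLOCK_NUMBER from the two sample rows
    print(hot_segments(waits, extents))
```

Deduplicating the pairs first mirrors the advice to run the slower DBA_EXTENTS query once per distinct file/block combination rather than once per waiting session.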

==================================== GES Lock Blockers and Waiters ==================================

Sessions that hold global locks and persistently block others can be a problem
for a RAC implementation, and in many cases the cause lies in the application
design. Sessions waiting on a lock to be released hang and have to poll the
blocked object to determine its status. Large numbers of sessions holding
global locks create substantial interconnect traffic and inhibit performance.

-- GES LOCK BLOCKERS:
--INSTANCE_ID   The instance on which a blocking session resides
--SID           Unique identifier for the session
--SPID          Operating system process ID of the session's server process
--RESOURCE_NAME Name of the GES resource
--GRANT_LEVEL   Lists how the GES lock is granted to the user associated with the blocking session
--REQUEST_LEVEL Lists the lock level the session is attempting to obtain
--LOCK_STATE    Lists the current status of the lock
--EVENT         Current or most recent wait event for the session
--SEC           Lists how long this session has waited

SET numwidth 10
COLUMN LOCK_STATE FORMAT a16 tru;
COLUMN EVENT FORMAT a30 tru;
SELECT dl.inst_id INSTANCE_ID, s.sid SID, p.spid SPID,
       dl.resource_name1 RESOURCE_NAME,
       decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
              'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
              'KJUSEREX','Exclusive',dl.grant_level) AS GRANT_LEVEL,
       decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
              'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
              'KJUSEREX','Exclusive',dl.request_level) AS REQUEST_LEVEL,
       decode(substr(dl.state,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
              'KJUSERCA','Canceling','KJUSERCV','Converting') AS LOCK_STATE,
       sw.event EVENT, sw.seconds_in_wait SEC
FROM gv$ges_enqueue dl, gv$process p, gv$session s, gv$session_wait sw
WHERE dl.blocker = 1
AND (dl.inst_id = p.inst_id AND dl.pid = p.spid)
AND (p.inst_id = s.inst_id AND p.addr = s.paddr)
AND (s.inst_id = sw.inst_id AND s.sid = sw.sid)
ORDER BY sw.seconds_in_wait DESC;

-- GES LOCK WAITERS:
--INSTANCE_ID   The instance on which a waiting session resides
--SID           Unique identifier for the session
--SPID          Operating system process ID of the session's server process
--RESOURCE_NAME Name of the GES resource
--GRANT_LEVEL   Lists how the GES lock is granted to the user associated with the waiting session
--REQUEST_LEVEL Lists the lock level the session is attempting to obtain
--LOCK_STATE    Lists the current status of the lock
--EVENT         Current or most recent wait event for the session
--SEC           Lists how long this session has waited

SET numwidth 10
COLUMN LOCK_STATE FORMAT a16 tru;
COLUMN EVENT FORMAT a30 tru;
SELECT dl.inst_id INSTANCE_ID, s.sid SID, p.spid SPID,
       dl.resource_name1 RESOURCE_NAME,
       decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
              'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
              'KJUSEREX','Exclusive',dl.grant_level) AS GRANT_LEVEL,
       decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
              'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
              'KJUSEREX','Exclusive',dl.request_level) AS REQUEST_LEVEL,
       decode(substr(dl.state,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
              'KJUSERCA','Canceling','KJUSERCV','Converting') AS LOCK_STATE,
       sw.event EVENT, sw.seconds_in_wait SEC
FROM gv$ges_enqueue dl, gv$process p, gv$session s, gv$session_wait sw
WHERE dl.blocked = 1
AND (dl.inst_id = p.inst_id AND dl.pid = p.spid)
AND (p.inst_id = s.inst_id AND p.addr = s.paddr)
AND (s.inst_id = sw.inst_id AND s.sid = sw.sid)
ORDER BY sw.seconds_in_wait DESC;
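Both GES queries translate the internal KJUSER* codes with DECODE over the first eight characters. The Python sketch below mirrors those DECODE expressions, which is handy when post-processing spooled output; the function names are illustrative, and the codes and labels are exactly those used in the queries. As with DECODE, a level code that is not in the list falls back to the raw value (the default argument), while an unknown state yields no value (None here, NULL in SQL) because the state DECODE has no default.

```python
# Lock-level codes and labels, as in the GRANT_LEVEL / REQUEST_LEVEL DECODEs.
_LEVELS = {
    "KJUSERNL": "Null",
    "KJUSERCR": "Row-S (SS)",
    "KJUSERCW": "Row-X (SX)",
    "KJUSERPR": "Share",
    "KJUSERPW": "S/Row-X (SSX)",
    "KJUSEREX": "Exclusive",
}

# Lock-state codes and labels, as in the LOCK_STATE DECODE.
_STATES = {
    "KJUSERGR": "Granted",
    "KJUSEROP": "Opening",
    "KJUSERCA": "Canceling",
    "KJUSERCV": "Converting",
}


def decode_level(raw: str) -> str:
    """DECODE(SUBSTR(raw,1,8), ..., raw): unknown codes fall back to the raw value."""
    return _LEVELS.get(raw[:8], raw)


def decode_state(raw: str):
    """DECODE(SUBSTR(raw,1,8), ...) with no default: unknown codes give None."""
    return _STATES.get(raw[:8])
```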

=================================== Fusion Reads and Writes ===============================

Fusion writes occur when a block previously changed by another instance needs
to be written to disk in response to a checkpoint or cache aging.

Here is a query to determine the ratio of Cache Fusion writes:

SELECT A.inst_id "Instance",
       A.value / NULLIF(B.value, 0) "Cache Fusion Writes Ratio"
FROM gv$sysstat A, gv$sysstat B
WHERE A.name = 'DBWR fusion writes'
AND B.name = 'physical writes'
AND B.inst_id = A.inst_id
ORDER BY A.inst_id;
A large value for the Cache Fusion Writes ratio may indicate:

Insufficiently large caches
Insufficient checkpoints
Large numbers of buffers written due to cache replacement or checkpointing
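The ratio is simply 'DBWR fusion writes' divided by 'physical writes' for each instance. A small Python sketch of that arithmetic, assuming the two statistic values have already been fetched per INST_ID into a dict (the dict layout and function names are hypothetical), also shows one way to handle an instance that has not written anything yet:

```python
from typing import Optional


def fusion_write_ratio(dbwr_fusion_writes: int, physical_writes: int) -> Optional[float]:
    """'DBWR fusion writes' / 'physical writes' for one instance.

    Returns None when the instance has no physical writes yet, instead of
    dividing by zero.
    """
    if physical_writes == 0:
        return None
    return dbwr_fusion_writes / physical_writes


def ratios_by_instance(stats: dict) -> dict:
    """stats maps INST_ID -> {'DBWR fusion writes': n, 'physical writes': m}."""
    return {
        inst: fusion_write_ratio(s["DBWR fusion writes"], s["physical writes"])
        for inst, s in sorted(stats.items())
    }
```

For example, an instance with 250 fusion writes out of 1,000 physical writes has a ratio of 0.25, meaning a quarter of its writes were of blocks previously changed by another instance.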

Oracle recommends that the average latency of a consistent block request
typically should not exceed 15 milliseconds, depending on the system
configuration and volume. When you are sending many blocks across the
interconnect, this is really too high (especially since a read from disk is
often about this fast). For a high-volume system, it should be in the
single-digit millisecond-to-microsecond range. The average latency of a
consistent block request is the average latency of a consistent read request
round-trip from the requesting instance to the holding instance and back to the
requesting instance.

set numwidth 20
column "AVG CR BLOCK RECEIVE TIME (ms)" format 9999999.9
-- 'gc cr block receive time' is in centiseconds; multiplying by 10 converts to ms.
-- NULLIF avoids a divide-by-zero error on an instance that has received no CR blocks.
select b1.inst_id, b2.value "GCS CR BLOCKS RECEIVED",
       b1.value "GCS CR BLOCK RECEIVE TIME",
       ((b1.value / NULLIF(b2.value, 0)) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
from gv$sysstat b1, gv$sysstat b2
where b1.name = 'gc cr block receive time'
and b2.name = 'gc cr blocks received'
and b1.inst_id = b2.inst_id;
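The arithmetic behind the query, and the 15 ms guideline from the text, can be sketched in Python. This assumes the two statistic values have already been fetched per INST_ID (the dict layout and function names are hypothetical). The factor of 10 is the one the query uses, converting a time recorded in centiseconds to milliseconds; multiplying before dividing keeps simple cases exact.

```python
from typing import Optional

CS_TO_MS = 10  # 1 centisecond = 10 ms, the factor the query multiplies by


def avg_cr_receive_ms(receive_time_cs: float, blocks_received: int) -> Optional[float]:
    """Average CR block receive latency in ms, or None if no blocks were received."""
    if blocks_received == 0:
        return None
    return receive_time_cs * CS_TO_MS / blocks_received


def slow_instances(stats: dict, threshold_ms: float = 15.0) -> list:
    """stats maps INST_ID -> (gc cr block receive time, gc cr blocks received).

    Returns the instances whose average exceeds threshold_ms.
    """
    slow = []
    for inst, (time_cs, blocks) in sorted(stats.items()):
        avg = avg_cr_receive_ms(time_cs, blocks)
        if avg is not None and avg > threshold_ms:
            slow.append(inst)
    return slow
```

For example, 300 centiseconds spread over 1,000 CR blocks averages 3 ms per block. For a high-volume system, passing a lower `threshold_ms` (single-digit milliseconds) matches the stricter target described above.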
