Clusterware Stack Management and Troubleshooting
by Syed Jaffar Hussain, Kai Yu
In Chapter 1, we mentioned that the Oracle RAC cluster database environment requires
cluster manager software (“Clusterware”) that is tightly integrated with the operating
system (OS) to provide the cluster management functions that enable the Oracle database
to run in the cluster environment.
Oracle Clusterware was first introduced in Oracle 9i on Linux under the name Oracle
Clusterware Management Service. Cluster Ready Service (CRS), a generic cluster manager,
was introduced in Oracle 10.1 for all platforms and was renamed to its current name,
Oracle Clusterware, in Oracle 10.2. Since Oracle 10g, Oracle Clusterware has been a
required component for Oracle RAC. On Linux and Windows systems, Oracle Clusterware is
the only clusterware needed to run Oracle RAC, while on Unix, Oracle Clusterware can be
combined with third-party clusterware such as Sun Cluster and Veritas Cluster Manager.
Oracle Clusterware combines a group of servers into a cluster environment by enabling
communication between
the servers so that they work together as a single logical server. Oracle Clusterware serves
as the foundation of the
Oracle RAC database by managing its resources. These resources include Oracle ASM
instances, database instances,
Oracle databases, virtual IPs (VIPs), the Single Client Access Name (SCAN), SCAN listeners,
Oracle Notification
Service (ONS), and the Oracle Net listener. Oracle Clusterware is responsible for startup
and failover for the resources.
Because Oracle Clusterware plays such a key role in the high availability and scalability of
the RAC database,
the system administrator and the database administrator should pay careful attention to
its configuration and
management.
This chapter describes the architecture and complex technical stack of Oracle Clusterware
and explains how
those components work. The chapter also describes configuration best practices and
explains how to manage and
troubleshoot the clusterware stack. The chapter assumes the latest version, Oracle
Clusterware 12cR1.
The following topics will be covered in this chapter:
• Oracle Clusterware 12cR1 and its components
• Clusterware startup sequence
• Clusterware management
• Troubleshooting cluster stack startup failure
• CRS logs and directory structure
• RACcheck, diagcollection.sh, and oratop
• Debugging and tracing CRS components
• RAC database hang analysis
Cluster Time Synchronization Service (CTSS): A daemon process introduced with 11gR2 that
handles time synchronization among all the nodes in the cluster. You can use the OS’s
Network Time Protocol (NTP) service to synchronize the time; alternatively, if you disable
the NTP service, CTSS provides the time synchronization service. This service runs as the
octssd.bin process on Linux/Unix or octssd.exe on Windows.
Event Management (EVM): This background process publishes events to all the members of
the cluster. On
Linux/Unix, the process name is evmd.bin, and on Windows, it is evmd.exe.
ONS: This is the publish and subscribe service that communicates Fast Application
Notification (FAN) events.
This service is the ons process on Linux/Unix and ons.exe on Windows.
Oracle ASM: Provides the volume manager and shared storage management for Oracle
Clusterware and Oracle
Database.
Clusterware agent processes: Oracle Agent (oraagent) and Oracle Root Agent
(orarootagent). The oraagent
agent is responsible for managing all Oracle-owned ohasd resources. The orarootagent is
the agent responsible for
managing all root-owned ohasd resources.
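As a quick check of the daemons described above, the following is a minimal sketch that scans a process listing for some of the Linux/Unix process names mentioned in this chapter (ohasd.bin, octssd.bin, evmd.bin, and ons). The function name find_clusterware_daemons is hypothetical and is not an Oracle utility. It only matches command names in the listing and says nothing about the health of the stack.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): report which of the clusterware
# daemons named in this chapter appear in a process listing read from stdin.
find_clusterware_daemons() {
    listing=$(cat)
    for name in ohasd.bin octssd.bin evmd.bin ons; do
        # A daemon counts as present when any whitespace-separated field of
        # the listing has a final path component equal to the daemon name.
        if printf '%s\n' "$listing" | awk -v n="$name" '
            { for (i = 1; i <= NF; i++) {
                  m = split($i, parts, "/")
                  if (parts[m] == n) found = 1
              } }
            END { exit !found }'; then
            echo "$name: running"
        else
            echo "$name: not found"
        fi
    done
}

# Usage on a cluster node (Linux):
#   ps -eo args | find_clusterware_daemons
```

Because the match is on whole field names, the wrapper script init.ohasd is not mistaken for ohasd.bin. A stray argument that happens to be spelled exactly like a daemon name would still match, so treat the result as a first look only.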
[Figure: Clusterware processes and resources arranged by startup level, Level 0 through Level 4. Legend: process on the High Availability Stack; process on the Cluster Ready Service Stack; resource managed by Cluster Ready Service.]
Level 0: The OS automatically starts Clusterware through the OS’s init process. The init
process spawns only
one init.ohasd, which in turn starts the OHASD process. This is configured in the
/etc/inittab file:
$ cat /etc/inittab | grep init.d | grep -v grep
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
Oracle Linux 6.x and Red Hat Enterprise Linux 6.x have deprecated inittab. On these
systems, init.ohasd is configured as an Upstart job in /etc/init/oracle-ohasd.conf:
$ cat /etc/init/oracle-ohasd.conf
......
start on runlevel [35]
stop on runlevel [!35]
respawn
exec /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
This starts up "init.ohasd run", which in turn starts up the ohasd.bin background process:
$ ps -ef | grep ohasd | grep -v grep
root 4056 1 1 Feb19 ? 01:54:34 /u01/app/12.1.0/grid/bin/ohasd.bin reboot
root 22715 1 0 Feb19 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
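To see which of the two mechanisms described above registers init.ohasd on a given host, a minimal sketch follows. It assumes the file locations quoted in this section (/etc/inittab and /etc/init/oracle-ohasd.conf). The function name ohasd_startup_source is hypothetical, and both paths are parameters so that the check can be tried against copies of the files.

```shell
#!/bin/sh
# Hypothetical helper: report where init.ohasd is registered for startup.
#   $1 = inittab path      (default: /etc/inittab)
#   $2 = Upstart job path  (default: /etc/init/oracle-ohasd.conf)
# Prints "inittab", "upstart", or "none" (and returns 1 for "none").
ohasd_startup_source() {
    inittab=${1:-/etc/inittab}
    upstart=${2:-/etc/init/oracle-ohasd.conf}
    # Ignore commented-out inittab lines before looking for init.ohasd.
    if [ -r "$inittab" ] &&
       grep -v '^[[:space:]]*#' "$inittab" | grep -q 'init\.ohasd'; then
        echo "inittab"
    elif [ -r "$upstart" ] && grep -q 'init\.ohasd' "$upstart"; then
        echo "upstart"
    else
        echo "none"
        return 1
    fi
}

# Usage with the default paths:
#   ohasd_startup_source
```

The inittab entry is checked first only because it is the older mechanism; on a host where neither file mentions init.ohasd, the function reports "none", which may simply mean the OS uses a different init system than the two covered here.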
Once OHASD is started on Level 0, OHASD i