Recover Corrupt/Missing OCR With No Backup - (Oracle 10g) : Oracle DBA Tips Corner
Contents
1. Overview
2. Example Configuration
3. Recover Corrupt/Missing OCR
4. About the Author
Overview
It happens. Not very often, but it can happen. You are faced with a corrupt or missing Oracle
Cluster Registry (OCR) and have no backup to recover from. So, how can something like this
occur? We know that the CRSD process is responsible for creating backup copies of the OCR
every 4 hours from the master node in the CRS_home/cdata directory. These backups are meant
to be used to recover the OCR from a lost or corrupt OCR file using the ocrconfig -restore
command, so how is it possible to be in a situation where the OCR needs to be recovered and
you have no viable backup? Well, consider a scenario where you add a node to the cluster and,
before the next automatic backup runs (that is, within the 4-hour window), you find the OCR
has been corrupted. You may have forgotten to create a logical export of the OCR before adding
the new node or, worse yet, the logical export you took is also corrupt. In either case, you are
left with a corrupt OCR and no recent backup. Talk about a bad day! Another possible scenario
is a shell script that wrongly deletes all available backups. Talk about an even worse day.
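One way to notice that you are inside that unprotected window is to check how old the newest automatic backup is. The following is a minimal sketch, not an Oracle-supplied tool. It assumes the backups are files ending in .ocr somewhere under the directory you pass in (for example, the cdata directory under the CRS home), and the function names are made up for illustration.

```shell
#!/bin/sh
# List OCR backup files modified within the last 4 hours (240 minutes).
# Assumption: backups end in ".ocr" and live under the given directory.
recent_ocr_backups() {
    find "$1" -type f -name '*.ocr' -mmin -240 2>/dev/null
}

# Print a one-line verdict for the given backup directory.
check_ocr_backups() {
    if [ -n "$(recent_ocr_backups "$1")" ]; then
        echo "recent backup found"
    else
        echo "no backup newer than 4 hours"
    fi
}
```

Calling check_ocr_backups with the backup directory before a risky change, such as adding a node, tells you whether a fresh logical export (ocrconfig -export) is worth taking first.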
In the event the OCR is corrupt on one node and all options to recover it have failed, one safe
way to re-create the OCR (and consequently the voting disk) is to reinstall the Oracle
Clusterware software. In order to accomplish this, a complete outage is required for the entire
cluster throughout the duration of the re-install. The Oracle Clusterware software will need to be
fully removed, the OCR and voting disks reformatted, all virtual IP addresses (VIPs)
deinstalled, and a complete reinstall of the Oracle Clusterware software will need to be performed.
It should also be noted that any patches that were applied to the original clusterware install will
need to be re-applied. As you can see, having a backup of the OCR and voting disk can
dramatically simplify the recovery of your system!
A second and much more efficient method used to re-create the OCR (and consequently the
voting disk as well) is to re-run the root.sh script from the primary node in the cluster. This is
described in Doc ID: 399482.1 on the My Oracle Support web site. In my opinion, this method is
quicker and much less intrusive than reinstalling Oracle Clusterware. Using root.sh to re-create
the OCR/Voting Disk is the focus of this article.
It is worth mentioning that only one of the two methods mentioned above needs to be performed
in order to recover from a lost or corrupt OCR. In addition to recovering the OCR, either method
could also be used to restore the SCLS directories after an accidental delete. These are
internal-only directories which are created by root.sh and, on the Linux platform, are located at
/etc/oracle/scls_scr. If the SCLS directories are accidentally removed, they can only be
re-created using the same methods used to re-create the OCR.
There are two other critical files in Oracle Clusterware that, if accidentally deleted, are a bit
easier to recover from:
Voting Disk
If there are multiple voting disks and one was accidentally deleted, first check whether
there are any backups of this voting disk. If there are no backups, a new voting disk
can be added using the crsctl add css votedisk command.
Socket Files
If these files are accidentally deleted, stop Oracle Clusterware on that
node and restart it. This will re-create the socket files. If the socket files
for cssd are deleted, the Oracle Clusterware stack may not come down, in which
case the node has to be rebooted.
Example Configuration
The example configuration used in this article consists of a two-node RAC with a clustered
database named racdb.idevelopment.info running Oracle RAC 10g Release 2 on the Linux
x86 platform. The two node names are racnode1 and racnode2, each hosting a single Oracle
instance named racdb1 and racdb2 respectively. For a detailed guide on building the example
clustered database environment, please see:
Building an Inexpensive Oracle RAC 10g Release 2 on Linux - (CentOS 5.3 / iSCSI)
The example Oracle Clusterware environment is configured with three mirrored voting disks and
two mirrored OCR files all of which are located on an OCFS2 clustered file system. Note that the
voting disk is owned by the oracle user in the oinstall group with 0644 permissions while the
OCR file is owned by root in the oinstall group with 0640 permissions:
located 3 votedisk(s).
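The ownership and permissions described above can be checked from the shell. This is a small sketch that relies on GNU stat as found on Linux; the function names are hypothetical. Note that stat prints the mode without the leading zero, so 0640 appears as 640.

```shell
#!/bin/sh
# Print "ok" if FILE has the expected octal mode, otherwise show the mismatch.
# Uses GNU stat (-c), which is what Linux provides.
check_mode() {
    file=$1
    expected=$2
    actual=$(stat -c '%a' "$file") || return 1
    if [ "$actual" = "$expected" ]; then
        echo "ok $file $actual"
    else
        echo "MISMATCH $file expected=$expected actual=$actual"
    fi
}

# Print owner:group so it can be compared against oracle:oinstall
# (voting disk) or root:oinstall (OCR file).
show_owner() {
    stat -c '%U:%G' "$1"
}
```

For example, check_mode /u02/oradata/racdb/OCRFile 640 followed by show_owner /u02/oradata/racdb/OCRFile would confirm the OCR file settings in this configuration.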
Network Settings
Oracle RAC Node 1 - (racnode1)
Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.151  255.255.255.0  192.168.1.1  Connects racnode1 to the public network
eth1    192.168.2.151  255.255.255.0               Connects racnode1 to iSCSI shared storage (Openfiler)
eth2    192.168.3.151  255.255.255.0               Connects racnode1 (interconnect) to racnode2 (racnode2-priv)
/etc/hosts
127.0.0.1 localhost.localdomain localhost
To describe the steps required in recovering the OCR, it is assumed the current OCR has been
accidentally deleted and no viable backups are available. It is also assumed the CRS stack was up and
running on both nodes in the cluster at the time the OCR files were removed:
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile_mirror
Although all OCR files have been lost or corrupted, the Oracle Clusterware daemons as
well as the clustered database remain running. In this scenario, Oracle Clusterware and
all managed resources need to be shut down in order to start the OCR recovery.
Attempting to stop CRS using crsctl stop crs will fail given it cannot write to the
now lost/corrupt OCR file:
With the environment in this unstable state, shut down all database instances on all nodes in
the cluster and then reboot each node:
When the Oracle RAC nodes come back up, note that Oracle Clusterware will fail to start
as a result of the lost/corrupt OCR file:
The "OCR initialization failed accessing OCR device" and PROC-26 errors can be safely
ignored given the OCR is not available. The most important action is that the SCR entries
are cleaned up.
Keep in mind that if you have more than two nodes in your cluster, you need to run
rootdelete.sh on all other nodes as well.
The primary node is the node from which the Oracle Clusterware installation was performed
(typically node1). For the purpose of this example, I originally installed
Oracle Clusterware from the machine racnode1, which is therefore the primary node.
The rootdeinstall.sh script will clear out any old data from a raw storage device in preparation
for the new OCR. If the OCR is on a clustered file system, new OCR file(s) will be created with
null data.
Among several other tasks, the root.sh script will create the OCR and voting disk(s).
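The order in which the three scripts are run matters, so it can help to print the plan before touching anything. The sketch below only prints commands and runs nothing. It assumes the scripts sit in their usual 10g locations under the CRS home (rootdelete.sh and rootdeinstall.sh under install, root.sh at the top level), which you should verify in your own installation. The function name and the CRS home path in the usage note are hypothetical.

```shell
#!/bin/sh
# Print (do not run) the script sequence for re-creating the OCR.
# Arguments: CRS home, primary node, then any remaining nodes.
# Assumption: scripts are in their usual 10g locations under the CRS home.
# Each printed command is meant to be run as root on the named node.
ocr_recreate_plan() {
    crs_home=$1
    primary=$2
    shift 2
    # Step 1: clean up on every node, primary first.
    for node in "$primary" "$@"; do
        echo "$node: $crs_home/install/rootdelete.sh"
    done
    # Step 2: clear the old OCR, on the primary node only.
    echo "$primary: $crs_home/install/rootdeinstall.sh"
    # Step 3: re-create the OCR and voting disk(s), primary node first.
    for node in "$primary" "$@"; do
        echo "$node: $crs_home/root.sh"
    done
}
```

Running ocr_recreate_plan /u01/app/crs racnode1 racnode2 prints five lines: rootdelete.sh for both nodes, rootdeinstall.sh for racnode1, then root.sh for racnode1 and racnode2.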
6. Oracle 10.2.0.1 users should note that running root.sh on the last node will fail, most
notably in the silent-mode VIPCA configuration, because of BUG 4437727 in
10.2.0.1. Refer to my article Building an Inexpensive Oracle RAC 10g Release 2 on
Linux - (CentOS 5.3 / iSCSI) to work around these errors.
7. The Oracle Clusterware and Oracle RAC software in my configuration were patched with
10.2.0.4 and therefore did not encounter any errors while running root.sh on the
last node.
8. Configure Server-Side ONS using racgons.
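As a rough illustration of the ONS step, the helper below builds a racgons add_config command from a port and a list of nodes. It only prints the command. The port is an assumption: 6200 is a common remote port, but the value actually in use should be read from the ons.config file in your own environment. The function name is hypothetical.

```shell
#!/bin/sh
# Build the racgons command that registers each node with server-side ONS.
# Arguments: ONS remote port, then the node names.
# Assumption: every node uses the same remote port.
racgons_add_config_cmd() {
    port=$1
    shift
    cmd="racgons add_config"
    for node in "$@"; do
        cmd="$cmd $node:$port"
    done
    echo "$cmd"
}
```

For the two-node example, racgons_add_config_cmd 6200 racnode1 racnode2 prints the command to review before running it from the Clusterware bin directory.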
Log in as the owner of the Oracle Clusterware software which is typically the oracle user
account and configure all network interfaces. The first step is to identify the current
interfaces and IP addresses using oifcfg iflist. As discussed in the network settings
section, eth0/192.168.1.0 is my public interface/network, eth1/192.168.2.0 is my iSCSI
storage network and not used specifically for Oracle Clusterware, and eth2/192.168.3.0 is
the cluster_interconnect interface/network.
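oifcfg expects the network (subnet) address rather than the host address, so the value has to be derived from each interface's IP address and netmask. The following sketch does that arithmetic and prints the matching oifcfg setif commands without running them. The function names are made up for this example.

```shell
#!/bin/sh
# Compute the network address from a dotted IP and netmask (octet-wise AND).
# Runs in a subshell so the IFS change does not leak to the caller.
network_address() {
    (
        IFS=.
        set -- $1 $2      # splits into 8 octets: $1-$4 = IP, $5-$8 = mask
        echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
    )
}

# Print the oifcfg command for one interface.
# Arguments: interface, IP address, netmask, type (public or cluster_interconnect).
oifcfg_setif_cmd() {
    echo "oifcfg setif -global $1/$(network_address "$2" "$3"):$4"
}

oifcfg_setif_cmd eth0 192.168.1.151 255.255.255.0 public
oifcfg_setif_cmd eth2 192.168.3.151 255.255.255.0 cluster_interconnect
```

The eth1 storage network is deliberately left out because it is not used by Oracle Clusterware. After running the printed commands, oifcfg getif shows what was stored.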
As the Oracle Clusterware software owner (typically oracle), add a cluster TNS listener
configuration to the OCR using netca. This may give errors if listener.ora already contains
the entries. If this is the case, move listener.ora to /tmp from
$ORACLE_HOME/network/admin, or from the $TNS_ADMIN directory if the
TNS_ADMIN environment variable is defined, and then run netca. Add all the listeners that
were added during the original Oracle Clusterware software installation.
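Moving the old listener.ora aside can be scripted so that the TNS_ADMIN check is not forgotten. This is a minimal sketch with a hypothetical function name. It looks in $TNS_ADMIN when that variable is set and otherwise in $ORACLE_HOME/network/admin, and it overwrites any listener.ora already in the destination directory.

```shell
#!/bin/sh
# Move an existing listener.ora out of the way before running netca.
# The admin directory is $TNS_ADMIN when set, else $ORACLE_HOME/network/admin.
# Optional argument: destination directory (defaults to /tmp).
set_aside_listener_ora() {
    admin_dir=${TNS_ADMIN:-$ORACLE_HOME/network/admin}
    dest_dir=${1:-/tmp}
    if [ -f "$admin_dir/listener.ora" ]; then
        mv "$admin_dir/listener.ora" "$dest_dir/listener.ora" &&
            echo "moved $admin_dir/listener.ora to $dest_dir"
    else
        echo "no listener.ora in $admin_dir"
    fi
}
```

Run set_aside_listener_ora as the software owner on each node, then start netca.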
As a final step, log in as the Oracle Clusterware software owner (typically oracle) and
add all resources back to the OCR using the srvctl command.
Please ensure that these commands are not run as the root user account.
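For the example cluster, re-registering the database and its two instances might look like the output of the sketch below. It prints srvctl commands rather than running them. The database name passed to -d and the Oracle home path are assumptions to replace with your own values, and other resources such as services would need their own srvctl add commands.

```shell
#!/bin/sh
# Print the srvctl commands that register a database and its instances.
# Arguments: database name, Oracle home, then instance:node pairs.
# The printed commands are meant to be run as the software owner, not root.
srvctl_add_cmds() {
    db=$1
    home=$2
    shift 2
    echo "srvctl add database -d $db -o $home"
    for pair in "$@"; do
        # ${pair%%:*} is the instance name, ${pair#*:} is the node name.
        echo "srvctl add instance -d $db -i ${pair%%:*} -n ${pair#*:}"
    done
}
```

For instance, srvctl_add_cmds racdb "$ORACLE_HOME" racdb1:racnode1 racdb2:racnode2 prints one add database line followed by one add instance line per node.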
After completing the steps above, the OCR should have been successfully recreated. Bring up all
of the resources that were added to the OCR and run cluvfy to verify the cluster configuration.
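The text does not say which cluvfy check to run. One plausible choice is the post-Clusterware-install stage check, and the sketch below builds that command for a list of nodes. Treat the choice of stage as an assumption. The function name is hypothetical and the command is only printed.

```shell
#!/bin/sh
# Build a cluvfy post-install check for the given nodes.
# "$*" joins the arguments with the first character of IFS, here a comma.
cluvfy_post_crsinst_cmd() {
    nodes=$(IFS=,; echo "$*")
    echo "cluvfy stage -post crsinst -n $nodes"
}
```

For the example cluster, cluvfy_post_crsinst_cmd racnode1 racnode2 yields the command to run as the software owner once all resources are back online.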
All articles, scripts and material located at the Internet address of http://www.idevelopment.info are the copyright of
Jeffrey M. Hunter and are protected under copyright laws of the United States. This document may not be hosted on
any other site without my express, prior, written permission. Application to host any of the material elsewhere can
be made by contacting me at [email protected].
I have made every effort and taken great care in making sure that the material included on my web site is
technically accurate, but I disclaim any and all responsibility for any loss, damage or destruction of data or any
other property which may arise from relying on it. I will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
Last modified on
Wednesday, 14-Oct-2009 11:13:51 EDT