FAQ HADR in SAP Env
Applies to:
All SAP releases on DB2 for Linux, UNIX and Windows Version 8 and above.
Summary
DB2 provides several features to increase the availability of database servers. The DB2 High Availability
and Disaster Recovery (HADR) feature is an easy-to-use data replication feature that provides a high
availability (HA) and disaster recovery (DR) solution for both partial and complete site failures.
This FAQ is intended to help you better understand this feature and provides a number of
recommendations on the setup, tuning, and maintenance of your DB2 HADR environment. The questions are
grouped into three main areas: Introduction, Implementation, and Operations.
Author:
Cherry Liang
Company: SAP
Created on: August 2011
Author Bio
Cherry Liang is a member of the SAP Active Global Support North America Center of Expertise (CoE). She
supports customers running SAP products on DB2 for Linux, UNIX and Windows.
Frequently Asked Questions (FAQ) about IBM DB2 High Availability Disaster Recovery in an SAP System Environment
Table of Contents
Introduction
1. What is the DB2 High Availability Disaster Recovery (HADR) feature?
2.
3.
4. Which operations are replicated by DB2 HADR?
5.
6.
Implementation
7. What are the basic requirements for setting up DB2 HADR?
8. What are the main considerations when setting up DB2 HADR?
9.
10.
11.
12. What is the recommended configuration for log archiving in a DB2 HADR environment?
13. Which DB/DBM CFG parameters should I tune in a DB2 HADR environment?
14. Which DB2 registry variables should I tune in a DB2 HADR environment?
15. How do I manage client connectivity of the SAP system after a failover or takeover?
Operations
16. How do I start DB2 HADR?
17. How do I shut down DB2 HADR?
18. How do I perform a planned takeover?
19. How do I perform a forced takeover?
20.
21.
22. Can I connect to the standby database and perform READ operations?
23.
24. If the logging is blocked on the primary database server, what should I check for?
25. How do I install Fix Packs in a DB2 HADR environment to minimize system downtime?
26. How do I change OS, application, and DB2 configuration parameters in a DB2 HADR environment?
27.
28.
29. What do I have to consider when performing a table and index reorganization in a DB2 HADR environment?
30.
Introduction
1. What is the DB2 High Availability Disaster Recovery (HADR) feature?
DB2 High Availability Disaster Recovery (HADR) is a feature in DB2 Enterprise Server Edition that allows the
replication of any logged database activity to a local or remote location. It provides a high availability solution
for both partial and complete site failures. DB2 has provided this feature since Version 8.2.
HADR protects against data loss by continually replicating data changes from a source database, called the
primary database, to a target database, called the standby database. With HADR, the log records are
transmitted from the primary to the standby database. The HADR standby database replays all the log
records to its copy of the database, keeping it synchronized with the primary database. The standby
database is in a continuous rollforward pending mode and in a state of near-readiness, so a takeover by the
standby database is fast.
Depending on your business requirements, the following high availability and disaster recovery solutions are
possible with the implementation of the DB2 HADR feature:
Local HA solution
The standby database server is in the same room or the next building and is continually kept in sync with
the primary database in SYNC or NEARSYNC mode. HADR provides an extremely fast failover.
Transaction loss can be excluded when using SYNC mode.
DR solution
The standby database server is located remotely. In ASYNC or SUPERASYNC mode, replication adds no
performance overhead on the primary database. Transaction loss is possible.
You require zero or minimal loss of committed transaction data. The replica (standby) database must be
continually kept up to date with the primary database.
You cannot afford downtime for system change windows. HADR allows rolling DB2 Fix Pack upgrades
and configuration parameter changes on both databases, as well as operating system
maintenance or recycling windows, with no downtime for DB2 clients.
heartbeat messages are spaced at known time intervals, so each end can identify how many heartbeats
have been missed and take appropriate actions.
Upon successful connection of the standby database to the primary database, the HADR system enters the
catch-up phase. During the catch-up phase, the size of the logical log gap from the standby to the primary
database is established, and the primary database starts sending all logical logs that are required for the
standby database to reach the same point as the primary database.
On the primary database, the log is read by the db2lfr EDU. This EDU passes the log pages to the
db2hadrp EDU, which sends them over TCP/IP to the db2hadrs EDU on the standby database; db2hadrs
in turn passes them to the recovery (log replay) subsystem of the standby database.
After all the logs on the disk and in the memory of the primary database have been relayed to the standby
database, the HADR system enters the Peer state. From then on, the primary database ships each log page
to the standby database whenever it flushes that page to disk. The log pages are replayed on the
standby as they arrive and are also written to local log files on the standby database.
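The progression from catch-up to Peer state described above can be summarized as a small state function. This is a simplified model, not DB2 code: log positions are treated as plain increasing integers (an assumption; DB2 tracks log sequence numbers internally), and the names `standby_state` and `log_gap` are hypothetical.

```python
from enum import Enum


class HadrState(Enum):
    LOCAL_CATCHUP = "Local catchup"
    REMOTE_CATCHUP = "Remote catchup"
    PEER = "Peer"


def log_gap(replayed_pos: int, primary_log_end: int) -> int:
    """Amount of log the standby still has to replay to reach the primary."""
    return max(0, primary_log_end - replayed_pos)


def standby_state(replayed_pos: int, local_log_end: int, primary_log_end: int) -> HadrState:
    """Simplified view of which phase a connected standby is in.

    replayed_pos:    how far the standby has replayed (hypothetical integer position)
    local_log_end:   end of the log data already on the standby's own disk
    primary_log_end: current end of the primary database's log
    """
    if replayed_pos < local_log_end:
        # Still replaying log files that are already available locally.
        return HadrState.LOCAL_CATCHUP
    if log_gap(replayed_pos, primary_log_end) > 0:
        # Local log exhausted: the primary (db2lfr -> db2hadrp) ships the remaining gap.
        return HadrState.REMOTE_CATCHUP
    # No gap left: the primary ships each log page as it flushes it.
    return HadrState.PEER
```

For example, `standby_state(20, 20, 50)` reports remote catch-up, because the standby has used up its local log but is still 30 positions behind the primary.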
4. Which operations are replicated by DB2 HADR?
The following operations are replicated to the standby database from the primary database:
Data Definition Language (DDL) operations, such as CREATE and DROP for tables, indexes, and other
objects
Tablespace:
CREATE, DROP, or ALTER.
Container sizes, file types (raw device or file system), and paths must be identical on the
primary and the standby database.
Tablespace types (DMS or SMS) must be identical on both the servers.
If the database is enabled for automatic storage, the storage paths must be identical.
Non-logged Large Objects (LOBs) are not replicated, but the space is allocated and LOB data is set
to binary zeroes on the standby database.
LOAD NONRECOVERABLE
The table on the standby is marked as bad and future log records regarding this table are
skipped.
Implementation
7. What are the basic requirements for setting up DB2 HADR?
HADR is supported only in archive logging mode, not in circular logging mode.
The operating system on the primary and standby database servers should be the same version, including
patches. They may differ only temporarily, for example during an upgrade.
The DB2 version and level must be the same on both the primary and the standby systems. They may differ
only temporarily, for example during an upgrade.
The DB2 software for both the primary and the standby database must have the same bit size (32-bit
or 64-bit).
Tablespaces must be identical on the primary and standby databases including the following:
Tablespace type
Tablespace size
Container path
Container size
Container file type
The amount of space allocated for log files should also be the same on both the primary and standby
databases.
Use identical host computers for the HADR primary and standby database servers. This means that
they should be from the same vendor and have the same architecture.
Ensure that the primary and standby database servers have equal amounts of memory.
Ensure that the primary and standby database servers have identical database (DB) and database
management (DBM) configuration parameters.
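The requirements above lend themselves to a simple pre-flight comparison of the two servers. The sketch below assumes you have already collected each server's properties into a dictionary; the keys (`os_level`, `db2_level`, `archive_logging`, and so on) are hypothetical names chosen for illustration, not DB2 parameter names, and collecting the values is not shown.

```python
from typing import Any

# Hypothetical attribute names for this sketch; they are not DB2 parameter names.
MUST_MATCH = (
    "os_level",      # operating system version including patches
    "db2_level",     # DB2 version and Fix Pack level
    "bit_size",      # 32 or 64
    "memory_gb",
    "log_space_gb",  # space allocated for log files
    "tablespaces",   # (name, type, size, container path, container size, file type) tuples
)


def prereq_problems(primary: dict[str, Any], standby: dict[str, Any]) -> list[str]:
    """List HADR prerequisite violations between two server descriptions."""
    problems = []
    for role, server in (("primary", primary), ("standby", standby)):
        if not server.get("archive_logging", False):
            problems.append(f"{role}: circular logging is not supported by HADR")
    for key in MUST_MATCH:
        if primary.get(key) != standby.get(key):
            problems.append(f"{key} differs: {primary.get(key)!r} != {standby.get(key)!r}")
    return problems
```

During a rolling upgrade, a reported `os_level` or `db2_level` difference can be expected for a short time, as noted above.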
HADR topology
Analyze the high availability and disaster recovery requirements from a business standpoint. Plan for
an appropriate HADR topology.
Logfile management
A fast log disk device on both the primary and standby database is critical to the performance of HADR-enabled
databases. Typically, the response time for log I/O to disk should be in the low millisecond
range.
Use dedicated, high performing disks or file systems for the database logs. Do not share devices
between the logging file system and tablespace file systems.
Network
Network bandwidth must be greater than the database log generation rate. Incorrect provisioning of
bandwidth for HADR solutions can adversely affect production performance and can invalidate the
overall solution. Network delays affect the primary database only in SYNC and NEARSYNC modes.
Low network latency is important for sending the log data from the primary to the standby
database. To test the bandwidth and latency, you can use the HADR simulator, which you can
download at:
http://www.ibm.com/developerworks/wikis/display/data/HADR_sim
Use a dedicated network for the HADR connection, and consider using multiple network adapters to
ensure that the failure of a single adapter does not result in the loss of the network connection.
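The bandwidth rule above is easy to get wrong because log generation rates are usually quoted in megabytes per second while network links are rated in megabits per second. The following sketch converts between the two; the `headroom` factor is a hypothetical safety margin, not a DB2 or SAP value, and protocol overhead is ignored. It does not replace measuring the real link, for example with the HADR simulator.

```python
def required_bandwidth_mbit(peak_log_mb_per_s: float, headroom: float = 1.0) -> float:
    """Convert a peak log generation rate (MB/s) into network bandwidth (Mbit/s)."""
    return peak_log_mb_per_s * 8 * headroom  # 1 byte = 8 bits


def bandwidth_sufficient(peak_log_mb_per_s: float, link_mbit_per_s: float,
                         headroom: float = 1.0) -> bool:
    # The text requires bandwidth strictly greater than the log generation rate.
    return link_mbit_per_s > required_bandwidth_mbit(peak_log_mb_per_s, headroom)
```

With a peak of 10 MB/s of log data, a 100 Mbit/s link is just sufficient (80 Mbit/s needed), but not once a 1.5x headroom is required (120 Mbit/s).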
Synchronous (SYNC):
Log operation happens in the following sequence:
1. The log pages are first written to the primary database log disk.
Near-synchronous (NEARSYNC):
Log operation happens in the following sequence:
1. The log pages are written to the primary database log disk and sent to the standby database in
parallel.
2. The standby database sends the acknowledgement as soon as the log pages are received into
its memory.
3. After receiving acknowledgements from the standby and log pages are written to log disk, the
primary database notifies an application that a transaction was prepared or committed.
This is the default option and is recommended in an SAP system environment. A log write is considered
successful only when the primary's log buffer has been written to log files on the primary database
and an acknowledgement has been received that the log buffer has been written to the main memory on the
standby database. In a fast network, the overhead on the primary database is minimal, so running an
HADR cluster in NEARSYNC mode has little impact on the primary database.
Transaction loss occurs only if both sites fail simultaneously and if the target site has not transferred
all the received data to nonvolatile storage.
In case of a failover, all uncommitted transactions are rolled back.
Asynchronous (ASYNC):
Log operation happens in the following sequence:
1. The log pages are written to the primary database log disk and sent to the standby database in
parallel.
2. After log pages have been delivered to the TCP layer of the primary system's host machine and
log pages are written to the log disk, the primary database notifies an application that a
transaction was prepared or committed.
Log write is considered successful when logs have been written to the disk on the primary system
and log data has been sent through TCP/IP to the standby system.
Since the primary system does not wait for acknowledgement from the standby system, transactions
might be considered committed when they are still on their way to the standby system. Transaction
loss can occur in this mode.
Running an HADR cluster in async-mode does not affect the performance of the primary database.
standby database, transactions are considered committed irrespective of the replication state of the
transaction. In this mode, the HADR pair can never be in peer state or disconnected peer state.
This mode has the shortest transaction response time but also the highest probability of transaction
losses if the primary system fails. This mode is useful when you do not want transactions to be
blocked or when you experience elongated response times due to network interruptions or
congestion.
Since transaction commit operations on the primary database are not affected by the relative
slowness of the network or the standby, the log gap between the primary and the standby database
might continue to increase. A failover might therefore lose log records, and the associated
transactions are then missing from the new primary database. Attempts to restart the original primary
database as a standby database will fail. Instead, you need to reinitialize it as a new standby database
by restoring a backup image of the new primary database.
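The trade-offs between the four synchronization modes can be condensed into a small lookup model. This is a simplified sketch, not DB2 code: it only encodes what the primary waits for before reporting a commit and whether committed transactions can be lost when the primary fails. The SYNC entry follows DB2's documented behavior (the standby writes the log to its disk before acknowledging); the function names are hypothetical.

```python
from enum import Enum


class SyncMode(Enum):
    SYNC = "SYNC"
    NEARSYNC = "NEARSYNC"
    ASYNC = "ASYNC"
    SUPERASYNC = "SUPERASYNC"


# What the primary waits for, besides writing the log to its own disk,
# before it reports a commit to the application.
COMMIT_WAITS_FOR = {
    SyncMode.SYNC: "log written to the standby's log disk",
    SyncMode.NEARSYNC: "log received into the standby's memory",
    SyncMode.ASYNC: "log delivered to the primary host's TCP layer",
    SyncMode.SUPERASYNC: "nothing",
}


def committed_loss_possible(mode: SyncMode, standby_fails_too: bool) -> bool:
    """Can committed transactions be lost when the primary fails?"""
    if mode is SyncMode.SYNC:
        return False
    if mode is SyncMode.NEARSYNC:
        # The log may exist only in standby memory until it is written to disk there.
        return standby_fails_too
    # ASYNC / SUPERASYNC: the log may still be in flight or not sent at all.
    return True


def can_reach_peer_state(mode: SyncMode) -> bool:
    return mode is not SyncMode.SUPERASYNC
```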
12. What is the recommended configuration for log archiving in a DB2 HADR environment?
You can configure different locations for the archived logs on the primary and standby database. However,
at any given time, only the primary database archives logs. If you configure different log archive locations
for the primary and the standby database server, your archived log files will be spread across two locations
after several failovers or takeovers.
We therefore recommend that you make the location of archived logs accessible to both the primary and
standby database, which has the following advantages:
Only the primary database archives logs, and after a takeover, the new primary
(original standby) database starts archiving logs. It is therefore simplest to have all logs archived to
the same location. If the archive device is not shared, after a few role switches, some files are on
one device and some on the other. You might then need manual intervention, such as moving or copying
log files between the primary and standby systems, during the HADR catch-up phase or other recovery
actions.
A shared archive allows the standby database to retrieve log files directly from the archive device
during the local catch-up state, relieving the primary database from reading the files and sending
them over the network to the standby database.
13. Which DB/DBM CFG parameters should I tune in a DB2 HADR environment?
The following DB/DBM CFG parameters are important for the performance and recoverability in a DB2
HADR environment:
AUTORESTART
In the event of an abnormal termination of the database, this DB CFG parameter determines whether
the database manager can automatically call the restart database utility when an application
connects to a database.
You should set this parameter to OFF so that a broken primary server does not come back online
and restart in the HADR primary role, causing a split-brain scenario.
BLOCKNONLOGGED
This DB CFG parameter is available as of DB2 V9.5 Fix Pack 5. When this
parameter is set to YES, the database manager blocks the following operations against the
database:
ALTER TABLE ... ACTIVATE NOT LOGGED INITIALLY
Not used by default in an SAP system, except for BW /BI0/0P tables. For more information, see
SAP Note 1527970.
Creation of a new NOT LOGGED LOB column in a database table
The DB2 EXPLAIN tables contain LOB columns that are not logged. The DBA Cockpit creates
and adapts the DB2 EXPLAIN tables during EXPLAIN execution using the DB2 stored procedure
SYSPROC.SYSINSTALLOBJECTS. Since the contents of the DB2 EXPLAIN tables are only used
during the EXPLAIN execution, the use of not-logged columns is safe.
We recommend that you set BLOCKNONLOGGED to NO. This is a safe setting for running SAP
systems in HADR environments. For more information, see SAP Note 1523227.
INDEXREC
INDEXREC (index re-creation time) is both a DBM and a DB CFG parameter. It specifies when DB2 attempts to
rebuild invalid indexes and, specifically for HADR, whether this occurs during the HADR log replay
on the standby database. Unless it is set to SYSTEM, the DB CFG parameter overrides the value of the
DBM CFG parameter.
Possible values are:
SYSTEM:
Applies to DB CFG only; accept the value for INDEXREC in the DBM CFG.
ACCESS:
An invalid index is rebuilt when it is first accessed and then rebuilt on the HADR standby
database during log replay.
ACCESS_NO_REDO:
An invalid index is rebuilt when it is first accessed, but the invalid index is left on the HADR
standby database and only rebuilt after an HADR takeover and the first access of the underlying
table.
RESTART:
This is the default value. An invalid index is rebuilt after a RESTART DATABASE or HADR
takeover on the primary database and during log replay on the standby database. Indexes are
rebuilt asynchronously (no wait) at takeover time, and synchronously (wait) at restart time.
RESTART_NO_REDO:
An invalid index is rebuilt after a RESTART DATABASE on the primary database, but not on the
standby database during log replay. The rebuild only occurs after an HADR takeover.
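The precedence rule and the per-value standby behavior listed above can be expressed as a short lookup. This is an illustrative sketch only; the function names are hypothetical, and values outside the list above raise a `KeyError`.

```python
# Whether the HADR standby rebuilds invalid indexes during log replay,
# per the INDEXREC values listed above.
REBUILD_DURING_STANDBY_REPLAY = {
    "ACCESS": True,
    "ACCESS_NO_REDO": False,
    "RESTART": True,
    "RESTART_NO_REDO": False,
}


def effective_indexrec(db_cfg_value: str, dbm_cfg_value: str) -> str:
    """The DB CFG value wins unless it is SYSTEM, which defers to the DBM CFG."""
    db_value = db_cfg_value.upper()
    return dbm_cfg_value.upper() if db_value == "SYSTEM" else db_value


def standby_rebuilds_during_replay(db_cfg_value: str, dbm_cfg_value: str) -> bool:
    return REBUILD_DURING_STANDBY_REPLAY[effective_indexrec(db_cfg_value, dbm_cfg_value)]
```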
LOGFILSIZ
This DB CFG parameter, which specifies the size of each log file, is taken from the primary database
and used by the standby database so that the sizes of the log files on both servers match. The DB
CFG parameter value for LOGFILSIZ of the standby database is ignored.
After a takeover, the new primary (old standby) database uses its local configuration value upon the
first database reactivation after the takeover.
LOGINDEXBUILD
This DB CFG parameter should be set to ON so that maintenance operations on indexes, such as
CREATE or REORG, on the primary database are logged and also carried out in the same way on
the standby database. If it is set to OFF, any index that was rebuilt on the primary database has to be
rebuilt on the standby database after a takeover. As a result, takeover times increase.
HADR_SYNCMODE
The setting of this parameter must match on both servers in the HADR pair. For a detailed
explanation of the four synchronization modes, see question 11.
HADR_PEER_WINDOW
This DB CFG parameter specifies whether the database goes into disconnected peer state after the
HADR connection state has changed to disconnected, and how long the primary database suspends
update transactions in that state. This parameter must be the same on the primary and standby database.
A value of zero disables the peer window, and DB2 ignores this parameter if HADR_SYNCMODE is set to
ASYNC or SUPERASYNC.
If the peer window is enabled (that is, set to a non-zero value), the primary database sends
messages to the standby database at regular intervals in peer state, indicating a "peer window end"
time stamp. The time stamp can be retrieved using the db2pd command with the -hadr option. If the
primary database failed within a peer window, you can perform a failover with no data loss. If the
primary database failed outside the peer window, the primary database might or might not have
committed transactions after the peer window has ended. In that case, a takeover exposes you to
the risk of transaction loss, and you should repair and restart the failed primary database instead of
initiating a takeover.
For easy integration with cluster managers, a PEER WINDOW ONLY option is available for the
TAKEOVER BY FORCE command. With this option, the command performs a failover only if the
current time is earlier than the peer window end. Upon detecting primary failure, the cluster manager
can issue the TAKEOVER BY FORCE PEER WINDOW ONLY command to initiate a failover. Using
this option maximizes the likelihood that an automated failover does not cause transaction loss. This
option is recommended for automated failover (the DB2 cluster manager uses this option). In such a
configuration, the peer window time should be set to at least the amount of time the cluster manager
needs to detect a failure of the primary database and react to the failure using failover.
If you use db2haicu for configuring the HADR cluster, the value of HADR_PEER_WINDOW must be
greater than 120 seconds.
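The two peer window rules above, the takeover condition and the sizing guidance, can be written as two small checks. This is a sketch under stated assumptions: the PeerWindowEnd time stamp is assumed to have been read from the db2pd -hadr output already (parsing it is not shown), the detection-and-reaction time of the cluster manager is an input you must measure, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def peer_window_only_takeover_allowed(now: datetime, peer_window_end: datetime) -> bool:
    """TAKEOVER HADR ... BY FORCE PEER WINDOW ONLY proceeds only before the peer window end."""
    return now < peer_window_end


def peer_window_large_enough(peer_window_s: int, detect_and_react_s: int,
                             uses_db2haicu: bool) -> bool:
    """Check a HADR_PEER_WINDOW value (in seconds) against the guidance above."""
    if uses_db2haicu and peer_window_s <= 120:
        return False
    # The window must cover the time the cluster manager needs to detect and react.
    return peer_window_s >= detect_and_react_s
```

For example, a peer window of 300 seconds satisfies both rules for a cluster manager that needs 90 seconds, whereas 120 seconds is rejected under db2haicu regardless of detection time.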
HADR_TIMEOUT
This DB CFG parameter determines the number of seconds the HADR process waits before
considering a communication attempt to its HADR partner as failed. This value must also be the
same on both sides of the HADR pair.
If an HADR database does not receive any communication (heartbeat or acknowledgement signals)
from its partner database for longer than the length of time specified by this parameter, the database
concludes that the connection with the partner database is lost.
If the database is in peer state when the connection is lost, it moves into DisconnectedPeer state if
the peer window is enabled, or into Remote catch-up pending state if the peer window is disabled.
The change of state applies to both primary and standby database.
If this parameter value is set too high, the HADR process cannot detect network or partner database
failures promptly. The primary database can end up waiting for too long and blocking transactions on
the primary database. If the parameter value is too low, the HADR process might get too many false
alarms, breaking the connection too often.
We recommend that you set a value of at least 60 seconds. When setting this parameter, consider
network reliability and machine response times. If the network has an irregular or long transmission
delay, use a longer timeout.
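Several of the recommendations above can be checked mechanically once the DB CFG values of both servers are available. The sketch assumes the values have already been collected into name-to-value dictionaries (for example from GET DB CFG output; parsing is not shown). A missing parameter is treated as non-compliant, which is a choice made for this sketch, and the function name is hypothetical.

```python
# Parameters that, per the text above, must be identical on both servers.
MUST_MATCH = ("HADR_SYNCMODE", "HADR_PEER_WINDOW", "HADR_TIMEOUT")


def check_hadr_db_cfg(primary: dict[str, str], standby: dict[str, str]) -> list[str]:
    """Compare DB CFG values (name -> value strings) against the recommendations above."""
    findings = []
    for name in MUST_MATCH:
        if primary.get(name) != standby.get(name):
            findings.append(f"{name} differs between primary and standby")
    for role, cfg in (("primary", primary), ("standby", standby)):
        if cfg.get("AUTORESTART", "").upper() != "OFF":
            findings.append(f"{role}: AUTORESTART should be OFF")
        if cfg.get("LOGINDEXBUILD", "").upper() != "ON":
            findings.append(f"{role}: LOGINDEXBUILD should be ON")
        if int(cfg.get("HADR_TIMEOUT", "0")) < 60:
            findings.append(f"{role}: HADR_TIMEOUT should be at least 60 seconds")
    return findings
```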
14. Which DB2 registry variables should I tune in a DB2 HADR environment?
The following DB2 registry variables are important for the performance and recoverability in a DB2 HADR
environment:
DB2_HADR_BUF_SIZE
This registry variable specifies the size of the HADR standby log receive buffer and is recognized only on the
standby database. By default, it is twice the value of LOGBUFSZ.
If the standby database is slow in replaying logs, and the primary database keeps sending more logs
to the standby database, the log receive buffer eventually becomes full, preventing the standby
database from receiving more logs. Saturation of the receive buffer causes transactions on the
primary database to be blocked until the receive buffer has more space to receive log pages.
If the log generation rate on your primary database is uneven, consider a larger standby receive buffer to absorb the peaks. We
recommend that you perform basic transactional throughput testing to measure the log generation
rate on your primary database.
You can monitor the HADR standby receive buffer usage using the db2pd command with the -hadr
option on the standby database.
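To reason about buffer sizing, you can estimate how long a log burst can last before the receive buffer saturates and starts blocking transactions on the primary. This sketch assumes that LOGBUFSZ is specified in 4 KB pages and that DB2_HADR_BUF_SIZE uses the same page unit; the rates are inputs you would obtain from the throughput testing recommended above. The function names are hypothetical.

```python
LOG_PAGE_KB = 4  # LOGBUFSZ is specified in 4 KB pages


def default_hadr_buf_pages(logbufsz_pages: int) -> int:
    """Default DB2_HADR_BUF_SIZE: twice LOGBUFSZ."""
    return 2 * logbufsz_pages


def seconds_until_buffer_full(buf_pages: int, log_rate_kb_s: float,
                              replay_rate_kb_s: float) -> float:
    """How long a log burst can last before the receive buffer saturates.

    Starts from an empty buffer; while the primary sends log faster than the
    standby replays it, the surplus accumulates in the buffer.
    """
    surplus_kb_s = log_rate_kb_s - replay_rate_kb_s
    if surplus_kb_s <= 0:
        return float("inf")  # replay keeps up; the buffer never fills
    return buf_pages * LOG_PAGE_KB / surplus_kb_s
```

With LOGBUFSZ set to 1024 pages, the default buffer of 2048 pages (8 MB) absorbs only about one second of a burst in which the primary generates 8,000 KB/s more log than the standby can replay.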
DB2_HADR_PEER_WAIT_LIMIT
When this variable is set, the HADR primary database breaks out of peer state if logging on the
primary database has been blocked for the specified number of seconds while waiting for log
replication to the standby database. When this limit is reached, the primary database breaks the
connection to the standby database. If the peer window is disabled, the primary database enters a
Disconnected state, and logging resumes. If the peer window is enabled, the primary database
enters a DisconnectedPeer state in which logging continues to be blocked.
HADR_TIMEOUT does not break the primary database out of the peer state if the primary keeps
receiving heartbeat messages from the standby database while logging is blocked.
HADR_TIMEOUT is a timeout for the HADR network layer. It does not control timeout for higher
layer operations such as log shipping. If log replay on the standby database is stuck on a large
operation such as load or reorganization, the HADR component still sends heartbeat messages to
the primary database on a normal schedule. In such a scenario, the primary database is blocked as
long as the standby replay is blocked, unless DB2_HADR_PEER_WAIT_LIMIT is set.
You can specify a short HADR_TIMEOUT to detect network problems or a standby crash promptly
while setting a relatively long DB2_HADR_PEER_WAIT_LIMIT to keep replication enabled as long
as the primary and standby databases are connected. Conversely, you can specify a short
DB2_HADR_PEER_WAIT_LIMIT for a better client response time while using a longer
HADR_TIMEOUT to avoid frequent disconnections when the databases are not in peer state.
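The interaction between DB2_HADR_PEER_WAIT_LIMIT and the peer window described above can be modeled as a small decision function. This is a simplified sketch: it assumes the primary is in peer state, its logging is blocked waiting for the standby, and heartbeats keep arriving, so HADR_TIMEOUT never fires. Passing `None` models the variable being unset; the function name is hypothetical.

```python
from typing import Optional


def primary_after_blocked_logging(blocked_s: float,
                                  peer_wait_limit_s: Optional[int],
                                  peer_window_enabled: bool) -> tuple[str, bool]:
    """Return (HADR state, logging still blocked) for a primary in peer state
    whose logging has been blocked for blocked_s seconds waiting on the standby.
    """
    if peer_wait_limit_s is None or blocked_s < peer_wait_limit_s:
        # Without a limit, the primary stays blocked as long as standby replay is blocked.
        return ("Peer", True)
    if peer_window_enabled:
        return ("DisconnectedPeer", True)
    return ("Disconnected", False)
```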
DB2_LOAD_COPY_NO_OVERRIDE
By default, a LOAD COPY NO command executed on an HADR primary database is converted to
a LOAD NONRECOVERABLE command, after which the table on the HADR standby database is no longer kept
in line with the primary database. The table on the standby database is marked bad, and logs from the
primary database are not applied to it. This state can be corrected by the subsequent execution of a
LOAD COPY YES command.
If this registry variable is set to COPY YES, the command is converted to LOAD COPY YES, so that the
HADR standby database can replay the load during log replay and data integrity is maintained. For
this to be effective, the load copy must still be written to a valid target that the
HADR standby database can access.
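The effect of the LOAD options and of DB2_LOAD_COPY_NO_OVERRIDE on the standby table can be summarized as follows. This is an illustrative model, not DB2 code: the function name and return strings are hypothetical, and the override value is only checked for a leading "COPY YES".

```python
from typing import Optional


def standby_table_after_load(copy_option: str,
                             load_copy_no_override: Optional[str] = None) -> str:
    """Outcome on the HADR standby for a LOAD run on the primary.

    copy_option: "COPY NO", "COPY YES" or "NONRECOVERABLE"
    load_copy_no_override: value of DB2_LOAD_COPY_NO_OVERRIDE, or None if unset
    """
    option = copy_option.upper()
    if option == "COPY NO":
        override = (load_copy_no_override or "").upper()
        option = "COPY YES" if override.startswith("COPY YES") else "NONRECOVERABLE"
    if option == "NONRECOVERABLE":
        # The table is marked bad; later log records for it are skipped.
        return "marked bad"
    if option == "COPY YES":
        # The standby replays the load from the copy, which it must be able to access.
        return "replayed"
    raise ValueError(f"unexpected LOAD option: {copy_option}")
```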
15. How do I manage client connectivity of the SAP system after a failover or takeover?
If the database servers change their HADR roles or if the standby database server has to take over the
primary role by force after an outage of the primary database server, the database clients have to reconnect
to the new primary server.
The database clients can do so using one of the following options:
The client is configured to know the two database servers. If the database client cannot connect to
the configured primary database server, the database client tries to connect to the configured
standby (alternate) server.
In a split brain scenario with ACR, data corruption can occur as follows:
An SAP NetWeaver system consists of several work processes (ABAP) or threads (Java). Each of them has
its own database connection. If a split brain scenario exists, ACR is configured, and if some work processes
or threads need to reconnect to the database, it is not ensured that all work processes or threads are
connected to the same database server. Based on the selection algorithm of ACR, some work processes
could be connected to one primary database server and other work processes to the other primary database
server. In this case, the databases are no longer consistent.
The virtual IP concept does not allow this scenario because the virtual IP address is handled by a cluster
manager. The cluster manager ensures that the virtual IP is always bound to one database server only. Even
if a split brain scenario exists, all SAP connections still use the same database server.
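The difference between the two approaches in a split brain scenario can be illustrated with a toy model in which both servers run as primary. The rerouting rule used for ACR here (work processes that must reconnect go to the alternate server, all others keep their existing connection) is a simplifying assumption for illustration, not ACR's actual selection algorithm; host names and function names are hypothetical.

```python
def servers_used_with_acr(reconnecting: set[int], n_processes: int,
                          current_server: str, alternate_server: str) -> set[str]:
    """Toy model: only work processes that must reconnect are rerouted."""
    return {alternate_server if p in reconnecting else current_server
            for p in range(n_processes)}


def servers_used_with_vip(n_processes: int, vip_owner: str) -> set[str]:
    """The cluster manager binds the virtual IP to one server only."""
    return {vip_owner for _ in range(n_processes)}
```

As soon as some, but not all, work processes reconnect, the ACR model yields two servers in use, which is exactly the inconsistency described above; the virtual IP model always yields one.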
We strongly recommend that you use virtual IP because this is the option that SAP supports in general.
SAP supports ACR only if the database servers are set up using db2haicu with SA MP. You can find a
collection of known limitations for the use of ACR in SAP Note 1568539 - DB6: HADR - Virtual IP or
Automatic Client Reroute.
If you want to configure virtual IP for two separate LANs in your data centers, configure a virtual LAN
so that the virtual IP address can float between the two data centers.
Operations
16. How do I start DB2 HADR?
Before starting HADR, you should ensure that your database manager (instance) on both the primary and
standby database is started. Use the db2start command to start the instance. The instance can be started in
any order, primary or standby.
When starting HADR, we recommend that you start the standby database before the primary database.
The reason is that a primary HADR startup without the BY FORCE option requires the standby database
to become active within the HADR_TIMEOUT period; otherwise, the startup fails. This prevents a
split-brain scenario.
Use the following startup command:
DB2 START HADR ON DATABASE database-alias [USER user-name [USING password]] AS {PRIMARY [BY FORCE] | STANDBY}
When starting the primary, the BY FORCE option specifies that the HADR primary database does not wait
for the standby database to connect to it. After a start BY FORCE, the primary database still accepts valid
connections from the standby database whenever the standby later becomes available.
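The recommended start order can be captured as a list of (host, command) pairs, for example as input to your own automation. This sketch only builds the command strings, it does not execute anything; the host names and database alias in the usage are hypothetical.

```python
def hadr_start_sequence(db_alias: str, primary_host: str, standby_host: str,
                        by_force: bool = False) -> list[tuple[str, str]]:
    """Return (host, command) pairs in the recommended start order."""
    primary_cmd = f"db2 START HADR ON DATABASE {db_alias} AS PRIMARY"
    if by_force:
        primary_cmd += " BY FORCE"
    return [
        (primary_host, "db2start"),  # instances can be started in any order
        (standby_host, "db2start"),
        # Standby first, so that the primary finds it within HADR_TIMEOUT.
        (standby_host, f"db2 START HADR ON DATABASE {db_alias} AS STANDBY"),
        (primary_host, primary_cmd),
    ]


for host, command in hadr_start_sequence("PRD", "hostA", "hostB"):
    print(f"{host}: {command}")
```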
17. How do I shut down DB2 HADR?
Issue the following shutdown command:
DB2 STOP HADR ON DATABASE database-alias [USER user-name [USING password]]
Although the STOP HADR command can be used to stop HADR on the primary or the standby database, or
both, it should be used with caution. If you want to stop the specified database but still want it to maintain its
role as either an HADR primary or a standby database, do not issue the STOP HADR command. If you issue
the STOP HADR command, the database becomes a standard database and might require re-initialization in
order to resume operations as an HADR database.
Instead, issue the following command:
DB2 DEACTIVATE DATABASE database-alias
If you want to shut down both databases of the HADR pair while keeping their HADR roles, we recommend
the following order:
1. Deactivate the primary database.
2. Stop the DB2 instance on the primary database.
3. Deactivate the standby database.
4. Stop the DB2 instance on the standby database.
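The four shutdown steps above map onto DEACTIVATE DATABASE and db2stop. The sketch below only builds the (host, command) pairs in that order and deliberately avoids STOP HADR, which would turn the database into a standard database. Host names and the database alias are hypothetical.

```python
def hadr_shutdown_sequence(db_alias: str, primary_host: str,
                           standby_host: str) -> list[tuple[str, str]]:
    """(host, command) pairs for shutting down the pair while keeping the HADR roles."""
    deactivate = f"db2 DEACTIVATE DATABASE {db_alias}"
    return [
        (primary_host, deactivate),   # 1. deactivate the primary database
        (primary_host, "db2stop"),    # 2. stop the primary instance
        (standby_host, deactivate),   # 3. deactivate the standby database
        (standby_host, "db2stop"),    # 4. stop the standby instance
    ]
```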
18. How do I perform a planned takeover?
A planned takeover is also referred to as switching roles or role exchange. You issue the TAKEOVER
command when you want to switch roles of the databases.
You can switch roles only from the standby database, and only when the databases are in peer state. If the
databases are in any other state, the TAKEOVER command fails.
To start the takeover procedure, simply issue the TAKEOVER HADR command:
DB2 TAKEOVER HADR ON DATABASE database-alias
After issuing the TAKEOVER HADR command from the standby database, the following steps are carried
out in the background:
1. The standby tells the primary database that it is taking over.
2. The primary database forces off all client connections and refuses new connections.
3. The primary database rolls back any open transactions and ships the remaining log, up to the end of
the log, to the standby database.
4. The standby database replays the received log, up to the end of the log.
5. The primary becomes the new standby database.
6. The standby becomes the new primary database.
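The preconditions and the resulting role switch of a planned takeover can be modeled as follows. This is a simplified sketch: steps 1 to 4 (forcing off clients, rolling back, shipping and replaying the log tail) are not modeled, roles are kept in a hypothetical host-to-role dictionary, and the function name is hypothetical.

```python
def planned_takeover(roles: dict[str, str], issued_on: str, hadr_state: str) -> dict[str, str]:
    """Return the new role assignment after TAKEOVER HADR (without BY FORCE).

    roles maps a host name to "PRIMARY" or "STANDBY".
    """
    if roles.get(issued_on) != "STANDBY":
        raise ValueError("TAKEOVER HADR must be issued on the standby database")
    if hadr_state != "Peer":
        raise ValueError("a planned takeover requires peer state")
    # Only the resulting role switch (steps 5 and 6) is modeled here.
    return {host: "PRIMARY" if role == "STANDBY" else "STANDBY"
            for host, role in roles.items()}
```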
19. How do I perform a forced takeover?
A takeover by force is also referred to as failover, which is issued from the standby database with the
TAKEOVER command BY FORCE option included:
DB2 TAKEOVER HADR ON DATABASE database-alias BY FORCE [PEER WINDOW ONLY]
Peer:
The primary database is communicating with its standby database and is shipping log buffer flushes.
Remote catchup:
The primary database is communicating with its standby database and is shipping old logs.
DisconnectedPeer:
The connection status is Disconnected. However, the primary database is still in the peer window
phase. Therefore, the update transactions are suspended until the end of the peer window phase.
The end time timestamp is shown in the PeerWindowEnd field of the db2pd command output with
the hadr option.
Possible states on the standby database are as follows:
Peer:
Caught up to the tail of the log and is receiving currently generated log data from the primary and
replaying it.
Remote catchup:
Not caught up to the tail of the log. The standby has requested old log data from the primary that is neither
in the standby's log path nor available through the standby's user exit.
Local catchup:
Performing log replay from log data already on the disk of that system.
DisconnectedPeer:
The connection status is Disconnected. However, the standby database is still in the peer window
phase. The TAKEOVER HADR ... PEER WINDOW ONLY command can run successfully during this
phase.
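The PEER WINDOW ONLY rule can be expressed as a one-line check. This Python sketch assumes the state names as db2pd prints them; it treats peer state as qualifying in addition to disconnected peer, as IBM's TAKEOVER HADR documentation describes. The function name is hypothetical.

```python
"""Check whether TAKEOVER HADR ... BY FORCE PEER WINDOW ONLY is expected to succeed."""

# Standby states in which a PEER WINDOW ONLY takeover is accepted: peer state and
# disconnected peer (still inside the peer window). Names follow db2pd output.
PEER_WINDOW_STATES = frozenset({"Peer", "DisconnectedPeer"})


def peer_window_takeover_allowed(role: str, state: str) -> bool:
    """Return True if a forced PEER WINDOW ONLY takeover should be accepted here."""
    return role == "Standby" and state in PEER_WINDOW_STATES


if __name__ == "__main__":
    for state in ("Peer", "DisconnectedPeer", "RemoteCatchup", "LocalCatchup"):
        print(state, peer_window_takeover_allowed("Standby", state))
```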
Possible connection statuses are as follows:
Connected:
The database is connected to its partner node.
Disconnected:
The database is not connected to its partner node. If the disconnected status is not expected, check
the status of the other HADR server, either the primary or the standby, to determine if there is a
database problem or some network issue.
Congested:
The database is connected to its partner node, but the connection is congested. If the status is congested,
troubleshoot your network to pinpoint the cause of the congestion.
22. Can I connect to the standby database and perform READ operations?
Before IBM DB2 V9.7 Fix Pack 1, the standby database was not connectable: you could neither update the
database from the standby nor issue any read-only queries on the standby database. If you need to read from
a standby on these releases, the workaround is to use VERITAS Storage Foundation for DB2 HADR.
Starting with IBM DB2 V9.7 Fix Pack 1, you can perform READ operations using the DB2_HADR_ROS
registry variable on the standby database. All types of read queries, including scrollable and non-scrollable
cursors, are supported on the standby database. Read capability is supported in all HADR synchronization
modes and in all HADR states except local catchup.
There are certain limitations for READ operations on the standby database. The only supported isolation
level is Uncommitted Read (UR). When an HADR active standby database replays DDL log records or
maintenance operations (REORG, RUNSTATS, and so on), it enters a replay-only window. During this window,
existing connections to the standby database are terminated and new connections are blocked. You can
monitor replay-only windows by running the db2pd command with the hadr option on the standby. For more
information about other restrictions, see the IBM Info Center at
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0054258.html
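The read-on-standby rules above can be summarized as a decision function. This Python sketch is a hypothetical helper, not a DB2 API; it encodes only the conditions stated in this answer (DB2_HADR_ROS enabled, not in local catchup, not in a replay-only window, UR isolation) and ignores the further restrictions in the IBM documentation.

```python
"""Decide whether a read-only query on an HADR standby is expected to be accepted."""


def standby_read_allowed(ros_enabled: bool, state: str,
                         in_replay_only_window: bool, isolation: str) -> tuple[bool, str]:
    """Apply the V9.7 FP1 reads-on-standby rules summarized in this FAQ.

    ros_enabled: whether DB2_HADR_ROS is enabled on the standby.
    state: HADR state name as shown by db2pd, for example "Peer" or "LocalCatchup".
    in_replay_only_window: True while DDL or maintenance operations are replayed.
    isolation: requested isolation level, for example "UR" or "CS".
    """
    if not ros_enabled:
        return False, "DB2_HADR_ROS is not enabled on the standby"
    if state == "LocalCatchup":
        return False, "reads are not supported in local catchup state"
    if in_replay_only_window:
        return False, "standby is in a replay-only window"
    if isolation.upper() != "UR":
        return False, "only Uncommitted Read (UR) is supported on the standby"
    return True, "read allowed"
```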
23. How do I monitor DB2 HADR?
Most of the output from the HADR monitoring methods described here reflects the current settings in the
database configuration file. The following elements, however, report run-time status and can change:
Role
State
Connection status
Heartbeats missed
Role
The HADR role of the database server on which the monitor command is executed. The role can be either
Primary or Standby.
State
The HADR state of the database server on which the monitor command is executed.
Connection status
The current status of the connection between the primary and the standby database.
Heartbeats missed
The number of consecutively missed heartbeats. Heartbeats are used by the HADR pair to check each
other's status. An HADR database expects at least one heartbeat message from the other database within
each quarter of the time interval defined in the HADR_TIMEOUT configuration parameter or within 30 seconds,
whichever is shorter.
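As a worked example of the heartbeat rule, the expected interval is min(HADR_TIMEOUT / 4, 30 seconds): with HADR_TIMEOUT = 60 a heartbeat is expected every 15 seconds, and with HADR_TIMEOUT = 120 or higher the 30-second cap applies. The Python helper below is a hypothetical illustration of that arithmetic only.

```python
"""Expected heartbeat interval for an HADR pair (illustrative calculation)."""


def heartbeat_interval_seconds(hadr_timeout: int) -> float:
    """Interval within which at least one heartbeat is expected.

    Per this FAQ: a quarter of HADR_TIMEOUT or 30 seconds, whichever is shorter.
    """
    return min(hadr_timeout / 4, 30)


if __name__ == "__main__":
    for timeout in (60, 120, 240):
        print(timeout, heartbeat_interval_seconds(timeout))
```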
Primary and standby log position
These elements show the log positions of both the primary and the standby database. Because each database
receives its partner's log position only periodically, these values may not be current. If log truncation takes
place, the log gap running average is not accurate.
Log gap running average
This element shows the running average of the gap between the primary LSN and the standby LSN. The gap
is measured in number of bytes. Since this is an average number, the value might not be 0 even if the
systems are in SYNC mode and in Peer state.
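The following Python sketch shows why the running average can stay above 0 even when the current gap is 0. It assumes, as in the V9.x releases this FAQ covers, that an LSN is a byte offset into the log stream printed in hexadecimal, and it uses a plain cumulative mean as a stand-in for DB2's internal averaging, which may differ. The LSN samples are hypothetical.

```python
"""Log gap between primary and standby, plus a simple running average (sketch)."""


def log_gap_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Gap in bytes, assuming LSNs are hexadecimal byte offsets."""
    return int(primary_lsn, 16) - int(standby_lsn, 16)


def running_average(values: list[int]) -> list[float]:
    """Cumulative mean after each sample (stand-in for DB2's own averaging)."""
    averages, total = [], 0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


if __name__ == "__main__":
    # Hypothetical samples: (primary LSN, standby LSN)
    samples = [("0x1000", "0x1000"), ("0x1800", "0x1000"), ("0x2000", "0x2000")]
    gaps = [log_gap_bytes(p, s) for p, s in samples]
    print(gaps)                   # [0, 2048, 0]
    print(running_average(gaps))  # last value is about 682.7, not 0
```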
The following monitoring methods are available:
Snapshot
To get the current status of either the primary or the standby database, issue the GET SNAPSHOT
FOR DATABASE command. It requires SYSMON as a minimum authorization.
DB2 GET SNAPSHOT FOR DATABASE ON DATABASE-ALIAS
The information returned represents the status of the database manager operations at the time the
command is issued. HADR information is listed under the heading HADR Status.
Sample output on Standby:
db2pd
The current HADR state can be gathered with the db2pd command without explicitly connecting to a
database. This command requires SYSADM authority, and the database must be activated to obtain any useful
output related to HADR.
Issue the command on both the primary and the standby database.
db2pd -d DATABASE-ALIAS -hadr
As of IBM DB2 V9.1, the percentage of standby log receiving buffer used is reported on the HADR
standby database.
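If you want to evaluate the db2pd output in a script, a minimal Python sketch could extract the key fields. It assumes a V9.7-style layout in which a header line is followed by a value line; the sample text is abridged and hypothetical, and other DB2 levels format this output differently. read_db2pd_hadr simply runs the db2pd command shown above through subprocess and requires a local DB2 installation.

```python
"""Extract HADR status fields from db2pd -hadr text (sketch, V9.7-style layout assumed)."""
import subprocess

# Abridged, hypothetical sample in the header-line / value-line layout of db2pd -hadr.
SAMPLE_DB2PD_HADR = """\
HADR Information:
Role    State                SyncMode HeartBeatsMissed   LogGapRunAvg (bytes)
Standby Peer                 Nearsync 0                  0

ConnectStatus ConnectTime                           Timeout
Connected     Mon Aug 01 10:00:00 2011 (1312192800) 120
"""


def parse_db2pd_hadr(text: str) -> dict:
    """Return role, state, sync mode, missed heartbeats, log gap and connect status."""
    lines = text.splitlines()
    result = {}
    for header, values_line in zip(lines, lines[1:]):
        header_tokens, values = header.split(), values_line.split()
        if not header_tokens:
            continue
        if header_tokens[0] == "Role" and len(values) >= 5:
            result.update(
                role=values[0],
                state=values[1],
                sync_mode=values[2],
                heartbeats_missed=int(values[3]),
                log_gap_run_avg=int(values[4]),
            )
        elif header_tokens[0] == "ConnectStatus" and values:
            result["connect_status"] = values[0]
    return result


def read_db2pd_hadr(alias: str) -> dict:
    """Run db2pd on the local server (requires DB2 and SYSADM authority)."""
    output = subprocess.run(["db2pd", "-d", alias, "-hadr"],
                            capture_output=True, text=True, check=True).stdout
    return parse_db2pd_hadr(output)


if __name__ == "__main__":
    print(parse_db2pd_hadr(SAMPLE_DB2PD_HADR))
```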
Sample output on Standby:
Monitor elements are the individual attributes for various functional areas inside DB2 that are
monitored. This means that they have counters or point-in-time status values updated as relevant
events occur. Related monitor elements are referred to as logical data groups. The available monitor
elements for the HADR logical data group are as shown below.
Note that these are SQL queries and require a prior connection to the database. Before IBM DB2
V9.7 Fix Pack 1, you cannot use either administrative views or table functions to get the values of the
monitor elements for a standby database.
db2diag.log
Basic HADR status and diagnostic information can be extracted from db2diag.log. This method requires
neither a connection to the DB2 database nor any API call to retrieve DB2 snapshots or monitor element
data. The output in the db2diag.log file describes internal states.
There is a prefix to show whether the state is for a primary (P-) or for a standby (S-) database.
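To pick HADR state transitions out of db2diag.log automatically, a small Python sketch can use the P-/S- prefix. The message wording matched here ("HADR state set to P-Peer (was P-NearlyPeer)") is an assumption about the log format and may differ between DB2 levels; verify it against your own db2diag.log before relying on the pattern.

```python
"""List HADR state transitions found in db2diag.log text (sketch).

Assumption: transitions appear as "HADR state set to P-Peer (was P-NearlyPeer)";
the exact message wording may differ between DB2 levels.
"""
import re

STATE_CHANGE = re.compile(r"HADR state set to ([PS])-(\w+) \(was [PS]-(\w+)\)")


def hadr_state_changes(log_text: str) -> list[tuple[str, str, str]]:
    """Return (role, old_state, new_state) for every matching message."""
    changes = []
    for match in STATE_CHANGE.finditer(log_text):
        role = "Primary" if match.group(1) == "P" else "Standby"
        changes.append((role, match.group(3), match.group(2)))
    return changes


if __name__ == "__main__":
    sample = "MESSAGE : Info: HADR state set to P-Peer (was P-NearlyPeer)\n"  # hypothetical line
    print(hadr_state_changes(sample))
```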
Sample output on Primary:
24. If logging is blocked on the primary database server, what should I check?
1. HADR state
Identify HADR state using the monitor methods described in question 23.
2. Network status
Identify the connection status using the monitor methods described in question 23.
3. Standby log receive buffer usage
Identify the log receive buffer usage via the db2pd command with the hadr option on standby.
4. HADR-related logs
Retrieve HADR-related messages from db2diag.log.
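The checklist above can be turned into a simple triage helper. In this Python sketch the function name and the 90 percent threshold for the standby log receive buffer are hypothetical choices for illustration, not DB2 defaults; the inputs would come from the monitoring methods described in question 23.

```python
"""Turn the blocked-logging checklist into simple hints (illustrative sketch).

The 90 percent receive-buffer threshold is a hypothetical value, not a DB2 default.
"""


def blocked_logging_hints(state: str, connect_status: str,
                          rcv_buf_used_pct: float, threshold_pct: float = 90.0) -> list[str]:
    hints = []
    # 1. HADR state: in peer state the primary can wait on the standby,
    #    depending on the synchronization mode.
    if state == "Peer":
        hints.append("Peer state: primary logging can wait on the standby; check standby replay")
    # 2. Network status
    if connect_status == "Congested":
        hints.append("Connection congested: troubleshoot the network between primary and standby")
    elif connect_status == "Disconnected":
        hints.append("Disconnected: check the partner database and the network")
    # 3. Standby log receive buffer usage (from db2pd -hadr on the standby)
    if rcv_buf_used_pct >= threshold_pct:
        hints.append("Standby log receive buffer nearly full: standby replay is falling behind")
    # 4. Always review the HADR messages in db2diag.log
    hints.append("Review HADR-related messages in db2diag.log on both servers")
    return hints
```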
25. How do I install Fix Packs in a DB2 HADR environment to minimize system downtime?
HADR gives you high availability while applying DB2 Fix Packs through rolling upgrades. Your database
downtime is reduced to the short time while you are switching roles between your database servers.
These are the detailed steps to perform a rolling upgrade:
1. Install the Fix Pack as a new software copy on both the standby and the primary database server.
2. Deactivate the standby database and stop the standby instance.
3. Execute the program "db2iupdt" on the standby to switch your existing DB2 instance from the current
software copy to the new DB2 software copy.
4. Start the standby instance and activate the standby database.
5. Switch HADR roles by issuing the takeover command on the standby database.
To minimize the impact on end users, you can perform a graceful takeover using the SAP Graceful
Maintenance Tool. For more information, see SAP Note 1530812.
If you use the SA MP cluster solution, you can use the script sapdb2cluster.sh that is attached to
SAP Note 960843. Call the script with option switch, which automates a graceful database cluster
switch.
After the takeover, the HADR connection status is disconnected because the two servers run different
DB2 levels. HADR allows the standby database to run a newer DB2 level than the primary database,
but not the other way around. Therefore, when the takeover is complete, the new standby database is
deactivated because it runs the older DB2 level.
6. Repeat steps 2 to 4 on the new standby (originally primary) database.
Optional step: Switch the HADR roles back to the original state.
7. Execute the script db6_update_db.sh for post-processing.
The script is attached to SAP Note 1365982. It performs a parameterization and a package rebind.
Follow the post-processing procedure provided in the SAP Notes listed at the end of this
section.
Execute the script on both the primary and the standby database. Both instances must be restarted to
accept the new parameters. To minimize the impact on end users, you can perform a graceful
restart using the SAP Graceful Maintenance Tool. For more information, see SAP Note 1530812.
8. Execute script db6_update_client.sh for post-processing.
The script is attached to SAP Note 1365982. The script updates SAP versions that have a CLI driver
installed. Follow the post-processing procedure provided in the SAP Notes list at the end of this
section.
9. Restart the application servers one by one to update the DB2 clients.
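The level rule that drives this procedure (the standby may run a newer DB2 level than the primary, but not an older one) can be simulated in Python. The hosts A and B and the (version, release, fix pack) tuple encoding are hypothetical; the sketch only shows why the pair is disconnected between the takeover in step 5 and the update in step 6.

```python
"""Simulate the rolling Fix Pack upgrade and the HADR level rule (illustrative)."""

Level = tuple[int, int, int]  # (version, release, fix pack); hypothetical encoding


def pair_can_connect(primary_level: Level, standby_level: Level) -> bool:
    """Per this FAQ: the standby may run a newer DB2 level than the primary, not an older one."""
    return standby_level >= primary_level


def simulate_rolling_upgrade(old: Level, new: Level) -> list[tuple[str, bool]]:
    levels = {"A": old, "B": old}   # hypothetical hosts: A starts as primary, B as standby
    primary, standby = "A", "B"
    history = []
    levels[standby] = new                   # steps 2-4 on the standby
    history.append(("standby updated", pair_can_connect(levels[primary], levels[standby])))
    primary, standby = standby, primary     # step 5: takeover
    history.append(("after takeover", pair_can_connect(levels[primary], levels[standby])))
    levels[standby] = new                   # step 6 on the new standby
    history.append(("both updated", pair_can_connect(levels[primary], levels[standby])))
    return history


if __name__ == "__main__":
    for step, connected in simulate_rolling_upgrade((9, 7, 3), (9, 7, 4)):
        print(step, connected)
```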
For installation instructions on various combinations of OS platforms and DB2 versions, see the following
SAP Notes:
1138549 DB6: Installation of Fix Packs for DB2 V9.5 (UNIX & Linux)
26. How do I change OS, application, and DB2 configuration parameters in a DB2 HADR
environment?
To ensure consistent system behavior after a failover, we strongly recommend that you keep the operating
system and database configuration identical on the primary and the standby server.
When you update HADR-specific database configuration parameters, DB2 does not allow you to switch HADR
roles. Therefore, a short database outage is required while you update the parameters on the primary server,
recycle the DB2 instance, and deactivate and reactivate the database.
No special procedure is required for database or database manager configuration parameters that are
dynamic, that is, that take effect immediately without recycling the DB2 database or instance.
To update other database and database manager configuration parameters, and for any operating system
and application upgrades, use the following procedure:
1. Make sure that HADR is in peer state.
2. Deactivate the database on the standby. Also stop the DB2 instance if you are applying operating system
or other non-DB2 changes that require a server recycle, or database manager configuration parameter
changes that require a DB2 instance recycle.
3. Make the necessary changes to the hardware, software, or DB2 configuration parameters.
4. Start the DB2 instance if it is stopped, and activate the database, or explicitly connect to the
database.
5. Check HADR to ensure peer state and switch roles of the primary and standby database by issuing a
takeover HADR command on the standby database.
To minimize the impact on end users, you can perform a graceful takeover using the SAP Graceful
Maintenance Tool. For more information, see SAP Note 1530812.
If the SA MP cluster solution is implemented, use the script sapdb2cluster.sh that is attached to SAP
Note 960843. Call the script with option switch, which automates a graceful database cluster
switch.
6. Repeat steps 1 to 4 on the new standby (originally primary) database.
Optional step: Switch the HADR roles back to the original state.
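The decision in step 2, and the special case of HADR-specific parameters, can be written as a small lookup. This Python sketch uses a hypothetical Change enumeration to classify the kinds of change mentioned in this answer; it returns the actions to take on the standby before making the change and does not talk to DB2.

```python
"""Which standby actions step 2 calls for, by type of change (illustrative sketch)."""
from enum import Enum


class Change(Enum):
    DYNAMIC_CFG = "dynamic DB or DBM configuration parameter"
    DB_CFG_RESTART = "DB configuration parameter that needs database reactivation"
    DBM_CFG_RESTART = "DBM configuration parameter that needs an instance recycle"
    OS_OR_NON_DB2 = "operating system or other non-DB2 change that needs a server recycle"
    HADR_CFG = "HADR-specific database configuration parameter"


def standby_actions(change: Change) -> list[str]:
    """Return the standby-side actions for the role switch procedure."""
    if change is Change.HADR_CFG:
        # Roles cannot be switched for these; a short outage is required instead.
        raise ValueError("HADR-specific parameters need a short outage, not a role switch")
    if change is Change.DYNAMIC_CFG:
        return []  # takes effect immediately: no special procedure
    actions = ["deactivate database"]
    if change in (Change.DBM_CFG_RESTART, Change.OS_OR_NON_DB2):
        actions.append("stop DB2 instance")
    return actions
```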
27. How do I upgrade the database in a DB2 HADR environment?
Rolling upgrades are not supported for DB2 version upgrades. Plan for a database outage while the
HADR primary database is upgraded.
In an HADR environment, you can only upgrade the primary database and not the standby database. Before
the upgrade, you must execute the STOP HADR command on the primary and the standby database. After
the upgrade, you have to re-create your standby database and initialize HADR again on the primary and the
standby database.
To upgrade a DB2 database that is part of an SA MP cluster, you must first take it out of control of the cluster
management software. To do so, set the respective resource groups offline and switch SA MP into manual
control mode using the command samctrl -MT.
For more information, see the upgrade guides on SAP Service Marketplace at
http://service.sap.com/instguides -> Database Upgrades -> DB2 UDB.
28. How do I back up the database in a DB2 HADR environment?
DB2 backups can only be taken on the primary database. If your goal is high availability, we recommend that
you use an online backup to keep your database available.
29. What do I have to consider when performing a table and index reorganization in a DB2 HADR
environment?
Both online and offline reorganizations are replicated operations. However, the method in which they are
replicated differs and can influence the choice of operation. Choose a reorganization method based on
whether you want to keep the affected tables and indexes available during the operation.
Online reorganization
During a default online (in-place) table reorganization and an online index reorganization with the
ALLOW WRITE ACCESS option, full read and write access is uninterrupted except during the truncate
phase, during which all write access is suspended but read access is maintained. The disadvantages
are that online reorganization is slower and consumes more log space.
To mitigate the impact from online table reorganization, perform an online reorganization with no
index specification (and ensure no clustering index is defined on the table) if you are reorganizing for
space reclamation only and if clustering is not important in your workload.
To mitigate the impact of online index reorganization, use the CLEANUP ONLY option instead. The
index manager typically removes pseudo-deleted keys automatically at a later point in time.
However, if you want to be sure that the next accesses through the index are not penalized by the
presence of pseudo-deleted keys, use the CLEANUP ONLY option to remove them. If you do not need
the space and your workload does not hit these keys, it is better to ignore them and let the index
manager handle their removal later.
If there is a failover during an online reorganization, the reorganization cannot be resumed on the
new primary database as the status file is stored outside of the database and is not replicated. You
need to restart the reorganization on the new primary database.
Offline reorganization
During an offline reorganization with reclustering using an index, one log record is written per hundreds
or thousands of affected rows. If your database is in peer state, this can cause intermittent blocking
of primary transactions, because the standby must replay many updates at once for a single log record.
Standby replay can be blocked long enough to fill up the standby log receive buffer, which causes a
blockage of the primary database.
A non-clustering reorganization generates a single log record after the reorganization is completed
on the primary database. This method has the greatest impact on the HADR pair as the standby
database performs the entire reorganization from scratch after receiving the log record. Standby
replay can be blocked long enough to fill up the standby log receive buffer, which causes a blockage
of the primary database.
If you have to perform large offline reorganizations, consider one of the following recommendations:
1. Temporarily disable HADR in order to avoid blocking the primary database. However, this results
in a time window in which there is no high availability or disaster recovery.
2. Set the registry variable DB2_HADR_PEER_WAIT_LIMIT to automatically break the primary
database out of peer state. Logging on the primary database is then blocked for at most the number
of seconds specified by this value. The advantage is that this does not require any manual intervention.
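The effect of the peer wait limit on primary blocking reduces to simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch assumes that a limit of 0 (the default) means no limit, so that without it the primary stays blocked for as long as the standby cannot keep up with replay.

```python
"""Upper bound on primary blocking with DB2_HADR_PEER_WAIT_LIMIT (illustrative).

Assumption: a value of 0 means no limit.
"""


def primary_blocked_seconds(standby_stall_seconds: float, peer_wait_limit: float = 0) -> float:
    """How long primary logging stays blocked while the standby cannot keep up."""
    if peer_wait_limit <= 0:
        return standby_stall_seconds  # no limit: blocked for the whole stall
    # With a limit, the primary leaves peer state once the limit is reached.
    return min(standby_stall_seconds, peer_wait_limit)


if __name__ == "__main__":
    # Hypothetical 10-minute standby replay stall during a large offline REORG
    print(primary_blocked_seconds(600))       # no limit set
    print(primary_blocked_seconds(600, 120))  # limit of 120 seconds
```

The trade-off is the same as in the recommendation above: a limit keeps the primary responsive, but after it breaks out of peer state the pair is no longer synchronized until the standby catches up.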
30. What are known issues with DB2 HADR?
Affected version: V9.7
SAP Note 1553460 DB6: HADR primary server hangs due to memory leak on standby
APAR IY94994 HADR stdby logfile set never shrinks (first fixed in Fix Pack 3)
SAP Note 830920 DB6: Using DB2 HADR logfile may be deleted before archived
For IBM APARs related to HADR, see the fix lists at the following links:
V9.7 APARs
http://www.ibm.com/support/docview.wss?uid=swg21412438
V9.5 APARs
http://www.ibm.com/support/docview.wss?uid=swg21293566
V9.1 APARs
http://www.ibm.com/support/docview.wss?uid=swg21255607
V8 APARs
http://www.ibm.com/support/docview.wss?uid=swg21256235
Related Content
Best Practice: DB2 High Availability Disaster Recovery
High Availability and Disaster Recovery Options for DB2 on Linux, UNIX, and Windows
SAP Notes on Service Market
IBM DB2 Information Center
Copyright
Copyright 2011 SAP AG. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG.
The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Microsoft, Windows, Excel, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, System z9, z10, z9,
iSeries, pSeries, xSeries, zSeries, eServer, z/VM, z/OS, i5/OS, S/390, OS/390, OS/400, AS/400, S/390 Parallel Enterprise Server,
PowerVM, Power Architecture, POWER6+, POWER6, POWER5+, POWER5, POWER, OpenPower, PowerPC, BatchPipes,
BladeCenter, System Storage, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, Parallel Sysplex, MVS/ESA, AIX,
Intelligent Miner, WebSphere, Netfinity, Tivoli and Informix are trademarks or registered trademarks of IBM Corporation.
Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems
Incorporated in the United States and/or other countries.
Oracle is a registered trademark of Oracle Corporation.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of
Citrix Systems, Inc.
HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C, World Wide Web Consortium, Massachusetts
Institute of Technology.
Java is a registered trademark of Oracle Corporation.
JavaScript is a registered trademark of Oracle Corporation, used under license for technology invented and implemented by Netscape.
SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP Business ByDesign, and other SAP products and services mentioned
herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.
Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and
other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of Business Objects S.A. in the United States and in other countries. Business Objects is an SAP company.
All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document
serves informational purposes only. National product specifications may vary.
These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP
Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or
omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the
express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an
additional warranty.