2661878 - HANA System Replication log replay setting recommendations for large systems
Component: HAN-DB-HA (SAP HANA > SAP HANA Database > SAP HANA High Availability (System Replication, DR, etc.)), Version: 13, Released On: 06.05.2024
Symptom
Log replay on the secondary site cannot keep up with redo log generation on the primary site, so the replay backlog increases.
When looking at the current call stacks on the secondary site, frequently only a single recovery queue is active and log replay is not utilizing multiple recovery queues.
Other Terms
System Replication, Logreplay, Replay Backlog
Solution
If a large amount of redo log is generated, the replay step size can be enlarged by setting the following configuration parameters on the System Replication secondary site,
which can result in better overall log replay throughput:
indexserver.ini/[system_replication]/logshipping_replay_push_persistent_segment_count = 64 (default: 20 segments of 64 MB each -> 1280 MB cache size)
indexserver.ini/[system_replication]/logshipping_replay_logbuffer_cache_size = 21474836480 (default: 4294967296 bytes = 4 GB cache size)
indexserver.ini/[pitrestart]/replay_step_size = 1073741824 (default: 2097152 log positions of 64 bytes each -> 128 MB, no memory consumption here)
The parameter replay_step_size is only relevant for continuous log replay in HANA 1 SP11 and SP12. Starting with HANA 2, the parameter is only evaluated for log replay that is done as part of the takeover.
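The sizes quoted in parentheses for the parameters above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the constant and function names are made up for this example, and it assumes that "MB" and "GB" in this note mean binary units (1 MB = 1024 x 1024 bytes), which is what makes the quoted defaults come out exactly.

```python
# Illustrative cross-check of the sizes quoted for the parameters above.
# Assumption: MB/GB are binary units (1 MB = 1024**2 bytes).
MB = 1024 ** 2
GB = 1024 ** 3

SEGMENT_SIZE_MB = 64      # size of one persistent segment, as stated in this note
LOG_POSITION_BYTES = 64   # size of one log position, as stated in this note


def segment_cache_mb(segment_count: int) -> int:
    """Cache size in MB implied by logshipping_replay_push_persistent_segment_count."""
    return segment_count * SEGMENT_SIZE_MB


def step_size_mb(log_positions: int) -> int:
    """Redo log volume in MB covered by one step of replay_step_size."""
    return log_positions * LOG_POSITION_BYTES // MB


print(segment_cache_mb(20))               # 1280 (default segment cache, MB)
print(segment_cache_mb(64))               # 4096 (enlarged segment cache, MB)
print(4294967296 // GB)                   # 4    (default logbuffer cache, GB)
print(21474836480 // GB)                  # 20   (enlarged logbuffer cache, GB)
print(step_size_mb(2097152))              # 128  (default replay step, MB)
print(step_size_mb(1073741824) // 1024)   # 64   (enlarged replay step, GB)
```

The last line shows why the cache settings matter: a step of 1073741824 log positions corresponds to far more redo log than either cache can hold, so in practice the caches, not replay_step_size, bound the effective step.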
The replay step size is limited by the amount of redo log that can be cached in memory. Therefore, the cache memory settings have to be increased on the secondary site.
In addition, the replay step size itself has to be set to a higher value, but this does not consume additional memory.
A disadvantage of increasing the configuration parameters logshipping_replay_push_persistent_segment_count and logshipping_replay_logbuffer_cache_size is higher memory consumption on the secondary site while the system is replicating.
With the parameter changes above, the memory consumption on the secondary site increases per indexserver by:
persistent segments: 64 x 64 MB - 1280 MB = 2816 MB
logbuffer cache size: 20 GB - 4 GB = 16 GB
In summary, this configuration needs around 19 GB more memory per indexserver (2816 MB + 16 GB = 18.75 GB).
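The calculation above can be reproduced for any pair of values. This is a small Python sketch, not an SAP tool: the function name `extra_memory_mb` is hypothetical, the defaults (20 segments of 64 MB, 4 GB logbuffer cache) are taken from this note, and MB/GB are assumed to be binary units.

```python
# Sketch: additional memory per indexserver on the secondary site,
# relative to the default configuration described in this note.
# Assumption: 1 MB = 1024**2 bytes.
MB = 1024 ** 2
SEGMENT_SIZE_MB = 64
DEFAULT_SEGMENT_COUNT = 20
DEFAULT_LOGBUFFER_CACHE_BYTES = 4294967296


def extra_memory_mb(segment_count: int, logbuffer_cache_bytes: int) -> int:
    """Memory needed on top of the default configuration, in MB."""
    segments_delta = (segment_count - DEFAULT_SEGMENT_COUNT) * SEGMENT_SIZE_MB
    cache_delta = (logbuffer_cache_bytes - DEFAULT_LOGBUFFER_CACHE_BYTES) // MB
    return segments_delta + cache_delta


delta_mb = extra_memory_mb(64, 21474836480)
print(f"{delta_mb} MB = {delta_mb / 1024:.2f} GB")  # 19200 MB = 18.75 GB
```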
For very large-scale systems with extremely high load, the parameters could be increased even further, e.g. to the following settings:
indexserver.ini/[system_replication]/logshipping_replay_push_persistent_segment_count = 256
indexserver.ini/[system_replication]/logshipping_replay_logbuffer_cache_size = 51539607552
indexserver.ini/[pitrestart]/replay_step_size = 1073741824
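This means overall around 59 GB more memory consumption per indexserver compared to the default configuration (persistent segments: 256 x 64 MB - 1280 MB = 15104 MB; logbuffer cache size: 48 GB - 4 GB = 44 GB).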
In order not to waste memory resources, it is recommended to increase the parameter values step by step over multiple iterations until the right configuration is found.
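One way to plan such iterations is to write down a ladder of configurations in advance, together with the extra memory each one costs. The Python sketch below is only an illustration: the intermediate steps (32 and 128 segments, 8 GB and 32 GB cache) are hypothetical values chosen for this example and are not recommendations from this note. Only the defaults and the two configurations listed above come from the note. MB/GB are assumed to be binary units.

```python
# Sketch: a ladder of configurations for a step-wise increase, with the
# additional memory per indexserver relative to the default.
# Assumption: 1 MB = 1024**2 bytes, 1 GB = 1024**3 bytes.
MB = 1024 ** 2
GB = 1024 ** 3
SEGMENT_SIZE_MB = 64

# Each entry: (persistent segment count, logbuffer cache size in bytes)
LADDER = [
    (20, 4 * GB),     # default configuration from this note
    (32, 8 * GB),     # hypothetical intermediate step (not from this note)
    (64, 20 * GB),    # first setting listed in this note
    (128, 32 * GB),   # hypothetical intermediate step (not from this note)
    (256, 48 * GB),   # setting for very large systems listed in this note
]


def extra_memory_gb(step, baseline=LADDER[0]):
    """Additional memory in GB of `step` compared to `baseline`."""
    segments, cache_bytes = step
    base_segments, base_cache_bytes = baseline
    delta_bytes = ((segments - base_segments) * SEGMENT_SIZE_MB * MB
                   + (cache_bytes - base_cache_bytes))
    return delta_bytes / GB


for segments, cache_bytes in LADDER:
    print(f"segment_count={segments:<4} "
          f"logbuffer_cache_size={cache_bytes:<12} "
          f"extra memory ~{extra_memory_gb((segments, cache_bytes)):.2f} GB")
```

After applying each step, it makes sense to check whether the replay backlog on the secondary site is shrinking before moving on to the next one, so that the smallest sufficient configuration is kept.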
2732928 How to deal with Alert "Service on <hostname>:<service port number> has increased log replay backlog"