Replication Server Notes
Version 1.5
Page 2
03/05/2013
www.ddsafe.co.uk
Table of Contents
Document Revision 1.5
Introduction & Disclaimer
Repserver Components
More Detailed Look at the Components
Examine replication environment
Repserver BASICS
  General Install
  Table Defs Install
  Warm Standby Install
  Warm Standby Switch over
  Database (MSA) repdef
  Manually set up connections
  Set up primary DBs for rep
  Function Repdefs (stored procedure replication)
Replication Tuning Notes
  Golden rules
  Find Bottlenecks
  Tuning
    Tuning RSSD
    Tuning Replicate DB
    Tuning DSI
  Monitor Counters
    Not requiring Setup
    Requiring Setup
Disaster Recovery Notes
  Recover from reloading Primary Database
  Skipping transactions
  Stop Replication
  Replaying Transaction Logs
  Rebuild a Stable Device - with tran log
  Rebuild a Stable Device - without tran log
  Restore the RSSD from backup
General Troubleshooting
  Stable Queue Full
  Ignoring duplicate keys when we have a lot, use error class!
  Reverse Engineering an Error Class
  How to determine the error class configured for a connection
  Displays all Replication Server configuration parameters
  Determine Latency
  Dropping Subscriptions Fast
  Detecting loss
  Repserver Trace Flags
    Configure the rep agent to trace LTL--write output to a trace file (not to ASE log)
    Turn on Rep Agent tracing and DSI/function string tracing
    Turn off Rep Agent tracing and DSI/function string tracing
Appendix A Shell scripts
  rs_checkreplag.ksh
  sp__queueinfo
Appendix B troubleshooting
  Uninstall repserver program
  Logical Connection will not Drop
Repserver Components
====================
SQM (Stable Queue Manager) manages inserts into and deletes from a stable queue and
prevents duplicate messages. One per queue.
LTM (Log Transfer Manager) reads the transaction log.
Inbound queue holds transactions from the LTM. 'admin who, sqm' shows these, e.g. 456:1;
the ':1' means inbound.
Outbound queue holds transactions to be replicated. 'admin who, sqm' shows these, e.g.
457:0; the ':0' means outbound. There are 2 types of outbound queue: Data Server
Interface (DSI) queues, and Replication Server Interface (RSI) queues, used across routes.
Distributor (DIST) matches repdefs with subscriptions, so that messages are applied
correctly to the replicate. One DIST thread per inbound queue.
SQT (Stable Queue Transaction Manager) ensures queues are accessed in a transactional
manner. SQT has 4 queues:
* Open queue holds transactions until their commit or rollback is read from the LTM.
* Closed queue holds completed transactions.
* Read queue holds transactions that have been read from the Closed queue and whose
  receipt has been acknowledged. The transaction is then removed from the queue.
* Truncation queue holds begin tran records. It is used to determine which
  transactions can be deleted.
DSI
RSI
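As a small illustration of the queue numbering described above, the hypothetical helper below (queue_direction is not a Replication Server command) turns a queue id in the "number:type" form printed by 'admin who, sqm' into its direction. It assumes only the convention stated in these notes: ':1' is inbound, ':0' is outbound.

```shell
# queue_direction: hypothetical helper, not part of Replication Server.
# Takes a queue id as shown by 'admin who, sqm' (e.g. 456:1) and prints
# whether it is an inbound or an outbound queue.
queue_direction() {
    q_type=${1##*:}            # keep only the text after the last colon
    case "$q_type" in
        1) echo "inbound" ;;   # ':1' = inbound queue (fed by the LTM/RepAgent)
        0) echo "outbound" ;;  # ':0' = outbound queue (DSI or RSI)
        *) echo "unknown" ;;   # anything else is not in the number:type form
    esac
}

queue_direction 456:1   # prints: inbound
queue_direction 457:0   # prints: outbound
```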
rs_helpdb
-- in RSSD
rs_helproute
-- in RSSD
rs_helpsub
-- in RSSD, details table subscriptions
rs_helpdbsub
-- in RSSD
rs_helppubsub
-- in RSSD, if using publications
rs_helpdbpub
-- in RSSD, details publication subscriptions, articles and subscribers
rs_helpuser
-- in RSSD
To look at current connection settings, use admin config.
admin config [,[{"connection" | logical_connection}, data_server, database] |
["route", repserver]] [, configuration_name]
Example: admin config, "connection", <servername>, <dbname>, dsi_quoted_identifier
#-----------------------------------#
Repserver BASICS
#-----------------------------------#
If you use rs_init to configure replication and it fails, you can sometimes get more
information out of the rs_init log files. These are located at
$SYBASE/$SYBASE_REP/init/logs
General Install
Use rs_init to install repserver & set up the RSSD. Create stable queue files first
(using touch). Once this is complete, you need to add connections to the primary and
replicate dataservers and databases. See the sections below on how to do this. To use
the GUI (rs_init), create a rep maint user in the DB using sp_adduser. Remove this later
and add as alias to dbo using sp_addalias.
Alter repdef
============
** This also fixes the subscription automatically
alter replication definition prim_tab1_repdef
add c char(10) null
Testing
=======
declare @cnt int
declare @b_val char(10)
declare @c_val char(10)
select @cnt=2
while @cnt<10
BEGIN
select @b_val='test' + convert(char(5), @cnt)
select @c_val='test' + convert(char(5), @cnt+@cnt)
INSERT INTO pdb1..prim_tab1(a,b,c) values (@cnt, @b_val, @c_val)
select @cnt=@cnt+1
END
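sp_config_rep_agent <db_name>, 'disable'
sp_config_rep_agent <db_name>, 'enable', 'repserver', 'repserver_ra', 'repserver_ra_ps'
sp_config_rep_agent <db_name>, 'priority', '5'
sp_config_rep_agent <db_name>, 'send buffer size', '16k'
sp_config_rep_agent <db_name>, 'scan batch size', '1000'
sp_config_rep_agent <db_name>, 'send warm standby xacts', 'true'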
sp_setreplicate rs_marker,"true"
go
sp_setreplicate rs_update_lastcommit,"true"
go
Dump'n'Load databases
--------------------
Immediately dump and load the database from the Active to the Standby database.
Make sure the "warmsby_maint" user has SELECT, DELETE, etc. permissions on the Standby
database,
or
use warmsby_copy
go
sp_dropuser 'warmsby_maint'
go
sp_addalias 'warmsby_maint', 'dbo'
go
In RS
----
resume connection to SRV2.warmsby_copy
go
Note: if the old primary database has been shut down or is no longer contactable, its
logical status will remain as Suspended/Waiting for Enable Marker until it is fixed.
Once the server comes back online, resume the connection and "Operation in Progress"
will go back to None.
In RDB
======
To avoid any permission issues in replicate DB
Use test_rep_db
go
sp_addalias 'test_rep_db_maint','dbo'
go
At this point the live database should be dumped and loaded into the replicate database.
When the load has completed, resume the connection to the standby site.
In PRS
======
resume connection to SRV1_ASE.test_rep_db
go
Golden rules
==================
1. Never have repdefs that are not subscribed to. All transactions on replicated
tables are sent to the Inbound Queue (IBQ), sorted into commit order and translated to
Log Transfer Language (LTL). Only then are they checked for subscriptions, so unsubscribed
repdefs waste space in the IBQ and processing time in the SQT manager.
2. Make sure SQT has enough memory allocated. Also, check memory_limit.
rs_configure 'sqt_max_cache_size', 'xxxxx'
Find Bottlenecks
=======================
select * from master..syslogshold
Measure diff between repagent position and end of log (1TP & 2TP)
------------------------------------------------------------------
rep agent - value of 'Current Marker' column, example (53550,1)
sp_help_rep_agent <db_name>
-- read until end of log
dbcc traceon(3604)
dbcc pglinkage(<dbid>, <current_marker>, 0,2,0,1)
example: dbcc pglinkage(5, 53550, 0,2,0,1)
example output: "3909 pages scanned"
-- So the rep agent is 3909 pages behind the end of the log.
-- We should have very little lag!
(see rs_checkreplag.ksh in Appendix A)
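The pglinkage check above can be scripted. This is a minimal sketch, assuming the dbcc output contains a line of the form "N pages scanned" exactly as in the example; pages_behind and lag_status are hypothetical helper names, and the page threshold is your own choice (these notes only say the lag should be small).

```shell
# pages_behind: hypothetical helper. Reads dbcc pglinkage output on stdin and
# prints N from the "N pages scanned" line, i.e. how many log pages lie between
# the RepAgent's Current Marker and the end of the log.
pages_behind() {
    sed -n 's/^[^0-9]*\([0-9][0-9]*\) pages scanned.*/\1/p'
}

# lag_status: compares that page count with a threshold you pick yourself.
lag_status() {
    pages=$1
    limit=$2
    if [ "$pages" -gt "$limit" ]; then
        echo "LAGGING: $pages pages behind (limit $limit)"
    else
        echo "OK: $pages pages behind"
    fi
}

pages=$(echo "3909 pages scanned" | pages_behind)
lag_status "$pages" 1000    # prints: LAGGING: 3909 pages behind (limit 1000)
```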
Repserver Trace Flags
The following Rep Server traceflags will track the commands being written to the stable
queue, and being passed to the Replicate dataserver.
Flag: SQM, SQM_TRACE_COMMANDS
This flag is used when you want to know what commands have been written to the stable
queue.
Flag: DSI, DSI_BUF_DUMP
Use this flag when you want to know what is in the language command buffer passed to
dbcmd()
Replication Server accepts on-line trace command from isql as follows:
trace { "on" | "off" }, module, trace_flag
e.g., trace on,sqm,sqm_trace_commands
Both module and trace flag can be either upper or lower case.
Replication Server also accepts trace flags from the config file. The syntax is
trace=module,trace_flag
e.g., trace=dsi,dsi_buf_dump
Keep in mind that these trace ALL commands, so they will produce large amounts of output.
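To avoid typos when switching tracing on and off, the trace command can be generated. A minimal sketch: rs_trace_cmd is a hypothetical helper that only prints the 'trace' syntax shown above followed by 'go'; it does not connect to anything.

```shell
# rs_trace_cmd: hypothetical helper that prints an isql batch for the
# Replication Server 'trace' command.
# Usage: rs_trace_cmd on|off module trace_flag
rs_trace_cmd() {
    case "$1" in
        on|off) ;;
        *) echo "usage: rs_trace_cmd on|off module trace_flag" >&2; return 1 ;;
    esac
    printf 'trace %s, %s, %s\ngo\n' "$1" "$2" "$3"
}

rs_trace_cmd on sqm sqm_trace_commands
rs_trace_cmd off dsi dsi_buf_dump
```

For example, write the "on" batch to a file with rs_trace_cmd on sqm sqm_trace_commands > trace_on.sql and run it with isql -Usa -SREP1_RS -i trace_on.sql, then run the matching "off" batch as soon as you have captured what you need, given how much output these flags produce.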
Tuning
======
Tuning Primary DB
----------------
sp_help_rep_agent <db_name>, 'config'
sp_config_rep_agent <db_name>, scan_batch_size, '10000' --max num of log records sent to RS per batch
sp_config_rep_agent <db_name>, 'batch_ltl', 'true' --LTL cmds batched up then sent to RS
sp_config_rep_agent <db_name>, send_buffer_size, '16k' -- network packet size
sp_config_rep_agent <db_name>, priority, '2' --default is 5. lower=higher priority
WARNING: making changes to the rep agent can cause a warm standby connection to fail if the
replicate DB name is different. This requires a 'resume connection ... skip transaction',
and the config changes must be repeated on the replicate's rep agent.
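One way to keep the primary and replicate RepAgents in step is to generate the same sp_config_rep_agent calls for both databases. This is a sketch under assumptions: repagent_config_sql is a hypothetical helper, settings are read from stdin as "name value" pairs and passed through unchanged, and pdb1 / warmsby_copy are just the example database names used elsewhere in these notes.

```shell
# repagent_config_sql: hypothetical helper. Prints identical sp_config_rep_agent
# calls for every database named on the command line, so a change made on the
# primary can be repeated on the replicate's RepAgent.
# Settings come from stdin as "name value" pairs, one per line.
repagent_config_sql() {
    settings=$(cat)
    for db in "$@"; do
        echo "$settings" | while read -r name value; do
            [ -n "$name" ] || continue          # skip blank lines
            printf "sp_config_rep_agent %s, '%s', '%s'\ngo\n" "$db" "$name" "$value"
        done
    done
}

printf 'scan_batch_size 10000\npriority 2\n' | repagent_config_sql pdb1 warmsby_copy
```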
Tuning Rep Server
----------------
Note: 1 Repserver = 1 CPU
admin who, sqt
Existing values are stored in the RSSD. Use:
select optionname, charvalue from rs_config
configure replication server set sqt_max_cache_size to '20971520' -- in RS, or
rs_configure 'sqt_max_cache_size', 'xxxxx' -- in RSSD
Ensure value of (sqt_max_cache_size * num. of queues) is less than memory_limit.
Suggest setting sqt_max_cache_size to 20MB (20971520 bytes).
Max memory_limit = 2047 (MB, just under 2GB)
Use a RAW device for the Stable Device.
rs_configure 'num_threads', '75' -- if using Open Server (replicating to non-Sybase DB)
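The memory rule above is simple arithmetic, so it is easy to check before changing anything. A minimal sketch: sqt_memory_check is a hypothetical helper that takes sqt_max_cache_size in bytes, the number of queues, and memory_limit in MB, exactly as the rule is stated in these notes.

```shell
# sqt_memory_check: hypothetical helper for the rule
#   sqt_max_cache_size (bytes) * number of queues < memory_limit (MB)
sqt_memory_check() {
    cache_bytes=$1
    num_queues=$2
    memory_limit_mb=$3
    total_mb=$(( cache_bytes * num_queues / 1048576 ))   # bytes -> MB (truncated)
    if [ "$total_mb" -lt "$memory_limit_mb" ]; then
        echo "OK: ${total_mb}MB of ${memory_limit_mb}MB"
    else
        echo "TOO HIGH: ${total_mb}MB of ${memory_limit_mb}MB"
    fi
}

sqt_memory_check 20971520 20 2047    # prints: OK: 400MB of 2047MB
sqt_memory_check 20971520 110 2047   # prints: TOO HIGH: 2200MB of 2047MB
```

With the suggested 20MB cache, the 2047MB ceiling is reached at just over 100 queues, so only very large installations should hit it.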
Tuning RSSD
----------
sp_config_rep_agent <db_name>, priority, '2' --RSSD can have its own repagent
Put the RSSD on the same machine as RS.
Use 'localhost <port>' in the interfaces file for ASE and RS
example:
REP1_RS
	master tcp ether localhost 10010
	master tcp ether <server> 10010
	query tcp ether <server> 10010
--keeps rs system tables in memory.
configure replication server set sts_full_cache_rs_classes to 'on'
configure replication server set sts_full_cache_rs_columns to 'on'
configure replication server set sts_full_cache_rs_config to 'on'
configure replication server set sts_full_cache_rs_databases to 'on'
configure replication server set sts_full_cache_rs_datatype to 'on'
configure replication server set sts_full_cache_rs_diskaffinity to 'on'
configure replication server set sts_full_cache_rs_functions to 'on'
Tuning Replicate DB
------------------
change maint user priority in ASE
drop referential integrity checks (foreign keys)
use func. strings instead of triggers.
Tuning DSI
---------
Increase replicate-ASE no. of locks
dsi_max_xacts_in_group
alter connection to RDS.rdb set db_packet_size to 'xxx'
switch on replicate minimal columns --use all columns if replicating to non-Sybase DB
Use parallel DSI threads (do not do this lightly):
parallel_dsi (sets standard values on multiple settings below)
dsi_num_threads
dsi_serialization_method
{none|wait_for_commit|isolation_level_3|single_transaction_per_origin}
dsi_sqt_max_cache_size
dsi_large_xact_size
dsi_num_large_xact_threads
dsi_partitioning_rule
** Recommend using dsi_serialization_method 'none' followed by 'isolation_level_3'
** Recommend using 'time' partitioning
For more tuning advice, see
http://www.petersap.nl/SybaseWiki/index.php?title=Performance_Tuning&printable=yes
Monitor Counters
========================
Requiring Setup
------------
select * from rs_statdetails, rs_statrun
setup:
set stat_sampling to 'on'
admin stats_intrusive_counter, 'on'
stats_flush_rssd to on
stat_reset_afterflush to on
stat_daemon_sleep_time to '600'
admin stat_config_module, 'all_modules', 'on'
admin stat_config_connections
admin statistics, flush_statistics
See white paper: "Sybase Replication Performance and Tuning" by Jeff Tallman
http://my.sybase.com/detail?id=1015811
Skipping transactions
----------------------
If we encounter a duplicate insert error
#on RS:
resume connection to <ase>.<rdb> skip transaction
#on RSSD:
--find transaction id
rs_helpexception
--get SQL
rs_helpexception <tran_id>, v
Stop Replication
---------------
#on pdb:
select * from master..syslogshold where dbid=db_id('<pdb>')
go
sp_stop_rep_agent <pdb>
go
dbcc settrunc(ltm, ignore)
go
allow connections
go
-- This method uses a temporary database to hold the loaded dumps.
Create a database called 'temp_rep', then configure it for replication.
use temp_rep
go
exec sp_config_rep_agent temp_rep, 'enable', '<RS>', 'sa', '<passwd>'
go
use master
go
load database temp_rep from '<dump_file>'
go
-- the "connect database" refers to <pdb>
exec sp_start_rep_agent temp_rep, recovery, '<ase>', '<pdb>', '<RS>'
go
--Once complete, RepAgent will shutdown
--Now repeat these steps for each tran. log. Load and start RepAgent.
--** Check the Replication Server errorlog for any messages about "loss detection".
--   If none are found, restart RS in normal mode.
#on pdb
--put back 2TP
dbcc settrunc(ltm, valid)
go
sp_start_rep_agent <pdb>
go
--drop temp_rep!
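Because the load / start-RepAgent pair has to be repeated for every transaction log dump, generating the batch per file reduces typing errors. This is a sketch under assumptions: replay_tran_sql is a hypothetical helper, it uses ASE's "load transaction" for the tran log dumps and the sp_start_rep_agent recovery call exactly as written above, and it deliberately handles ONE dump at a time because RepAgent must finish and shut down before the next log is loaded.

```shell
# replay_tran_sql: hypothetical helper. Prints the isql batch for one
# transaction log dump: load it into temp_rep, then start the RepAgent in
# recovery mode against the original primary database.
# Arguments: dump_file ase pdb rs
replay_tran_sql() {
    dump_file=$1 ase=$2 pdb=$3 rs=$4
    cat <<EOF
use master
go
load transaction temp_rep from '$dump_file'
go
exec sp_start_rep_agent temp_rep, recovery, '$ase', '$pdb', '$rs'
go
EOF
}

replay_tran_sql /dumps/pdb_tran1.dmp SRV1_ASE pdb1 REP1_RS
```

Run the output for each dump in turn (e.g. save it to a file and use isql -i), and check with sp_help_rep_agent temp_rep that RepAgent has shut down before generating the batch for the next file. The dump path and server names above are only examples.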
General Troubleshooting
Stable Queue Full
Double-check that the queue is full
In RSSD
=======
rs_helppartition
restart rep agent and connections
=================================
In PDB
-----
sp_help_rep_agent pdb
sp_stop_rep_agent pdb
sp_start_rep_agent pdb (status should be not active)
In RS
----
suspend connection to server1.pdb
resume connection to server1.pdb
Increase stable queue
======================
In RS
---
admin disk_space (shows existing partitions)
touch /usr/replication/queue10.dat
add partition sq_part10 on '/usr/replication/queue10.dat' with size 1000 -- size in MB
You can use drop partition sq_part10 online at a later time
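To judge how close the stable device is to full, the same percentage that sp__queueinfo (Appendix A) computes can be worked out from segment counts. A minimal sketch: queue_free_pct is a hypothetical helper that takes total and used segment counts (for example read off the admin disk_space output) and relies on one stable queue segment being 1MB.

```shell
# queue_free_pct: hypothetical helper. Given total and used stable queue
# segments (1 segment = 1MB), prints the percentage of space still free.
queue_free_pct() {
    awk -v total="$1" -v used="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }        # avoid dividing by zero
        printf "%.2f\n", (total - used) * 100 / total
    }'
}

queue_free_pct 1000 250    # prints: 75.00
```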
Now that we've created the error class and set it to ignore duplicates, we need to do two
last things:
1. alter the DSI connections to use the new error class
2. suspend and then resume the DSI connections so that the DSIs use the new error class
Generally, applications should not be entering the same data into each of the replicated
databases; propagating it is Replication Server's job.
select ds_errorid, action=v.name
from rs_erroractions e, rs_classes c, rs_tvalues v
where e.errorclassid=c.classid
and e.action=v.value
and v.type='ERR'
and c.classname='ASEallowdupsErrorClass'
order by 1
go
Now do a diff against these files and any different codes will be displayed. To find
out what the codes are, in RSSD
rs_helperror 2601, v
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00783.1550/html/nfg_rs/CDDHIGGE.htm
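The diff step can be narrowed to just the error ids whose action changed. This is a sketch under assumptions: compare_error_actions is a hypothetical helper, and each input file is assumed to hold "ds_errorid action" pairs, one per line (e.g. the rs_erroractions query above run once per error class, with headers stripped, for instance via isql -b).

```shell
# compare_error_actions: hypothetical helper. Prints every error id whose
# action differs between two "ds_errorid action" files, using "-" for an id
# that appears in only one of them.
compare_error_actions() {
    awk 'NF < 2 { next }                                # skip blank or short lines
         FILENAME == ARGV[1] { a[$1] = $2; next }       # first file: remember actions
         { seen[$1] = 1
           if (!($1 in a)) print $1, "-", $2            # only in second file
           else if (a[$1] != $2) print $1, a[$1], $2 }  # action differs
         END { for (id in a) if (!(id in seen)) print id, a[id], "-" }' "$1" "$2"
}
```

Usage: compare_error_actions default_class.txt allowdups_class.txt, then look up each reported code with rs_helperror <code>, v as described above. The file names are only examples.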
create replication server error class composer_repserver_error_class
go
-- Following row added to rs_classes
-- composer_repserver_error_class 0x010000650100006b R 16777317 0x0000000000000000
--
rs_init_erroractions composer_repserver_error_class,rs_repserver_error_class
go
--you will see the following rows inserted into rs_erroractions
-- 5185 0x010000650100006b 3 16777317
-- 5186 0x010000650100006b 2 16777317
-- 5187 0x010000650100006b 3 16777317
-- 5193 0x010000650100006b 2 16777317
assign action ignore for composer_repserver_error_class to 5185
go
--This row updated in rs_erroractions
-- 5185 0x010000650100006b 1 16777317
alter connection to AGSIT_DB_CW.AG_SIT_ComposerWeb
set replication server error class to composer_repserver_error_class
go
suspend connection to AGSIT_DB_CW.AG_SIT_ComposerWeb
go
resume connection to AGSIT_DB_CW.AG_SIT_ComposerWeb
go
-- rs_helpdb now shows the connection with the new Rep Server Error Class:
-- dsname, dbname, dbid, errorclass, repserver_errorclass, status, controlling_prs, funcclass
Determine Latency
RDB
===
select PrimaryDBID = origin,
       Latency_sec = datediff(ss, origin_time, dest_commit_time),
       LastXactOriginTime = origin_time
from rs_lastcommit where origin > 0
go
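For an alerting script in the spirit of rs_checkreplag.ksh, the latency query output can be filtered against a threshold. This is a sketch under assumptions: flag_latency is a hypothetical helper, and it expects lines whose first two whitespace-separated fields are the origin and the latency in seconds (the column order of the query above), e.g. isql output with headers suppressed via -b. Non-numeric lines such as "(2 rows affected)" are ignored.

```shell
# flag_latency: hypothetical helper. Reads "origin latency_sec ..." lines on
# stdin and prints each origin whose latency exceeds the given seconds.
flag_latency() {
    awk -v max="$1" 'NF >= 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^-?[0-9]+$/ {
        if ($2 + 0 > max + 0) print "origin " $1 ": " $2 "s behind"
    }'
}

printf '101 5 Mar 5 2013\n102 400 Mar 5 2013\n' | flag_latency 300
# prints: origin 102: 400s behind
```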
Dropping Subscriptions Fast
In RSSD
=======
delete from rs_subscriptions where subname='<subname>'
go
delete from rs_dbreps where dbrepname='<db_repdef_name>'
go
Now you can drop connections!
Detecting loss
Sometimes replication stops without an error. This could happen after a restore of the
primary database. If message loss occurs we will not always see this using admin who
and repserver might not print a detecting loss message to the errorlog. Check the
rs_oqid and rs_exceptslast in the RSSD to see if any of the queues show a status of
2, which indicates that the queue is suspended due to lost messages.
If repserver has not correctly recognised that loss has occurred, then in order for
repserver to ignore these errors, we must get it to find them. Restart repserver and
check the errorlog for message:
DSI: detecting loss for database
In RS
====
ignore loss from prim_server.prim_db to <rds>.<rdb>
go
sp__queueinfo
-- Reports stable queue usage. Replace <rssd_dbname> with the name of your RSSD.
create proc sp__queueinfo
as
set nocount on
declare @total varchar(10),
@free varchar(10),
@freeperc varchar(10),
@repserver varchar(30),
@datetime varchar(20)
-- name of this Replication Server
select @repserver = charvalue from <rssd_dbname>..rs_config where optionname = 'oserver'
-- total and free segments (1 segment = 1MB) across all partitions
select @datetime = convert(varchar(10),getdate(),101)+" "+convert(varchar(8),getdate(),8),
@total = convert(varchar(10),sum(num_segs)),
@free = convert(varchar(10),sum(num_segs)-sum(allocated_segs)),
@freeperc = convert(varchar(12),convert(numeric(10,2),
(convert(real,(sum(num_segs)-sum(allocated_segs))) /
convert(real,sum(num_segs)))*100 ))
from <rssd_dbname>..rs_diskpartitions
print "Stable Queue Information for %1! at %2!",@repserver, @datetime
print "Total Partition Size = %1!MB, Space Remaining = %2!MB (%3!%%)",@total,@free,@freeperc
-- segments (MB) used per queue, largest first
select rs.q_number,rs.q_type, ( select dsname+'.'+dbname
from <rssd_dbname>..rs_databases
where dbid = rs.q_number
and rs.q_number != 0) queue_name, count(*) "size(MB)"
from <rssd_dbname>..rs_segments rs
group by q_number,q_type
having q_number != 0
order by count(*) desc
go
Appendix B troubleshooting
Uninstall repserver program
If you want to trash your repserver and start over again, you may find that
it will not uninstall. If that is the case, follow these instructions:
The installer reads and maintains version information in a file called
"vpd.properties", which is probably still located in the "C:\Windows"
directory; removing the install directory of repserver won't remove this
file.
Please do the following:
1. rename the vpd.properties file at C:\windows or the drive where your
Windows is installed
2. go into Control Panel, create a new system environment variable
"INSTALL_ALL_PATCH", and give it any value (e.g. "1")
3. install the repserver
4. remove the "INSTALL_ALL_PATCH" variable
Logical Connection will not Drop
Server 'AGSIT_DB_REP_RS':
Can not drop logical connection to COMPOSER_DS.SIT_Composer because either subscriptions
or repdefs exist for it
Check:
select * from rs_databases
select * from rs_objects
If rs_databases.dist_status or src_status is greater than 1, this indicates an issue.
The status of the connection can be one of the following:
0x1 valid
0x2 suspended
rs_drp0x0 is an internal repdef which belongs to 102. You can manually delete it, then
issue drop logical connection.