GoldenGate Internationalization Best Practices for V11.2.1: Oracle-to-Oracle
Wendong Zhu, Makoto Tozawa, Kenneth Tang, Mahadevan Lakshminarayanan
Summary
Prior to version 11.2.1, Oracle GoldenGate (OGG) had limited support for internationalization. For
instance, there was no character set management; OGG only supported character set conversion
through Oracle’s OCI when the target was an Oracle database; there was limited character set support
in the database structural metadata names.
With version 11.2.1 and later, OGG implements character set management and fully supports multibyte
characters, special characters and case sensitivity. In addition, OGG supports character set conversion
on non-Oracle databases, which is a new feature in this release. See Appendix B for more details.
This document recommends best practices for international deployments of GoldenGate and describes
common limitations that customers may encounter.
Configurations for Database Character Set Handling
Character set support and handling is a primary topic in GoldenGate internationalization support. There
are four types of configurations where character sets must be handled, based on the database types of
the source and target.
1. Oracle-to-Oracle.
2. Oracle-to-non-Oracle.
3. Non-Oracle-to-Oracle.
4. Non-Oracle-to-Non-Oracle.
Oracle-to-Oracle
For an Oracle-to-Oracle configuration, there are two scenarios: the source character set is the same as
the target character set, and the source character set is different from the target character set.
Source database character set is the same as target database character set
When Oracle source and target character sets are the same, OGG supports a pass-through configuration
in which the overhead of character set conversion is bypassed. To use a pass-through configuration, all
of the data that OGG processes in this configuration must have valid characters (code points)
corresponding to the database character set. For example, OGG cannot replicate French characters that
have no code point in a Japanese character set, even if the source and target databases both use that
character set. NLS_LANG must
be set in the Replicat parameter file with a SETENV parameter. The NLS_LANG character set value must
be set to match that of the source database.
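The validity requirement above can be illustrated with a minimal Python sketch. It assumes that Python's "shift_jis" codec is an acceptable stand-in for an Oracle Japanese character set such as JA16SJIS; the two are not guaranteed to be identical, and the helper name invalid_chars is hypothetical.

```python
# Assumption: Python's "shift_jis" codec only approximates an Oracle
# Japanese character set such as JA16SJIS.
def invalid_chars(text: str, codec: str = "shift_jis") -> list:
    """Return the characters in text that have no code point in codec."""
    bad = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad

# Japanese text is fully representable; the accented French letter is not.
print(invalid_chars("日本語"))
print(invalid_chars("café"))
```

A check of this kind could be run against sample data before choosing a pass-through configuration.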
Source database character set is different from target database character set
When Oracle source and target character sets are different, OGG supports character set conversion
using Oracle’s OCI. OGG’s own character set conversion is disabled by default. This configuration
requires specific settings:
• The target database character set must be a superset of the source character set, or one that is
compatible with that of the source.
• NLS_LANG must be set with the SETENV parameter in the Replicat parameter file. The
NLS_LANG character set value must match that of the source database. For example, for a
replication from a Japanese JA16EUC source database to a AL32UTF8 target database, users can
set NLS_LANG as follows in the Replicat parameter file:
SETENV (NLS_LANG=.JA16EUC)
Although GoldenGate supports character set conversion in a bidirectional scenario, it requires the
characters (or code points) to be valid in both database character sets.
For example, consider bidirectional replication between database A (JA16SJIS) and database B (AL32UTF8).
AL32UTF8 is a superset of JA16SJIS, so only ASCII and Japanese characters can be replicated successfully
in both directions. If a character is valid in database B but invalid in database A, the character is
lost when it is replicated from database B to A.
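The loss described for replication from database B to database A can be sketched in Python. This is only an illustration under stated assumptions: Python's "shift_jis" codec stands in for JA16SJIS, and the "?" substitution is Python's own replacement behavior, not necessarily what OCI or OGG emits. The function name round_trip_b_to_a is hypothetical.

```python
# Assumption: "shift_jis" approximates JA16SJIS (database A); database B
# is treated as Unicode, since AL32UTF8 can hold any Python string.
def round_trip_b_to_a(text: str, codec_a: str = "shift_jis") -> str:
    """Store text from database B in A's character set, then read it back."""
    stored_in_a = text.encode(codec_a, errors="replace")
    return stored_in_a.decode(codec_a)

# ASCII and Japanese characters survive the conversion.
print(round_trip_b_to_a("ABC 日本語") == "ABC 日本語")
# Korean characters are valid in B but not in A, so they are lost.
print(round_trip_b_to_a("한국어"))
```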
Oracle-to-non-Oracle
For an Oracle to non-Oracle configuration, OGG’s own character set conversion is enabled by default. It
converts the source character set into the character set of the target client/session.
To avoid the overhead of character set conversion from the target client/session character set to the
target database character set, users should set the target client/session character set value to match
that of the target database. For example, if the source character set is JA16EUC in Oracle and the target
character set is UTF-8 in Sybase, users need to set utf8 in locales.dat to define the session character set on
the target. OGG’s character set conversion will convert data from JA16EUC to UTF-8. See Appendix B for
more details on OGG’s character set conversion.
Non-Oracle-to-Oracle
For a non-Oracle to Oracle configuration, there are different configurations, depending on whether the
source database character set is EBCDIC-based or ASCII-based.
Note that ASSUMETARGETDEFS cannot be used for non-Oracle to Oracle replication. OGG does not
automatically adjust the schema length. Users must ensure that the target schema is of sufficient length
to hold the converted data contents. Refer to "Handling Character Length Semantics" in this document
for more details on SOURCEDEFS.
If the source database character set is EBCDIC, OGG’s own character set conversion is enabled on the
target Oracle database. (This would also be true if the target were non-Oracle.)
1. There are 12 Oracle character sets (AZ8ISO8859P9E, CL8KOI8R, JA16SJIS, JA16EUC, ZHT16BIG5,
JA16EUCTILDE, JA16SJISTILDE, WE8DEC, ZHS16CGB231280, ZHT16BIG5, ZHT16HKSCS, ZHT32EUC)
that do not have the exact same character sets in OGG, and thus are not supported by OGG. If
the target Oracle character set is one of those character sets, use a SETENV parameter in the
Replicat parameter file to set NLS_LANG to UTF-8; for example: SETENV (NLS_LANG=.AL32UTF8).
OGG will convert the EBCDIC character set into UTF-8, and then Oracle will
perform the conversion of UTF-8 to the target database character set. Note that the character
set conversion occurs twice: first by Oracle GoldenGate (from the source database EBCDIC character
set to UTF-8) and then by OCI (from UTF-8 to the target character set that Oracle GoldenGate does
not support).
2. If the target Oracle character set is NOT one of the preceding 12 character sets, use a SETENV
parameter in the Replicat parameter file to set NLS_LANG to match that of the target Oracle
character set. Setting NLS_LANG to the target character set in the Replicat parameter file directs
the OCI to send the data in pass-through mode, because target database session character set is
the same as the database character set. This configuration avoids the overhead of one more
conversion by OCI.
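The two conversion hops described for EBCDIC sources can be pictured with a small Python sketch. The codecs are assumptions chosen for illustration only: "cp037" stands in for the source EBCDIC character set and "big5" for a target such as ZHT16BIG5. The function name two_hop_convert is hypothetical.

```python
# Assumptions: "cp037" approximates the source EBCDIC character set and
# "big5" approximates the target database character set.
SOURCE_EBCDIC = "cp037"
TARGET_CODEC = "big5"

def two_hop_convert(ebcdic_bytes: bytes) -> tuple:
    """Return (utf8_bytes, target_bytes) for the two conversion hops."""
    # Hop 1 (the role the text assigns to OGG): EBCDIC to UTF-8.
    utf8_bytes = ebcdic_bytes.decode(SOURCE_EBCDIC).encode("utf-8")
    # Hop 2 (the role the text assigns to OCI): UTF-8 to the target set.
    target_bytes = utf8_bytes.decode("utf-8").encode(TARGET_CODEC)
    return utf8_bytes, target_bytes

ebcdic = bytes.fromhex("c8c5d3d3d6")  # "HELLO" in EBCDIC code page 037
utf8, target = two_hop_convert(ebcdic)
print(utf8, target)
```

When the target character set is one that OGG supports directly, the second hop is the one that setting NLS_LANG to the target character set avoids.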
If the source database character set is ASCII-based:
1. Set the target NLS_LANG character set (Oracle’s character set) to be equivalent to the source
database character set. See the Appendix for a list of Oracle’s character set names that are
equivalent to the character set names of other databases.
2. For the following scenarios, you must explicitly enable OGG’s character set conversion by
specifying CHARSETCONVERSION in the Replicat parameter file, and you must use the SETENV
parameter in Replicat to set the NLS_LANG character set to that of the target Oracle database.
1) If the goal is to consolidate two different character sets into a single database character set,
you must specify CHARSETCONVERSION and set NLS_LANG to the target database character
set. An example is when data is being replicated from SQL Server (GBK) to Oracle (UTF-8)
and there are both VARCHAR to VARCHAR and NVARCHAR to VARCHAR mappings.
2) If the data type of the source column can contain 4000 or more characters in multibyte
encoding, and if the target column is Oracle varchar2(4000) in single-byte encoding, you
must specify CHARSETCONVERSION in Replicat and set NLS_LANG to the target database
character set. An example is when the source column is SQL Server varchar(8000) or Oracle
CLOB or LONG, and the target column is Oracle varchar2(4000). When replicating from CLOB
in a UTF-8 source database to a target WE8MSWIN1252 Oracle database, the source 4000
characters (12000 bytes) of EURO SIGN (U+20AC) in the CLOB column will be truncated
when mapped to the target VARCHAR2 column if using OCI to perform character set
conversion (the default). Even if 4000 characters of EURO SIGN become 4000 bytes in
WE8MSWIN1252, which fit in the target VARCHAR2(4000) column, only 4000/3 bytes are
actually replicated when OCI performs the conversion. This is a limitation imposed by
Oracle’s OCI. To avoid data loss of VARCHAR2 in the above example, you should enable
OGG’s own character set conversion by specifying CHARSETCONVERSION and setting
NLS_LANG to the character set of the target database.
In special circumstances, there is a restriction concerning the preceding two use cases. There are
12 Oracle character sets (AZ8ISO8859P9E, CL8KOI8R, JA16SJIS, JA16EUC, ZHT16BIG5,
JA16EUCTILDE, JA16SJISTILDE, WE8DEC, ZHS16CGB231280, ZHT16BIG5, ZHT16HKSCS, ZHT32EUC)
that do not have the exact same character sets in OGG. Therefore OGG does not support
character set conversion for those character sets. If the target Oracle database character set is
one of those character sets, the recommendation is as follows:
For the preceding scenario 1), use SETENV to set the NLS_LANG character set to AL32UTF8.
For the preceding scenario 2), there is no perfect solution. You may need to choose a different
data type, such as CLOB, in the target to avoid data loss in the VARCHAR2 column.
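The byte counts in the EURO SIGN example above can be checked with a few lines of Python. The sketch assumes that Python's "cp1252" codec corresponds to WE8MSWIN1252; the final figure simply evaluates the "4000/3" quantity stated in the text and is not a measurement of OCI behavior.

```python
EURO = "\u20ac"          # EURO SIGN
data = EURO * 4000       # 4000 characters, as in the CLOB example

utf8_len = len(data.encode("utf-8"))      # 3 bytes per character in UTF-8
cp1252_len = len(data.encode("cp1252"))   # 1 byte per character in cp1252

# The text states that only 4000/3 bytes are replicated when OCI converts.
oci_limit = 4000 // 3

print(utf8_len, cp1252_len, oci_limit)
```

The converted data would fit in VARCHAR2(4000), which is why the truncation is attributed to where the conversion happens rather than to the target column size.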
Non-Oracle-to-Non-Oracle
For a non-Oracle to non-Oracle configuration, OGG’s own character set conversion is enabled by default.
It converts data from the source database character set to the target client/session character set.
To avoid the overhead of character set conversion from the target client/session character set to the
target database character set, set the target client/session character set value to match that of the
target database. For example, if the source character set is EUC-JIS in Sybase and the target character
set is UTF-8 in Sybase, set utf8 in locales.dat to define the session character set on the target.
If source and target databases are of the same character set, make sure the two character sets are using
exactly the same binary values to represent a character, considering that different database vendors
each provide their own definitions.
There is no multi-byte support for the Oracle GoldenGate built-in conversion functions. All character-
related operations are run in “byte mode” only. For example, @STRLEN returns the correct number of
characters for a single-byte character set. However, for a multi-byte character set, @STRLEN only
returns the length in bytes, but not the actual number of characters.
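The difference between a byte-mode length and a character count can be shown in Python. This is only an analogy for the @STRLEN behavior described above, not a reimplementation of it; the function name byte_mode_length is hypothetical.

```python
def byte_mode_length(text: str, codec: str) -> int:
    """Length of text in bytes once encoded in codec (a byte-mode length)."""
    return len(text.encode(codec))

single = "abc"
multi = "日本語"

# Single-byte character set: byte length equals character count.
print(byte_mode_length(single, "latin-1"), len(single))
# Multibyte character set: byte length exceeds character count.
print(byte_mode_length(multi, "utf-8"), len(multi))
```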
The source and target database character sets must be the same if using the Oracle GoldenGate column
mapping functions.
Handling Character Length Semantics
GoldenGate can replicate data between databases that have different character length semantics (BYTE
or CHAR). In the case of BYTE-to-CHAR replication, ASSUMETARGETDEFS is not supported, even though
the schema definitions are similar and the target CHAR-based columns will always be of sufficient length
to hold the data contents. Instead of ASSUMETARGETDEFS, you must run the DEFGEN utility to
generate a data definitions file and specify the SOURCEDEFS parameter in the Replicat parameter file.
Please see the steps below.
1. On the source system, create a parameter file named defgen.prm in the dirprm directory of the
Oracle GoldenGate home directory. For DEFSFILE, specify any name for the source definitions
file (source.def in the following example). The content of the file should look similar to the
following (substituting your own OGG home directory structure, user credentials, and table
names):
DEFSFILE C:\product\GoldenGate\dirdef\source.def
USERID goldengate, PASSWORD goldengate
TABLE goldengate.test1;
TABLE goldengate.test2;
2. Run the DEFGEN utility from the Oracle GoldenGate home directory, specifying the defgen.prm
parameter file created in step 1. This generates a definitions file named “source.def” in the
dirdef directory on the source system.
3. Copy the source definitions file to the dirdef directory on the target system.
4. Add the SOURCEDEFS parameter to the Replicat parameter file, as shown in the following
example:
REPLICAT testrep
DISCARDFILE /net/stadm35/product/GoldenGate/dirrpt/discard.txt, APPEND
SOURCEDEFS /net/stadm35/product/GoldenGate/dirdef/source.def
USERID goldengate, PASSWORD goldengate
REPERROR (default, abend)
MAP goldengate.test1, TARGET goldengate.test1;
MAP goldengate.test2, TARGET goldengate.test2;
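To see why BYTE and CHAR length semantics differ for multibyte data, the following Python sketch checks whether a value fits a column limit under each interpretation. It assumes a UTF-8 database character set; the function name fits_column is hypothetical and is not part of OGG or Oracle.

```python
def fits_column(value: str, limit: int, semantics: str, codec: str = "utf-8") -> bool:
    """Check value against a column limit under BYTE or CHAR semantics."""
    if semantics == "CHAR":
        return len(value) <= limit                  # count characters
    if semantics == "BYTE":
        return len(value.encode(codec)) <= limit    # count encoded bytes
    raise ValueError("semantics must be 'BYTE' or 'CHAR'")

value = "日本語テスト"  # 6 characters, 18 bytes in UTF-8
print(fits_column(value, 10, "CHAR"))
print(fits_column(value, 10, "BYTE"))
```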
Case Sensitivity
In release 11.2.1, OGG supports both case-sensitive and case-insensitive names if the database supports
them. OGG also supports object-level case sensitivity.
OGG uses the locale to compare case-insensitive object names in databases other than Oracle and
Teradata. For example, if the locale of a DB2 database is Turkish, the upper case of "i" is the capital
letter I with dot above “İ”, not the normal capital letter "I".
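The Turkish casing rule mentioned above can be illustrated in Python. Python's str.upper() is not locale-aware, so the sketch applies the two Turkish-specific mappings by hand. This is an assumption-laden illustration, not how OGG performs the comparison, and the function name upper_turkish is hypothetical.

```python
def upper_turkish(name: str) -> str:
    """Uppercase name using the Turkish rules for dotted and dotless i."""
    # Turkish: "i" uppercases to "İ" (U+0130); "ı" (U+0131) uppercases to "I".
    return name.replace("i", "\u0130").replace("\u0131", "I").upper()

name = "title"
print(name.upper())          # locale-independent uppercase
print(upper_turkish(name))   # Turkish-rule uppercase
# Under Turkish rules the name no longer matches the plain-ASCII uppercase.
print(upper_turkish(name) == "TITLE")
```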
1. Case-sensitive to case-insensitive: Source tables mytable and MYTABLE are both mapped to the
same target table mytable (or MYTABLE).
3. Case-sensitive to Oracle or DB2: Case-sensitive table name mytable is mapped to target table
name mytable. If mytable does not exist in the target database, the case-sensitive mytable is
mapped to MYTABLE, which is known as “fallback mapping.” Fallback mapping is applied to
default column mapping too.
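The fallback mapping rule can be expressed as a short Python sketch. The function resolve_target and the set of target table names are hypothetical; the sketch only mirrors the rule as described in the text (exact name first, then the uppercase name).

```python
from typing import Optional

def resolve_target(source_name: str, target_tables: set) -> Optional[str]:
    """Resolve a case-sensitive source name against the target's tables."""
    # First try the exact, case-sensitive name.
    if source_name in target_tables:
        return source_name
    # Fallback mapping: try the uppercase form of the name.
    upper = source_name.upper()
    if upper in target_tables:
        return upper
    return None

print(resolve_target("mytable", {"mytable", "MYTABLE"}))
print(resolve_target("mytable", {"MYTABLE"}))
print(resolve_target("mytable", {"OTHER"}))
```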
Miscellaneous
CONVERTUCS2CLOB Deprecated
The CONVERTUCS2CLOB option for Extract is deprecated as of OGG version 11.1.1 for the CLOB data type,
although it is still supported for compatibility reasons. OGG automatically converts UCS2 CLOB data to the
database character set, if the character set is multibyte. Users no longer need to specify
CONVERTUCS2CLOB explicitly.
USEANSISQLQUOTES
By default, any double-quoted string in a COLMAP or SQLEXEC clause is treated as a string literal for
backward compatibility. To make OGG recognize a double-quoted string as a column name, specify the
USEANSISQLQUOTES parameter in the GLOBALS file, and also enclose every string literal within single
quotes in the parameter files. This applies to single-byte data only; multibyte string literals are not
supported in parameter files in version 11.2.1.
CHARSETCONVERSION
The use of CHARSETCONVERSION in the Replicat parameter file explicitly enables OGG's character set
conversion feature, overriding the default use of Oracle’s OCI. This parameter addresses certain
scenarios in an Oracle-to-Oracle and non-Oracle-to-Oracle configuration. Please refer to the section
“Configurations for Database Character Set Handling” for more details.
TARGETDEFS
Replicat performs character set conversion. OGG does not support character set conversion
by Extract. Therefore, the target character set must be the same as the source character set
when using the TARGETDEFS parameter.
CHARSET
The parameter “CHARSET” is new in release 11.2.1. It allows users to enter multibyte characters
into the parameter file without using Unicode notation such as “\uXXXX”. For example, users
can enter multibyte table names and directories in parameter files. Place the CHARSET
parameter on the first line of all OGG parameter files, as shown in the example below. See also
"Character Set in Database, Client, OS and Terminal":
CHARSET ibm-935
As an alternative to specifying CHARSET in every parameter file, you can specify it in GLOBALS to affect
all parameter files. You can combine the two uses: If CHARSET is specified in GLOBALS and also in any
given parameter file, OGG reads the parameter file in that file’s CHARSET character set, regardless of the
GLOBALS setting. The parameter files must be transferred in binary mode when containing non-ASCII
characters.
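The precedence between a CHARSET line in a parameter file, a CHARSET setting in GLOBALS, and the operating system default can be summarized in a small Python sketch. The function effective_charset and the placeholder value "OS_DEFAULT" are hypothetical; the sketch reflects only the precedence rules stated in this document and does not parse real OGG files.

```python
from typing import List, Optional

def effective_charset(param_file_lines: List[str],
                      globals_charset: Optional[str] = None,
                      os_charset: str = "OS_DEFAULT") -> str:
    """Pick the character set used to read a parameter file."""
    # A CHARSET parameter on the first line of the file wins.
    first = param_file_lines[0].split() if param_file_lines else []
    if len(first) >= 2 and first[0].upper() == "CHARSET":
        return first[1]
    # Otherwise use CHARSET from GLOBALS, then the OS character set.
    return globals_charset or os_charset

print(effective_charset(["CHARSET ibm-935", "EXTRACT extzos"], "UTF-8"))
print(effective_charset(["EXTRACT extzos"], "ibm-935"))
print(effective_charset(["EXTRACT extzos"]))
```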
Character Set in Database, Client, OS and Terminal
When configuring character sets in Oracle GoldenGate, users need to consider the following different
levels of character set: database server, client/session, operating system (OS), and terminal.
Different databases use different mechanisms to control the session character set. For example:
• The DB2 and SQL Server session character set is controlled by the locale setting (LANG/LC_ALL
or OS locale).
• The MySQL and Teradata session character set is controlled by the Oracle GoldenGate
SESSIONCHARSET parameter.
Operating system character set
By default, the operating system character set determines the encoding of Oracle GoldenGate text files,
such as the data-definitions file produced by DEFGEN and the parameter files. However, the encoding
of text files can be specified with the CHARSET parameter.
Consider the correct settings for the character set when multibyte data or object names are involved in
parameter files or during interaction with the database. For example, on a MCCSID DB2 for z/OS system,
if a table name contains multibyte characters, the DEFGEN parameter file should be similar to the
following to get a definitions file with the correct code page:
CHARSET ibm-935
SOURCEDB ...
DEFSFILE ./dirdef/source.def, PURGE, CHARSET ibm-935
TABLE MBTable;
The corresponding Extract parameter file:
CHARSET ibm-935
EXTRACT extzos
SOURCEDB ...
...
TABLE MBTable;
The corresponding Replicat parameter file:
CHARSET ibm-935
REPLICAT repzos
TARGETDB ...
SOURCEDEFS ./dirdef/source.def
MAP MBTable, ...;
The terminal character set must be set to the character set of the operating system in order to
display console I/O properly; and the terminal character set must be set to the NLS_LANG
character set when executing PL/SQL by running a SQL script. The following table summarizes
the OGG character sets used in different places.
More Information
For additional information about Oracle GoldenGate internationalization support, see the Oracle
GoldenGate Administrator’s Guide and the Oracle GoldenGate Reference Guide. You can find a list of
supported data types in the Oracle GoldenGate for Oracle Installation and Setup Guide.
Appendix A
Table: Oracle’s character set names equivalent to 3rd-party database character names
BLT8ISO8859P13 921
CEL8ISO8859P14
NE8ISO8859P10
NEE8ISO8859P4 914
SE8ISO8859P3 913
TH8TISASCII tis620
UTF-8 9400
ZHT16BIG5 big5
ZHT16HKSCS 1375
ZHT32EUC 964
Appendix B OGG’s Character Set Conversion
Oracle GoldenGate uses the database-provided character encoding conversion features and provides
international character support. However, prior to Oracle GoldenGate 11.2.1, the character set
conversion feature was not available for most databases that OGG supports.
Starting with V11.2.1, OGG provides character set conversion for CHAR/VARCHAR to CHAR/VARCHAR and
CLOB to CLOB, as well as conversion between CHAR/VARCHAR and NCHAR/NVARCHAR and between CLOB and
NCLOB, with multibyte character set support. In V11.2.1, character set conversion is performed only
by Replicat.
OGG’s character set conversion converts data from the source database character set to the target
client/session character set.
If the target is Oracle and the character set of the target client is different from that of the server, Oracle
will perform character set conversion using OCI.
NOCHARSETCONVERSION prevents Replicat from performing the conversion of character sets and
normally should not be used, because it raises the risk of data-integrity errors. One circumstance where
NOCHARSETCONVERSION might be appropriate is when you are certain that all of the data is ASCII, and
that no ASCII-incompatible characters exist in character columns.
For non-Oracle targets, the default is CHARSETCONVERSION. For Oracle targets, the default is
NOCHARSETCONVERSION unless the captured source data is EBCDIC from DB2 z/OS: In the case of
EBCDIC, the default is CHARSETCONVERSION.
NOTE For an Oracle target to perform the conversion instead of Replicat, the Replicat parameter file
must contain a SETENV parameter that sets the NLS_LANG environment variable to the character set of
the source database.
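The defaults described in this appendix can be condensed into a small decision function. This Python sketch is only a restatement of the rules given in the text; the function name default_conversion_mode is hypothetical and the function does not interact with OGG.

```python
def default_conversion_mode(target_is_oracle: bool, source_is_ebcdic: bool) -> str:
    """Return the default Replicat conversion setting described in the text."""
    # Non-Oracle targets: OGG's own conversion is on by default.
    if not target_is_oracle:
        return "CHARSETCONVERSION"
    # Oracle targets: OGG converts by default only for EBCDIC source data.
    if source_is_ebcdic:
        return "CHARSETCONVERSION"
    # Otherwise conversion is left to Oracle's OCI.
    return "NOCHARSETCONVERSION"

print(default_conversion_mode(target_is_oracle=False, source_is_ebcdic=False))
print(default_conversion_mode(target_is_oracle=True, source_is_ebcdic=True))
print(default_conversion_mode(target_is_oracle=True, source_is_ebcdic=False))
```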