
Goldengate Internationalization Best Practices For V11.2.1: Oracle-To-Oracle

This document provides best practices for internationalization in Oracle GoldenGate version 11.2.1 and later. It discusses character set support and handling for different database configurations, including Oracle to Oracle, Oracle to non-Oracle, non-Oracle to Oracle, and non-Oracle to non-Oracle. It also describes character set conversion settings needed for each configuration when the source and target character sets are the same or different.

GoldenGate Internationalization Best Practices for V11.2.1

Wendong Zhu, Makoto Tozawa, Kenneth Tang, Mahadevan Lakshminarayanan

Summary

Prior to version 11.2.1, Oracle GoldenGate (OGG) had limited support for internationalization. For
instance, there was no character set management; OGG only supported character set conversion
through Oracle’s OCI when the target was an Oracle database; there was limited character set support
in the database structural metadata names.

With version 11.2.1 and later, OGG implements character set management and fully supports multibyte
characters, special characters and case sensitivity. In addition, OGG supports character set conversion
on non-Oracle databases, which is a new feature in this release. See Appendix B for more details.

This document recommends best practices for international deployments of GoldenGate and describes
common limitations that customers may encounter.

Configurations for Database Character Set Handling

Character set support and handling is a primary topic in GoldenGate internationalization support. There
are four types of configurations where character sets must be handled, based on the database types of
the source and target.

1. Oracle-to-Oracle.

2. Oracle-to-non-Oracle.

3. Non-Oracle-to-Oracle.

4. Non-Oracle-to-Non-Oracle.

Oracle-to-Oracle
For an Oracle-to-Oracle configuration, there are two scenarios: the source character set is the same as
the target character set, and the source character set is different from the target character set.

Source database character set is the same as target database character set
When Oracle source and target character sets are the same, OGG supports a pass-through configuration
in which the overhead of character set conversion is bypassed. To use a pass-through configuration, all
of the data that OGG processes in this configuration must have valid characters (code points)
corresponding to the database character set. For example, OGG cannot replicate French characters in
pass-through mode when the source and target databases both use the same Japanese character set,
because those characters are not valid in that character set. NLS_LANG must be set in the Replicat
parameter file with a SETENV parameter, and its character set value must match that of the source
database.
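One way to pre-check data for a pass-through configuration is to test whether every character has a code point in the database character set. The following Python sketch is illustrative only: `unencodable_chars` is a hypothetical helper, and Python's `shift_jis` codec is assumed to stand in for a Japanese database character set (it is not guaranteed to match any Oracle character set definition byte for byte).

```python
def unencodable_chars(text: str, codec: str) -> list[str]:
    """Return the distinct characters in text that have no code point in codec."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Japanese and ASCII characters are all valid in the Japanese codec.
print(unencodable_chars("東京 Tokyo", "shift_jis"))

# French accented letters are reported as invalid for the same codec.
print(unencodable_chars("café à Paris", "shift_jis"))
```

Any character reported by such a check would need to be cleaned up or excluded before relying on pass-through replication.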

Source database character set is different from target database character set

When Oracle source and target character sets are different, OGG supports character set conversion
using Oracle’s OCI. OGG’s own character set conversion is disabled by default. This configuration
requires specific settings:

• The target database character set must be a superset of the source character set, or one that is
compatible with that of the source.

• NLS_LANG must be set with the SETENV parameter in the Replicat parameter file. The
NLS_LANG character set value must match that of the source database. For example, for
replication from a Japanese JA16EUC source database to an AL32UTF8 target database, users can
set NLS_LANG as follows in the Replicat parameter file:

SETENV (NLS_LANG=.JA16EUC)

Oracle-to-Oracle Bidirectional with Character Set Conversion

Although GoldenGate supports character set conversion in a bidirectional scenario, it requires the
characters (or code points) to be valid in both database character sets.

Consider the following configuration:

Database A Character Set     Database B Character Set
Japanese JA16SJIS            Unicode AL32UTF8

Only ASCII and Japanese characters can be replicated successfully in both directions. AL32UTF8 is a
superset of JA16SJIS. If any character is valid in database B, but invalid in database A, the character will
be lost when it is replicated from database B to A.
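The loss described above can be illustrated with a small Python sketch. It is a simulation under stated assumptions, not OGG or OCI behavior: `replicate` is a hypothetical function, Python's `shift_jis` codec stands in for database A, `utf-8` stands in for database B, and `?` stands in for whatever replacement character a real conversion would substitute.

```python
def replicate(text: str, target_codec: str) -> str:
    """Simulate storing text in a database whose character set is target_codec.

    Characters without a code point in target_codec are replaced with '?'.
    """
    return text.encode(target_codec, errors="replace").decode(target_codec)


db_b_value = "価格 100€"                          # valid in Unicode (database B)
db_a_value = replicate(db_b_value, "shift_jis")   # B -> A: EURO SIGN is not in the Japanese codec
back_in_b = replicate(db_a_value, "utf-8")        # A -> B: the original character cannot be recovered

print(db_a_value)
print(back_in_b)
```

ASCII and Japanese characters survive the round trip unchanged; the character that exists only in database B does not.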

Oracle-to-non-Oracle
For an Oracle to non-Oracle configuration, OGG’s own character set conversion is enabled by default. It
converts the source character set into the character set of the target client/session.
To avoid the overhead of character set conversion from the target client/session character set to the
target database character set, users should set the target client/session character set value to match
that of the target database. For example, if the source is an Oracle database that uses JA16EUC and
the target is a Sybase database that uses UTF-8, users need to set utf8 in locales.dat to define the
session character set on the target. OGG’s character set conversion then converts the data from
JA16EUC to UTF-8. See Appendix B for more details on OGG’s character set conversion.

Non-Oracle-to-Oracle
For a non-Oracle to Oracle configuration, there are different configurations, depending on whether the
source database character set is EBCDIC-based or ASCII-based.

Note that ASSUMETARGETDEFS cannot be used for non-Oracle to Oracle replication. OGG does not
automatically adjust column lengths in the target schema. Users must ensure that the target columns
are long enough to hold the converted data contents. Refer to "Handling Character Length Semantics"
in this document for more details on SOURCEDEFS.

Source database character set is EBCDIC

If the source database character set is EBCDIC, OGG’s own character set conversion is enabled on the
target Oracle database. (This would also be true if the target were non-Oracle.)

1. There are 12 Oracle character sets (AZ8ISO8859P9E, CL8KOI8R, JA16SJIS, JA16EUC, ZHT16BIG5,
JA16EUCTILDE, JA16SJISTILDE, WE8DEC, ZHS16CGB231280, ZHT16BIG5, ZHT16HKSCS, ZHT32EUC)
that have no exact equivalent in OGG, and thus are not supported by OGG’s own character set
conversion. If the target Oracle character set is one of those character sets, use a SETENV
parameter in the Replicat parameter file to set NLS_LANG to UTF-8; for example:
SETENV (NLS_LANG=.AL32UTF8). OGG will convert the EBCDIC character set into UTF-8, and then
Oracle will perform the conversion of UTF-8 to the target database character set. Note that the
character set conversion occurs twice: first by Oracle GoldenGate (from the source database
EBCDIC character set to UTF-8) and then by OCI (from UTF-8 to the target character set that
Oracle GoldenGate does not support).

2. If the target Oracle character set is NOT one of the preceding 12 character sets, use a SETENV
parameter in the Replicat parameter file to set NLS_LANG to match that of the target Oracle
character set. Setting NLS_LANG to the target character set in the Replicat parameter file directs
the OCI to send the data in pass-through mode, because target database session character set is
the same as the database character set. This configuration avoids the overhead of one more
conversion by OCI.
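The two rules above for an EBCDIC source reduce to a single lookup, sketched here in Python. The function name and the idea of generating the SETENV line programmatically are assumptions made for illustration. The set copies the character set names exactly as listed in this section; because ZHT16BIG5 appears twice in that list, the set holds 11 distinct names.

```python
# Names copied from the list in this section (ZHT16BIG5 is listed twice there).
OGG_UNSUPPORTED_ORACLE_CHARSETS = {
    "AZ8ISO8859P9E", "CL8KOI8R", "JA16SJIS", "JA16EUC", "ZHT16BIG5",
    "JA16EUCTILDE", "JA16SJISTILDE", "WE8DEC", "ZHS16CGB231280",
    "ZHT16HKSCS", "ZHT32EUC",
}


def replicat_setenv_for_ebcdic_source(target_charset: str) -> str:
    """Return the SETENV line for a Replicat applying EBCDIC-sourced data to Oracle."""
    name = target_charset.upper()
    if name in OGG_UNSUPPORTED_ORACLE_CHARSETS:
        # Rule 1: OGG converts EBCDIC to UTF-8, then OCI converts to the target.
        charset = "AL32UTF8"
    else:
        # Rule 2: match the target so OCI passes the data through unchanged.
        charset = name
    return f"SETENV (NLS_LANG=.{charset})"


print(replicat_setenv_for_ebcdic_source("JA16SJIS"))
print(replicat_setenv_for_ebcdic_source("WE8MSWIN1252"))
```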

Source database character set is ASCII-based


If the source database character set is ASCII-based and the target is Oracle, OGG’s own character set
conversion is disabled by default. OGG uses Oracle’s OCI to do character set conversion.

1. Set the target NLS_LANG character set (Oracle’s character set) to be equivalent to the source
database character set. See Appendix A for a list of Oracle’s character set names that are
equivalent to the character set names of other databases.

2. For the following scenarios, you must explicitly enable OGG’s character set conversion by
specifying CHARSETCONVERSION in the Replicat parameter file, and you must use the SETENV
parameter in Replicat to set the NLS_LANG character set to that of the target Oracle database.

1) If the goal is to consolidate two different character sets into a single database character set,
you must specify CHARSETCONVERSION and set NLS_LANG to the target database character
set. An example is when data is being replicated from SQL Server (GBK) to Oracle (UTF-8)
and there are both VARCHAR to VARCHAR and NVARCHAR to VARCHAR mappings.

2) If the data type of the source column can contain 4000 or more characters in multibyte
encoding, and if the target column is Oracle varchar2(4000) in single-byte encoding, you
must specify CHARSETCONVERSION in Replicat and set NLS_LANG to the target database
character set. An example is when the source column is SQL Server varchar(8000) or Oracle
CLOB or LONG, and the target column is Oracle varchar2(4000). When replicating from CLOB
in a UTF-8 source database to a target WE8MSWIN1252 Oracle database, the source 4000
characters (12000 bytes) of EURO SIGN (U+20AC) in the CLOB column will be truncated
when mapped to the target VARCHAR2 column if using OCI to perform character set
conversion (the default). Even though 4000 characters of EURO SIGN occupy only 4000 bytes in
WE8MSWIN1252, which fits in the target VARCHAR2(4000) column, only 4000/3 bytes are
actually replicated when OCI performs the conversion. This is a limitation imposed by
Oracle’s OCI. To avoid data loss of VARCHAR2 in the above example, you should enable
OGG’s own character set conversion by specifying CHARSETCONVERSION and setting
NLS_LANG to the character set of the target database.

In special circumstances, there is a restriction concerning the preceding two use cases. There are
12 Oracle character sets (AZ8ISO8859P9E, CL8KOI8R, JA16SJIS, JA16EUC, ZHT16BIG5,
JA16EUCTILDE, JA16SJISTILDE, WE8DEC, ZHS16CGB231280, ZHT16BIG5, ZHT16HKSCS, ZHT32EUC)
that do not have the exact same character sets in OGG. Therefore OGG does not support
character set conversion for those character sets. If the target Oracle database character set is
one of those character sets, the recommendation is as follows:

For the preceding scenario 1), use SETENV to set the NLS_LANG character set to AL32UTF8.

For the preceding scenario 2), there is no perfect solution. You may need to choose a different
data type, such as CLOB, on the target to avoid data loss in the VARCHAR2 data type.
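The EURO SIGN arithmetic in scenario 2) can be checked with a short Python sketch. It only reproduces the byte counts quoted in the text; it does not model OCI. Python's `cp1252` codec is assumed to be a reasonable stand-in for WE8MSWIN1252, and reading "4000/3" as integer division is this sketch's interpretation.

```python
EURO = "\u20ac"            # EURO SIGN (U+20AC)
clob_value = EURO * 4000   # 4000 characters in the source CLOB

utf8_bytes = len(clob_value.encode("utf-8"))     # 3 bytes per character in UTF-8
cp1252_bytes = len(clob_value.encode("cp1252"))  # 1 byte per character in Windows-1252

# The text states that only 4000/3 bytes are replicated when OCI converts.
replicated_via_oci = 4000 // 3

print(utf8_bytes, cp1252_bytes, replicated_via_oci)
```

The converted data would fit the VARCHAR2(4000) column, which is why the truncation is attributed to the conversion path rather than to the target column size.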
Non-Oracle-to-Non-Oracle
For a non-Oracle to non-Oracle configuration, OGG’s own character set conversion is enabled by default.
It converts data from the source database character set to the target client/session character set.

To avoid the overhead of character set conversion from the target client/session character set to the
target database character set, set the target client/session character set value to match that of the
target database. For example, if the source character set is EUC-JIS in Sybase and the target character
set is UTF-8 in Sybase, set utf8 in locales.dat to define the session character set on the target.

If source and target databases are of the same character set, make sure the two character sets are using
exactly the same binary values to represent a character, considering that different database vendors
each provide their own definitions.

No Multi-byte Transformation Support

There is no multi-byte support for the Oracle GoldenGate built-in conversion functions. All
character-related operations are run in “byte mode” only. For example, @STRLEN returns the correct
number of characters for a single-byte character set. For a multi-byte character set, however,
@STRLEN returns the length in bytes, not the number of characters.
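The difference between a byte count and a character count can be seen in a few lines of Python. This is an analogy only: `byte_length` is a hypothetical helper that mimics a "byte mode" length, and it is not the implementation of @STRLEN.

```python
def byte_length(text: str, codec: str) -> int:
    """Length of text in bytes once encoded in codec (a 'byte mode' length)."""
    return len(text.encode(codec))


ascii_name = "data"
japanese_name = "日本語"

# Single-byte data: byte length and character count agree.
print(len(ascii_name), byte_length(ascii_name, "utf-8"))

# Multi-byte data: the byte length depends on the encoding and exceeds the character count.
print(len(japanese_name),
      byte_length(japanese_name, "utf-8"),
      byte_length(japanese_name, "shift_jis"))
```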

The source and target database character sets must be the same if using the Oracle GoldenGate column
mapping functions.

Handling Character Length Semantics

GoldenGate can replicate data between databases that have different character length semantics (BYTE
or CHAR). In the case of BYTE-to-CHAR replication, ASSUMETARGETDEFS is not supported, even though
the schema definitions are similar and the target CHAR-based columns will always be of sufficient length
to hold the data contents. Instead of ASSUMETARGETDEFS, you must run the DEFGEN utility to
generate a data definitions file and specify the SOURCEDEFS parameter in the Replicat parameter file.
Please see the steps below.

1. On the source system, create a parameter file named defgen.prm in the dirprm directory of the
Oracle GoldenGate home directory. For DEFSFILE, specify any name for the source definitions
file (source.def in the following example). The content of the file should look similar to the
following (substituting your own OGG home directory structure, user credentials, and table
names):

DEFSFILE C:\product\GoldenGate\dirdef\source.def
USERID goldengate, PASSWORD goldengate
TABLE goldengate.test1;
TABLE goldengate.test2;

2. Run the DEFGEN utility from the Oracle GoldenGate home directory, as shown in this example:

defgen paramfile C:\product\GoldenGate\dirprm\defgen.prm

This example generates a definitions file named “source.def” in the dirdef directory on the
source system.

3. Copy the source definitions file to the dirdef directory on the target system.

4. Add the SOURCEDEFS parameter to the Replicat parameter file, as shown in the following
example:

REPLICAT testrep
DISCARDFILE /net/stadm35/product/GoldenGate/dirrpt/discard.txt, APPEND
SOURCEDEFS /net/stadm35/product/GoldenGate/dirdef/source.def
USERID goldengate, PASSWORD goldengate
REPERROR (default, abend)
MAP goldengate.test1, TARGET goldengate.test1;
MAP goldengate.test2, TARGET goldengate.test2;
Case Sensitivity

In release 11.2.1, OGG supports both case-sensitive and case-insensitive names if the database supports
them. OGG also supports object-level case sensitivity.

OGG uses the locale to compare case-insensitive object names in databases other than Oracle and
Teradata. For example, if the locale of a DB2 database is Turkish, the uppercase form of "i" is the
capital letter I with dot above ("İ"), not the plain capital letter "I".
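The effect of a Turkish locale on case-insensitive name comparison can be sketched in Python. Python's `str.upper` is locale-independent, so the Turkish rule is written by hand here; `upper_turkish` and `same_name` are hypothetical helpers and do not represent how OGG implements the comparison.

```python
def upper_turkish(name: str) -> str:
    """Upper-case name with the Turkish rule: 'i' becomes 'İ' (U+0130)."""
    return name.replace("i", "\u0130").upper()


def same_name(a: str, b: str, upper=str.upper) -> bool:
    """Case-insensitive comparison using the supplied upper-casing rule."""
    return upper(a) == upper(b)


# Under the default rule, "title" and "TITLE" are the same name.
print(same_name("title", "TITLE"))

# Under the Turkish rule, "title" upper-cases to "TİTLE", so the names differ.
print(same_name("title", "TITLE", upper=upper_turkish))
```

Object names that contain the letter "i" are therefore worth checking explicitly when the database locale is Turkish.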

Examples of mapping between case-sensitive and case-insensitive databases

1. Case-sensitive to case-insensitive: Source tables mytable and MYTABLE are both mapped to the
same target table mytable (or MYTABLE).

2. Case-insensitive to case-sensitive: Case-insensitive table name may be normalized (upper or
lower cased, depending on the database) and mapped to the target table name as-is.

3. Case-sensitive to Oracle or DB2: Case-sensitive table name mytable is mapped to target table
name mytable. If mytable does not exist in the target database, the case-sensitive mytable is
mapped to MYTABLE, which is known as “fallback mapping.” Fallback mapping is applied to
default column mapping too.

Miscellaneous

CONVERTUCS2CLOB Deprecated

The CONVERTUCS2CLOB option for Extract is deprecated as of OGG version 11.1.1 for the CLOB data
type, but it remains supported for compatibility reasons. OGG automatically converts UCS2 CLOB data
to the database character set if the character set is multibyte, so users no longer need to specify
CONVERTUCS2CLOB explicitly.

USEANSISQLQUOTES

By default, any double-quoted string in a COLMAP or SQLEXEC clause is treated as a string literal for
backward compatibility. To make OGG recognize a double-quoted string as a column name, specify the
USEANSISQLQUOTES parameter in the GLOBALS file, and enclose every string literal within single
quotes in the parameter files. This applies to single-byte data only; multibyte string literals are not
supported in parameter files in version 11.2.1.

CHARSETCONVERSION
The use of CHARSETCONVERSION in the Replicat parameter file explicitly enables OGG's character set
conversion feature, overriding the default use of Oracle’s OCI. This parameter addresses certain
scenarios in an Oracle-to-Oracle and non-Oracle-to-Oracle configuration. Please refer to the section
“Configurations for Database Character Set Handling” for more details.

TARGETDEFS

Replicat performs character set conversion; Extract does not. Therefore, the target character set
must be the same as the source character set when using the TARGETDEFS parameter.

CHARSET

The parameter “CHARSET” is new in release 11.2.1. It allows users to enter multibyte characters
into the parameter file without using Unicode notation such as “\uXXXX”. For example, users
can enter multibyte table names and directories in parameter files. Place the CHARSET
parameter on the first line of all OGG parameter files, as shown in the example below. See also
"Character Set in Database, Client, OS and Terminal":

CHARSET ibm-935

As an alternative to specifying CHARSET in every parameter file, you can specify it in GLOBALS to affect
all parameter files. You can combine the two uses: If CHARSET is specified in GLOBALS and also in any
given parameter file, OGG reads the parameter file in that file’s CHARSET character set, regardless of the
GLOBALS setting. The parameter files must be transferred in binary mode when containing non-ASCII
characters.
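Where CHARSET is not used, names containing multibyte characters have to be written in the Unicode notation mentioned above. The Python sketch below shows one way to produce that notation from a name. `to_uxxxx` is a hypothetical helper, and the details (uppercase hex digits, exactly four digits, ASCII left unescaped, no handling of characters beyond U+FFFF) are assumptions, since the text only shows the general form “\uXXXX”.

```python
def to_uxxxx(text: str) -> str:
    """Rewrite non-ASCII characters in text as \\uXXXX escapes (assumed format)."""
    out = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            out.append(ch)                       # ASCII is left as-is
        elif code_point <= 0xFFFF:
            out.append(f"\\u{code_point:04X}")   # four hex digits per character
        else:
            # Characters outside the Basic Multilingual Plane are out of scope here.
            raise ValueError(f"cannot express U+{code_point:X} as \\uXXXX")
    return "".join(out)


print(to_uxxxx("T_日本"))
```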

Character Set in Database, Client, OS and Terminal

When configuring character sets in Oracle GoldenGate, users need to consider the following different
levels of character set: database server, client/session, operating system (OS), and terminal.

Session character set

Different databases use different mechanisms to control the session character set. For example:

• The Oracle session character set is controlled by the NLS_LANG setting.

• The DB2 and SQL Server session character set is controlled by the locale setting (LANG/LC_ALL
or OS locale).

• The Sybase session character set is controlled by the setting in locales.dat.

• The MySQL and Teradata session character set is controlled by the Oracle GoldenGate
SESSIONCHARSET parameter.
Operating system character set

By default, the operating system character set determines the encoding of Oracle GoldenGate text files,
such as the data-definitions file produced by DEFGEN and the parameter files. However, the encoding
of text files can be specified with the CHARSET parameter.

Consider the correct settings for the character set when multibyte data or object names are involved in
parameter files or during interaction with the database. For example, on a MCCSID DB2 for z/OS system,
if a table name contains multibyte characters, the DEFGEN parameter file should be similar to the
following to get a definitions file with the correct code page:

CHARSET ibm-935
SOURCEDB ...
DEFSFILE ./dirdef/source.def, PURGE, CHARSET ibm-935
TABLE MBTable;

The Extract parameter file should be similar to the following:

CHARSET ibm-935
EXTRACT extzos
SOURCEDB ...
...
TABLE MBTable;

The Replicat parameter file should be similar to the following:

CHARSET ibm-935
REPLICAT repzos
TARGETDB ...
SOURCEDEFS ./dirdef/source.def
MAP MBTable, ...;

Terminal character set

The terminal character set must be set to the character set of the operating system in order to
display console I/O properly; and the terminal character set must be set to the NLS_LANG
character set when executing PL/SQL by running a SQL script. The following table summarizes
the OGG character sets used in different places.

Oracle GoldenGate    Character set Comments

Trail Metadata       Table name is encoded in UTF-8. User token name and value are
                     encoded in UTF-8.
TCP/IP Protocols     Encoded in UTF-8 (e.g. table name, error message, etc.)
Definitions File     Encoding is in the OS character set.
Parameter File       By default reads as OS character set. If character set parameter
                     (CHARSET) is specified at first line of parameter file, reads the
                     parameter file in specified character set.
GGSCI, OBEY Files    Users are required to use wildcard or UNICODE notation (\uXXXX)
                     to specify OS incompatible characters, including multibyte
                     characters.

More Information

For additional information about Oracle GoldenGate internationalization support, see the Oracle
GoldenGate Administrator’s Guide and the Oracle GoldenGate Reference Guide. You can find a list of
supported data types in the Oracle GoldenGate for Oracle Installation and Setup Guide.
Appendix A
Table: Oracle’s character set names equivalent to 3rd-party database character names

Oracle    SQL Server (Windows CP)    DB2 / IBM CCSID    MySQL    Sybase

AR8ISO8859P6 1089 iso88596

ARMSWIN1256 1256 9448 cp1256 cp1256

AL32UTF8 1208 utf8 utf8

BLT8ISO8859P13 921

BLT8MSWIN1257 1257 1257 cp1257 cp1257

CL8ISO8859P5 915 latin5 iso88595

CL8MSWIN1251 1251 1251 cp1251 cp1251

CL8KOI8R 878 koi8r koi8

CL8KOI8U 1168 koi8u

CEL8ISO8859P14

EL8ISO8859P7 9005 greek iso88597

EL8MSWIN1253 1253 1253 cp1253

EE8ISO8859P2 912 latin2 iso88592

EE8MSWIN1250 1250 1250, 5346 cp1250 cp1250

IW8ISO8859P8 5012 hebrew iso88598

IW8MSWIN1255 1255 1255 cp1255

JA16SJIS 943 sjis sjis

JA16SJISTILDE 932 932 cp932 cp932

JA16EUC 33722 ujis

JA16EUCTILDE 954 eucjpms eucjis


KO16MSWIN949 949 1363 cp949

KO16KSC5601 euckr eucksc

NE8ISO8859P10

NEE8ISO8859P4 914

SE8ISO8859P3 913

TH8TISASCII tis620

TR8MSWIN1254 1254 1254 cp1254

US7ASCII 367 ascii

UTF-8 9400

VN8MSWIN1258 1258 1258 cp1258

WE8ISO8859P1 819 latin1 ascii_8, iso_1

WE8ISO8859P15 923 iso15

WE8MSWIN1252 1252 1252 cp1252

WE8DEC 1100 dec8

WE8ROMAN8 1051 hp8 roman8

WE8ISO8859P9 920 iso88599

ZHS16GBK 936 1386 gbk cp936

ZHS16CGB231280 5478 gb2312

ZHS32GB18030 1392 gb18030

ZHT16BIG5 big5

ZHT16HKSCS 1375

ZHT16MSWIN950 950 1373 cp950

ZHT32EUC 964
Appendix B OGG’s Character Set Conversion
Oracle GoldenGate uses the database-provided character encoding conversion features and provides
international character support. However, prior to Oracle GoldenGate 11.2.1, the character set
conversion feature was not available for most databases that OGG supports.

Starting with V11.2.1, OGG provides character set conversion on CHAR/VARCHAR to CHAR/VARCHAR,
CLOB to CLOB, as well as conversion on CHAR/VARCHAR from/to NCHAR/NVARCHAR and CLOB from/to
NCLOB with multibyte character set support. In V11.2.1, the character set conversion is performed only
by Replicat.

OGG’s character set conversion converts data from the source database character set to the target
client/session character set.

[Figure: OGG’s Character Set Conversion. Elements shown: source database, target client, target database.]

If the target is Oracle and the character set of the target client is different from that of the server, Oracle
will perform character set conversion using OCI.

Users can specify the CHARSETCONVERSION and NOCHARSETCONVERSION parameters to control
whether Replicat performs character-set conversion when replicating character-type data between
databases that have different character sets. Replicat performs the conversion by default in most, but
not all, cases. For more information, see the Oracle GoldenGate Windows and UNIX Administrator’s
Guide.

NOCHARSETCONVERSION prevents Replicat from performing the conversion of character sets and
normally should not be used, because it raises the risk of data-integrity errors. One circumstance where
NOCHARSETCONVERSION might be appropriate is when you are certain that all of the data is ASCII, and
that no ASCII-incompatible characters exist in character columns.

For non-Oracle targets, the default is CHARSETCONVERSION. For Oracle targets, the default is
NOCHARSETCONVERSION unless the captured source data is EBCDIC from DB2 z/OS: In the case of
EBCDIC, the default is CHARSETCONVERSION.

NOTE: For an Oracle target to perform the conversion instead of Replicat, the Replicat parameter file
must contain a SETENV parameter that sets the NLS_LANG environment variable to the character set of
the source database.
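The default behavior described in this appendix can be summarized as a small decision function. The Python sketch below only restates the rules given in the text; the function name and its boolean arguments are hypothetical and are not part of any OGG interface.

```python
def default_replicat_conversion(target_is_oracle: bool, source_is_ebcdic: bool) -> str:
    """Return the Replicat default named in the text for a given configuration."""
    if not target_is_oracle:
        # Non-Oracle targets: Replicat converts by default.
        return "CHARSETCONVERSION"
    if source_is_ebcdic:
        # Oracle target with EBCDIC source data from DB2 z/OS: Replicat converts.
        return "CHARSETCONVERSION"
    # Oracle target otherwise: conversion is left to Oracle's OCI.
    return "NOCHARSETCONVERSION"


for target_is_oracle in (False, True):
    for source_is_ebcdic in (False, True):
        print(target_is_oracle, source_is_ebcdic,
              default_replicat_conversion(target_is_oracle, source_is_ebcdic))
```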
