IBM
Best practices
Building a data migration strategy with
IBM InfoSphere Optim High
Performance Unload
Konrad Emanowicz
DB2 Data Warehouse QA Specialist
IBM Ireland Lab
Garrett Fitzsimons
Best Practices Specialist for Warehouses
IBM Ireland Lab
Richard Lubell
DB2 Information Development
IBM Ireland Lab
Issued: December 2012
Executive Summary
Introduction
Incorporating HPU into your data migration strategy
Implementing HPU
Understanding HPU
   Using the HPU control file
   Controlling resources with HPU
   Monitoring HPU migration
Configuring HPU in an IBM PureData System for Operational Analytics
   Target system settings
Data migration scenarios
   Migrating data between databases with different topologies
   Migrating a data subset
   Migrating data between databases with different versions of DB2 and different distribution maps
Conclusion
Appendix A. Configuration of test systems used
Further reading
   Contributors
Notices
   Trademarks
Executive Summary
This paper is targeted at people who are involved in planning, configuring, designing,
implementing, or administering a data warehouse that is based on DB2 Database for
Linux, UNIX, and Windows software. The examples in this paper apply generally but
focus on the IBM PureData System for Operational Analytics.
IBM InfoSphere Optim High Performance Unload for DB2 for Linux, UNIX, and
Windows V4.02 is a high-speed tool for unloading, extracting, and migrating data in DB2
for Linux, UNIX, and Windows databases. High Performance Unload (HPU) is designed
to extract and migrate data from DB2 table space containers. A data migration strategy
that uses HPU minimizes storage needs and automates many manual tasks.
HPU uses named pipes and parallel LOAD operations to stream data from the source to
the target database, minimizing the need to stage the data on disk. You can direct HPU to
determine where different partition maps, software levels, and key constraints are used
and to handle these differences automatically during data migration.
HPU can also unload subsets of data from the source database without the need to access
the DB2 software layer. With this functionality you can migrate data online from larger
production systems to smaller pre-production or development environments.
HPU has other uses in data extraction from DB2 backup images, which are covered in the
companion paper Best Practices: Using IBM Optim High Performance Unload as part of
a Recovery Strategy in an IBM Smart Analytics System.
Introduction
This paper describes best practices for incorporating the use of HPU into your migration
strategy and implementing HPU for an IBM PureData System for Operational Analytics
data warehouse. This paper covers how to migrate different sets of data between two
databases.
To use this paper, you should have a basic knowledge of HPU software as well as DB2
software as implemented in a partitioned data warehouse environment. The further
reading section in this paper contains links to product documentation and papers that are
referenced in the paper.
The first and second sections outline the benefits of using HPU and the possible
considerations of integrating it into your system.
The third section reviews the HPU migration process, HPU control files, and HPU
parameters, and recommends best practices for creating control files.
The fourth section provides details on how to install and configure HPU for migration in
an IBM PureData System for Operational Analytics.
The fifth section offers specific best practice recommendations for different scenarios.
The appendix section outlines the configuration of the test system that was used in
developing this paper.
This paper builds on and complements a series of best practices papers that discuss
aspects of a DB2 data warehouse. Refer to the papers listed in the Further reading
section for more information.
Incorporating HPU into your data migration strategy

Use HPU in your data migration strategy to:

Migrate data between databases with different topologies, software levels, and
partition maps

For example, use HPU control file parameters to control the migration of a subset
of data from a large production environment to a smaller development environment
with fewer database partitions, a different partition map, and a different DB2
software level.
Eliminate the need to stage the source data on storage before the data is loaded
to the target database
For example, named pipes are used to stream data from the source to target
environment, then the LOAD command is automatically invoked to load data
into the target database.
Avoid contention with ETL processes by scheduling automated data migration outside
of ETL, backup and other data operations.
Existing migration strategies involve laborious procedures such as backup and restore
operations, data redistribution, export and load commands, and moving source data over
the network to the target systems. Using HPU helps simplify the migration process by
avoiding manual steps, skipping the staging phase and speeding migration through
direct transfer to target tables by using named pipes.
Implementing HPU
When you use HPU to migrate data from source to target database there are installation
and configuration considerations that need to be addressed. An implementation of HPU
should minimize the number of configuration points and avoid unintentional use of
resources. The following sections cover how to install, configure and set up your
environment for HPU.
Installing HPU
HPU must be installed on each host of the source database where you intend to unload
data and on each host of the target database where the data is loaded. The same version
of HPU must be used across all source and target data nodes. Use the db2hpu -version command to determine the version installed on each data node.
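For example, to check the installed version on every host from a single session, the
db2hpu -version command can be run through the DB2 rah utility. This is a sketch; it
assumes that rah is configured for the instance:

# Check the HPU version on the local host
db2hpu -version

# Check all hosts that are listed in db2nodes.cfg at once
rah "db2hpu -version"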
Make the HPU installation package available to each individual host by placing it on the
shared /db2home file system. This action avoids the task of copying the installation
package to each host.
Use the HPU control file parameters to quiesce the table from which you want to
migrate data.
This action flushes all data for the table space in the buffer pool to the table space
container on disk and locks the table during the unloading operation.
Unload data from offline backup images or online backup images created outside of
ETL operations to help ensure data consistency.
Understanding HPU
The HPU migration consists of unloading, repartitioning, and loading data between a
source and target database. By understanding how HPU operates you can best configure
your environment and implement your data migration strategy. Figure 1 shows the data
migration process between a source and target database with a different topology:
The source database contains data nodes DataNode 1, DataNode 2, and DataNode 3
and has a total of 28 database partitions plus a coordinator node.
The target database contains data nodes DataNode 1 and DataNode 2 and has a total
of eight database partitions plus a coordinator node.
1. HPU unloads data on each database partition in parallel on the source database.

2. HPU repartitions data according to the DB2 distribution map created by HPU
based on the target table details that are provided in the control file.

3. HPU sends the output streams from the source database across the network to
the HPU daemon on the target database (DataNode 1 and DataNode 2 in figure 1).
The target daemon creates a single named pipe for each database partition on the
target system, represented in figure 1 as the /work1 directory, and initiates the
DB2 LOAD command.

4. The HPU daemon consolidates the multiple streams from each of the source
database partitions into a single stream, which HPU sends to each named pipe
associated with a target partition.

5. The HPU daemon calls the LOAD command to load data into the target
database.
The process for migrating from a backup file is slightly different than presented in figure
1. Data must first be staged from database backup files on the source system before it is
migrated through named pipes to the target system. Dedicate the same storage path on
each source data node for the staging process. Allocate storage capacity for staging
equivalent to the table space size of the table that is being migrated.
GLOBAL Block
The GLOBAL control block contains configuration data that is common to all migrate
blocks in the HPU control file. There is only one GLOBAL block per control file and the
GLOBAL block must be the first block in the control file. Any parameter in this block
overrides its equivalent configuration file default.
The GLOBAL block designates the control settings that are used by default for each
MIGRATE block unless an option is overridden at a lower level. For example, the
QUIESCE and LOCK options control table space locking and buffer pool state but you
can override these options in each MIGRATE block.
The following example shows the GLOBAL block syntax:
-- Global Block
GLOBAL CONNECT TO BCUDB
DB2 NO QUIESCE NO LOCK NO;
MIGRATE Blocks
The MIGRATE block specifies the table space, tables, and SELECT statement for the data
that is being migrated. You can specify multiple MIGRATE blocks in a single control file
to determine a sequence of migration tasks.
The key syntax elements that are used in a MIGRATE block are presented in table 1
below.

Parameter           Description
PART                Specifies the database partitions on the source database for the
                    source tables.
FAST_SELECT         Specifies the SELECT statement that defines the data to unload
                    from the source table.
TARGET ENVIRONMENT  Specifies the target instance, host, and database that receive
                    the data.
LOCK OPTION         Indicates whether a read-only lock is to be held during
                    unloading on the table space of the source table.
QUIESCE OPTION      Indicates whether the table space is quiesced so that buffered
                    data is flushed to disk before unloading.
TARGET KEYS         Specifies the target database partitions and partitioning key
                    details that are used to build the target distribution map.
WORKING IN          Specifies the working directory that is used for named pipes
                    and staging.
XML/LOB IN          Specifies the directory where XML and LOB data is staged.
FORMAT              Specifies the output format and the target table, for example
                    FORMAT DELIMITED INTO.

Table 1. Key HPU data migration parameters when using the MIGRATE block
Use the SELECT clause with the PART clause to migrate a subset of data when the
target environment has less storage available.
Target keys
The TARGET KEYS clause is the primary syntax element that is employed during data
migration. Information that is provided by the clause is used by HPU to generate the DB2
distribution map that is used to distribute and load data into the target table.
The TARGET KEYS clause consists of two sections:
Database partitions
This section specifies the sequence of the database partition numbers. For example,
PARTS(1:10) specifies that data is migrated to database partitions 1 through 10.
Partitioning key details
This section specifies the column details for the DB2 partitioning key of the target table.
There are three ways to specify the partitioning key information, as sketched after this
list:

The CURRENT keyword keeps the current definition of the partitioning key. It
sets the partitioning key for the target table to the same value or values as
identified in the FAST_SELECT block in the source table.

The DEFAULT keyword indicates that the first valid column in the
FAST_SELECT block is used in the partitioning key.

The explicit column list provides either column names or column numbers.
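The following sketches illustrate the three forms; the partition range and column
numbers are illustrative:

-- CURRENT: reuse the partitioning key of the source table
TARGET KEYS(CURRENT PARTS(1:8))

-- DEFAULT: use the first valid column of the FAST_SELECT block
TARGET KEYS(DEFAULT PARTS(1:8))

-- Explicit list: name the key columns by position (or by name)
TARGET KEYS((3,4) PARTS(1:8))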
Use the HPU options for target database partitions and distribution key to help
ensure that data is re-partitioned correctly when you are migrating data to a database
with a different partition map.
To determine how many ranges can be processed in parallel on a data node, use the
following calculation:

INTEGER(number of processor cores / number of database partitions)
Resource usage, especially memory, can be high for parallel migration of many ranges
and can consume more resources than intended on the source database system.
To control the resources available to HPU, do not migrate a whole range-partitioned
table in one step. Use the nbcpu HPU configuration parameter to limit the number of
processor cores available to HPU.
Migrate range partitioned tables as a sequence of ranges within a single control file;
choose the most recent range first.
Use the FORMAT DELIMITED INTO clause when the source and target table names
are different.
Use the QUIESCE YES and LOCK YES options to help ensure a safe and consistent
unload where no modifications to the table are allowed until the unload process is
complete. The QUIESCE YES option flushes all pages of the source table from the
DB2 buffer pool. The LOCK YES option places a share lock on the table to prevent
the table from being modified.
Use the CURRENT keyword in the TARGET KEYS clause when the partitioning
keys for the target and source tables are the same.
Use CURRENT PARTS(ALL) only when target tables exist on all database partitions
of the target instance.
To exclude specific partitions from the full list of partitions of the target instance, use
the EXCEPT PARTS() clause in the TARGET ENVIRONMENT clause together with
CURRENT PARTS(ALL).
Use a location other than the default /tmp path in the WORKING IN and XML/LOB
IN clauses to ensure that appropriate disk space is available when migrating tables
with XML or LOB columns. Tables with XML or LOB columns have to be staged to
files temporarily and cannot be processed through pipes.
Use a separate TARGET KEYS clause for each target table in its FAST_SELECT block
to migrate tables that are in different database partition groups.
The following example outline describes a control file migrate block that is used to
migrate data from the source_tabschema.source_tabname table on database partitions
1:N to the target_tabschema.target_tabname table on partitions 1:M; the target database
name, instance, and host are supplied in the TARGET ENVIRONMENT clause.
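A minimal sketch of such a migrate block, following the syntax of the other examples in
this paper and using placeholder instance, host, database, and partition names:

GLOBAL CONNECT TO SRCDB;
-- Migrate block: repartition from source partitions 1:N to target partitions 1:M
MIGRATE TABLESPACE
PART(1:N)
DB2 NO LOCK YES QUIESCE YES
TARGET ENVIRONMENT(INSTANCE "tgtinst" on "tgthost" IN TGTDB)
WORKING IN("/work1")
SELECT * FROM source_tabschema.source_tabname;
TARGET KEYS(CURRENT PARTS(1:M))
FORMAT DELIMITED INTO target_tabschema.target_tabname
;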
Storage Capacity
When you migrate data from backup images, storage capacity is needed on the source
data nodes for staging data from the backup files. The storage capacity that is allocated
per data node for a migrated table must be the total size of the table space on the data
node.
For tables with LOB or XML columns, named pipes cannot be used and the data must be
staged to disk first before the load phase. Ensure that there is also sufficient storage
capacity on the target data nodes by determining the size of the table, including the LOB
and XML files.
When you are unloading a table from table space containers to files, the total storage
capacity that is required equals the table size.
Processor resources
HPU uses all processor cores and each partition is always processed by at least one core.
When there are fewer cores than partitions, the number of partitions that are
processed in parallel equals the number of cores.
When there are more cores than partitions, all partitions are processed in parallel by
more than one core.
Use the nbcpu configuration parameter on the source system to set the maximum
number of processor cores used by HPU when migrating data.
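For example, a db2hpu.cfg entry that limits HPU to eight processor cores per data node;
the value is illustrative, and the appendix shows the full configuration file that was used
on the test system:

# Limit HPU to eight processor cores on this data node
nbcpu=8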
In a scenario that involves a database outage, it is recommended to allow HPU to use all
available processor cores to unload data as quickly as possible. In an environment where
queries are running during migration, configure HPU to restrict the resources that are
consumed, reducing the amount of processor capacity that is used.
For example, for a data node on an IBM PureData System for Operational Analytics with
sixteen cores and eight database partitions, and with 50% of processor cores dedicated to
HPU by setting nbcpu=8, all eight database partitions are processed in parallel with each
partition processed by one core, as INTEGER(number of cores / number of partitions) =
INTEGER(8/8) = 1. With the default settings all sixteen cores would be used, with each
partition processed by two cores.

For a data node with 4 partitions and 18 processor cores and with nbcpu set to 10, each
partition is processed by two cores, as INTEGER(number of cores / number of
partitions) = INTEGER(10/4) = INTEGER(2.5) = 2.
Memory resources
Use the bufsize parameter to define the HPU buffer size. It is recommended to use the
default HPU buffer size in most cases. The minimum accepted value is 262144 (256
kilobytes); the maximum accepted value, which is also the default, is 4194304 (4
megabytes). Use the minimum value when you are migrating from many source nodes.
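For example, a db2hpu.cfg entry that sets the minimum buffer size for a source with
many nodes (a sketch):

# HPU buffer size in bytes: 262144 (256 KB) minimum, 4194304 (4 MB) default and maximum
bufsize=262144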
Influence HPU usage of processor resources on the source system by specifying the
number of cores to be used on each data node.
The following sample output from the ulimit -a command shows the user limits on a
data node:

core file size        (blocks, -c)     unlimited
data seg size         (kbytes, -d)     soft
file size             (blocks, -f)     unlimited
max memory size       (kbytes, -m)     unlimited
open files                    (-n)     2000
pipe size          (512 bytes, -p)     64
stack size            (kbytes, -s)     65536
cpu time             (seconds, -t)     unlimited
max user processes            (-u)     4096
virtual memory        (kbytes, -v)     unlimited
To change the data seg size settings to unlimited, use the following command:
ulimit -d unlimited
Monitoring HPU migration

To monitor processor usage by HPU on the source system:

1. Identify the HPU process by using the "db2hpu" keyword in the COMMAND
column of the top command output and in the Name column of the topas command
output.

2. Verify that the processor resources used by HPU are as expected for the HPU
nbcpu parameter setting.

For example, when the nbcpu parameter is set to half the number of cores available on
the server, processor usage for a "db2hpu" process should not exceed 50%. The following
sample topas output shows contention between the db2hpum6 process and a java process:
Tue Dec 18 16:16:50 2012    Interval: 2

Kernel   63.1   |##################          |
User     36.8   |##########                  |
Wait      0.0   |                            |
Idle      0.0   |                            |

Name        PID     CPU%   PgSp   Owner
db2hpum6    16684   83.6   13.1   root
java        12192   12.7   12.2   root
lrud        1032     2.7    0.0   root
aixterm     19502    0.5    0.7   root
topas       6908     0.5    0.8   root
ksh         18148    0.0    0.7   root
gil         1806     0.0    0.0   root
The db2hpum6 and java processes are together consuming 100% of the available
processor resources, causing contention between them. It is recommended to stop the
java process or to limit the processor resources available to HPU by adjusting the HPU
nbcpu configuration parameter to lower the number of processor cores available to HPU.
Where you need the capability to control HPU during processing, use operating system
workload management capabilities and DB2 LOAD command parameters on the target
database.
A shortage of physical memory can cause paging on the source system, which can in
some cases cause HPU migration to fail. Use the vmstat command to view the pi and po
(page in and page out) columns on AIX, and the si and so (swap in and swap out)
columns on Linux. Non-zero values indicate that paging is occurring. The following
example shows sample output for the vmstat command:
kthr    memory              page                    faults         cpu        time
----- ----------- ------------------------ ------------ ----------- --------
 r  b   avm   fre re pi po fr sr cy  in  sy  cs us sy id wa hr mi se
 0  0 45483  221   0  0  0  0  1  0 224 326 362 24  7 69  0 15:10:22
 0  0 45483  220   0  0  0  0  0  0 159  83  53  1  1 98  0 15:10:23
 2  0 45483  220   0  0  0  0  0  0 145 115  46  0  9 90  1 15:10:24
To check whether the HPU process "db2hpum6" is triggering paging, refer to the paging
columns in the top command output or the PgSp column in the topas command output.
Use the db2 list utilities show detail command on the target system to check
the progress of the migration. The command shows the progress of the current DB2
LOAD command that was started by HPU. The following example shows sample output
for the command:
ID                       = 186
Type                     = LOAD
Database Name            = TEST
Member Number            = 0
Description              = [LOADID: 1891.2012-12-18-11.48.27.128866.0 (65530;32768)]
                           [*LOCAL.bcuaix.121218114833]
                           OFFLINE LOAD DEL AUTOMATIC INDEXING INSERT NON-RECOVERABLE
                           TEST.TB_SALES_FACT
Start Time               = 12/18/2012 11:48:27.158263
State                    = Executing
Invocation Type          = User
Progress Monitoring:
   Phase Number          = 1
      Description        = SETUP
      Total Work         = 0 bytes
      Completed Work     = 0 bytes
      Start Time         = 12/18/2012 11:48:27.158270
   Phase Number          = 2
      Description        = LOAD
      Total Work         = 100000 rows
      Completed Work     = 220 rows
      Start Time         = 12/18/2012 11:48:29.168612
   Phase Number          = 3
      Description        = BUILD
      Total Work         = 12 indexes
      Completed Work     = 0 indexes
      Start Time         = Not Started
Configuring HPU in an IBM PureData System for Operational Analytics

HPU needs dedicated file system space in two cases:

When backup images are used and data needs to be unloaded and staged on the
source database system

When LOB and XML data needs to be staged on the target database system before
it is loaded
HPU uses a single absolute directory path for each data host. On the target database
system, data can be loaded in parallel from multiple file system paths across each
database partition.
On an IBM PureData System for Operational Analytics, use the /bkpfs (backup and
cold storage) file system for HPU staging areas and for creating a working directory for
named pipes and LOB and XML flat files.
On an IBM PureData System for Operational Analytics, create a directory for HPU in
the /bkpfs file system on each user host and each data host, and create a /work1 link
to it on each host for pipes, LOB and XML files, or backup staging. The following
commands show how to create the required link:

mkdir /bkpfs/bcuaix/NODE0001/HPU
ln -s /bkpfs/bcuaix/NODE0001/HPU /work1
For staging backup files, set the stagedir HPU configuration parameter in the
db2hpu.cfg file to your staging directory.
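For example, a db2hpu.cfg entry matching the /work1 link created above (a sketch; the
appendix shows the same setting on the test system):

# Stage data extracted from backup images under the HPU working area
stagedir=/work1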
Use a directory structure on storage that does not conflict with storage used for table
space containers when you are staging or streaming data.
Target system settings

To share one customized HPU configuration file across all nodes:

1. Make a copy of the default configuration file, customize it as required, and save it
on a file system that is accessible across all nodes; /db2home is recommended.

2. Add the dir_cfg parameter to the default HPU configuration file on each data
node to reference the shared HPU configuration file.
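A sketch of the resulting entry in the per-node default configuration file, assuming that
dir_cfg references the location of the shared copy under /db2home; the exact value
depends on where you saved the customized file:

# Reference the shared, customized HPU configuration
dir_cfg=/db2home/bcuaix/hpu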
Data migration scenarios

Migrating data between databases with different topologies

Table space migration assumes that all target tables exist in the same database
partition group, the default pattern that is used by HPU. If the target tables exist
in different database partition groups, you must use either a separate TARGET
KEYS clause in the FAST_SELECT block for each table or two separate migrate
blocks.
Perform table space migration only when all target tables exist in the same DB2
database partition group and all target tables names match the source table names.
Use a separate migrate block for each table when different LOCK and QUIESCE
options are required for the tables.
For example, to migrate data for three tables, the BI_SCHEMA.TB_SALES_FACT fact
table and the BI_SCHEMA.TB_STORE_DIM and BI_SCHEMA.TB_CUSTOMER_DIM
dimension tables, from a source system with three data nodes each with eight database
partitions to another database with two data nodes each with four database partitions,
the following control file was used:
GLOBAL CONNECT TO BCUDB;
-- Migrate Block for Fact table
MIGRATE TABLESPACE
PART(1:28)
DB2 NO LOCK YES QUIESCE YES
TARGET ENVIRONMENT(INSTANCE "bcuaix" on "bluejay06" IN TEST)
WORKING IN("/work1")
SELECT * FROM BI_SCHEMA.TB_SALES_FACT;
TARGET KEYS((3,4) parts(1:8))
FORMAT DELIMITED INTO TEST.TB_SALES_FACT
;
-- Migrate whole TBS_DIM table space
MIGRATE TABLESPACE TBS_DIM
PART(ALL)
DB2 NO LOCK YES QUIESCE YES
Migrating a data subset

Use the WHERE clause in the SELECT statement to filter the data on different
columns for each table

Use DB2 temporal tables to access data from a point in time in the past

For example, to migrate a data extract from a range-partitioned fact table, the following
control file could be used:
GLOBAL CONNECT TO BCUDB;
-- Migrate block for the fact table
MIGRATE TABLESPACE
PART(1:28)
-- Buffer pool is flushed and write access to the table prevented during the migration
DB2 NO LOCK YES QUIESCE YES
TARGET ENVIRONMENT(INSTANCE "bcuaix" on "bluejay06" IN TEST)
WORKING IN("/work1")
-- Select statement and target keys
SELECT * FROM BI_SCHEMA.TB_SALES_FACT where STORE_ID between 101 and 250
;
DATAPARTITION ID (2)
TARGET KEYS(current parts(1:8))
FORMAT DELIMITED INTO TEST.TB_SALES_FACT
;
This scenario demonstrates migrating a subset of the fact table
BI_SCHEMA.TB_SALES_FACT. The requested data set contains the rows where the
STORE_ID column value is between 101 and 250, which exist only in data partition 2.
The following example demonstrates how to migrate the temporal dimension table
BI_SCHEMA.TB_PRODUCT_DIM data for the past period between '2012-06-18' and
'2012-06-19':
GLOBAL CONNECT TO BCUDB;
-- Migrate Block for Dimension table
MIGRATE TABLESPACE
PART(0)
DB2 NO LOCK YES QUIESCE YES
TARGET ENVIRONMENT(INSTANCE "bcuaix" on "bluejay06" IN TEST)
WORKING IN("/work1")
-- Select statement and target keys
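The statement can use the DB2 FOR SYSTEM_TIME clause to read the past period. A
minimal sketch of how the block could continue, assuming that
BI_SCHEMA.TB_PRODUCT_DIM is a system-period temporal table and that the target
table name is illustrative:

SELECT * FROM BI_SCHEMA.TB_PRODUCT_DIM
FOR SYSTEM_TIME BETWEEN '2012-06-18' AND '2012-06-19'
;
-- a TARGET KEYS clause would follow here if the target partitioning differs
FORMAT DELIMITED INTO TEST.TB_PRODUCT_DIM
;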
Migrating data between databases with different versions of DB2 and different
distribution maps

Source and target databases have distribution maps that are the same size, but
the target map is not in standard round-robin order and the map does not have
a repeating list of target partitions

The target distribution map is created by HPU by repeating the sequence of
target partitions that is specified in the PARTS() clause in the TARGET KEYS
block until the DB2 distribution map array is filled. Some changes to the map
can leave no repeating pattern (list) of target partitions in the target
distribution map. The map can be changed deliberately, for example, to
eliminate data skew issues on the database, or a redistribute command can be
run; after the redistribute command the map might have no repeating pattern
for target partitions. Customization of a map can also leave the map without a
pattern. In this case, either run single stream migration or provide the
transformed version of the full map text in the PARTS() clause of the control
file and use the standard migration. Perform the following steps to use the
transformed map:
1. Use the db2gpmap command to extract the distribution map of the target table
into a file, specifying the output file with -m map_file.out and the table with
the -t option.

2. Transform the map by replacing spaces and new line characters with a
comma with the UNIX shell:

cat map_file.out | sed 's/ /,/g' | tr '\n' ',' | sed '$s/.$//' > map_file_transformed.out

3. In the control file, provide the text of the transformed map from the
map_file_transformed.out file in the PARTS() clause within the TARGET
KEYS block.

4. Run the standard migration.
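A worked sketch of steps 1 and 2, with illustrative database and table names:

# Extract the distribution map of the target table into a file
db2gpmap -d TEST -m map_file.out -t TEST.TB_SALES_FACT

# Flatten the map: spaces and new lines become commas; trim the trailing comma
cat map_file.out | sed 's/ /,/g' | tr '\n' ',' | sed '$s/.$//' > map_file_transformed.out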
The key considerations and recommendations for single stream migration are:

Performance degradation

The standard migration, with row hashing performed by HPU and a DB2 parallel
load, offers better throughput than single stream migration, especially in larger
partitioned database environments. In a system with several data nodes, use the
default partition-to-partition migration to avoid the network bottleneck caused
by single pipe processing on the coordinator node.
Use the db2gpmap command on the database to extract the map into a file.

When no repeating pattern of partitions exists and migration performance is
crucial, provide the transformed version of the full map text in the control
file so that the standard migration, rather than single stream migration, is
used for the non-standard database partitioning.
The following example shows a control file that enforces single stream migration
for the BI_SCHEMA.TB_SALES_FACT table:
GLOBAL CONNECT TO BCUDB;
-- Migrate Block
MIGRATE TABLESPACE
PART(1:28)
DB2 NO LOCK YES QUIESCE YES
TARGET ENVIRONMENT(INSTANCE "bcuaix" on "bluejay06" IN TEST REPART NO)
WORKING IN("/work1")
SELECT * FROM BI_SCHEMA.TB_SALES_FACT where PRODUCT_ID=10;
FORMAT DELIMITED INTO TEST.TB_SALES_FACT
;
The target database partition group in which the target table exists was redistributed
recently and is no longer in round-robin order; there is no repeating pattern of target
logical nodes in the target distribution map. Since the source table is not large, specify
the REPART NO option in the TARGET ENVIRONMENT clause to migrate the data in
single stream mode. The TARGET KEYS clause is not used because repartitioning is not
performed by HPU.
Conclusion
Use IBM InfoSphere Optim High Performance Unload as a tool to streamline specific
database migration scenarios:
Unload data from offline backup images or online backup images created outside of
ETL operations to help ensure data consistency.
Use the SELECT clause in conjunction with the PART clause to migrate a subset of
data when the target environment has less storage available.
Use the HPU options for target database partitions and distribution key to help
ensure that data is re-partitioned correctly when you are migrating data to a
database with a different partition map.
Migrate range partitioned tables as a sequence of ranges within a single control file;
choose the most recent range first.
Influence HPU usage of processor resources on the source system by specifying the
number of cores to be used on each data node.
Use a directory structure on storage that does not conflict with storage used for table
space containers when you are staging or streaming data.
Ensure that a user with the name used by the source instance exists on the target
system and has the appropriate database privileges to load and insert into the target
tables.
Perform table space migration only when all target tables exist in the same DB2
database partition group and all target tables names match the source table names.
Use a separate migrate block for each table when different LOCK and QUIESCE
options are required for the tables.
Use HPU migration to migrate different subsets of data from backup files when past
data is required and the access to the production database is not allowed or
restricted, for example, during ETL and online periods.
Preserve the values from the source database for target tables with identity columns
defined.
Appendix A. Configuration of test systems used

The following test systems were used in developing this paper:

1. The source system was an IBM Smart Analytics System E7700 that consisted of
four servers; a foundation module and three data modules. The foundation
module had a single database partition that supported the catalog function, the
coordinator function, and non-partitioned metadata, staging, and data warehouse
tables, plus four database partitions. Each data module had eight database
partitions that contained partitioned data warehouse tables.
2. The target system was an IBM Smart Analytics System E7100 that consisted of
three servers; an administration module and two data modules. The
administration module had a single database partition that supported the catalog
function, the coordinator function, and non-partitioned metadata, staging, and
data warehouse tables. Each data module had four database partitions that
contained partitioned data warehouse tables.
Optim High Performance Unload for DB2 for Linux, UNIX, and Windows
version 04.02.100 (32-bit)
The HPU configuration file db2hpu.cfg on the test system used was as follows:
# HPU default configuration
bufsize=4194304
db2dbdft=BCUDB
db2instance=BCUAIX
doubledelim=binary
netservice=db2hpudm412
nbcpu=8
lock=yes
quiesce=yes
stagedir=/work1
Further reading
Advanced Recovery Solutions for IBM DB2 for Linux, UNIX and Windows
http://www-01.ibm.com/software/data/db2/linux-unix-windows/tools/datarecovery/
Contributors
Vincent Arrat
HPU Development Team
Jaime Botella Ordinas
IT Specialist & Accelerated Value Leader
James Cho
STSM & Chief Architect for IBM PureData
System for Operational Analytics
Austin Clifford
DB2 Data Warehouse QA Specialist
Bill Minor
Information Management DB2 Tooling &
Development
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and services
currently available in your area. Any reference to an IBM product, program, or service is not
intended to state or imply that only that IBM product, program, or service may be used. Any
functionally equivalent product, program, or service that does not infringe any IBM
intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do
not allow disclaimer of express or implied warranties in certain transactions, therefore, this
statement may not apply to you.
Without limiting the above disclaimers, IBM provides no representations or warranties
regarding the accuracy, reliability or serviceability of any information or recommendations
provided in this publication, or with respect to any results that may be obtained by the use of
the information or observance of any recommendations provided herein. The information
contained in this document has not been submitted to any formal IBM test and is distributed
AS IS. The use of this information or the implementation of any recommendations or
techniques herein is a customer responsibility and depends on the customer's ability to
evaluate and integrate them into the customer's operational environment. While each item
may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee
that the same or similar results will be obtained elsewhere. Anyone attempting to adapt
these techniques to their own environment does so at their own risk.
This document and the information contained herein may be used solely in connection with
the IBM products discussed in this document.
This information could include technical inaccuracies or typographical errors. Changes are
periodically made to the information herein; these changes will be incorporated in new
editions of the publication. IBM may make improvements and/or changes in the product(s)
and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only
and do not in any manner serve as an endorsement of those Web sites. The materials at
those Web sites are not part of the materials for this IBM product and use of those Web sites is
at your own risk.
IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment.
Therefore, the results obtained in other operating environments may vary significantly. Some
measurements may have been made on development-level systems and there is no
guarantee that these measurements will be the same on generally available systems.
Furthermore, some measurements may have been estimated through extrapolation. Actual
34 of 35
results may vary. Users of this document should verify the applicable data for their specific
environment.
Information concerning non-IBM products was obtained from the suppliers of those products,
their published announcements or other publicly available sources. IBM has not tested those
products and cannot confirm the accuracy of performance, compatibility or any other
claims related to non-IBM products. Questions on the capabilities of non-IBM products should
be addressed to the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal
without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To
illustrate them as completely as possible, the examples include the names of individuals,
companies, brands, and products. All of these names are fictitious and any similarity to the
names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE: © Copyright IBM Corporation 2012. All Rights Reserved.
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and
distribute these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the
application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions.
IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. If these and
other IBM trademarked terms are marked on their first occurrence in this information with a
trademark symbol (® or ™), these symbols indicate U.S. registered or common law
trademarks owned by IBM at the time this information was published. Such trademarks may
also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at Copyright and trademark information at
www.ibm.com/legal/copytrade.shtml
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.