LOAD REPLACE 25 Million Rows in 2 Seconds
Mike Walsh, Senior DBA Architect, SunTrust Bank
Session: A7
Tuesday, May 24, 2005 / 3:30 – 4:30 PM
SunTrust Banks, Inc., headquartered in Atlanta, Georgia, is one of
the nation's largest commercial banking organizations.
The Availability Problem
• Sequential files are loaded to DB2 tables daily
• Sources outside the organization
• Financial status files from batch processing
• We operate in a 24x7 world where batch windows have
disappeared.
• Tablespace is unavailable during LOAD/REPLACE
operations but the online “demand” never stops
• We need a way to close or reduce the unavailability gap.
A Historical Look at Online Balance Inquiry (OBIQ)
OBIQ is popular
• Every online CICS and internet application reads and updates
OBIQ data
• Tellers
• ATM
• Internet
• Etc
• OBIQ provides interface routines
• There is no scheduled down time.
The IMS Legacy
• IMS databases were loaded to “shadow” databases
• Online databases were stopped
• IMS datasets were renamed (shadow names to production names)
• Online databases were started
• Renaming was easy
• IMS has no “internal” identifiers like DBID, OBID, PSID
• “Shadow” databases need not be defined to the online system
The DB2 Conversion
• Accomplished in August 2001
• No basic redesign to data or processing
• Each of the 20 tables has two indexes
• DB2 internal identifiers prevented simple dataset renames so
a third party “Fast” load utility was used to minimize
unavailability.
Defining the Availability Problem With Standard
DB2 LOAD Utility
DSNT408I SQLCODE = -911, ERROR: THE CURRENT UNIT OF WORK HAS BEEN
ROLLED BACK DUE TO DEADLOCK OR TIMEOUT. REASON 00C900BA, TYPE
OF RESOURCE 00002006, AND RESOURCE NAME DQA1MW3 .SCPHADDR.000-
00001
DSNT418I SQLSTATE = 40001 SQLSTATE RETURN CODE
DSNT415I SQLERRP = DSNXRRC SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD = 102 13172746 13172922 13228501 -404545534
12714050 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD = X'00000066' X'00C9000A' X'00C900BA'
X'00C9D9D5' X'E7E32002' X'00C20042' SQL DIAGNOSTIC
INFORMATION
We also DISPLAY the DB2 objects to show what is going on in DB2 at the same
time.
Defining the Availability Problem
With “Fast” Load Utility
• “Fast” load STOPS tablespace during LOAD process so that the LOAD
can be accomplished outside of DB2.
• Attempts to access the tablespace result in SQLCODE -904
• Hundreds of such errors typically occurred during “Fast” load utility
processing.
DSNT501I -DBP2 DSNIDBET RESOURCE UNAVAILABLE 187
CORRELATION-ID=ENTRVR340227
CONNECTION-ID=CIPAP200
LUW-ID=*
REASON 00C90081
TYPE 00000201
NAME DQA1MW3.SCPHADDR
Even though the elapsed time using a “fast” load utility is shorter than the
conventional DB2 LOAD Utility, the unavailability duration is still unacceptable.
Two Availability
Improvement Strategies
[Figure: a Compare process with the Old Master and the New Master as inputs]
Essentially, the application would build a set of updates, deletes and inserts from a
comparison of the new input file with the information currently in the DB2 table.
This would create an environment which would be 100% available, even during
update operations.
While the application team considered this approach both desirable and feasible,
they felt that it would not rise to a top-priority position on their list of
development projects.
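
The comparison described above can be sketched in a few lines. This is only a minimal illustration, assuming each master is held as a dictionary keyed by account number; the function name `diff_masters` and the sample data are hypothetical and are not taken from the OBIQ application.

```python
def diff_masters(old, new):
    """Compare two keyed snapshots and return the changes that turn old into new."""
    # Keys present only in the new master become INSERTs.
    inserts = {key: row for key, row in new.items() if key not in old}
    # Keys present in both masters with different content become UPDATEs.
    updates = {key: row for key, row in new.items()
               if key in old and old[key] != row}
    # Keys present only in the old master become DELETEs.
    deletes = sorted(key for key in old if key not in new)
    return inserts, updates, deletes


# Hypothetical sample data: account number -> balance.
old_master = {"1001": 250.00, "1002": 75.10, "1003": 0.00}
new_master = {"1001": 250.00, "1002": 80.25, "1004": 12.00}

inserts, updates, deletes = diff_masters(old_master, new_master)
print("INSERT:", inserts)
print("UPDATE:", updates)
print("DELETE:", deletes)
```

Unchanged rows (account 1001 here) generate no SQL at all, which is why this approach never needs to take the table away from the online system.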
The Implementation (Preliminary Steps)
• Define “shadow” or “clone” versions of
• OBIQ database (not absolutely necessary, but there are advantages to be
discussed later)
• OBIQ Tablespaces
• Similar, but different, tablespace names unless you are using a different
database
• OBIQ Tables
• similar, but different, table names
• OBIQ Indexes
• similar, but different, index names
• Grant the application’s production ID RACF ALTER access to the underlying
DB2 datasets.
The process involves creating “shadow” definitions of all the tablespaces, tables and
indexes for each table that will be loaded.
If you elect to create a separate DB2 database, then you will be able to create the
“shadow” table with the same OBID value as the real table. This is an advantage
and will be explained in more detail later.
Our process will be dealing directly with the underlying VSAM linear datasets that
make up DB2 tablespaces and indexspaces. That means that your authorization ID
must be allowed by your security software to create, rename and delete these
datasets.
A Picture of the Process
[Figure: LOAD populates the SHADOW objects; COPY and DSN1COPY produce the “B”
datasets. RENAME(1) moves “A” (Old Production) to “C”; RENAME(2) moves “B” to
“A” (New Production).]
The LOAD Process (Initial Housekeeping Steps)
• “Shadow” objects are normally in a STOPPED state so we:
• -START
• All “shadow” tablespaces, ACCESS(RW)
• All “shadow” indexspaces, ACCESS(RW)
• For restart purposes we unconditionally TERMINATE the “shadow” load
utility ID.
• For restart purposes we unconditionally DELETE “shadow” image copy
datasets and “B” DB2 VSAM objects.
These steps give us the flexibility to restart the jobstream from the top if there is
any sort of failure.
The LOAD Process (LOAD/MODIFY)
• Execute DB2 LOAD utility to LOAD/REPLACE data in
“shadow” table.
• This populates the “shadow” with the data ultimately destined for
production.
• Execute the DB2 MODIFY utility to eliminate all references to
IMAGE COPY datasets for the “shadow” tablespace
• MODIFY RECOVERY TABLESPACE
DQA1MW3X.SHADOWTS DELETE AGE(*)
Since we reuse the image copy datasets each day, we must run the MODIFY utility
to be sure that DB2 is not “remembering” those earlier copies.
The Transformation (COPY)
• Allocate COPY output datasets
• Allocate DISP=(NEW,CATLG)
• We can reuse the dataset names because of the MODIFY and the earlier DELETE
• Execute DB2 COPY Utility
• COPY each tablespace and indexspace to individual DASD datasets
• Choose PARALLEL option for speed
• Choose large REGION size
• START “shadow” tablespace so that DSN1COPY will process
• -STA DB(DQA1MW3X) SPACENAM(SHADOWTS) ACCESS(RW)
1. We create the COPY target datasets new each time we run the jobstream. They
are only used temporarily and we have deleted them in an earlier job step.
2. DB2 Parallel Image Copy really cuts down on the elapsed time for this
operation.
The Transformation (IDCAMS DEFINE)
The “B” datasets are modeled after their production counterparts. Only the name is
different. The portion of the name that is different is shown in red.
The Transformation (DSN1COPY)
//LOADDATA EXEC PGM=DSN1COPY,
// PARM='CHECK,FULLCOPY,OBIDXLAT,RESET'
//SYSPRINT DD SYSOUT=*
//UTPRINT DD SYSOUT=*
//SYSUT1 DD DSN=TESTIU2.SX.DQA1MW3X.SCPHADDT.DB2IC1,DISP=SHR
//SYSUT2 DD DSN=TESTIU2.DSNDBC.DQA1MW3X.SCPHADDT.I0001.B001,
// DISP=OLD
//SYSXLAT DD *
521 420
2 2
1 1
3 3
DSN1COPY is used to translate the internal OBID, DBID and PSID values from the
“shadow” set of objects to those that match the production objects. The
DSN1COPY step is executed once for each tablespace and indexspace involved in
the process.
Since both the production and the “shadow” objects are defined permanently to
DB2, these values will not change from day to day.
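
Because the identifiers are stable, the SYSXLAT records can be generated rather than typed by hand. The sketch below is a hedged illustration: it assumes the usual DSN1COPY OBIDXLAT layout (first record the DBID pair, second record the PSID pair, remaining records OBID pairs) and it reads the sample values on the slide that way. The function name `sysxlat_records` is hypothetical, and the blank separator simply mirrors the slide.

```python
def sysxlat_records(dbid, psid, obids):
    """Build SYSXLAT records from (source, target) identifier pairs.

    dbid  -- (shadow DBID, production DBID)
    psid  -- (shadow PSID, production PSID)
    obids -- list of (shadow OBID, production OBID) pairs
    """
    pairs = [dbid, psid, *obids]
    return [f"{source} {target}" for source, target in pairs]


# Values taken from the slide; which one is DBID/PSID/OBID is our assumption.
records = sysxlat_records(dbid=(521, 420), psid=(2, 2), obids=[(1, 1), (3, 3)])
print("\n".join(records))
```

In practice the pairs would come from a catalog query against the shadow and production objects, which is one reason to keep both sets permanently defined.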
The Transformation
(IDCAMS Delete)
• DELETE “C” datasets to avoid dataset name collisions during upcoming
RENAME
• This step exists for restart purposes; in a normal run the “C” datasets do
not exist, so the DELETE is expected to fail with RC=12
This is just a precautionary step that deletes the “C” datasets for restart purposes.
We will rename the “A” (production) datasets to the “C” name structure later.
The Switch
(STOP Production Objects)
• -STOP production tablespace and indexspaces
• This begins our reduced period of unavailability
-STO DB(DQA1MW3) SPACENAM(SCPHADDR)
-DIS DB(DQA1MW3) SPACENAM(SCPHADDR)
-STO DB(DQA1MW3) SPACENAM(IXCRPHYS)
-DIS DB(DQA1MW3) SPACENAM(IXCRPHYS)
-STO DB(DQA1MW3) SPACENAM(IXCR1A3Y)
-DIS DB(DQA1MW3) SPACENAM(IXCR1A3Y)
If we could invoke a SWITCH process like the one online REORG uses, it would
close the last window of unavailability. Since we can’t, we must STOP each
database object to take it away from DB2. The command stream shown above also
issues the DISPLAY command, whose output we can examine in case of a problem.
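
With 20 tables and two indexes each, the command stream is easier to generate than to maintain by hand. A minimal sketch, assuming the object names are supplied as a simple list; the function name `stop_commands` is hypothetical, and the command abbreviations are the ones shown on the slide.

```python
def stop_commands(database, spaces):
    """Return a -STO/-DIS command pair for each tablespace or indexspace."""
    commands = []
    for space in spaces:
        commands.append(f"-STO DB({database}) SPACENAM({space})")
        # The DISPLAY output is kept only for problem determination.
        commands.append(f"-DIS DB({database}) SPACENAM({space})")
    return commands


commands = stop_commands("DQA1MW3", ["SCPHADDR", "IXCRPHYS", "IXCR1A3Y"])
print("\n".join(commands))
```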
The Switch (RENAME #1)
• Rename “A” production objects to “C” datasets
ALTER TESTIU2.DSNDBC.DQA1MW3.SCPHADDR.I0001.A001 -
NEWNAME(TESTIU2.DSNDBC.DQA1MW3.SCPHADDR.I0001.C001)
ALTER TESTIU2.DSNDBD.DQA1MW3.SCPHADDR.I0001.A001 -
NEWNAME(TESTIU2.DSNDBD.DQA1MW3.SCPHADDR.I0001.C001)
Here we show the “A” (production) datasets being RENAMEd to the “C” names.
The Switch (RENAME #2)
• Rename “B” shadow objects to “A” production dataset names
ALTER TESTIU2.DSNDBC.DQA1MW3X.SHADOWTS.I0001.B001 -
NEWNAME(TESTIU2.DSNDBC.DQA1MW3.SCPHADDR.I0001.A001)
ALTER TESTIU2.DSNDBD.DQA1MW3X.SHADOWTS.I0001.B001 -
NEWNAME(TESTIU2.DSNDBD.DQA1MW3.SCPHADDR.I0001.A001)
Here we show the “B” datasets being RENAMEd to the “A” production names.
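
Both renames follow the same pattern for the cluster (DSNDBC) and data (DSNDBD) components, so the IDCAMS statements can be produced from the dataset name parts. This is a sketch only: the helper names `dataset_name`, `alter_newname` and `rename_statements` are hypothetical, and the names used are the ones from the slides.

```python
def dataset_name(hlq, component, database, space, piece):
    """Assemble a DB2 VSAM dataset name from its qualifiers."""
    return f"{hlq}.{component}.{database}.{space}.I0001.{piece}"


def alter_newname(old, new):
    """Format one IDCAMS ALTER ... NEWNAME statement with a continuation."""
    return f" ALTER {old} -\n   NEWNAME({new})"


def rename_statements(hlq, source, target):
    """source and target are (database, space, piece) tuples."""
    statements = []
    for component in ("DSNDBC", "DSNDBD"):
        old = dataset_name(hlq, component, *source)
        new = dataset_name(hlq, component, *target)
        statements.append(alter_newname(old, new))
    return statements


production = ("DQA1MW3", "SCPHADDR", "A001")
# RENAME #1: production "A" datasets move aside to the "C" names.
rename_1 = rename_statements("TESTIU2", production, ("DQA1MW3", "SCPHADDR", "C001"))
# RENAME #2: "B" datasets take over the production "A" names.
rename_2 = rename_statements("TESTIU2", ("DQA1MW3X", "SHADOWTS", "B001"), production)
print("\n".join(rename_1 + rename_2))
```

Generating both sets from one definition of `production` keeps the two renames consistent, which matters because a mismatch would leave production without a dataset when it is restarted.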
The Switch
(START Production Objects and STOP Shadow Objects)
Next we START the production database objects, and our period of unavailability
ends. In our environment, the time to STOP, RENAME and then START is usually
less than 2 seconds.
After the SWITCH is complete, we STOP the “shadow” objects to be sure there is
no attempt to access them until the load process is run the next time.
The Switch (Cleanup for tomorrow)
As a cleanup operation, we DELETE the “C” datasets because we don’t need them
any more. We also use a JCL step to delete the DASD image copy datasets.
DASD Consumption
• The “shadow” load strategy can consume a lot of DASD, even if
only for a short time
1. The original production objects
2. The “shadow” objects
3. The DASD image copy datasets
4. The “B” datasets loaded by DSN1COPY
• Timely deletion of datasets can reduce this somewhat
1. Delete “shadow” objects just after successful COPY
2. Delete image copy datasets after successful DSN1COPY
• Minimum DASD high water mark is 3 times that of production
The “shadow” load strategy is not free because it does consume a potentially large
amount of DASD space. We list the various “copies” of the data that are produced
during the process.
The minimum high water mark is approximately 3 times the size of the production
tablespace and indexspaces.
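
The arithmetic behind the high water mark can be checked with a small model. This is a rough sketch that assumes every copy (shadow, image copy, “B” datasets) is about the same size as production and counts space in units of one production copy; the function name `high_water_mark` and the step labels are hypothetical.

```python
def high_water_mark(steps):
    """Return the peak DASD usage over a sequence of (label, delta) steps."""
    level = 0
    peak = 0
    for _label, delta in steps:
        level += delta
        peak = max(peak, level)
    return peak


# Nothing is deleted until the end: all four copies exist at once.
no_cleanup = [
    ("production objects", +1),
    ("shadow objects", +1),
    ("image copy datasets", +1),
    ("B datasets", +1),
]

# Timely deletion: drop each intermediate copy as soon as it has been used.
timely_cleanup = [
    ("production objects", +1),
    ("shadow objects", +1),
    ("image copy datasets", +1),
    ("delete shadow objects", -1),
    ("B datasets", +1),
    ("delete image copy datasets", -1),
]

print(high_water_mark(no_cleanup))
print(high_water_mark(timely_cleanup))
```

Under these assumptions the peak is 4 times production without cleanup and 3 times production with timely deletion, which matches the minimum quoted on the slide.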
A “Shadow” Load Variation
August/September 2004 “z/Journal” article entitled “Using
FlashCopy Version 2 for DB2 UDB for OS/390 & z/OS
Object-Level Migration” by Daniel L. Luksetich
The cited article did not specifically address the issue of “shadow” loading of DB2
tables, but it did contain some helpful and relevant information about the internal
identifiers stored in DB2 VSAM datasets.
Using the information about DB2’s ignoring the OBID values in index pages, we
can streamline the “shadow” load process significantly. We discuss this
streamlining on the following visuals.
A Picture of the “Variation”
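[Figure: LOAD populates the SHADOW objects; the REPAIR Utility adjusts their
internal identifiers. RENAME(1) moves “A” (Old Production) to “C”; RENAME(2)
moves the shadow datasets to “A” (New Production).]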
1. The “shadow” table is loaded, as before. In this situation, we have created the
“shadow” table with the OBID parameter that matches the OBID of the
production counterpart. This means that the data pages for the tablespace
already have the correct OBID.
2. The other internal values that matter can be set by the REPAIR utility.
Remember that the OBID in the index pages is ignored by DB2.
3. The RENAME process is the same as the original strategy.
The Details (Preliminary Steps)
• Create new database to hold “shadow” objects
• Create “shadow” tablespace, as before
• Create “shadow” table with OBID that matches the production
counterpart. This requires that “shadow” objects be in a different database
from production.
• Create “shadow” indexes as before.
• Grant the application’s production ID RACF ALTER access to the underlying
DB2 datasets.
The only change here is that we ensure that the “shadow” objects are created in a
separate DB2 database. This will allow us to create the table using the OBID
parameter that matches the production table OBID.
The Revised LOAD Process (LOAD)
The image copy is no longer needed, so both the MODIFY and COPY steps are eliminated.
The Revised Transformation (REPAIR)
Run the DB2 REPAIR Utility to change object identifiers in page headers.
Here is an example of the use of the REPAIR utility to alter the internal identifiers
for both the tablespace and the indexspaces.
The Revised Transformation
(IDCAMS Delete)
• DELETE “C” datasets to avoid dataset name collisions during upcoming
RENAME
• This step exists for restart purposes; in a normal run the “C” datasets do
not exist, so the DELETE is expected to fail with RC=12
No change here.
The Revised Switch
(STOP Production & Shadow Objects)
• -STOP production tablespace and indexspaces for rename
• This begins our reduced period of unavailability
No change here.
The Revised Switch (RENAME #1)
• Rename “A” production objects to “C” datasets
ALTER TESTIU2.DSNDBC.DQA1MW3.SCPHADDR.I0001.A001 -
NEWNAME(TESTIU2.DSNDBC.DQA1MW3.SCPHADDR.I0001.C001)
ALTER TESTIU2.DSNDBD.DQA1MW3.SCPHADDR.I0001.A001 -
NEWNAME(TESTIU2.DSNDBD.DQA1MW3.SCPHADDR.I0001.C001)
No change here.
The Revised Switch (RENAME #2)
• Rename “A” shadow objects to “A” production dataset names
ALTER TESTIU2.DSNDBC.DQA1MW3X.SHADOWTS.I0001.A001 -
NEWNAME(TESTIU2.DSNDBC.DQA1MW3.SCPHADDR.I0001.A001)
ALTER TESTIU2.DSNDBD.DQA1MW3X.SHADOWTS.I0001.A001 -
NEWNAME(TESTIU2.DSNDBD.DQA1MW3.SCPHADDR.I0001.A001)
No change here.
The Revised Switch
(START Production Objects)
No change here.
The Revised Switch
(Cleanup for tomorrow)
• Delete “C” datasets
DELETE (TESTIU2.DSNDBC.DQA1MW3.SCPHADDR.I0001.C001) PURGE
DELETE (TESTIU2.DSNDBC.DQA1MW3.IXCRPHYS.I0001.C001) PURGE
-
-
-
No change here.
Comparing one strategy with another
• The revised “shadow” load strategy can consume less DASD than
the original
1. The original production objects
2. The “shadow” objects
• Minimum DASD high water mark is 2 times production
• The revised strategy is faster
• No COPY utility
• No DSN1COPY utility
This visual summarizes the major changes in the streamlined version of the
“shadow” load process.
Complications to Expect (Names and numbers of things)
• Database Names
• Most flexible if “shadow” is in a database different from production objects
• Allows tablespace name to be retained in “shadow”
• Allows use of OBID parameter on “shadow” table create DDL
• Separate database name is required for the “revised” shadow load approach.
• Indexspace Names
• Long index names (> 8 characters) will certainly produce different
indexspace names.
• Consider building an automation tool to keep indexspace names and index
OBIDs correctly matched up. See notes for a reference.
For a suggested way to build your own tool see presentation “DBA Toolkit - Try this
yourself at home” by Mike Walsh in the 2002 IDUG North American Conference
proceedings.
More Considerations
• Dataset extensions
• DB2 extends a tablespace to additional VSAM datasets automatically.
(…A002,…A003)
• The “shadow” load process does not automatically recognize additional
VSAM objects so you must be aware of the number of VSAM datasets
required and code accordingly
• REORG Fast Switch
• Fast Switch will change VSAM object name structure
• Tables loaded daily are rarely REORG’ed
The LOAD process automatically acquires additional VSAM datasets when the
capacity of a single dataset is exceeded. The “shadow” load process does not
automatically recognize this condition.
If you have this situation, you may have to allocate additional datasets even though
they may not be used. Additionally, the JCL in your job may need to be enhanced
with condition code checking to tolerate such errors as attempting to DELETE a
non-existent dataset.
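
One way to “code accordingly” is to enumerate every piece explicitly so that the DEFINE, RENAME and DELETE steps all cover the same set of datasets. A minimal sketch, assuming the piece count is known in advance and the …A001, …A002 suffix pattern shown on the slide; the function name `piece_names` is hypothetical.

```python
def piece_names(prefix, count, version="A"):
    """List the VSAM dataset names for a tablespace with `count` pieces."""
    if count < 1:
        raise ValueError("a tablespace has at least one dataset")
    return [f"{prefix}.I0001.{version}{number:03d}"
            for number in range(1, count + 1)]


names = piece_names("TESTIU2.DSNDBC.DQA1MW3.SCPHADDR", 3)
print("\n".join(names))
```

Feeding the same list into each job step avoids the case where a rename handles A001 but leaves a stale A002 behind.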
SunTrust Experience
• OBIQ Balance Tables
• Range from 9 to 25 million rows
• LOAD Utility requires 6 to 15 minutes elapsed
• Switch takes 1 to 2 seconds elapsed
• Outage more dependent on number of objects (indexes and partitions) than on
row counts
• SQLCODE -904 typically occurs fewer than 10 times during the entire
“shadow” load process for all 20 tables.
Tool Vendor Opportunities
High Availability LOAD
• Create “shadow” datasets using target tablespace and
indexspace attributes but not defined to DB2.
• Perform LOAD to “shadow” objects outside DB2’s control.
• Perform -STOP, IDCAMS RENAME, -START followed by
IDCAMS DELETE of old production objects
or (even better)
• Perform DRAIN, RENAME or FASTSWITCH followed by
IDCAMS DELETE of old production objects
This visual outlines the author’s vision of how such a vendor offering might be
implemented.