DB2 for z/OS Course 1
Student Notebook
ERC 1.0
Trademarks
IBM® is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United
States, or other countries, or both:
AD/Cycle AS/400 BookManager
C/370 CICS Cloudscape
DB2 DB2 Connect DB2 Extenders
DB2 Universal Database DFSMSdss DFSMShsm
Distributed Relational
DFSORT DRDA
Database Architecture
FlashCopy Footprint GDDM
IMS iSeries Language Environment
Lotus Multiprise MVS
OS/2 OS/390 Parallel Sysplex
pSeries QMF RACF
RMF SAA SP
SQL/DS VTAM WebSphere
xSeries z/Architecture z/OS
zSeries 1-2-3 400
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Microsoft, Windows and Windows NT are trademarks of Microsoft Corporation in the
United States, other countries, or both.
Intel is a trademark of Intel Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Linux is a registered trademark of Linus Torvalds in the United States and other countries.
Other company, product and service names may be trademarks or service marks of others.
The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
Contents
Trademarks . . . . . xxi
Course Description . . . . . v
Agenda . . . . . vii
viii DB2 UDB for z/OS V8 Transition © Copyright IBM Corp. 2004
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V2.0
Student Notebook
Bibliography . . . . . X-1
Index . . . . . X-5
Duration: 3 days
Purpose
This course provides a detailed technical description of DB2 UDB for
z/OS Version 8. You will learn about the functional enhancements and
the technical requirements of this significant new version of DB2 UDB
for z/OS.
Audience
System and database administrators, application developers, and
other individuals who need a technical introduction to selected new
features of Version 8.
Prerequisites
You should have practical experience with DB2 UDB for z/OS and
OS/390 and have a basic knowledge of the functions and usage of
DB2 UDB for z/OS and OS/390 Version 7.
Objectives
After completing this course, you should be able to:
• Describe selected new features and enhancements of DB2 UDB
for z/OS Version 8
• Evaluate the usefulness of the new features and enhancements of
DB2 UDB for z/OS Version 8
Contents
Scalability
Availability
SQL Enhancements
e-Business Features
Unicode in DB2 for z/OS
Network Computing Enhancements
Application Enablement Enhancements
Utility Enhancements
Performance Enhancements
Data Sharing Enhancements
Installing and Migrating DB2 UDB for z/OS Version 8
Curriculum Relationship
CG37 DB2 UDB for z/OS and OS/390 - V7 Transition
Agenda
Day 1
Welcome
Unit 1 - Scalability
Unit 2 - Availability
Unit 3 - SQL Enhancements
Day 2
Unit 3 - SQL Enhancements (Cont)
Unit 4 - e-Business
Unit 5 - Unicode in DB2 for z/OS
Unit 6 - Network Computing
Unit 7 - Application Enablement
Unit 8 - Utility Enhancements
Day 3
Unit 9 - Performance Enhancements
Unit 10 - Data Sharing Enhancements
Unit 11 - Installation and Migration
List of Topics
The problem
The 64-bit architecture support
64-bit processor
The z/OS architecture
Real
Virtual
Pre-DB2 V8 real storage exploitation
DB2 V8's virtual storage expansion
Moving above the 2 GB bar
More partitions
More log data sets
More tables in a JOIN
Longer SQL statements
Longer index keys and predicates
Notes:
DB2 for z/OS V8 offers extensive synergy with the zSeries platform and the z/OS operating
system, and breaks through many of the limitations previously imposed by the operating
system. These enhancements have a positive effect on scalability and availability by
delivering large address spaces with the exploitation of the 64-bit virtual addressing
provided by the z/Architecture. With this support, DB2 can keep pace with the
explosive demands of e-business, transaction processing, and business intelligence.
With DB2 V8 you can manage more data with larger buffers in memory, utilize larger
control storage areas such as the EDM and RID pools, and gain more capacity for
concurrent locks. You can also access your data through more partitions and join more
tables in a single SQL statement.
In this unit, we discuss the following topics:
• The 64-bit architecture support
• More partitions
• More log data sets
• More tables in join
[Figure: on a G6 processor, memory-intensive applications (TSO, Notes, Baan, and so on) and DB2-based applications suffer paging and page-movement delay to/from ESTORE as CPU power grows; the bigger the image, the more the real storage constraint hurts.]
Adding CPU capacity resulted in little or no additional real work being done
Paging overhead increased due to the 2 GB (31-bit) central storage limit
Path length increased due to high-MIPS and N-way system effects in Parallel Sysplex multisystem environments
Expanded storage had provided an excellent interim solution
G5/G6 implemented the enhanced MOVE PAGE instruction
Notes:
This visual demonstrates why the new 64-bit architecture was introduced. It shows that, for
several storage-starved environments, adding CPU capacity does not allow any further
growth, and results in little or no additional real work being done. The limiting factor was
that the paging overhead increased due to the 2 GB (31-bit) central storage limit.
Expanded storage has provided an excellent interim solution, and the G5/G6 processors
brought some relief with the implementation of the enhanced MOVE PAGE instruction, but
systems with large and variable workloads need the storage constraint removed.
DBM1 Virtual Storage Constraint Relief
The DBM1 storage constraint, the single biggest inhibitor to scaling workload on 31-bit machines, will become an even larger issue as z/Architecture and large 64-bit main memories take hold in the field and processor power keeps increasing
What has caused the dramatic increase in DBM1 storage?
Larger workloads
New DB2 functionality
Larger REAL storage available on new processors
z/Architecture and z/OS support for REAL storage > 2 GB make the problem worse: reduced paging and faster CPUs promote larger workloads
Notes:
Over the years, virtual storage usage has grown dramatically in DB2's DBM1 address
space. This storage growth has been fueled by larger workloads, new functions, and larger
real storage available on mainframe processors. The latter, in particular, has allowed
customers to run workloads that in the past would have been definitely limited by paging
overhead.
With the arrival of z/Architecture and z/OS support for real storage larger than 2 GB, we
have seen that the problem may become worse, since the reduced paging, faster CPUs,
and higher multi-processor levels can promote larger and larger workloads. The DBM1
2 GB virtual storage constraint, already the single biggest inhibitor to scaling DB2
workloads on 31-bit machines, becomes an even larger growth inhibitor as z/Architecture
and large 64-bit main memories continue to take hold in the field.
Notes:
The figure above shows a typical customer case. The biggest storage consumers inside
the DBM1 address space are usually the buffer pools, followed by the EDM pool.
Remember also that when you use the local dynamic statement cache
(KEEPDYNAMIC(YES) in combination with CACHEDYN=YES), it can take up a
large amount of virtual storage. In addition, do not forget
that every open compressed data set uses a 64 KB compression dictionary that has to be
loaded into storage.
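As a rough illustration of how those dictionaries add up, here is a sketch in Python; the data set count is a hypothetical example, not a figure from the course:

```python
# Each open compressed data set loads a 64 KB compression dictionary
# into DBM1 storage, so the total footprint grows linearly with the
# number of open compressed data sets.
DICTIONARY_SIZE = 64 * 1024  # bytes per open compressed data set

def dictionary_storage(open_compressed_datasets: int) -> int:
    """Bytes of DBM1 virtual storage taken by compression dictionaries."""
    return open_compressed_datasets * DICTIONARY_SIZE

# A hypothetical system with 2,000 open compressed data sets:
print(dictionary_storage(2000) // (1024 * 1024), "MB")  # -> 125 MB
```

Two thousand open compressed data sets would thus pin roughly 125 MB of the 2 GB DBM1 address space.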
And last but not least, it is important to remember that some storage is shared between all
users of the system; the largest areas being CSA and ECSA. When you use the IRLM
option PC=NO (in V7), your IRLM locks are stored in ECSA. Also, when you have
WebSphere running on the same LPAR, WebSphere usually consumes a lot of ECSA
storage as well, and all storage allocated to CSA/ECSA cannot be used for addressing the
(extended) private areas.
There is an introductory article by Mary Petras and John Campbell on this subject. It can be
found at:
http://www.idug.org/idug/member/journal/mar00/storage.cfm
[Figure: the 64-bit solution stack - zSeries hardware, z/OS 1.3, and DB2 Version 8.]
Notes:
The solution consists of three parts:
• 64-bit capable hardware (zSeries)
• 64-bit capable operating system:
- 64-bit real storage support (OS/390 V2R10 and above)
- 64-bit virtual storage support (z/OS 1.2 and above)
• 64-bit capable subsystems (DB2 for z/OS Version 8)
Before we explore the 64-bit support in DB2 V8, we first look at the hardware and operating
system effort of going to 64-bit.
Notes:
IBM launched the zSeries in the year 2000. This class of servers was designed for high
performance data and transaction serving and was optimized to handle the volatile
demands of the e-business climate.
zSeries is a family of processors that use the new z/Architecture (formerly known as
ESAME Architecture). The z/Architecture is commonly known as “64-bit” architecture,
although it provides much more than 64-bit capabilities.
Currently, three types of zSeries machines are available, each in a range of models:
• z990, with up to 256 GB of memory
• z900, with up to 64 GB of memory
• z800/z890, with up to 32 GB of memory
64-bit Memory Architecture (The Solution)
[Figure: evolution from central storage plus expanded storage to a single large 64-bit central storage; picture not to scale.]
Notes:
This visual gives us a pictorial representation of the evolution of the memory management
from the 24-bit to the 31-bit to the 64-bit support.
The ESA/390 architecture limits the amount of central storage that can be configured to a
single OS/390 image to 2 GB. OS/390 Version 2 Release 10 removes the 2 GB real
storage restriction by utilizing the new 64-bit architecture. OS/390 V2R10 supports up to
128 GB of central storage, when running in z/Architecture mode. The 128 GB limit is a
software restriction that was implemented to prevent the Page Frame Table from exceeding
1 MB. (This restriction is still in place in z/OS V1.5, and may be lifted in a future z/OS
release.)
In z/OS V1.2, IBM delivered the initial 64-bit virtual storage management support. With the
new z/OS 64-bit operating environment, an application address space can have 2 to the
power of 64 (or 2**64) virtual addresses, with backing by real storage as needed. With this
new architecture, z/OS delivers the functions to meet the needs of growing e-business
application environments that will dominate future commercial data processing while
maintaining today’s critical applications.
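To put the jump from 31-bit to 64-bit virtual addressing into perspective, a quick back-of-the-envelope calculation (Python used purely for the arithmetic):

```python
# Address range sizes implied by the addressing modes discussed above.
bar = 2 ** 31   # the 2 GB "bar" of 31-bit addressing
full = 2 ** 64  # the full 64-bit virtual address range

print(bar // 1024 ** 3, "GB")       # -> 2 GB
print(full // 1024 ** 3, "GB")      # -> 17179869184 GB (16 EB)
print(full // bar, "times larger")  # -> 8589934592 times larger
```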
[Figure: with 64-bit addressing, address spaces such as IRLM, WebSphere, and 31-bit applications each retain their 2 GB (31-bit) layout, while storage can now extend above the bar.]
Notes:
When IPLing the system in 64-bit mode (ARCHLVL = 2), provided you have the right
zSeries hardware, and are at least at OS/390 V2R10, you can start exploiting the larger
amounts of real memory that are available on the zSeries machines. (Note that starting
with z/OS V1.2, the system will enforce the architecture level based on the IPL processor,
and you should therefore no longer specify the ARCHLVL in the LOADxx member.)
You can exploit larger real memory with DB2 Version 6 and Version 7, for example, by
exploiting data space buffer pools. So even before migrating to Version 8, you can already
take advantage of the 64-bit real support and large amounts of real memory. For more
information on how DB2 Version 6 and 7 can exploit 64-bit real addressing on the new
zSeries hardware, see Figure 1-12 "DB2 (Pre-V8) z/Architecture Exploitation" on page
1-20.
REAL Storage Support on zSeries - Provided
Constraint relief for workloads limited by 2 GB real storage
Enabled value from a 16-way multiprocessor on a z900, or 32-way on a z990
Allowed consolidation of LPARs
Notes:
OS/390 V2R10 and z/OS have provided the 64-bit real storage addressability needed to
scale in real memory addressing. OS/390 V2R10 has the ability to run in either 31-bit mode
or 64-bit mode on a zSeries, while z/OS only runs in a 64-bit mode real storage
environment. (For completeness, you can run z/OS in 31-bit mode on a zSeries machine
when using the z/OS Bimodal Migration Accommodation offering. This offering is only
provided on z/OS V1.2, 1.3, and 1.4.)
z/OS 1.2 and later releases provide 64-bit virtual storage exploitation of the addressing
range above 2 GB.
Basically, R10 has provided initial z/Architecture real addressing support (up to 128 GB of
central storage) and the support for 24-bit, 31-bit, and 64-bit applications.
z/OS 64-bit real storage support has provided significant and transparent reduction of
paging overhead, now only to disk, and real storage constraint relief for workloads limited
by the 2 GB of real storage, by configuring all z990, z900, z800, and z890 memory as REAL.
The elimination of Expanded Storage support has been handled by z/OS with minimal
customer impact while reducing memory management overhead. However, remember that
in the absence of expanded storage, paging will now be to disk only. Therefore it is
important to have enough real storage to back up your workload in order to avoid paging.
Large real storage memory support has also enabled the exploitation of 16-way
multiprocessors on the z900, and up to 32-way on the z990, and allowed the consolidation of
LPARs. For more information on 64-bit real exploitation, see the z/OS migration Web site at:
http://www.ibm.com/servers/eserver/zseries/zos/installation/
REAL Storage Support on zSeries - Migration
Ease of migration
z/OS and z/OS.e expect to run in 64-bit mode on z990/z900/z800
New zSeries hardware supports both ESA and z/Architecture modes
z/OS tolerates the 31-bit architecture of G5/G6 and Multiprise servers
Mix 31-bit and 64-bit real systems in LPARs or a sysplex
Minimal sysprog setup required
System services reimplemented to use central storage instead of ESTOR
Much of the value is delivered by the operating system itself; pageable storage can be backed anywhere, so everyone benefits
Bimodal Migration Accommodation offering (z/OS 1.2, 1.3, 1.4)
Notes:
IBM has put a lot of effort into making it easy to migrate to z/OS and zSeries. The zSeries
hardware supports both ESA (ARCHLVL 1) and z/Architecture (ARCHLVL 2) modes.
z/OS can be run on the 31-bit architecture hardware of G5/G6 and Multiprise Servers.
(It is worth noting that z/OS V1.5 will be the last release to do so. Starting with z/OS V1.6,
a zSeries machine is required.) However, when running z/OS and z/OS.e on the
z990/z900/z800/z890 hardware, these operating systems expect to run in 64-bit mode. To
help out customers that are reluctant to run in 64-bit mode from the day they install a
zSeries machine, a z/OS Bimodal Migration Accommodation offering is available. It is
available at no charge as a download from the Web. It gives customers the security of
knowing they can fall back to 31-bit mode if there are any 64-bit problems during their
migration. The 31-bit mode is fully supported during the Accommodation period. The
Bimodal Migration Accommodation offering is only available for z/OS 1.2, 1.3, and 1.4.
Important: It should be noted that DB2 Version 8 can only run in 64-bit mode. Bimodal
mode is not supported for DB2 V8. When you are using the Bimodal support as an
“insurance policy” in case something goes wrong after migrating to z/OS and zSeries,
you need to create a stable 64-bit environment, where you no longer need to be able to
“fall back” to 31-bit mode, before migrating to DB2 Version 8.
You can mix 31-bit and 64-bit real systems in LPARs or in a sysplex.
With the 64-bit support, system services have been re-implemented to use central storage
instead of expanded storage (ESTOR).
Large Real Memory Support
For all versions of DB2 (no application code changes required)
Facilitates system consolidation activities
Allows multiple DB2 systems (single images or data sharing) on a single OS image without significant paging activity
Larger CF structures (> 2 GB) for DB2 were also introduced with zSeries at CFLEVEL 12
Notes:
All DB2 versions can benefit from larger real memory support. With more real storage
available on a single LPAR, you may consider consolidating LPARs, or just start taking
advantage of using the larger amounts of real storage that is available, by resizing your
buffer pools, for example. As always, make sure that your workload is backed by
enough real storage; we do not want to introduce too much paging in the system by
overcommitting real storage.
However, you may want to take the opportunity to re-evaluate how DB2 uses its buffer pool
storage: virtual pools versus hiperpools versus data space buffer pools.
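The sizing rule of thumb above can be sketched as a simple check; the pool sizes and the 80% headroom figure below are illustrative assumptions, not DB2 or IBM recommendations:

```python
# Sanity check: total buffer pool allocations should be comfortably
# backed by real storage, or paging will creep back into the system.
def pools_fit(real_gb: float, pool_sizes_gb: list, headroom: float = 0.8) -> bool:
    """True if the pools fit within `headroom` of the LPAR's real storage."""
    return sum(pool_sizes_gb) <= real_gb * headroom

print(pools_fit(32.0, [4.0, 8.0, 2.0]))  # 14 GB of pools, 32 GB LPAR -> True
print(pools_fit(16.0, [8.0, 8.0]))       # 16 GB of pools, 16 GB LPAR -> False
```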
Notes:
Even though this course is focusing on DB2 V8, you do not have to wait until you get to V8
before you can start taking advantage of the 64-bit capabilities of the hardware and
operating system. Pre-V8 systems can already get a serious performance boost from being
able to exploit the increased number of CPUs and faster CPUs of the z/Architecture
machines, as well as the support for real memory beyond 2 GB. As mentioned earlier, even
pre-V8 versions of DB2 can make good use of this extra memory and increase in
processing power, as we will demonstrate in the next topics.
In order to exploit real storage above the 2 GB bar in releases prior to V8, you need to have
the PTFs for the following APARs installed on Version 6. You almost certainly have those
installed, as this support was introduced in 1999.
• PQ25914 allows data spaces and virtual buffer pools to be backed by real storage
above 2 GB.
• PQ36933 avoids ABEND0D3 when testing whether a page is in memory with real
storage above the 2 GB bar.
DB2 V7 has this support in the base code. Also make sure to be current on OS/390 and
z/OS maintenance, to avoid real storage manager problems that were discovered at the
early stages of 64-bit real addressing support.
For more information about which DB2 versions can run on zSeries hardware, see:
http://www-1.ibm.com/support/docview.wss?rs=64&context=SSEPEK&q1=64+bit+support&uid=swg21009394&loc=en_US&cs=utf-8&lang=en+en
Notes:
Increased processing power is always good news for DB2 users, as it can drive business
workloads to greater heights.
However, there are also problems associated with increased processor capacity, especially
when it is not balanced with equal improvements in I/O bandwidth.
When the workload increases, it usually requires more data to be processed, which
subsequently means more I/O. Over the years I/O latency has improved quite a lot, but it
has not been able to keep up with the increases in CPU speed. For that reason, I/Os
become more and more precious, and the amount of I/O required can have a big impact on
the performance of applications and systems in general. The solution to this problem is to
avoid I/O as much as possible, and one of the ways to do so is to cache more data in
memory. However, more data caching means we need larger amounts of memory to be
able to cache that data.
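The relationship described above can be made concrete with a small sketch; the access counts and hit ratios are made-up example values:

```python
# Physical I/O falls directly with the buffer hit ratio, which is why
# caching more data in larger pools trades plentiful memory for
# scarce I/O bandwidth.
def physical_reads(page_accesses: int, hit_ratio: float) -> int:
    """Physical reads remaining after the buffer pool absorbs the hits."""
    return round(page_accesses * (1.0 - hit_ratio))

for ratio in (0.80, 0.95, 0.99):
    print(f"hit ratio {ratio:.2f}: {physical_reads(1_000_000, ratio)} reads")
# Raising the hit ratio from 0.80 to 0.99 cuts reads from 200,000 to 10,000.
```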
Larger Buffer Pools
Large and ever-cheaper main memories available
Gain performance advantages
Restrictions
Due to the DBM1 VSTOR constraint, DB2 forces maximum BPOOL sizes < machine memory (on a 2 GB central storage LPAR):
Size of VPOOL limited to 1.6 GB (typically only 1 GB due to other constraints)
Frequently in the range 400-800 MB
Problem
Cannot increase virtual pools as needed to avoid I/O for many workloads, specifically those which repeatedly access the same data elements and index keys
Solution
Hiperpools (DB2 Version 3)
Notes:
With the very large and ever-cheaper main memory capacity that is available on the current
and upcoming z/Architecture machines (currently 10s of GB, into the 100s of GB now), it is
becoming feasible for customers to configure very large buffer pools to gain significant
performance advantages. However, due to DBM1 virtual storage constraints, DB2 currently
enforces maximum buffer pool sizes that are less than the memory capacities of these
machines.
The total size of virtual pools is limited to 1.6 GB. However, in actual practice, customers
typically cannot configure more than 1.0 GB for their buffer pools, due to DBM1 virtual
storage constraints. EDM pool, buffer pool control blocks, VSAM control blocks, and
compression dictionaries are other sizeable contributors to the demands on DBM1.
The fact that DBM1 virtual storage was becoming an inhibitor to DB2 scalability became
clear in the early nineties. To reduce the size of the virtual pool inside the DBM1 address
space, DB2 Version 3 introduced hiperpools. Hiperpools “live” in so-called
expanded-storage-only hiperspaces outside the DBM1 address space, so they do not
compete for real storage; they brought excellent virtual storage constraint relief for many years.
Hiperpools
Buffers in hiperspaces (hiperpools) offer some relief
Expanded storage exploitation - cheaper memory option
Benefits
High-performance data access
Larger amount of data in memory
I/O elimination - large buffer pool without OS paging problems
Still requires a substantial virtual pool size for effective usage
Limits on size (8 GB maximum hiperpool size)
Contains ONLY clean pages - NOT updated pages that have not yet been written to disk
Page addressable, hence buffers must be in the virtual pool before use
No direct I/O in and out of a hiperpool
Solution
Data space buffer pools
Notes:
Although a very good solution for many years, and exploited by many customers,
hiperpools come with a set of limitations of their own:
• DB2 V7 limits the total size of hiperpools to 8 GB. This limit could be raised, however
hiperpools have several other drawbacks which make them undesirable as a long term
solution:
- They are only page addressable, not byte addressable, and therefore buffers must
be moved into the virtual pool before they can be used.
- They can contain only clean pages.
- You cannot do I/O directly into or out of a hiperpool.
- The hiperpool page control blocks reside in the DBM1 address space and thus
contribute to virtual storage constraints.
- Hiperpools require a fairly substantial virtual pool size for effective use. Typically, the
hiperpool to virtual pool size is on the order of 2:1 to 5:1. Therefore, virtual pool size
ultimately limits hiperpool size.
- A separate set of latches is used to manage hiperpool and virtual pool buffers, so as
the frequency of page movement between virtual pools and hiperpools increases,
the Least Recently Used (LRU) management of these pools increases, and latch
contention issues can quickly arise.
Hiperpools were designed over a decade ago to exploit ESA and to make efficient use of
large amounts of expanded storage. To overcome some of the hiperpool limitations, DB2
V6 introduced virtual pools in data spaces.
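The coupling between the two pool sizes can be sketched as follows; the 5:1 ratio is the upper end of the range quoted above, and the pool sizes are examples:

```python
# A hiperpool is capped both by the 8 GB architectural limit and by a
# practical ratio (roughly 2:1 to 5:1) to the virtual pool feeding it,
# so virtual pool size ultimately limits hiperpool size.
HIPERPOOL_MAX_GB = 8.0

def effective_hiperpool_gb(vpool_gb: float, ratio: float = 5.0) -> float:
    """Largest usable hiperpool for a given virtual pool size."""
    return min(vpool_gb * ratio, HIPERPOOL_MAX_GB)

print(effective_hiperpool_gb(1.0))  # -> 5.0 (limited by the 5:1 ratio)
print(effective_hiperpool_gb(2.0))  # -> 8.0 (capped by the 8 GB limit)
```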
Notes:
Data spaces provide a good short term solution by exploiting the 64-bit real memory
support introduced with OS/390 V2R10. Since DB2 V6 you can place buffer pools (as well
as the dynamic statement cache) in data spaces, freeing up storage for other work in the
DBM1 address space. Note that there is a performance penalty associated with buffer
pools in data spaces, when such data space buffers are not 100% backed by real storage.
The advantages of data spaces over hiperpools are:
• Read and write cache with direct I/O to data space
• Byte addressability
• Large buffer pool sizes (32 GB for 4 KB page size and 256 GB for 32 KB page size)
• Single buffer pool can span multiple data spaces
• Multiple buffer pools in same data space
• Excellent performance experienced with z990 and z900 with large processor storage
• Performance dependent upon being in 64-bit real mode
With the z/Architecture processors running in 64-bit addressing mode and having no
expanded storage (all storage is central), hiperpools have no reason to exist.
Hiperpools and supporting ESO hiperspace APIs are emulated when running in 64-bit
ESAME mode, as there is no expanded storage on a zSeries machine. So when using
hiperpools, we move real storage (hiperpool) to real storage (buffer pool). zSeries is not as
efficient as the G6 with regard to the MVPG instruction, which is issued for page
movement; when moving pages between expanded storage and real storage, zSeries took
20 to 30% more CPU time compared to G6.
Therefore, data space buffer pools are definitely recommended over hiperpools when
running on zSeries with 64-bit ESAME mode, when backed by real storage.
[Figure: data space buffer pools - the lookaside pool resides inside the 2 GB DBM1 address space, with the data spaces outside it.]
Notes:
Even though data space buffer pools outperform hiperpools on a zSeries running in 64-bit
mode with sufficient amounts of real storage to back up the data space buffers, data space
buffer pools are not free of shortcomings either.
The total size of data space virtual pools is limited to 32 GB (4 KB page size). This limit is
imposed by the control structures to manage the data space pages, which reside in the
DBM1 address space, and allow up to a maximum of 8 million pages. Also, the lookaside
pool resides in DBM1 and requires storage.
Although data spaces provide a good short term solution for exploiting 64-bit real memory,
they are undesirable as a long term solution, not only because of the size limitations, but
also because of the overhead involved with copying buffers between the data spaces and
the lookaside pool as they are accessed and updated. Data spaces have scalability issues,
and the VSTOR limit of 2 GB for DBM1 address space remains the biggest constraint to
achieving linear scalability for DB2 systems.
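The size limits quoted above fall directly out of the 8-million-page cap; the sketch below takes "8 million" as 8 x 2^20 pages, which matches the 32 GB and 256 GB figures:

```python
# Maximum data space buffer pool size = pages manageable from the
# DBM1 control structures (8 Mi pages) times the page size.
MAX_PAGES = 8 * 1024 * 1024  # 8 Mi pages

def max_pool_gb(page_size_kb: int) -> int:
    """Maximum data space pool size in GB for a given page size."""
    return MAX_PAGES * page_size_kb * 1024 // 1024 ** 3

print(max_pool_gb(4))   # -> 32  (GB, for 4 KB pages)
print(max_pool_gb(32))  # -> 256 (GB, for 32 KB pages)
```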
Other Considerations
Problems with both hiperpools and data space buffer pools
Still require work space in the DBM1 address space
Require page movement (elapsed time, CPU time)
Some amount of monitoring and tuning still required
Business requirements to manipulate more data in DB2
Larger objects
Image, text, video
XML data
Notes:
Some other considerations and problems with both hiperpools and data space buffer pools
are that they require page movement, which takes CPU cycles and elapsed time to
complete. They both also require a considerable amount of virtual storage inside the DBM1
address space.
In addition, new business requirements demand that more data and new types of data
need to be handled by DB2 systems. These requirements include:
• Handling larger objects
• Handling Image, text, video data
• Handling XML data
Notes:
Whereas OS/390 V2R10 introduced 64-bit real addressing support, which benefits all DB2
versions, z/OS V1.2 introduced the infrastructure for the 64-bit virtual addressing support
that DB2 Version 8 will exploit.
In this visual, we show the mapping of an address space using 64-bit addressability. The
left-hand column numbers are upper boundary hexadecimal values associated with each
area. The picture is obviously not drawn to scale. The area above 2 GB, also called “above
the bar”, would be dramatically larger if drawn to scale, and should be larger by a factor of
10 to the power of 12. This huge addressing range should be able to accommodate the
storage requirements for many years to come.
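The scale factor quoted above is easier to appreciate with the address ranges written out. A quick arithmetic sketch (our own figures for the 16 EB, 2 GB, and 16 MB boundaries; note that the "10 to the power of 12" factor corresponds to comparing the full 64-bit range against the 16 MB line):

```python
# Ratios between the classic z/OS addressing boundaries (our own
# arithmetic, not from the course text).
EB = 2**60
GB = 2**30
MB = 2**20

full_range = 16 * EB        # 64-bit addressing: 16 exabytes
bar = 2 * GB                # the 2 GB "bar"
line = 16 * MB              # the 16 MB "line"

print(full_range // bar)    # 8589934592     (~8.6 billion times the 2 GB bar)
print(full_range // line)   # 1099511627776  (~1.1e12 - the 10**12 factor, vs. the 16 MB line)
```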
With z/OS 64-bit virtual storage support, database subsystems like DB2 and other
middleware can make use of this large 64-bit virtual storage to increase capacity by
supporting a larger number of concurrent users and concurrent transactions. DB2 was one
of the first subsystems designed for the MVS and OS/390 31-bit environment and one of
the first subsystems to support MVS and OS/390 extended addressability.
DB2 has established itself as the enterprise database manager of choice for OS/390 with
its abilities to handle varied large system workloads efficiently, including transaction and
large query environments. DB2 is now again one of the first subsystems to take advantage
of 64-bit data addressability. With 64-bit virtual storage exploitation, DB2 can relieve virtual
storage constraints and provide capacity enhancement to a large number of DB2
applications.
DB2 64-bit virtual storage exploitation is a two-step plan described in the white paper, IBM
eserver zSeries 900 z/OS 64-bit Virtual Storage Roadmap, available in PDF from the Web
site:
http://www.ibm.com/servers/eserver/zseries/library/whitepapers/gm130076.html
In summary, the first step is to take advantage of the basic 64-bit virtual storage system
infrastructure and system services to enhance database manager buffer support. With DB2
V8 all existing 31-bit DB2 applications (including those written in Assembler, PL/I, COBOL,
FORTRAN, C/C++ and Java), as well as future DB2 applications, can benefit transparently
from DB2’s 64-bit virtual storage support.
The second step is to exploit the z/OS C/C++ and Java infrastructure to extend DB2’s
support to 64-bit C/C++ and Java applications. DB2 APIs will be enabled to support 64-bit
data, to facilitate 64-bit C/C++, and Java applications, to access existing data or store new
data into the database.
DB2 V8 makes use of extra services provided by z/OS V1R3, and therefore you have to be
on z/OS V1.3 before you can migrate to DB2 Version 8.
Notes:
As mentioned before, DB2 Version 8 will be one of the first IBM subsystems, if not the first,
to exploit 64-bit virtual storage support. This is mainly achieved by moving large storage
consumers from above the 16 MB line, to above the 2 GB bar.
As discussed in more detail later on in this publication, DB2 V8 uses a multi-step migration
process; from V7 to V8 compatibility mode, to V8 enabling-new-function mode, to V8
new-function mode. Most of the new functions in DB2 V8 are only available when you get
to new-function mode. The 64-bit exploitation, on the other hand, is available after you
migrate your Version 7 system to Version 8 compatibility mode; in other words, it is
available from day one in DB2 V8.
The 64-bit enablement of DB2 is completely transparent to all existing applications.
Why Implement 64-bit VSTOR Support?
Significant importance to DB2:
Enhanced data caching in memory will deliver higher performance
Relieving virtual storage constraint will provide scalability
Increases maximum buffer pool sizes
Eliminates need for hiperpools and data spaces
Simplifies DB2 systems management and operational tasks
Notes:
The implementation of 64-bit virtual support is very important to DB2 for z/OS. It allows
DB2 to significantly enhance its in-memory data caching capabilities, and allows DB2 to
deliver better performance.
Relieving the currently existing virtual storage constraints will provide much better
scalability for DB2 systems in a mainframe environment. Today’s virtual storage constraints
in the DBM1 address space are the most important scalability inhibitor for DB2 subsystems.
The advent of 64-bit virtual support in the operating system, and DB2’s exploitation
thereof, allows DB2 to increase maximum buffer pool sizes and eliminates the need for
hiperpools and data spaces, which simplifies DB2 systems management and operational
tasks as well.
DB2 Movers & Shakers, Inc.
Notes:
The following storage areas inside the DBM1 address space move above the 2 GB bar:
• Buffer pools
• Buffer pool control blocks
• RID pool
• Compression dictionaries
• EDM Pool - DBDs, OBDs, and dynamic statement cache
• Castout buffers
• Sort pool
DB2 Benefits of 64-bit VSTOR Support
VSTOR relief utilizes 64-bit virtual addressing to move data areas
above 2GB bar in DBM1 address space
Especially large storage areas like buffer pools will move
DB2's "data access" modules enhanced to access 64-bit
addressable buffers "in place"
NO data movement between the data space and DBM1 as per data
space buffer pools today
Notes:
DB2 V8’s virtual storage constraint relief utilizes the z/OS 64-bit virtual addressing
capabilities to move data areas above the 2 GB bar in DBM1 address space. Especially
large storage areas like buffer pools, sort and RID pool, compression dictionaries, DBDs,
and the dynamic statement cache will move above the 2 GB bar.
DB2's “data access” modules have been enhanced to access 64-bit addressable buffers
“in place”. In V8 there is no data movement of pages between the data space and the
lookaside buffer in DBM1, as is the case with data space buffer pools today in V7.
General Expectations
The performance objective with the 64-bit virtual support is to increase system throughput
with virtual storage constraint relief. This allows DB2 to support more concurrent threads.
The new factors affecting performance are:
• The 64-bit address translation
• Increased code size due to 4-byte versus 2-byte instructions
64-bit Virtual Buffer Pool Support
DB2 Version 8 is 64-bit exclusive (even in compatibility mode)
Buffer pools always allocated above 2 GB bar
Eliminates need for hiperpools and data space buffer pools
Terminology
As of V8, terms buffer pool and virtual pool become "synonymous"
Data space buffer pools, hiperpools no longer exist
No longer needed / supported in V8
Sizing and placement
Buffer pool max size is 1 TB
Total buffer pool max size is 1 TB
Make sure buffer pools are backed by real storage
Buffer pool control blocks also go above the bar
Castout buffers (data sharing) above the bar
V7 buffer pool information saved for fallback
Notes:
The main focus of DB2 V8 and virtual storage constraint relief is to utilize 64-bit virtual
addressing to move large storage consumers like the DB2 buffer pools and their
associated buffer control blocks above the 2 GB bar in the ssnmDBM1 address space.
DB2's data access modules have been enhanced to access the 64-bit addressable buffers
in place. That is, the data is accessed without any data movement as is done today for data
space buffer pools between the data space and the lookaside buffer.
DB2 V8 requires z/OS V1R3 or above as prerequisite. If an attempt is made to start DB2
V8 on an OS/390 or a z/OS R1 or R2 system, then DB2 issues an error message during
startup and terminates.
Note: These prerequisites can have implications for disaster recovery and sysplex
cross-system restart scenarios.
The use of 64-bit virtual addressing greatly increases the maximum buffer pool sizes. DB2
V8 is 64-bit exclusive, and always allocates the buffer pools above the 2 GB bar. This
effectively eliminates the need for hiperpools and data space buffer pools, thereby
simplifying DB2 systems management and operations tasks. Therefore, hiperpools and
data space buffer pools are no longer supported in DB2 V8. As of DB2 V8, the terms buffer
pool and virtual pool become synonymous.
Buffer pools can now scale to extremely large sizes, constrained only by the physical
memory limits of the machine (64-bit allows for 16 exabytes of addressability). System
consolidation by having multiple DB2 images, with or without data sharing, is now possible
without significant paging activity.
DB2 V8 increases the (theoretical) maximum buffer pool sizes to the limit of the
architecture, 1 TB. This limit is imposed by the real storage available:
• The maximum size for a single buffer pool is 1 TB.
• The maximum size for summation of all active buffer pools is 1 TB.
The buffer pool control blocks, also known as page manipulation blocks (PMBs), and the
data sharing castout buffers are moved above the 2 GB bar.
Important: Before you get carried away and start allocating huge buffer pools to
eliminate all DB2 I/Os in the system, you have to make sure that you do not overcommit
your real storage on your z/OS image. Make sure that by increasing the buffer pool size,
you do not introduce excessive paging. This is especially important in a 64-bit real
environment where there is no expanded storage, and all paging will be to DASD. Also
remember that DB2 is usually not the only application running on the z/OS image that
requires real storage.
DB2 V8 maintains the old V7 virtual pool and hiperpool definitions as they were at the time
of migration to be used in case of fallback, and it adds new definitions of buffer pools for
those catalog page sets which are not defined with a 4K buffer pool.
If you are running with adequate real storage and have many I/Os in some buffer pools,
you should look into using the new buffer pool PGFIX(YES) option to improve performance.
See Figure 1-28 "Additional Buffer Pool Information" on page 1-44 for more details on the
PGFIX(YES) option.
Configuration Changes
Max number of read, write, castout engines are now all 600
DB2 ensures at least 4 TB above bar using MEMLIMIT keyword
Additional enhancements:
Parameter Value
BP0 Minimum 2000 and default (from 56)
BP8K0 Minimum 1000 and default
BP16K0 Minimum 500 and default
BP32K Minimum 250 and default
CTHREAD Default is now 200 (from 70)
MAXDBAT Default is now 200 (from 64)
CONDBAT Default is now 10 000 (from 64)
IDFORE Default is now 50 (from 40)
IDBACK Default is now 50 (from 20)
Notes:
DB2 now uses 600 engines for asynchronous read (already 600 since V6), write (300 in
V7), and castout (300 in V7) processing. This increases the amount of storage required by
system threads (a system thread usually requires around 128 KB). The good news is that
for castout buffers, this storage moves above the bar.
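Using the text's rule of thumb of roughly 128 KB per system thread, the additional engine storage can be estimated. This is a sketch; the per-engine figure and the V7 counts come from the notes above:

```python
# Estimating the extra storage for the increased engine counts,
# assuming ~128 KB per system thread (the text's rule of thumb).
KB = 1024
per_engine = 128 * KB

v7 = {"read": 600, "write": 300, "castout": 300}
v8 = {"read": 600, "write": 600, "castout": 600}

extra = sum((v8[k] - v7[k]) * per_engine for k in v8)
print(extra // (1024 * KB))   # 75  (MB of additional engine storage)
```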
At startup time DB2 specifies a MEMLIMIT of 4 TB. The 4 TB limit is more to draw the line
somewhere than a real practical limit on today’s machines. DB2 uses MEMLIMIT=4TB to
provide some protection against massive storage leaks that could bring down the entire LPAR.
Using 0 TB has the same effect as REGION=0M for storage below the bar, indicating that
there is no limit, and that is not advisable. But even with a limit of 4 TB, you must have
enough paging devices available to store all those page frames; otherwise you can still
bring down the entire LPAR before reaching the 4 TB limit.
Buffer pools can now scale to extremely large sizes, constrained only by the physical
memory limits of the machine. The recommendation still stands that buffer pools should not
be over-allocated relative to the amount of real storage that is available. DB2 V8 issues the
following warning messages:
• DSNB536I: This indicates that the total buffer pool virtual storage requirement exceeds
the size of real storage of the z/OS image.
• DSNB610I: This indicates that a request to increase the size of the buffer pool will
exceed twice the amount of real storage.
• DSNB508I: This is issued when the total size used by all buffer pools exceeds 1 TB.
For more information about the effect of these messages, see Figure 1-27 "Sizing of Buffer
Pools" on page 1-42. Some DSNZPARM defaults have also changed, including
CTHREAD, MAXDBAT, CONDBAT, IDFORE, and IDBACK. See also Figure 11-15
"Changed Defaults" on page 11-44.
Migration Sizing of Buffer Pools
Data spaces and virtual pools (NO Hiperpool) - VPSIZE is used
Virtual pools with a corresponding hiperpool -
VPSIZE + HPSIZE is used
VPSEQT, VPPSEQT and VPXPSEQT keep their sizes
Even if buffer pool size is determined by
VPSIZE + HPSIZE
DB2 V8 maintains original virtual pool and hiperpool sizes for
fallback
No change for new installations - Values taken from install
process
Notes:
When first migrating to V8, DB2 uses the following parameters to determine the size of the
buffer pool:
• For data space pools and virtual pools with no corresponding hiperpool, the VPSIZE is
used.
• For virtual pools with a corresponding hiperpool, VPSIZE + HPSIZE is used.
• VPSEQT, VPPSEQT and VPXPSEQT keep their previous values, even if the buffer pool
size is determined by VPSIZE + HPSIZE.
DB2 V8 maintains the old V7 virtual pool and hiperpool definitions as they were at the time
of migration to be used in case of fallback, and it adds new definitions of buffer pools for the
catalog.
For newly installed V8 subsystems, as in prior releases, DB2 initially uses the buffer pool
sizes that were specified during the installation process. Thereafter, the buffer pool
attributes can be changed via the ALTER BUFFERPOOL command, and they are stored in
the BSDS.
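The migration sizing rules above can be sketched as a small helper. This is our own illustration, not DB2 code: the V8 buffer pool size is VPSIZE, plus HPSIZE when a corresponding hiperpool was defined.

```python
# Sketch of the V7-to-V8 buffer pool sizing rules described above.
def v8_bufferpool_size(vpsize, hpsize=0):
    """Number of buffers in the V8 pool after migration from V7."""
    return vpsize + hpsize

# Data space pool, or virtual pool without a hiperpool: VPSIZE is used.
print(v8_bufferpool_size(vpsize=50_000))                  # 50000
# Virtual pool with a corresponding hiperpool: VPSIZE + HPSIZE is used.
print(v8_bufferpool_size(vpsize=50_000, hpsize=100_000))  # 150000
```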
[Figure: Sizing of Buffer Pools - once the total buffer pool size reaches twice the real
storage capacity, an ALTER BUFFERPOOL of BP1 from 300 MB to 600 MB is capped at
500 MB, and a newly allocated BP2 requested at 300 MB is limited to 8 MB.]
Notes:
Now that there is plenty of virtual addressing available, there is a concern that greatly
over-allocated virtual buffer pools can cause an auxiliary storage shortage, and lead to a
system wait state. To draw your attention to this, DB2 will put out warning messages:
• DSNB536I: This message is issued when the total size of the virtual pools exceeds the
total amount of real storage on the z/OS image. This is only a warning message, but it
should be a sign that you are probably overcommitting your real storage, because the
real storage is also needed for other storage users like the EDM and sort pool in DBM1,
as well as all other users in the system. DB2 puts out a message DSNB538I when you
are below the real storage size again. (These messages are not new in V8.)
• DSNB610I: This message is issued when the total buffer pool size reaches twice the
amount of real storage on the z/OS image, but the total amount of buffer pool space is
still less than 1 TB. When this limit is reached, DB2 starts taking action.
From this point on, allocations of new buffer pools or expansions of existing buffer pools
are limited to 8 MB per ALTER command.
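The thresholds behind these messages can be sketched as a small decision function. This is our own illustration of the rules described above; the message numbers come from the text, but the code is not DB2's:

```python
# Hedged sketch of the buffer pool warning thresholds in DB2 V8.
GB = 2**30
TB = 2**40

def bufferpool_messages(total_bp_bytes, real_storage_bytes):
    msgs = []
    if total_bp_bytes > real_storage_bytes:
        msgs.append("DSNB536I")   # total buffer pool size exceeds real storage
    if 2 * real_storage_bytes <= total_bp_bytes < TB:
        msgs.append("DSNB610I")   # twice real storage reached: growth capped per ALTER
    if total_bp_bytes > TB:
        msgs.append("DSNB508I")   # total of all buffer pools exceeds 1 TB
    return msgs

print(bufferpool_messages(150 * GB, 64 * GB))   # ['DSNB536I', 'DSNB610I']
```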
Notes:
The following ALTER BUFFERPOOL command parameters are no longer supported:
VPTYPE, HPSIZE, HPSEQT, CASTOUT. If they are specified, a warning message
DSNB539I is issued. The other parameters remaining unchanged are VPSEQT,
VPPSEQT, VPXPSEQT, DWQT, VDWQT, and PGSTEAL, although the defaults for DWQT,
VDWQT for local buffer pools, and CLASST and GBPOOLT for group buffer pools are
changing in V8. See Figure 11-15 "Changed Defaults" on page 11-44 for details.
PGFIX should be used for subsystems’ buffer pools which read or write frequently. The
recommendation is to alter your DB2 Version 8 buffer pools which have frequent page
reads or writes to use PGFIX(YES) if you have sufficient real storage available for these
buffer pools.
Important: Again notice the importance of having sufficient real storage to back up your
buffer pool 100%. Having 99.9% is not good enough, as the LRU buffer steal algorithm
will introduce paging if there is insufficient real storage to back up your buffer pool
completely. Therefore, make sure your buffer pool is fully backed by real storage before
you start using PGFIX(YES).
In some cases, this processing time (CPU time) reduction can be as much as 10% for I/O
intensive workloads with fixed buffer pool pages. To use this option, you can use the
PGFIX(YES) option on the ALTER BUFFERPOOL command. The ALTER takes effect at
the next BP allocation.
For user data, you can issue the following commands:
-ALT BPOOL(bpname) VPSIZE(0)
-ALT BPOOL(bpname) VPSIZE(yyyy) PGFIX(YES)
Here, bpname is the name of the buffer pool and yyyy is the current size of your buffer pool.
For the catalog and directory you can use:
-ALT BPOOL(bpname) PGFIX(YES)
-STOP DATABASE or STOP DB2
-START DATABASE or START DB2
Notice that the page fixing is at the buffer pool level. This option is available in V8
compatibility mode and beyond.
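For a site with many pools, the two-step ALTER sequence shown above can be generated mechanically. The helper below is hypothetical (`pgfix_commands` is our own name); the command strings follow the text, where the VPSIZE(0)/VPSIZE(yyyy) pair forces a reallocation so PGFIX(YES) takes effect:

```python
# Generate the command sequence shown above for enabling PGFIX(YES)
# on a user-data buffer pool (illustration only, not a DB2 utility).
def pgfix_commands(bpname, current_size):
    return [
        f"-ALT BPOOL({bpname}) VPSIZE(0)",
        f"-ALT BPOOL({bpname}) VPSIZE({current_size}) PGFIX(YES)",
    ]

for cmd in pgfix_commands("BP1", 50000):
    print(cmd)
```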
The LSTATS report removes the references to hiperpool related counters.
Notes:
As mentioned before, you need to be running on a zSeries machine in 64-bit mode, as well
as on z/OS V1.3 or above. To prevent any problems, DB2 verifies these prerequisites at
DB2 (re)start.
In case you are not running on a z/Architecture, startup puts out a message DSNY011 -
z/Architecture required and abends with reason code 00E8005A.
In case you are not on z/OS V1.3 or above, restart puts out a message DSNY012 - z/OS
1.3 (or later) required and abends with reason code 00E80058.
As mentioned before, the following ALTER BUFFERPOOL command parameters are no
longer supported:
• VPTYPE
• HPSIZE
• HPSEQT
• CASTOUT
In case you forget to remove them, or specify them by accident, you receive a DSNB539I
warning message indicating that the specified value is ignored, and processing continues
for the other parameters.
All messages related to buffer pools have been revised:
• References to “virtual buffer pool” have been changed to “buffer pool”
• References to “data space buffer pool” or “hiperpool” have been deleted
RIDPOOL - Changes
Split into two parts
Small part below 2 GB bar storing RIDMAPs
Larger part above 2 GB bar storing RIDLISTs
(Bulk of RIDPOOL storage)
No change to the installation panel
Slight modification to guidelines for estimating the size
Same size RIDMAPs would have held half as many RIDLISTs
RIDMAP size is doubled to accommodate the same number of 8-byte (64-bit) RIDLIST pointers
Each RIDLIST now holds twice as many RIDs
RIDBLOCK size is now 32K
New RIDPOOL calculation:
Each RIDMAP contains over 4000 RIDLISTs
Each RIDLIST contains 6400 RID entries
Hence each RIDMAP/RIDLIST combination can contain over 26 million RIDs compared to roughly 13 million in V7 and prior
Notes:
The RID pool is split into two parts. A small part of the RID pool remains below the 2 GB
bar and the majority is moved above. The RID Pool below the 2 GB bar stores the RID
maps which are small in number, and the RID pool above the 2 GB bar contains the RID
lists which comprise the bulk of the RID pool storage.
Because of the changes, there are some slight modifications in estimating the size for the
RID pool. The same size RIDMAPs would have held half as many RIDLISTs. The RIDMAP
size is doubled to accommodate the same number of 8 byte RIDLISTs, and each RIDLIST
now holds twice as many RIDs. Each RIDBLOCK is now 32 KB in size.
Here is the new RIDPOOL calculation:
• Each RIDMAP contains over 4000 RIDLISTs.
• Each RIDLIST contains 6400 RID entries.
• Each RIDMAP/RIDLIST combination can then contain over 26 million RIDs, versus
roughly 13 million in previous DB2 versions.
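The capacity figures quoted above can be rechecked with simple arithmetic (the 6400 value and the "over 4000" figure come from the text; 4000 is used as the round base, which is why the exact product lands slightly under the quoted 26 million):

```python
# Rechecking the RID pool capacity figures for DB2 V8.
RIDLISTS_PER_RIDMAP = 4000   # "over 4000" per the text
RIDS_PER_RIDLIST = 6400

v8_rids = RIDLISTS_PER_RIDMAP * RIDS_PER_RIDLIST
print(f"V8 RIDs per RIDMAP/RIDLIST combination: {v8_rids:,}")  # 25,600,000
print(f"V7 and prior (half as many): {v8_rids // 2:,}")        # 12,800,000
```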
Compression Dictionaries - Changes
Can occupy 64K of storage per data set - with many open compressed data sets, they take up to 500 MB in some installations
What was changed:
The compression dictionary will be loaded above the bar after it is built
All references to the dictionary now use 64-bit pointers
Use standard 64-bit hardware compression assembler instructions
Main users:
Data Manager
Utilities
Buffer Manager (finds dictionary at OPEN time)
Standalone Utilities
Still load dictionary below the bar
Storage need for compression dictionaries may increase in V8
DSMAX increased to allow 100,000 - Requires z/OS 1.5
4096 partitions
Notes:
The compression dictionary for a compressed table space is loaded into virtual storage for
each compressed table space or partition as it is opened. Even though it is not accessed
frequently, it occupies a good chunk of storage while the data set is open. A compression
dictionary can occupy up to 64 KB of storage per data set (sixteen 4 KB pages). For
some customers, those who have a large number of compressed table spaces, the
compression dictionaries can use up as much as 500 megabytes. Therefore, moving the
dictionary above the 2 GB bar provides significant storage relief for many customers.
DB2 V8 can further increase the compression dictionary storage requirement for some
systems, as V8 also implements support for 4096 partitions for a single table; if they are all
compressed and open, you have 4096 compression dictionaries in memory. This was
another driver for moving compression dictionaries above the 2 GB bar.
The compression dictionary is loaded above the bar after it is built. All references to the
dictionary now use 64-bit pointers. Compression uses standard 64-bit hardware
compression instructions. Standalone utilities still load the dictionary below the bar.
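The storage figures above can be rechecked with simple arithmetic. Note that the 8000-data-set count is our own back-calculation from the quoted 500 MB, not a number from the text:

```python
# One compression dictionary is sixteen 4 KB pages (64 KB) per open
# compressed data set.
KB = 1024
MB = 1024 * KB
dictionary_size = 16 * 4 * KB            # 64 KB per data set

print(dictionary_size // KB)             # 64  (KB per data set)
print(8000 * dictionary_size // MB)      # 500 (MB for ~8000 open compressed data sets)
print(4096 * dictionary_size // MB)      # 256 (MB for a fully compressed 4096-partition table)
```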
Sort Pool - Changes
External
No change to installation panels
Control structures - Pointers are doubled so expect that sort tree size doubles (64-bit pointers)
Sort data buffers will NOT change significantly
Notes:
Sorting requires a large amount of virtual storage, as there can be multiple copies of the
data being sorted at a given time. Two kinds of storage pools are used for DB2 sort (also
known as RDS sort) to store various control structures and data records. One pool is a
thread-related local storage pool, and the other is a global sort pool.
To take advantage of the 64-bit addressability for a larger storage pool, some high level
sort control structures remain in thread-related storage below the 2 GB bar, but these
structures contain 64-bit pointers to areas in the global sort pool above the 2 GB bar. The
sort pool above the 2 GB bar contains sort tree nodes and data buffers.
EDM Pool - DBDs/OBDs/DSC - Changes
(Global) dynamic statement cache
In V7, if "cache dynamic" is on, statements are cached in data space or EDM pool
In V8 "cache dynamic" statements are ALWAYS cached in the dynamic statement cache storage pool above 2 GB bar
New EDM DBD cache created above 2 GB bar
Will give DBDs space to grow and relieve contention with other objects
External - Installation Panel - DSNTIPC
New
DBD cache size (EDMDBDC)
Change
Dynamic statement cache EDMDSPAC to EDMSTMTC storage above 2 GB bar instead of in a data space or EDM pool
Dynamic statement cache always allocated
Remove
EDMPOOL DATA SPACE MAX (EDMDSMAX)
Potential reduction of EDMPOOL parameter value
Since (potentially) stmt cache and DBD removed
Notes:
In V8, the EDM pool is always split into three parts:
• Storage for the global dynamic statement cache
• Storage for DBDs
• Storage for plans and packages (SKCT, CT, SKPT, and PTs)
It is worth noting that in V8, storage for the dynamic statement cache is always allocated (at
least 5 MB), and is no longer related to whether CACHEDYN is YES or NO. This is
because the CACHEDYN DSNZPARM is online changeable in V8, and DB2 needs an
initial size to be allocated at startup time to allow the size to be changed online.
Other Virtual Storage Enhancements
LOBs
When LOB materialization is required, V8 uses storage above the bar instead
of data space
LOBVALA (per user) and LOBVALS (total system) remain valid
IPCS DB2 and IRLM dump formatting support 64-bit addressing
Number of open data sets has increased to 100 000 with V8 and
z/OS V1.5
Notes:
It is worth noting that other storage areas moved above the bar as well, such as accounting
blocks, lock trace, buffer trace, and so on, some of which are at the thread level, and can
therefore still represent a considerable amount of storage in systems with a lot of
concurrent thread activity.
In addition, LOBs are no longer materialized in a data space in V8, but above the 2 GB bar
inside the DBM1 address space, as described next.
LOB Data
When LOBs need to be materialized in V7, DB2 does so using data spaces. In V8, storage
above the 2 GB bar inside the DBM1 address space is used instead. Storage allocation is
limited by the system parameters previously used for materializing LOBs in data spaces:
• LOBVALA (the size per user):
The default is 2048 KB, the limit value is 2097152 KB.
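The LOBVALA limits quoted above are easier to read in larger units (our arithmetic; the KB values come from the text):

```python
# Converting the LOBVALA default and limit from KB to MB/GB.
KB = 1024
default_kb = 2048
limit_kb = 2_097_152

print(default_kb // KB)          # 2  (MB default per user)
print(limit_kb // (KB * KB))     # 2  (GB upper limit per user)
```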
IRLM V2.2 64-bit IRLM for DB2 V8
IRLM V2.2 is 64-bit application
Ships both 64-bit and 31-bit version (64-bit version requires z/OS 1.3)
Notes:
IRLM V2.2 is a 64-bit application. Actually, IRLM V2.2 ships with both a 31-bit version of the
modules and a 64-bit capable version. If the operating system is able to handle
64-bit, the 64-bit version is loaded. The 64-bit version requires the z/Architecture as well
as z/OS 1.3. DB2 V8 requires IRLM V2.2 in 64-bit mode.
With IRLM V2.2, locks always reside above the 2 GB bar. This allows IRLM to manage
many more locks than was the case in previous releases. IRLM V2.2 can manage up to
100 million locks, around 16 times more than V2.1. Therefore, you can increase NUMLKTS
(maximum number of locks per table space before lock escalation occurs) or NUMLKUS
(maximum number of locks per user before a resource unavailable condition occurs) up to 100 million. Here the
same rules apply as for the storage areas that move above the 2 GB bar in DBM1. Make
sure there is enough real storage to back this up.
Locks can no longer reside in ECSA. Therefore, the PC=NO parameter in the IRLM
procedure is no longer honored. PC=YES is always used. This also means that the
MAXCSA parameter no longer applies, as it was related to ECSA storage usage. You can
still specify both PC=NO and MAXCSA= for compatibility reasons, but they are ignored by
IRLM.
The fact that IRLM locks no longer reside in ECSA (when you were using PC=NO in V7)
also means that a considerable amount of ECSA storage, which used to contain the IRLM
locks, is now freed up in V8, and can be used by other subsystems (such as WebSphere)
and other applications.
IRLM V2.2 has two additional parameters that can be specified in the IRLM startup
procedure. These parameters are:
• MLMT - Max storage for locks:
This specifies, in megabytes, gigabytes, terabytes, or petabytes, the maximum amount
of private storage available above the 2 GB bar that the IRLM for this DB2 uses for its
lock control block structures. The IRLM address space procedure uses this parameter
(which you specify on the IRLM startup proc EXEC statement) to set the z/OS
MEMLIMIT value for the address space. Ensure that you set this value high enough so
that IRLM does not reach the limit. Note that the value you choose should take into
account the amount of space for possible retained locks as well. IRLM only gets storage
as it needs it, so you can choose a large value without any immediate effects. IRLM
monitors the amount of private storage used for locks. If the specified limit is reached,
new lock requests will be rejected unless they are “must complete”.
After the PTF for APAR PQ87611 is applied (still open at the time of writing of this
publication), you can dynamically change the amount of storage used for locks above
the bar by using the z/OS command:
MODIFY irlmproc,SET,PVT=nnnn
Here, nnnn specifies the upper limit of private storage that is used for locks. You can
specify the number in nnnnM (MB) or nnnnG (GB). If neither 'M' nor 'G' is specified, the
default is 'M' (as in IRLM 2.1). The PVT= parameter will control private storage above
the bar, and is tied to MEMLIMIT. A SET,PVT= command will cause the MEMLIMIT to
be updated; never lower than the default 2G, and never lower than the amount of
storage in use, plus the 10% of reserved space.
Before this PTF, you can use the MLMT startup parameter to control the maximum amount
of storage used above the bar (MEMLIMIT) by IRLM.
(Even before PQ87611, you can use the SET,PVT= command. However, it does not
control the storage above the bar, but only the extended private area below the bar, and
above the line, as with IRLM V2.1.)
• PGPROT - Page protect:
Acceptable values are NO and YES (default). The page protect IRLM startup procedure
parameter specifies whether IRLM loads its modules that reside in common storage into
page-protected storage.
YES indicates that modules located in common storage are to be loaded into
page-protected storage to prevent programs from overlaying the instructions. YES is
1-56 DB2 UDB for z/OS V8 Transition © Copyright IBM Corp. 2004
Course materials may not be reproduced in whole or in part without the prior
written permission of IBM.
V3.1
Student Notebook
recommended because it requires no additional overhead after the modules are loaded,
and the protection can prevent code-overlay failures.
NO indicates that common storage modules are to be loaded into CSA or ECSA without
first page protecting that memory.
Immediate Benefits
Simplified buffer pool monitoring and tuning
Only ONE type of buffer pool
Buffer pool size limits increased
Eases the worry over monitoring once the system stabilizes
DBM1 VSTOR constraint relief
Increase CTHREAD - ECSA allocation may need increasing if CTHREAD
raised
Expect a single DB2 subsystem to run larger workloads
Could defer going to data sharing
Consolidating members in a group
Expected increases in workload
Maximum number of prefetch, deferred write, and castout engines is now
600 in order to decrease "engine not available" conditions
IFCIDs 217 and 225 reflect DBM1 VSTOR usage above and below
2 GB bar
Notes:
With DB2 V8’s exploitation of 64-bit virtual storage, you can benefit from the following
capabilities (even in V8 compatibility mode):
• Buffer pool monitoring and tuning becomes simpler:
- Hiperpools and data space buffer pools are eliminated, thus reducing complexity.
There is now only one type of buffer pool. Dynamic statement cache and LOB data
spaces have also been eliminated.
- Buffer pool size limits are increased, therefore buffer pool storage does not need to
be as tightly monitored and controlled, especially in cases where there is a large
amount of real memory available on the machine.
- The ssnmDBM1 virtual storage constraints are no longer a key consideration in
determining the optimum sizes for buffer pools.
• This may allow installations to increase the number of current active threads
(CTHREAD). ECSA allocation may need to be increased if CTHREAD is raised.
• A single DB2 subsystem is able to run larger workloads. This may cause some
installations to defer going to a data sharing environment for capacity reasons (since
data sharing is still required for the highest scalability and availability), or to consolidate
the data sharing group to fewer members.
• To handle the expected increases in workload, the maximum number of deferred write
and castout engines are increased from 300 to 600 in order to decrease engine not
available conditions.
Note: DB2 for z/OS V8 always uses 64-bit addressing, independent of the mode it is
running in (compatibility mode, enabling-new-function mode, or new-function mode).
You can use IFCIDs 0217 and 0225 to monitor ssnmDBM1 virtual storage usage above
and below 2 GB.
z/OS provides a new MEMLIMIT JCL keyword which controls how much virtual storage
above the 2 GB bar is available in each address space.
Virtual storage information is collected in SMF by RMF in record type 78-2. RMF can
produce:
• Common storage summary and detail reports
• Private area summary and detail reports
The reports are requested as follows:
• Specify S, either explicitly or by default, and RMF produces summary reports
• Specify D, and RMF produces both summary reports and detail reports
These are the available options:
• REPORTS(VSTOR(D)):
This produces a summary and detail report for common storage.
• REPORTS(VSTOR(D,xxxxDBM1)):
This produces a summary and detail report for common storage and a summary and
detail report for the private area of the xxxxDBM1 address space.
• REPORTS(VSTOR(MYJOB)):
This produces a summary report for common storage and a summary report for the
private area of the MYJOB address space.
More information on setting up and monitoring 64-bit is contained in the technical bulletin,
z/OS Performance: Managing Processor Storage in an all “Real” Environment, available
from:
http://www.ibm.com/support/techdocs
AMODE Considerations
DB2 and IRLM have been enhanced to run with AMODE(64) where
appropriate
DBM1 address space now uses AMODE(64) to access the data above the bar
IRLM 2.2 uses AMODE(64) to access the lock structures
Stored procedures support
AMODE(64) stored procedures not supported
AMODE(64) callers (your applications) not supported
This also means there is NO need to change your existing applications to
benefit from DB2's effort to go 64-bit
Exits called in 31-bit mode
Notes:
DB2 has been enhanced to allow the DBM1 address space to use AMODE(64) to access
the data above the bar. The same is true for IRLM 2.2. IRLM now uses AMODE(64) to
access the lock structures that are now allocated above the 2 GB bar.
Note, however, that your applications as well as stored procedures are not allowed to run in
AMODE(64). This is actually good news as it means that there is NO need to change your
existing applications to benefit from DB2's effort to go 64-bit. In addition, exits are still
called in 31-bit mode, so there is no need to change those to be able to co-exist with DB2
running in 64-bit mode.
DB2 64-bit Summary
[Figure: in DB2 V8, the DBM1 address space exploits 64-bit storage for the buffer pools,
buffer pool control blocks, compression dictionaries, castout buffers, RID pool, and sort
pool; the remaining areas stay in the 31-bit address space.]
Notes:
In summary, the figure above shows that only the DBM1 address space and IRLM address
space exploit the 64-bit virtual storage capabilities, as these are the two address spaces
that suffer the most from virtual storage constraints in DB2 Version 7.
Other DB2 address spaces, such as the master address space (MSTR) and the distributed
data facility (DDF) or DIST address space, remain in 31-bit.
User programs as well as stored procedures have to be in 31-bit mode. As mentioned
before, this is not a problem. On the contrary, it means that no application changes are
required to exploit DB2 V8 functionality.
Notes:
DB2 Version 8 allows you to create a partitioned table space with up to 4096 partitions, a
drastic increase from the previous maximum of 254.
In this topic, we explore the reasons for doing so, the impact on data set names, DB2
command syntax, and the output of those DB2 commands.
Requirement for 4096 Partitions
Installations require more than the current maximum of 254
partitions
For more granular segments of work, for example, to have partitions
for each day for 11 years (that is, needs 4026 partitions)
More partitions enable smaller partition data set sizes
==> Easier to manage
Maximum number of partitions raised from 254 to 4096
Table spaces and indexes
Table space must have LARGE or DSSIZE specified to go beyond
254 parts
Maximum table size remains 16 TB for 4 KB pages
Can exceed 16 terabytes for a single table with larger page sizes
Notes:
Customers have a need for more partitions in their partitioned table spaces for certain
types of applications. One example of these applications is the collection and retrieval of
daily data, which, of course, means 365 partitions (or 366 for a leap year). If the customer
wants to keep 11 years worth of daily data in separate daily partitions, that would be 4026
partitions. Another example is keeping weekly data in partitions; if a customer wants to
keep 10 or 20 years worth of weekly data, that would be 520 or 1040 partitions.
Customers also want to have their large table spaces spread out over many partitions to
reduce the size of their partition data sets. For a 16 terabyte table, the maximum data set
size allowed for a 4 KB page size is 64 gigabytes and that may be too unwieldy to manage.
More partitions allow us to reduce the data set size, by making the data more granular and
manageable (for example, having daily partitions instead of weekly partitions.)
With DB2 V7, the maximum number of partitions in a partitioned table space and index
space is 254. With DB2 V8 the new maximum number of partitions is 4096 for partitioned
table spaces. The actual maximum number of partitions that you can specify is dependent
on the page size and the LARGE (not recommended) or DSSIZE (preferred) parameter
(see Figure 1-42 "Table Space Size and Number of Partitions" on page 1-69 for details).
4096 partitions will help the applications mentioned above and eliminate the need for
work-around solutions. You can define new partitioned table spaces with the new value of
partitions, or you can use online schema changes (see Figure 2-66 "Partition Management"
on page 2-99) to apply the new limits to existing partitioned table spaces.
The CREATE TABLESPACE statement now allows you to specify up to 4096 partitions in
the NUMPARTS clause for a partitioned table space.
1-66 DB2 UDB for z/OS V8 Transition © Copyright IBM Corp. 2004
Course materials may not be reproduced in whole or in part without the prior
written permission of IBM.
V3.1
Student Notebook
Uempty
Maximum Number of Partitions
Maximum number of PARTITIONs depends on the DSSIZE and the page
size in the CREATE TABLESPACE statement
As in DB2 V7, use DSSIZE when creating a table space for partitions of
4 GB and larger
LARGE clause is kept for compatibility with previous releases; it identifies each partition
of a partitioned table space as having a maximum size of 4 GB
Use the DSSIZE clause instead as the preferred method
DB2 assigns a default DSSIZE if the LARGE or DSSIZE keyword is omitted and
NUMPARTS > 254 (governed by the page size value):

Page Size   Default DSSIZE
4 KB        4 GB
8 KB        8 GB
16 KB       16 GB
32 KB       32 GB
Notes:
The maximum number allowed in the NUMPARTS keyword is dependent on the page size
and DSSIZE specified for the table space; see Figure 1-42 "Table Space Size and Number
of Partitions" on page 1-69 for details. If the DSSIZE (or LARGE) keywords are not
specified, and NUMPARTS is greater than 254, a default data set size is given depending
on the page size value; for 4 KB page size, DSSIZE default is 4 GB; for 8 KB page size,
DSSIZE default is 8 GB; for 16 KB page size, DSSIZE default is 16 GB; and for 32 KB page
size, DSSIZE default is 32 GB.
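The default rule above reduces to "the default DSSIZE in GB equals the page size in KB". A small Python sketch of that rule (the helper name is ours, for illustration only):

```python
def default_dssize_gb(page_size_kb):
    """Default DSSIZE when LARGE/DSSIZE is omitted and NUMPARTS > 254:
    the default in GB matches the page size in KB (4 KB -> 4 GB, etc.)."""
    if page_size_kb not in (4, 8, 16, 32):
        raise ValueError("DB2 page sizes are 4, 8, 16, or 32 KB")
    return page_size_kb

print(default_dssize_gb(16))  # 16 (GB)
```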
Another V8 enhancement is that you can add partitions to an existing partitioned table
space. This way you do not have to create your partitioned table space with 4096 partitions
from day one. You can define the object with the number of partitions that you need today,
and add new ones when required. For more information, see Figure 2-67 "Adding a Table
Partition" on page 2-101.
When adding partitions to an existing table space, the maximum number of partitions
allowed depends on how the table space was originally created. If DSSIZE was specified
when the table space was created, it is non-zero in the catalog. The maximum number of
partitions allowed is shown in Table 1-1.
Table 1-1 Maximum number of partitions allowed when DSSIZE > 0

DSSIZE     Page size:
           4 KB    8 KB    16 KB   32 KB
1-4 GB     4096    4096    4096    4096
8 GB       2048    4096    4096    4096
16 GB      1024    2048    4096    4096
32 GB      512     1024    2048    4096
64 GB      256     512     1024    2048
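Table 1-1 follows one rule: cap the partition count at 4096, and otherwise divide the maximum table space size for the page size (16 TB at 4 KB, doubling with the page size) by the DSSIZE. A Python sketch that reproduces the table (the function name is ours):

```python
def max_partitions(page_size_kb, dssize_gb):
    """Maximum partitions when DSSIZE > 0: min(4096, total table space
    limit / DSSIZE). The total limit is page_size_kb * 4096 in GB
    (16 TB at 4 KB, 32 TB at 8 KB, 64 TB at 16 KB, 128 TB at 32 KB)."""
    total_limit_gb = page_size_kb * 4096
    return min(4096, total_limit_gb // dssize_gb)

print(max_partitions(4, 64))   # 256, as in the last row of Table 1-1
print(max_partitions(32, 64))  # 2048
```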
If DSSIZE = 0, the maximum numbers of partitions allowed is shown in Table 1-2. If LARGE
was specified when the table space was created, the maximum number of partitions is
shown in the fourth row of Table 1-2. For more than 254 partitions when LARGE or DSSIZE
is not specified, the maximum number of partitions is determined by the page size of the
table space.
Table 1-2 Maximum number of partitions allowed when DSSIZE = 0

Type of table space   Existing number of partitions   Maximum number of partitions
Non-large             1 - 16                          16
Non-large             17 - 32                         32
Non-large             33 - 64                         64
Large                 N/A                             4094
Table Space Size and Number of Partitions
Type of RID            Max # of partitions   Page size   DSSIZE   Total TS size
5-byte (non-EA) large  4096                  4 KB        (4 GB)   16 TB
5-byte EA              4096                  4 KB        1 GB     4 TB
5-byte EA              4096                  4 KB        4 GB     16 TB
5-byte EA              2048                  4 KB        8 GB     16 TB
5-byte EA              1024                  4 KB        16 GB    16 TB
5-byte EA              512                   4 KB        32 GB    16 TB
5-byte EA              256                   4 KB        64 GB    16 TB
5-byte EA              4096                  8 KB        1 GB     4 TB
5-byte EA              4096                  8 KB        8 GB     32 TB
5-byte EA              2048                  8 KB        16 GB    32 TB
5-byte EA              4096                  16 KB       16 GB    64 TB
5-byte EA              2048                  16 KB       32 GB    64 TB
5-byte EA              4096                  32 KB       32 GB    128 TB
5-byte EA              2048                  32 KB       64 GB    128 TB
Notes:
The figure above shows the maximum size of a partitioned table space in DB2 Version 8.
Note that the table space size is dependent on the number of partitions, the page size, and
the DSSIZE.
The maximum size of a DB2 table space in Version 8 is 128 TB. However, to be able to
reach this maximum, you must use a 32K page size (either with 4096 partitions and 32 GB
VSAM data sets (DSSIZE), or 2048 partitions and 64 GB VSAM data sets).
The reason for not being able to have a 128 TB table space with a 4K page size is that the
number of pages you can address is limited by the 5-byte RID (4 byte page number, and 1
byte ID map entry).
If you want to use 4096 partitions, you need 12 bits to represent that number of partitions,
which leaves you with 20 bits to address pages (4 bytes * 8 bits/byte - 12 bits). The 20 bits
allow up to 1,048,576 pages (1M pages) per partition.
With a 4 KB page size and 4096 partitions, the result is:
4096 (bytes per page) * 4096 (partitions) * 1,048,576 (pages per partition) = 16 TB
This is the maximum you can address with a 4K page size in a single partitioned table
space.
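The RID arithmetic above can be checked directly; this Python sketch just replays the calculation:

```python
# 5-byte RID = 4-byte page number + 1-byte ID map entry.
# 4096 partitions consume 12 bits of the 4-byte page number field.
partition_bits = 12                      # 2**12 = 4096 partitions
page_bits = 4 * 8 - partition_bits       # 20 bits left to address pages
pages_per_partition = 2 ** page_bits     # 1,048,576 pages per partition
page_size = 4096                         # 4 KB pages, in bytes

max_table_bytes = page_size * pages_per_partition * 2 ** partition_bits
print(max_table_bytes == 16 * 2 ** 40)   # True: 16 TB
```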
Notes:
Currently, DB2 names the DB2 data sets with the convention of 'Axxx' as the last qualifier,
where xxx is the partition number. This naming convention allows for the definition of no
more than 999 partitions. With DB2 V8, a new data set naming convention allows data sets
with partition numbers greater than 999.
The naming scheme for a data set with more than 999 partitions is shown in the example
below.
catname.DSNDBx.dbname.psname.p0001.lnnn
where
p is I or J
lnnn is A001-A999 for partitions 1 through 999
lnnn is B000-B999 for partitions 1000 through 1999
lnnn is C000-C999 for partitions 2000 through 2999
lnnn is D000-D999 for partitions 3000 through 3999
lnnn is E000-E096 for partitions 4000 through 4096
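The lnnn mapping above can be expressed as a small function; this Python sketch is illustrative only (the helper name is ours):

```python
def lnnn_qualifier(part):
    """Map a partition number (1-4096) to the last data set name
    qualifier: A001-A999 for partitions 1-999, then B000-B999,
    C000-C999, D000-D999, and E000-E096 for partitions 4000-4096."""
    if not 1 <= part <= 4096:
        raise ValueError("partition number must be 1-4096")
    letter = chr(ord("A") + part // 1000)   # A for 1-999, B for 1000-1999, ...
    return f"{letter}{part % 1000:03d}"

print(lnnn_qualifier(999))   # A999
print(lnnn_qualifier(1000))  # B000
print(lnnn_qualifier(4096))  # E096
```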
Note: If you use jobs that specify the full DB2 data set names, such as DSN1COPY,
make sure that they take these new data set naming conventions into account when
using more than 999 partitions.
Notes:
The output under column PART in message DSNT397I is the partition number. It is blank
for a simple table space or simple index space. For non-partitioned indexes on a partitioned
table space, it is the logical partition number preceded by the character L, for example
L4096.
For data-partitioned secondary indexes, the prefix for this value is the character D, for
example, D0001.
As you can see from the example above, you can use a combination of partition lists and
ranges as the argument of the PART keyword of the DISPLAY, START, or STOP
DATABASE DB2 commands.
The STOP DATABASE command on the bottom of the visual above refers to parts 3, 5, 6,
9 to 12, 15, and 19 to 4096.
Display Database Command Support
Uses partition range output if partitions have same status and attributes.
For example, 6 parts of TS486X are stopped. The display output shows
0001 through 0006 with STOP state.
Also reduces output size when 4094 partitions are involved.
DISPLAY DATABASE(DB486B) SPACE(*)
Notes:
The DISPLAY DATABASE command has been enhanced to support this large number of
partitions. In order not to flood you with output (4096 data partitions, 4096 parts for each
partitioned index, and 1 or more entries for each non-partitioned index), the DISPLAY
command uses “partition range output”. This means that it shows only the start and end
partition of a range when all the partitions in that range have the same status and
attributes. For example, in the visual above, 6 parts of TS486X are stopped. The display
output shows 0001 through 0006 with STOP state, only two lines instead of six.
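The partition range output logic — collapsing consecutive partitions with the same status into one start/end pair — can be sketched as follows; this Python code is our illustration, not DB2's implementation:

```python
def range_output(statuses):
    """Collapse a list of per-partition statuses (index 0 = partition 1)
    into (first, last, status) ranges, as DISPLAY DATABASE does."""
    ranges = []
    for part, status in enumerate(statuses, start=1):
        if ranges and ranges[-1][2] == status:
            ranges[-1][1] = part            # extend the current range
        else:
            ranges.append([part, part, status])
    return [(f"{a:04d}", f"{b:04d}", s) for a, b, s in ranges]

# Six stopped partitions followed by two read-write ones:
print(range_output(["STOP"] * 6 + ["RW"] * 2))
# [('0001', '0006', 'STOP'), ('0007', '0008', 'RW')]
```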
Notes:
The OVERVIEW keyword of DISPLAY DATABASE displays each object in the database on
its own line, providing an easy way to see all objects in the database.
OVERVIEW limits the display to only the space names and space types that exist in the
specified databases. The number of parts is displayed for any partitioned table space. This
keyword is very helpful when there is a large number of partitions for one or more page sets in
your database.
The OVERVIEW keyword cannot be specified with any other keywords except
SPACENAM, LIMIT, and AFTER.
In the example shown on the visual above, you can easily see that table space TS486A
consists of four partitions. Page set IX486B is a partitioned index, whereas IX486A is a
secondary index with four logical partitions.
Table space TS486C is a simple or segmented table space.
Some Considerations
You can alter past 254 partitions with ALTER TABLE and ADD
PARTITION clause, up to the maximum allowed based on table
space parameters
More partitions =>
More open data sets
More compression dictionaries
More virtual and real storage
More space in catalog and directory
Larger DBDs
Notes:
The size of catalog objects SYSDBASE, SYSCOPY, SYSSTATS, SYSTABLEPART_HIST,
SYSINDEXPART_HIST, SYSTABSTATS_HIST, and SYSINDEXSTATS_HIST is greatly
increased, as are directory objects DBD01, SYSUTILX, and SYSLGRNX if a large number
of partitions are created or added to table spaces. These tables should be sized correctly
so that there is no need to resize them too often in the future.
A good solution for customers who want daily or weekly granular segments is to start with a
minimum number of partitions when creating the table space. The customer can then add
more partitions using the ALTER ADD PARTITION statement provided with Online Schema
Evolution (see Figure 2-66 "Partition Management" on page 2-99).
For example, if a customer wants daily data segments, the table space can be created with
365 partitions and then partitions can be added later for subsequent years.
Customers that have existing partitioned table spaces with 254 partitions can use online
schema evolution to add more partitions to their table spaces.
Be aware that with LOBs, there is one LOB table space, one auxiliary table, and one
auxiliary index per partition per LOB column. A single database can hold a maximum of
65,535 objects. Therefore, if a table with 4096 partitions has a LOB column, there is a need
to create 12,288 objects (4096 LOB table spaces, 4096 auxiliary tables, 4096 auxiliary
indexes), so only 5 LOB columns could be defined on a 4096 partition table space.
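The LOB arithmetic in the last paragraph is easy to replay in Python:

```python
MAX_OBJECTS_PER_DATABASE = 65_535
partitions = 4096
# One LOB table space, one auxiliary table, and one auxiliary index
# per partition, per LOB column:
objects_per_lob_column = 3 * partitions

print(objects_per_lob_column)                              # 12288
# How many LOB columns fit in a single database:
print(MAX_OBJECTS_PER_DATABASE // objects_per_lob_column)  # 5
```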
More Active Log Data Sets
Maximum size of a single DB2 active log data set is 4 GB
V8 increases the number of active log data sets from 31 to 93 per
LOGCOPY
Requires BSDS conversion
Must be in NFM
Convert BSDS with new DSNJCNVB job
Notes:
Even with the maximum size of each active log data set now 4 GB minus 1 CI (this
increase was made available to DB2 V6 and V7 via the PTF for APAR PQ48126, and is
now in the base code of V8), large DB2 systems need more log data available in the active
log data sets, because this reduces the chances of DB2 requiring archive log data sets for
an extended rollback or for media recovery. Since active log reads are generally faster than
archive log reads, queuing for archive log tape volumes is virtually eliminated.
DB2 V8 increases the maximum number of active log data sets from 31 to 93 per log copy.
Increasing the maximum number of active log data sets (and archive log volumes,
discussed in the next topic) requires a conversion of the BSDS to allow more data set
entries. To do the conversion, the user runs a new BSDS conversion utility job DSNJCNVB.
In order to minimize the fallback and data sharing coexistence impact of this change, your
current DB2 system must be in V8 new-function mode (NFM) before you can convert your
BSDS to support the new maximum values. BSDS conversion is optional in DB2 V8, but
recommended.
DB2 install job DSNTIJIN automatically provides a larger BSDS definition (space
allocation) during a new installation; however, you must still convert the BSDS by running
the new conversion utility job DSNJCNVB once DB2 operates in NFM. When migrating to
V8, you should follow the documented pre-conversion procedure to manually redefine a
larger BSDS before converting it with DSNJCNVB.
Increased Maximum Archive Log Data Sets
Notes:
With the explosion of e-business and the extremely high transaction volumes that large
customers are processing today, customers are finding that the current maximum of 1,000
archive log volumes (per log copy) recorded in the BSDS is no longer sufficient to remain
recoverable without having to take frequent image copies. Even with the maximum size of
each log data set, active or archive, now increased to 4 GB minus 1 CI, large DB2 systems
are creating so many archive log data sets that the 1,000 archive log data set maximum
only allows them to register a few days of log data in the BSDS.
DB2 V8 increases the maximum number of archive log data sets recorded in the BSDS
from 1,000 data sets per log copy to 10,000 data sets.
Prior to running the conversion utility, you need to do the following steps to allocate a larger
BSDS:
1. Rename the existing BSDS data sets to save the originals in case the conversion fails.
2. Allocate larger BSDS data sets using the original BSDS names. You can use the VSAM
DEFINE statements in installation job DSNTIJIN for this task.
3. Copy the renamed original data sets into the new, larger data sets; VSAM REPRO is
recommended.
4. Repeat for the second copy if you use dual BSDSs.
See also Figure 1-48 "More Active Log Data Sets" on page 1-79 and Figure 1-49
"Increased Maximum Archive Log Data Sets" on page 1-81.
Notes:
Up until V7, you can have only 15 tables in the FROM clause of your SQL statements (for
non-star join queries). This restriction has been in place since DB2 V1. In V8, this limit is
increased to 225.
This enhancement increases the usability and power of SQL. Lifting this restriction is very
important for ERP and CRM applications. These applications typically use a highly
normalized design with hundreds of tables. Therefore, even fairly simple queries may need
to join many tables, easily more than 15, the previous limit. With V8, you can now specify
up to 225 tables in the FROM clause of your SQL statements.
Complex Joins - Up to 225 Tables (2 of 2)
Before DB2 Version 8, the number of tables that can be joined in the
FROM clause is limited to 15:
The limit is set to prevent the possibility that a large query causes the DB2
Optimizer to consume huge amounts of resources (storage, CPU, ET)
resulting in critical storage shortage and possible system crash
APS uses dynamic programming. The number of join combinations, when
determining the best access path, grows exponentially
Can use 'hidden' ZPARM (SPRMMXT) to increase the limit (at your own risk)
Limit does not apply to star join queries
With DB2 Version 8:
Optimization enhancements enable support for joins of up to 225 tables
Code added to recognize common query patterns (that is, Star Schema),
which enables more efficient optimization of very large joins
Queries that do not fit the Star Schema pattern, but join a large number of
tables, could still run into problems
DB2 has put in place thresholds to force the optimization process to
complete quickly
Notes:
The number of tables that can be specified in the FROM clause of an SQL statement is
limited to 15, for DB2 versions prior to V8. Many customers need to run queries that join
more than 15 tables in their ERP or CRM applications, typically designed with large
numbers of tables to be joined. When you exceed this limit, you receive an SQLCODE
-129.
Queries that qualify for star join processing already allow up to 225 tables in the FROM
clause. To get around the 15 table limit, customers can evaluate the use of a “hidden”
DSNZPARM (SPRMMXT - MXTBJOIN) so that their queries can run. This parameter is
hidden because, in general, there is a need for extra storage and processor time when
dealing with these complex queries.
The default limit on the number of tables joined has stayed at 15 for a long time because a
large query could cause DB2 to consume a lot of resources (storage and CPU) when
evaluating the cost of each possible join sequence. This in turn can cause critical storage
shortages and have a negative impact on the DB2 subsystem. See APARs PQ31326,
PQ28813, and PQ57516 for more details.
Note that this limitation applies to the number of tables in the FROM clause. In V7, you can
already use up to 225 tables throughout the entire set of SQL statements (for example,
including subselects).
In DB2 V8, the default limit is changed from 15 to 225 tables to be joined. This means that
users can more easily join more than 15 tables. It also means that DB2 can join this many
tables without restriction.
A number of enhancements have been implemented in DB2 V8 to reduce the amount of
resources needed for the optimization process. This allows you to join more tables using
fewer resources. New functionality recognizes common query patterns (like star
schemas) and optimizes large joins very efficiently.
These improvements, while reducing the risk of running into resource shortages, do not by
themselves eliminate the risk. Queries that do not fit the star schema pattern, but join a
large number of tables, could still run into problems, even in DB2 V8.
Considerations for Optimization Thresholds
How does it work?
When the optimization process gets within a percentage of any threshold
(for example, 80%), curbs are activated to speed up optimization
If the threshold value is exceeded, curbs force optimization to complete
very quickly
Thresholds expressed in terms of storage (number of MB), CPU
(number of seconds), and ET (elapsed time)
To avoid regressing existing queries, thresholds only applied if more than
15 tables joined
'Hidden' zparm provided to allow thresholds to apply to queries with 15 or
fewer tables (TABLES_JOINED_THRESHOLD)
Notes:
To address this problem, DB2 V8 has enhanced the monitoring of how much storage and
CPU is being consumed by the optimization process. If it exceeds certain thresholds, then
curbs are put in place to force the optimization process to complete quickly. When
excessive resources have been consumed by the optimization process, the goal changes
— from selecting the “optimal” plan, to selecting a “reasonable” plan, in a minimal amount
of time.
The resource threshold used is expressed in terms of storage (number of megabytes),
CPU (number of seconds), and elapsed time (also in number of seconds). The thresholds
are large enough so that most existing queries are not impacted, but small enough so that
they prevent severe resource shortages.
To guard against regressing existing queries, the threshold is only applied when the
number of tables joined is greater than 15 (the limit prior to DB2 V8). This way, only
customers that were using the “hidden” ZPARM to run queries with >15 tables may see any
change to their existing workload.
Affected Interfaces
New 'hidden' keyword zparms -- all changeable online
MAX_OPT_STOR
Maximum amount of RDS OP POOL storage to be consumed by Optimizer
Default is 20 MB
Values range from 0-100 MB (if 0, then default is used)
MAX_OPT_CPU
Maximum amount of CPU time to be consumed by Optimizer
Default is 100 sec
Values range from 0 to 500 sec (if 0, then default is used)
MAX_OPT_ELAP
Maximum amount of elapsed time to be consumed by Optimizer
Default is 100 sec
Values range from 0 to 1000 sec (if 0, then default is used)
TABLES_JOINED_THRESHOLD
Number of tables joined to cause DB2 to limit amount of resources consumed by
Optimizer
Default is 16
Values range from 0 to 225 (if 0, the threshold is never applied; if 1, it is always
applied)
Notes:
These parameters change the behavior of the optimizer when it is evaluating SQL
statements with more than 15 tables (TABLES_JOINED_THRESHOLD DSNZPARM) in the
SQL statement.
They can be overridden by coding these values in the DSN6SPRM macro specification and
re-assembling and re-linking DSNZPARM. Please do not change these default values
unless directed to do so by IBM.
Benefits .....
DB2 Family compatibility
Required to support long names, 4096 partitions and UNICODE
Important for SQL Procedure Language applications and SQL statements
created by generators
SQL Procedure must be completely stated in a single SQL statement --
limit prior to Version 8 was 32K
Notes:
Ever since DB2 V1, an SQL statement has been limited to 32 KB. This is normally not a
problem, but with the support of longer names for most DB2 objects, 4096 partitions, and
especially SQL procedures, the maximum of 32 KB can become a problem. In DB2 Version
8, this restriction is lifted, as is the case with so many other V7 limitations. In DB2 Version
8, your SQL statements can be up to 2 MB.
When using embedded static SQL, this means that you can now code a 2 MB statement
between your EXEC SQL and END-EXEC, in your COBOL program, for example.
On the other hand, when using dynamic SQL, these long statements are passed to DB2 as
a CLOB or DBCLOB, because a “normal” character string can only be up to 32 KB.
Dynamic SQL Statements
SQL statements passed to DB2 on PREPARE/EXECUTE
IMMEDIATE statements now passed in CLOBs and DBCLOBs
Changes to PREPARE/EXECUTE IMMEDIATE
Host variable can now be specified as CLOB and DBCLOB
Maximum length of SQL statement contained in CLOB or DBCLOB
2,097,152 bytes for CLOB
1,048,574 double-byte characters for DBCLOB
Maximum length of contained statement if the host variable is VARCHAR or
VARGRAPHIC
32,767 bytes for VARCHAR
16,383 double-byte characters for VARGRAPHIC
NOTE: Since SQL statements are broken up into pieces and stored in
succeeding records with a sequence number to keep them in order,
there is no change in the catalog to store a 2 MB statement
Notes:
As mentioned in the previous topic, SQL statements passed to DB2 via
PREPARE/EXECUTE IMMEDIATE statements have to be passed in CLOBs and
DBCLOBs, when you want to be able to use statements that are longer than 32 KB.
The maximum length of an SQL statement when the host variable is a VARCHAR is 32,767
bytes, and 16,383 double-byte characters for a VARGRAPHIC.
The interface to PREPARE/EXECUTE IMMEDIATE has been enhanced in V8. With this
enhancement, PREPARE/EXECUTE IMMEDIATE will accept host variables that are
specified as CLOB and DBCLOB.
The maximum length of an SQL statement contained in a CLOB is 2,097,152 bytes, and
1,048,574 double-byte characters for a DBCLOB.
When the SQL statements are stored in the DB2 catalog for static SQL, they are broken
into pieces, and stored in succeeding records (with a sequence number, to allow you to
retrieve them in the correct order). Therefore, no catalog changes are required to store
SQL statements up to 2 MB in the DB2 catalog.
[Figure: C program (main() {...}) executing a “long” UPDATE statement with EXECUTE
IMMEDIATE from a CLOB(100K) host variable :string1.]
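A sketch of such a program, assuming embedded SQL in C (the UPDATE text and table name are illustrative; the :string1 name and the CLOB(100K) size follow the notes below, and the precompiler expands the CLOB declaration into the string1.length and string1.data fields):

```c
#include <string.h>

EXEC SQL INCLUDE SQLCA;

int main(void)
{
    EXEC SQL BEGIN DECLARE SECTION;
        /* CLOB host variable: can hold an SQL statement of up
           to 100 KB (the V8 maximum is 2 MB) */
        SQL TYPE IS CLOB(100K) string1;
    EXEC SQL END DECLARE SECTION;

    /* A short statement is used here for illustration; in
       practice the CLOB allows much longer statement text */
    strcpy(string1.data,
           "UPDATE CUSTOMER SET STATE_CD = 'NY'");
    string1.length = strlen(string1.data);

    /* In V8, EXECUTE IMMEDIATE also accepts a CLOB or DBCLOB */
    EXEC SQL EXECUTE IMMEDIATE :string1;

    return 0;
}
```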
Notes:
The example in the figure above shows how to execute a “long” UPDATE statement. In the
example, the statement does not really need a CLOB, as it is only 44 bytes, but it is just
given to show you how to use a CLOB. The CLOB in our example can be up to 100K (the
maximum being 2M). Therefore the statement string :string1 that we pass to the EXECUTE
IMMEDIATE can be up to 100 KB long.
Note that the EXECUTE IMMEDIATE is the same as in previous versions of DB2. The only
difference is that it also accepts a CLOB or DBCLOB in V8.
PREPARE Example
IDENTIFICATION DIVISION.
DATA DIVISION.
...............
WORKING-STORAGE SECTION.
01 USTRING SQL TYPE IS DBCLOB(400K).
01 ATTRSTG.
49 ATTR-LEN PIC S9(4) USAGE BINARY.
49 ATTR-DATA PIC X(1000).
.................
PROCEDURE DIVISION.
.................
MOVE " long sql statement in UNICODE(UTF-16) " TO USTRING-DATA.
MOVE <length of sql statement> TO USTRING-LENGTH.
MOVE "INSENSITIVE SCROLL WITH CS " TO ATTR-DATA.
MOVE 27 TO ATTR-LEN.
...................
EXEC SQL PREPARE STMT ATTRIBUTES :ATTRSTG FROM :USTRING END-EXEC.
Notes:
When preparing SQL statements that are bigger than 32 KB, you must use CLOBs or
DBCLOBs. The visual above shows how to PREPARE an SQL statement using a DBCLOB
in a COBOL program.
Note that, for each CLOB or DBCLOB variable declared, the precompiler generates a
structure containing two elements: a 4-byte length field and a data field of the specified
length. The names of these fields vary depending on the host language used:
• For COBOL, they are variable-LENGTH and variable-DATA.
• For C, they are variable.length and variable.data.
• For PL/I, Assembler and FORTRAN, they are variable_LENGTH and variable_DATA.
See the “Declaring LOB host variables and LOB locators” section in DB2 Application
Programming and SQL Guide, SC18-7415, for more details. An example of a COBOL
DBCLOB declaration is shown below.
01 USTRING.
02 USTRING-LENGTH PIC S9(9) COMP.
02 USTRING-DATA.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(32767) USAGE DISPLAY-1.
49 FILLER PIC G(16396) USAGE DISPLAY-1.
Because the COBOL language allows graphic declarations of no more than 32767
double-byte characters, for DBCLOB host variables that are greater than 32767
double-byte characters in length, DB2 creates multiple host language declarations of
32767 or fewer double-byte characters.
Also note that you cannot use a CLOB or DBCLOB for the attributes string. The attribute
string definitions remain the same as in V7.
Other Considerations
Distributed data
Remote support for large SQL statements requires requester and server
support for DRDA V3
DB2 and DB2 Connect V8 adding support for this protocol and can
flow large SQL statements
DB2 for iSeries V5R2 does not provide support for DRDA V3 but
does support long SQL statements
Trace records that contain entire or partial SQL statement
IFCID 0063, 0140, 0141, 0142, 0145, 0168, 0316
Contain complete or partial SQL statement text
New IFCID 350 provides the full SQL statement text
Use IFCID 317 for statements in the statement cache
Notes:
Here we describe some considerations applying to long SQL statements.
Distributed Data
Remote support for large SQL statements requires DRDA requester and server support for
The Open Group DRDA V3 Technical Standard. DB2 for z/OS and DB2 Connect V8 added
support for this protocol and can flow large SQL statements. DB2 for iSeries V5R2 does
not provide support for DRDA V3, but does support long SQL statements. DB2 for Linux,
UNIX, and Windows is adding support for 2 MB SQL statements in its upcoming
release (called Stinger). For more information on Stinger, see:
http://www.ibm.com/software/data/db2/stinger/
If you want to see the full SQL statement, you have two options:
• If the statement is in the dynamic statement cache, you can use IFCID 317 to retrieve
the full SQL statement through the READS interface of the IFI.
• If the statement is not in the dynamic statement cache, you can use the new IFCID 350.
Unlike IFCID 63, it contains the full SQL statement text.
Long Predicates
Column length for predicate operands is extended to 32704 bytes
from 255 bytes (254 for graphic strings)
DB2 Family compatibility
Change:
Increase the maximum length for predicates to 32704 bytes
Matches the maximum defined size of a VARCHAR/VARGRAPHIC column
The DB2 maximum sort key size has been increased to 16000 bytes
(from 4000 in V7)
Notes:
Prior to V8, the maximum length for predicate operands is 255 bytes and for graphic strings
254 bytes. This is incompatible with the rest of the DB2 family. In DB2 V8 the maximum
length for predicates is increased to 32704 bytes, matching the maximum defined size of a
VARCHAR column.
This support requires no SQL changes. Predicates are supported for both indexable and
non-indexable columns. The maximum length for the pattern expression for LIKE
predicates remains 4000 bytes.
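As a sketch of what the new limit permits (the table and column names are assumptions, not from the course):

```sql
-- SUMMARY is a 1000-byte operand, well over the 255-byte
-- predicate-operand limit that applied before V8 (the V8
-- limit is 32704 bytes)
CREATE TABLE DOCS
  (DOC_ID  INTEGER       NOT NULL,
   SUMMARY VARCHAR(1000) NOT NULL);

-- Rejected prior to V8 because the column operand exceeds
-- 255 bytes; accepted in V8
SELECT DOC_ID
  FROM DOCS
 WHERE SUMMARY = :summaryhv;
```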
The maximum sort key size has also been increased in DB2 V8 to 16000 bytes. The limit
was 4000 bytes in previous DB2 versions.
Notes:
Prior to V8, the maximum key length is 255 bytes. This sometimes presents us with some
design challenges. With the implementation of Unicode support, multiple bytes may be
required to represent a single character. Existing data that is converted to Unicode or new
data in Unicode can result in index keys longer than 255 bytes.
Since long index keys are supported by DB2 for UNIX, Linux, and Windows, applications
using keys longer than 255 bytes cannot easily be ported to the z/OS platform.
In DB2 V8, the maximum key length is extended from 255 bytes to 2000 bytes.
This support requires no SQL change. The increased key limit of 2000 bytes is only
available in new-function mode. The partitioning key limit of 255 bytes does not change
with this enhancement.
List of Topics
Partitioning and clustering enhancements
Notes:
DB2 for z/OS Version 8 brings a lot of changes that affect availability, keeping up with the
explosive demands of e-business, transaction processing, and business intelligence. DB2
V8 delivers increased application availability with schema evolution support, which permits
schema changes without stopping data access while the changes are implemented. You
can gain greater availability and management through data partitioned secondary indexes,
and minimize partition management for historical data with support for rolling partitions and
adding partitions.
Version 8 also introduces a new technique to back up and recover an entire DB2
subsystem or data sharing group.
DB2 Version 7 allows you to change a number of DSNZPARMs online. In V8, more
DSNZPARMs are online changeable.
Availability - Overview (1 of 3)
Partitioning and clustering
Why partitioned table spaces
DB2 V7 and prior
DB2 V7 challenges
DB2 V8 partitioning
Table-controlled partitioning
Classification of indexes
Data-partitioned secondary indexes (DPSIs)
Creating a data-partitioned secondary index
The need for DPSIs
DPSI considerations
Clustering
Notes:
In the first part of this unit, we look at the many enhancements to partitioning and
clustering.
First, we review why and where you want to use partitioned table spaces, and discuss the
challenges in partitioning in DB2 V7 and earlier versions.
Then we introduce the first partitioning enhancement in V8, table-controlled partitioning.
With the changes in partitioning, it is also important that we do a better job of using correct
terminology when talking about indexes on partitioned table spaces. The new index
terminology is introduced in that topic.
We also discuss a brand-new type of index in DB2, the data-partitioned secondary index,
how to define it, the problems it solves, and some design considerations.
Lastly, we describe the clustering enhancements in V8.
Availability - Overview (2 of 3)
Online schema changes
Online schema changes overview
Table data type changes
Index changes
Versioning
Dynamic partitions (partition management)
Other schema changes
New DBET states for online schema changes
Notes:
The second part of this unit is dedicated to the enhancements in DB2 V8 that allow you to
make schema changes without having to drop and recreate objects, so-called online
schema evolution.
First we discuss the type of online table and index changes that are supported in Version 8.
We also discuss the underlying infrastructure that allows this to work, called versioning,
and the impact of online changes on the status of your objects (DBET states).
Then we describe the online changes you can make to partitioned table spaces; adding
partitions on the fly, and rotating partitions.
Availability - Overview (3 of 3)
System level point-in-time recovery
Backing up the system
Restoring the system
Online ZPARMs
Other availability enhancements
Control interval larger than 4 KB
Monitoring system checkpoints and log offload activity
Log monitor long running UR backout
Detect long readers (IFCID 313)
Locking enhancements
Improved LPL recovery
SMART DB2 extent sizes for DB2 managed objects
Notes:
The last parts of this unit are dedicated to these topics:
• System level point-in-time recovery is a new feature that allows you to take a
non-disruptive backup of an entire DB2 subsystem or data sharing group, as well as
restoring that entire subsystem or data sharing group to a previous point-in-time.
• DB2 V7 introduced online changeable DSNZPARMs. V8 continues to work its way
down the list to make more and more DSNZPARMs online changeable.
• In the final topics, we describe a number of miscellaneous enhancements that can have
a positive effect on availability, such as:
- VSAM control intervals larger than 4 KB
- Monitoring system checkpoints and logging offload activity
- Log monitoring long running UR backout
- Detecting long readers (IFCID 313)
- Locking enhancements
- Improved LPL recovery
- SMART DB2 extent sizes for DB2 managed objects
Why Partitioned Table Spaces
Size
Number of rows
Size of data sets
Size -- history
Performance and availability -- reduced elapsed time for:
Utilities
COPY / RECOVER
REORG
LOAD
Batch
Query / BI / Data Warehouse
Manageability - handle a partition at a time
Notes:
Partitioned table spaces are usually recommended for storing tables of large size. Two of
the reasons for the recommendation deal with availability issues:
• Positive recovery characteristics:
- If the data set backing a physical partition becomes damaged, the data outage is
limited to that partition's data, and only that fraction of the data needs to be
recovered.
- Furthermore, if partitioning is performed along application-meaningful lines, logical
damage (by a wayward application, for example) can be isolated to certain partitions
— again limiting the data outage and recovery scope.
• The potential to divide and conquer:
- The elapsed time to perform certain utilities, or the storage requirement to perform
online REORG against a large table space, may be prohibitively high. Because
utility jobs can be run at the partition level, operations on a table space can be
broken along partition boundaries into jobs of more manageable size.
- The jobs may be run in parallel to accomplish the task in reduced elapsed time, or
serially to limit resource consumed by the task at any one point-in-time.
Maximum table space size by version:
Version 5 table space: 1 TB
Version 6 table space: 16 TB (4000 TB per LOB column)
Version 8 table space: 128 TB
Notes:
This foil is to scale! As we can see, the size of a table space has dramatically increased
over the past several versions of DB2. From DB2 V4 to DB2 V8, the size of a table space
has increased 2048 times! This rapid growth has made it even more desirable to be able to
partition a table space to make the object more manageable and to make utility work
against such a large object more granular. Note that to get to a 128 TB table space, you
must use a page size of 32 KB.
V7 Partitioned Tables
Creating
Index-controlled partitioning
Logical and physical partitions
NPI challenges
Recovery of NPI is always entire NPI
NPI contention
REORG PART BUILD2 phase
LOAD PART
System affinity in a data sharing environment
Notes:
In this and the following topics, we review how to use partitioned tables in DB2 V7 and
before. We have a quick refresh of:
• How to create a partitioned table, using so-called index-controlled partitioning
• The differences between logical and physical partitions
• The challenges of V7’s non-partitioning indexes, such as non-partitioned index (NPI)
contention
Notes:
In DB2 V7 and prior, in order to create a partitioned table, you created a table space
specifying the NUMPARTS keyword and created the table to be placed in this table space.
The definition of this table space is incomplete at this point and the table is marked as
unavailable until the required partitioning index is defined.
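The V7 sequence described above can be sketched as follows (the database, table space, and column names are illustrative; the boundary values match the index example that follows):

```sql
-- Step 1: create the partitioned table space (4 partitions)
CREATE TABLESPACE CUSTTS NUMPARTS 4 IN CUSTDB;

-- Step 2: create the table in it; the definition is still
-- incomplete and the table is unavailable at this point
CREATE TABLE CUSTOMER
  (ACCOUNT_NUM INTEGER NOT NULL,
   STATE_CD    CHAR(2) NOT NULL)
  IN CUSTDB.CUSTTS;

-- Step 3: the required partitioning (and clustering) index
-- completes the definition
CREATE INDEX CUSTIX ON CUSTOMER (ACCOUNT_NUM ASC)
  CLUSTER
  (PART 1 VALUES (199),
   PART 2 VALUES (299),
   PART 3 VALUES (399),
   PART 4 VALUES (499));
```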
Creating the Partitioning Index
CREATE INDEX index-name ON CUSTOMER
  (ACCOUNT_NUM ASC) . . .
  CLUSTER                      <= required!!
  (
   PART 1 VALUES ( 199 ),     <= presence of one or more
   PART 2 VALUES ( 299 ),        PART n VALUES clauses
   ...                           indicates a partitioning
   PART 4 VALUES ( 499 )         index
  )
  ...
Notes:
In V7, it is required that you define a partitioning index for a partitioned table space in order
to complete the definition of the table space. The partitioning index specifies a partitioning
key that dictates what columns the table is partitioned by, and one PART clause per
partition to specify which rows (key ranges) go into which partition. The partitioning index
also has to be the clustering index, and the CLUSTER keyword is required or the
statement fails. That is, the partitioning index controls how the table is partitioned and how
the table is clustered. This table and index combination is now called index-controlled
partitioning.
[Figure: Index-controlled partitioning. A single index (IX) on the partitioned table (TB) is at
once the partitioning index, physically partitioned, and the clustering index: its key ranges
determine both the partition a row belongs to and the clustering order of the rows.]
Notes:
In V7 and prior versions, you only have index-controlled partitioning at your disposal to
create a partitioned table. When using index-controlled partitioning, concepts such as
“partitioned”, “partitioning” and “clustering” are intertwined because the index that defines
the columns and key ranges for the different partitions is the partitioning index, is
partitioned (made up of different physical partitions), and is also the clustering index.
V7 - Logical and Physical Partitions
Partitioning and clustering index (on ACCOUNT_NUM) -- both logically and physically
partitioned
[Figure: Four physical index partitions (IX) align one-to-one with the four data partitions
(TB). A second index, an NPI on STATE_CD, is a single physical data set; the NPI keys that
reference a given data partition form a logical partition of the index.]
Notes:
Prior to DB2 V8, any index that does not specify the PART n VALUES keywords, and is
defined on a partitioned table, is a non-partitioned index (NPI). Prior to DB2 V8, NPIs
cannot be physically partitioned, that is, they cannot have multiple physical partitions. They
can only be allocated across multiple pieces to reduce I/O contention. However, there does
exist a concept of logical partitions. That is, all the keys of an NPI that point to rows of a
physical data partition are considered to be a logical partition of the index.
In this visual, we can see the partitioning index that we defined in earlier visuals. The
partitioning index partitions the table on ACCOUNT_NUM and there are four partitions (four
data partitions and four corresponding index partitions). The other index (at the bottom of
the visual) is an NPI on STATE_CD and is composed of one physical data set. Notice that
logical partitions exist for the NPI, but they are only used by utilities for claim and drain
processing. Note that in V7, this second index is referred to as a non-partitioned index
(NPI) or a secondary index.
Notes:
The positive aspects of partitioning begin to deteriorate if there are non-partitioned indexes
present. More examples follow:
• In a data sharing environment, some customers find benefit from isolating the work on
certain members to certain partitions. Such affinity-routing eliminates intersystem
read-write interest on physical partitions, thereby reducing data sharing overhead.
Affinity routing does not alleviate contention on non-partitioned indexes, since keys that
belong to different data partitions are spread throughout the non-partitioned index.
• Recovery from a media failure on a non-partitioned index can only be done at the entire
index level. No piece-level rebuild or recovery can be done for a non-partitioning index.
• The sheer size of NPIs over very large partitioned tables makes their management as a
single large object difficult. RUNSTATS, REORGs, REBUILDs, etc. take longer clock
time than if they could be run on smaller object parts in parallel.
• Partition-level operations become less clean if there are non-partitioned indexes
present. For example, to erase the data of a partition, you normally LOAD that partition
with an empty input file. This operation quickly “resets” the data partition as well as the
partitioned index, but also entails removing key entries from the NPIs that reference the
partition being erased. For each row being removed from that partition, DB2 has to look
up the key entry for that row in the non-partitioned index and delete it.
V7 - Contention on NPI
[Figure: REORG PART and concurrent LOAD PART jobs each work on their own partition of
the table and of the partitioned index, yet all contend on the single NPI on STATE_CD,
because keys from every data partition are interleaved throughout the NPI.]
Notes:
As mentioned before, the positive aspects of partitioning begin to deteriorate if there are
non-partitioned indexes present. Here we provide some examples related to running
utilities:
• When non-partitioned indexes exist on the table space, a BUILD2 phase is performed
during online REORG of a partition. This phase uses the shadow index for the index's
logical partition to correct RID values in the NPI. During this phase, the utility takes
exclusive control of the logical partition. This blocks queries that are not
partition-restrictive from operating.
• LOAD PART jobs, run concurrently, contend on non-partitioned indexes because keys
of all parts are interleaved. In addition, during a LOAD PART job, key processing
against non-partitioned indexes follows insert logic (a row at a time) which is slower
than append logic (a page at a time).
V8 Partitioned Tables
Table-controlled partitioning
Creating a table-controlled partitioned table
Converting to table-controlled partitioning
Catalog support
Classification of indexes
Partitioning/secondary
Partitioned/non-partitioned
Clustering
Notes:
In the following topics, we describe the many enhancements to partitioned tables in DB2
Version 8.
First, we introduce the “new” way to create partitioned tables, called “table-controlled
partitioning”. We also look at how to “convert” from index-controlled to table-controlled
partitioning, and where the information about table-controlled partitioning is stored in the
DB2 catalog.
With the changes in partitioning, it is important to have correct terminology to describe the
different types of indexes on a partitioned table: partitioning versus secondary indexes, and
partitioned versus non-partitioned indexes.
DB2 Version 8 also introduces a new type of index, a so-called data-partitioned secondary
index. We look at the problems it solves and some design considerations on when and how
to use them.
Lastly, we look at the clustering enhancements, where in V8 any index can be the
clustering index, and the fact that you can change the clustering index on the fly.
V8 Partitioned Tables -
Table-Controlled Partitioning
[Figure: Table-controlled partitioning. The partitioned table (TB) stands alone: partitioning,
being partitioned, and clustering are now separate concepts, and no partitioning index is
required.]
Notes:
By using table-controlled partitioning, clustering, being partitioned, and being the
partitioning index are now separate concepts. When using table-controlled partitioning, a
table does not require a partitioning index, as the partitioning is done based on the
PARTITION BY clause in the CREATE TABLE statement. Since the partitioning index is no
longer required, another index may also be used as a clustering index. We will discuss
clustering in detail in Figure 2-34 "Clustering Indexes" on page 2-50.
V8 - Creating Partitioned Tables
CREATE TABLESPACE tsname NUMPARTS n
(PARTITION 1 USING . . .
. . .
PARTITION n USING . . . )
IN dbname;
Notes:
Before V8, only index-controlled partitioning was supported. With index-controlled
partitioning, the partitioning key and partition boundaries are specified on the CREATE
INDEX statement, when creating a partitioning index on the table. This results in a table
that is unusable or “incomplete” until the partitioning index is created.
DB2 V8 introduces table-controlled partitioning. That is, when creating a partitioned table,
the partitioning key and partition boundaries can be specified on the CREATE TABLE
statement, as shown in the visual.
When creating a partitioned table (the table space had a NUMPARTS keyword specified),
table-controlled partitioning is initiated by specifying the new PARTITION BY clause on the
CREATE TABLE statement. The PARTITION BY clause identifies the columns and values
used to define the partition boundaries. When the new clause is used, the definition of the
table is complete and data can be inserted into the table.
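The notes above refer to a CREATE TABLE with the PARTITION BY clause; a sketch of such a statement (the column names reuse the earlier CUSTOMER example, and the boundary values are illustrative):

```sql
CREATE TABLE CUSTOMER
  (ACCOUNT_NUM INTEGER NOT NULL,
   STATE_CD    CHAR(2) NOT NULL)
  IN dbname.tsname
  PARTITION BY (ACCOUNT_NUM ASC)
   (PARTITION 1 ENDING AT (199),
    PARTITION 2 ENDING AT (299),
    PARTITION 3 ENDING AT (399),
    PARTITION 4 ENDING AT (499));
```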
Instead of the PART and VALUES keywords used in DB2 V7 index-controlled partitioning,
we now use the PARTITION and ENDING AT keywords. The old syntax keyword
combination (PART and VALUES) is still supported, but what actually controls whether a
table uses index-controlled or table-controlled partitioning is where the partition
boundaries are specified: on the CREATE INDEX statement (index-controlled) or on the
CREATE TABLE statement (table-controlled).
ADD PARTITION BY [RANGE] ( partition-expression, ... ) ( partition-element, ... )

partition-expression:
  column-name [NULLS LAST] [ASC | DESC]

partition-element:
  PARTITION integer ENDING [AT] ( constant, ... ) [INCLUSIVE]
Our second example shows how to use the ADD PARTITION BY RANGE clause on an
existing table.
CREATE DATABASE BSDBVER2;
Converting to Table-Controlled Partitioning
No need to DROP/CREATE all existing partitioned tables
DB2 will automatically convert to table-controlled partitioning for
you when any of the following SQL statements are executed:
DROP the partitioning index
ALTER INDEX NOT CLUSTER on the partitioning index
ALTER TABLE ... ADD PARTITION
ALTER TABLE ... ROTATE PARTITION
ALTER TABLE ... ALTER PARTITION n
CREATE INDEX ... PARTITIONED
CREATE INDEX ... ENDING AT ... omitting the CLUSTER keyword
Least disruptive approach
ALTER INDEX xpi NOT CLUSTER of the (current) partitioning index
ALTER INDEX xpi CLUSTER of the same index
Notes:
For tables that use index-controlled partitioning created in DB2 V8 or in previous releases
of DB2, the use of any of the statements listed in this visual will automatically convert the
table to use table-controlled partitioning. It is not necessary to drop and recreate the table.
Users are encouraged to convert partitioned tables to use table-controlled partitioning. A
non-disruptive method for doing this involves the following two steps:
1. ALTER INDEX ixname NOT CLUSTER on the partitioning index:
- The index remains available and functioning as before.
- The table is converted to table-controlled partitioning and appropriate catalog
changes are made.
- Column CLUSTERING in SYSIBM.SYSINDEXES will be changed from ‘Y’ to ‘N’.
DB2 will continue to use this index as the clustering index until another index is
explicitly defined with the CLUSTER keyword or altered to be the clustering index or
you drop the partitioning index.
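The two-step, least disruptive conversion can be sketched as follows (the index name xpi is taken from the visual):

```sql
-- Step 1: converts the table to table-controlled partitioning;
-- the index keeps working and still clusters the data
ALTER INDEX xpi NOT CLUSTER;

-- Step 2: makes the same index the explicit clustering index
ALTER INDEX xpi CLUSTER;
```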
Catalog Support
SYSTABLES
PARTKEYCOLNUM
Number of columns in the partitioning key
SYSTABLEPART
LIMITKEY_INTERNAL VARCHAR 512
Internal format of partition boundary corresponding with LIMITKEY
LIMITKEY - used for both index and table controlled partitioned tables
SYSCOLUMNS
PARTKEY_COLSEQ
Column's numeric position in table's partitioning key
PARTKEY_ORDERING
Order of column in partitioning key ('A', 'D')
SYSINDEXES
INDEXTYPE
'P' for partitioning index on table controlled partitioned table
'D' for DPSI
'2' type 2 (all....)
Notes:
For table-controlled partitioning, the limit keys are only stored in columns LIMITKEY and
LIMITKEY_INTERNAL of SYSIBM.SYSTABLEPART. No indexes defined on this table will
have any values in column LIMITKEY of SYSIBM.SYSINDEXPART. Also, column
PARTKEYCOLNUM in SYSIBM.SYSTABLES will have a non-zero value (this is the
number of columns in the partitioning key). In addition, column IXNAME of
SYSIBM.SYSTABLEPART will be blank.
For index-controlled partitioning, the limit keys are stored in both
SYSIBM.SYSINDEXPART and SYSIBM.SYSTABLEPART in column LIMITKEY. Also,
column IXNAME of SYSIBM.SYSTABLEPART contains the name of the index-controlled
partitioning index.
When using table-controlled partitioning, a value of ‘P’ for the INDEXTYPE column of
SYSINDEXES indicates that an index is both partitioned and partitioning. In
table-controlled partitioning, in order to be a partitioning index, its left-most columns must
be the same columns (or be a superset of the columns), in the same order and collating
sequence as the columns specified when defining the partitioning of the table.
Indexes on index-controlled partitioned tables, and non-partitioned secondary indexes
(non-partitioned and non-partitioning) on table-controlled partitioned tables are identified
with a value of '2' for the INDEXTYPE column of SYSINDEXES.
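A simple way to inspect this classification in the catalog (the table name is illustrative):

```sql
-- INDEXTYPE: 'P' = partitioning index on a table-controlled
-- partitioned table, 'D' = DPSI, '2' = type 2 index
SELECT NAME, INDEXTYPE
  FROM SYSIBM.SYSINDEXES
 WHERE TBNAME = 'CUSTOMER';

-- PARTKEYCOLNUM > 0 indicates table-controlled partitioning
SELECT NAME, PARTKEYCOLNUM
  FROM SYSIBM.SYSTABLES
 WHERE NAME = 'CUSTOMER';
```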
DB2 V8 Classification of Indexes
An index may / may not be correlated with the partitioning
columns of the table
Partitioning index (PI)
Secondary index
An index may / may not be physically partitioned
Partitioned
Non-partitioned
Clustering index:
Any index may be the clustering index!!
The clustering index can be unique / non-unique
Notes:
Until V7, terms such as secondary index (on a partitioned table), non-partitioned index
(NPI), non-partitioning index, or non-clustering index, were often used interchangeably to
describe any index other than the partitioning index. With all the enhancements related to
partitioning and indexes in DB2 Version 8, it is important that we make sure to use the
correct terminology. It is important to distinguish between:
• Partitioning and non-partitioning indexes
• Partitioned and non-partitioned indexes
• Clustering and non-clustering indexes
Index classification
Indexes on partitioned tables can be classified as follows:
• Based on whether or not the columns in the index correlate with the “partitioning”
columns of the table. The partitioning columns are those specified in the PARTITION
BY clause in the CREATE TABLE statement.
Partitioning index The columns in the index are the same as (are in the same
order and have the same collating sequence), or start with the
same columns as those specified in the PARTITION BY clause
of the CREATE TABLE statement for table-controlled
partitioned tables, or on the CREATE INDEX statement for
index-controlled partitioned tables. A partitioning index can
have a superset of the partitioning columns, that is, it can
contain all the partitioning columns plus additional columns.
Secondary index Any index where the columns do not coincide with the
partitioning columns of the table. We describe these in more
detail in Figure 2-21 "Secondary Indexes" on page 2-31.
• Based on whether or not an index is physically “partitioned”:
Partitioned index The index is made up of multiple physical partitions (one per
data partition), not just index pieces.
Non-partitioned index The index is a single physical data set, or multiple pieces.
• Based on whether or not the index determines the clustering of the data. Please
note that when using table-controlled partitioning, any index may be the clustering
index. (With index-controlled partitioning, the partitioning index must be the clustering
index).
We discuss clustering in more detail in Figure 2-34 "Clustering Indexes" on page 2-50.
Clustering index The index determines the order in which the rows are stored in
the partitioned table. There can only be one clustering index.
The clustering index is either the index that is defined with the
CLUSTER attribute (explicit clustering index), or the oldest
index on the table, if no explicit clustering index is defined
(implicit clustering index).
Non-clustering index The index does not determine the data order in the partitioned
table.
Note that non-partitioned tables can also have any index defined as the clustering index
(as in V7).
Partitioning Indexes
(Visual: the CUSTOMER table created with PARTITION BY (ACCOUNT_NUM ASC), with two
indexes: PARTIX1 ON CUSTOMER (ACCOUNT_NUM ASC), created with the PARTITIONED
keyword, and PARTIX2 ON CUSTOMER (ACCOUNT_NUM ASC, STATE_CD ASC), created
without it.)
Notes:
With DB2 V8, the definition of “partitioning index” has changed. A partitioning index is no
longer necessarily an index that controls how a table is partitioned. Now the term
partitioning index has two possible meanings:
1. For index-controlled partitioning (old style partitioning):
The term continues to mean that it is the index that controls how the table is partitioned.
2. For table-controlled partitioning (new DB2 V8 style partitioning):
The term now means that the index has the same left-most key column(s), in the same
order, and using the same collating sequence as the columns that control partitioning
on the table. As the name implies, table-controlled partitioning means that the table
definition itself actually controls how partitioning is done.
In this visual, we have a table-controlled partitioned table. You can tell this by the fact that
we specified the PARTITION BY clause in the CREATE TABLE statement.
Index PARTIX1 is a partitioning index because its key has the same left-most column(s), in
the same order, and using the same collating sequence as the columns (in this case only
one column, ACCOUNT_NUM ASC) which partitions the table. (Index PARTIX1 also
happens to be partitioned, because we specified the PARTITIONED keyword in the
CREATE INDEX statement.) Even though index PARTIX2 is not partitioned (we did not
specify the PARTITIONED keyword in the CREATE INDEX statement), it is also a
partitioning index because its key also has the same left-most column(s), in the same
order, and using the same collating sequence as the columns (in this case, only one
column, ACCOUNT_NUM ASC), which partitions the table.
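Based on this description, the two indexes could have been created along these lines (a sketch only; storage and buffer pool clauses are omitted):

```sql
-- PARTIX1: partitioning AND partitioned (PARTITIONED keyword given)
CREATE INDEX PARTIX1 ON CUSTOMER
  (ACCOUNT_NUM ASC)
  PARTITIONED;

-- PARTIX2: partitioning but NOT partitioned (no PARTITIONED keyword);
-- its key still starts with the partitioning column ACCOUNT_NUM
CREATE INDEX PARTIX2 ON CUSTOMER
  (ACCOUNT_NUM ASC, STATE_CD ASC);
```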
Secondary Indexes
(Visual: the CUSTOMER table — ACCOUNT_NUM INTEGER, CUST_LAST_NM CHAR(30), ... —
created with PARTITION BY (ACCOUNT_NUM ASC), with two secondary indexes: SI1 on
LAST_ACTIVITY_DT ASC, created with the PARTITIONED keyword, and SI2 on
STATE_CD ASC, created without it.)
Notes:
Indexes may be created on a table for several reasons: to enforce a uniqueness constraint,
to achieve data clustering, but most typically, to provide access paths to data for queries or
referential constraint enforcement. While the cost of maintaining any index must always be
evaluated against its benefit, several unique factors come into play when deciding whether
to add a secondary index to a partitioned table. This is because there are areas where
secondary indexes can cause performance and contention problems.
A secondary index is any index that is not a partitioning index. For an index not to be
partitioning, its key must not start with the same left-most column(s), in the same order
and collating sequence, as the columns (in this case only one column,
ACCOUNT_NUM ASC) which partition the table.
In this visual, we can see that index SI1 is a secondary index because its key
(LAST_ACTIVITY_DT ASC) does not have the same left-most column(s) as those which
partition the table (in this case ACCOUNT_NUM ASC). Note that this index also happens
to be partitioned (because we specified the PARTITIONED keyword in the CREATE
INDEX statement). A partitioned secondary index, or more precisely a Data Partitioned
Secondary Index (DPSI), is new in V8. In prior DB2 versions, all secondary indexes were
non-partitioned; only the partitioning index could be partitioned.
Index SI2 is a secondary index because its key (STATE_CD ASC) does not have the same
left-most column(s) as those which partition the table (in this case ACCOUNT_NUM ASC).
Index SI2 is also non-partitioned (because we did not specify the PARTITIONED keyword in
the CREATE INDEX statement).
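The two secondary indexes could be defined along these lines (a sketch only; storage clauses omitted):

```sql
-- SI1: a DPSI -- secondary key, PARTITIONED keyword specified
CREATE INDEX SI1 ON CUSTOMER
  (LAST_ACTIVITY_DT ASC)
  PARTITIONED;

-- SI2: an NPSI -- secondary key, no PARTITIONED keyword
CREATE INDEX SI2 ON CUSTOMER
  (STATE_CD ASC);
```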
Partitioned Index and Non-partitioned Index
(Visual: the CUSTOMER table partitioned by ACCOUNT_NUM, with a partitioned index
created on ACCOUNT_NUM ASC with the PARTITIONED keyword — one index partition per
data partition — and a non-partitioned index whose single structure holds keys for all data
partitions.)
Notes:
This visual shows the difference between a partitioned and a non-partitioned index. A
partitioned index is made up of multiple physical partitions, one per data partition. The
index keys in each index partition correspond to the rows in the same data partition
number. That is, index partition 1 only contains keys for those rows found in data partition
1, index partition 2 only contains keys for those rows found in data partition 2, and so on. A
partitioned index has the keyword PARTITIONED specified in the CREATE INDEX
statement that defines it. (The partitioning index of an index-controlled partitioned table is
also a partitioned index by definition, as it actually defines the partitioning range itself.)
A non-partitioned index may be encompassed by one physical data set or multiple data
sets if piece size is specified. Non-partitioned indexes have a concept of “logical” partitions
if the table they are defined on is partitioned. A non-partitioned index is an index in which
the keyword PARTITIONED is not specified in the CREATE INDEX statement that defined
it.
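For a non-partitioned index, the multiple data sets mentioned above are controlled by the PIECESIZE clause. A sketch (the index name and size value are examples only):

```sql
-- Non-partitioned index split into multiple pieces (data sets);
-- each piece is limited to the PIECESIZE value (example value here)
CREATE INDEX NPIX1 ON CUSTOMER
  (CUST_LAST_NM ASC)
  PIECESIZE 2G;
```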
Partitioning Indexes -
Partitioned and Non-Partitioned
(Visual: CREATE ... INDEX PARTIX1 ON CUSTOMER (ACCOUNT_NUM ASC) PARTITIONED,
a partitioned partitioning index with one index partition per data partition of the table, and
PARTIX2, a non-partitioned partitioning index consisting of a single index structure over the
same partitioned table.)
Notes:
The visual above shows two different types of indexes, which combine two of the concepts
we talked about in the previous visuals. We start mixing partitioning and partitioned.
Starting in Version 8, a partitioning index can be partitioned (PARTIX1) or non-partitioned
(PARTIX2).
In the visual, index PARTIX1 is a partitioned partitioning index. It is partitioned, because it
consists of four real partitions, just like the partitioned table it is defined on. An index does
not become partitioned by chance: you must specify the keyword PARTITIONED on the
CREATE INDEX statement. This index is also partitioning, because its left-most column(s)
(in this case only one column) are the same as, and in the same order and collating
sequence as, the partitioning key (PARTITION BY clause) of the underlying table.
Index PARTIX2 is a non-partitioned partitioning index. It is non-partitioned, because the
keyword PARTITIONED was omitted during CREATE INDEX. It is also a partitioning index,
because its left-most column(s) (in this case only one column) are the same as, and in the
same order and collating sequence as, the partitioning key (PARTITION BY clause) of
the underlying table.
Partitioned and
Non-partitioned Secondary Indexes
(Visual: the CUSTOMER table partitioned by ACCOUNT_NUM, with partitioned secondary
index DPSI1 — one index partition of LAST_ACTIVITY_DT keys per data partition — and
non-partitioned secondary index NPSI2, a single structure of STATE_CD keys for the whole
table.)
Notes:
At the bottom of this visual we also show a secondary index that is made up of a single
data set; it is a Non-Partitioned Secondary Index (NPSI) “NPSI2”. This is the only type of
secondary index that was available before DB2 V8. The NPSI consists of a single data set
(or multiple pieces) and is not partitioned according to the table’s partitioning scheme.
NPSIs may be defined as unique.
At the top of the visual we see index “DPSI1”, a secondary index that is made up of multiple
physical partitions (that match the partitioning scheme of the table); it is a Data-Partitioned
Secondary Index (DPSI). It is an index that is partitioned according to the data rows, but
whose columns differ from the partitioning columns, or are in a different order or
collating sequence than those which partition the table. DPSIs must allow duplicates and
thus cannot be defined as unique. DPSIs are new in V8.
The visual above also illustrates that a single table may support a mix of non-partitioned
and data-partitioned secondary indexes. We will discuss DPSIs in more detail next.
Creating a Data Partitioned Secondary Index
(Visual: CREATE INDEX syntax fragment showing the optional CLUSTER / NOT CLUSTER,
PARTITIONED, and PART integer VALUES ... clauses, among other options.)
Notes:
As introduced in the previous topic, DB2 V8 has the ability to physically partition secondary
indexes. The partitioning scheme introduced is the same as that of the table space. That is,
there are as many index partitions in the secondary index as table space partitions, and
index keys in partition 'n' of the index reference only data in partition 'n' of the table space.
Such an index is called a Data-Partitioned Secondary Index (DPSI).
DPSIs introduce a number of new design opportunities, some of which we will discuss next,
but first we need to look at how to create a data-partitioned secondary index.
The index “DPSI1” on the previous visual is a data-partitioned secondary index. (In our
case the DPSI is also the clustering index, but that does not have to be the case. You can
see that inside each part of the DPSI, the keys are stored by month in ascending order.)
Also note that each part of the DPSI potentially has values for all months, as each part of
the DPSI stores keys that relate to the rows in the corresponding data partition.
When creating a partitioned index, you cannot specify the size of each individual part of the
index. You cannot specify a PART clause on the CREATE INDEX statement (for a
table-controlled partitioned table). If the sizes of all partitions have to be different, you can
use the ALTER INDEX ix-name ALTER PARTITION partno PRIQTY value statement.
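For example, to give one partition of the index its own primary space allocation (the index name and values are illustrative):

```sql
-- Sketch: override the primary quantity for partition 3 only
ALTER INDEX DPSI1
  ALTER PARTITION 3
  PRIQTY 7200;
```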
DPSIs and Utility Operations
(Visual: REORG and LOAD operating at the PART level against a table partitioned on
ACCOUNT_NUM and clustered on LAST_ACTIVITY_DT, with Data-Partitioned Secondary
Index (DPSI) DPSI1 on last_activity_date asc — one index partition per data partition.)
Notes:
By looking at the layout of the keys in the DPSI’s parts, it is obvious that this organization
promotes high data availability by facilitating efficient utility processing on data partitioned
secondary indexes. It also streamlines partition-level operations such as adding and
rotating partitions, also introduced in DB2 V8. (See Figure 2-66 "Partition Management" on
page 2-99 for more details.)
• Elimination of the BUILD2 phase during online REORG of a partition:
There is no BUILD2 phase processing for DPSIs. Because keys for a given data
partition reside in a single DPSI partition, a simple substitution of the index partition
newly built by REORG for the old partition is all that is needed. If all indexes on a table
are partitioned (partitioned PI or DPSIs), the BUILD2 phase of REORG is eliminated.
• Elimination of LOAD PART job contention and enabling append (load) mode insertion
for much more efficient processing:
There is no contention between LOAD PART jobs during DPSI processing. This is
because there are no shared pages between partitions on which to contend. Thus if all
indexes on a table are partitioned, index page contention is eliminated.
Also note that during parallel LOAD PART job execution, each LOAD job inserts DPSI
keys into a separate index structure, in key order. This allows the LOAD utility logic to
follow an efficient append strategy (instead of doing “row at a time” logic).
• Facilitation of partition-level operations:
Because keys for a given data partition reside in a single DPSI partition, partition-level
operations can take place at a physical versus logical level. Thus, partition-level
operations are facilitated to the extent that a table's indexes are partitioned.
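Partition-level loads of the kind described above are expressed with LOAD ... PART control statements. A rough sketch of two utility jobs that could run in parallel (DD names and the table name are examples; check the utility reference for exact syntax):

```
LOAD DATA INDDN(SYSREC1) INTO TABLE CUSTOMER PART 1
LOAD DATA INDDN(SYSREC2) INTO TABLE CUSTOMER PART 2
```

Each statement would run in its own utility job; with only partitioned indexes (partitioned PI or DPSIs) on the table, the two jobs do not contend on index pages.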
DPSI Query Performance
Query performance characteristics of DPSIs
Allows query parallelism
Queries with predicates only on secondary index columns will need to scan
all partitions
DB2 tries to do partition pruning -- The application needs to code explicit
partitioning key predicates to allow for partition pruning when a DPSI exists
Chosen for queries with predicates on partitioning columns plus secondary
index columns
Secondary indexes
NPSIs
Pro: Favor sequential query performance
Con: Partition-level query or utility operations
DPSIs
Pro: Favor partition-level query or utility operation
Con: Sequential query performance, although well-suited to partition
parallelism
Notes:
DPSIs allow for query parallelism and are likely to be picked by the optimizer for queries
with predicates on partitioning columns plus predicates on the secondary index columns.
The physical nature of a DPSI can weaken the profile of some types of queries. Queries
with predicates that solely reference columns of the secondary index are likely to
experience performance degradation, due to the need to probe each partition of the index
for values that satisfy the predicate. Queries with predicates against the secondary index
that also restrict the query to a single partition (by also referencing columns of the
partitioning index), on the other hand, benefit from the organization. This is called partition
pruning.
For example, if you are aware of a correlation between DPSI and PI key values, code the
PI restriction explicitly when supplying a DPSI predicate to facilitate partition pruning. Let
us assume that the partitioning column of a table is DATE, and a data partitioned
secondary index exists on ORDERNO. If the company has a policy that the first four digits
of ORDERNO are always a four-digit year, you should write queries that include both
ORDERNO and DATE in the WHERE clause, rather than coding a WHERE clause on
ORDERNO alone.
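A hypothetical illustration of that advice (the table name and values are invented for this sketch):

```sql
-- With a predicate on the DPSI column alone, every DPSI partition
-- must be probed. Adding a redundant predicate on the partitioning
-- column (DATE) allows DB2 to prune partitions:
SELECT *
  FROM ORDER_TABLE                             -- invented name
 WHERE ORDERNO = '2004123456'
   AND DATE BETWEEN '2004-01-01' AND '2004-12-31';
```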
Design Considerations - Initial Thoughts
Predicates - are there predicates on partitioning columns?
Clustering - what are the "large" processes against the data?
ORDER BY - what is the desired sequence?
DPSIs are non-unique - query might have to scan all parts, if no
predicates on partitioning column(s)
Frequency of index maintenance (INSERT / UPDATE / DELETE /
utilities) against DPSI / NPSI
Is number of parts on table a consideration for DPSI / NPSI?
(for example, 4096 parts)
Online REORG - frequency any different?
Considerations for add / rotate partition?
Notes:
Data partitioned secondary indexes are not a solution that fits all cases. They must be used
wisely. DPSIs only give you an additional design option.
The decision to use a non-partitioned secondary index or a data-partitioned secondary
index must take into account both data maintenance practices and the access patterns of
the data. We recommend replacing an existing non-partitioned secondary index with a
data-partitioned index only if there are perceivable benefits such as easier data or index
maintenance, improved data or index availability, or improved performance.
Note also that the capacity to partition secondary indexes provided by this enhancement,
coupled with the increase in the number of partitions supported in DB2 V8, increases the
design feasibility of using efficient LOAD PART operations to add new data to a table (as
opposed to using SQL INSERT operations).
Furthermore, if that design option is taken, the frequency with which REORG INDEX needs
to be run on the secondary indexes may decline (since the LOAD utility can reserve free
space for future insert, and because REORG of each part no longer needs to be followed
by a REORG INDEX of the NPIs to clean them up from all the BUILD2 activity).
CHECK INDEX
Can be run on partition of DPSI, or logical partition of NPSI
RUNSTATS
May be run against single partitions, including DPSIs. Partition-level
statistics are used to update aggregate statistics for the entire table.
Partition parallelism
DPSIs allow for totally concurrent operations with PART keyword,
as do PIs
LOAD, REORG, REBUILD INDEX, CHECK INDEX
Notes:
Data Partitioned Secondary Indexes also improve the recovery characteristics of your
system. DPSIs can be copied and recovered at the partition level. Individual partitions can
be rebuilt in parallel to achieve a fast rebuild of the entire index.
A more detailed account of utility and query processing of partitioned secondary indexes is
given in Figure 8-28 "Utility Changes to Support DPSI (1 of 6)" on page 8-62.
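Hedged examples of the partition-level utility invocations just mentioned (names are placeholders; exact syntax should be checked against the utility reference):

```
CHECK INDEX (MYSCHEMA.DPSI1) PART 2
REBUILD INDEX (MYSCHEMA.DPSI1) PART 2
```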
System Planning for DPSIs
DSMAX (and z/OS limit) number of data sets backing a secondary index could
increase
Catalog / Directory growth
DPSIs:
SYSINDEXPART
SYSCOLDISTSTATS
SYSINDEXSTATS
DSNDB01.DBD01
For indexes which are COPY YES:
SYSLGRNX
SYSCOPY
For indexes whose HISTORY statistics are collected:
SYSINDEXPART_HIST
SYSINDEXSTATS_HIST
SYSCOLDISTSTATS_HIST
Notes:
A secondary index can now be partitioned and you may now have up to 4096 partitions per
partitioned table. Each partition requires its own data set. You should take into
consideration the DSMAX parameter, as the number of data sets can increase more easily
and in larger increments (you can have up to 4096 new data sets per DPSI index).
With the ability to have more partitions, and indexes that can be partitioned, you will also need
additional space in catalog and directory tables. You should consider increasing the size of
the catalog tables listed in this visual when you start to use DPSIs extensively, especially
when combined with partitioned tables with a large number of partitions.
More storage in the EDM pool will be required since DBDs will be larger due to support of
more partitions. The size of the EDM pool should be reviewed if you plan to use many more
partitions.
Programs and queries that access index partitioning information in the catalog may need to
change. They will need to take into account the different kinds of indexes now available
and the increase in the number of partitions.
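As an illustration, a query of this kind (only a sketch; predicates may need adjusting for your catalog level) would count the physical parts per index:

```sql
-- Sketch: physical partitions per index, joined through the
-- SYSINDEXPART catalog table
SELECT I.NAME, I.INDEXTYPE, COUNT(*) AS PARTS
  FROM SYSIBM.SYSINDEXES I, SYSIBM.SYSINDEXPART P
 WHERE P.IXNAME = I.NAME
   AND P.IXCREATOR = I.CREATOR
 GROUP BY I.NAME, I.INDEXTYPE;
```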
(Visual: partial DISPLAY DATABASE output, showing index INDEX3 with part D0001 in RW
status and a RESTRICTED indicator.)
Notes:
In this and the next set of visuals, we look at the output of the DISPLAY database
command for the different types of indexes that exist in DB2 Version 8.
The following shows another example of a DISPLAY DATABASE command output.
DSNT360I -DB8A ***********************************
DSNT361I -DB8A * DISPLAY DATABASE SUMMARY
* GLOBAL
DSNT360I -DB8A ***********************************
DSNT362I -DB8A DATABASE = BSDBVER4 STATUS = RW
DBD LENGTH = 4028
DSNT397I -DB8A
NAME TYPE PART STATUS PHYERRLO PHYERRHI CATALOG PIECE
-------- ---- ----- ----------------- -------- -------- -------- -----
BSTSVR12 TS 0001 RW
BSTSVR12 TS 0002 STOP
BSTSVR12 TS 0003 RW
-THRU 0004
DIX1 IX D0001 RW
-THRU 0004
NIX1 IX L* RW
PIX1 IX 0001 RW
PIX1 IX 0002 RO
PIX1 IX 0003 RW
-THRU 0004
******* DISPLAY OF DATABASE BSDBVER4 ENDED **********************
DSN9022I -DB8A DSNTDDIS 'DISPLAY DATABASE' NORMAL COMPLETION
***
In the example above, DIX1 is the DPSI, NIX1 is a non-partitioned index, and PIX1 is a
partitioned partitioning index.
(Visual: DISPLAY DATABASE output excerpt showing, among others, INDEX3 IX D0002 RW
and INDEX4 IX L0002 RW.)
Notes:
Non-partitioned indexes are displayed with an L; either L*, when all logical partitions have
the same status, or Lxxxx for individual logical parts. In the example above, INDEX2 is a
non-partitioned index with no logical parts in any special status, therefore L*. (For more
information on logical partitions, see Figure 2-11 "V7 - Logical and Physical Partitions" on
page 2-13.)
INDEX4 is also a non-partitioned index and here logical part #1 (L0001) is in recovery
pending status.
Note that the usage of “L” applies to both non-partitioned secondary indexes (NPSIs) as
well as non-partitioned partitioning indexes (NPPIs).
Displaying DPSIs
DSNT360I - ****************************************
INDEX1 IX 0001 RW
INDEX1 IX 0002 RW
INDEX2 IX L* RW
INDEX3 IX D0001 RW
INDEX3 IX D0002 RW
(Dnnnn indicates information about partitions of a DPSI)
Notes:
When you display a data-partitioned secondary index (DPSI), you will notice that each part
of the DPSI is preceded by a D, for example D0001.
In the figure above, INDEX3 is a DPSI consisting of two parts, D0001 and D0002.
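Output such as this is produced by a command along the following lines (this assumes the index space name matches the index name, which is not always the case):

```
-DISPLAY DATABASE(BSDBVER4) SPACENAM(INDEX3)
```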
Clustering Indexes
Any index can be the clustering index
The CLUSTER keyword is now optional when creating a
partitioning index
Ordering of rows for INSERT and REORG:
Partitioning columns determine the proper partition for the row
The clustering index determines the location within the partition
(assuming availability of space at that location)
The clustering index can be changed using ALTER INDEX
Steps to change the clustering index
ALTER INDEX index1 NOT CLUSTER
ALTER INDEX index2 CLUSTER
(Followed by REORG to change the sequence of existing rows)
Notes:
Historically, the partitioning index for partitioned tables also had to be the clustering index.
These two attributes (partitioning and clustering) are unbundled in Version 8.
In DB2 V8, any index on a table-controlled partitioned table can be the clustering index,
including a secondary index.
The CLUSTER keyword is now optional when creating a partitioning index. Please note
that if you are attempting to create an index-controlled partitioned table space and do not
specify the CLUSTER keyword in the CREATE INDEX statement for the partitioning index,
then that table is converted to table-controlled partitioning and the table definition is
complete.
The partitioning columns (those specified in the PARTITION BY clause) determine the
proper partition for the placement of rows. The clustering index (the index defined with the
CLUSTER keyword or the first index defined on the table) controls the clustering or location
within the partition. If there are no indexes defined on the table space, then clustering is
done using the partitioning columns specified in the PARTITION BY clause.
Student Notebook
Uempty You now also have the capability to change the clustering index with an ALTER INDEX
statement. Since only one index can be explicitly defined as a clustering index, you must
follow these steps to change the clustering:
1. ALTER INDEX ixname NOT CLUSTER on the current clustering index.
Clustering continues to be done according to this index until a new clustering index is
explicitly defined.
2. ALTER INDEX ixname CLUSTER on the index you wish to be the new clustering index.
New rows will be clustered according to the new clustering index; old rows will remain in
their current location.
3. REORG the table space to re-arrange the rows in the new clustering index order.
All existing rows will be re-arranged in the new clustering sequence. Any new rows will
be inserted with the new clustering sequence.
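Put together as SQL (the index names are the generic ones from the steps above):

```sql
ALTER INDEX INDEX1 NOT CLUSTER;   -- step 1: current clustering index
ALTER INDEX INDEX2 CLUSTER;       -- step 2: new clustering index
-- step 3: REORG the table space so existing rows are re-arranged
-- in the new clustering sequence
```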
It is worth noting that for table-controlled partitioning, the SYSIBM.SYSTABLEPART
columns IXNAME and IXCREATOR contain blanks, and column LIMITKEY_INTERNAL
has the highest value of the limit key of the partition in internal format.
For index-controlled partitioning, columns IXNAME and IXCREATOR contain the index
name and creator, and column LIMITKEY_INTERNAL has blanks.
If no explicit clustering index is specified for a table, the V8 REORG utility recognizes the
first index created on each table as the implicit clustering index when ordering data rows.
If explicit clustering for a table is removed (changed to NOT CLUSTER), that index is still
used as the implicit clustering index until a new explicit clustering index is chosen.
(Visual: two alternatives — data-partitioned secondary index DPSICLST as the clustering
index, OR non-partitioned secondary index NPSICLST as the clustering index.)
Notes:
The clustering index can be any index; you are no longer restricted to the partitioning index
having to be the clustering index. This visual shows two possibilities. Index DPSICLST is a
data partitioned secondary index and it can be the clustering index. Index NPSICLST is a
non-partitioned secondary index and it can be the clustering index. However, only one
index may be explicitly defined as the clustering index at any one time. The clustering index
may also be unique (when it is not a DPSI, as DPSIs cannot be defined as unique).
Clustering NPSI
In this case, we choose to use a non-partitioned secondary index as
the clustering index. Note that the rows within each partition are now
stored according to the index on the STATE_CODE column.
(Visual: the partitioned table, with the rows within each partition stored in STATE_CODE
order, and the non-partitioned secondary index on STATE_CODE.)
Notes:
In this case, we chose to use a non-partitioned secondary index as the clustering index.
Note that the rows within each partition are now stored according to the index on the
STATE_CODE column.
(Visual: a partitioned table partitioned by ACCOUNT_NUM and clustered on
LAST_ACTIVITY_DT within each partition, the data-partitioned secondary index on
LAST_ACTIVITY_DT used as the clustering index, and partitioning index part_IX_1
(partitioned) on ACCOUNT_NUM.)
Notes:
In the case shown in the figure above, we chose to use a data-partitioned secondary index
as the clustering index. Note that the rows within each partition are now stored according to
the index on the LAST_ACTIVITY_DT column. The clustering index is based on a
secondary index, and therefore by definition different from the partitioning columns. The
partitioning columns are only used to determine in which partition to insert the rows; inside
each partition the order of the rows is based on the clustering index, a DPSI in this case.
Online Schema Changes
Overview of online schema evolution
Altering Tables
Altering Indexes
Versioning
Partition management enhancements
Adding, rotating, changing, and rebalancing partitions
Notes:
In DB2 Version 8 we make great strides to improve data availability. V8 allows you to make
a number of schema changes without having to drop and recreate the objects.
We start out with an overview of the things that you normally have to do to implement a
change to your schema. We also list the things that were done in previous releases and
versions to allow you to make more online schema changes.
DB2 Version 8 allows you to make a number of changes (ALTERs) to tables and indexes
without having to drop and recreate the objects (and their dependent objects).
To implement online schema changes, DB2 V8 has implemented a new versioning
infrastructure.
V8 also allows you to add partitions to a partitioned table space, as well as rotate (roll-off),
and rebalance partitions, without having to drop and recreate the object.
Lastly, we discuss a number of the schema changes, like changing the clustering index,
and switching between padded and not padded indexes.
Data maintenance
Online RUNSTATS, COPY, REORG, and LOAD (V3-V7)
Notes:
Over the last versions of DB2, significant enhancements have already been implemented
to reduce the unavailability window.
Application Maintenance
DB2 has implemented packages and package versioning since Version 2 Release 3. By
using this versioning technique, you can prepare the DB2 packages for the new version of
the application in advance, and once the new application code gets activated, DB2
automatically picks up the new version of the package, without any service interruption.
Code Maintenance
Installing a new version of DB2 or applying maintenance (PTFs) normally requires that you
stop and start your DB2 subsystem. During that time, your DB2 applications are not
available for your customers. With the introduction of DB2 data sharing in Version 4, you
can stop and start individual members (DB2 subsystems) to activate maintenance or a new
DB2 release, and applications can use other, active, members to run while certain
members of the data sharing group are down for maintenance.
Data Maintenance
Another area in which a lot of work has been accomplished is to keep the data available as
much as possible. Data requires maintenance every so often. Sometimes data gets
disorganized and needs a REORG, sometimes additional data needs to be loaded, and
during that maintenance time, the data should be available to the applications as much as
possible. DB2 utilities have come a long way over the last releases, for example, by
introducing online REORG, inline copy and statistics, and online LOAD RESUME.
Schema Maintenance
Starting in Version 8, DB2 takes on a new challenge, that is, to reduce the unavailability
window when making changes to the data definition of DB2 objects.
However, it should be noted that V8 does not support schema versioning. DB2 will always
convert the rows to the latest table format.
Data availability
Notes:
In the past, DB2 releases have implemented most DDL ALTER enhancements without
actively addressing the problem of data unavailability while modifying object attributes.
Some of these obstacles have also been removed via APARs where the required change
was fairly simple. See Table 2-1 for a list of some of the changes:
Notes:
As 24x7 availability becomes more critical for applications, the need grows for allowing
changes to database objects reflected in the catalog and the DBD while minimizing the
impact upon availability. We call this Online Schema Evolution (or Online Schema Changes
or Online Alter).
In an ideal world, this enhancement would provide support for changes to all object
attributes without losing availability. DB2 V8 lays the groundwork for allowing many
changes, while implementing a reasonable subset of these changes.
The following schema changes are allowed in DB2 V8:
• Extend CHAR(n) column lengths.
• Change type within character data types (CHAR, VARCHAR).
• Change type within numeric data types (SMALLINT, INTEGER, REAL, FLOAT, DOUBLE,
DECIMAL), as long as the existing row values can all fit within the range allowed by the
new data type.
• Change type for graphic data types (GRAPHIC, VARGRAPHIC).
• Allow column data type changes for columns that are referenced within a view.
• Allow these column changes for columns that are part of an index.
• Add a column to an index.
• Drop the partitioning index (or create a table without one).
• Change the clustering index.
• Create or alter an index to have not padded varying length character columns within a
key.
• Allow an alter of identity columns.
• Add a partition to the end of a table which extends the limit value.
• Rotate partitions.
• Support automatic rebalancing of partitions during REORG.
• Loosen the restrictiveness of indexes in Recover or Rebuild Pending.
Key Benefits
Availability!
Avoids wasted space as column lengths can be defined for today's
maximums and extended in the future if necessary
Reduces the number of data sets to be managed as partitions can
be added when required - particularly important for time series data
Enables rolling partition designs
May reduce the number of indexes required as you may be able to
drop partitioning index and define a secondary index as the
clustering index
Notes:
The key benefits to online schema evolution are:
• Availability:
Changes to the schema can be done without having to make the objects unavailable for
extended periods of time in order to drop and redefine them.
• You can avoid wasting disk space:
Columns can be defined with today’s needs in mind; they can be extended in the future
as requirements change.
• Reduce the number of data sets to be managed:
Partitions can be added as needed.
• Enables rolling partition design:
You can roll partitions and reuse them for new data ranges.
CHAR(20) to CHAR(10)
SMALLINT to DEC(3,0)
CHAR(6) to INTEGER
Notes:
Designing databases and objects for applications is more forgiving than in the past. The
problem with underestimating the size of objects is lessened with the ability to change
column data types without losing availability to the data. Designing applications can give
more consideration to saving space and reducing the number of data sets up front without
the fear of being locked in at a future point by initial schema decisions.
Supported Alter Data Types
smallint               to  integer
smallint               to  float(1-21) or real
smallint               to  float(22-53) or double
smallint               to  decimal(5,0) or larger
integer                to  float(22-53) or double
integer                to  decimal(10,0) or larger
float(1-21) or real    to  float(22-53) or double
decimal(7,s) or less   to  float(1-21) or real
decimal(15,s) or less  to  float(22-53) or double
decimal(p,s)           to  decimal(p+a,s+b)
char(n)                to  char(n+x)
char(n)                to  varchar(n+x)
varchar(n)             to  char(n+x)
varchar(n)             to  varchar(n+x)
graphic(n)             to  graphic(n+x)
graphic(n)             to  vargraphic(n+x)
vargraphic(n)          to  vargraphic(n+x)
vargraphic(n)          to  graphic(n+x)
If a decimal is converted to floating point, the column cannot have a unique index or a
unique constraint.
For decimal data types, "a" plus "b" must be greater than zero, or there is no change.
For character data types, "x" can be greater than or equal to zero.
Notes:
A column data type may be altered if the data can be converted from the old type to the
new without losing significance. This basically means that the new column definition has to
allow for “larger” values than the current column definition. In V8 you can change numeric
and character data types. You can change smallint to int, to decimal, to float, real, or
double, as long as the maximum value that you can store in the new data type is greater
than the maximum value of the old data type. The same is true for character data. In
addition, you can change between CHAR and VARCHAR, and vice versa, as well as
between GRAPHIC and VARGRAPHIC, and vice versa.
Note that when a DECIMAL data type is changed to FLOAT, the column cannot be part of a
unique constraint. This is because the FLOAT data type is stored as a floating point
number, and floating point numbers are approximate numbers. Therefore, two existing
decimal values that are different can end up being converted to the same floating point
number, which would mean a unique constraint violation.
ALTER TABLE table-name ALTER COLUMN column-name SET DATA TYPE data-type
Example:
ALTER TABLE CUST ALTER LASTNAME SET DATA TYPE CHAR(40)
Notes:
To support changing the column data type of a column in an existing table, the SET DATA
TYPE clause of the ALTER TABLE ALTER COLUMN has been enhanced to support these
additional changes. In this visual, we show the syntax and an example of the use of the
SET DATA TYPE clause.
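As a further sketch, here are a few alters of the kinds the data type table above permits. The table and column names are hypothetical, for illustration only:

```sql
-- Hypothetical table and columns, for illustration only
ALTER TABLE ORDERS ALTER COLUMN QTY    SET DATA TYPE INTEGER;       -- SMALLINT to INTEGER
ALTER TABLE ORDERS ALTER COLUMN NOTE   SET DATA TYPE VARCHAR(100);  -- CHAR(40) to VARCHAR(100)
ALTER TABLE ORDERS ALTER COLUMN AMOUNT SET DATA TYPE DECIMAL(11,2); -- DECIMAL(9,2) to DECIMAL(11,2)
COMMIT;
```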
What Happens to the Table?
Information about new table layout is stored in:
Catalog and directory
New system pages inside the page set
If this is the first ALTER TABLE ALTER COLUMN, the original
definition (V0) is captured in the SYSOBDS catalog table
Maximum 255 alters per table space before a REORG is required
Table space is placed in AREO* (advisory REORG-pending)
Accessible but some performance degradation until REORG
Plans, packages and cached dynamic statements referring to the
changed column are invalidated
Runstats values for columns are invalidated
Notes:
Here we discuss some considerations regarding how the tables are handled.
RUNSTATS
Note that after altering the column’s data type or column length, the RUNSTATS values in
some catalog tables for that column become unusable. Therefore, it is a good idea to collect
these statistics again after changing the data type of a column. More precisely:
• The statistics found in SYSCOLUMNS are converted to the new data type when the
column is altered, so HIGH2KEY, LOW2KEY, and COLCARDF are still usable.
• Cardinality statistics (TYPE 'C' statistics) in SYSCOLDIST remain usable.
• Single and multi-column frequency statistics (TYPE ‘F’ statistics) are no longer usable.
To indicate that some statistics may have become unusable, the STATSTIME column in the
following catalog tables is updated to ‘0001-01-02-00.00.00.000000’:
• SYSIBM.SYSCOLSTATS
• SYSIBM.SYSCOLUMNS
• SYSIBM.SYSCOLUMNS_HIST
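A minimal RUNSTATS sketch to recollect these statistics after the alter; the database and table space names here are hypothetical:

```sql
-- Recollect table and index statistics after the data type change
-- (DBCUST.TSCUST is a hypothetical database.tablespace)
RUNSTATS TABLESPACE DBCUST.TSCUST
  TABLE(ALL)
  INDEX(ALL)
```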
What Happens to the Data?
Existing data remains unchanged
On SELECT, data will be materialized to the latest format
On INSERT / UPDATE, the entire row will be changed to latest
format
Reorg changes all rows to the latest format
Notes:
After a table column data type is changed (via an ALTER statement), the new definition
immediately applies for all data in the associated table. No existing data is converted to the
new version format.
When rows are retrieved, they are materialized in the new format indicated by the catalog
and system pages contained within the object. Likewise, when a data row is modified or
inserted, the entire row is saved using the new definition.
When the object is reorganized, all rows are converted into the format of the latest
definition (see Figure 2-57 "Versioning" on page 2-86 for details).
Notes:
When the data type or length of a column is altered on a table and that column is defined in
an index, the index is altered accordingly.
If a table has multiple indexes, a change for a table column results in a new table version
and a new index version for each index that contains the column. Indexes created on
different tables in the same table space or unchanged columns in the same table are not
affected. If a change is made to a non-indexed column, it results in a new table (and table
space) version but not a new index version.
All new keys inserted are in the new index format.
If an entire index is rebuilt from the data, all the keys are converted to the latest format.
Utilities which may rebuild an entire index include:
• REBUILD INDEX
• REORG TABLESPACE
• LOAD REPLACE
If data type changes, reorganization of the entire index (REORG INDEX) materializes all
keys to the format of the latest version.
Whether or not the index is immediately available after a column in the index incurred a
data type change, depends on the data type of the column being changed.
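For example, a utility control statement along these lines rebuilds one index and thus materializes all of its keys to the latest version; the index name is hypothetical:

```sql
-- Rebuild the index from the data; all keys come out in the latest format
REBUILD INDEX (DBA001.CUST_IDX)
```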
Limiting the Scope of the Unavailability for Dynamic SQL for RBDP
Index
To limit the scope of the unavailability of the data for dynamic SQL when the index is in
RBDP state, in V8:
• Deletes are allowed for table rows, even if there are indexes in RBDP
• Updates and inserts are allowed for table rows, even if their corresponding non-unique
indexes are in RBDP state
• Inserting or updating data rows which result in inserting keys into an index that is in
RBDP state is disallowed for unique or unique where not null indexes
• For dynamic SQL queries, DB2 does not choose an index in RBDP for an access path
Notes:
When a column is altered in a base table, the views that reference the column are
immediately regenerated. If one of the views cannot be regenerated, then the ALTER
TABLE statement fails on the first error encountered.
A change to any column within a view invalidates all plans, packages, and dynamic cached
statements that are dependent on that view.
When a column data type is altered, the precision and scale of the decimal arithmetic result
needs to be recalculated. The value of the CURRENT PRECISION special register that is
in effect for the ALTER TABLE is used to regenerate all the views affected by the altered
column. Since a single CURRENT PRECISION setting is used for all the views, it is
possible the ALTER TABLE can fail with an SQLCODE -419 or complete with a precision
calculated for view columns that does not work well for an application. In this case the user
has to DROP and CREATE the view in order to correct the problem.
If an ALTER TABLE fails because of a problem regenerating a view, the failing SQLCODE
and tokens identifying which ALTER failed is returned and the entire ALTER TABLE
statement fails.
If a check constraint is dependent on the column being altered, it is also “regenerated”. The
regeneration may also fail in the case where different options are in use during the
regeneration than the options in use at the time the check constraint was created. The
options are the decimal point indicator and quote delimiter. The failing SQLCODE and
tokens identifying which ALTER failed are returned.
V8 also adds a new DDL statement ALTER VIEW viewname REGENERATE. This
statement has two purposes:
• Regeneration of views which have been marked invalid during catalog migration. A view
is considered as invalid when column STATUS in catalog table SYSIBM.SYSTABLES
contains a 'R'. During catalog migration, DB2 continues processing even when a view
regeneration fails. DB2 does not want to stop the migration process because a single
view on the catalog is no longer usable. You can use the ALTER VIEW viewname
REGENERATE statement to regenerate those views manually after the migration is complete.
• To have DB2 update the stored form of the view to the new V8 structure — rather than
doing this on the fly each time it is used.
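For instance, assuming a view PAY.V_EMP was marked invalid (STATUS = 'R') during migration, it could be regenerated with a statement like the following (the view name is hypothetical):

```sql
-- Regenerate the stored form of an invalid view
ALTER VIEW PAY.V_EMP REGENERATE;
```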
When there is a trigger defined on a column that is changed, that is also automatically
handled by the system.
Notes:
When making schema changes, applications are usually affected. Changes in the schema
must be closely coordinated between database objects and applications to avoid
“breaking” existing applications. For example, if a column is extended from CHAR(n) to
CHAR(n+m), the processing application truncates the last m bytes if the application is not
changed to handle the longer column. With this feature, be sure to assess which programs
need to change.
The creation of new versions for objects can degrade performance of existing access
paths. Schema changes should be planned to balance the trade-off between performance
and availability expectations within a customer environment. Typically, the best time to
make schema changes to minimize the poor performance impact is before a scheduled
reorganization of an object.
Rebuild any affected indexes and rebind plans and packages for applications using static
SQL. This will prevent the system from picking an inefficient access path during automatic
rebind because the best suited index is currently not available. When the index is rebuilt
and the RBDP status is reset, statements in the dynamic statement cache related to the
table for which the index was rebuilt are automatically invalidated, and the next time the
statement is executed, a new access path is determined, and can now pick the index that
was rebuilt.
Schedule RUNSTATS to repopulate the catalog with accurate column and index statistics.
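The rebind step above might look like the following DSN subcommands; the plan, collection, and package names are hypothetical:

```sql
-- Rebind static SQL after the index is rebuilt and RUNSTATS has run
REBIND PLAN(PAYPLAN)
REBIND PACKAGE(COLL1.PAYPKG)
```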
Ever since V1 of DB2, you have been able to add a (nullable) column to an existing table,
without having to drop and recreate the object; online schema evolution before its time.
When you add a column to an existing table in Version 8, this change will not create a new
version if this is the first alter on the table, in other words, if the table is still at version 0.
If you have executed a version-generating ALTER on that table in the past (in other words,
the table is not at V0), then adding a column to an existing table will create a version.
Restrictions
Data types must be compatible and lengths must be the same
or longer
Disallowed for ROWID, LOB, DATE, TIME, TIMESTAMP columns
or a distinct type
Data types and lengths cannot be altered when
Column is part of a referential constraint or has a FIELDPROC
An EDITPROC or VALIDPROC exists on the table
Part of a materialized query table
The column is defined as an identity column or seclabel
Default values and check constraints are handled
V8 does NOT support schema versioning (or recovery other than
a point-in-time recovery of the catalog)
Notes:
DB2 V8 takes the first steps to avoid outages due to schema changes. However, certain
restrictions are still in place, such as these:
• Data types must be compatible and lengths must be the same or longer.
• Online schema changes are not allowed on columns that are defined as ROWID, LOB,
DATE, TIME, TIMESTAMP, as well as columns that use a distinct type.
• Data type and lengths cannot be altered when:
- The column is part of a referential constraint.
- The column has a FIELDPROC defined.
- An EDITPROC or VALIDPROC exists on the table.
- The table is used in a materialized query table (if the table you are ALTERing is a
materialized query table, or a materialized query table is defined on the table you
are altering), irrespective of whether or not the column you are altering is part of the
SELECT clause that makes up the MQT.
Operational Impact
Table space and indexes will contain data in multiple version
formats until REORG or LOAD REPLACE or REBUILD (for IX)
Recovering data
Whether to current or PIT, there is no problem, as image copy
and/or SYSOBDS contain version information
Log processing will insert and update data to new format
Just restoring data from a previous version
Note: It is recovery of data rather than schema
Moving data using off-line utilities (for example, DSN1COPY)
may require extra steps
REPAIR VERSIONS
Recommendation: Keep track of DDL history of your tables when
moving data using off-line utilities
Notes:
SYSOBDS only contains the original version (V0), the one prior to the first new version.
Table space and indexes contain system pages with all changed definitions. REORGs and
MODIFYs must be used to convert all the data to the latest definition.
Recovery of data rather than schema means that the data is recovered to a point-in-time
(PIT); you do not recover the schema to a point-in-time.
Moving data between objects or subsystems using DSN1COPY is more complicated. You
need to execute the REPAIR utility using the VERSIONS keyword to resync the catalog
information with the data. As before, as a “golden rule”, you need to track the DDL history
of your tables, because you may have to perform different actions depending on different
sequences of DDL statements, when using offline utilities to move the data to a different
table. For more information about the use of REPAIR VERSIONS, see Figure 8-24
"REPAIR - Use of Versions" on page 8-55.
ADD COLUMN ( column-name ASC | DESC )
Notes:
In DB2 V8, the syntax of the ALTER INDEX statement is changed to allow for the addition
of columns to the end of an index, for example:
ALTER INDEX CUST_IDX ADD COLUMN NEW_COL ASC;
Alter Index Add Column
Ability to add a column to the end of an index
When the column preexists in the table, the index is placed in
RBDP
If it is a new column, add it to the table and index in the same
UOW, then the index will be placed in AREO* instead
Examples:
Immediate availability! (Index placed in AREO*)
ALTER TABLE CUST ADD COLUMN NEW_COL;
ALTER INDEX CUST_IDX ADD COLUMN NEW_COL ASC;
COMMIT;
Delayed availability! (Index placed in RBDP)
ALTER TABLE CUST ADD COLUMN NEW_COL;
COMMIT;
ALTER INDEX CUST_IDX ADD COLUMN NEW_COL ASC;
COMMIT;
Notes:
Columns can now be appended to the end of an existing index key with the ALTER INDEX
statement.
• If the index is not defined (created with the DEFINE NO keyword), no restricted state is
set and a new index version is not created.
• If the index is defined and the column is added to the table in the same unit of work that
the column is also added to the index, the index is immediately available for access,
and the index is placed in Advisory Reorg Pending (AREO*) state.
• However, if the column was not added to the table in the same unit of work, the index is
placed in a Rebuild Pending state (RBDP) and a new index version is generated.
This support allows maximum availability for the situations where new columns are added
to a table, and these new columns are also desired as part of an existing index. By making
changes in one unit of work, there is no loss of availability. The alternative is to drop the
index, and then create a new index with the column. When creating a new index, there is
always a period of unavailability while the index is being created.
Restrictions
Cannot exceed 64 columns in an index
Length maximum
2000 - n for padded, where n is the number of nullable columns
2000 - n - 2m for not padded, where m is the number of varying length columns
Disallowed for
System defined indexes (indexes on Catalog and Directory)
Partitioning indexes of index-based partitioned table spaces
Indexes enforcing a primary or unique constraint or referential constraint
A unique index required for a ROWID column defined as GENERATED BY
DEFAULT
An auxiliary index
Notes:
This visual shows the restrictions for adding columns to the end of an index. You cannot
exceed a total of 64 columns per index. The total length of the index key columns cannot
exceed 2000 bytes minus the number of nullable columns and minus 2 times the number of
varying length columns in the index.
You may not add columns to the end of an index for the following indexes:
• An index that is a system-defined catalog index
• An index that enforces a primary key, unique key, or referential constraint
• A partitioning index when index-controlled partitioning is being used
• A unique index required for a ROWID column defined as GENERATED BY DEFAULT
• An auxiliary index
RBDP Considerations
Index must be rebuilt - reorg will not suffice
Dynamic statement cache
Changes which result in indexes being placed in RBDP will flush dynamic
statement cache
Once the index is available again, revert to optimal access paths by
Rebinding affected plans/packages for static SQL
Automatic invalidation of related statements in the dynamic statement cache
when RBDP is reset.
At next execution, statement is prepared again, and can go back to original
access path
Manual invalidation of DSC can be done using
RUNSTATS UPDATE NONE REPORT NO
Notes:
There are a few things to consider when an index is placed in a Rebuild Pending (RBDP)
state:
• Indexes in RBDP must be rebuilt using the REBUILD INDEX utility; a REORG will not
remove the RBDP state.
• Changes which result in indexes being placed in RBDP will flush any statements in the
dynamic statement cache that use those indexes. Static plans/packages are not
invalidated. For non-unique indexes, static plans/packages that do not use the index to
retrieve data will continue to function as before. For unique indexes placed in RBDP,
those plans/packages that are dependent on them will get a -904 (unavailable resource)
on an update of an existing row or insert of a new row. Deletes are no problem.
• Once the index has been rebuilt, you should rebind the affected plans/packages and
you should invalidate your dynamic statement cache. In V8, you can invalidate the
dynamic statement cache by running RUNSTATS with the UPDATE NONE REPORT NO
keywords.
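A sketch of that invalidation-only RUNSTATS: it collects and reports nothing, but invalidates cached dynamic statements that refer to the table space. The database and table space names are hypothetical:

```sql
-- Invalidate dynamic cached statements without updating catalog statistics
RUNSTATS TABLESPACE DBCUST.TSCUST
  UPDATE NONE
  REPORT NO
```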
Versioning
Up to 256 active versions for table space, up to 16 for indexes
Active versions include those within page set and all available image
copies
Unaltered objects remain at version 0
If maximum reached, alter fails with SQLCODE -4702
If you have a large number of ALTERs:
Make several ALTERs to an object within a single unit of work
(counts as one)
You cannot have an ALTER and DML in the same unit of work
SQLCODE -910
REORG and MODIFY before you reach the maximum of 256 versions
per table space
MODIFY to delete image copies
"Rebuild" index before you reach the maximum of 16 versions per
index space
Notes:
To support online schema evolution, DB2 has implemented a new architecture to track
object definitions at different times during its life by using versions.
Altering existing objects may result in a new format for tables, table spaces, or indexes that
indicates how the data should be stored and used. Since all the data for an object and its
image copies cannot be changed immediately to match the format of the latest version,
support for migrating the data over time is implemented by using versions of tables and
indexes. This allows data access, index access, recovery to current, and recovery to a
point-in-time while maximizing data availability.
Versioning existed before DB2 V8 for indexes (after an indexed VARCHAR column in a
table had been enlarged). It was tracked using the IOFACTOR column of SYSINDEXES. In
DB2 V8, the first ALTER that creates a new index version switches to DB2 V8 versioning by
setting the OLDEST_VERSION and CURRENT_VERSION columns to the existing
versions in the index. And to support the table data type changes mentioned before,
versioning in Version 8 is also implemented for tables and table spaces.
Version Limits
A table space can have up to 256 different active versions (versions 0-255) while an index
can have up to 16 different active versions (versions 0-15). Active versions include those
within the pageset and all available image copies (in SYSCOPY).
The range of active versions is all versions that exist for rows in the page set itself as well as
the versions that exist in image copies registered in SYSCOPY. If the maximum number of
active versions is reached, the SQL statement fails with:
DSNT408I SQLCODE = -4702, ERROR: THE MAXIMUM NUMBER OF ALTERS ALLOWED HAS BEEN
EXCEEDED FOR TABLE
Unaltered objects remain at version 0 (zero).
Catalog table    OLDEST_VERSION   CURRENT_VERSION     VERSION   Version "0" data
SYSTABLESPACE    X                X
SYSTABLEPART     X
SYSTABLES                                             X
SYSINDEXES       X                X (data version)
SYSINDEXPART     X
SYSCOPY          X
SYSOBDS                                                         X
Notes:
As can be seen in the visual above, versioning information for an object is kept in the
catalog tables SYSIBM.SYSTABLESPACE, SYSIBM.SYSTABLEPART,
SYSIBM.SYSINDEXES, SYSIBM.SYSINDEXPART, SYSIBM.SYSTABLES, and
SYSIBM.SYSCOPY.
In addition, the new catalog table SYSIBM.SYSOBDS contains one row for each table
space OBD or index that can be recovered to an image copy and that has more than one
active version. Only the first active version of the OBD (the definition of the object when it
was created, Version 0) is placed in SYSOBDS. The records are cleaned up when version
numbers are consolidated and active versions are reduced to only one active version.
A table space starts out with all data in tables at version zero. When an ALTER creates a
new version, it gets the next available number after the active table space
CURRENT_VERSION. Once version 255 is reached, numbering starts again with version
1 if it can be reclaimed. A version of 0 indicates that a version creating ALTER statement
has never been issued for the corresponding table or table space.
Notes:
To reduce the number of versions, you must first materialize all the rows to the latest
version format. Depending on the object type (table space and COPY YES index) you can
use REORG, LOAD REPLACE, or REBUILD INDEX to achieve this.
Then you can run the MODIFY utility to update the version information in the catalog. For
table spaces, and indexes defined as COPY YES, the MODIFY utility must be run to
update OLDEST_VERSION for either SYSTABLEPART and SYSTABLESPACE, or
SYSINDEXPART and SYSINDEXES. If there are COPY, REORG, or REPAIR VERSIONS
SYSCOPY entries (ICTYPE of “blank” and STYPE of “V”) for the table space, MODIFY
updates OLDEST_VERSION to be the lowest value of OLDEST_VERSION found from
matching SYSCOPY rows. If no SYSCOPY rows remain for the object, MODIFY sets
OLDEST_VERSION to the lowest version data row or key that exists in the active pageset.
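The two-step cleanup described above can be sketched as the following utility control statements; the object names and the AGE value are assumptions for illustration:

```sql
-- 1. Materialize all rows to the latest version format
REORG TABLESPACE DB1.TS1 SHRLEVEL REFERENCE

-- 2. Prune old image copy records and let MODIFY update
--    OLDEST_VERSION in the catalog
MODIFY RECOVERY TABLESPACE DB1.TS1 DELETE AGE(30)
```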
For indexes defined as COPY NO (those indexes do not store information in
SYSIBM.SYSCOPY), a REORG, REBUILD, or LOAD utility that resets the entire index
before adding keys, updates OLDEST_VERSION in SYSIBM.SYSINDEXES to be the
same as CURRENT_VERSION.
When rebuilding an entire index (partitioned or non-partitioned) defined as COPY NO,
REBUILD updates the new catalog columns SYSINDEXES.OLDEST_VERSION and
SYSINDEXPART.OLDEST_VERSION to the value of
SYSINDEXES.CURRENT_VERSION.
If the index is partitioned, REBUILD INDEX also updates
SYSINDEXES.OLDEST_VERSION if the lowest OLDEST_VERSION from all
SYSINDEXPART entries has changed.
The SYSCOPY records inserted by COPY utility have the OLDEST_VERSION column
filled in with the lowest version of data within the copied object.
Reclaiming Versions
Run MODIFY RECOVERY for table spaces and COPY YES indexes
Deletes any unwanted SYSCOPY entries
Determines oldest version in page set and related SYSCOPY
entries and updates the catalog
Updates SYSTABLEPART / SYSINDEXPART
For nonpartitioned, also updates SYSTABLESPACE / SYSINDEXES
For partitioned, updates SYSTABLESPACE / SYSINDEXES if oldest version
across all the partitions has changed
Notes:
When reorganizing a partitioned table space that has indexes defined as COPY NO,
REORG updates the new catalog column SYSINDEXPART.OLDEST_VERSION to the
value of SYSINDEXES.CURRENT_VERSION for those indexes (this is not done for
non-partitioned indexes unless reorganizing the whole table space). If the index is
partitioned, REORG also updates SYSINDEXES.OLDEST_VERSION if the oldest
OLDEST_VERSION from all SYSINDEXPART entries has changed. For non-partitioned
table spaces, REORG also updates SYSINDEXES.OLDEST_VERSION to the same value.
The REORG TABLESPACE utility resets table spaces and all indexes which are in AREO*
state. SYSCOPY records with an ICTYPE value of 'X' (for REORG LOG YES) or 'W' (for
REORG LOG NO) and an STYPE value of 'A' are inserted for each data partition where
REORP is reset. SYSCOPY records include the current version number at the time of the
reorganization (in column OLDEST_VERSION). The REORG TABLESPACE utility
inserts SYSCOPY records for each index that is built as part of the reorganization.
Segmented Table Space Example
Notes:
Let us take a look at a segmented table space and how versioning is tracked. Assume that
we have a table space TS1 and three tables (T1, T2 and T3) and that we have populated
these tables and taken an image copy of the table space. We then do two alters to change
two of the columns of table T1, and we commit the schema change in one commit scope.
At this point we have generated version 1 of TS1. Applications continue to update, insert,
and delete rows from T1. We then decide that we also need to alter a column in table T3,
and we commit that schema change. At this point we have generated version 2 of TS1.
Applications continue to update, insert, and delete rows from T3.
Version Status

SYSCOPY:
TSNAME   TIMESTAMP   ICTYPE     OLDEST_VERSION
TS1      1           F (FULL)   0
Notes:
Here we can see the details of the operations performed in the previous visual. SYSCOPY
will contain an entry for the image copy that we created prior to doing the alters. Column
OLDEST_VERSION of the SYSCOPY row will contain a ‘0’ since the oldest version of the
data is the original version. Once the alters are done, in SYSTABLES, table T1 will have a
value of 1, T3 will have a value of 2, and T2 will continue to have a value of 0 for column
VERSION, since it was not altered and is still at its original definition.
Additionally, SYSTABLESPACE will also reflect the changes we have made. Column
OLDEST_VERSION will contain 0 since we still have data at the original version and an
image copy with data in the original version. Since this is not a partitioned table space,
there is only one row in SYSTABLEPART and it will still have a value of 0 for column
OLDEST_VERSION.
So, at this point we have three active versions, that is, the data is in any of three forms:
Version 0 format, which is the original definition before any of the alters are done; version 1
format, which is the definition after the first set of alters (done on the same commit scope);
and version 2 format, which is the definition after the second alter (second commit scope).
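The per-table versions in this scenario can be checked with a catalog query; the database name and qualifier are assumptions:

```sql
-- Versions of the tables in TS1 after the two schema changes;
-- per the scenario above: T1 = 1, T2 = 0, T3 = 2
SELECT NAME, VERSION
  FROM SYSIBM.SYSTABLES
 WHERE DBNAME = 'DB1' AND TSNAME = 'TS1';
```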
All three tables contain rows in the original format (version 0), since we loaded data before
any alters were done. In addition, since we inserted a new row into table T1 right after the
first version was generated (first ALTER), table T1 will contain data in version 1.
SYSCOPY:
TSNAME   TIMESTAMP   ICTYPE            OLDEST_VERSION
TS1      1           F (FULL)          0
TS1      2           W (REORG)         2
TS1      2           F (FULL INLINE)   2

(Table versions have been collapsed, as all data has been materialized.)
Notes:
Now let us see what happens after we run a REORG with an inline image copy on table
space TS1. The REORG reformats all the data to the latest definition (or version). Table T1
in SYSTABLES is updated to reflect the fact that it is now at the latest version (version 2).
Note that table T2 remains at version 0 since no alters were performed against it. In
SYSTABLESPACE, the OLDEST_VERSION for TS1 remains at 0 since we still have an
entry in SYSCOPY that will allow us to recover our data to that version; the
CURRENT_VERSION will be 2. OLDEST_VERSION in SYSTABLEPART for TS1 is
unaffected since this is not a partitioned table space.
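The REORG with an inline image copy discussed above can be sketched as follows; the object names and the copy ddname are assumptions:

```sql
-- SHRLEVEL REFERENCE with COPYDDN produces the inline full image copy
-- (ICTYPE 'F', STYPE indicating inline) alongside the REORG record
REORG TABLESPACE DB1.TS1 SHRLEVEL REFERENCE COPYDDN(ICOPY)
```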
Version Status Following a MODIFY

SYSCOPY:
TSNAME   TIMESTAMP   ICTYPE            OLDEST_VERSION
TS1      2           W (REORG)         2
TS1      2           F (FULL INLINE)   2

(MODIFY RECOVERY DELETE AGE(1) has deleted the first copy.)
Notes:
Now, let us see what happens when we run a MODIFY DELETE AGE(1) and delete the
image copy row created prior to the REORG from SYSCOPY. Since we ran a REORG and
all the data for the altered objects is at version 2 and there are no more image copy records
in SYSCOPY with an older OLDEST_VERSION number, MODIFY updates the
OLDEST_VERSION column of SYSTABLESPACE and SYSTABLEPART for TS1 (and sets
it to 2 in this case). It is set to 2 because that is the oldest version in use in the table space
or in any entry remaining in SYSCOPY after the MODIFY.
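The MODIFY run discussed above corresponds to a control statement like the following; the database qualifier is an assumption:

```sql
-- Delete SYSCOPY/SYSLGRNX records older than one day and let MODIFY
-- update OLDEST_VERSION in SYSTABLESPACE and SYSTABLEPART
MODIFY RECOVERY TABLESPACE DB1.TS1 DELETE AGE(1)
```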
Notes:
System page is a new term in V8 for pages in DB2's physical data sets that do not
contain real data (except for space map pages). A system page can be:
• A header page
• A dictionary page of a compression dictionary
• A version page (also known as system page for OBDRECs), containing information
about different row layouts of the data that occurs in the table.
There can be zero, one, or multiple system pages for OBDRECs in a DB2 page set or
partition. They can appear anywhere in the data set and are anchored off the header page.
The V8 COPY utility has a new SYSTEMPAGES keyword (YES is the default and the
recommended value). This allows some utilities (for example, UNLOAD and offline utilities) to access
and interpret the data because they have the data formats in the system pages. (See
Figure 8-22 "COPY Utility SYSTEMPAGES Option" on page 8-51 for details.)
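A copy that puts the system pages up front can be sketched like this; the object names and ddname are assumptions (SYSTEMPAGES YES is the default, so stating it is optional):

```sql
-- Full image copy with system pages (header, dictionary, version pages)
-- placed at the beginning of the copy data set
COPY TABLESPACE DB1.TS1 COPYDDN(ICOPY) SYSTEMPAGES YES
```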
Partition Management
Add partition
Rotate partition
Alter partition boundaries
Rebalance partitions
Notes:
A common partitioning scheme is to partition by date. Typically, the oldest data resides in
partition 1 and the youngest data resides in partition n. As time progresses and more data is
collected, the usual desire is to add partitions to the table space to house the new periods
of data. At some point, enough history has been collected, and the desire becomes one of
two things:
1. To discard the oldest partition's worth of data and re-use that partition to hold the
newest period's data. This reflects a cyclic use of some set number of partition
numbers.
2. To roll-off the oldest partition of data and roll-on a new partition in which to collect the
next period’s data. This delays the need to re-use a partition number until the name
space is exhausted (that is, until the number wraps). A variation on this is to make the
data in “rolled off” partitions available to queries only on demand. The partition is
hibernating, as it were, but can be awakened to participate in special queries.
Another common partitioning scheme is to partition so that you achieve partitions of similar
size. This method is mainly used when you want to run utilities and application processes
in parallel to reduce total elapsed time and with reduced contention.
DB2 V8 has the ability to immediately add partitions, rotate partitions, change the
partitioning key values for table-controlled (see Figure 2-15 "V8 Partitioned Tables -
Table-Controlled Partitioning" on page 2-18 for more information) partitioned tables via the
ALTER TABLE statement, and rebalance partitions. Here we give a brief description of
these enhancements. More detailed information can be found in the Utilities unit.
Adding a Table Partition
(Visual: a partitioned table space (ts) with its partitioning index (pi), a non-partitioned secondary index (npsi), and a data-partitioned secondary index (dpsi).)
Notes:
Up until DB2 V7, for an application that inserts time-based data, the data is often
stored by month, with a separate partition assigned to each month. The design usually
leans toward allocating the maximum number of partitions allowed up front. This probably
means allocating 254 partitions (most of them initially with minimum space allocation)
so that the application has a life span of about 20 years before running out of partitions.
In the example in the figure above, we have not been that wise. We currently have 59
partitions allocated and in use, each partition containing a month’s data starting with
January 1999. Now we want to get ready to store the data for December 2003, but we
are out of partitions: only 59 are defined in our table space. In previous
versions of DB2, we would have had to drop the entire table space, redefine it with
additional partitions, and reload the data again.
But with Version 8, this is no longer required. We can just add a new partition at the end to
store December’s data, as we show in the next couple of pages.
ADD PARTITION ENDING AT ( constant , ... )
Notes:
The visual above shows the syntax of the new ALTER TABLE ADD PARTITION statement.
Note that the statement is referencing the table (ALTER TABLE), not the table space (the
statement is not an ALTER TABLESPACE). As a consequence, there is no syntax at the
table level to specify the size of the new partition. (How the size of the new partition is
determined will be explained shortly.)
Also note that you do not specify a partition number on the ALTER TABLE statement, only
a new limit key for the new partition. DB2 knows how many partitions are currently being
used and will allocate a data set with the next higher partition number (when the table
space is storage group defined; otherwise you must allocate a VSAM file prior to running
the ALTER TABLE statement).
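Continuing the scenario above, adding the December 2003 partition is a single statement; the table name and limit-key value are illustrative:

```sql
-- Add partition 60 at the end; DB2 chooses the partition number itself
ALTER TABLE DB1.SALES_HIST
  ADD PARTITION ENDING AT ('2003-12-31');
```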
Add Table Partition
Notes:
The figure above shows how easy it is to add a partition to a partitioned table space in DB2
Version 8. Just execute the ALTER TABLE ADD PARTITION statement and you are ready
to go. You can start inserting / loading rows into the new partition immediately.
With DB2 Version 8, it is much easier to add partitions at a later date. Therefore, you can
now start out with a limited number of partitions in your applications, as many as you
currently need, and then reevaluate your partition needs within the next 12 to 18 months.
This results in “managing” many fewer objects that today may be pre-allocated without
being immediately used.
Notes:
With Version 8, users are able to dynamically add partitions to a partitioned table. You can
add partitions up to the maximum limit that is determined by the parameters specified when
the partitioned table was initially created. When you add a partition, the next available
physical partition number is used:
• When objects are DB2-managed (STOGROUP defined), the next data set is allocated
for the table space and each partitioned index.
• When objects are user-managed (USING VCAT), these data sets must be pre-defined.
The data sets for the data partition and all partitioned indexes must be predefined using
the VSAM access method services (IDCAMS) DEFINE command for the partition to be
added (that is, the value of the PARTITIONS column in SYSTABLESPACE plus one),
before issuing the ALTER TABLE ADD PARTITION statement.
Note that no partition number is supplied on the ALTER TABLE ADD PARTITION
statement. The partition number is selected by DB2 based on the current number of
partitions of the table.
The newly added partition is immediately available. None of the objects need to be stopped
before issuing the ALTER statement.
The table is quiesced and all related plans, packages, and cached statements are
invalidated. This is necessary as the access path may be optimized to read only certain
partitions. Automatic rebinds will occur (if allowed), but you may wish to issue rebinds
manually.
Since you cannot specify attributes like PRIQTY, the values of the previous logical partition
are used. Therefore, you will probably want to run an ALTER TABLESPACE statement
afterwards to provide accurate space parameters, before starting to use the newly added
partition. However, you have to remember that you specify the physical partition number on
the ALTER TABLESPACE command.
Adding partitions to a partitioned table does not affect the ability to do point-in-time
recovery of the object in any way. When you recover the table space to a point-in-time
before the partition was added, the new partition is not deleted; it is only emptied.
If you are altering a partitioned table space where currently the highest partitioning key is
not enforced, and add a partition at the end, the last partitioning key limit is enforced.
(Visual: rotating partitions in a partitioned table space (ts) with its partitioning index (pi).)
Note that adding partitions and rotating could mix up the physical order.
Notes:
When designing an application that requires storing the data for only a certain amount of
time (for legal reasons, for example), consider a rolling partitions design. Now that there is the
ability to easily rotate and reuse partitions over time in Version 8, it is easier to manage a
limited number of partitions that are set up based upon dates.
Rotating partitions allows old data to “roll-off” while reusing the partition for new data with
the ALTER TABLE ROTATE PARTITION FIRST TO LAST statement. A typical case is
where 13 partitions are used to continuously keep the last 12 months of data. When
rotating, you can specify that all the data rows in the oldest (or logically first) partition are to
be deleted, and then specify a new table space high boundary (limit key) so that the
partition essentially becomes the last logical partition in sequence, ready to hold the data
that is added.
Alter Table Rotate Partition Syntax
Allows old data to be rolled away
ROTATE PARTITION FIRST TO LAST ENDING AT ( constant , ... ) RESET
Notes:
The figure above shows the new ALTER TABLE ROTATE PARTITION syntax. Note the
RESET keyword at the end. This is to draw your attention to the fact that the information in
the existing table space partition is going to be deleted (reset).
RESET in this context should not be confused with the RESET keyword that is used in the
“VSAM-world” to indicate the reset of the HURBA to zero.
Partition 59   2003 Nov
Partition 60   2003 Dec   REORP

(The rotated partition inherits REORP from the previous partition; otherwise it would be available immediately.)
Notes:
The figure above shows an example of an ALTER TABLE ROTATE PARTITION statement.
In this case partition 1 is emptied (all rows are deleted) and reused to store the data of
January 2004 (new limit key ‘31/01/2004’).
Normally the emptied partition is immediately available. However, because partition 60 (the
previous last logical partition) is in Reorg Pending (REORP) status, partition 1 will inherit
that status and will be in REORP as well. We will have to run a REORG before both
partitions can be used by our applications.
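The rotation described above corresponds to a statement like the following; the table name and limit-key value are illustrative:

```sql
-- Delete the rows of the logically first partition and reuse it as the
-- logically last partition, ready for January 2004 data
ALTER TABLE DB1.SALES_HIST
  ROTATE PARTITION FIRST TO LAST
  ENDING AT ('2004-01-31') RESET;
```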
Rotate Partition Effects
Immediate availability (no REORG necessary)
Notes:
The partition that was rolled off is immediately available after the SQL statements are
successfully executed. No REORG is necessary (unless the previous logical partition is
REORP, then the rolled off partition will also be placed in REORP).
The current lowest logical partition is looked up and its corresponding physical partition will
become the new highest logical partition.
When you execute the ALTER TABLE ROTATE PARTITION statement, the new limit key
must be higher than the limit key of the last logical partition at that time, and the last
partition limit key is enforced.
Important: It is important to know that recovery to a previous point-in-time is blocked
after running the rotate statement. You cannot recover to a point-in-time before the
ALTER TABLE ROTATE PARTITION was executed. This is because all SYSCOPY and
SYSLGRNX entries are deleted when the rotate statement is executed. In addition, DB2
does not support an “undo” of the rotate statement itself. After the rotate, DB2 has no
knowledge of what the original limit key for the rolled-off partition was.
Suggestion:
Run LOAD ... PART n REPLACE
with a dummy SYSREC data set to clear out the partition first
Run cross-loader to combine the LOAD REPLACE with the ALTER TABLE
ROTATE PARTITION statement
Notes:
Because the data of the partition being rolled off is deleted, you may want to consider
running an unload job before rotating the partition.
When rotating partitions of a partitioned table, consideration should be made for the time
needed to complete the DDL statement.
The RESET keyword indicates that you reset the data of an existing partition. DB2 will
delete all the rows of the existing partition that you are about to reuse. To speed up the
delete process, you may want to consider doing a LOAD REPLACE with an empty data set
before running the ALTER TABLE ... ROTATE PARTITION statement. The reset operation
requires that the keys for the rows of that partition must also be deleted from all
non-partitioned indexes. Each NPI must be scanned to delete these keys; therefore, the
process can take an extended amount of time to complete as each NPI is processed
serially.
Additional consideration must be given for the time needed to delete data rows if
processing must be done a row at a time. Additional delete row processing may be
required for referential integrity relationships (SET NULL and DELETE CASCADE), or
when there are delete triggers.
when there are delete triggers.
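The suggested pre-emptive emptying of the partition can be sketched as a LOAD REPLACE against a dummy input; the table name is an assumption, and SYSREC should point at an empty or dummy data set (for example, DD DUMMY in the JCL):

```sql
-- Empty the partition cheaply so that ROTATE ... RESET has no rows
-- left to delete row by row
LOAD DATA LOG NO INTO TABLE DB1.SALES_HIST PART 1 REPLACE
```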
Notes:
After using the new ALTER TABLE ROTATE PARTITION statement, the logical and
physical order of the partitions is no longer the same. The DISPLAY command lists the status
of table space partitions in logical partition order. Logical order is helpful when investigating
ranges of partitions that are in REORP. It enables one to more easily see groupings of
adjacent partitions that may be good candidates for reorganization. When used in
conjunction with the new SCOPE PENDING keyword of REORG, a reasonable subset of
partitions can be identified if one wants to reorganize REORP ranges in separate jobs.
The logical partition number can also be found in the catalog in several places. Both
SYSTABLEPART and SYSCOPY have a LOGICAL_PART column (Table 2-1).
Table 2-1  Logical versus physical partition

               Physical partition   Logical partition
SYSCOPY        DSNUM                LOGICAL_PART
SYSTABLEPART   PARTITION            LOGICAL_PART
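A query along the following lines maps logical to physical partition numbers; the database and table space names are assumptions:

```sql
-- Derive the physical partition number for each logical partition
SELECT PARTITION AS PHYSICAL_PART, LOGICAL_PART, LIMITKEY
  FROM SYSIBM.SYSTABLEPART
 WHERE DBNAME = 'DB1' AND TSNAME = 'TS1'
 ORDER BY LOGICAL_PART;
```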
Alter Partition Boundary
V6 introduced ability to modify limit keys
Same functionality introduced for table-based partitioning
Affected partitions placed in REORP
Highest value is enforced - any keys that are made invalid are
discarded to a data set during REORG
,
ALTER PARTITION n ENDING AT ( constant )
Notes:
In DB2 V6, the ability to modify limit keys for table partitions was introduced. The
enhancement in DB2 V8 introduces the same capability for table-based partitioning with
the ALTER TABLE ALTER PART VALUES statement. The affected data partitions are
placed into a Reorg Pending state (REORP) until they have been reorganized.
Attention: Note that any ALTER, related to a partitioned table that specifies a partition
number, always has to specify the physical partition number. It is the user’s responsibility
to derive the physical partition number from the logical partition number whenever
required. SYSIBM.SYSTABLEPART and SYSIBM.SYSCOPY have a new
LOGICAL_PART column to assist with the conversion between logical and physical
partition number.
Notes:
The figure above shows an example of how to change the limit key of a partition for a
table-controlled partitioned table space. As with index-controlled partitioning, the affected
partition (and potentially the next logical partition) are put in REORP status.
Note that you specify the physical partition number on the ALTER TABLE ALTER
PARTITION statement.
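A boundary change like the one described above can be sketched as follows; the table name and limit-key value are illustrative:

```sql
-- Change the limit key of physical partition 3; the affected
-- partition(s) are placed in REORP until reorganized
ALTER TABLE DB1.SALES_HIST
  ALTER PARTITION 3 ENDING AT ('2001-06-30');
```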
Rebalance Partitions
REORG ... REBALANCE allows you to automatically rebalance
rows across selected partitions
Notes:
Rebalancing partitions is done by means of the REORG TABLESPACE utility. Specifying
the REBALANCE option when specifying a range of partitions to be reorganized allows
DB2 to set new partition boundaries for those partitions, so that all the rows that participate
in the reorganization are evenly distributed across the reorganized partitions. (However, if
the columns used in defining the partition boundaries have many duplicate values within
the data rows, even balancing is not always possible.)
Rebalancing is ideal when no skewing of data between partitions is required or needs to
be catered for. It has an advantage over changing the partition boundaries using the
ALTER TABLE ALTER PARTITION ... ENDING AT statement, in that the partitions involved in
the rebalancing are not put into REORP status.
You can specify REBALANCE with online REORG, REORG TABLESPACE SHRLEVEL
REFERENCE.
Upon completion, DB2 invalidates plans, packages, and the dynamic statement cache that
reference the reorganized object.
More details can be found in Figure 8-12 "REORG REBALANCE" on page 8-33.
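A rebalancing run over a range of partitions can be sketched as follows; the object names and partition range are assumptions:

```sql
-- Evenly redistribute rows across partitions 1 through 4 with online
-- REORG; unlike ALTER PARTITION, no REORP state results
REORG TABLESPACE DB1.TS1 PART 1:4 REBALANCE SHRLEVEL REFERENCE
```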
ALTER INDEX options:

    NOT PADDED
    PADDED

Note: If ADD COLUMN and PADDED or NOT PADDED are specified, ADD COLUMN
must be specified before PADDED or NOT PADDED.
Notes:
In versions prior to Version 8, varying length columns in the index are always padded with
blanks up to their maximum length. When you specify a large number as the maximum
length for a VARCHAR column, and only use a few bytes, your indexes that contain that
column will be very big, because of the index key padding.
In Version 8, varying length columns are no longer always padded to their full length when
they are part of an index key. When specifying NOT PADDED during the creation of an
index, padding does not occur and the keys are stored as true varying length keys. Varying
length indexes are marked in the new SYSINDEXES column, PADDED, with a value of 'N'.
NOT PADDED is the default at create time.
You can also alter an index between PADDED and NOT PADDED. The syntax is shown in
the figure above.
Alter Index Not Padded/Padded
Prior to V8, data is varying length in the table but always padded
to the maximum length in indexes
V8 provides a new index option, NOT PADDED.
Varying-length columns are then no longer padded to the
maximum length in index keys.
The true column value is stored in the index (length and data)
Enables index-only access for varchar data
Likely to reduce index storage requirement
ALTER INDEX PADDED/NOT PADDED sets index to RBDP
Index must be rebuilt from data
Optimizer can then choose index for index only access
Notes:
An index can be changed from PADDED to NOT PADDED or from NOT PADDED to PADDED
using ALTER INDEX. In both cases the index is placed in Rebuild Pending (RBDP) state.
Even though the data is temporarily unavailable (while the index is in RBDP), there is no
need to drop and recreate the index. In addition, DB2 V8 can avoid using an index that is in
RBDP in many cases (as discussed in Figure 2-83 "RBDP Index Avoidance" on page
2-121).
One of the major benefits of using NOT PADDED indexes, is that DB2 can do true
index-only access with NOT PADDED indexes. This is discussed in Figure 9-31
"Varying-Length Index Keys (1 of 2)" on page 9-49.
When the database design has used VARCHAR columns wisely (that is, not using the
VARCHAR data type for very small columns, or character columns that are always filled to
the maximum size), NOT PADDED indexes can reduce the size of the index (including the
number of levels).
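The conversion described above can be sketched as follows; the index name is an assumption:

```sql
-- The ALTER places the index in RBDP, so it must then be rebuilt
-- from the data before it can be used again
ALTER INDEX DB1ADM.IX1 NOT PADDED;

REBUILD INDEX (DB1ADM.IX1)
```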
Notes:
In V8, you can change the clustering order in a partitioned or non-partitioned table space
without dropping the index.
In case of a partitioned table, when using index-controlled partitioning, the partitioning
index is also the clustering index. On the other hand, when using table-controlled
partitioning, any index can be the clustering index.
In addition, you can change the clustering attribute of an index by using the CLUSTER and
NOT CLUSTER options of ALTER INDEX. As before, only one clustering index is allowed
for any table.
When you change an index to NOT CLUSTER, it continues to be the clustering index until
you define another index with the CLUSTER attribute. When you change an index to
become the clustering index, all new rows will be inserted using the new clustering index.
To rearrange the existing rows according to the new clustering index, you must perform a
REORG.
For more details, see also Figure 2-34 "Clustering Indexes" on page 2-50.
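Moving the clustering attribute can be sketched as follows; the index, database, and table space names are assumptions:

```sql
-- Move the clustering attribute to a different index
ALTER INDEX DB1ADM.IX_OLD NOT CLUSTER;
ALTER INDEX DB1ADM.IX_NEW CLUSTER;

-- Existing rows are re-sequenced only by a subsequent REORG
REORG TABLESPACE DB1.TS1 SHRLEVEL REFERENCE
```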
RBDP Index Avoidance
Data Manager bypasses indexes in rebuild pending (index is ignored)
For all DELETE processing
For all non-unique indexes for UPDATEs and INSERTs
Optimizer avoids RBDP indexes as follows:
Dynamic PREPARE
Indexes in RBDP avoided
Cached
If cached, PREPARE is bypassed
Invalidation occurs during ALTER
Static BIND (RBDP index NOT avoided)
Indexes in RBDP can still be chosen, and will get resource unavailable at
execution time
Reoptimization
Acts the same as initial BIND or PREPARE
Utilities invalidate the cache for tables where index RBDP state is reset
Notes:
Today, when an index is in Rebuild Pending (RBDP), the optimizer is unaware of that and
can still pick that index during access path selection when you are executing a query in
your QMF application, for example.
In Version 8, DB2 will try to avoid using indexes in RBDP.
Data Manager will bypass indexes in Rebuild Pending (index is ignored):
• For all DELETE processing
• For all non-unique indexes for UPDATEs and INSERTs
A unique index in RBDP cannot be avoided for UPDATE or INSERT because DB2 needs
the index to be available to be able to enforce uniqueness.
The DB2 optimizer avoids RBDP indexes as follows:
• Dynamic PREPARE:
- Indexes in RBDP are avoided.
• Cached statements:
- If the statement is cached, the PREPARE is bypassed.
- Invalidation occurs during ALTER.
• Static BIND does not avoid an index that is in RBDP:
- Indexes in RBDP can still be chosen, and will get Resource Unavailable at
execution time.
• Reoptimization:
- Reoptimization acts the same as an initial BIND or PREPARE.
• DB2 utilities invalidate the dynamic statement cache for tables where index RBDP state
is set or reset.
Note: An index in RECOVER-pending (RECP) is also avoided during access path
selection of a dynamically prepared SQL statement.
Index Creation Enhancements
Deferred indexes avoided by dynamic access path selection
Index avoidance also applies to DEFER YES indexes in RBDP
Notes:
In this topic, we describe two enhancements related to DB2 index creation, namely:
• Deferred indexes do not prevent table access
• Invalidation of dynamically cached statements after CREATE INDEX
The problem with deferred indexes is that, prior to DB2 V8, they block table access if the
optimizer selects an access path that uses the deferred index for a given statement. The
execution of the statement results in SQL code -904 and reason code 00C900AE, which
indicates that an attempt was made to access an index that is in Rebuild Pending state.
As mentioned in the previous topic, with DB2 V8, the optimizer does not consider indexes
in RBDP state during the dynamic prepare of a statement. As deferred indexes are initially
in RBDP state, this enhancement also applies to indexes created with DEFER YES.
To take advantage of deferred indexes as soon as possible after they are (re)built, cached
statements, which refer to the base table of a deferred index, are invalidated when the
index is rebuilt and reset from RBDP state.
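As a sketch (the object names here are hypothetical), creating a deferred index and later materializing it might look like this:

```sql
-- Create only the index definition; the index is placed in RBDP
-- and, with DB2 V8, is ignored by dynamic access path selection
CREATE INDEX PRODIX.XCUST_NAME
  ON PRODDB.CUSTOMER (LASTNAME)
  DEFER YES;

-- Materialize the index later; resetting the RBDP state also
-- invalidates cached dynamic statements that refer to CUSTOMER
REBUILD INDEX (PRODIX.XCUST_NAME);
```

REBUILD INDEX is a utility control statement rather than SQL; it would typically be run through a utility job once the data has been loaded.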
These enhancements make DB2 require less manual interaction and hence contribute to a
less error-prone approach.
DBET States
Used by Online Schema Evolution
Table space
AREO* - Advisory REORG-pending
Performance degradation during row access
Indicates all data rows not at current version
Reset by REORG TABLESPACE or LOAD REPLACE
Index
AREO*
Performance degradation, all keys not materialized to current version
Reset by REORG TABLESPACE, REORG INDEX, REBUILD INDEX,
LOAD REPLACE
RBDP
Index is unusable and must be rebuilt
Index avoidance mitigates outage
Notes:
In support of online schema changes, DB2 V8 introduces a new Database Exception Table
state (DBET), Advisory Reorg (AREO*). Advisory Reorg indicates that the table space,
index, or partition identified should be reorganized for optimal performance.
The AREO* state is reset for a table space by running the REORG TABLESPACE or LOAD
REPLACE utilities.
The AREO* state is reset for an index by running the REORG TABLESPACE,
REORG INDEX, REBUILD INDEX, or LOAD REPLACE utilities.
Utilities can be run to reset AREO* for individual partitions without being restricted by
AREO* in adjacent partitions. The AREO* state is not reset by a START FORCE
command, but can be reset using the REPAIR utility.
The DISPLAY DATABASE command shows the new DBET state AREO* for all objects.
Review of New DBET States
Object type   Action taken                                          Resulting condition
Table space   ALTER TABLE ALTER COLUMN (any data type)              AREO*
Index space   ALTER TABLE ALTER COLUMN of VARCHAR to CHAR           AREO*
              ALTER TABLE ALTER COLUMN of CHAR to VARCHAR           AREO*
              ALTER TABLE ALTER COLUMN of GRAPHIC to VARGRAPHIC     AREO*
              ALTER TABLE ALTER COLUMN of VARGRAPHIC to GRAPHIC     AREO*
              ALTER TABLE ALTER COLUMN of CHAR to CHAR              AREO*
              ALTER TABLE ALTER COLUMN of GRAPHIC to GRAPHIC        AREO*
              ALTER TABLE ALTER COLUMN of VARGRAPHIC to VARGRAPHIC  AREO*
Index space   ALTER INDEX ADD COLUMN                                RBDP (1)
Index space   ALTER INDEX from NOT PADDED to PADDED                 RBDP
              ALTER INDEX from PADDED to NOT PADDED                 RBDP
The AREO* and RBDP states are not reset by a START FORCE command, but
can be reset using the REPAIR utility.
(1) Can be AREO* if column added to the table in the same UOW as the ALTER INDEX ADD COLUMN
Notes:
The figure above gives an overview of the different types of ALTERs you can do and their
effect on the DBET state of your table space and/or index.
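For instance (the object names are hypothetical), the following ALTERs would leave the objects in the states shown in the table above:

```sql
-- Changing a column's data type places the table space in AREO*
-- (advisory REORG-pending)
ALTER TABLE PRODDB.CUSTOMER
  ALTER COLUMN LASTNAME SET DATA TYPE VARCHAR(100);

-- Changing the padding attribute places the index space in RBDP;
-- the index must be rebuilt before it can be used again
ALTER INDEX PRODIX.XCUST_NAME NOT PADDED;
```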
Notes:
Version 8 provides enhanced backup and recover capabilities at the DB2 subsystem or
data sharing group level. The purpose is to provide an easier and less disruptive way to
make fast volume-level backups of an entire DB2 subsystem or data-sharing group with
minimal disruption, and recover a subsystem or data-sharing group to any point-in-time,
regardless of whether you have uncommitted units of work.
The primary focus of this solution is to provide a fast and easy way to bring back an entire
DB2 system or data sharing group to a point-in-time that you know is OK. It can be used
for off-site disaster recovery situations, but its primary goal is to bring back the data on the
original volumes to an arbitrary point-in-time as quickly as possible.
Two new utilities provide the vehicle for system level point-in-time recovery:
• The BACKUP SYSTEM utility provides fast volume-level copies of DB2 databases and
logs.
• The RESTORE SYSTEM utility recovers a DB2 system to an arbitrary point-in-time.
RESTORE SYSTEM automatically handles any creates, drops, and LOG NO events
that might have occurred between the backup and the recovery point-in-time.
Using the new BACKUP SYSTEM utility, you can copy both the data and logs, or only the
data. Previously, to make a system level backup, you had to issue the SET LOG
SUSPEND command, which stops the logging and thus prevents any new database
updates.
A BACKUP SYSTEM job does not stop the logging. However, it does quiesce some system
activities, but the process is less disruptive than SET LOG SUSPEND processing. (The list
of quiesced system activities during BACKUP SYSTEM processing is described in Figure
2-89 "BACKUP SYSTEM Operation" on page 2-134.) The BACKUP SYSTEM utility can
operate on an entire data-sharing group, whereas the SET LOG SUSPEND command has
to be issued for each data-sharing member.
As a further enhancement to taking system level backups, the SET LOG SUSPEND
command now quiesces 32K page writes and data set extensions. (Note that quiescing
32K page writes is only required for page sets where the CI size of the underlying VSAM
data set is not 32K. This was not possible in the past as all VSAM data sets had to use a
4K CI size. However, V8 gives you the possibility to have the VSAM CI size match the DB2
page size. See “Control Interval Larger than 4KB” on page 2-145 for more details.)
The BACKUP SYSTEM and RESTORE SYSTEM utilities rely on new DFSMShsm
services, and SMS constructs, like copy pools and copy pool backups, in z/OS V1R5 that
automatically keep track of which volumes need to be copied.
A copy pool is a set of SMS managed storage groups that can be backed up and restored
with a single command. Each DB2 system has up to two copy pools, one for databases and
one for logs. The name of each copy pool that is to be used with DB2 must use the
following naming convention:
DSN$locn-name$cp-type
The variables that are used in this naming convention have the following meanings:
DSN Unique DB2 product identifier
$ Delimiter
locn-name DB2 location name.
cp-type Copy pool type. Use DB for the database copy pool and LG for
the log copy pool.
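For example, for a DB2 data sharing group with location name DSNDB0G (the name used in the figures that follow), the two copy pool names would be:

```
DSN$DSNDB0G$DB   (database copy pool)
DSN$DSNDB0G$LG   (log copy pool)
```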
You have to set up both COPYPOOLs properly in order to successfully implement system
level point-in-time recovery:
• The “log” COPYPOOL should be set up to contain the volumes that contain the BSDS
data sets, the active logs, and their associated ICF catalogs.
• The “database” COPYPOOL should be set up to contain all the volumes that contain
the databases and the associated ICF catalogs.
Archive logs should, if possible, reside on different DASD and COPYPOOLs. If that is not
possible, they should be part of the “log” COPYPOOL.
Attention: It should be noted that in this new environment, IBM recommends that ICF
catalogs are created and reside with the logs, and separate ICF catalogs are created that
reside with the databases.
You can specify the number of copy versions to be maintained on disk (maximum 85) by
using the VERSIONS attribute.
A copy pool backup is a new storage group type that is used as a container for copy pool
volume copies. Each volume in the copy pool storage group needs to have a
corresponding volume in the copy target storage group.
BACKUP SYSTEM Operation
[Figure: overview of BACKUP SYSTEM processing. (1) DB2 obtains an exclusive lock and
quiesces some system activity; (2) the RBLP in DBD01 is updated (RBLP = RBA2, the most
recent checkpoint); (3) DFSMShsm BACKVOL COPYPOOL(DSN$DSNDB0G$DB) TOKEN(RBA2) copies
the database copy pool to the 'Copy Pool Backup' storage group SG1 (COPYV1); (4) the
BSDS is updated; (5) BACKVOL COPYPOOL(DSN$DSNDB0G$LG) TOKEN(RBA2) copies the log copy
pool to 'Copy Pool Backup' SG2 (LOGV1); (6) the quiesce and the lock are released and
message DSNU1602I is issued.]
Notes:
The figure above provides an overview of how the BACKUP SYSTEM utility operates.
You can use the new BACKUP SYSTEM utility to back up an entire DB2 subsystem or an
entire DB2 data sharing group with a single command. The BACKUP SYSTEM utility has
two options that you can specify:
• FULL:
In this case, the backup contains both the logs and databases. This is referred to as a
full system backup. Full system backups can be used to recover the system to the
point-in-time (PIT) at which the copy was taken. After restoring the backups, you can
use a normal DB2 restart to bring the system back to consistent state. DB2 restart
processing will take care of any outstanding units of work, by either completing their
commit processing, or rolling back the changes they made so far (in case they did not
reach a commit point). In this case you do not need to use the RESTORE SYSTEM utility,
although it can be used if you want. When taking a FULL backup, the copies of the log
copypool are always taken after the database copypool copies.
• DATA ONLY:
This type of copy only contains the “database” COPYPOOL. A BACKUP SYSTEM
DATA ONLY does NOT copy the “log” COPYPOOL. Such a copy is referred to as a
data-only system backup. Data-only system backups can be used in conjunction with
the new RESTORE SYSTEM utility to recover the system to an arbitrary PIT.
These system backups (full and data-only) are recorded in the BSDS (up to 50 entries) and
in the header page of DBD01. The information in DBD01 is called the Recovery Base Log
Point (RBLP) and is the RBA (non-data sharing) or LRSN (data sharing; the RBLP LRSN is
determined by taking the minimum of all member-level RBLP values) of the time the most
recent system backup ran. The information in DBD01 is recorded before the actual copy is
initiated. This way it can be used as a starting point for log processing after a backup has
been restored. The information in the BSDS is recorded after receiving word back from
DFSMShsm that the copies are “logically” complete.
BACKUP SYSTEM invokes DFSMShsm services (so-called fast replication services) to
either take a data-only or a full system backup via DASD volume copy functions. The
backups may (and probably do) contain uncommitted data. This is not a problem. The data
is brought back to consistency (no outstanding in-flight, in-commit, or in-abort units of
work) by a DB2 restart operation or by using the RESTORE SYSTEM utility. While the
BACKUP SYSTEM utility is active in the system, certain DB2 system activities are quiesced:
• System checkpoints
• 32K page writes (for page sets with a VSAM CISIZE that is different from the DB2 page
size)
• Writing page set close control log records (PSCRs)
• Data set creation, extensions, renaming, and deletion
However, the log write latch is not obtained by the BACKUP SYSTEM utility, as is done by
the SET LOG SUSPEND command. Therefore, using the BACKUP SYSTEM utility should
be less disruptive in most cases. There is no need to suspend logging when using the
BACKUP SYSTEM utility, as DB2 is now in control of taking the volume copies, whereas
with -SET LOG SUSPEND you are in charge of initiating the copy of the data yourself.
In a data sharing environment, the BACKUP SYSTEM utility will fail when it detects any
member that is in a “failed” or “not normally quiesced” state.
The BACKUP SYSTEM utility completes after the “logical” copies have completed. This
should typically be within a few seconds. Taking copies of the volumes in the COPYPOOL
is done in parallel.
Here is the sequence of events that occur when running the BACKUP SYSTEM utility, as
shown in the figure above. The example uses BACKUP SYSTEM FULL, so we take a
backup of both the database and the log copy pool.
1. Preparation: To make sure there is only one BACKUP SYSTEM job running in the system
at any one time, DB2 uses a new lock type to serialize the execution of the BACKUP and
RESTORE SYSTEM utilities. So the first thing to do is to obtain that lock in exclusive
mode. DB2 also has to drain some minimal system activity, to maintain system integrity.
(They are listed earlier in this topic). Therefore, the next thing to do is to quiesce those
system activities.
2. Update the header page of DBD01 to update the Recovery Base Log Point (RBLP). The
RBLP value is based on system checkpoints in non-data sharing and system
checkpoints and group buffer pool checkpoints in a data sharing environment.
3. Invoke the DFSMShsm fast replication function to make a volume-level backup of all
the volumes in the database copy pool.
4. Update the BSDS with the system copy information, recording the RBLP that is
associated with the copy.
5. Invoke the DFSMShsm fast replication function to make a volume-level backup of all
the volumes in the log copy pool, because we specified the FULL option when invoking
the BACKUP SYSTEM utility.
6. Free up the system activities that were quiesced at the beginning of the utility
invocation, release the exclusive lock, and issue message DSNU1602I “BACKUP
SYSTEM UTILITY COMPLETED”.
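The utility control statements themselves are minimal; a sketch of the two variants:

```sql
-- Copy both the database and the log copy pools (FULL is the default)
BACKUP SYSTEM FULL

-- Copy only the database copy pool, for use with RESTORE SYSTEM
BACKUP SYSTEM DATA ONLY
```

Either statement would be supplied as SYSIN input to a normal DB2 utility job.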
[Figure: three BACKUP SYSTEM executions along the log (system checkpoints sc1 through
sc11). Each execution issues BACKVOL COPYPOOL(DSN$DSNDB0G$DB) with a token (RBA1, RBAh,
RBAn) and overwrites the RBLP in DBD01; the FULL executions also copy the log copy pool
DSN$DSNDB0G$LG. The resulting versions (COPYV1, COPYV2, ...) are kept in the 'Copy Pool
Backup' storage groups SG1 and SG2.]
Notes:
The figure above shows what happens when several BACKUP SYSTEM utilities are
executed. The first and second executions are BACKUP SYSTEM FULL jobs. This means
that both the database copy pool and the log copy pool are backed up each time (COPYV1
and LOGV1, and COPYV2 and LOGV3). The third execution is a BACKUP SYSTEM DATA
ONLY execution. This means that only the database copy pool is backed up (COPYV3).
DFSMShsm tracks all copies that are made of any copy pool (up to 85), and identifies them
by a token (that is passed to it by DB2, via the BACKUP SYSTEM utility). It is worth noting
that today all backup versions of a copy pool have to reside on disk.
Note that all the BACKUP SYSTEM executions (up to a maximum of 50) are tracked in the
BSDS, whereas the RBLP in DBD01 is updated (overwritten) each time the BACKUP
SYSTEM is run.
RESTORE SYSTEM Operation
[Figure: RESTORE SYSTEM operation. (1) The BSDS token/copy entries (RBA1/COPYV1,
RBAh/COPYV2, RBAn/COPYV3) are searched for the backup that precedes SYSPITR=RBAk, and
RBAh is returned; (3, 4) DFSMShsm uses the token to locate COPYV2 in the 'Copy Pool
Backup' storage groups and restores COPYPOOL(DSN$DSNDB0G$DB) with token(RBAh); (6) log
apply then runs forward from the RBLP (RBAh) to RBAk along the log (system checkpoints
sc1 through sc11).]
Notes:
The figure above shows how a RESTORE SYSTEM utility operates. In this example, the
LOGONLY option is not used. Before we describe this example, we give a general
description of how RESTORE SYSTEM works.
Depending on the type of system backup that is available, and the PIT that you want to go
back to, you have different options to restore the system.
ii. Taking backups of all DB2 data, as well as the logs. This can be done by using
COPYPOOLs and issuing the DFSMShsm commands manually.
iii. Issuing the SET LOG RESUME command.
2. In case you need to restore the system, stop DB2. In case of a data sharing group, stop
all members.
3. Use HSM commands to restore the database and log copypools, for example:
   FRRECOV * COPYPOOL(cpname) VERIFY(Y)
   TOKEN(X'C4C2D7F1B7EF7EA8AD2DBE0E000007A5E090')
4. If data sharing, delete the CF structures used by DB2.
5. Restart DB2. For a data sharing system, restart all non-dormant members. During
restart, all in-flight, in-abort, and in-commit URs get resolved.
6. If data sharing, execute GRECP/LPL recovery. This recovers changed data that was
stored in the CF at the time of the backup.
Note that in this scenario, we do not use the new RESTORE SYSTEM utility.
Instead of using an exact RBA for a normal conditional restart, you use the ENDLRSN
parameter (whereas you use the ENDRBA keyword when specifying an RBA that is at a 4K
CI boundary). When doing a conditional restart for system level point-in-time recovery, you
always use the SYSPITR keyword.
You also have to make sure that all logs (active and archive), between the time of the
system level backup and the SYSPITR that you want to recover to, are available to the
RESTORE SYSTEM utility.
With the LOGONLY option, the RESTORE SYSTEM utility will only apply outstanding log
changes to the databases. This option is useful if the user (or some automation tool) has
already restored the database volumes prior to invoking the RESTORE SYSTEM utility.
RESTORE SYSTEM LOGONLY uses the
recovery base log point (RBLP) that is recorded in DBD01 to determine the starting
point of the log apply phase (just like RESTORE SYSTEM does). Note that the RBLP is
only updated by the BACKUP SYSTEM utility and the -SET LOG SUSPEND command.
Therefore, if you take flashcopies without a prior -SET LOG SUSPEND command,
RESTORE SYSTEM LOGONLY may go back much farther in time for the start of the
log apply phase than you expected.
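As a sketch, after creating the SYSPITR conditional restart record (with the DSNJU003 change log inventory utility) and restarting DB2, the utility invocation itself is minimal (the log-point value below is hypothetical):

```sql
-- DSNJU003 input: create a conditional restart record that
-- truncates the log at the chosen point-in-time
CRESTART CREATE,SYSPITR=00007A5E0000

-- Restore the database copy pool and apply the log
RESTORE SYSTEM

-- Log apply only; the volumes were already restored outside DB2
RESTORE SYSTEM LOGONLY
```

Only one of the two RESTORE SYSTEM forms would be used in a given recovery scenario.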
Note: The RESTORE SYSTEM utility does not restore the log COPYPOOL. The ability
to back up logs with a FULL backup is designed to allow for the recovery of an entire
subsystem, via means other than the RESTORE SYSTEM utility. Such a recovery can be
realized by direct invocation of DFSMShsm services, as shown in “Restoring a DB2
System to the PIT of a Prior System Level Backup” on page 2-135.
As indicated above, each COPYPOOL can have multiple VERSIONS. You cannot specify a
specific desired version, other than implicitly via the log truncation point (specified as the
PITR CRCR). When you specify the log truncation point, you have determined the version
of the COPYPOOL that will be used. RESTORE SYSTEM automatically recovers from the
latest version prior to the log truncation point. The restore of the database volume from the
COPYPOOL is done in parallel.
After the data is restored, the RESTORE SYSTEM utility uses the recovery base log point
(RBLP) that is recorded in DBD01 to determine the starting point of the log apply phase.
The log apply phase uses the fast log apply (FLA) function to recover objects in parallel.
The consistency for LOG NO utilities is established when, during the log apply phase of
RESTORE SYSTEM, a log record is encountered that represents the open of a table space
or index space with recovery(no). In this case, table spaces will have to be put in RECP
state, and index spaces will have to be put in either RECP or RBDP state, depending on
their COPY attribute. These objects should be recovered to a different point-in-time, prior to
the log truncation point using image copies, or rebuilt from the data in the case of a COPY
NO index.
Note: With DB2 Version 8, system level point-in-time recovery is an all-or-nothing
approach, as it is operating at the subsystem level or at the data sharing group level. This
architecture might be extended in the future to handle backup and recovery at a more
granular level, such as an application or a single table space.
Assume we determine that at RBAk the system was still OK, and we decide to return to
that point-in-time. You then have to create a “SYSPITR” conditional restart record
specifying RBAk, and restart DB2. DB2 will start in System Recover Pending mode, and
the only thing you can do is execute the RESTORE SYSTEM utility.
1. The first thing RESTORE SYSTEM does is to look at the point-in-time you want to
recover to, as indicated by the SYSPITR CRCR RBA/LRSN (RBAk), and look for the
backup system entry in the BSDS that immediately precedes that point-in-time
(RBA/LRSN). We find that RBAh is the last BACKUP SYSTEM run. (We have a more
recent BACKUP SYSTEM utility execution available at RBAn, but that is beyond the
RBAk that we want to return to, and therefore cannot be used.)
2. Invoke DFSMShsm functions to restore the version of the database copy pool that is
associated with the entry that is retrieved from the BSDS.
3. DFSMShsm will analyze the version token passed to it from DB2 and locate the
appropriate backup in its copy pool backup.
4. DFSMShsm will then invoke the fast replication functionality to restore all the volumes
belonging to the database copy pool in parallel.
5. Now that the database copy pool has been restored, DB2 can look at the header page
in DBD01 that has just been restored, to retrieve the RBLP (RBAh). This is the starting
point of the forward log recovery.
6. The RESTORE SYSTEM then starts the log apply phase and uses the fast log apply
(FLA) function to recover all objects in parallel from the log, up to the point-in-time
(RBA/LRSN) specified by the conditional restart (RBAk).
For more information, see “Using a tracker site for disaster recovery” in DB2 Administration
Guide, SC18-7413.
More Online ZPARMs
Additional online changeable ZPARMs in DB2 V8:
Notes:
In order to minimize the events which make it necessary to recycle your DB2 subsystem to
make ZPARM parameter changes effective, the online change for subsystem parameters
was introduced in DB2 V7. This function allows a user to load a new subsystem parameter
load module into storage without recycling DB2. To do this, you can use the normal
installation parameter update process to produce a new load module, which includes any
desired changes to parameter values. You can then issue the -SET SYSPARM command
to load the new module in order to effect the change.
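As a sketch, the command has three variants (DSNZPARM is the conventional default module name):

```
-SET SYSPARM LOAD(DSNZPARM)   load the named subsystem parameter module
-SET SYSPARM RELOAD           reload the last named subsystem parameter module
-SET SYSPARM STARTUP          revert to the module used at DB2 startup
```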
Not all subsystem parameters are dynamically changeable in DB2 V7. Refer to DB2
Universal Database for OS/390 and z/OS Command Reference Version 7, Appendix A.2,
SC26-9934 for a list of online changeable subsystem parameters. An overview is shown in
Table 2-1.
DB2 V8 adds some more online changeable parameters. Refer to DB2 UDB for z/OS
Version 8 DB2 Command Reference, SC18-7416. Actually all new DSNZPARMs
introduced in Version 8 are online changeable.
For most parameters, online change is transparent, with the change taking effect
immediately. There are a few parameters for which this is not the case. The behavior
exhibited by the system upon change for these parameters is discussed below.
PARTKEYU
After you change the partitioning key columns update parameter from YES to NO, from
YES to SAME, or from SAME to NO, your attempt to inappropriately update the value in a
partitioning key column results in SQLCODE -904, resource unavailable.
Important: In addition, when PARTKEYU is set to YES, in DB2 Version 8, drain locks are
no longer acquired when moving between partitions after updating the partitioning key.
SYSADM/SYSADM2
You can only change the install SYSADM (SYSADM and SYSADM2) subsystem
parameters using the SET SYSPARM command using the current install SYSADM
authority. If you do not have the proper authority, DB2 issues message DSNZ015 for each
ID you tried to change. In this case, the current value of the load module parameter
remains in effect. The rest of the load module is not affected, that is, all other changes will
take effect.
As with changing this parameter offline, revoking SYSADM authorization from the install
SYSADM user ID does not cascade. However, if you REVOKE SYSADM from the user
that was just replaced as an install SYSADM, then that REVOKE will cascade.
To perform an online update of SYSADM/SYSADM2 system parameters, use either a
primary or secondary authorization ID having install SYSADM authority.
SYSOPR1/SYSOPR2
You can only change the Install SYSOPR (SYSOPR1 and SYSOPR2) subsystem
parameters using the SET SYSPARM command if either your primary or any secondary
authorization ID has install SYSADM authority. If you do not have the proper authority, DB2
issues message DSNZ015 for each ID you tried to change. In this case, the current value
of the load module parameter remains in effect. The rest of the load module is not affected,
that is, all other changes will take effect.
CACHEDYN
When you change this value from YES to NO, existing statements in the cache will not be
used by new threads, but the statements will remain in the pool until they are no longer
referenced. Therefore, by changing this value to NO, you cannot immediately reduce the
EDM pool size allocated for dynamic statements.
MAXKEEPD
If you change the value for MAXKEEPD, those changes take effect after the next COMMIT.
XLKUPDLT
Your changes to XLKUPDLT will not affect the previous settings of currently running
statements. All future statements coming in are affected immediately.
Notes:
DB2 V8 also takes actions to increase availability, allowing you to specify a CISIZE that
matches your DB2 page size, as well as trying to anticipate and warn you about potential
availability problems, and taking more automatic actions to reduce outages to a minimum.
The following enhancements were made to further increase availability:
• Control interval larger than 4 KB
• Monitoring system checkpoint activity and log offload activity
• Monitoring long running unit of recovery (UR) backout
• Detecting long readers
• Lock escalation IFCID
• Partitioning key update enhancements
• Automatic and less disruptive LPL recovery
• SMART DB2 extent sizes for DB2 managed objects
For in-abort URs, there is also a new progress message. As for DSNR047I, the new
message DSNR048I is also issued every two minutes:
• DSNR048I csect-name UR BACKOUT PROCESSING LOG RECORD
AT RBA rba1 TO RBA rba2 FOR
CORRELATION NAME =xxxxxxxxxxxx
CONNECTION ID =yyyyyyyy
LUWID =logical-unit-of-work-ID=token
PLAN NAME =xxxxxxxx
AUTHID =xxxxxxxx
END USER ID =xxxxxxxx
TRANSACTION NAME =xxxxxxxx
WORKSTATION NAME =xxxxxxxx
Lock Holder Can Inherit WLM Priority from Lock Waiter
Assume that a transaction executes with a low WLM priority and makes updates to the
database. Because it is updating, it must acquire X-locks on the pages that it modifies and
then these X-locks are held until the transaction reaches a commit point. This is business
as usual. If such a transaction holds an X-lock on a row that another transaction is
interested in, the other transaction is forced to wait until the first transaction (with a low
WLM priority) commits. If the lock waiter performs very important work and is thus assigned
a high WLM priority, it would be desirable that it is not slowed down by other transactions
that execute in a low priority service class.
To reduce the time the high priority transaction has to wait for that lock to be given up by
the low priority transaction, it would be better if the waiting transaction could temporarily
assign its own priority to the transaction that holds the lock until this transaction frees the
locked resource.
The WLM component of z/OS 1.4 provides a set of APIs that can be used to accomplish
this. This service is called WLM enqueue management. In z/OS 1.4, this support is limited
to transactions running on a single system.
DB2 V8 exploits WLM enqueue management. When a transaction has spent roughly half of
the lock time-out value waiting for a lock, then the WLM priority of the transaction, which
holds the lock, is increased to the priority of the lock waiter if the latter has a higher priority.
If the lock holding transaction completes, it resumes its original service class. In case
multiple transactions hold a common lock, this procedure is applied to all of these
transactions.
Upon successful automatic recovery, the pages are deleted from the LPL and DB2 issues
an existing message, DSNI021I, to indicate the completion.
If the automatic LPL recovery fails, DB2 V8 issues message DSNI005I to indicate the
failure of the automatic LPL recovery.
A new message, DSNB357I, is issued to inform you that pages have been added to the
LPL which might not be automatically recovered.
• TSQTY specifies the amount of space in KB for the primary space allocation quantity
for DB2-managed table spaces that are created without the USING clause. A value of 0
(zero) indicates that you want to use standard defaults (now 1 cylinder).
• IXQTY fulfils a similar role but for index spaces.
TSQTY and IXQTY have a global scope. TSQTY applies to non-LOB table spaces. For
LOB table spaces, a 10x multiplier is applied to TSQTY to provide the default value for
PRIQTY. IXQTY applies to all indexes.
The default value for TSQTY and IXQTY remains 0 (zero) as in previous versions.
However, zero now (V8) means a primary allocation of 1 cylinder for both table spaces and
indexes. (When using the installation clist in migration mode, DB2 will draw your attention
to this change in allocation when specifying zero, by displaying message “DSNT520I DB2’s
default sizes for index and table spaces are increased in V8”.)
DB2 always honors the PRIQTY value specified by the user and recorded in the associated
PQTY column in SYSTABLEPART or SYSINDEXPART catalog table.
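For example (the database and table space names are hypothetical), a table space created without a USING clause or PRIQTY now gets the 1-cylinder primary allocation by default:

```sql
-- No PRIQTY/SECQTY specified: with TSQTY=0, DB2 V8 allocates a
-- 1-cylinder primary quantity and uses a sliding secondary quantity
CREATE TABLESPACE TS1
  IN PRODDB
  SEGSIZE 4
  LOCKSIZE ANY;
```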
This enhancement should benefit all users of DB2. It is particularly beneficial for
ERP/CRM, ported and new applications. For example, when implementing an ERP/CRM
solution, it is difficult to know which objects will be used by the functions that you will exploit
in the CRM/ERP package. In addition, it is often difficult for any type of application to have
a good understanding of the object sizes at design time.
With this enhancement, in a lot of cases, there is no longer the need for a user to specify
the secondary quantity for a DB2-managed pageset, or even both the primary and
secondary quantities. The methodology used will not lead to heavy over-allocation or
waste excessive space. For example, it will not exceed DSSIZE and PIECESIZE.
This feature delivers autonomic selection of data set extent sizes with a goal of preventing
extent errors before reaching maximum data set size.
The new default allocations in cylinders, as well as the sliding (bigger) secondary
allocations, can also result in better performance of mass inserts, prefetch operations, and
the LOAD, REORG and RECOVER utilities.
List of Topics
Breaking limitations
Dynamic scrollable cursors
Multi-row FETCH and INSERT
GET DIAGNOSTICS statement
Common table expressions and recursive SQL
Identity column enhancements
Sequence objects
Scalar fullselect
Multiple DISTINCT clauses
INSERT within SELECT statement
Miscellaneous enhancements
Notes:
This unit describes the enhancements to SQL. It consists of the following topics:
• Breaking limitations
• Dynamic scrollable cursors
• Multi-row FETCH and INSERT
• GET DIAGNOSTICS statement
• Common table expressions and recursive SQL
• Identity column enhancements
• Sequence objects
• Scalar fullselect
• Multiple DISTINCT clauses
• INSERT within SELECT statement
• Expressions in GROUP BY
Notes:
Applications for DB2 UDB for z/OS are very often developed on other platforms and then
ported. In doing so, care must be exercised because of the restrictions on the number of
characters that can be used for object names. For example, table names and column
names are restricted in DB2 V7 to 18 characters. DB2 V8 support for long names goes a
long way toward making it easy to port applications from other platforms. Long names
require DB2 V8 new-function mode to be active; during the catalog migration process,
DB2 ALTERs the existing definitions to the new ones.
Refer to Table 3-1 for a complete list of changed limits.
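As an illustration (the table and column names below are hypothetical, not from the
course materials), identifiers such as these exceed the V7 18-character limit but are
valid once V8 new-function mode is active:

-- Hypothetical names; valid in DB2 V8 new-function mode, where table names
-- can be up to 128 bytes and column names up to 30 bytes
CREATE TABLE CUSTOMER_RELATIONSHIP_HISTORY
  (CUSTOMER_ACCOUNT_NUMBER    INTEGER   NOT NULL,
   LAST_INTERACTION_TIMESTAMP TIMESTAMP NOT NULL);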
Maximum number of partitions in a partitioned table space or partitioned index:
  V7: 64 for table spaces that are not defined with LARGE or a DSSIZE greater than 2G
      254 for table spaces that are defined with LARGE or a DSSIZE greater than 2G
  V8: 64 for table spaces that are not defined with LARGE or a DSSIZE greater than 2G
      Up to 4096 otherwise, depending on what is specified for DSSIZE or LARGE
      and on the page size

Maximum size of a partition (table space or index):
  V7: For table spaces that are not defined with LARGE or a DSSIZE greater than 2G:
        4 gigabytes, for 1 to 16 partitions
        2 gigabytes, for 17 to 32 partitions
        1 gigabyte, for 33 to 64 partitions
      For table spaces that are defined with LARGE:
        4 gigabytes, for 1 to 254 partitions
      For table spaces that are defined with a DSSIZE greater than 2G:
        64 gigabytes, for 1 to 254 partitions
  V8: For table spaces that are not defined with LARGE or a DSSIZE greater than 2G:
        4 gigabytes, for 1 to 16 partitions
        2 gigabytes, for 17 to 32 partitions
        1 gigabyte, for 33 to 64 partitions
      For table spaces that are defined with LARGE:
        4 gigabytes, for 1 to 4096 partitions
      For table spaces that are defined with a DSSIZE greater than 2G:
        64 gigabytes, depending on the page size (1 to 256 partitions for 4 KB,
        1 to 512 for 8 KB, 1 to 1024 for 16 KB, and 1 to 2048 for 32 KB)

Maximum length of an index key:
  V7: 255 bytes less the number of key columns that allow nulls
  V8: Partitioning index: 255 - n
      Non-partitioning index that is padded: 2000 - n
      Non-partitioning index that is not padded: 2000 - n - 2m
      (where n is the number of columns in the key that allow nulls and
      m is the number of varying-length columns in the key)
Notes:
DB2 V7 introduced static scrollable cursors. When a static scrollable cursor is opened, the
qualifying rows are copied to a declared temporary table automatically created by DB2 in a
TEMP database defined by you. User-declared temporary tables and system-declared
temporary tables share the set of table spaces that are defined in the TEMP database.
When a static scrollable cursor is opened, DB2 stores the RID of each qualifying base row
in the result table. The result table has a fixed number of rows. Scrolling is performed on
the temporary table in both the forward direction and backward direction. DB2 deletes the
result table when the cursor is closed.
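The V7 behavior just described can be reviewed with a sketch such as the following
(the cursor, table, and host variable names are hypothetical):

-- Hypothetical names: a V7-style static scrollable cursor;
-- DB2 materializes the qualifying rows into a DTT at OPEN time
EXEC SQL
  DECLARE C1 SENSITIVE STATIC SCROLL CURSOR FOR
    SELECT EMPNO, SALARY FROM EMP;

EXEC SQL OPEN C1;      -- result table (RIDs of qualifying rows) built here
EXEC SQL FETCH SENSITIVE LAST FROM C1 INTO :EMPNO, :SALARY;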
Sensitive and Insensitive Cursors - V7 Review
Notes:
DB2 V7 introduced keywords to control the positioning within a cursor, and whether the
data in the result set is maintained with the actual rows in the base table.
In V7, you can declare a scrollable cursor in one of the following two ways:
• INSENSITIVE:
This causes the scrollable cursor to be read-only. On the FETCH statement, you can
specify only INSENSITIVE, which is the default when the cursor is declared INSENSITIVE. A
FETCH INSENSITIVE request retrieves the row data from the result table. The cursor is
not sensitive to the updates, deletes, and inserts made to the base table.
• SENSITIVE STATIC:
This causes the scrollable cursor to be updateable. You can specify either of the
following on the FETCH statement:
- INSENSITIVE:
The application can see the changes it has made itself, using positioned updates
and deletes through the cursor. These changes are visible to the application
because DB2 updates both the base table and the result table when a positioned
update or delete is issued by the application. However, updates and deletes to the
base table made outside the cursor are not visible. The cursor is also not sensitive
to inserts.
- SENSITIVE:
This is the default if the cursor is declared SENSITIVE STATIC. A FETCH
SENSITIVE request retrieves the row data from the result table (stored in a declared
temporary table). However, as part of a SENSITIVE FETCH, the row is verified
against the underlying table to make sure it still exists and qualifies. Using FETCH
SENSITIVE, the application can see the changes it has made using positioned and
searched updates and deletes. The application can also see the committed updates
and deletes made by other applications or outside the cursor. However, the cursor is
not sensitive to inserts.
Note that all V7 scrollable cursors use a declared temporary table (DTT) to store the
result table of the cursor. The DTT is populated at OPEN CURSOR time.
New in V8 - Dynamic Scrollable Cursors
Scrollable cursor that provides access to the base table rather than a
DTT -- allows visibility of updates and inserts done by you or other users
Defaults to single row FETCH, so DDF applications should use:
Multi-row FETCH
Positioned UPDATE/DELETE for multi-row FETCH
Notes:
DB2 V8 introduces dynamic scrollable cursors. A dynamic scrollable cursor does not
materialize the result table at any time. Instead, it scrolls directly on the base table and is
therefore sensitive to all committed inserts, updates, and deletes. Dynamic scrollable
cursors are supported for the index scan and table space scan access paths. DPSIs also
support dynamic scrollable cursors.
Dynamic scrollable cursors can be used with row-level operations or rowset-level
operations. Rowset-level operations are discussed in Figure 3-17 "Multi-Row FETCH and
INSERT" on page 3-32.
DECLARE cursor-name
  [ NO SCROLL |
    [ ASENSITIVE | INSENSITIVE | SENSITIVE {STATIC | DYNAMIC} ] SCROLL ]
Notes:
This visual shows the new syntax for declaring a cursor in DB2 V8.
NO SCROLL, which is the default, indicates that the cursor is non-scrollable.
SCROLL specifies that the cursor is scrollable. For a scrollable cursor, whether the cursor
has sensitivity to inserts, updates, or deletes depends on the cursor sensitivity option in
effect for the cursor. The sensitivity options include the following possibilities:
• ASENSITIVE, introduced in DB2 V8 (this is the default for a scrollable cursor)
• INSENSITIVE, as in DB2 V7
• SENSITIVE STATIC, as in DB2 V7
• SENSITIVE DYNAMIC for dynamic scrollable cursors, introduced in DB2 V8
Declare Cursor - New Attributes
SENSITIVE DYNAMIC
Specifies that size of result table is not fixed at OPEN cursor time
Cursor has complete visibility to changes
All committed inserts, updates, deletes by other application processes
All positioned updates and deletes within cursor
All inserts, updates, deletes by same application processes, but outside cursor
FETCH executed against base table since no temporary result table
created
ASENSITIVE
DB2 determines sensitivity of cursor
If read-only...
Cursor is INSENSITIVE if SELECT statement does not allow it to be SENSITIVE
(UNION, UNION ALL, FOR FETCH ONLY, FOR READ ONLY)
It behaves as an insensitive cursor
If not read-only, SENSITIVE DYNAMIC is used for maximum sensitivity
Mainly for Client applications that do not care whether or not the server
supports the sensitivity or scrollability
Notes:
SENSITIVE DYNAMIC specifies that the result table of the cursor is dynamic, in that the
size of the result table may change after the cursor is opened as rows are inserted into or
deleted from the underlying table, and the order of the rows may change. Rows inserted,
updated, or deleted with INSERT, UPDATE, and DELETE statements executed by the
same application process are immediately visible. Rows inserted, updated, or deleted with
INSERT, UPDATE and DELETE statements executed by other application processes are
visible once committed.
Because the FETCH statements are executed against the base table, no temporary result
table is created. The SELECT statement of a cursor that is defined as SENSITIVE
DYNAMIC cannot contain an INSERT statement.
For client applications that do not care whether or not the server supports the sensitivity or
scrollability, you can use the ASENSITIVE option, to let DB2 determine whether the cursor
behaves as SENSITIVE DYNAMIC or INSENSITIVE depending on the complexity
(updateability) of the associated SELECT statement.
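As a sketch of this option (cursor, table, and column names are hypothetical),
an ASENSITIVE declaration lets the server decide:

-- Hypothetical names: DB2 determines the effective sensitivity
EXEC SQL
  DECLARE C1 ASENSITIVE SCROLL CURSOR FOR
    SELECT EMPNO, SALARY FROM EMP
    FOR UPDATE OF SALARY;
-- Not read-only, so DB2 uses the maximum sensitivity: SENSITIVE DYNAMIC.
-- With FOR READ ONLY instead, the cursor would behave as INSENSITIVE.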
FETCH Syntax
FETCH [ INSENSITIVE | SENSITIVE ]
  [ NEXT | PRIOR | FIRST | LAST | CURRENT | BEFORE | AFTER |
    ABSOLUTE {host-variable | integer-constant} |
    RELATIVE {host-variable | integer-constant} ]
  [FROM] cursor-name
  single-fetch-clause

single-fetch-clause:
  INTO host-variable,...
  INTO DESCRIPTOR descriptor-name

(Callout on the visual: No new syntax. Cannot specify INSENSITIVE - SQLCODE -244.)
Notes:
There is no new syntax for the FETCH statement in DB2 V8 to support dynamic scrollable
cursors.
However, there is a restriction in the FETCH orientation syntax. Any violation of the
restriction results in SQLCODE -244, as explained in the next visual.
Implications on FETCH
INSENSITIVE not allowed with FETCH statement (SQLCODE -244) if
The associated cursor is declared as SENSITIVE DYNAMIC SCROLL
The cursor is declared ASENSITIVE and DB2 chooses the maximum allowable
sensitivity of SENSITIVE DYNAMIC SCROLL
Notes:
SQLCODE -244 indicates that the sensitivity option specified on FETCH conflicts with the
sensitivity option in effect for cursor cursor-name. If a cursor is declared INSENSITIVE, the
FETCH statement can only specify INSENSITIVE or nothing. If a cursor is declared
SENSITIVE, the FETCH statement can specify INSENSITIVE, SENSITIVE, or nothing.
This is as in DB2 V7.
The restriction in DB2 V8 is that the keyword INSENSITIVE is not allowed with the FETCH
statements if the associated cursor is either:
• Declared as SENSITIVE DYNAMIC SCROLL, or
• Declared ASENSITIVE, and DB2 selects the maximum allowable sensitivity of
SENSITIVE DYNAMIC SCROLL for the associated SELECT statement.
Since there is no temporary result table, there are no holes except in the case of the
current row. An SQLCODE +231 is returned if FETCH CURRENT or FETCH RELATIVE +0
is requested, but the row on which the cursor is positioned has been deleted or the row has
been updated so that it no longer meets the selection criteria. This situation can only occur
when using ISOLATION(CS) and CURRENTDATA(NO), which allows the row to be
retrieved without taking a lock.
The order is always maintained. If a column for an ORDER BY clause is updated, then the
next FETCH statement behaves as if the updated row were deleted and re-inserted into the
result table at its correct location. At the time of a positioned update, the cursor is
positioned before the next row of the original location and there is no current row, making
the row appear to have moved. If the row is deleted and the next cursor operation is
FETCH CURRENT, a warning SQLCODE +231 is raised.
Notes:
Dynamic scrollable cursors are useful when it is important for the application to see
updated rows as well as newly inserted rows. The purpose of using SENSITIVE DYNAMIC
is defeated if the isolation is RR or RS, as updates of the table by other users are severely
restricted. Therefore, isolation CS is recommended for maximum concurrency.
Locks may be held on the current row. Remember that there is no temporary result table
and the base table is accessed directly by the FETCH statement.
If isolation UR is specified as a bind option and the associated SELECT statement contains
a FOR UPDATE OF clause, DB2 promotes UR to CS. If WITH UR is specified in the
SELECT statement along with the FOR UPDATE OF clause, DB2 returns SQLSTATE
42801, SQLCODE -173.
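This conflict can be sketched as follows (cursor, table, and column names are
hypothetical):

-- Hypothetical names: WITH UR conflicts with FOR UPDATE OF
EXEC SQL
  DECLARE C1 SENSITIVE DYNAMIC SCROLL CURSOR FOR
    SELECT EMPNO, SALARY FROM EMP
    FOR UPDATE OF SALARY
    WITH UR;      -- rejected: SQLCODE -173, SQLSTATE 42801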
As in previous versions, DSNZPARM RELCURHL specifies whether DB2 should, at
commit time, release a data page or row lock on which a cursor that is defined WITH HOLD
is positioned. This lock is not necessary for maintaining cursor position. If you choose YES,
the default, DB2 releases this data page or row lock after a COMMIT is issued for cursors
that are defined WITH HOLD. Specifying YES can improve concurrency. If you choose NO,
DB2 holds the data page or row lock for WITH HOLD cursors after the COMMIT. This
option is provided so that existing applications that rely on this lock can continue to work
correctly.
Notes:
DB2 cannot use optimistic concurrency control for dynamic scrollable cursors, as it is
working against the real table. For updateable dynamic scrollable cursors that use
ISOLATION(CS), DB2 holds row or page lock on the current row in the base table (DB2
does not use a temporary global table when using dynamic scrollable cursors). The most
recently fetched row or page from the base table is locked to maintain data integrity for a
positioned update or delete.
Notes:
Dynamic scrollable cursors are supported with stored procedures. The stored procedure
itself can update through a dynamic scrollable cursor. However, the program calling the
stored procedure is restricted from updating using the allocated cursor. (This is the case for
any type of cursor.)
Scalar functions and arithmetic expressions in the SELECT list are re-evaluated at every
FETCH.
Column functions (AVG, MIN, MAX, and so on) are calculated once, at open cursor time.
Their results may not be meaningful, because the size of the result table can change.
The use of non-deterministic functions (built-in or UDF) in the WHERE clause can cause
misleading results, because the result of the function can vary from one FETCH to a
subsequent FETCH of the same row.
Cursors requiring the use of a workfile cannot be declared SENSITIVE DYNAMIC. For
example, you cannot associate the following SQL statement with a dynamic scrollable
cursor, as the entire result table must be materialized to a workfile:
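The visual's example statement is not reproduced in these notes. A statement of the
following kind (hypothetical names) illustrates the restriction, since the sort
needed for the ORDER BY materializes the result table to a workfile:

-- Hypothetical names: if no index supports the ORDER BY, DB2 must sort
-- into a workfile, so the cursor cannot be declared SENSITIVE DYNAMIC
SELECT EMPNO, LASTNAME, SALARY
  FROM EMP
 ORDER BY SALARY DESC;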
DRDA Considerations
DB2 Linux, UNIX, and Windows V8.1 clients support dynamic
scrollable cursors with FP4
Only via the ODBC interface
Only when calling DB2 for z/OS V8
Dynamic scrollable cursors do not support the DRDA limited block
fetch protocol
To achieve blocking, use multi-row fetch operations. This is currently
supported between:
DB2 for z/OS V8 systems
DB2 ODBC distributed clients using dynamic scrollable cursors and
DB2 for z/OS V8
Notes:
Dynamic scrollable cursors are also allowed in a DRDA environment. They are fully
supported in a distributed environment between DB2 for z/OS V8 systems.
DB2 V8.1 clients on Linux, UNIX, and Windows platforms have also implemented dynamic
scrollable cursors via ODBC (or DB2 CLI) with FixPak 4 when communicating with a DB2
for z/OS V8 system. You can make a statement a dynamic scrollable cursor by calling the
SQLSetStmtAttr() function with the SQL_ATTR_CURSOR_TYPE statement attribute set to
SQL_CURSOR_DYNAMIC.
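In CLI terms, this is a one-line attribute change before preparing the statement (a
fragment, assuming hStmt is an already allocated statement handle):

/* Sketch: request a dynamic scrollable cursor from DB2 CLI/ODBC;
   hStmt is assumed to be an allocated statement handle */
SQLSetStmtAttr(hStmt, SQL_ATTR_CURSOR_TYPE,
               (SQLPOINTER) SQL_CURSOR_DYNAMIC, 0);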
When using dynamic scrollable cursors in a distributed environment using DRDA, limited
block fetch is not used, and one row is fetched at a time from the remote server. DB2
cannot use block fetch as it has to evaluate the predicates during each FETCH operation
because the base table may have changed, as dictated by the semantics of the “dynamic”
keyword.
You may want to consider declaring your dynamic scrollable cursor WITH ROWSET
POSITIONING to “simulate” a limited block fetch effect. When using rowset cursors, DRDA
always sends a rowset across the wire, even for dynamic scrollable cursors.
At the time of writing of this publication, rowset cursors are supported:
• Between DB2 for z/OS V8 systems.
• Between DB2 ODBC/CLI clients on distributed platforms that use dynamic scrollable
cursors to retrieve data on a DB2 for z/OS V8. Note that dynamic scrollable cursors on
DB2 distributed clients always use multi-row fetch (when obtaining data from a DB2 for
z/OS V8 system).
Notes:
This visual shows some SQL statements using dynamic scrollable cursors that you can
embed in your programs.
Dynamic Scrollable Cursor Example (2 of 2)
Example (Continued)
Notes:
The visual shows some SQL statements using dynamic scrollable cursors that you can
embed in your programs.
Note that when you declare the cursor as dynamic scrollable and SQLCODE +100 is
returned, you can continue to FETCH rows. However, once the end of the result table is
reached, FETCH NEXT always returns +100; that is, once you get SQLCODE +100, do not
expect to see rows inserted after the last successful fetch prior to the +100, and
similarly for FETCH PRIOR. Once you get a +100, if you want to continue fetching, you
need to reposition via FETCH FIRST, FETCH LAST, and so on.
Notes:
The visual summarizes the characteristics of different types of cursors. Only the last row is
new in DB2 V8.
Benefits
Enhances usability and power of SQL
Performance is improved by eliminating multiple trips between
application and database engine; for distributed access,
reduced network traffic
Notes:
DB2 V8 introduces support for multiple row processing for both the FETCH and INSERT
statements. In prior versions of DB2, an application has to execute multiple SQL FETCH
statements, one for each row to be retrieved from a table, and multiple SQL INSERT
statements, one for each row to be inserted into a table.
In V8, a single FETCH statement can be used to retrieve multiple rows of data from the
result table of a query as a rowset. A rowset is a group of rows that are grouped together
and operated on as a set. For example, you may fetch the next rowset. Fetching multiple
rows of data can be done with both scrollable and non-scrollable cursors. New syntax on
the FETCH statement allows the specification of the number of rows to be returned in the
rowset for each fetch. The maximum rowset size is 32767.
The multiple-row FETCH statement can only be embedded in an application program. It is
an executable statement that cannot be dynamically prepared. This only applies to the
FETCH statement. You can of course dynamically prepare a statement with rowset
positioning.
Prior to DB2 V8, an SQL INSERT statement using the VALUES clause can insert one row
of data into a table or view. The INSERT INTO ... SELECT FROM ... allows zero or more
rows to be inserted into a table or view named in the INTO clause, by retrieving qualified
rows from the table or view named with the FROM clause. With the multi-row INSERT
enhancement in DB2 V8, an INSERT statement using the VALUES clause can insert one
or more rows into a table or view with a single SQL statement.
There are two forms of multiple-row INSERT: one static, and one dynamic form. These DB2
V8 enhancements help to lower the statement execution cost, and in a distributed
environment, also the network cost. Multiple trips between the application and the
database are no longer required, and there are fewer send and receive messages over the
network.
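The static form can be sketched as follows (the table, column, and host variable
array names are hypothetical):

-- Hypothetical names: insert 10 rows with one statement, using host
-- variable arrays; ATOMIC backs out all rows if any insert fails
EXEC SQL
  INSERT INTO EMP (EMPNO, LASTNAME)
    VALUES (:HVA-EMPNO, :HVA-LASTNAME)
    FOR 10 ROWS
    ATOMIC;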
Multi-Row FETCH
DECLARE CURSOR statement
Host variable arrays
FETCH statement
Positioned UPDATE statement
Positioned DELETE statement
Notes:
Multi-row FETCH requires a proper understanding of how to set up the environment in
terms of declaring the cursor, setting up the host variable arrays, and performing the
FETCH operations. Using multi-row FETCH not only has an impact on how you handle
FETCH operations in your application, but also has repercussions on how the positioned
UPDATE and positioned DELETE statements are handled. We discuss the details in the
following visuals.
DECLARE CURSOR Syntax
DECLARE cursor-name
  [ NO SCROLL |
    [ ASENSITIVE | INSENSITIVE | SENSITIVE {STATIC | DYNAMIC} ] SCROLL ]
  CURSOR [WITH HOLD] [WITH RETURN] [rowset-positioning]
  FOR {select-statement | statement-name}                              (1)

rowset-positioning:
  [ WITHOUT ROWSET POSITIONING | WITH ROWSET POSITIONING ]
Notes:
The visual shows the changes to the DECLARE CURSOR syntax. The row-set positioning
block specifies whether multiple rows of data can be accessed as a rowset on a single
FETCH statement for this cursor. The default is WITHOUT ROWSET POSITIONING.
• WITHOUT ROWSET POSITIONING:
Specifies that the cursor can only be used to return a single row for each FETCH
statement, and that the FOR n ROWS clause cannot be specified on a FETCH
statement for this cursor. Doing so results in SQLCODE -249 (SQLSTATE 24523).
• WITH ROWSET POSITIONING:
Specifies that this cursor can be used to return either a single row or multiple rows, as a
rowset, with a single FETCH statement. The FOR n ROWS clause of the FETCH
statement controls how many rows are returned on each FETCH statement. Cursors
declared WITH ROWSET POSITIONING may also be used with row positioned FETCH
statements.
Important: Using WITHOUT ROWSET POSITIONING does not mean that there is no
DRDA blocking when using distributed access. It only indicates that one row at a time is
returned to the application host variables.
DECLARE CURSOR Example
Declare C1 as the cursor of a query to retrieve a rowset
from table EMP
EXEC SQL
DECLARE C1 CURSOR
WITH ROWSET POSITIONING
FOR SELECT * FROM EMP;
Notes:
This visual shows an example of how to declare the cursor for a query to retrieve a rowset
from the table EMP. Note that you do not specify the size of the rowset on the DECLARE
CURSOR statement. That is done at FETCH time.
FETCH Syntax
FETCH [ INSENSITIVE | SENSITIVE ] [fetch-orientation] [FROM] cursor-name
  { single-row-fetch | multiple-row-fetch }

fetch-orientation:
  BEFORE | AFTER | row-positioned | rowset-positioned

rowset-positioned:
  NEXT ROWSET | PRIOR ROWSET | FIRST ROWSET | LAST ROWSET | CURRENT ROWSET |
  ROWSET STARTING AT {ABSOLUTE | RELATIVE} {host-variable | integer-constant}

multiple-row-fetch:
  FOR {host-variable | integer-constant} ROWS INTO host-variable-array,...
Notes:
Two new syntax blocks, multiple-row-fetch and rowset-positioned, have been introduced
for multi-row FETCH.
The rowset-positioned block is similar to the existing row-positioned block in DB2 V7. The
rowset-positioned clause specifies positioning of the cursor with rowset-positioned fetch
orientations NEXT ROWSET, PRIOR ROWSET, FIRST ROWSET, LAST ROWSET,
CURRENT ROWSET, ROWSET STARTING AT ABSOLUTE, ROWSET STARTING AT
RELATIVE, just as the existing row-positioned clause specifies positioning of the cursor
with row-positioned fetch orientations NEXT, PRIOR, FIRST, LAST, CURRENT,
ABSOLUTE, and RELATIVE.
The multiple-row-fetch block is similar to the existing single-row-fetch block in DB2 V7,
except that there is an additional clause FOR n ROWS, where n can either be an integer
constant or a host variable. When using a host variable, you can vary the number of rows
fetched in a rowset for each FETCH, if needed.
FETCH Examples
EXAMPLE 1:
Fetch the previous rowset and have the cursor positioned on that rowset
EXEC SQL
FETCH PRIOR ROWSET FROM C1 FOR 3 ROWS INTO...
-- OR --
EXEC SQL
FETCH ROWSET
STARTING AT RELATIVE -3 FROM C1 FOR 3 ROWS INTO...
EXAMPLE 2:
Fetch 3 rows starting with row 20 regardless of the current position of the
cursor
EXEC SQL
FETCH ROWSET STARTING AT ABSOLUTE 20
FROM C1 FOR 3 ROWS INTO...
Notes:
This visual shows some simple examples of the FETCH statement using rowsets.
Given that the cursor C1 is defined as:
DECLARE C1 CURSOR WITH ROWSET POSITIONING FOR SELECT * FROM EMP
• Fetch 3 rows starting with row 20 regardless of the current position of the cursor, and
cause the cursor to be positioned on that rowset at the completion of the fetch:
FETCH ROWSET STARTING AT ABSOLUTE 20 FROM C1 FOR 3 ROWS INTO :HVA1,:HVA2, ...
• Fetch the previous rowset, and have the cursor positioned on that rowset:
FETCH PRIOR ROWSET FROM C1 FOR 3 ROWS INTO :HVA1,:HVA2, ... or
FETCH ROWSET STARTING AT RELATIVE -3 FROM C1 FOR 3 ROWS INTO :HVA1,:HVA2, ...
• Fetch the first 3 rows and leave the cursor positioned on that rowset at the completion
of the fetch:
FETCH FIRST ROWSET FROM C1 FOR 3 ROWS INTO :HVA1,:HVA2, ...
In the foregoing example:
• The FOR n ROWS clause specifies that with a single FETCH statement in the
application program, DB2 fetches n rows. The value of ‘n’ determines the ROWSET
size.
• The rowset is the group of rows that are returned by the single FETCH statement from
the result table of the query.
• :HVA1 and :HVA2 following the INTO clause are the names of the host variable arrays.
For more information on host variable arrays, see Figure 3-23 "Host Variable Arrays" on
page 3-41.
Single-row and multiple-row fetches can be mixed for a rowset cursor. If FOR n ROWS is
NOT specified and the cursor is declared for rowset positioning, the rowset size is the
same as in the previous rowset fetch for this cursor (or, if the previous fetch was a
FETCH BEFORE or FETCH AFTER, the same as in the rowset fetch before that). Otherwise the
rowset size is 1.
Tip: To avoid any unexpected behavior, we recommend that you always code the FOR n
ROWS clause.
Note that you can also specify the FETCH FIRST n ROWS ONLY clause in the SELECT
statement of the cursor. However, this clause has a very different meaning:
• In the SELECT statement, the FETCH FIRST n ROWS ONLY clause controls the
maximum number of rows that can be accessed with the cursor. When a FETCH
statement attempts to retrieve a row beyond the number specified in the FETCH FIRST
n ROWS ONLY clause of the SELECT statement, an end of data condition (SQLCODE
+100) occurs. In other words, it controls the total number of the rows of the result set.
• In a FETCH statement, the FOR n ROWS clause controls the number of rows that are
returned for a single FETCH statement.
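The two clauses can be contrasted in one sketch (cursor, table, column, and host
variable array names are hypothetical):

-- Hypothetical names: at most 100 rows in the entire result table,
-- retrieved 10 rows at a time
EXEC SQL
  DECLARE C1 CURSOR WITH ROWSET POSITIONING FOR
    SELECT EMPNO FROM EMP
    FETCH FIRST 100 ROWS ONLY;

EXEC SQL
  FETCH NEXT ROWSET FROM C1 FOR 10 ROWS INTO :HVA-EMPNO;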
Host Variable Arrays
A host variable array is an array in which each element of the
array contains a value for the same column; column-wise
binding
Can only be referenced in multi-row FETCH and INSERT
Host variable arrays are supported in:
COBOL
PL/I
C and C++
Assembler support is limited to cases where USING
DESCRIPTOR is allowed.
Notes:
When using ROWSET operations, each FETCH retrieves column values for multiple rows.
It is therefore necessary to set up host variable arrays to receive multiple values for each
column. A host variable array is an array in which each element of the array contains a
value for the same column. The number of host variable arrays that you must specify in a
multi-row fetch or multi-row insert statement is the same as for single-row select and insert
statements; that is, one host variable for each column in the table that you want to retrieve
or insert. Host variable arrays can only be referenced in multi-row FETCH and INSERT
operations.
To handle nulls, you also have to have an indicator array and you specify the name of this
array following the host variable array. In the following example, COL1 is the host variable
array and COL1IND is its indicator array. Assuming that COL1 has 10 elements (for
fetching a single column of data for multiple rows of data), COL1IND must also have 10
entries:
EXEC SQL
FETCH C1 FOR 5 ROWS
INTO :COL1:COL1IND
END_EXEC.
Host variable arrays are supported in COBOL, PL/I, C, and C++. Assembler support is
limited to cases where USING DESCRIPTOR is allowed. (When using the USING
DESCRIPTOR clause, you use the SQLDA instead.) The DB2 precompiler does not
recognize the declaration of host variable arrays in Assembler programs. The programmer
is responsible for allocating storage areas correctly. Multi-row FETCH is not supported in
REXX, FORTRAN, and SQL procedure applications.
Our JDBC drivers currently do not support multi-row fetch and insert to pass data back and
forth between the Java application and DB2. (They do use it under the covers in some
cases, but that is transparent to the application.) However, we can expect to see support
for this in the not-too-distant future.
At the time of writing of this publication (DB2 for Linux, UNIX, and Windows Version 8.1
FixPak 4), there is no support for either multi-row fetch or insert with embedded SQL.
The ODBC/CLI driver on the distributed platforms has limited support for multi-row fetch
and insert: the DB2 LUW ODBC driver supports "array fetch", and most of the DB2 for z/OS
multi-row INSERT function is supported.
ODBC on z/OS currently has no such support, but that may change in the foreseeable future.
COBOL Example
Declare cursor C1 and fetch 10 rows using a multi-row FETCH statement
01 OUTPUT-VARS.
05 NAME OCCURS 10 TIMES.
49 NAME-LEN PIC S9(4) USAGE COMP.
49 NAME-TEXT PIC X(40).
05 SERIAL-NUMBER PIC S9(9) USAGE COMP OCCURS 10 TIMES.
01 IND-VARS.
10 INDSTRUC1 PIC S9(4) USAGE COMP OCCURS 10 TIMES.
10 INDSTRUC2 PIC S9(4) USAGE COMP OCCURS 10 TIMES.
PROCEDURE DIVISION.
EXEC SQL
DECLARE C1 SCROLL CURSOR WITH ROWSET POSITIONING FOR
SELECT NAME, SERIAL# FROM EMPLOYEE
END-EXEC.
EXEC SQL OPEN C1 END-EXEC.
EXEC SQL
FETCH FIRST ROWSET FROM C1 FOR 10 ROWS
INTO :NAME:INDSTRUC1,:SERIAL-NUMBER:INDSTRUC2
END-EXEC.
Notes:
The COBOL example on the visual demonstrates how to retrieve the first rowset consisting
of 10 rows using a single FETCH statement.
The program retrieves two columns, NAME defined as VARCHAR(40) and SERIAL#
defined as INTEGER, from the EMPLOYEE table. Both columns are nullable, so there is a
need for indicator variable arrays, as well as normal arrays, for both column values.
Note that the cursor must be declared as a scrollable cursor. This is because we use the
FETCH FIRST ROWSET clause. FETCH FIRST ROWSET can only be used if the cursor is
declared scrollable.
Note also how the host variable arrays and indicator variable arrays are set up. You can
use DCLGEN to set up the host variables and indicator variables. However, you must edit
the DCLGEN output to include the OCCURS clause appropriately to set up the arrays.
PL/I Example
You can retrieve 10 rows from table DEPARTMENT with:
Notes:
The PL/I example on the visual demonstrates the use of a (scrollable) cursor to retrieve the
first rowset consisting of 10 rows using a single FETCH statement.
The program retrieves four columns, DEPTNO defined as CHAR(3), DEPTNAME defined
as VARCHAR(29), MGRNO defined as CHAR(6), and ADMRDEPT defined as CHAR(3),
from the DEPARTMENT table.
Note how the host variable arrays and indicator variable arrays are set up. You can use
DCLGEN to have the host variables and indicator variables set up. However, you must
edit the DCLGEN output appropriately to set up the arrays.
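The PL/I code from the visual is not reproduced in the text. The following is a sketch of what it might look like, based on the column definitions given above; the variable names (DEPTNO, DEPTNAME, IND1, and so on) and the cursor name C1 are assumptions:

```pli
 /* Host variable arrays: one element per row of the rowset */
 DCL DEPTNO(10)   CHAR(3);
 DCL DEPTNAME(10) CHAR(29) VARYING;
 DCL MGRNO(10)    CHAR(6);
 DCL ADMRDEPT(10) CHAR(3);
 /* Indicator variable arrays, one per column */
 DCL IND1(10) FIXED BIN(15);
 DCL IND2(10) FIXED BIN(15);
 DCL IND3(10) FIXED BIN(15);
 DCL IND4(10) FIXED BIN(15);

 EXEC SQL
   DECLARE C1 SCROLL CURSOR WITH ROWSET POSITIONING FOR
   SELECT DEPTNO, DEPTNAME, MGRNO, ADMRDEPT
   FROM DEPARTMENT;
 EXEC SQL OPEN C1;
 EXEC SQL
   FETCH FIRST ROWSET FROM C1 FOR 10 ROWS
   INTO :DEPTNO:IND1, :DEPTNAME:IND2, :MGRNO:IND3, :ADMRDEPT:IND4;
```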
C/C++ Example
Declare an integer and varying character array to hold columns retrieved
from a multi-row fetch statement
Notes:
The C/C++ example in the figure above demonstrates the use of a cursor to retrieve the
first rowset consisting of 10 rows using a single FETCH statement.
The program retrieves two columns, NAME defined as VARCHAR(18), and SERIAL_NO
defined as INTEGER from the EMPLOYEE table.
Note that we added a dimension parameter [10] for both host variable arrays. Also note
that a VARCHAR column uses a C structure format.
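The C code from the visual is not reproduced in the text. A sketch of what it might look like, following the column definitions given above (the variable names and cursor name C1 are assumptions):

```c
struct {
    short len;            /* current length of the value       */
    char  data[18];       /* VARCHAR(18) data                  */
} name[10];               /* host variable array for NAME      */

long  serial_no[10];      /* host variable array for SERIAL_NO */
short ind1[10], ind2[10]; /* indicator variable arrays         */

EXEC SQL
  DECLARE C1 CURSOR WITH ROWSET POSITIONING FOR
  SELECT NAME, SERIAL_NO FROM EMPLOYEE;

EXEC SQL OPEN C1;

EXEC SQL
  FETCH FIRST ROWSET FROM C1 FOR 10 ROWS
  INTO :name:ind1, :serial_no:ind2;
```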
Using Multi-Row
FETCH with Scrollable Cursors
As holes may occur, ensure at least one indicator variable array
is defined for a column
Even if no nullable columns exist, add an indicator variable array for
at least one column
If nullable columns exist, all indicator variable arrays provided are
updated if a hole is found
Value of -3 indicates hole
SQLCODE +222 is also returned
Notes:
A new value of -3 for an indicator variable indicates that values were not returned for the
row because a hole was detected. The value of -3 is only used for multiple-row FETCH
statements. You need to provide an indicator variable array for at least one column, even if
there are no nullable columns in the result table. If multiple indicator variable arrays are
provided, then the indication of the hole is reflected in each indicator array.
The purpose of an indicator variable is to indicate when the associated value is the null
value, or that values were not returned because a hole was detected. The value is:
• -1 if the value selected was the null value, as in prior versions.
• -2 if the null value was returned due to a numeric conversion or arithmetic expression
error that occurred in the SELECT list of an outer SELECT statement, as in prior
versions.
• -3 if the null value was returned because a hole was detected for the row on a multiple
row FETCH, and values were not returned for the row. In cases where -3 is set to
indicate a hole, SQLSTATE 02502, SQLCODE +222, is also returned for that row.
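Building on the earlier COBOL example, the indicator checks described above might be coded as follows. This is a sketch; the ROWS-FETCHED counter (typically set from SQLERRD(3)) and the PROCESS-ROW and PROCESS-NULL-NAME paragraphs are assumptions:

```cobol
PERFORM VARYING I FROM 1 BY 1 UNTIL I > ROWS-FETCHED
    EVALUATE INDSTRUC1(I)
        WHEN -3
            CONTINUE             *> hole: no values returned for this row
        WHEN -1
            PERFORM PROCESS-NULL-NAME
        WHEN OTHER
            PERFORM PROCESS-ROW
    END-EVALUATE
END-PERFORM
```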
If no indicator variable arrays are provided for a multiple-row FETCH statement, and a hole
is detected, an error is returned (SQLSTATE 24519, SQLCODE -247).
In the example on the visual, CUSTNO and CUST_TYPE are NOT NULL columns and the
ADDRESS_NULLABLE column is nullable. IND-VAR is the indicator array for
ADDRESS_NULLABLE. For the second row in the rowset (CUSTNO = 2000), the address
is unknown, the ADDRESS_NULLABLE column is not filled in, and the value of the
IND-VAR entry is -1. For the third row in the rowset, the value of the IND-VAR entry is
-3, indicating a hole; no values are returned for any host variables. If the update or delete
hole is part of the last rowset that is returned, a +222 SQLCODE is returned (not
SQLCODE +100).
Rowsets
A ROWSET is a group of rows from the result table of a query, which are
returned by a single FETCH statement
(or inserted by a single (multi-row) INSERT statement)
The program controls how many rows are returned in a rowset
(it controls the size of the rowset)
Can be specified on the FETCH ... FOR n ROWS statement (n is the rowset size and
can be up to 32767)
Each group of rows is operated on as a rowset
Ability to intertwine single row and multiple row fetches for a multi-fetch
cursor
Notes:
We have seen some examples of multi-row FETCH operations. A formal definition of a
rowset is a group of rows for the result table of a query that are returned by a single FETCH
statement. The maximum size of the rowset is 32767 and is controlled by the FETCH
statement (FOR n ROWS clause). Each group of rows retrieved by a FETCH statement is
operated on as a rowset. It is possible to have FETCH statements retrieving single rows
and multi-row FETCH statements intertwined in a program.
When the cursor is opened, the associated SELECT statement is evaluated. How the
FETCH statements operate on the result table, and cursor positions on the result table,
depend on how the FETCH statement is issued.
The example on the visual shows a FETCH statement. The STARTING AT ABSOLUTE 10
clause causes retrieval of rows starting at row 10. The FOR 6 ROWS clause causes rows
10 to 15 to be retrieved, and the cursor is positioned on all six of these rows.
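The FETCH statement described above might look as follows in COBOL. This is a sketch; the cursor name C1 and the host variable and indicator array names are assumptions:

```cobol
EXEC SQL
    FETCH ROWSET STARTING AT ABSOLUTE 10
    FROM C1 FOR 6 ROWS
    INTO :HVA1:IND1, :HVA2:IND2
END-EXEC.
```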
Cursor Positioning:
Rowset Positioned Fetches
Result table:
  CUST_NO  CUST_TYP  CUST_NAME
     1        P      Ian        <-- FETCH FIRST ROWSET FOR 3 ROWS
     2        P      Mark           (rows 1-3)
     3        P      John
     4        P      Karen      <-- FETCH NEXT ROWSET (rows 4-6)
     5        P      Sarah
     6        M      Florence
     7        M      Dylan
     8        M      Bert       <-- FETCH ROWSET STARTING AT ABSOLUTE 8
     9        M      Jo             FOR 2 ROWS (rows 8-9)
    10        R      Karen
    11        R      Gary
    12        R      Bill
    13        R      Geoff
    14        R      Julia
    15        R      Sally
Note: The cursor is positioned on ALL rows in the current rowset.
Notes:
FETCH FIRST ROWSET causes retrieval of rows starting from the first row in the result
table. The rowset size is 3 because FOR 3 ROWS is specified on the FETCH.
FETCH NEXT ROWSET causes retrieval starting from the first row after the previous
rowset, in this case row 4. The rowset size continues to be 3 since FOR n ROWS is not
specified.
FETCH ROWSET STARTING AT ABSOLUTE 8 FOR 2 ROWS causes retrieval of rows
starting from row 8 in the result table. The rowset size is now 2 rows because the FETCH
statement specifies a value (2) other than 3 for n.
Consider another scenario assuming that the cursor has just been opened:
• FETCH FIRST ROWSET FOR 3 ROWS ... cursor is positioned on rowset on rows
1,2,3.
• FETCH NEXT ROWSET ... cursor is positioned on rowset on rows 4,5,6.
• FETCH CURRENT ROWSET ... cursor is still positioned on the same rowset (rows
4,5,6).
• FETCH PRIOR ROWSET FOR 4 ROWS ... cursor is positioned on a “partial” rowset on
rows 1,2,3 and a warning message is returned. The reason for the warning is that we
want to go back further in the result table than there are rows available. It is OK to
specify a different number of rows (4) on the FETCH PRIOR ROWSET than was
specified (explicitly or implicitly) on the previous FETCH (3).
Although the rowset is logically obtained by fetching backwards from before the current
rowset, the data is returned to the application starting with the first row of the rowset to
the end of the rowset.
Mixing Row and Rowset Positioning
Result table:
  CUST_NO  CUST_TYP  CUST_NAME
     1        P      Ian        <-- FETCH FIRST ROWSET FOR 3 ROWS
     2        P      Mark           (rows 1-3)
     3        P      John
     4        P      Karen      <-- FETCH NEXT ROWSET (rows 4-6)
     5        P      Sarah      <-- FETCH NEXT (single row: row 5)
     6        M      Florence
     7        M      Dylan
     8        M      Bert
     9        M      Jo
    10        R      Karen
    11        R      Gary
    12        R      Bill
    13        R      Geoff
    14        R      Julia
    15        R      Sally
Note: FETCH NEXT is relative to the FIRST row in the current rowset.
Notes:
You are allowed to mix rowset fetches and row fetches from the same cursor. However, the
result may be different from what you expect.
The example in the figure above starts with a FETCH FIRST ROWSET FOR 3 ROWS. This
fetches the rowset with customer numbers 1, 2, and 3.
The next statement is a FETCH NEXT ROWSET. Because we did not specify the size of
the rowset, the rowset size of the previous call is used. This means we fetch another
rowset with three rows with customer numbers 4, 5, and 6.
Now we issue a row fetch by doing a FETCH NEXT or FETCH. In this case a single row is
returned. The row that is returned is the next row after the first row that makes up the
current rowset. That means that in this case, customer number 5 is returned (again). (The
positioning for a row fetch is relative to the first row in the current rowset.)
Because it may cause confusion, we do not recommend mixing rowset and row fetches on
the same cursor. It does work, but you have to know very well what you are doing in order
not to retrieve the wrong data.
If you want to retrieve customer number 7 (instead of 5), you can use a FETCH
NEXT ROWSET FOR 1 ROW. In that case a rowset of one row is returned, but we continue
where we left off on the previous rowset fetch.
Partial Rowsets (Result Sets)
Result table (CUST_NAME, rows 1-15): Ian, Mark, John, Karen, Sarah, Florence,
Dylan, Bert, Jo, Karen, Gary, Bill, Geoff, Julia, Sally
Left side of the visual:
  FETCH ROWSET STARTING AT ABSOLUTE 11 FOR 3 ROWS   (rows 11-13)
  FETCH NEXT ROWSET FOR 3 ROWS                      (only rows 14-15 remain)
  Note: The 2nd rowset fetch returns SQLCODE +100; SQLERRD(3) contains
  the number of rows returned (2 in this case).
Right side of the visual:
  FETCH ROWSET STARTING AT ABSOLUTE 5 FOR 3 ROWS    (rows 5-7)
  FETCH PREVIOUS ROWSET FOR 10 ROWS                 (only rows 1-4 available)
  Note: The 2nd rowset fetch returns SQLCODE +20237; SQLERRD(3) contains 4.
Notes:
Another important topic to address when using multi-row fetch is what happens in case you
reach the end of a result set. When you do an “old-fashioned” row fetch, when you fetch
beyond the end of the result set you receive an SQLCODE +100 and you know that you
are done (you do not have to look at the values of the fetch that returned the SQLCODE
+100, as there is no more data).
Life becomes more complicated when using rowset fetches. When you fetch past the end
of the result set, you also receive an SQLCODE +100. However, in this case, it may not be
sufficient to stop processing the data. The last rowset that is returned may still contain a
number of valid rows that you need to process; there may just not have been enough rows
left in the result set to fill up a complete rowset.
This case is illustrated on the left side of the figure above. The second rowset fetch asks for
3 rows. However, the result set only contains 2 more rows. In this case those 2 remaining
rows are returned in the provided host variable arrays and an SQLCODE +100 is returned.
To determine how many rows were returned in the rowset, you have to examine the
SQLERRD(3) field in the SQLCA (or use GET DIAGNOSTICS). In this case SQLERRD(3)
contains the value 2.
The same situation can occur when you fetch beyond the beginning of the result set. This is
shown on the right-hand side in the figure above. If you are positioned on the rowset for
rows 5,6,7 after the first rowset fetch and issue a FETCH PREVIOUS ROWSET FOR 10
ROWS, you go beyond the start of the result set. In this case you receive a different
SQLCODE +20237, and SQLERRD(3) will contain 4, and the provided host variable array
will contain the four previous values in the result set.
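In COBOL, the end-of-result-set handling described above could be sketched as follows. The cursor name, host variable names, and the PROCESS-PARTIAL-ROWSET paragraph are assumptions:

```cobol
EXEC SQL
    FETCH NEXT ROWSET FROM C1 FOR 3 ROWS
    INTO :NAME:IND1
END-EXEC.
IF SQLCODE = +100
*>  End of result set reached, but the last rowset may be partial
    IF SQLERRD(3) > 0
        PERFORM PROCESS-PARTIAL-ROWSET
    END-IF
END-IF.
```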
Locking and Isolation Levels
Cursor is positioned on all rows in current rowset
Locks are held on all rows in rowset depending on isolation level
and whether a result table has been materialized
Affects whether you refetch the same rows when issuing a
FETCH CURRENT to refetch current rowset
Notes:
The isolation level of the statement (specified implicitly or explicitly) can affect the result of
a rowset-positioned FETCH statement. This is possible, for example, when changes are
made to the tables underlying the cursor when isolation level UR is used with a dynamic
scrollable cursor, or with other isolation levels when rows have been added by the
application fetching from the cursor.
This behavior is no different from versions prior to DB2 V8. The only change in DB2 V8 is
that locks may be held on multiple rows.
Notes:
Be aware of the considerations shown on the visual when using multi-row FETCH with
static scrollable cursors. Depending on the type of fetching you do, you can see updates or
deletes made through your cursor (insensitive fetch), or also updates and deletes made by
other applications (sensitive fetch). In both cases, you can have update and/or delete holes
in your rowset. These are the same considerations as with static scrollable cursors in V7.
Considerations with
Dynamic Scrollable Cursors
Starting point and contents of rowset changes when scrolling
back and forth
Note that just after fetching the CURRENT ROWSET, other
applications can insert rows in between the rows being
returned as part of the rowset
Refetching current rowset can return different rows, unless using
RR ISOLATION
FETCH PRIOR ROWSET returns the previous n rows that qualify
from the start of the current cursor position
Therefore n rows are returned as long as start of rowset is not
reached
Notes:
The current row of a cursor cannot be updated or deleted by another application process if
it is locked. Unless it is already locked because it was inserted or updated by the
application process during the current unit of work, the current row of the cursor is not
locked if:
• The isolation level is UR, or
• The isolation level is CS, and
- The result table of the cursor is read-only, or
- The bind option is CURRENTDATA(NO) (which allows for lock avoidance), and DB2
actually managed to avoid having to take a lock.
The following situations can occur depending on the isolation level, other activity in the
system and the fetch orientation you use:
• PRIOR ROWSET:
With a dynamic scrollable cursor and isolation level UR, the content of a prior rowset
can be affected by other activity within the table. It is possible that a row that previously
qualified for the cursor, and was included as a member of the "prior" rowset, has since
been deleted or modified before it is actually returned as part of the rowset for the
current statement. The same is true for cursor stability with CURRENTDATA(NO), when
lock avoidance is used. To avoid this behavior, use an isolation level other than UR or
CS with CURRENTDATA(NO).
• CURRENT ROWSET:
With a dynamic scrollable cursor, additional rows can be added between rows that form
the rowset that was returned to the user. With isolation level RR, these rows can only be
added by the application fetching from the cursor. For isolation levels other than RR,
other applications can insert rows that can affect the results of a subsequent FETCH
CURRENT ROWSET. To avoid this behavior, use a static scrollable cursor instead of a
dynamic scrollable cursor.
• LAST ROWSET:
With a dynamic scrollable cursor and isolation level UR, the content of the last rowset
can be affected by other activity within the table. It is possible that a row that previously
qualified for the cursor, and was included as a member of the "last" rowset, has since
been deleted or modified before it is actually returned as part of the rowset for the
current statement. To avoid this behavior, use an isolation level other than UR.
• ROWSET STARTING AT RELATIVE n (where n is a negative number):
With a dynamic scrollable cursor and isolation level UR, the content of a prior rowset
can be affected by other activity within the table. It is possible that a row that previously
qualified for the cursor, and was included as a member of the "prior" rowset, has since
been deleted or modified before it is actually returned as part of the rowset for the
current statement. To avoid this behavior, use an isolation level other than UR.
Positioned UPDATE of Multi-Row FETCH
positioned-update
Notes:
You can issue the positioned UPDATE statement in one of the following two ways:
• WHERE CURRENT OF ..., as in prior versions; this is the default.
When the UPDATE statement is executed, the cursor must be positioned on the row or
rowset of the result table:
- If positioned on a single row, that row is updated.
- If positioned on a rowset, all rows corresponding to rows of the current rowset are
updated.
• WHERE CURRENT OF ... followed by FOR ROW ... OF ROWSET, which is new in V8.
This enables you to update a specific row in a rowset.
Example 1:
The following UPDATE statement is used to update all 10 rows of the
rowset:
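The statement for Example 1 is not reproduced in the text. Based on Example 2, it would presumably look like this (the table, column, and cursor names follow Example 2):

```cobol
EXEC SQL
    UPDATE T1 SET COL1='ABC'
    WHERE CURRENT OF CS1
END-EXEC
```

Because no FOR ROW n OF ROWSET clause is present, every row of the current rowset is updated.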
Example 2:
The following UPDATE statement is used to update row 4 of the rowset:
EXEC SQL
UPDATE T1 SET COL1='ABC'
WHERE CURRENT OF CS1 FOR ROW 4 OF ROWSET
END-EXEC
Notes:
If you use the UPDATE statement in conjunction with multi-row FETCH processing, be
aware that without the "FOR ROW n OF ROWSET" clause, all rows of the current rowset
are updated, which can have a dramatically different effect on the data being changed.
Positioned DELETE of Multi-Row FETCH
positioned-delete
Notes:
You can issue the positioned DELETE statement in one of the following two ways:
• WHERE CURRENT OF ..., as in prior versions; this is the default.
When the DELETE statement is executed, the cursor must be positioned on a row or
rowset of the result table:
- If positioned on a single row, that row is deleted and, after deletion, the cursor is
positioned before the next row of the result table. If there is no next row, the cursor is
positioned after the last row.
- If positioned on a rowset, rows corresponding to rows of the current rowset are
deleted and, after deletion, the cursor is positioned before the next rowset of the
result table. If there is no next rowset, the cursor is positioned after the last rowset.
• WHERE CURRENT OF ... FOR ROW ... OF ROWSET, which is new in DB2 V8.
This enables you to delete a specific row in a rowset.
Example 1:
The following DELETE statement is used to delete all 10 rows of the
rowset:
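The statement for Example 1 is not reproduced in the text. Based on Example 2, it would presumably look like this (the table and cursor names follow Example 2):

```cobol
EXEC SQL
    DELETE FROM T1
    WHERE CURRENT OF CS1
END-EXEC
```

Because no FOR ROW n OF ROWSET clause is present, every row of the current rowset is deleted.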
Example 2:
The following DELETE statement is used to delete row 4 of the rowset:
EXEC SQL
DELETE FROM T1
WHERE CURRENT OF CS1 FOR ROW 4 OF ROWSET
END-EXEC
Notes:
If you use the DELETE statement in conjunction with multi-row FETCH processing, be
aware that without the "FOR ROW n OF ROWSET" clause, all rows of the current rowset
are deleted, which can have a dramatically different effect on the data being deleted.
Multi-Row INSERT
New third form of insert
INSERT with FOR "n" ROWS is used to insert multiple rows into the
table or view using values provided in a host variable array
FOR "n" ROWS
For static, it is valid to specify FOR "n" ROWS on the INSERT statement
(for dynamic INSERT, specify FOR "n" ROWS on the EXECUTE statement)
Input provided with host variable array -- each array represents cells for
multiple rows of a single column
VALUES clause allows specification of multiple rows of data
Host variable arrays used to provide values for a column on INSERT
Example: VALUES (:hva1, :hva2)
Notes:
Prior to DB2 V8, inserting rows into a table can be done in one of the following ways:
• INSERT with VALUES is used to insert a single row into the table using values provided
or referenced.
• INSERT with SELECT is used to insert one or more rows into the table using values
from other tables or views.
DB2 V8 has introduced another way to insert multiple rows using values provided in host
variable arrays.
To use a multiple-row FETCH or INSERT statement with a host variable array per column,
the application must define one or more host variable arrays that can be used by DB2.
Each language has its own conventions and rules for defining a host variable array (see
Figure 3-23 "Host Variable Arrays" on page 3-41).
A host variable array corresponds to the values for one column of the result table for
FETCH, or column of data to be inserted for INSERT. The first value in the array
corresponds to the value for that column for the first row, the second value in the array
corresponds to the value for the column in the second row, and so on. DB2 determines the
attributes of the values in the array based on the declaration of the array. Host variable
arrays are used to return the values for a column of the result table on FETCH, or to
provide values for a column on INSERT.
Multi-row INSERT can be used to reduce network traffic when input is presented through a
remote application.
Using Multi-Row INSERT
Single row versus multi-row INSERT
Application
Application
HV array
DB2 DB2
Notes:
The figure above is divided horizontally into two parts.
The top part of the picture shows the difference between individual row inserts and
multi-row insert. In the first case (single-row insert) we have as many trips from the
application to DB2 as we insert rows. In the second case (multi-row insert), in a single trip
across the API, we insert 4 rows (the maximum is 32K rows in a single insert).
The bottom part of the picture shows that when using multi-row insert, not all variables
have to be host variable arrays. In the example we use a fixed character string and a
special register. During the insert, the values of those are “duplicated” as many times as
required to match the FOR n ROWS clause. (Note that the literal 'my string' can also be a
normal host variable.)
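A sketch of what such a mixed-value multi-row INSERT might look like. The table and column names are assumptions, and the clause order follows the multiple-row-insert syntax diagram shown later (FOR n ROWS preceding VALUES):

```cobol
EXEC SQL
    INSERT INTO T1 (C1, C2, C3)
    FOR 4 ROWS
    VALUES (:HVA1, 'my string', CURRENT DATE)
    ATOMIC
END-EXEC
```

Here :HVA1 is a host variable array supplying one value per row, while the literal and the CURRENT DATE special register are repeated for all four rows.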
INSERT Syntax (1 of 2)
INSERT INTO table-name [ ( column-name, ... ) ]
  { VALUES { expression | DEFAULT | NULL
           | ( { expression | DEFAULT | NULL }, ... ) }
  | fullselect [ WITH { RR | RS | CS } ] [ QUERYNO integer ]
  | multiple-row-insert }
Notes:
This visual shows that a new block, multiple-row-insert, has been introduced in the INSERT
syntax to facilitate multi-row INSERT operations. There are two flavors, static and dynamic;
the syntax shown here applies to static SQL INSERT.
INSERT Syntax (2 of 2)
multiple-row-insert:

  [ FOR { integer-constant | host-variable } ROWS ]
  VALUES ( { expression | host-variable-array | NULL | DEFAULT }, ... )
  [ ATOMIC | NOT ATOMIC CONTINUE ON SQLEXCEPTION ]
DEFAULT
Notes:
The visual shows the details of the new multiple-row-insert block.
The FOR n ROWS clause is used to insert multiple rows into a table or view. To facilitate
this, the VALUES clause has the host-variable-array specification.
ATOMIC/NOT ATOMIC
CONTINUE ON SQLEXCEPTION
ATOMIC (default)
If the insert for any row fails, all changes made to the database by that INSERT
statement are undone
Notes:
ATOMIC or NOT ATOMIC CONTINUE ON SQLEXCEPTION clause is provided so that the
application can specify if it wants the multiple-row INSERT to succeed or fail as a unit, or if
it wants DB2 to proceed despite a partial failure (one or more rows).
ATOMIC specifies that if the insert for any row fails, then all changes made to the database
by any of the inserts, including changes made by successful inserts, are undone. This is
the default.
When NOT ATOMIC CONTINUE ON SQLEXCEPTION is specified, the inserts are
processed independently. This means that if one or more errors occur during the execution
of an INSERT statement, then processing continues and any successful inserts made
during the execution of the statement are not undone. You can use the GET
DIAGNOSTICS statement to keep track of this. For more information on GET
DIAGNOSTICS, see Figure 3-50 "GET DIAGNOSTICS" on page 3-80.
A consideration regarding ATOMIC or NOT ATOMIC CONTINUE ON SQLEXCEPTION is
the amount of data you are inserting. Inserting 32K rows into a table whose rows are 32K
bytes long (the row is in a 32K page and 1 row/page) consumes 1 GB of space in the
application and would log > 1 GB of data, so rollback could be painful.
Another consideration is the effect on triggers. With a multiple-row INSERT statement,
when triggers are processed depends on the atomicity option in effect for the
statement:
• ATOMIC:
The inserts are processed as a single statement, any statement level triggers are
invoked once for the statement, and transition tables include all of the rows inserted.
• NOT ATOMIC CONTINUE ON SQLEXCEPTION:
The inserts are processed separately, any statement level triggers are processed for
each inserted row, and transition tables include the individual row inserted. With this
option in effect when errors are encountered, processing continues, and some of the
specified rows do not end up being inserted. In this case, if an insert trigger is defined
on the underlying base table, the trigger transition table includes only rows that are
successfully inserted.
ATOMIC is easier to restart/reposition from since it is an “ALL or NONE” type of process.
INSERT - Example 1
Insert a variable number of rows using host variable arrays for
column values. Assume table T1 has one column and that a
variable (:hv) number of rows of data are to be inserted into T1.
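The INSERT statement on the visual is not reproduced in the text. Based on the description above, it would presumably look like this (the array names :hva and :hvaind are assumptions; ATOMIC is shown explicitly although it is the default):

```cobol
EXEC SQL
    INSERT INTO T1
    FOR :hv ROWS
    VALUES (:hva:hvaind)
    ATOMIC
END-EXEC
```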
Notes:
Notice the use of the keyword ATOMIC.
Therefore either all rows are inserted, or in case of a failure no rows are inserted. It is either
all or nothing.
Note that if the host variable array has fewer elements than the n specified in the FOR n
ROWS clause, an SQL error is returned (SQLSTATE 42873, SQLCODE -246). If the host
variable array has more elements than specified in the FOR n ROWS clause, the excess
elements are simply ignored.
INSERT - Example 2
Insert multiple rows using host variable arrays for column values.
Table T2 has 2 columns C1 and C2. INSERT 10 rows of data into T2.
Values to be inserted are in host variable arrays :hva1 and :hva2.
The corresponding indicator variable arrays are :hvaind1 and :hvaind2.
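The INSERT statement on the visual is not reproduced in the text. Based on the description above, it would presumably look like this:

```cobol
EXEC SQL
    INSERT INTO T2 (C1, C2)
    FOR 10 ROWS
    VALUES (:hva1:hvaind1, :hva2:hvaind2)
    NOT ATOMIC CONTINUE ON SQLEXCEPTION
END-EXEC
```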
Notes:
Notice the use of the keyword NOT ATOMIC CONTINUE ON SQLEXCEPTION.
If the insert of one or more of the rows fails, the application is notified of the failure through
the SQLCA and must then issue a GET DIAGNOSTICS (see Figure 3-50 "GET
DIAGNOSTICS" on page 3-80) in order to determine the failing record(s). All the other rows
are successfully inserted.
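A sketch of the kind of GET DIAGNOSTICS processing involved in finding the failing rows. The host variable names are assumptions; GET DIAGNOSTICS itself is covered in detail later in this unit:

```cobol
EXEC SQL
    GET DIAGNOSTICS :NUM-CONDITIONS = NUMBER
END-EXEC.
PERFORM VARYING I FROM 1 BY 1 UNTIL I > NUM-CONDITIONS
*>  Retrieve the SQLCODE for each condition raised by the INSERT
    EXEC SQL
        GET DIAGNOSTICS CONDITION :I
            :ERR-SQLCODE = DB2_RETURNED_SQLCODE
    END-EXEC
END-PERFORM
```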
PREPARE Syntax
attribute-string
ASENSITIVE
INSENSITIVE
SENSITIVE STATIC
DYNAMIC
sensitivity
scrollability
rowset-positioning
holdability
returnability
fetch-first-clause
read-only-clause
update-clause
optimize-clause
isolation-clause
ATOMIC
NOT ATOMIC CONTINUE ON SQLEXCEPTION
For NOT ATOMIC CONTINUE ON SQLEXCEPTION, rejected rows can be viewed with
GET DIAGNOSTICS
Notes:
The visual shows the changed PREPARE syntax to facilitate inserting multiple rows of data
with a single dynamic SQL INSERT statement. The keywords ATOMIC and NOT ATOMIC
CONTINUE ON SQLEXCEPTION have the usual meaning.
EXECUTE Syntax
EXECUTE statement-name
  [ USING host-variable, ... | USING DESCRIPTOR descriptor-name ]
  [ multiple-row-insert ]

multiple-row-insert:

  FOR { integer-constant | host-variable } ROWS
Notes:
The visual shows the changed EXECUTE syntax to facilitate inserting multiple rows of data
with a single dynamic SQL INSERT statement.
PREPARE/EXECUTE Example
Assume that table PROG has 9 columns. Prepare and execute a dynamic
INSERT statement to insert 5 rows of data into PROG.
stmt = 'INSERT INTO PROG (C1, C2, C3, C4, C5, C6, C7, C8, C9)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)';
attrvar = 'FOR MULTIPLE ROWS ATOMIC'
nrows = 5
EXEC SQL PREPARE ins_stmt ATTRIBUTES :attrvar FROM :stmt;
EXEC SQL EXECUTE ins_stmt FOR :nrows ROWS
USING :V1, :V2, :V3, :V4, :V5, :V6, :V7, :V8, :V9;
Notes:
Notice the difference in syntax between static SQL and dynamic SQL for multi-row INSERT
statement:
• Static SQL:
The FOR n ROWS and ATOMIC/NOT ATOMIC CONTINUE ON SQLEXCEPTION
clauses are specified on the INSERT statement.
• Dynamic SQL:
The FOR n ROWS clause is specified on the EXECUTE statement.
The ATOMIC/NOT ATOMIC CONTINUE ON SQLEXCEPTION clause is specified on
the PREPARE statement.
In the example on the visual, the PREPARE statement uses the INSERT statement string
in host variable stmt and the attributes in the host variable attrvar and places the prepared
statement in the DB2 designated area ins_stmt for use by the EXECUTE statement.
Multi-Row FETCH /
INSERT -- DRDA Considerations
Can be implemented by any requester or server that supports
DRDA Version 3
Between DB2 for z/OS V8 systems
Multi-row INSERT does not affect blocking
A single rowset is inserted in a single INSERT statement
For a remote client, one rowset is returned in a network request
32K block size is ignored
A single network multi-row fetch or insert request can be at most
10 MB
Between DB2 on distributed platforms and DB2 for z/OS V8
No support for multi-row operations in embedded SQL
In ODBC/CLI driver
Limited support for multi-row FETCH
Support for multi-row INSERT
Notes:
Let us first look at multi-row operations between DB2 for z/OS systems. In a distributed
environment, an error can be returned to the requester even though the same FETCH
statement, when executed as part of a local application, would have been successful.
For example, suppose that a row with a null value was returned to an application, but no
indicator variable was provided by the application. In this case, SQLCODE -353 is issued
on a subsequent fetch by a remote application, but not for a local application.
Multi-row INSERT and FETCH statements can be implemented by any DRDA application
requester or server that supports the DRDA Version 3 protocols. SQLCODE -30005,
SQLSTATE 56702 is returned if an attempt is made to issue a multi-row INSERT or FETCH
statement on a server that does not support DRDA Version 3 protocols. The fact that both
the AR and AS support DRDA V3 is of course not enough to make this work. The AR and
AS have to implement rowset processing inside the engine or driver as well.
Multi-row fetch and insert operations are fully supported between DB2 for z/OS systems.
The support between DB2 distributed clients and DB2 for z/OS is discussed next.
Multi-row Operations Between the Distributed Platform and DB2 for z/OS
At the time of writing of this publication (DB2 Connect for Linux, UNIX, and Windows
Version 8.1 FixPak 4), there is no support for either multi-row fetch or insert with embedded
SQL.
The ODBC/CLI driver on the distributed platform supports most of the DB2 for z/OS
multi-row INSERT functionality, but there is no support for "WITH ROWSET
POSITIONING" cursors beyond their implicit use by dynamic scrollable cursors.
To use multi-row INSERT, you code an ODBC "array-input" in your application. When the
DB2 AR realizes that it is talking to a DB2 for z/OS V8 running in new-function mode, it
transforms the flow into a single message containing a single INSERT of n rows (multi-row
insert). (When communicating with a DB2 V7 on the mainframe, the ODBC/CLI driver at
the DB2 for LUW client would have sent a single message containing n INSERT
statements, each for a single row.) You can specify whether this is an atomic operation or
not by setting the statement attribute SQL_ATTR_PARAMOPT_ATOMIC to
SQL_ATOMIC_YES or SQL_ATOMIC_NO.
DB2 Connect on Linux, UNIX, and Windows V8.1 has implemented dynamic scrollable
cursors via ODBC (or DB2 CLI) with FixPak 4. By calling the SQLSetStmtAttr() function
with the SQL_ATTR_CURSOR_TYPE statement attribute set to
SQL_CURSOR_DYNAMIC, you can make the statement a dynamic scrollable cursor.
These dynamic scrollable cursors always use “WITH ROWSET POSITIONING” cursors
when retrieving data for a DB2 for z/OS V8 system. At the time of writing, this is the only
implementation of rowset cursors on the distributed platform.
Using “array fetch”, you can code arrays for output column values, and then via
SQLSetStmtAttr set the number of rows for an SQLFetch, SQLExtendedFetch, or
SQLFetchScroll. However, this only applies to the client application program talking to the
ODBC/CLI driver on the workstation. No cursor with rowset positioning is used when going
out to obtain data from the database on DB2 for z/OS. (To help out a little bit, the DB2 for
z/OS server implements multi-row fetch “under the covers” when DDF is fetching rows from
the database, but the query block will be based on the client's RQRIOBLK size, which can
only go up to 65535 in the current level of DB2 Connect, and only impacts the API
crossings between the DDF and DBM1 address space.)
GET DIAGNOSTICS
Returns SQL error information
For overall statement
For each condition (when multiple errors occur)
Supports SQL error message tokens greater than 70 bytes (SQLCA
limitation)
Must be embedded - cannot be dynamically prepared
Notes:
The GET DIAGNOSTICS statement enables applications to retrieve diagnostic information
about statements that have been executed. This statement complements and extends the
diagnostics that are available in the SQLCA. The statement can only be embedded in an
application program; it cannot be dynamically prepared.
GET DIAGNOSTICS can be used in conjunction with and instead of the SQLCA to
interrogate the results of all SQL statements. It is especially important when dealing with
non-atomic multi-row insert statements and with objects with long names, which potentially
no longer fit into the SQLCA message area.
GET DIAGNOSTICS Syntax (1 of 3)
statement-information:
   host-variable = statement-information-item-name [, ...]

statement-information-item-name (information about the last statement executed, for
example, the capabilities of its cursor):
   DB2_GET_DIAGNOSTICS_DIAGNOSTICS
   DB2_LAST_ROW
   DB2_NUMBER_PARAMETER_MARKERS
   DB2_NUMBER_RESULT_SETS

condition-information:
   CONDITION { host-variable2 | integer }
   host-variable3 = { condition-information-item-name | connection-information-item-name } [, ...]
Notes:
Diagnostic information is provided in three main areas: the statement-information area, the
condition-information area, and the combined-information area. After the execution of an
SQL statement, information about the execution of the statement is provided in the
statement-information area, and at least one instance of the condition-information area.
The number of instances of the condition-information area is indicated by the NUMBER
item that is available in the statement-information area. The combined-information area
contains a text representation of all the information gathered about the execution of the
SQL statement.
For example, the statement-information-item name ROW_COUNT has the following
information:
• It identifies the number of rows associated with the previous SQL statement that was
executed.
• If the previous SQL statement is a DELETE, INSERT, or UPDATE statement,
ROW_COUNT identifies the number of rows deleted, inserted, or updated by that
statement, excluding rows affected by either triggers or referential integrity constraints.
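As a sketch of retrieving ROW_COUNT, assuming a hypothetical table EMP with a WORKDEPT column and a host variable :rcount:

```sql
EXEC SQL
  UPDATE EMP SET SALARY = SALARY * 1.10
  WHERE WORKDEPT = 'D11';
EXEC SQL GET DIAGNOSTICS :rcount = ROW_COUNT;
```

After execution, :rcount contains the number of rows updated directly by the UPDATE statement.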
GET DIAGNOSTICS Syntax (2 of 3)
condition-information-item-name:
CATALOG_NAME
CONDITION_NUMBER
CURSOR_NAME
DB2_ERROR_CODE1
DB2_ERROR_CODE2
DB2_ERROR_CODE3
DB2_ERROR_CODE4
DB2_INTERNAL_ERROR_POINTER
DB2_MESSAGE_ID
DB2_MODULE_DETECTING_ERROR
DB2_ORDINAL_TOKEN_n
DB2_REASON_CODE
DB2_RETURNED_SQLCODE
DB2_ROW_NUMBER
DB2_SQLERRD_SET
DB2_SQLERRD1
DB2_SQLERRD2
DB2_SQLERRD3
DB2_SQLERRD4
DB2_SQLERRD5
DB2_SQLERRD6
DB2_TOKEN_COUNT
MESSAGE_TEXT
RETURNED_SQLSTATE
SERVER_NAME
Notes:
The item names in the condition-information-item-name block are shown on the visual.
For example, RETURNED_SQLSTATE contains the SQLSTATE for the specified
diagnostic.
Please refer to the DB2 SQL Reference, SC18-7426 for more details on all the items.
connection-information-item-name:
DB2_AUTHENTICATION_TYPE
DB2_AUTHORIZATION_ID
DB2_CONNECTION_STATE
DB2_CONNECTION_STATUS
DB2_ENCRYPTION_TYPE
DB2_SERVER_CLASS_NAME
DB2_PRODUCT_ID
combined-information:
   { CONDITION | CONNECTION } { host-variable5 | integer } [, ...]
Notes:
The item names in the connection-information-item-name block are shown on the visual.
For example, DB2_SERVER_CLASS_NAME contains QDB2 for DB2 UDB for z/OS.
Please refer to the DB2 SQL Reference, SC18-7426 for more details on all these items.
GET DIAGNOSTICS Example
To determine how many rows were updated in an UPDATE
statement
GET DIAGNOSTICS :rcount = ROW_COUNT;
To handle multiple SQL errors during a NOT ATOMIC multi-row
insert
GET DIAGNOSTICS :numerrors = NUMBER;
Then code a loop to execute the following for the number of errors
GET DIAGNOSTICS CONDITION :i :retstate = RETURNED_SQLSTATE
To see all diagnostic information for an SQL statement
GET DIAGNOSTICS :diags = ALL STATEMENT
Sample output in :diags
Number=1; Returned_SQLSTATE=02000;
DB2_RETURNED_SQLCODE=+100;
Would continue for all applicable items and for all conditions
Items are delimited by semicolons
Notes:
As an example of the use of the GET DIAGNOSTICS statement, we discuss the diagnostic
information for multi-row FETCH and multi-row INSERT.
Additional information about the fetch, including information on all exception conditions
encountered while processing the FETCH statement, may be obtained from the GET
DIAGNOSTICS statement.
Consider the following example, where we attempt to retrieve 10 rows with a single
FETCH statement.
Assume that an error, SQLCODE -802, is detected on row 5. SQLERRD3 is set to 4 for
the four returned rows, SQLSTATE is set to 22003, and SQLCODE is set to -802. This
information is also available from the GET DIAGNOSTICS statement, for example:
GET DIAGNOSTICS :num_rows = ROW_COUNT, :num_cond = NUMBER;
This would result in num_rows = 4 and num_cond = 1 (1 condition).
GET DIAGNOSTICS CONDITION 1 :sqlstate = RETURNED_SQLSTATE,
:sqlcode = DB2_RETURNED_SQLCODE, :row_num = DB2_ROW_NUMBER;
This would result in SQLSTATE = 22003, SQLCODE = -802, and ROW_NUM = 5.
There are some cases where DB2 returns a warning if indicator variables are provided, or
an error if indicator variables are not provided. These errors can be thought of as data
mapping errors that result in a warning (SQLCODE +802 for instance) if indicator variables
are provided. The GET DIAGNOSTICS statement may be used to retrieve information
about all the data mapping errors that have occurred as in the case of multi-row INSERT.
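A sketch of a multi-row FETCH that provides indicator variable arrays, so that data mapping errors are reported as warnings instead of errors (the cursor CUR1 and the host variable arrays :hva1 and :hva2 with indicator arrays :hvind1 and :hvind2 are assumed declarations):

```sql
EXEC SQL
  FETCH NEXT ROWSET FROM CUR1 FOR 10 ROWS
  INTO :hva1 :hvind1, :hva2 :hvind2;
```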
Consider another example, a NOT ATOMIC multi-row INSERT of 10 rows whose data is
provided in host variable arrays :hva1 (an array of INTEGER values) and :hva2 (an array of
DECIMAL(15,0) values). The data values for :hva1 and :hva2 are represented in Table 3-2.
Table 3-2 Data values for :hva1 and :hva2
Array entry :hva1 :hva2
1 1 32768
2 -12 90000
3 79 2
4 32768 19
5 8 36
6 5 24
7 400 36
8 73 4000000000
9 -200 2000000000
10 35 88
EXEC SQL
INSERT INTO T1 (C1, C2) FOR 10 ROWS VALUES (:hva1:hvind1, :hva2:hvind2)
NOT ATOMIC CONTINUE ON SQLEXCEPTION;
After execution of the INSERT statement, we have the following in SQLCA:
SQLCODE = 0
SQLSTATE = 0
SQLERRD3 = 8
Although we attempted to insert 10 rows, only 8 rows of data were inserted. Further
information can be found by using the GET DIAGNOSTICS statement, for example:
GET DIAGNOSTICS :num_rows = ROW_COUNT, :num_cond = NUMBER;
This would result in NUM_ROW = 8 and NUM_COND = 2 (2 conditions).
GET DIAGNOSTICS CONDITION 1 :sqlstate = RETURNED_SQLSTATE,
:sqlcode = DB2_RETURNED_SQLCODE, :row_num = DB2_ROW_NUMBER;
This would result in SQLSTATE = 22003, SQLCODE = -302, and ROW_NUM = 4.
GET DIAGNOSTICS CONDITION 2 :sqlstate = RETURNED_SQLSTATE,
:sqlcode = DB2_RETURNED_SQLCODE, :row_num = DB2_ROW_NUMBER;
This would result in SQLSTATE = 22003, SQLCODE = -302, and ROW_NUM = 8.
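Rather than coding one GET DIAGNOSTICS CONDITION statement per condition, an application can loop over all conditions. A sketch in pseudocode (the host language loop syntax is assumed):

```sql
EXEC SQL GET DIAGNOSTICS :num_cond = NUMBER;
do i = 1 to num_cond;
  EXEC SQL GET DIAGNOSTICS CONDITION :i
    :sqlstate = RETURNED_SQLSTATE,
    :sqlcode  = DB2_RETURNED_SQLCODE,
    :row_num  = DB2_ROW_NUMBER;
  /* record or report this condition */
end;
```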
INNER JOIN
(
SELECT S.HIREDECADE, MIN(S.SALARY) AS MINIMUM_SALARY
FROM
(
SELECT SUBSTR(CHAR(HIREDATE,ISO),1,3)
CONCAT '0 - 9' AS HIREDECADE,
SALARY
FROM EMPLOYEE
) AS S
GROUP BY S.HIREDECADE
) AS M
ON E.HIREDECADE = M.HIREDECADE
Notes:
This visual shows the SQL statement that uses nested table expressions in joins to
determine, for each employee, their employee number, last name, hiring decade, salary,
and the minimum salary being paid to employees of their hiring decade. The EMPLOYEE
table is used for this purpose.
The nested table expression creates a temporary result table within the FROM clause of an
outer query. The nested table expression cannot be referenced elsewhere in the query,
although the result columns can be referenced.
The SQL statement is fairly complex because the table expressions are part of the FROM
clause. Perhaps you are wondering whether the query can be made more comprehensible.
Yes, there is an easier way: common table expressions introduced in DB2 V8 to be
compatible with other members of the DB2 family.
Common Table Expressions
WITH
E AS
(
SELECT EMPNO, LASTNAME, SALARY,
SUBSTR(CHAR(HIREDATE,ISO),1,3) CONCAT '0 - 9'
AS HIREDECADE
FROM EMPLOYEE
),
M (HIREDECADE, MINIMUM_SALARY) AS
(
SELECT HIREDECADE, MIN(SALARY)
FROM E
GROUP BY HIREDECADE
)
SELECT E.EMPNO, E.LASTNAME, E.HIREDECADE,
E.SALARY, M.MINIMUM_SALARY
FROM E INNER JOIN M
ON E.HIREDECADE = M.HIREDECADE
Notes:
The example on the visual reformulates the query that we used in the previous example,
that is, the example for nested table expressions. It uses common table expressions
(CTEs), which are named E and M as the nested table expressions were before.
They are introduced by the keyword WITH and occur at the beginning of the query. They
are separated from each other by commas. Every reference to a specific common table
expression within the same query uses the same result set. Common table expressions
can be referenced elsewhere in the query, even by other common table expressions within
the same query.
The first common table expression (this is the SQL statement we used in the previous
visual for table expression E) determines employee number, last name, salary, and hiring
decade for all employees of the EMPLOYEE table. The first common table expression is
again called E. The columns of the associated result table are those named in the SELECT
statement.
Although the second table expression looks different, it provides the same result as the
SQL statement for nested table expression M of the previous query: For the various
decades, it determines the minimum salary being paid to the employees hired during the
appropriate decade.
These are the basic differences:
• The SELECT statement now uses common table expression E (defined before in front
of common table expression M) instead of table EMPLOYEE.
• The columns of the common table expression have the names specified in parentheses
following the name of the common table expression. This is the same technique as
naming the columns of a view. No AS clause is required for the calculated columns in
the SELECT statement.
The SELECT follows the common table expression. Since it can refer to the common table
expressions, the SQL statement is more comprehensible compared to the use of nested
table expressions.
Notice that the common table expression M is based on the common table expression E.
This is by no means a requirement for using common table expressions, but is certainly
possible and useful.
The example shows the solution to an application requirement that is somewhat common:
listing aggregate information on the same output line as detailed information. This problem
can also be handled with views as follows:
CREATE VIEW E (EMPNO, LASTNAME, SALARY, HIREDECADE)
AS SELECT
EMPNO, LASTNAME, SALARY, SUBSTR(CHAR(HIREDATE,ISO),1,3) CONCAT '0 - 9'
FROM EMPLOYEE;
CREATE VIEW M (HIREDECADE, MINIMUM_SALARY)
AS SELECT
HIREDECADE, MIN(SALARY)
FROM E
GROUP BY HIREDECADE;
SELECT E.EMPNO, E.LASTNAME, E.HIREDECADE, E.SALARY, M.MINIMUM_SALARY
FROM E INNER JOIN M
ON E.HIREDECADE = M.HIREDECADE;
Remember that, if views are used, each view needs to be defined, and then access to it
has to be granted. This can be a tedious procedure if the solution requires lots of views,
and so common table expressions are useful.
Common table expressions are materialized if they are referenced more than once. If a
CTE is only referenced once, the CTE is treated like a regular table expression and
materialization is avoided whenever possible.
Common table expressions are required if you want to use recursive SQL introduced in
DB2 V8 to be compatible with other members of the DB2 family.
Recursive SQL
WITH
RPL (PART, SUBPART, QUANTITY) AS
(
Initialization Select
SELECT ROOT.PART, ROOT.SUBPART, ROOT.QUANTITY
FROM PARTLIST ROOT
WHERE ROOT.PART = '01'
UNION ALL
Iterative Select
SELECT CHILD.PART, CHILD.SUBPART, CHILD.QUANTITY
FROM RPL PARENT, PARTLIST CHILD
WHERE PARENT.SUBPART = CHILD.PART
)
Main Select
SELECT PART, SUBPART, SUM(QUANTITY) AS QUANTITY
FROM RPL
GROUP BY PART, SUBPART
Notes:
Recursive SQL is very useful to retrieve data from tables that contain component
breakdowns where each component is broken down into subcomponents and each
subcomponent is broken down again into sub-subcomponents, etc. Applications involving
these kinds of tables are often called "Bill of Materials" applications. A table that represents
the parts in a computer would be an example of Bill of Materials: the major components,
the monitor, system unit, and printer, all contain subassemblies like the hard drive, the
mother board, and the print head, each of which is composed of other subassemblies, etc.
Another example: given a table of courses containing course codes, course names, and
prerequisites, determine all the courses that are a prerequisite to a particular course. Yet
another example: given a table of airline connections containing an originating airport, a
destination airport, and a distance, determine all the places you can go to and how distant
each destination is from the originating point.
Recursive SQL involves defining a common table expression that references itself. The
common table expression consists of two distinct components, an initialization SELECT
and an iterative SELECT. The initialization SELECT is the first SELECT in the table
expression and the iterative SELECT is the second SELECT in the table expression. The
iterative SELECT is combined with the initialization SELECT by means of UNION ALL.
The recursive common table expression in the example on the visual is named RPL. The
definition is enclosed in parentheses.
The common table expression in a recursive SQL statement is followed by a main
SELECT. The main SELECT identifies the columns which are obtained from the result set
of the common table expression.
The example on the visual builds a final result set that identifies all the parts and subparts
needed to build Part 01 (WHERE clause of initialization SELECT) in a parts table called
PARTLIST. We will see the PARTLIST table when stepping through the various "phases" in
the subsequent visuals.
Recursive SQL - Initialization SELECT
Notes:
This visual shows the PARTLIST table and illustrates what happens as the consequence of
the initialization SELECT.
The initialization SELECT is executed only once. In the example, it reads the PARTLIST
table.
The WHERE clause of the initialization SELECT controls the starting point of the recursion.
In the example, the starting point is all rows with a part number of '01'.
The right-hand side of the visual displays the four rows placed in the temporary table RPL
as the consequence of the initialization SELECT. Parts 02, 03, 04, and 06 are the
assemblies that directly make up Part 01. The first column (PART) of the interim result
identifies the major part. The second column (SUBPART) identifies the subparts that make
up the major part. The third column (QUANTITY) identifies the quantity of the subpart
needed to construct one complete major part. For example, it takes three units of Part 06 to
construct Part 01.
Notes:
Unless it is limited by control variables (See “Controlling Depth of Recursion - Example” on
page 3-100), the iterative SELECT is executed until all subparts of all parts have been
broken down into their subparts, no matter how many repetitions are required. In our
example, there are no control variables so the iteration continues until all parts are
completely resolved.
Note that it is very easy to write a recursive SQL statement incorrectly and initiate an
infinite loop. Control variables are very useful for limiting the number of iterations and are
discussed in a later visual.
The iterative SELECT in the example is the part of the recursive SQL statement between
the UNION ALL and the parenthesis that closes the common table expression named RPL.
Only the iterative SELECT is repeated on this visual.
During the first iteration, each row from the initialization select is joined to all rows in the
PARTLIST table that meet the join criteria. The result rows are added to the temporary
table RPL. The rows that are added to RPL indicate that Parts 05 through 09 and 12
through 13 make up the parts returned by the initialization select:
Notes:
The second iteration joins the rows added by the first iteration to the PARTLIST table. The
result rows of the second iteration are again added to RPL. The second iteration indicates
that Part 05 consists of Parts 10 and 11, Part 06 consists of Parts 12 and 13, and Part 07
consists of Parts 12 and 14.
Since the subparts of Parts 04 and 06 that were added by the first iteration have no
corresponding PART values in the PARTLIST table, no further rows are added for them to
the RPL temporary table.
Note that RPL now contains two occurrences each of the rows that define the subparts of
Part 06, namely, Parts 12 and 13. The first occurrence of these rows was contributed by the
first iteration and the second occurrence of these rows came from the second iteration. The
UNION ALL preceding the iterative select prevents the removal of duplicate rows.
In this example, the recursion does not yield additional rows after the second iteration
because there are no further subparts for the parts added by the second iteration.
However, if the PARTLIST table contained additional levels of subparts, the recursion
would continue, since the current example does not limit the depth of the recursion.
Recursive SQL - Main SELECT
Notes:
After the recursive common table expression has been evaluated completely, the main
SELECT is evaluated. The main SELECT references the result of RPL, the common table
expression.
The main SELECT summarizes the total quantity of all parts needed to build Part 01. The
grouping and the SUM() function ensure that the quantities of the respective subparts of
Part 06 are added together. In other words, the two rows for Part 06, Subpart 12, are
combined to make a single row. So are the two rows for Part 06, Subpart 13. A user who
wishes to verify that the warehouse contains enough of each of the components required to
make Part 01 can execute this query, and then check existing stocks against the result of
the query.
WITH
RPL (LEVEL, PART, SUBPART, QUANTITY) AS
(
Initialization Select
SELECT 0, ROOT.PART, ROOT.SUBPART, ROOT.QUANTITY
FROM PARTLIST ROOT
WHERE ROOT.PART = '00'
UNION ALL
Iterative Select
SELECT PARENT.LEVEL+1, CHILD.PART, CHILD.SUBPART,
CHILD.QUANTITY
FROM RPL PARENT, PARTLIST CHILD
WHERE PARENT.SUBPART = CHILD.PART
AND PARENT.LEVEL < 2
)
Main Select
SELECT LEVEL, PART, SUBPART, SUM(QUANTITY) AS QUANTITY
FROM RPL
GROUP BY LEVEL, PART, SUBPART
Notes:
In this example, we are interested in the breakdown of Part 00, but we could have started
with any part number we were interested in. The initial value of LEVEL would still be 0,
regardless of the starting part number. The iterative select increments the LEVEL value by
adding 1 on each iteration.
The following condition in the WHERE clause of the iterative SELECT is used to limit the
number of iterations:
PARENT.LEVEL < 2
You simply set the constant to the number of iterations that are desired.
The main SELECT displays the result of the table expression. The LEVEL column in the
final result makes the origin of each result row clear: rows that came from the initialization
SELECT have a level of 0, rows from the first iteration have a level of 1, rows from the
second iteration have a level of 2, and so on. The ORDER BY puts the result in a
convenient sequence.
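The main SELECT as reproduced on the earlier visual does not show the ORDER BY; a sketch of the complete main SELECT with the ordering added:

```sql
SELECT LEVEL, PART, SUBPART, SUM(QUANTITY) AS QUANTITY
FROM RPL
GROUP BY LEVEL, PART, SUBPART
ORDER BY LEVEL, PART, SUBPART;
```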
Note that LEVEL is not a column of table PARTLIST. It does not have to be added to table
PARTLIST using an ALTER TABLE statement. It is a “virtual” column created by the SQL
statement.
The actual result of the recursive SQL statement is illustrated on the next visual.
Notes:
Recursive SQL - Recommendations
Notes:
Recursive SQL is cyclical by definition. This means that it is easy to cause infinite loops if
the SQL is coded incorrectly or if the data itself is cyclical. For example, if the join in the
iterative select in the earlier examples was coded as:
PARENT.SUBPART = CHILD.SUBPART
Infinite recursion would occur if there were even one row where the Part and Subpart
values were the same. By the same token, a loop can occur if the data were illogical. For
example, if the PARTLIST table had a row where the Part was 05 and the Subpart was 01,
a loop would occur. To prevent this sort of problem, you should desk-check all recursive
SQL. Also, test it against small tables before implementing it in production.
Any recursive SQL statement that does not use a control variable receives an SQL warning
(SQLSTATE = 01605 and SQLCODE = +347 WARNING). Although this is not a serious
problem, you can use techniques shown on the preceding visuals to avoid it.
Notes:
The column attribute AS IDENTITY was introduced in the DB2 for OS/390 Version 6
refresh through APAR PQ30652 and was delivered as part of DB2 Version 7. DB2 for z/OS
Version 8 enhances identity columns by extending the ALTER COLUMN clause of the
ALTER TABLE SQL statement to include the identity column specification. In addition,
there is a close tie between the enhancements to identity columns and sequences
discussed in the next topic.
Prior to Version 8, you could not alter the characteristics of an identity column, nor whether
the GENERATED keyword used the ALWAYS or BY DEFAULT option. Not being able to alter
the GENERATED option was one of the real challenges of using this option when it was
initially introduced.
Since sequence objects build on the existing infrastructure of identity columns, we will
discuss the identity enhancements first.
Identity Column Considerations
Before Version 8, if you had a requirement to unload and reload your
tables, you were forced to specify GENERATED BY DEFAULT for the
identity column
If you had specified GENERATED ALWAYS at table design, the only
option to unload/reload would be to:
Unload the table
DROP the table
Re-CREATE the table using GENERATED BY DEFAULT
Reload the table
Otherwise, DB2 will generate new values for the rows during the reload,
which is probably not what you want
With Version 8, if you specify GENERATED ALWAYS and later have a
requirement to unload/reload your tables, you could:
ALTER TABLE ALTER COLUMN SET GENERATED BY DEFAULT
Unload the table
Reload the table
ALTER TABLE ALTER COLUMN SET GENERATED ALWAYS
Notes:
Two keywords work in conjunction with identity columns: GENERATED BY DEFAULT and
GENERATED ALWAYS. There is a significant difference in how these keywords affect the
generation of values for a column defined with AS IDENTITY. GENERATED ALWAYS will
always generate a value for its column. GENERATED BY DEFAULT will only generate a
value if a value does not already exist.
The preferred way to use identity columns is GENERATED ALWAYS. However, before
Version 8, if you had a requirement to unload and reload your tables, you were forced to
specify GENERATED BY DEFAULT for the identity column. If you had specified
GENERATED ALWAYS at table design, the only option to unloading and reloading the
table would be to unload the table, DROP the table, and then re-CREATE the table using
GENERATED BY DEFAULT. Only after all of those steps could you reload the data.
If this procedure was not followed and GENERATED ALWAYS was used, DB2 would
generate new values for the rows during the reload. This is not what you would want to
occur.
With Version 8, if you specify GENERATED ALWAYS and later have a requirement to
unload and reload the table, you could simply use the following procedure:
• ALTER TABLE ALTER COLUMN SET GENERATED BY DEFAULT
• Unload the table
• Reload the table
• ALTER TABLE ALTER COLUMN SET GENERATED ALWAYS
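Sketched as SQL for a hypothetical table ORDERS with identity column ORDER_ID (the unload and reload steps are utility jobs, indicated here as comments):

```sql
ALTER TABLE ORDERS
  ALTER COLUMN ORDER_ID SET GENERATED BY DEFAULT;
-- unload the table (for example, with the UNLOAD utility)
-- reload the table (for example, with the LOAD utility)
ALTER TABLE ORDERS
  ALTER COLUMN ORDER_ID SET GENERATED ALWAYS;
```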
Identity Column Enhancements
Dynamic alter of Identity column attributes
ALTER TABLE ALTER COLUMN extended to:
Enable modification of identity column attributes, and
Specify continuation of the sequence associated with identity column from
a new point in the range of values that is different from where the column
values would otherwise have continued
Only future values of column affected by change
Cannot alter data type of identity column
Unused cache values may be lost when column attributes are altered
New keyword support to aid porting from other vendor platforms
NO MINVALUE
NO MAXVALUE
NO ORDER, ORDER
Allows:
INCREMENT BY to be 0
MINVALUE = MAXVALUE
Notes:
In addition to being able to add an identity to a column, the ALTER SQL statement can now
be used to modify the attributes of an identity column. ALTER TABLE ALTER COLUMN has
been extended to allow you to modify the identity column attributes and specify
continuation of the sequence associated with identity column from a new point in the range
of values that is different from where the column values would otherwise have continued.
Making this change only affects future values of the column. Although the keywords can be
changed, the data type of the identity cannot be altered. If caching is in use, values that
have already been cached may be lost when the identity column is altered.
Version 8 adds new keywords to the syntax of the AS IDENTITY clause. New keyword
support now exists for NO MINVALUE, NO MAXVALUE, NO ORDER, and ORDER. The
semantics of INCREMENT BY, MINVALUE, and MAXVALUE have also changed.
INCREMENT BY allows 0, and MINVALUE and MAXVALUE can be equal. The keyword on
the GENERATED clause is also now modifiable via the ALTER SQL statement. You can
dynamically switch between GENERATED ALWAYS and GENERATED BY DEFAULT.
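As a hedged sketch of these Version 8 alterations (the table and column names are hypothetical):

```sql
-- Switch the generation rule of an existing identity column:
ALTER TABLE ORDERS
  ALTER COLUMN ORDER_NO
    SET GENERATED BY DEFAULT;

-- Change identity attributes and continue the sequence from a new point:
ALTER TABLE ORDERS
  ALTER COLUMN ORDER_NO
    RESTART WITH 10000
    SET INCREMENT BY 10;
```

Only values generated after these statements are affected; existing rows are left unchanged.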
Notes:
AS IDENTITY is an attribute of the GENERATED keyword on the CREATE TABLE SQL
statement. With each new version of DB2, identity columns were enhanced.
Version 6
Initially, identity columns only allowed three option keywords. START WITH, if used,
specified the first, or starting, value for the identity column. This value had a numeric data
type and could be any positive or negative value with a zero scale. If not specified, this
keyword defaulted to 1.
The second allowed, but optional, keyword was INCREMENT BY. INCREMENT BY defines
the interval between subsequent sequentially generated values. This keyword allowed any
non-zero positive or negative numeric value with a zero scale within the range of a large
integer. If not specified, this keyword defaults to 1.
The final grouping of keywords all have to do with caching the generated value to help
improve performance. You have two options when dealing with caching:
• The first option turns caching on by specifying CACHE followed by the number of
values that should be cached. Any value greater than 2 within the range of an integer
can be used, and if not specified, the default is 20.
• The other option is to turn caching completely off with NO CACHE. Caching sequence
values in memory is a performance and tuning option that promotes faster access to the
sequence values when the application can handle the behavior. NO CACHE turns off
the caching mechanism. A data sharing environment is one example where NO CACHE
might be specified to avoid the possibility of out of sequence values.
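A minimal sketch of a Version 6 style identity column using the three original option keywords (the table is hypothetical):

```sql
CREATE TABLE CUSTOMER
  (CUST_NO   INTEGER GENERATED ALWAYS AS IDENTITY
               (START WITH 100, INCREMENT BY 1, CACHE 20),
   CUST_NAME VARCHAR(40));
```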
Version 7
So what happened in Version 7? What did DB2 do to improve identity columns? Version 7
added CYCLE, MINVALUE and MAXVALUE to the definition of an identity column. CYCLE
allows an identity to wrap around to a new beginning when the minimum or maximum value
is reached. Cycling through values a second time can create duplicate values. NO CYCLE
caused the identity value to stop being generated when the minimum or maximum was
reached. NO CYCLE is the default.
MAXVALUE specified the maximum value that can be generated for this identity column.
This value can be any negative or positive value that is greater than the MINVALUE.
Having the MAXVALUE greater than the MINVALUE was a Version 7 requirement. If
MAXVALUE is not specified, the default for an ascending sequence is the greatest value
allowed by the data type. For a descending sequence, it is the START WITH value if
specified or –1 if no START WITH was used.
MINVALUE specified the minimum value that could be generated for this identity column.
This value can be any negative or positive value that is less than MAXVALUE. If not
specified, the START WITH value, or 1 if a START WITH value was not specified, is the
default for an ascending sequence. For a descending sequence, the default would be the
lowest value for the data type.
MINVALUE, MAXVALUE, and CYCLE have the same semantics as sequence objects and
are explained again in more detail in the next topic.
All keywords introduced in Version 6 were carried forward into Version 7 with no change in
their definitions.
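A sketch adding the Version 7 keywords to the definition (hypothetical table; cycling re-uses values, so duplicates are possible):

```sql
CREATE TABLE TICKET
  (TICKET_NO SMALLINT GENERATED ALWAYS AS IDENTITY
               (START WITH 1, INCREMENT BY 1,
                MINVALUE 1, MAXVALUE 9999, CYCLE),
   EVENT     VARCHAR(30));
```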
Version 8
Version 8 enhances identity columns by adding additional keywords and changing the
semantics of some of the existing keywords.
ORDER, NO ORDER, NO MINVALUE and NO MAXVALUE are the keywords added to
identity columns in Version 8. NO MINVALUE and NO MAXVALUE also become the
defaults if either MINVALUE or MAXVALUE is not specified.
NO MINVALUE specifies that no minimum end point of the range has been set. The
minimum value for an ascending sequence becomes the START WITH value, or 1 if a
START WITH value was not specified. For a descending sequence, the default would be
the lowest value (the most negative value) for the data type of the column the identity is
assigned to.
NO MAXVALUE specifies that no maximum end point of the range has been set. The
maximum value for an ascending sequence is the greatest value allowed by the data type
for that column. For a descending sequence, it is the START WITH value if specified or –1
if no START WITH was specified.
The semantics of MINVALUE and MAXVALUE have also changed. In Version 8, any
negative or positive value including zero can be specified. Version 8 also allows
INCREMENT BY to be set to zero. In addition, MINVALUE can be equal to MAXVALUE.
ORDER and NO ORDER specifies whether or not the identity values must be generated in
the order of the request. NO ORDER is the default. NO ORDER specifies that the values
do not need to be generated in order of request, while ORDER specifies that the values are
generated in order of request. In data sharing environments where sequence values are
cached by multiple DB2 members simultaneously and the CACHE option is used, the value
assignments may not be in strict numeric order unless you also specify the ORDER option.
Identity Column - ALTER TABLE ... ALTER COLUMN

ALTER TABLE table-name ALTER [COLUMN] column-name
   { SET DATA TYPE altered-data-type | generation-alteration | identity-alteration }

identity alteration: (1)
   RESTART [WITH numeric-constant]
   SET INCREMENT BY numeric-constant
   SET { NO MINVALUE | MINVALUE numeric-constant }
   SET { NO MAXVALUE | MAXVALUE numeric-constant }
   SET { NO CYCLE | CYCLE }
   SET { NO CACHE | CACHE integer-constant }
   SET { NO ORDER | ORDER }

(1) At least one option must be specified, and the same clause must
not be specified more than once.
Figure 3-69. Identity Column - ALTER TABLE ... ALTER COLUMN CG381.0
Notes:
In DB2 Versions 6 and 7, the only ALTER operation that could be performed that involved
an identity column was to add it to a column definition. Once added, or if the table had been
created with an identity column, none of the characteristics of that identity could be changed.
This all changes with the arrival of DB2 Version 8.
The ALTER COLUMN portion of the ALTER TABLE SQL statement has been extended in
Version 8 to include the ability to modify the characteristics of the GENERATED and AS
IDENTITY clauses.
The GENERATED value can be changed by coding SET GENERATED ALWAYS or SET
GENERATED BY DEFAULT as the column-alteration value of the ALTER COLUMN
clause. Because GENERATED has been a part of DB2 for some time now, there will be no
further discussion here.
The next column-alteration value is RESTART WITH. Altering this value changes the
starting point of the next value generated for an identity column. This keyword can be set to
any negative or positive value, including zero, that could be assigned to the column’s data type.
If RESTART is specified without WITH, the sequence is restarted with the START WITH
value the identity column was originally created with. An example of using RESTART WITH
is to correct a gap in sequence numbers possibly caused by the loss of cached sequence
values.
SET INCREMENT BY, SET MINVALUE or NO MINVALUE, SET MAXVALUE or NO
MAXVALUE, SET CYCLE or SET NO CYCLE, SET CACHE or SET NO CACHE and SET
ORDER or SET NO ORDER are additional values that can be specified on the ALTER
COLUMN clause. The values to be SET are discussed in detail later in the topic of the
course describing sequence objects.
Altering an identity column only affects future values of that column and there is limited
validation performed by DB2 when an identity column is altered. Validation is performed
for the value specified for RESTART WITH to ensure that it conforms to the same rules as
for START WITH at the time the column's original definition was created. Keywords and the
values specified for those keywords are also validated.
Any values of an identity column that are not specified on the ALTER are left unchanged.
No validation is performed to verify the effects of altering an identity column on existing
values in that column. For example, if an ascending sequence is changed to descending,
no messages are generated to warn you of the possibility that duplicate values could be
created.
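For example, to close such a gap left by lost cached values (the names and values here are hypothetical):

```sql
-- Suppose MAX(ORDER_NO) is currently 4711 after cached values were lost.
-- Resume generation immediately after the highest existing value:
ALTER TABLE ORDERS
  ALTER COLUMN ORDER_NO
    RESTART WITH 4712;
```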
Notes:
During the design of a new table, it is decided that the primary key should not carry any kind
of business intelligence or meaning. The choice for the key is an ascending
sequential value. At one time this was the easiest part of the process. The difficult task
came when the application actually needed to “generate” that next sequential value for the
INSERT.
It was common practice to SELECT MAX(seq_key) and add one to the value for use in
your INSERT. Of course, the primary key needed to be unique. Somehow, the transaction
had to ensure that no other transaction needing the same key performed any processing
against the table until the current transaction had inserted its row. ISOLATION (RR) could
be used to prevent other transactions from incrementing the key value until the current
transaction could commit.
Another popular method was a single row table that contained the next higher key value.
The application would retrieve the key and update the table with the next key to be used.
This caused serialization within a commit scope. This led to locking conflicts if there was a
high INSERT rate.
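The two pre-sequence techniques can be sketched as follows (table and column names are hypothetical):

```sql
-- Technique 1: derive the next key from the current maximum
SELECT MAX(SEQ_KEY) + 1
  FROM ORDERS;                  -- then use the result on the INSERT

-- Technique 2: a one-row "next key" table
SELECT NEXT_KEY
  FROM KEY_TABLE;
UPDATE KEY_TABLE
   SET NEXT_KEY = NEXT_KEY + 1; -- serializes all inserters until COMMIT
```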
The one-row table approach also has shortcomings in a data sharing environment. The
page containing the counter can easily become a hot spot in the database, resulting in
unpredictable transaction delays caused by buffer invalidation and refresh. This contention
inhibits transaction throughput and the application's processing power. In addition, if one
DB2 member fails, retained locks that are held by the failed member can prevent access to
the shared counter from the surviving members. As you will see, DB2 has a solution
forthcoming.
DB2 Version 6 introduced the column attribute AS IDENTITY. This turned out to be only a
partial solution. Identity columns are tied to a specific table and cannot be used
independently from that table. In addition, a table can have one and only one column that
specifies the identity attribute. Manageability of identity columns becomes an issue
because attributes of an identity cannot be altered in Version 7. So, Version 8 introduced
improvements to identity columns that were discussed in Figure 3-65 "Identity Column
Enhancements" on page 3-106. However, for now, let us focus our attention on something
completely new to DB2 for z/OS.
Notes:
What is the significance of sequences and why are they preferred over the identity
attribute? There are a couple of reasons. First, compatibility with other major database
management systems, including the distributed DB2s, is needed. Sequences became
available in DB2 for UNIX, Windows, and OS/2 in Version 7.2, and the support for
sequences in DB2 for z/OS V8 is part of an effort to make SQL transparent across all DB2s
on all platforms.
Second, sequences are completely stand-alone objects and have no connection to a table.
Because they stand by themselves, they can be used by multiple applications in different
ways. A sequence is a stored object that simply generates the next ascending or
descending value when requested by an application. Sequences provide an excellent way
for an application to obtain unique values for use in key structures.
A sequence is defined using SQL DDL statements and the attributes of the sequence can
be explicitly defined by the user, can use defaults, or a combination of both. The values
generated by a sequence can be SMALLINT, INTEGER or DECIMAL with a zero scale.
The user has complete control over the starting value of the sequence using the START
WITH keyword and can change the starting value via the ALTER SQL statement by
specifying RESTART WITH keyword.
How the sequence is incremented is also under the user control through the use of the
INCREMENT BY keyword. A minimum (MINVALUE) and maximum (MAXVALUE) value
can be specified creating an upper and lower boundary for the sequence range. When
either of the range boundaries is reached, the user can choose to cycle (keyword CYCLE)
through the sequence again or terminate (NO CYCLE) the sequence generation with an
error.
Sequences also provide some performance relief when an application must deal with
generating sequential values for use in key structures. A caching capability (CACHE
keyword) is available that makes the next sequence value more readily available. In
addition, DB2 does not need to wait for an application to commit after incrementing a
sequence before allowing a different transaction to use that same sequence.
Concurrency, as mentioned earlier, has always been an issue when attempting to generate
sequential values for this reason, and sequences can resolve this problem. This also
carries forward into a data sharing environment. If one data sharing member fails, there are
no locks retained on the sequence preventing the surviving DB2 from using it.
Finally, the user has the ability to change (ALTER) any attribute, other than the data type, of
a sequence at any time.
Sequence objects: can use one sequence for many tables, or many sequences in one table
Identity columns: one-to-one relationship between the identity and its table
Notes:
DB2 Version 8 introduces sequence objects. Sequence objects build on the concepts
introduced with the identity attribute in previous versions of DB2 with some significant
enhancements. Sequences are stand-alone objects that generate sequence values when
requested by an application. These values can be used by that application for whatever
purpose the application chooses. However, identity columns are associated with a specific
column in a specific table and can only be used to supply a value for that column.
Uempty VALUE FOR and PREVIOUS VALUE FOR to retrieve the next generated or previously
generated values from the sequence.
These expressions are not allowed against identity columns. To retrieve a value supplied
by an identity, that column should be SELECTed or retrieved with the
IDENTITY_VAL_LOCAL function. The sequence object used can also be displayed as it is
used in an INSERT SQL statement using a SELECT ... FROM FINAL TABLE(INSERT …).
Unlike a sequence, which uses the ALTER DDL statement to modify the sequence
attributes, an identity column’s characteristics can only be altered using the ALTER TABLE
and ALTER COLUMN statements. The ability to alter an identity column’s attributes is an
option introduced in DB2 Version 8. The ALTER SEQUENCE DDL statement cannot be
used with an identity column.
This topic discusses the details behind creating and using sequence objects along with
their advantages over other techniques.
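A brief sketch of the two expressions (the sequence and table names are hypothetical):

```sql
-- Generate the next value directly on an INSERT:
INSERT INTO ORDERS (ORDER_NO, CUST_NO)
  VALUES (NEXT VALUE FOR ORDER_SEQ, 123);

-- Re-use the most recent value generated in this application process:
SELECT PREVIOUS VALUE FOR ORDER_SEQ
  FROM SYSIBM.SYSDUMMY1;
```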
ALTER SEQUENCE
Can be used to change INCREMENT BY, MINVALUE, MAXVALUE, CACHE,
CYCLE, and to RESTART WITH a different sequence value
Only future values affected and only after COMMIT of ALTER
Cannot alter data type of sequence
Unused cache values may be lost
DROP SEQUENCE
COMMENT ON SEQUENCE
GRANT/REVOKE ON SEQUENCE
NEXT VALUE FOR and PREVIOUS VALUE FOR
Notes:
DB2 Version 8 introduces five SQL statements and two expressions in support of sequence
objects.
CREATE
The CREATE SEQUENCE SQL statement creates a sequence object at the application
server. The sequence object is a user-defined object that generates sequential numeric
values. The CREATE is also used to describe the sequence value’s specifications.
ALTER
The ALTER SEQUENCE SQL statement changes the attributes of the sequence object
such as INCREMENT BY, MINVALUE, MAXVALUE, CACHE, CYCLE and the point the
sequence should be restarted at. Only future values are affected and then only after the
ALTER has been committed. Whenever a sequence is ALTERed, there is always the risk of
creating duplicate values, so care should be taken. If uniqueness is critical to the
application, a unique index can be defined for the column using the sequence value.
DROP SEQUENCE
The DROP SEQUENCE statement drops the sequence object and removes a sequence
object description from the DB2 catalog.
COMMENT ON SEQUENCE
This statement allows a user to supply a comment or description for a sequence in the
REMARKS column of the DB2 catalog table SYSIBM.SYSSEQUENCES.
GRANT/REVOKE
This statement is used to grant or revoke the ALTER or USAGE privilege for a user-defined
sequence, or a list of user-defined sequences, from an authorization identifier (auth ID), a list
of auth IDs, or PUBLIC.
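Hedged sketches of these statements, assuming a sequence named ORDER_SEQ and an auth ID USER1:

```sql
COMMENT ON SEQUENCE ORDER_SEQ IS 'Generates order numbers';

GRANT USAGE ON SEQUENCE ORDER_SEQ TO PUBLIC;
REVOKE ALTER ON SEQUENCE ORDER_SEQ FROM USER1;

DROP SEQUENCE ORDER_SEQ RESTRICT;
```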
CREATE SEQUENCE sequence-name [AS data-type]    (default data type: INTEGER)
Notes:
The CREATE SEQUENCE SQL statement creates a sequence object at the application
server. The sequence object is a user-defined object that generates sequential numeric
values. The CREATE statement also describes the sequence value’s specifications. DB2
records the characteristics of a sequence in the catalog tables SYSIBM.SYSSEQUENCES,
SYSIBM.SYSSEQUENCEDEP, and SYSIBM.SYSSEQUENCEAUTH.
The catalog table SYSIBM.SYSSEQUENCEAUTH is new in Version 8. The catalog tables
SYSIBM.SYSSEQUENCES and SYSIBM.SYSSEQUENCEDEP were initially introduced in
Version 6 to support identity columns and extended in Version 8 in support of sequences.
The catalog tables that support sequences are discussed in more detail in Figure 3-94
"Catalog and Directory Changes" on page 3-158.
This statement can be specified in an application program or can be prepared dynamically.
It can be prepared dynamically only if DYNAMICRULES run behavior is implicitly or
explicitly specified. The following keywords can be specified on the CREATE SEQUENCE
statement.
sequence_name
The sequence_name identifies the sequence and is a required value. A schema name can
implicitly or explicitly qualify the sequence name. The sequence name can be up to 128
characters. If explicitly qualified, the schema can also be up to 128 characters and must be
separated from the name portion by a period. The sequence name, when qualified with a
schema name, must be unique at the current server. It is the sequence creator’s
responsibility to ensure that the name chosen for the sequence does not conflict with any
sequence names generated for use with identity columns. The keyword sequence_name is
the only required value. All other keywords are optional. SQL PATH has no effect on a
sequence.
AS data type
The AS keyword specifies the data type of the sequence value. The data type must be one
of the three DB2 numeric data types: SMALLINT, INTEGER, or DECIMAL with a zero
scale. It can also be a user defined distinct type sourced on one of the above three numeric
data types. The sourced type must be the exact same numeric type and must have a zero
scale.
If SMALLINT is specified, the sequences have a range from –32768 through +32767. If
INTEGER is chosen, the range is –2147483648 to +2147483647 and decimal allows
31-digit precision, both negative and positive values.
If the AS keyword is not specified, the default data type used for this sequence is an
INTEGER.
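For example (the sequence names are hypothetical):

```sql
-- Defaults to INTEGER:
CREATE SEQUENCE SEQ_A;

-- Explicit DECIMAL with a zero scale for a larger range:
CREATE SEQUENCE SEQ_B AS DECIMAL(15,0);
```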
START WITH
START WITH is an optional keyword that specifies the sequence’s starting, or first, value.
Because the START WITH keyword does not have to be explicitly specified, the value
specified for MINVALUE or MAXVALUE could be used. For an ascending sequence, the
value from the MINVALUE keyword, or the MINVALUE default if MINVALUE is not
specified, would be used. If the sequence is descending, the value used for the
MAXVALUE keyword, or the MAXVALUE default if MAXVALUE is not specified, is used.
The following rules govern the values that can be specified for the START WITH keyword:
• START WITH can specify any negative or positive value, including zero, that is valid for
the data type stated in the AS data_type keyword
• START WITH cannot contain any non-zero values to the right of the decimal point.
There is no requirement for a sequence to be started within the range defined by
MINVALUE and MAXVALUE when using the START WITH keyword. Any value within the
data type range is allowed regardless of the MINVALUE or MAXVALUE values. The first
value of a sequence always starts with the START WITH value. However, once the
sequence reaches the end of the logical range of values established by MINVALUE,
MAXVALUE, or their defaults, and the CYCLE option is in effect, the sequence wraps
around to the first value of the other end of the range. The first value after being cycled may
or may not be the same as the original START WITH value if the START WITH value is
different than the established MINVALUE or MAXVALUE or their defaults.
INCREMENT BY
INCREMENT BY determines the next value in the sequence. If START WITH is equal to 1
and INCREMENT BY is equal to 2, the sequenced values returned would be 1, 3, 5, 7, etc.,
until the MAXVALUE or maximum value for the data type is reached.
INCREMENT BY can be negative, positive, or zero and cannot contain any digits to the
right of the decimal point. If a negative value is specified (INCREMENT BY < 0), the current
sequence is decremented by the INCREMENT BY value to create the next sequence. If a
positive value is specified (INCREMENT BY > 0), the current sequence is incremented by
this value to create the next sequence. If a zero is used (INCREMENT BY = 0), the same
value as the current value is used for the sequence. This method could be employed to
create a constant, non-changing sequence.
If INCREMENT BY is not specified, the default increment is a positive 1.
MINVALUE / NO MINVALUE
MINVALUE defines the minimum end point range. This is the lowest value this sequence
can reach or the last value of a descending sequence. The last value will always be equal
to or greater than the MINVALUE depending on the value of INCREMENT BY. It is also
the value an ascending sequence will restart at if CYCLE is specified and the MAXVALUE
is reached.
If the MINVALUE keyword is used, a numeric value must be specified. The MINVALUE can
be any positive or negative value, including the value zero, within the data type’s range and
must have a zero scale. MINVALUE must also be less than or equal to the MAXVALUE.
If MINVALUE is not specified and left to default, the default value will vary depending on
whether the sequence is descending or ascending. If the sequence is descending, the
default is the minimum value for the sequence’s data type, which is the most negative
value for the data type. If the sequence is ascending, the default is the START
WITH value specified for the sequence. If the START WITH value is not specified, then
the default MINVALUE is 1 (one).
NOMINVALUE, as a single word, can be used in place of NO MINVALUE.
MAXVALUE / NO MAXVALUE
MAXVALUE specifies the highest value that can be reached by this sequence. Although
this keyword is optional, if used, a value must be specified. DB2 requires the value to be in
the range of the data type being used. If the data type is DECIMAL, the scale must be zero. In
addition, the MAXVALUE must be greater than or equal to the MINVALUE, if the
MINVALUE is specified.
If MAXVALUE is not specified and is allowed to default, the default for ascending values is
the largest value allowed by the sequence’s data type. For a descending value,
MAXVALUE will default to the START WITH value or a -1 if START WITH is not specified.
The MAXVALUE does not necessarily have to be reached within a cycle of values. If an
INCREMENT BY value greater than 1 was used, the MAXVALUE may not be reached
depending on the value specified. For example, suppose that MAXVALUE is set to 50 and
INCREMENT BY is set to 3. On the 17th sequenced value, 49 is returned. The next
increment would take the sequence to 52, a value out of the range for this sequence. This
sequence would then recycle at 49 rather than 50.
If NO MAXVALUE is in effect (the default if this keyword is not used), the maximum value
for an ascending sequence is the largest value allowed by the data type. For a descending
sequence, the START WITH value is used. If START WITH is not specified, then –1 is
used.
NOMAXVALUE, as a single word, can be used in place of NO MAXVALUE.
CYCLE / NO CYCLE
The CYCLE and NO CYCLE keywords determine what the sequence should do if and
when the minimum or maximum value for the range is reached. If CYCLE is chosen,
values will continue to be generated for this sequence after the minimum or maximum
value for the range is reached:
• If a descending sequence reaches the minimum value for the range, the maximum for
the range is generated as the next value in the sequence.
• If an ascending sequence reaches the maximum value for the range, the minimum for
the range is generated as the next value in the sequence.
Processing then starts over at the new first value. This generated new value does not
have to be equal to the START WITH value. An application using this sequence should be
aware that duplicate values can be generated if this sequence is cycled.
NO CYCLE is the default. If NO CYCLE is specified, or neither cycle keyword is specified
and the default is taken, and the MINVALUE or MAXVALUE is reached, an error occurs and no
sequence value is generated. This sequence would have to be altered to continue using
this sequence to generate values, or be dropped and recreated with a different data type
that would allow more values to be generated.
The first cycle always begins with the START WITH value if specified or the default START
WITH value 1 if START WITH is not specified. However, when specifying CYCLE and
MAXVALUE is reached, the sequence restarts at the MINVALUE. It is possible then that
subsequent sequences could have a different number of values than the first sequence.
NOCYCLE, as a single word, can be used in place of NO CYCLE.
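A sketch of cycling behavior when START WITH differs from MINVALUE (the sequence name is hypothetical):

```sql
CREATE SEQUENCE WRAP_SEQ
  START WITH 3
  INCREMENT BY 1
  MINVALUE 1
  MAXVALUE 5
  CYCLE;
-- First cycle: 3, 4, 5
-- Subsequent cycles: 1, 2, 3, 4, 5, 1, 2, ...
```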
CACHE / NO CACHE
The CACHE keyword specifies whether or not sequenced values will be pre-allocated in
memory. This option is used to improve performance in a non-data sharing environment
and in a data sharing environment where the application can tolerate the possibility of out
of sequence values.
The number of cached sequence values must be a positive integer value greater than one.
If neither the CACHE nor NO CACHE keyword is specified, the default is CACHE 20.
Caching sequence values could improve performance because it reduces synchronous I/O
during a sequence request to the DB2 catalog table SYSIBM.SYSSEQUENCES to retrieve
the next sequence value.
Caching is performed independent of cycles. If the MAXVALUE is reached during the last
cache cycle, only values from the current cycle are cached. For example, if CACHE is set
to 20 and only 10 sequence values are created before MAXVALUE is reached, only those
10 values will be cached. The values from the next cycles will be cached on the next cache
request.
Data sharing considerations: Strict sequential order cannot be guaranteed in a data
sharing environment with sequence numbers being cached on multiple members. In data
sharing, each data sharing member has its own cache. However, there is only one
SYSIBM.SYSSEQUENCES to request values from. If we have a 2-way data sharing
system, for example, an application that is running on both DB2 members at the same time
and caching of the sequence is active, each application can alternately request a sequence
on each member and the sequence would be satisfied from that member’s set of cached
sequence numbers. If CACHE is set to 20, each member would have 20 cached values.
Member 1 would have cached 1 through 20 and member 2 cached 21 through 40. If the
application alternated between members, the sequence order would be 1, 21, 2, 22, 3, 23,
4, etc. If strict order is necessary, NO CACHE should be specified with the ORDER
keyword.
NOCACHE, as a single word, can be used in place of NO CACHE.
ORDER / NO ORDER
The ORDER / NO ORDER keywords specify whether or not the sequence numbers must
be generated in order of request.
If ORDER is specified, sequences are generated in order of request. If order is important to
the application, then ORDER should be specified.
If NO ORDER, the default, is specified, the sequence values do not need to be generated
in order of request.
In data sharing environments where sequence values are cached by multiple DB2
members simultaneously and the CACHE option is used, the sequence value assignments
may not be in strict numeric order unless you also specify the ORDER option.
NOORDER, as a single word, can be used in place of NO ORDER.
CREATE Authorizations
To CREATE a sequence object, the privilege set needs:
CREATEIN for the schema, or all schemas
SYSADM or SYSCTRL authorizations
Implicit schema match has CREATEIN privilege
Notes:
To create a sequence object, the privilege set must contain CREATEIN for the schema, or
all schemas, SYSADM or SYSCTRL authorizations. An auth ID that matches the schema
name implicitly has the CREATEIN authority for that schema.
If the data type of the sequence is a distinct type, the privilege set must contain the USAGE
privilege for the distinct type.
CREATE Example
Notes:
This is a simple example showing the format of the CREATE statement with all of the
keywords specified. In this example, we have specified INTEGER even though it is the
default. The sequence will start at 1 and increment by 1 to a maximum value of 5. When 5
is reached, the sequence will cycle back to the minimum value of 0 (zero). Five values will
be kept in the cache to assist with performance. On the first pass, that would be all five
values generated. However, on subsequent cycles, six values will be generated.
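The statement on the original figure is not reproduced here; a sketch that matches the description above (the sequence name is hypothetical):

```sql
CREATE SEQUENCE DEMO_SEQ
  AS INTEGER
  START WITH 1
  INCREMENT BY 1
  MINVALUE 0
  MAXVALUE 5
  CYCLE
  CACHE 5;
-- First pass: 1, 2, 3, 4, 5
-- After cycling to MINVALUE: 0, 1, 2, 3, 4, 5
```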
Sequence Objects - ALTER Statement
ALTER SEQUENCE sequence-name (1)
   RESTART [WITH numeric-constant]
   INCREMENT BY numeric-constant
   { NO MINVALUE | MINVALUE numeric-constant }
   { NO MAXVALUE | MAXVALUE numeric-constant }
   { NO CYCLE | CYCLE }
   { NO CACHE | CACHE integer-constant }
   { NO ORDER | ORDER }

(1) At least one option must be specified, and the same clause must
not be specified more than once.
Notes:
The ALTER SEQUENCE SQL statement changes the attributes of the sequence object.
The attributes that can be modified by ALTER are: INCREMENT BY, MINVALUE,
MAXVALUE, CACHE, CYCLE, and ORDER. At least one sequence attribute must be
specified on the ALTER statement. In addition, ALTER can be used to restart the sequence
at a different point by specifying the RESTART WITH keyword. ALTER can only affect
future values in the sequence and only takes effect after the ALTER has been committed.
Altering a sequence could cause unused sequences in cache to be lost.
Although you can alter selected keywords describing a sequence, the data type of the
sequence cannot be altered. In order to change the data type of a sequence, the sequence
object must be dropped and re-created.
Be cautious when altering the sequence specification. It is easy to create a
non-incrementing sequence. For example, if you set MAXVALUE to 50 and INCREMENT
BY 100, then every increment will exceed the maximum value. DB2 will not give you a
warning or error; for a cycling sequence such as the one in this example, it will simply
assign the minimum value of zero each time.
ALTER Example
COMMIT;
Notes:
In this example, the sequence just created will be modified. It was decided that the original
maximum was a little low, so it is being increased to 1,000,000. The sequence will also no
longer be incremented by one, but rather by one hundred. When the sequence is restarted,
it will restart at 100 rather than the minimum. The minimum, cache size, and cycle values
will all remain the same as the values the sequence was created with.
Also note the COMMIT after the ALTER. The ALTER does not take effect until it has been
committed.
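The ALTER described in these notes might be coded as follows (the sequence name MYSEQ is an assumption):

```sql
-- Increase the maximum, change the increment, and restart at 100
ALTER SEQUENCE MYSEQ
  RESTART WITH 100
  INCREMENT BY 100
  MAXVALUE 1000000;
COMMIT;  -- the ALTER takes effect only after commit
```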
DROP SEQUENCE sequence-name RESTRICT
Notes:
The DROP SEQUENCE statement drops the sequence object and removes a sequence
object description from the DB2 catalog. The sequence combined with an implicit or explicit
schema qualifier must identify an existing defined sequence in the catalog. DROP
SEQUENCE cannot be used to remove a system generated sequence defined to support
an identity column. The default keyword RESTRICT indicates that a sequence cannot be
dropped if certain dependencies exist. Those dependencies are:
• A trigger exists that uses the NEXT VALUE FOR or PREVIOUS VALUE FOR
expression for the specified sequence.
• An inline SQL routine exists so that a NEXT VALUE FOR or PREVIOUS VALUE FOR
expression in the routine body specifies the sequence.
Dropping a sequence object also drops all privileges associated to that object, and all
packages or plans with a dependency on that sequence are invalidated. A DROP
SEQUENCE SQL statement cannot be used to remove an identity column from DB2. In
addition, a DROP TABLE SQL statement has no effect on a sequence object because
there is no direct tie between a sequence object and a table.
COMMENT Statement
Notes:
The COMMENT SQL statement allows a user to supply a comment or description for a
sequence in the REMARKS column of the DB2 catalog table SYSIBM.SYSSEQUENCES.
A comment can be up to 762 characters. The name of an identity column cannot be
specified on the COMMENT SEQUENCE SQL statement.
Sequence is one of many object types that can be specified on a COMMENT. The
COMMENT SQL statement is not new in Version 8.
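A minimal sketch of the statement (the sequence name and comment text are assumptions):

```sql
COMMENT ON SEQUENCE MYSEQ IS 'Generates order numbers for the ORDERS table';
```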
GRANT ALTER or USAGE ON SEQUENCE <sequence-name>, ... TO <authid>, ... or PUBLIC

REVOKE ALTER or USAGE ON SEQUENCE <sequence-name>, ... FROM <authid>, ... or PUBLIC
    BY <authid>, ... or ALL
    RESTRICT
Notes:
The GRANT and REVOKE SQL statements are used for granting or revoking the ALTER
and/or USAGE privilege for a sequence or list of sequences to/from an authorization
identifier (auth ID) or to/from PUBLIC. Two flavors of the GRANT and REVOKE privileges
for sequences exist.
Granting the ALTER privilege over a sequence allows the auth ID specified to modify the
characteristics of the sequence or add a comment describing that sequence. Granting the
USAGE (or SELECT, a synonym for USAGE) privilege over the sequence to an auth ID
allows the auth ID specified to invoke the NEXT VALUE and PREVIOUS VALUE SQL
expressions. In both cases, a list of auth IDs or PUBLIC can be specified in place of the
single auth ID.
Revoking ALTER over a sequence removes the ALTER privilege from the auth ID.
Revoking USAGE (or SELECT, a synonym for USAGE) over the sequence removes the
privilege to invoke the NEXT VALUE and PREVIOUS VALUE expressions. REVOKE can
also remove the ALTER and USAGE privilege from a list of auth IDs or from PUBLIC.
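For example, granting and then revoking these privileges might be sketched as follows (the sequence and auth ID names are assumptions):

```sql
-- Allow two auth IDs to use the sequence in NEXT VALUE / PREVIOUS VALUE
GRANT USAGE ON SEQUENCE MYSEQ TO USER1, USER2;

-- Allow everyone to alter or comment on the sequence, then take it back
GRANT ALTER ON SEQUENCE MYSEQ TO PUBLIC;
REVOKE ALTER ON SEQUENCE MYSEQ FROM PUBLIC;
```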
The REVOKE SQL statement has the additional keyword RESTRICT. The REVOKE will
default to using the RESTRICT keyword if it is not specified. The keyword RESTRICT
prevents the USAGE privilege from being revoked on a sequence if the revokee owns one
of the following objects, and does not have the USAGE privilege from another source:
• A trigger that specifies the sequence in a NEXT VALUE or PREVIOUS VALUE
expression
• An inline SQL function that specifies the sequence in a NEXT VALUE or PREVIOUS
VALUE expression
The sequence name when qualified implicitly or explicitly with a schema qualifier must
uniquely identify a sequence that exists at the current server.
Notes:
A sequence is referenced by using the NEXT VALUE FOR or PREVIOUS VALUE FOR
SQL expressions and specifying the name of the sequence. The keywords NEXTVAL and
PREVVAL can be used as alternatives for NEXT VALUE and PREVIOUS VALUE, respectively.
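For example, a sequence value might be consumed during an insert and then re-examined (the table, column, sequence, and host variable names are assumptions):

```sql
-- Consume the next value during an insert
INSERT INTO ORDERS (ORDERNO, CUSTNO)
  VALUES (NEXT VALUE FOR MYSEQ, 1001);

-- Retrieve the value most recently generated in this application process
SET :ordno = PREVIOUS VALUE FOR MYSEQ;
```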
Important: Once the NEXT VALUE FOR expression is issued, the returned value is
considered consumed. If a ROLLBACK occurs, it has no effect on the values already
generated. Even though the sequence itself always generates unique values (provided
cycling is not allowed) with no gaps, you can still have gaps, for example, because the
transaction performs a ROLLBACK after a new sequence value was obtained. In this
respect, as in many others, sequences behave exactly the same as identity columns.
More information on gaps can be found in Figure 3-91 "Gaps" on page 3-153.
If the next value for a sequence is generated and it exceeds the maximum value for an
ascending sequence or the minimum value for a descending sequence, an error
(SQLCODE -359) occurs if NO CYCLE was specified. The sequence would need to be
modified to enable cycles for the sequence or dropped and recreated with a different data
type that allowed more values for the sequence.
The sequence value returned by the NEXT VALUE FOR expression has the same data
type as the sequence it was issued against.
• An UPDATE statement can specify both expressions on the SET clause. The
PREVIOUS VALUE expression can be specified for any UPDATE statement. However,
the NEXT VALUE expression cannot be specified within the select clause of a full select
of an UPDATE statement.
• A SET host variable statement can use both expressions.
• The VALUES and VALUES INTO statements can use both expressions as long as they
are not specified in the select clause of a fullselect within the expression.
• Both expressions can be used in CREATE PROCEDURE, CREATE FUNCTION, and
CREATE TRIGGER statements.
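As a sketch of the UPDATE and SET host-variable cases above (the table, column, sequence, and host variable names are assumptions):

```sql
-- The SET clause of an UPDATE can reference both expressions
UPDATE ORDERS
   SET ORDERNO = NEXT VALUE FOR MYSEQ
 WHERE ORDERNO IS NULL;

-- SET host-variable statement
SET :lastval = PREVIOUS VALUE FOR MYSEQ;
```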
NEXT VALUE and
PREVIOUS VALUE Restrictions
Join condition of a full outer join
Default value for a column in the CREATE or ALTER TABLE
statement
Generated column definition in a CREATE or ALTER TABLE
statement
Materialized query table definition in a CREATE or ALTER TABLE
statement
Condition of a CHECK constraint
Input value specification for LOAD
CREATE VIEW statement
Notes:
There are a number of SQL situations where the NEXT VALUE and PREVIOUS VALUE
expressions cannot be used. These include:
• Join condition of a full outer join
• Default value for a column in the CREATE or ALTER TABLE statement
• Generated column definition in a CREATE or ALTER TABLE statement
• Materialized query table definition in a CREATE or ALTER TABLE statement
• Condition of a CHECK constraint
• Input value specification for LOAD
• CREATE VIEW statement
In addition, the NEXT VALUE expression cannot be specified in the following places:
• CASE expression
• Parameter list of an aggregate function
• Subquery in a context other than those explicitly allowed
• SELECT statement for which the outer SELECT contains a DISTINCT operator or a
GROUP BY clause
• SELECT statement for which the outer SELECT is combined with another SELECT
statement using the UNION set operator
• Join condition of a join
• Nested table expression
• Parameter list of a table function
• Select-clause of fullselect of an expression in the SET clause of an UPDATE statement
• WHERE clause of the outer-most SELECT statement or a DELETE or UPDATE
statement
• ORDER BY clause of the outer-most SELECT statement
• IF, WHILE, DO ... UNTIL, or CASE statements in an SQL routine
Sequence Examples (1 of 2)
1. Assume sequence created with START WITH 1, INCREMENT BY 1
SELECT NEXT VALUE FOR MYSEQ; Generates Value of 1
SELECT NEXT VALUE FOR MYSEQ; Generates Value of 2
COMMIT;
SELECT PREVIOUS VALUE FOR MYSEQ; Returns most recently
generated value (2)
Notes:
In the first example, the sequence is simply being incremented by one. Each subsequent
SELECT statement generates the next available sequence value. The COMMIT has no effect
on the sequence generation. The last SELECT, performing a PREVIOUS VALUE FOR, still
returns the last value generated by the previous SELECT.
The second example uses the sequence generated as part of the SET clause of an
UPDATE. This is valid for both searched and positioned updates.
In the final example on this page, we use the Version 8 syntax for a select from insert. This
syntax allows us to examine the sequence value generated without having to perform a
separate SELECT.
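The Version 8 select-from-insert mentioned above might be sketched as follows (the table, column, and sequence names are assumptions):

```sql
-- Retrieve the generated sequence value in the same statement as the insert
SELECT ORDERNO
  FROM FINAL TABLE
       (INSERT INTO ORDERS (ORDERNO, CUSTNO)
        VALUES (NEXT VALUE FOR MYSEQ, 1001));
```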
Sequence Examples (2 of 2)
Notes:
Sequences can also be used to spread inserted rows across multiple partitions if used to
supply the partitioning value. By manipulating the INCREMENT BY and START WITH
values and specifying CYCLE, a sequence could be generated that would force each new
inserted row to go to a different partition of a partitioned table space. As an alternative, and
because a sequence is independent and not associated to any particular column, all or part
of the generated sequence value could be used as part of the partition key in an attempt to
spread the rows evenly across multiple partitions of a partitioned table space.
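A sketch of a sequence intended to rotate inserts across, say, four partitions (the name and the partition count are assumptions):

```sql
-- Cycles 1, 2, 3, 4, 1, 2, ... to feed a four-partition partitioning key
CREATE SEQUENCE PARTSPREAD AS SMALLINT
  START WITH 1
  INCREMENT BY 1
  MINVALUE 1
  MAXVALUE 4
  CYCLE
  NO CACHE;
```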
Cursors
OPEN CURSOR
Notes:
Caution should be the rule of thumb if cursors and sequences are used together.
NEXT VALUE
When NEXT VALUE is specified in a SELECT statement, a sequence value is generated for
every row in the answer set. However, if that SELECT with NEXT VALUE is defined as part
of a cursor, the sequence value is generated as each row is retrieved.
This can cause gaps in the sequence if all the rows retrieved are not accessed. When a
client requests rows using a cursor, groups of rows are blocked together and then sent to
the client to complete the fetch process. If all of the rows in the block are not processed by
the fetch at the client, sequence values could be unused, creating a gap in the sequence.
This could be avoided if FETCH FIRST 1 ROW ONLY were specified on the cursor. FETCH
FIRST 1 ROW ONLY will affect performance, though.
So the consequences must be carefully weighed between causing gaps in the sequence
and potentially negatively impacting the performance of the fetch. This issue is important
only if gaps in the sequence cannot be tolerated. If performance and the prevention of gaps
are both important, then NEXT VALUE should not be considered for use with the cursor in
a client application.
PREVIOUS VALUE
In almost all cases, PREVIOUS VALUE should be avoided when working with cursors.
PREVIOUS VALUE does not function the same with a cursor as it might with other SQL
statements. The value returned by PREVIOUS VALUE is the last sequence value
generated prior to opening the cursor. The fetch process after the cursor is opened has no
effect on the value returned by PREVIOUS VALUE; it will always be the same regardless
of the number of fetches executed against a cursor with NEXT VALUE specified.
After the cursor is closed, PREVIOUS VALUE will return the last sequence generated by
the NEXT VALUE expression of the previously opened cursor. There is probably little if any
reason to take advantage of PREVIOUS VALUE when using cursors.
Ranges and Cycles
Sequence can be optionally defined to CYCLE after reaching its
maximum value (or minimum if descending)
Sequence will wrap around to other end of range and start new cycle of
values
Default is NO CYCLE
First cycle always starts with START WITH value but all
subsequent cycles start with MINVALUE (asc sequence) or
MAXVALUE (desc sequence)
If NO CYCLE in effect, sequence value generation will stop when
sequence reaches end of logical range of values
Notes:
Cycles are tied to a range. When the CYCLE keyword is specified and the end of a range is
reached, the sequence cycles back to the beginning of that range. The range itself can be
affected by ALTER/CREATE keywords and by the sequence’s data type.
The data type affects the high value end of a range. The range cannot extend beyond the
highest allowable value within a data type.
If specified, the keywords MINVALUE and MAXVALUE will take precedence if within a data
type’s range. MINVALUE establishes the low end of a range and the point at which the
sequence will start over if CYCLE is used. MAXVALUE is the highest sequence value that
can be generated. When MAXVALUE is reached and CYCLE is specified, the sequence
starts over.
INCREMENT BY can also have an effect on which values are generated within a range.
If INCREMENT BY is a value greater than 1 or less than -1, the full range of sequence
values may not be realized. For example, if a sequence is defined with START WITH set
to 1, a MINVALUE of 1, a MAXVALUE of 10, and an INCREMENT BY of 3, the only values
generated are 1, 4, 7, and 10; values such as 2, 3, and 5 are skipped. The request after
10 is reached would cause this sequence to cycle back to the MINVALUE of 1.
A variation of this problem could be using an INCREMENT BY value that is too large. Using
the same basic setup as above with an INCREMENT BY value set to 10, the next increment
would exceed MAXVALUE every time a value is requested, so the only sequence value
that would ever be generated would be the MINVALUE of 1 because the sequence would
constantly cycle.
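The INCREMENT BY 3 scenario described above could be defined as follows (the sequence name is an assumption):

```sql
-- Generates 1, 4, 7, 10, then cycles back to MINVALUE 1
CREATE SEQUENCE SKIPSEQ AS INTEGER
  START WITH 1
  INCREMENT BY 3
  MINVALUE 1
  MAXVALUE 10
  CYCLE;
```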
Generating Constants
CREATE SEQUENCE consequence AS INTEGER
START WITH 1
INCREMENT BY 0
MINVALUE 0
MAXVALUE 5
CYCLE
CACHE 5
NO ORDER;
Notes:
A constant, non-changing sequence can be generated by specifying an INCREMENT BY
keyword set to 0 (zero). Every sequence generated would be the same as the START
WITH value, the first sequence generated. The MAXVALUE or high value of the data type
could never be reached and a cycle would never be performed for this sequence if the
CYCLE keyword were specified. The START WITH value would have to be within the
MINVALUE and MAXVALUE keyword values.
If START WITH were equal to MINVALUE and MINVALUE were equal to MAXVALUE and
CYCLE were specified, the START WITH value would be generated repeatedly. However, if
NO CYCLE were specified, one sequence value would be generated equal to the START
WITH value and a subsequent attempt to generate a sequence value would receive an error.
This is true even if INCREMENT BY is a non-zero value.
Duplicate Sequences
Sequences are guaranteed to be unique within a cycle
However, duplicates can occur if:
Sequence cycles while data from previous cycles still exist
Sequence is restarted with a value that has already been generated
Ascending/descending direction of sequence is reversed by ALTER
statement (that is, changing INCREMENT BY from positive to negative
number, vice versa)
System crashes followed by COLD START or CONDITIONAL RESTART
and skips forward recovery leaving SYSIBM.SYSSEQUENCES table in
inconsistent state
A point-in-time recovery of SYSIBM.SYSSEQ table space regresses
SYSIBM.SYSSEQUENCES table to point in time, causing
MAXASSIGNEDVAL to become inconsistent with actual current point
of the sequence
Notes:
DB2 will make every attempt to ensure that all sequences generated are unique. However,
this is not always possible. If CYCLE is specified for the sequence, the sequence is
restarted and the same sequence values are generated. If the previous set of values are
still in use, the new set will be duplicates of the previous set.
The same can be true if the RESTART WITH keyword is specified and the sequence is
restarted with a value that is already in use or the direction the sequence is generated is
reversed by altering the INCREMENT BY keyword from positive to negative or negative to
positive. One also must be careful to not recover SYSIBM.SYSSEQUENCES to a prior
point-in-time, forcing the MAXASSIGNEDVAL to get out of sync with the actual values
being cached.
Finally, conditional restart and cold start of the DB2 subsystem do not process all log
records and could leave SYSIBM.SYSSEQUENCES in an inconsistent state. Duplicate
values can be avoided if a unique index is created on the column that will contain the
sequence value. Prevention is accomplished by generating an error and having the insert
process fail for that sequence value.
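The unique-index safeguard mentioned above might be sketched as follows (the table, column, and index names are assumptions):

```sql
-- An insert that would reuse an already-assigned sequence value now fails
-- with a duplicate-key error instead of silently storing a duplicate
CREATE UNIQUE INDEX XORDERNO ON ORDERS (ORDERNO);
```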
Caching
CACHE option allows DB2 to cache preallocated values in memory
for fast access
Reduces synchronous I/O to the catalog table SYSIBM.SYSSEQUENCES
I/O to table only required when cached values are exhausted
Recommended value is 20
Assigning a higher value gives no benefit and increases size of gap should a
failure occur
In data sharing, each member will have its own set of cached values
for a single sequence
Numbers will not be allocated in sequence
Notes:
Allocating the next sequence number is faster with caching than without, because a range
of sequence numbers is preallocated in DB2 memory when caching is in effect. When the
CACHE keyword is specified and the cache is
allocated, the first value of the cache is assigned to the sequence and the
MAXASSIGNEDVAL column of the SYSIBM.SYSSEQUENCES table will contain the last
value in this cache. Subsequent values assigned to the sequence come from the set of
cached values that have not been assigned yet, until all the cached values are exhausted.
There is no update of the SYSIBM.SYSSEQUENCES table while the values from a cache,
except the very first value, are being assigned. When caching is not in effect, each
assignment of a sequence value results in an update of the catalog. With caching in effect,
the catalog is only updated when the cache is refreshed, which minimizes catalog update
activity. Catalog locking should not be an issue
when SYSIBM.SYSSEQUENCES is being updated.
A cache value should be chosen that allows access to more successive sequence values
while minimizing the number of I/Os to the catalog table. Care should be taken, though, to
not choose a number too large. The larger the cache, the greater the number of unused
values could be lost in the event of a system failure. These lost values represent a gap in
the sequence.
The recommended value for the CACHE keyword is 20. Assigning a higher value gives no
benefit and increases the size of the gap should a failure occur. Remember also, that in a
data sharing environment, each data sharing member will use its own cache and set of
cached values for a single sequence.
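The recommended cache setting could be applied as follows (the sequence name is an assumption):

```sql
ALTER SEQUENCE MYSEQ CACHE 20;
```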
Gaps
Gaps are possible if:
Transaction advances sequence and then rolls back
SQL statement leading to generation of next value fails after value generated
NEXTVAL used in SELECT statement of cursor in DRDA where client uses
block-fetch and not all retrieved rows are FETCHed
Sequence or identity column associated with sequence is altered and then
ALTER rolled back
Sequence or identity column table DROPped and then DROP rolled back
SYSIBM.SYSSEQ table space is stopped, leading to loss of unused cache
values
DB2 system failure or shutdown leading to unassigned cache values being lost,
causing gap in sequence
Note that a transaction incrementing a sequence twice may not be
assigned consecutive values
Big gaps could be removed by altering the sequence using the RESTART WITH
parameter
Notes:
A gap is any unintentional break in a sequence. Unintentional here means the gap was not
the result of using the INCREMENT BY keyword of the CREATE or ALTER SQL statement.
If INCREMENT BY specifies a value greater than 1 or less than -1, the generated
sequences, although sequential, will not be without gaps. The gaps that are of concern to
us are the unintentional gaps caused by anything other than the INCREMENT BY keyword.
There are a number of ways gaps might be caused in a sequence. If a transaction is rolled
back after the sequence has been generated, that generated sequence is lost and a gap
will exist. The SQL statement using the sequence is all that is rolled back; the sequence
itself continues to increment forward. Similarly, if an SQL statement fails after the sequence
for that SQL statement has already been generated, the generated sequence is once again
lost and a gap will exist. Another example of how a gap might be created is given in Figure
3-86 "Cursors" on page 3-145, where the use of NEXT VALUE and cursors was discussed.
Generated sequences can also be lost if something happens to the cache. If the cache is
lost, all sequences not used in the cache are also lost, causing a gap. The cache can be
lost by stopping the SYSIBM.SYSSEQ table space, or if the DB2 subsystem is stopped or
the DB2 subsystem crashes. DDL can also negatively affect a sequence. If the
SEQUENCE is dropped or altered and the drop or alter is rolled back, gaps could be left in
the sequence.
Apparent gaps could also result if multiple transactions are processing using the same
sequence, since the values assigned to any one transaction will not necessarily be
consecutive.
Data Sharing
(Figure: the SYSIBM.SYSSEQUENCES entry for sequence SEQ1, with separate cached
value ranges allocated to data sharing members DB2A and DB2B)
Notes:
Caution should be taken using sequence cache in a data sharing environment. If caching is
in effect and multiple tasks cause a sequence to be generated in multiple DB2 members,
each member would cache some number of sequence values. For example, if CACHE
were set to 20, member 1 would cache values 1-20 and member 2 would cache values
21-40. However, each running task would simply take the next value from its particular
cache. So if task A running on member 1 requested the next sequence, it would get the
value 1 from that member’s cache. Then, if another task running on a different member is
next to request a sequence, it would get 21 from its cache. Task A would then get 2, and so
on. The generated sequence values being assigned are not in sequential order.
For data sharing systems, if sequence numbers must be assigned in strict numeric order,
then the NO CACHE option must be used. This consideration does not apply for non-data
sharing subsystems where the assigned numbers will always be in strict numeric order,
since there is only one active cache at any given time.
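Where strict numeric order is required in data sharing, caching can be turned off (the sequence name is an assumption):

```sql
ALTER SEQUENCE MYSEQ NO CACHE;
```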
Recoverability
If DB2 fails, sequence status is recovered from catalog and logs, thereby
guaranteeing unique sequence values continue to be generated
Unassigned sequence values kept in cache of failing members are lost
With efficient use of sequence, gaps can be minimized
DB2 may generate duplicate sequence numbers after restart if no log records
are processed in forward recovery
If there is a gap between first value assigned before system crash and value
assigned after system crash
ALTER sequence to RESTART WITH value that is next value in sequence to be
assigned
DROP and then re-CREATE sequence specifying a START WITH value that is next
value in sequence to be assigned
SELECT MAX(colname) or SELECT MIN(colname) may give actual last assigned value
(colname is column to which sequence numbers were being assigned) -- works only if
every value generated goes into one table -- won't work if CYCLE used
Notes:
If DB2 fails, sequence status is recovered from catalog and logs, thereby guaranteeing that
unique sequence values continue to be generated. However, this does not preclude the
chance that some sequence values can be lost. For example, unassigned sequence values
kept in the sequence cache of failing member(s) are lost. If this happens and you are able
to determine the last sequence used, an ALTER sequence specifying the RESTART WITH
keyword could reset the sequence back to the correct next value. You can also DROP the
sequence and recreate the sequence specifying the START WITH keyword with a value
equal to the next sequence that should be generated. In some instances, it is possible that
duplicate sequences could be generated after a restart if no log records are processed in a
forward recovery. With efficient use of sequence, gaps and duplicates can be minimized.
In some instances it may be possible to determine the last sequence used with a SELECT
MAX(colname) or SELECT MIN(colname) where colname is the column the sequence was
assigned. This test will only work if the sequence is not used in multiple tables and the
sequence is not defined to cycle. If the sequence is used by multiple tables, the test above
would have to be applied to every table and the results compared to determine the actual
last sequence used. If the CYCLE keyword was specified, it is possible that the sequence
has cycled and all values would appear to be used.
Selecting the column MAXASSIGNEDVAL from SYSIBM.SYSSEQUENCES may also give
you inaccurate results. Although it will reflect the last sequence value assigned, that value
may have been assigned to a cache. If a failure occurred and the values still in the cache
were lost, MAXASSIGNEDVAL may be higher than the actual last value used. The best
practice of course, is not to skip any of the recovery steps during a restart, thus minimizing
the opportunities to corrupt the sequence.
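The repair steps described above might be sketched as follows (the table, column, and sequence names, and the example value, are assumptions):

```sql
-- Determine the last value actually used
-- (valid only if the sequence feeds one table and does not cycle)
SELECT MAX(ORDERNO) FROM ORDERS;

-- Suppose the result is 1000: reset the sequence to the next value
ALTER SEQUENCE MYSEQ RESTART WITH 1001;
COMMIT;
```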
Notes:
Here we discuss the use of several catalog tables involved when using sequences.
SYSIBM.SYSSEQUENCES
SYSIBM.SYSSEQUENCES is an existing catalog table in table space SYSIBM.SYSSEQ. It
became a DB2 catalog table in Version 6 in support of identity columns. In Version 8 this table is still used
to track information about identity columns. However, it now also tracks all information
describing a sequence. SYSIBM.SYSSEQUENCES contains rows for both identity
columns and sequences.
Two new columns, PRECISION and RESTARTWITH, have been added to
SYSIBM.SYSSEQUENCES for Version 8.
The PRECISION column records the precision of the numeric data type chosen. For
SMALLINT the precision is 5, for INTEGER the precision is 10, and the actual value is
coded by the user for decimal. This column contains a value only for rows created in
Version 8. Any rows that existed prior to Version 8 will contain a zero.
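The new columns can be examined directly in the catalog; a sketch (the schema name is an assumption):

```sql
SELECT NAME, SEQTYPE, PRECISION, RESTARTWITH, MAXASSIGNEDVAL
  FROM SYSIBM.SYSSEQUENCES
 WHERE SCHEMA = 'MYSCHEMA';
```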
RESTARTWITH contains the RESTART WITH value specified on an ALTER SEQUENCE
DDL statement. This column is set to null prior to altering the sequence object and is set
back to null after the first value is generated after the sequence object has been altered.
A number of other columns in SYSIBM.SYSSEQUENCES have taken on new or additional
meanings and uses after migrating to Version 8. REMARKS has been increased to
VARCHAR (762) and contains the comment specified by the user with the COMMENT ON
SEQUENCE SQL statement. Prior to Version 8 this column was only VARCHAR (254) and
was blank.
The already existing column SEQTYPE has an additional new value, an “S”, to signify that
this row is associated with a sequence object. This column only represented an identity
column in DB2 Version 7.
SYSIBM.SYSSEQUENCESDEP
Currently, the relationship between an identity column and the associated DB2-generated
sequence is recorded in the existing catalog table SYSIBM.SYSSEQUENCESDEP in the
SYSSEQ2 Table Space. Some columns have been added as well. DTYPE describes the
type of object that is dependent on this sequence. The object can be an identity column, an
inline SQL function, or blank for entries created prior to Version 8.
Columns describing the sequence name and the schema associated with sequence have
also been added. The qualifier for the object dependent on this sequence has likewise
been added. In addition, the meaning of DNAME and DCREATOR has changed. DNAME
is now the name of the object that is dependent on this sequence, and DCREATOR is the
owner of that dependent object.
SYSIBM.SYSSEQUENCEAUTH
This new DB2 catalog table added in Version 8 records the privileges ALTER and USAGE
held by a user over a sequence. This table is created in the new table space
SYSIBM.SYSSEQ2.
Scalar Fullselect
What is it? .....
A scalar fullselect is a fullselect, enclosed in parentheses, that returns a
single value
Allows scalar fullselect where expressions were previously supported
Example:
SELECT PRODUCT, PRICE
FROM PRODUCTS
WHERE PRICE <= 0.7 * (SELECT AVG(PRICE)
FROM PRODUCTS);
Benefits .....
Enhances usability and power of SQL
Facilitates portability
Conforms with SQL standards
Notes:
DB2 UDB for Linux, UNIX, and Windows provides support for scalar fullselect. The
introduction of this support in DB2 UDB for z/OS enhances the usability and power of SQL,
allows applications that use scalar fullselects to be ported without changes, and conforms
to the SQL standard.
A scalar fullselect is a fullselect enclosed in parentheses that can be specified in an
expression and returns either a null or a single value. If more than one value is retrieved, it
results in SQLSTATE 21000, SQLCODE -811, and no value is returned.
Extension to Expressions Syntax
expression:
operator
function
(expression)
+ constant
operator:
- column-name CONCAT
host-variable ||
special-register /
labeled-duration *
case-expression +
cast-specification -
(scalar-fullselect)
Notes:
SQL Syntax has been enhanced to allow for scalar fullselects where only expressions were
allowed prior to V8. An example follows.
For each part, find its price and its inventory:
SELECT PART,
(SELECT PRICE FROM PARTPRICE WHERE PART=A.PART),
(SELECT ONHAND# FROM INVENTORY WHERE PART=A.PART)
FROM PARTS A;
PARTS PARTPRICE
PART PROD# SUPPLIER PART _ PROD# SUPPLIER PRICE
WIRE 10 ACWF WIRE 10 ACWF 3.50
OIL 160 WESTERN_CHEM OIL 160 WESTERN_CHEM 1.50
MAGNETS 10 BATEMAN MAGNETS 10 BATEMAN 59.50
PLASTIC 30 PLASTIC_CORP PLASTIC 30 PLASTIC_CORP 2.00
BLADES 205 ACE_STEEL BLADES 205 ACE_STEEL 8.90
PRODUCTS INVENTORY
PROD# PRODUCT PRICE PART PROD# SUPPLIER ONHAND#
505 SCREWDRIVER 3.70 WIRE 10 ACWF 8
30 RELAY 7.55 OIL 160 WESTERN_CHEM 25
205 SAW 18.90 MAGNETS 10 BATEMAN 3
10 GENERATOR 45.75 PLASTIC 30 PLASTIC_CORP 5
BLADES 205 ACE_STEEL 10
Notes:
The visual shows the four tables, PARTS, PRODUCTS, PARTPRICE, and INVENTORY,
used in the examples that demonstrate this enhancement. The column PROD#
links all the tables.
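The "price and inventory per part" query can be reproduced with Python's sqlite3 as a stand-in for DB2. Column ONHAND# is renamed onhand, since '#' is awkward in portable SQL identifiers:

```python
import sqlite3

# Course tables rebuilt in sqlite3; data copied from the visual.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parts (part TEXT, prodno INTEGER, supplier TEXT)")
con.execute("CREATE TABLE partprice (part TEXT, prodno INTEGER, supplier TEXT, price REAL)")
con.execute("CREATE TABLE inventory (part TEXT, prodno INTEGER, supplier TEXT, onhand INTEGER)")
con.executemany("INSERT INTO parts VALUES (?,?,?)",
    [("WIRE", 10, "ACWF"), ("OIL", 160, "WESTERN_CHEM"),
     ("MAGNETS", 10, "BATEMAN"), ("PLASTIC", 30, "PLASTIC_CORP"),
     ("BLADES", 205, "ACE_STEEL")])
con.executemany("INSERT INTO partprice VALUES (?,?,?,?)",
    [("WIRE", 10, "ACWF", 3.50), ("OIL", 160, "WESTERN_CHEM", 1.50),
     ("MAGNETS", 10, "BATEMAN", 59.50), ("PLASTIC", 30, "PLASTIC_CORP", 2.00),
     ("BLADES", 205, "ACE_STEEL", 8.90)])
con.executemany("INSERT INTO inventory VALUES (?,?,?,?)",
    [("WIRE", 10, "ACWF", 8), ("OIL", 160, "WESTERN_CHEM", 25),
     ("MAGNETS", 10, "BATEMAN", 3), ("PLASTIC", 30, "PLASTIC_CORP", 5),
     ("BLADES", 205, "ACE_STEEL", 10)])

# One scalar fullselect per select-list column, each correlated on a.part:
rows = con.execute(
    "SELECT part, "
    " (SELECT price  FROM partprice WHERE part = a.part), "
    " (SELECT onhand FROM inventory WHERE part = a.part) "
    "FROM parts a"
).fetchall()
```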
Scalar Fullselects in a WHERE Clause
PRODUCT PRICE
RELAY 7.55
SAW 18.90
GENERATOR 45.75
Notes:
The example on the visual demonstrates the use of scalar fullselect in the WHERE clause.
Here the scalar fullselect is used as part of a predicate.
SELECT PRODUCT,
       (SELECT COALESCE(SUM(X.COST),0) AS INV_COST
          FROM (SELECT (
                   (SELECT PRICE FROM PARTPRICE P WHERE P.PART = B.PART)
                 * (SELECT ONHAND# FROM INVENTORY I WHERE I.PART = B.PART)
                ) AS COST
                FROM PARTS B
                WHERE B.PROD# = A.PROD#
               ) X(COST)
       )
FROM PRODUCTS A;

PRODUCT      INV_COST
SCREWDRIVER       .00
RELAY           10.00
SAW             89.00
GENERATOR      206.50
Notes:
The example on the visual demonstrates the use of nested scalar fullselects in a SELECT
list. Since the SQL construct is not very easy to follow at a glance, we provide some
explanation. If the AS clause is specified, then the name of the result column is the name
specified on the AS clause. Therefore, in this example, the name of the result column is
COST and is derived by multiplying the values in two columns.
These columns are PRICE retrieved from table PARTPRICE and ONHAND# retrieved from
INVENTORY table. Since the scalar fullselects for these two tables are within the scope of
the SELECT statement for the PARTS table, the columns in the PARTS table can be
referred to in these statements. Since the scalar fullselect for the PARTS table is within the
scope of the SELECT statement for PRODUCTS table, the columns in the PRODUCTS
table can be referred to in the SELECT statement for the PARTS table. The column COST
is passed through the derived table X.
Note, however, that it is not essential to specify the column name COST with the derived
table name X, that is, X(COST) as shown in the example. Merely specifying the name of
the derived table X would also work.
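As a cross-check, the same nested computation can be run under Python's sqlite3 (again a stand-in for DB2). The derived table X(COST) is folded into a single SUM because sqlite3 does not accept a column list after a table alias; ONHAND# is renamed onhand, and the table layouts are trimmed to the columns the query touches:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (prodno INTEGER, product TEXT, price REAL);
CREATE TABLE parts (part TEXT, prodno INTEGER);
CREATE TABLE partprice (part TEXT, price REAL);
CREATE TABLE inventory (part TEXT, onhand INTEGER);
INSERT INTO products VALUES (505,'SCREWDRIVER',3.70),(30,'RELAY',7.55),
                            (205,'SAW',18.90),(10,'GENERATOR',45.75);
INSERT INTO parts VALUES ('WIRE',10),('OIL',160),('MAGNETS',10),
                         ('PLASTIC',30),('BLADES',205);
INSERT INTO partprice VALUES ('WIRE',3.50),('OIL',1.50),('MAGNETS',59.50),
                             ('PLASTIC',2.00),('BLADES',8.90);
INSERT INTO inventory VALUES ('WIRE',8),('OIL',25),('MAGNETS',3),
                             ('PLASTIC',5),('BLADES',10);
""")

# Inventory cost per product: price * onhand summed over the product's
# parts; COALESCE turns the empty-set NULL (SCREWDRIVER) into 0.
rows = con.execute("""
SELECT product,
       COALESCE((SELECT SUM(
                   (SELECT price  FROM partprice p WHERE p.part = b.part)
                 * (SELECT onhand FROM inventory i WHERE i.part = b.part))
                 FROM parts b WHERE b.prodno = a.prodno), 0) AS inv_cost
FROM products a
""").fetchall()
```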
The query works because the scalar fullselect retrieves either a null value (in the case of
product SCREWDRIVER) or, because of the column function SUM, at most one value for the
other products. If, instead, the scalar fullselect specified just the column name COST, it
would work for the first three products and then fail with SQLSTATE 21000 and
SQLCODE -811, since more than one value would be retrieved for product GENERATOR.
CASE expression:

   CASE  searched-when-clause  ELSE NULL                END
         simple-when-clause    ELSE result-expression

searched-when-clause:
   allows, as a search-condition, a predicate that contains
   fullselects (scalar or non-scalar)
Notes:
The visual shows the extended syntax of scalar fullselect in CASE expressions. The
search-condition can be specified as a predicate that contains fullselect.
We now look at an example.
Scalar Fullselects in CASE Expression
Give discount to the parts that have the large inventory and
raise price on the parts that have the small inventory.
Notes:
Here the scalar select participates as part of the search condition of the WHEN clause.
Note that the two SELECT statements make use of the correlation variable N to refer to the
table NEW_PARTPRICE.
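The visual's SQL is not reproduced here, so the sketch below illustrates the same idea with Python's sqlite3: the WHEN search condition contains a correlated scalar fullselect on the inventory table. The 8-unit threshold and the 10% adjustments are invented for the illustration, and the correlation variable is named n to echo the visual:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE partprice (part TEXT, price REAL)")
con.execute("CREATE TABLE inventory (part TEXT, onhand INTEGER)")
con.executemany("INSERT INTO partprice VALUES (?,?)",
                [("WIRE", 3.50), ("MAGNETS", 59.50)])
con.executemany("INSERT INTO inventory VALUES (?,?)",
                [("WIRE", 8), ("MAGNETS", 3)])

rows = con.execute("""
SELECT part,
       CASE WHEN (SELECT onhand FROM inventory i WHERE i.part = n.part) >= 8
            THEN ROUND(price * 0.9, 2)   -- large inventory: give a discount
            ELSE ROUND(price * 1.1, 2)   -- small inventory: raise the price
       END AS new_price
FROM partprice n
""").fetchall()
```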
Scalar fullselect cannot be used in:
A grouping expression
A column function
An ORDER BY clause
Notes:
A scalar fullselect cannot be used in the following instances:
• A CHECK constraint in CREATE TABLE and ALTER TABLE statements.
• A grouping expression, which is limited to a list of column names. Even though you can
now (in V8) code an expression in a GROUP BY (see Figure 3-115 "GROUP BY
Expression" on page 3-192), you cannot specify a scalar fullselect in a grouping
expression.
• A view definition that has a WITH CHECK OPTION.
• A CREATE FUNCTION (SQL) statement (subselect already restricted from the
expression in the RETURN clause).
• An aggregate function.
• An ORDER BY clause.
• A join-condition of the ON clause for INNER and OUTER JOINs.
Notes:
In general, DB2 V7 allows only one DISTINCT keyword in the SELECT or HAVING clause
of any given query.
With DB2 V7 you can have multiple DISTINCT keywords only on the same column, as in:
SELECT COUNT(DISTINCT(A1)), SUM(DISTINCT A1) FROM T1
However, if you specify multiple DISTINCT keywords on different columns, as shown below:
SELECT COUNT(DISTINCT A1), SUM(DISTINCT A2) FROM T1
you receive SQLCODE -127 with the reason "DISTINCT IS SPECIFIED
MORE THAN ONCE IN A SUBSELECT".
This restriction causes inefficient execution of a query when multiple distinct operations
must be performed for multiple columns before any column functions, such as AVG,
COUNT, and SUM are applied. Instead of one query retrieving multiple distinct column
values, multiple queries must be executed in Version 7, causing performance degradation.
DB2 V8 removes this restriction and allows more than one DISTINCT keyword in the
SELECT clause or the HAVING clause of a query. This enhancement is accomplished by
performing multiple sorts on multiple distinct columns. As a result, when multiple distinct
column values need to be processed, only one query is required. This enhancement
supports DB2 family compatibility.
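The lifted restriction is easy to demonstrate outside DB2; the sketch below uses Python's sqlite3 (which already permits this shape) purely to illustrate the query form, with an invented table T1:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (a1 INTEGER, a2 INTEGER)")
con.executemany("INSERT INTO t1 VALUES (?,?)",
                [(1, 10), (1, 10), (2, 20), (2, 30)])

# On DB2 V7 this shape raised SQLCODE -127 (two DISTINCTs on different
# columns); DB2 V8 runs it as a single query.
row = con.execute(
    "SELECT COUNT(DISTINCT a1), SUM(DISTINCT a2) FROM t1"
).fetchone()
# Two distinct a1 values; distinct a2 values 10+20+30 = 60.
```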
Notes:
The DISTINCT keyword can be used at the statement level, as in:
SELECT DISTINCT C1,C2,C3 FROM T1
Or, it can be used at the column level, as in:
SELECT AVG(C1),COUNT(DISTINCT C2) FROM T1
Executing a query with multiple distinct columns can be costly in terms of the number of
sorts performed and work files created. Therefore, whenever possible, some optimization
may be done to the query by eliminating unnecessary DISTINCT keywords if it is
semantically correct to do so. For instance, the following two SELECT statements are
semantically the same after DISTINCT is removed from the first statement:
SELECT DISTINCT COUNT(DISTINCT(A1)), COUNT(A2)
SELECT COUNT(DISTINCT(A1)), COUNT(A2)
The use of this enhancement in DB2 V8 is shown in the following list of queries:
Notes:
Enhancements such as identity columns, ROWID columns, sequences, and triggers have
resulted in more data being inserted into DB2 tables that is not directly inserted by
applications. Instead, either DB2 or the trigger inserts the data. In cases like this, users
need a way to immediately determine the values that have been inserted into a table for
them.
INSERT within the SELECT statement provides this capability, enhancing the usability and
power of SQL. The associated benefits include reduced network costs and simplified
procedural logic in stored procedures.
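The underlying need can be sketched with Python's sqlite3, which exposes only the single generated rowid of the last INSERT via lastrowid; DB2's SELECT FROM INSERT generalizes this to arbitrary columns, trigger-modified values, and multi-row inserts. The emp table below is invented for the illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE emp (empno INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

# The application never supplies empno; the database generates it, and the
# application wants the generated value back without a second query.
cur = con.execute("INSERT INTO emp (name) VALUES ('New Hire')")
generated_empno = cur.lastrowid
```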
INSERT within SELECT Syntax Changes
subselect:

   select-clause  from-clause  where-clause  group-by-clause  having-clause

   The INSERT statement goes in a table-spec of the from-clause.
Notes:
The INSERT statement is now allowed in the FROM clause of:
• A SELECT statement that is a subselect
• A SELECT INTO statement
You should be aware of the following considerations:
• Authorization to INSERT and SELECT from the target object is required. That is, if a
user only has INSERT privileges on a table, the user would not be able to use the
INSERT within SELECT statement in a query to retrieve the rows that are inserted in
the table.
• The INSERT statement can only appear in the FROM clause of the top-level SELECT
statement, that is, a subselect or a SELECT INTO statement.
• A fullselect in the INSERT statement cannot contain correlated references to columns
outside the fullselect of the INSERT statement.
• If a table-spec includes an INSERT, then exactly one table-spec can be specified in the
FROM clause. That is, joins are not allowed.
• The INSERT statement in a SELECT statement makes the cursor read-only. That is,
you cannot use UPDATE WHERE CURRENT OF or DELETE WHERE CURRENT OF
against the cursor.
• If the application declares a non-scrollable cursor and the same application performs a
searched update or searched delete against the target object of the INSERT statement
within the SELECT statement, the searched update or searched delete does not affect
the result table rows of the cursor. For example, suppose that the application declares a
cursor, opens the cursor, performs a fetch, updates the table, and then fetches
additional rows. The fetches after the update statement always return those values that
are determined at the time the cursor is opened.
INSERT within SELECT -
Changes in Table-Reference
from-clause:
,
FROM table-reference
table-reference:
single-table
nested-table-expression
table-function-reference
I-table-reference
joined-table
I-table-reference:
Notes:
The SELECT FROM INSERT statement enhancement provides the applications with the
following facilities:
• Find the value of an automatically generated column.
• Retrieve default values for columns.
• Retrieve column values changed by a BEFORE INSERT trigger.
• Retrieve all values for an inserted row without specifying individual column names.
• Retrieve all values inserted through a multiple-row INSERT.
With the new syntax of having an INSERT statement on the SELECT statement, the rows
inserted into the table are considered to be a result table. Therefore, all of the columns in
this result table can be referenced by name in the select list of the query.
The keywords FINAL TABLE refer to the result table of the INSERT statement.
The result table contains all of the rows inserted and includes all of the columns requested
in the SELECT list. Triggers, constraints, and DB2 generated values affect the result table
in the following ways:
• If the INSERT activates a BEFORE trigger, the values in the result table include any
changes that are made by the trigger. AFTER triggers cannot affect the values in the
result table.
• DB2 enforces check constraints, unique index constraints, and referential integrity
constraints before it generates the result table.
• The result table includes generated values for identity columns, ROWID columns, and
columns based on expressions.
INSERT within SELECT Examples
ROWID
NOT NULL
GENERATED
ALWAYS
DECLARE CS1 CURSOR FOR
SELECT EMP_ROWID
FROM FINAL TABLE
(INSERT INTO EMP_RESUME (EMPNO)
SELECT EMPNO FROM EMP)
Notes:
Retrieving the values for multiple rows being inserted could be done by having the INSERT
statement within a SELECT statement as a subquery. You can use the result table of that
subquery as input to the INSERT statement and then use the outer query to retrieve the
rows that you inserted.
In the first example on the visual, column EMP_ROWID is defined as ROWID NOT NULL
GENERATED ALWAYS in the EMP_RESUME table. In the program, corresponding to the
EMP_ROWID column, the host variable is set up as USAGE SQL TYPE IS ROWID.
The second example on the visual uses the VALUES clause to insert one row using the
values in the host variables into the PROJ table. Column PROJNAME is a NOT NULL
character data type column with the default value of 'PROJECT NAME UNDEFINED'.
The DECLARE CURSOR statement sets up the cursor using the new SQL construct. The
program logic includes, as usual, the OPEN CURSOR statement followed by the FETCH to
retrieve the ROWID value into the host variable set up for the purpose.
order-by-clause:
,
ASC
ORDER BY sort-key
DESC
INPUT SEQUENCE
Notes:
If there is a requirement in the applications to retrieve the rows in the same sequence as
they are inserted, the application may use the INPUT SEQUENCE keywords with the
ORDER BY clause of SELECT.
The INPUT SEQUENCE clause can only be specified if the table-spec is included in a
SELECT statement that contains an INSERT statement.
The example on the next visual illustrates this situation.
INSERT within SELECT - ORDER BY Example
Input:                          Generated EMPNO:
HVA1       HVA2
Liz        555-1212             1
David      555-9876             2
Jessica    555-0110             3
Notes:
The visual shows an example of a multi-row INSERT. HVA1 and HVA2 are host variable
arrays, each representing multiple rows (in this case, three rows), with each array entry
representing a single row.
The program logic includes the OPEN CURSOR CS2 statement followed by the FETCH
FIRST ROWSET FROM CS2 FOR 3 ROWS INTO ... and FETCH NEXT ROWSET FROM
CS2 FOR 3 ROWS INTO :HC1 statements.
The ORDER BY INPUT SEQUENCE clause ensures that the rows are retrieved in the
sequence in which they were inserted.
Trigger Example
Notes:
For example, consider an EMPLOYEE table defined with columns EMPNO, NAME,
SALARY, DEPTNO, TELE, and LEVEL. The column EMPNO holds integer data and is
defined as GENERATED ALWAYS AS IDENTITY.
A BEFORE INSERT trigger is created on this table to give all new employees at level
‘Associate’ a $5000 salary raise.
The example on the visual shows the BEFORE INSERT trigger statement followed by the
SELECT FROM INSERT statement.
Since the value specified for column LEVEL is ‘Associate’, the INSERT trigger is activated
and the salary is raised by $5000 before the row is inserted. Thus, the SELECT statement
returns a salary of $40000.00 for employee ‘New Hire’.
What happens if the INSERT statement or the SELECT fails in the SELECT INTO
statement? No row is inserted into the target table and so no row is returned.
What happens if the INSERT statement fails during OPEN CURSOR processing? If any
row being inserted fails, all the rows that were successfully inserted before the failure
are undone, and the result table is empty.
Notes:
In the example on the visual, assume that when the cursor is opened, five rows are
inserted. The first two FETCHes retrieve the first two rows. COMMIT causes all five
rows inserted to be committed. However, since the cursor is declared as WITH HOLD, the
cursor is still positioned at the second row. When the next FETCH is executed, the third row
is retrieved.
Using SAVEPOINT and ROLLBACK
If application sets a savepoint prior to opening cursor and then
rolls back to that savepoint, all inserts are undone
Example:
Notes:
If an application sets a savepoint prior to opening the cursor and then rolls back to that
savepoint, all of the inserts are undone. The visual shows an example of this situation.
Some Considerations
FETCH FIRST clause
Does not affect which rows are inserted
All of the rows from the INSERT statement will be inserted into the target
object
The result table only contains the rows that satisfy the FETCH FIRST
clause
DECLARE CURSOR
The cursor is always read-only
OPEN CURSOR
SQLERRD3 is set to reflect the effects of the INSERT statement (number
of rows inserted)
Notes:
If the INSERT is used within the SELECT statement of a cursor and the cursor definition
contains the FETCH FIRST clause, all of the rows from the INSERT statement are
inserted, and the outer SELECT result table contains only the rows that satisfy the
FETCH FIRST clause.
The INSERT statement may be defined within the SELECT statement of a scrollable
cursor. If the cursor is defined as ASENSITIVE, a warning is returned indicating that it is
being treated as INSENSITIVE. If SENSITIVE DYNAMIC or SENSITIVE STATIC is
specified, an error is returned. This is because the result table is generated at OPEN
CURSOR time and no further changes are reflected in the result table.
When a cursor associated with an INSERT within SELECT statement is opened, after
executing the INSERT INTO statement that contains the SELECT statement, the
SQLERRD3 field in SQLCA is set to indicate the number of rows inserted into the target
table of the INSERT statement.
GROUP BY Expression
SELECT SUBSTR(CHAR(HIREDATE,ISO),1,3)
CONCAT '0 - 9' AS HIREDECADE,
MIN(SALARY) AS MINIMUM_SALARY
FROM EMPLOYEE
GROUP BY SUBSTR(CHAR(HIREDATE,ISO),1,3) CONCAT '0 - 9'
HIREDECADE MINIMUM_SALARY
1960 - 9 29250.00
1970 - 9 23800.00
Notes:
Use of expressions in the GROUP BY clause is possible in DB2 V8 and thus provides
compatibility with the support existing in DB2 on the UNIX, Windows, and iSeries (AS/400)
platforms.
The example on the visual demonstrates the use of expression in GROUP BY. The
requirement is to find the minimum salary paid to any employee during the decade when
the employees were hired. The hiring decade is determined by extracting the first three
digits from the ISO format of the hiring date and appending the character string '0 - 9'. For
example, 1970 - 9 means that the employee was hired between January 1, 1970 and
December 31, 1979.
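The same grouping runs under Python's sqlite3 if HIREDATE is kept as ISO text ('YYYY-MM-DD'); the employee rows below are invented so that each decade's minimum matches the result shown on the visual:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (hiredate TEXT, salary REAL)")
con.executemany("INSERT INTO employee VALUES (?,?)",
                [("1965-05-16", 29250.00), ("1968-11-02", 31840.00),
                 ("1972-02-14", 23800.00), ("1979-09-30", 40175.00)])

# Group on an expression: the first three characters of the ISO date plus
# the literal '0 - 9' label the hiring decade.
rows = con.execute("""
SELECT SUBSTR(hiredate, 1, 3) || '0 - 9' AS hiredecade,
       MIN(salary) AS minimum_salary
FROM employee
GROUP BY SUBSTR(hiredate, 1, 3) || '0 - 9'
ORDER BY hiredecade
""").fetchall()
```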
Prior to DB2 V8, since the use of expressions in the GROUP BY clause is not permitted,
the query has to be written using a nested table expression.
Qualified Column Names in INSERT and UPDATE
Column names can be qualified with a table name, or a schema
followed by a table name in INSERT
Column names in the SET clause of an UPDATE statement can
be qualified
These enhancements provide for more DB2 family compatibility
For example:
Notes:
DB2 V8 allows column names to be qualified with a table name, or a schema followed by a
table name in INSERT statements. DB2 V8 also allows column names in the SET clause of
an UPDATE statement to be qualified. This facilitates application portability, as DB2 on
the UNIX and Windows platforms already provides this support.
Consider the following examples:
• A correlation name is not specified for T1 (table or view), and T1 is used as the qualifier.
This is allowed:
UPDATE T1 SET T1.C1 = C1 + 10 WHERE C1 = 1
• A correlation name 'T' is specified for T1, and it is used to qualify the column name. This
is allowed:
UPDATE T1 T SET T.C1 = C1 + 10 WHERE C1 = 2
• A correlation name 'T' is specified for T1, but it is not used to qualify the column name.
Instead, T1 is used as the qualifier, but it is not exposed because of the correlation
name. This results in SQLCODE -206 being returned:
Notes:
By definition, a null value is unknown and this makes it unequal to all other values,
including other null values. The only way to test for null values is to use the IS NULL
predicate, as in "WHERE col IS NULL". A predicate of the form "WHERE col = :hv :nullind"
never matches a null value in "col", even if the host variable "nullind" contains a null
indicator. This is not intuitively obvious. For example: Assume that you are trying to select
rows where the value of a column P1 is the null value. You code WHERE P1 IS NULL.
Taking it a step further, to compare two expressions to see if they are equivalent or both
null, an application would currently have to use a compound search condition as follows:
( expr1 = expr2 ) OR ( expr1 IS NULL AND expr2 IS NULL )
For example, if you want to select rows for which the city value in one table is the same as
the value in a host variable and null values are considered the same, you might code it like
this:
CITY = :CT :ctind
However, the search condition above is never true when the value in :CT is null, even if
the host variable "ctind" contains a null indicator, because one null value does not equal
another. Instead of the simple predicate above, two predicates would need to be coded,
one to handle the non-null values and another to handle the null values:
CITY = :CT OR ( CITY IS NULL AND :CT :ctind IS NULL )
With the introduction of the IS NOT DISTINCT FROM predicate, the search condition could
be simplified as follows:
CITY IS NOT DISTINCT FROM :CT :ctind
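sqlite3's IS operator plays the same null-safe role as DB2 V8's IS NOT DISTINCT FROM, so the behavior can be sketched in Python as follows (the table T and its CITY column are invented for the illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (city TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("PARIS",), (None,)])

# '=' never matches a NULL; a null-safe comparison (IS here, IS NOT
# DISTINCT FROM in DB2 V8) treats two NULLs as matching.
eq_rows = con.execute("SELECT COUNT(*) FROM t WHERE city = ?",
                      (None,)).fetchone()[0]
is_rows = con.execute("SELECT COUNT(*) FROM t WHERE city IS ?",
                      (None,)).fetchone()[0]
```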
EXPLAIN STMTCACHE
Enhancements to the EXPLAIN statement allow you to obtain
EXPLAIN information for entries in the DB2 global statement
cache
STMTTOKEN token-host-variable
string-constant
Notes:
A new clause is added in the EXPLAIN statement in order to get the information about
cached statements. The visual shows the syntax.
• STMTCACHE STMTID id-host-variable or integer-constant
Specifies that the cached statement associated with the statement ID contained in host
variable id-host- variable or specified by integer-constant is to be explained. The
statement ID is an integer that uniquely determines a statement that has been cached
in dynamic statement cache. The statement ID can be retrieved through IFI monitor
facilities from IFCID 316 or 124 and is shown in some diagnostic IFCID trace records
such as 172, 196, and 337.
The column QUERYNO is given the value of the statement ID in every row inserted into
the plan table, statement table, or function table by the EXPLAIN statement.
• STMTCACHE STMTTOKEN token-host-variable or string-constant
Specifies that all cached statements associated with a statement token contained in
host variable token-host-variable or specified by string-constant are to be explained.
The statement token must be a character string not longer than 240 characters. This string is
associated with the cached statement by the application program that originally
prepares and inserts the statement into the cache. The application can do this by using
the RRSAF SET_ID function, or by using the sqleseti API from a remotely-connected
program.
For every row inserted into the plan table, statement table, or function table by the
EXPLAIN statement, the column STMTTOKEN (which is newly added column) is given
the value of the statement token, and the column QUERYNO is given the value of the
statement ID for the cached statement with the statement token.
Since column QUERYNO in the plan table, statement table, or function table can now
contain values of statement IDs, it may not be unique for each statement in the table. The
user can tell if a row in the plan table, statement table, or function table is for a cached
statement by checking the value of the PROGNAME column: If PROGNAME has the value
"DSNDCACH", then the row is for a cached statement.
When EXPLAIN is executed, the statement with specified ID/token must still be in the
cache. Otherwise, this EXPLAIN statement fails, returning SQLCODE -20248.
Important: When DB2 explains a statement from the statement cache, it writes the
current access path that is used by that statement in the PLAN_TABLE. The statement is
NOT going through access path selection again when the STMTCACHE option is used
on the EXPLAIN statement (This is different from normal EXPLAIN processing of
dynamic SQL statements). Going through access path selection again would defeat the
purpose, as we try to find out what the current access path of a statement in the cache is.
Determining the access path again may result in a different access path, as statistics for
example may have changed in the meantime.
Authorization
When an EXPLAIN STMTCACHE statement is used to explain a cached statement, the
application process must have the authority that is required to share the cached statement
or the process must have SYSADM authority.
If an application process tries to explain a cached statement but it is not authorized to
explain the cached statement, no row is added in the EXPLAIN tables.
Examples
1. Explain the cached statement with statement ID 124:
SID = 124;
EXEC SQL EXPLAIN STMTCACHE STMTID :SID;
2. Explain the cached statement with statement token 'SELECTEMP':
EXEC SQL EXPLAIN STMTCACHE STMTTOKEN 'SELECTEMP';
READ ONLY USING UPDATE LOCKS
Notes:
Depending on the WebSphere deployment options, the persistence layer (which interacts
with the database on behalf of an entity bean) currently uses ISOLATION(RS) to retrieve
one or more rows with the FOR UPDATE KEEP UPDATE LOCKS clause when loading the
WebSphere entity beans. The Java application is then allowed to perform updates on those
beans, and the updates are subsequently sent to DB2 as searched UPDATE statements.
WebSphere cannot do a positioned update because after the row is read in, the cursor is
closed. WebSphere uses this approach to minimize the number of open cursors at runtime.
In Version 7, you can only specify the KEEP UPDATE (or EXCLUSIVE) LOCKS clause in
combination with the FOR UPDATE clause. Specifying the FOR UPDATE clause causes
DRDA to use a separate network flow for each operation (OPEN, FETCH, ..., CLOSE)
because the cursor may appear in an UPDATE or DELETE WHERE CURRENT OF
(positioned delete).
DB2 V8 is able to obtain exclusive locks with a FOR READ ONLY query. This allows the
JDBC driver and DDF to use block fetch for the SELECT (eliminating the extra network
messages required with a FOR UPDATE query), while still obtaining and holding the locks
WebSphere needs for the searched UPDATE statement. This will provide significant CPU
and elapsed time improvements.
Although your “hand-coded” application can immediately take advantage of this
enhancement once you migrate to DB2 V8, at the time of writing, WebSphere has not yet
implemented this enhancement for the persistence code they generate.
New Built-in Functions
Encryption related functions:
ENCRYPT_TDES: Encrypt a column in a table with a user-provided encryption
password
ENCRYPTION PASSWORD special register
DECRYPT_BIN, DECRYPT_CHAR, DECRYPT_DB
GETHINT: obtain hint to help remember the ENCRYPTION password
Generate unique values:
GENERATE_UNIQUE creates a CHAR(13) FOR BIT DATA value, unique across
a Sysplex
Obtaining the values of session variables:
GETVARIABLE (discussed later in session variables)
Character based string functions
Notes:
As in previous versions, DB2 Version 8 comes with a number of additional built-in
functions. They are described in the following topics.
Encryption Functions
Functions ENCRYPT_TDES (or ENCRYPT), DECRYPT_BIN, DECRYPT_CHAR, and
GETHINT are added. The SET ENCRYPTION PASSWORD statement allows the
application to specify a password.
These functions allow you to encrypt data at the column level. Because you can specify a
different password for every row that you insert, you can really encrypt data at the “cell”
level in your tables. Make sure to have a mechanism in place to manage the passwords
that are used to encrypt the data. Without the password, there is absolutely no way to
decrypt the data.
However, to facilitate remembering the password, you have an option to specify a hint (for
the password) at the time you encrypt the data. The example below shows how to insert
data that needs to be encrypted (a social security number in this case) using a password
and a hint. The row is then retrieved, first using the wrong password. As you can see, the
data is not readable. Then we obtain the hint that we used when we inserted the data to
help us remember the real password, and then retrieve the row using the correct password.
The hint string is stored with the data in the column (ssn in the example). In addition to the
length of the data that you are storing, encryption adds an additional 24 bytes; using a hint
adds another 32 bytes. Then the result is rounded up to the next 8 byte boundary. Make
sure to take this into account when specifying the length of your encrypted columns.
CREATE TABLE EMP (SSN VARCHAR(124) FOR BIT DATA);
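The sizing arithmetic above can be captured in a small helper. This is a sketch of the rule as stated in the text (24 bytes for encryption, 32 more for a hint, rounded up to an 8-byte boundary), not an IBM-supplied formula:

```python
def encrypted_length(data_len: int, with_hint: bool = False) -> int:
    # data length + 24 bytes of encryption overhead, plus 32 if a hint
    # is stored, rounded up to the next 8-byte boundary.
    total = data_len + 24 + (32 if with_hint else 0)
    return (total + 7) // 8 * 8

# An 11-byte social security number such as '480-93-7558':
no_hint = encrypted_length(11)           # 11 + 24 = 35, rounded up to 40
with_hint = encrypted_length(11, True)   # 11 + 24 + 32 = 67, rounded up to 72
```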
decrypting all the row values for a column, so it should be avoided, or at least tuned
appropriately.
If an index exists on the ssn column in the following example, DB2 would be able to use the
index (provided that the optimizer decides to use it). In this case the provided string is
encrypted and can then be evaluated against the index. As mentioned before, this only
works for equal predicates, not range predicates; they require decryption before they can
be evaluated.
set encryption password = 'a1b2c3';
select projectName
from empProject
where ssn = encrypt('480-93-7558');
For the example above to work, all rows in the ssn column must be encrypted using the
same encryption key. In addition, SQL statements that use the encryption functions are
considered multiple CCSID set statements. For more information, see Figure 5-42 "Multiple
CCSID Set per SQL" on page 5-92.
Encryption, by its nature, will slow down most SQL statements. If some care and discretion
are used, the amount of extra overhead can be minimized. Encrypted data can have a
significant impact on your database design. In general, you want to encrypt a few very
sensitive data elements in a schema, like social security numbers, credit card numbers,
patient names, etc.
Note: Built-in functions for encryption and decryption require cryptographic hardware in a
cryptographic coprocessor, cryptographic accelerator, or cryptographic instructions. You
must also have the z/OS Cryptographic Services Integrated Cryptographic Service
Facility (ICSF) software installed.
This function differs from using the CURRENT TIMESTAMP special register in that a
unique value is generated for each row of a multiple row insert statement or an insert
statement with a fullselect.
-- Using GENERATE_UNIQUE()
-- The result is unique but not very readable
SELECT GENERATE_UNIQUE() FROM SYSIBM.SYSDUMMY1;
---------+---------+---------+---------+---------+---------+--
.[ZE¾>ì.-....
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100
Character-based String Functions
Additional built-in functions and changed existing functions to
add "character" based processing, instead of only "byte" based
processing
CHARACTER_LENGTH ( string-expression , { CODEUNITS32 | CODEUNITS16 | OCTETS } )
Notes:
A single character in Unicode data stored in DB2 can occupy multiple bytes. In fact, a
single character in UTF-8 can be one, two, three, or four bytes, depending on the
character that you are trying to represent. Today’s (V7) string functions are byte
oriented; therefore, using the existing functions on multiple-byte character set data can
be challenging.
An example to illustrate this is shown below.
CREATE TABLE TESTB (COLB CHAR(20) NOT NULL) CCSID UNICODE;
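The same byte-versus-character distinction can be demonstrated outside DB2. This is an illustrative Java sketch, not DB2 code: it counts a string of paragraph signs both as UTF-8 bytes (what OCTETS counts) and as code points (what CODEUNITS32 counts):

```java
import java.nio.charset.StandardCharsets;

public class CodeUnitsSketch {
    public static void main(String[] args) {
        String s = "\u00a7\u00a7\u00a7";   // three paragraph signs
        // Byte count, as the OCTETS option would count:
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 6
        // Code point count, as the CODEUNITS32 option would count:
        System.out.println(s.codePointCount(0, s.length()));           // 3
    }
}
```

Each paragraph sign is two bytes in UTF-8, so a byte-oriented length function reports 6 where a character-oriented one reports 3.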
Note that there is no special option to work with UTF-8. Counting in CODEUNITS32, and
counting UTF-8 multi-byte character set code points, both return the same result. Therefore
you can use the CODEUNITS32 option when dealing with UTF-8.
Note: It is good to know that UTF-8 data does NOT have to be converted to UTF-32 to
count in CODEUNITS32. However, it does have to be converted to UTF-16 to count in
CODEUNITS16.
The next example shows how to use the new SUBSTRING function to obtain the correct
result on the same data that we used in our previous example.
SELECT SUBSTRING(COLB,1,3,CODEUNITS32) AS RESULT FROM TESTB;
---------+---------+---------+---------+---------+---------+---------+-
RESULT
---------+---------+---------+---------+---------+---------+---------+-
§§§
Note that the query now correctly returns the three requested paragraph signs.
The specification of CODEUNITS on a built-in function does not affect the result data type
of the built-in function. However, the specification of CODEUNITS may affect the result as
shown in the examples above.
Implicit in the specification of CODEUNITS32 or CODEUNITS16 is conversion to Unicode,
if necessary, to evaluate the function. This is not the case when using OCTETS. In that
case the evaluation of the built-in function is done in the encoding of the input string. If you
want to evaluate a string in Unicode UTF-8 octets, and the string is not already Unicode
UTF-8, you need to cast the string to Unicode explicitly.
The CAST function is also enhanced. It now also allows the specification of
CODEUNITS32, CODEUNITS16, or OCTETS. The specification of this keyword indicates
how DB2 is to count the resulting string. If CCSID and length are specified, the conversion
to the correct CCSID is done first, before evaluating the length.
Assume an application needs to cast an EBCDIC string to Unicode UTF-8. The string
contains the word Jürgen, which is 6 bytes in EBCDIC, but 7 bytes in Unicode UTF-8. To
make sure the data is not truncated, we specify CODEUNITS32 in the length clause of the
cast function:
SELECT CAST('Jürgen' AS VARCHAR(6 CODEUNITS32)) FROM SYSIBM.SYSDUMMY1;
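A standalone Java sketch (purely illustrative, not DB2 code) shows the same 6-character, 7-byte arithmetic for this string:

```java
import java.nio.charset.StandardCharsets;

public class CastLengthSketch {
    public static void main(String[] args) {
        String name = "J\u00fcrgen";   // "Jürgen"
        // In UTF-8 the u-umlaut takes two bytes, so the 6-character string
        // needs 7 bytes; a length of 6 CODEUNITS32 counts characters and
        // therefore holds the full name, while 6 octets would truncate it.
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length); // 7
        System.out.println(name.codePointCount(0, name.length()));        // 6
    }
}
```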
Session Variables
Variables set by DB2, connection or sign-on exit
Built in function to retrieve value for a variable
Use function in views, triggers, stored procedures, and constraints to
enforce security policy
Can have more general, flexible access checks
Multiple columns, AND/OR logic, and so forth
Complements other security mechanisms
Notes:
Session variables are similar to special registers in many respects. Session variables are
an additional way to provide certain information to applications. Session variables can be
referenced by SQL statements just like special registers. They can be used in views,
triggers, stored procedures, or constraints to enforce security policies, for example.
You have to use the GETVARIABLE built-in function to retrieve the values of a session
variable. Version 8 supports two types of session variables, which we now describe.
• SYSIBM.PLAN_NAME
• SYSIBM.SECLABEL
• SYSIBM.SYSTEM_NAME
• SYSIBM.SYSTEM_ASCII_CCSID
• SYSIBM.SYSTEM_EBCDIC_CCSID
• SYSIBM.SYSTEM_UNICODE_CCSID
• SYSIBM.VERSION
Notes:
Four new special registers are added. These special registers are CURRENT
CLIENT_ACCTNG, CURRENT CLIENT_APPLNAME, CURRENT CLIENT_USERID, and
CURRENT CLIENT_WRKSTNNAME. The information is provided through a number of
application programming interfaces.
These special registers were also added to DB2 for Linux, UNIX, and Windows V8, and so
provide for DB2 family compatibility.
For more information about the CURRENT PACKAGE PATH special register, see Figure
7-23 "CURRENT PACKAGE PATH Special Register" on page 7-45.
Another new special register is CURRENT SCHEMA. The CURRENT SCHEMA, or
equivalently CURRENT_SCHEMA, special register specifies the schema name used to
qualify unqualified database object references in dynamically prepared SQL statements.
The data type is VARCHAR(128). The usage of CURRENT SCHEMA is described in more
detail in Figure 7-22 "SET [CURRENT] SCHEMA" on page 7-42.
Transparent ROWID
Notes:
DB2 currently requires a ROWID column to be included in tables which have a LOB column
(one or more). In V8, DB2 will generate a “hidden” ROWID column if a ROWID column is
not present. This column will not be included in a SELECT * (but you can select its content
by specifying the name explicitly). These changes make porting of applications with LOBs
simpler.
The example below illustrates this concept. As you can see, there is no ROWID column in
the definition of the LOB_TEST table, even though it contains a LOB column (resume).
CREATE DATABASE BSDBLOB ;
CREATE TABLESPACE BSTSLOB IN BSDBLOB;
CREATE TABLE LOB_TEST
( EMPNO CHAR( 06 ) NOT NULL,
RESUME CLOB( 1K ) )
IN BSDBLOB.BSTSLOB
CCSID EBCDIC;
CREATE LOB TABLESPACE BSTSLOBC
IN BSDBLOB
LOG NO;
CREATE AUX TABLE AUX_LOB_TEST
IN BSDBLOB.BSTSLOBC
STORES LOB_TEST
COLUMN RESUME;
CREATE UNIQUE INDEX XAUX_LOB_TEST
ON AUX_LOB_TEST ;
INSERT INTO LOB_TEST VALUES ('1234', 'MY RESUME IS NOT LONG');
SELECT
SUBSTR(NAME,1,30) AS NAME
,COLTYPE
,HIDDEN
FROM SYSIBM.SYSCOLUMNS
WHERE TBNAME = 'LOB_TEST';
---------+---------+---------+---------+---------+---------+
NAME COLTYPE HIDDEN
---------+---------+---------+---------+---------+---------+
EMPNO CHAR N
RESUME CLOB N
DB2_GENERATED_ROWID_FOR_LOBS ROWID P
List of Topics
IBM DB2 Universal Driver for SQLJ and JDBC
Notes:
In this unit, we describe enhancements related to e-business applications. This general
term covers a lot of different areas, and DB2 V8 includes many enhancements that can
make e-business applications better.
The e-business world is very much a Java world, and DB2 makes great strides in this area
with the introduction of a new JDBC (and SQLJ) driver for the DB2 Family; the so-called
IBM DB2 Universal Driver for SQLJ and JDBC.
On the same note, SQLJ is very important to DB2 for z/OS, because it brings to the Java
world the static SQL model that has served us so well over the years in terms of
performance and security. With the tooling and run-time support in place, SQLJ becomes a
very attractive alternative to JDBC on the mainframe platform.
In December 2003, IBM introduced a new no-charge feature of DB2 V7 (and V8) called
z/OS Application Connectivity to DB2 for z/OS and OS/390. Its goal is to remove the
prerequisite of having a DB2 on the same LPAR as WebSphere for z/OS (even when that
DB2 only serves as a gateway to remote DB2 data) but it can do more than just that, as we
describe in this unit.
With XML becoming more and more prevalent in enterprises, especially when exchanging
information between enterprises, DB2 Version 8 integrates so-called XML publishing
functions into the DB2 engine. These functions are based on the emerging SQL/XML
standard. This is the first step DB2 for z/OS takes to more tightly integrate XML into the
database engine.
And last but not least, we describe the security enhancements in DB2 V8. In an ever
more complex, interconnected e-business society, proper and more granular security
mechanisms are a very valuable DBMS asset. DB2 V8 introduces multilevel security at
the row level.
Notes:
In this topic, we describe the DB2 UDB Universal Driver for SQLJ and JDBC.
We start out with a short review of the different JDBC driver types defined in the JDBC
standard.
Then we introduce the new DB2 Universal Driver for SQLJ and JDBC, why it is introduced,
its architecture, and the benefits it brings to DB2 for z/OS.
We also describe the supported platforms, the licensing, and a few of the migration
considerations.
Lastly, we describe some of the enhanced functionality that comes with the new DB2
Universal Driver.
For more information on JDBC and the DB2 Universal Driver, also see DB2 for z/OS and
OS/390: Ready for Java, SG24-6435 and DB2 Application Programming Guide and
Reference for Java, SC18-7414.
Summary of JDBC Driver Types
Notes:
Before we dive into what is new and enhanced regarding JDBC and SQLJ support in DB2
V8, we provide you with an overview of the existing JDBC driver types based upon the
JDBC 3.0 specification.
Type 1: Drivers that implement the JDBC API as a mapping to another data access
API, such as ODBC. Drivers of this type are generally dependent on a native
library, which limits their portability. The JDBC-ODBC bridge driver is an
example of a Type 1 driver. This is usually a transition solution, and requires
an ODBC driver to work.
Type 2: Drivers that are written partly in the Java programming language, and partly
in native code. Part of the JDBC driver is implemented in Java and uses the
Java Native Interface (JNI) to call a database specific API. Type 2 drivers use
a native client library specific to the data source to which they connect.
Because of the native code, their portability is limited.
Type 3: Drivers that use a pure Java client (100% Java) and communicate with a
middleware server using a database independent protocol, often via TCP/IP
socket calls. The middleware server then communicates the client's requests
to the data source.
Type 4: Drivers that are pure Java and implement the network protocol for a specific
data source. The client connects directly to the data source. In case of DB2,
the DRDA protocol is used to talk directly to the data source.
Note: The number in the driver type has no meaning whatsoever in terms of capability.
Do not assume that, because 4 is greater than 2, a Type 4 driver is better than a Type
2 driver. In fact, a Type 2 driver is almost certain to outperform a Type 3 or Type 4 driver,
because it does not have to route through a network layer. Normally, a Type 2 driver is
the most suitable driver in terms of performance and scalability.
IBM DB2 Universal Driver for SQLJ and JDBC
What is it?
Architected as an abstract JDBC processor
Independent of driver-type connectivity or target platform.
As an abstract machine, driver types become connectivity types
Architecture-neutral JDBC driver for:
Distributed DB2 access (Type 4 connectivity) and
Local DB2 access (Type 2 connectivity)
Figure 4-4. IBM DB2 Universal Driver for SQLJ and JDBC
Notes:
The clients (prior to DB2 V8) for Linux, UNIX, Windows and OS/2 platforms provide the
basis for three distinct products: DB2 Run-Time Client, also known as the Client Application
Enabler (CAE); the DB2 Application Development Client, formerly known as the Software
Development Kit (SDK); and DB2 Connect Personal Edition. DB2 Connect Enterprise
Edition is based on a combination of the UNIX, Windows, OS/2 engine infrastructure and
DB2 Connect Personal Edition. Each product is positioned as either the client for a Linux,
UNIX, or Windows application server, or the client for a z/OS database server.
Prior to V8, access to a Linux, UNIX, and Windows (LUW) server and a z/OS server used
different database connection protocols, for example, DB2RA, DRDA, “net driver”. Each
protocol defines a different set of methods to implement the same functions. To provide
transparent access across the DB2 Family, the database connection protocols are now
standardized.
All of them use the Open Group’s DRDA Version 3 standard, which provides an open,
published architecture that enables communication between applications, application
servers and database servers on platforms with the same or different hardware and
software architectures. This new architecture is called the Universal Client. A deliverable of
this new architecture for Java applications is the IBM DB2 Universal Driver for SQLJ and
JDBC, also known as Java Common Connectivity (JCC).
The Universal Driver is architected as an abstract JDBC processor that is independent of
driver-type connectivity or target platform. The IBM DB2 Universal Driver is an
architecture-neutral JDBC driver for distributed and local DB2 access.
Since the Universal Driver has a unique architecture as an abstract JDBC state machine, it
does not fall into the conventional driver type categories as described in Figure 4-3
"Summary of JDBC Driver Types" on page 4-7.
For the Universal Driver as an abstract machine, driver types become connectivity types.
This abstract JDBC machine architecture allows for both all-Java connectivity (Type 4) or
Java Native Interface (JNI)-based connectivity (Type 2) in a single driver. A single
Universal Driver instance is loaded by the driver manager for both Type 4 and Type 2
implementations. Type 2 and 4 connections may be made (simultaneously if desired) using
this single driver instance.
DB2 Universal Driver Objectives
Provided functions of DB2 UDB for Linux, UNIX, and Windows and
those for z/OS are the same
Improves Family compatibility - true portability
Functionality enhancements for Type 2 and Type 4 drivers
Fully compliant with JDBC 3.0 standard
Provides a full Java application development process for SQLJ
Ease of installation and deployment for Type 4 driver
No dependencies on a runtime DLL
Installation merely a copy of a .jar and .zip file
Reduces the client footprint for Linux, UNIX and Windows platforms
Delivers functional and performance enhancements quicker
Improves trace capabilities
Notes:
The new common runtime environment fulfills the following key requirements:
• Has a single Java driver with a common code base for Linux, UNIX, Windows and
z/OS. The functions provided on DB2 UDB for Linux, UNIX, and Windows, and DB2
UDB for z/OS are the same, not just similar. This largely improves DB2 Family
compatibility. For example, it enables users to develop on Linux, UNIX, and Windows,
and deploy on z/OS without having to make any change, and eliminates the major
cause of today’s Java porting problems.
• Enhances the current API to provide a fully compliant JDBC 3.0 driver, for both a Type 2
and Type 4 JDBC driver. The functionality of the V7 (legacy) SQLJ/JDBC (Type 2) driver
for z/OS will not be enhanced. JDBC 3.0 compliance will only be made available in the
new Universal Driver. (However, the legacy driver will be shipped with DB2 for z/OS V8
for compatibility reasons.) DB2 for Linux, UNIX, and Windows V8 FixPak 3 was the first
to ship with a full JDBC 3.0 compliant Universal Driver.
• Provides a full Java application development process for SQLJ, by:
Java Universal Driver Architecture
[Figure: SQLJ and JDBC applications sit on top of the SQLJ runtime and a single
Universal Driver, which offers Type 2 (local) and Type 4 (direct to DDF) connectivity
to DB2 for LUW and DB2 for z/OS.]
Notes:
Whereas before the DB2 Universal Driver, each DB2 client platform came with its own
driver files for JDBC and SQLJ support, the new driver is a single set of Java Archive (.jar)
and .zip files, which can be used on UNIX, Windows, and z/OS. It is a combined driver
providing both JDBC Type 2 and Type 4 connectivity, depending on the URL used in the
getConnection() call (or the underlying data source definition).
Of course, selecting Type 2 functionality requires that native code has been installed on the
client.
Notes:
The IBM DB2 JDBC Universal Driver is written from the ground up. It is an entirely new
architecture, design, and implementation, and should not be viewed as a follow-on release
of the existing JDBC/CLI drivers, nor of the legacy Type 2 driver for DB2 for OS/390 and z/OS.
You should plan to migrate to the new Universal JDBC Driver gradually, as there may be
subtle behavioral differences from the legacy drivers.
For your legacy applications, it may not always be possible to migrate in a plug-and-play
manner. Those of your applications that are written to be portable according to the JDBC
specification can continue to run under the Universal JDBC driver. However, currently
running JDBC applications that are not written in a portable way, and only run under one
particular driver, may require changes to run under the Universal JDBC driver.
For more information about behavioral differences between the legacy T2 driver for OS/390
and the new JCC driver, refer to the sections “JDBC differences between the DB2
Universal JDBC Driver and other DB2 JDBC drivers” and “SQLJ differences between the
DB2 Universal JDBC Driver and other DB2 JDBC drivers” in the DB2 Application
Programming Guide and Reference for Java, SC18-7414, as well as Figure 11-11 "DB2
Universal Driver for SQLJ/JDBC" on page 11-26.
However, one of the areas where changes are required is related to SQLJ program
preparation. The SQLJ program preparation process is different with the new JCC driver.
For more information, see “Application Development using SQLJ” on page 4-47. The result
of this is that the customization of the serialized profile is different. In order not to force
everybody to rebind all the packages of their existing SQLJ applications, existing serialized
profiles can be “upgraded” to the new format.
db2sqljupgrade Utility
The purpose of the upgrade utility is to upgrade serialized profiles that were customized by
the DB2 for z/OS legacy driver to work with JCC, without having to bind new packages.
This prevents the optimizer from choosing a new access path. This upgrade utility cannot be
used to upgrade a serialized profile that was customized with any other driver.
Notes:
The visual above shows, for DB2 UDB for Linux, UNIX, Windows, and DB2 for OS/390 and
z/OS, which JCC release supports which version of those DB2 systems. We omitted other
DB2 family members here in order to keep it simple.
There is no downlevel support available for UNIX and Windows platforms for Type 4
connectivity. Only DB2 for Linux, UNIX, and Windows V8 systems are supported by the
Universal Driver.
As shown in the figure above, the Universal Driver that ships with DB2 for z/OS Version 8
has been out in the field for quite some time now, and has gone through a number of
versions and test cycles to make sure it is ready for prime time.
JCC Connectivity Options
Notes:
This figure describes the different options available for you to connect your Java
application to a DB2 for z/OS, or DB2 for Linux, UNIX and Windows server, using the IBM
DB2 Universal Driver for SQLJ and JDBC.
Licensing
Notes:
Although, technically speaking, you do not need DB2 Connect to use the Universal Type 4
driver to connect your Java application to DB2 for z/OS, you do need a DB2 Connect
license to be able to use the Type 4 driver to connect to a DB2 for z/OS system.
Beginning in release 1.2 of JCC (DB2 for Linux, UNIX and Windows 8.1.2), JCC requires
an auxiliary license jar file installed to the application classpath in order to enable
connectivity to target servers. The names of the jar files are shown on the visual above.
The meaning of the suffix letters in the license file names, shown in the visual above, is as
follows:
• c = Cloudscape
• i = iSeries
• z = z/OS
• s = SQL/DS
• u = Linux/UNIX/Windows
The availability of the license jar files is checked by the driver at driver load time.
Getting a Connection using JCC
[Figure: the DriverManager loads a single com.ibm.db2.jcc.DB2Driver instance; the URL
determines whether a Type 2 or Type 4 connection is made.]
Notes:
In a Java program, you can establish a connection to a database, either by:
• The DriverManager interface (JDBC 1 java.sql.DriverManager API). This API requires
the application to load the com.ibm.db2.jcc.DB2Driver (in case of the Universal Driver)
and hard code a database URL description in the application code.
• The Datasource interface (JDBC 2 javax.sql.DataSource API). This API is preferred
over the DriverManager interface because the database target connectivity descriptions
are contained in the data source object itself, separate from the application code. The
same application code may be used to connect to any datasource object using its
associated data source properties.
Since the IBM Universal Driver supports both Type 4 and Type 2 connectivity in a single
driver, only a single driver instance is loaded by the driver manager for both Type 4 and
Type 2 implementation. This means that the connectivity type cannot be determined from
the driver name or data source name. Instead, the connectivity type (T2 or T4) is
determined by the URL syntax under JDBC 1, or by a proprietary data source property
named driverType under JDBC 2. This means that
the driverType property is a DB2-specific driver implementation and can only be used if
you are using a DB2 connection.
jdbc:db2:databaseName[:propertyKey=value;...] Type 2
Notes:
The Universal Driver URL syntax differs between the platforms. The following JDBC
database URL syntax is accepted:
• jdbc:db2:databaseName [:propertyKey=value;...]
• jdbc:db2://server[:port]/databaseName[:propertyKey=value;...]
• jdbc:db2j:net://server[:port]/databaseName[:propertyKey=value;...] (to the
Cloudscape Network Server)
You can also pass connection properties as part of the URL, for example:
jdbc:db2://wtscpok.ibm.com:12345/DB8A:user=itsousr;password=itsopwd;deferPrepares=false;
Note: The databaseName is case-sensitive, and must always be upper case for DB2 for
z/OS.
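As an illustrative sketch of the URL syntax above, the string can be assembled in Java. The helper name and its parameters are ours, not part of the JCC API; the host, port, and database values are the ones from the example:

```java
public class JccUrlSketch {
    // Assemble a Type 4 Universal Driver URL in the syntax shown above:
    // jdbc:db2://server[:port]/databaseName[:propertyKey=value;...]
    static String type4Url(String server, int port, String databaseName,
                           String properties) {
        String url = "jdbc:db2://" + server + ":" + port + "/" + databaseName;
        // Each property ends with a semicolon; the property list is
        // introduced by a single colon after the database name.
        return (properties == null || properties.isEmpty())
                ? url : url + ":" + properties;
    }

    public static void main(String[] args) {
        System.out.println(type4Url("wtscpok.ibm.com", 12345, "DB8A",
                "user=itsousr;password=itsopwd;deferPrepares=false;"));
    }
}
```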
Data Source Classes
Notes:
The com.ibm.db2.jcc.DB2SimpleDataSource and com.ibm.db2.jcc.DB2BaseDataSource
classes have also been shipped for the legacy Type 2 driver on DB2 for OS/390 and z/OS.
However, those lack certain additions and refinements that exist in the versions that ship
with the Universal Driver.
Because of that, an interface com.ibm.db2.jcc.DB2JCCDatasource is provided by the
Universal Driver to distinguish between the two differing com.ibm.db2.jcc.DB2DataSource
instances. You can use this interface for your applications to distinguish between instances
of com.ibm.db2.jcc.DB2BaseDataSource as shipped with the IBM DB2 JDBC Universal
Driver from data sources that are shipped with the legacy OS/390 Type 2 driver. If the data
source instance that you use implements this interface, the data source is a “true”
Universal Driver data source instance.
To support XA transactions, the Type 4 Universal Driver in the z/OS environment provides
the required javax.transaction.xa and javax.sql implementations via:
• com.ibm.db2.jcc.DB2XADataSource
• com.ibm.db2.jcc.DB2Xid
• XAConnection
• XAResource
You may want to use javax.sql.XADataSource instead of
com.ibm.db2.jcc.DB2XADataSource. The latter is a DB2 specific driver implementation
and can only be used if you are using a DB2 connection.
You can supply values for the individual keys by either using the
DriverManager.getConnection (String url, Properties properties) method (Example 4-1), or
by encoding them in the URL (for example, in a DataSource definition, or on the
application’s command line). When using WebSphere Application Server, you can specify
all these properties in the WebSphere Administrative console’s setup windows.
Example 4-1. Setting Properties Using DriverManager.getConnection(String, Properties)
import java.net.InetAddress;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
...
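The body of Example 4-1 is elided above. A minimal hedged sketch of the technique, with hypothetical connection values and the actual getConnection() call left commented out because it needs a live DB2 and the JCC driver on the classpath, might look like this:

```java
import java.util.Properties;

public class DriverManagerPropsSketch {
    public static void main(String[] args) {
        // Connection values are hypothetical; substitute your own.
        Properties properties = new Properties();
        properties.put("user", "itsousr");
        properties.put("password", "itsopwd");
        String url = "jdbc:db2://wtscpok.ibm.com:12345/DB8A";
        // With a live DB2 system, the example would continue roughly as:
        //   Class.forName("com.ibm.db2.jcc.DB2Driver");
        //   Connection con = DriverManager.getConnection(url, properties);
        System.out.println(url + " as user " + properties.getProperty("user"));
    }
}
```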
Java API Enhancements
Scrollable cursor support
Batched updates support
Improved Java SQL error information
Java API for set client information (SQLESETI)
Built-in DB2SystemMonitor class (application-driver-network-server)
Native DB2 server SQL error messages
Multiple open result sets
SAVEPOINT support
Auto-generated keys
Enhanced LOB support
Notes:
The DB2 Universal Driver provides more functionality than the older drivers, for example, it
supports scrollable cursors, batch updates, and savepoints, as discussed in the following
topics. The Universal Driver delivers almost full JDBC 3.0 functionality. Next we list some of
the important functional enhancements.
Batched Updates
A batched update is a set of multiple update statements that are submitted to the database
for processing as a batch. Sending multiple update statements to the database together as
a unit can, in some situations, be much more efficient than sending each update statement
separately. It reduces the number of times the application has to cross over to the JDBC
driver. This ability to send updates as a unit, referred to as the batched update facility, is
one of the features of the JDBC 2.0 API, and is now supported with the DB2 Universal
Driver.
Example 4-2 demonstrates how to use batch updates. Note that autocommit should be
turned off when using batch updates. Also, each of the statements in the batch must be
one that returns an update count (for example, a SELECT statement in the batch is not
allowed, and will cause an SQLException to be thrown).
The Universal Driver uses non-atomic batching. Each statement is treated independently. It
is up to you to decide what to do in case one operation in the batch fails.
Example 4-2. Creating and Executing Batched INSERT Statements
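Example 4-2 itself is not reproduced on this page. As a hedged sketch of dealing with the Universal Driver's non-atomic batching, the following standalone Java inspects the int[] that Statement.executeBatch() returns: an update count for each statement that succeeded, or Statement.EXECUTE_FAILED for one that failed. The counts here are simulated rather than obtained from a live connection:

```java
import java.sql.Statement;

public class BatchResultSketch {
    // Count the entries of an executeBatch() result that indicate failure.
    // Under non-atomic batching each statement is treated independently,
    // so a real program would decide per entry how to react.
    static int countFailures(int[] updateCounts) {
        int failed = 0;
        for (int rc : updateCounts) {
            if (rc == Statement.EXECUTE_FAILED) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Simulated result of a four-statement batch; the third failed.
        int[] simulated = {1, 1, Statement.EXECUTE_FAILED, 1};
        System.out.println(countFailures(simulated)); // 1
    }
}
```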
http://www.ibm.com/servers/eserver/zseries/software/java/aboutj2.html
The IBM Java Cryptography Extension (JCE) is required for using encryption. The installed
JCE jar files are:
• ibmjceprovider.jar
• ibmjcefw.jar
• ibmjlog.jar
• US_export_policy.jar
• Local_policy.jar
These jar files are typically installed to jdk\jre\lib\ext.
IBM JGSS, JAAS and JCE are required for using Kerberos. The installed jar files are:
• ibmjgssprovider.jar
• jaas.jar
• ibmjceprovider.jar
• ibmjcefw.jar
• ibmjlog.jar
• US_export_policy.jar
• Local_policy.jar
The authentication technique can be specified by either setting a Java property in the
application, or by recording the required technique in the DB2 DataSource definition.
These jar files are typically installed to jdk\jre\lib\ext.
DSNTIAR message formatter via an SQL CALL statement. No result set is returned. The
output message is returned via a VARCHAR output parameter. This procedure is not used
by ODBC.
For more information, see the DB2 for z/OS and OS/390: Ready for Java, SG24-6435
Redbook.
[Figure: sequence of a monitored interaction. The Java application calls monitor.start(),
then issues prepareStatement() and executeUpdate() through the Universal Driver and
across the network to the DB2 server, and finally calls monitor.stop().]
import com.ibm.db2.jcc.DB2Connection;
import com.ibm.db2.jcc.DB2SystemMonitor;
...
// conn is a com.ibm.db2.jcc.DB2Connection
DB2SystemMonitor monitor = conn.getDB2SystemMonitor();
monitor.enable(true);
monitor.start(DB2SystemMonitor.RESET_TIMES);
// ... prepare and execute the statements to be measured ...
monitor.stop();
// Time spent inside the DB2 server
System.out.println("Server time: " + monitor.getServerTimeMicros());
// Time spent in network I/O between driver and server
System.out.println("Network I/O time: " + monitor.getNetworkIOTimeMicros());
// Time spent inside the core JCC driver
System.out.println("Core driver time: " + monitor.getCoreDriverTimeMicros());
// Total application elapsed time (note: milliseconds, not microseconds)
System.out.println("Application time (ms): " + monitor.getApplicationTimeMillis());
______________________________________________________________________
You can choose either to reset or to accumulate times when starting the monitoring, by
passing DB2SystemMonitor.RESET_TIMES or DB2SystemMonitor.ACCUMULATE_TIMES,
respectively, to the monitor.start() method.
Note that the various getTime() methods may throw an SQLException if the driver cannot
provide the information requested.
DB2SystemMonitor Prerequisites
Although already shipped in an earlier FixPak, the DB2SystemMonitor is only fully
operational with DB2 for LUW Version 8 FixPak 4 or later.
In addition, to be able to get accurate times for the core driver and network I/O times, your
JVM has to be at a certain code level as well. Currently these JVM enhancements have
only been implemented as a feature of IBM’s JVMs, more specifically in:
• JDK 131 SR 5, available now, for non-z/OS platforms
• JDK131 (cm131s), and JDK 141 SR1(cm141) for the z/OS platform
At the time of writing of this publication, the Sun JVM 1.4.1 does not have this feature. If the
DB2SystemMonitor does not find the accurate timer functionality in the JVM, it throws an
SQLException:
com.ibm.db2.jcc.a.SqlException: Network IO time in microseconds is not
available
Or:
com.ibm.db2.jcc.a.SqlException: Core driver time in microseconds is not
available
The ServerTime is based on the DRDA server elapsed time feature that was introduced in
DB2 for z/OS and OS/390 Version 7. As a consequence, DRDA has to be involved in the
transaction to get the ServerTime. If you are running your application local to DB2, you
always get 0 (zero) from the getServerTimeMicros method. This is true both for DB2 LUW
and DB2 for z/OS when running the JCC Driver as a Type 2 driver. When run as a Type 4
driver, JCC always uses DRDA to connect to the server.
Please also note that the ApplicationTime is in milliseconds (using
System.currentTimeMillis), whereas the other times are presented in microseconds.
The SQLJ API allows an application to issue a “new” operation that makes a new copy of
the static cursor that can be used with different host variable input on OPEN. Since a
cursor has to be unique, based on the fully-qualified package name, consistency token,
and section number, it is currently (V7) not possible for an SQLJ application to have more
than one instance of an open cursor. In V7, a second instance or OPEN of the same cursor
would not be allowed in DB2, and would result in SQLCODE -502. Allowing multiple opens
for the same cursor in V8 eases this problem.
SAVEPOINT Support
The DB2 Universal Driver supports the SAVEPOINT mechanism specified in the JDBC 3.0
specification.
If JDBC statement Connection.setSavepoint() is issued, the Universal Driver immediately
executes the SQL statement ’SAVEPOINT savepoint_name ON ROLLBACK RETAIN
CURSORS’. ’RETAIN CURSORS’ is used as default and there is no JDBC or proprietary
API available to drop cursors on rollback to savepoint. This behavior is identical to the
regular usage of savepoints in DB2 for z/OS.
There are, however, two differences:
• You cannot specify the UNIQUE keyword.
• Even though the RETAIN CURSORS attribute is used when rolling back to a savepoint,
a cursor will be invalidated by the server under some circumstances regardless. This
can occur when the cursor relies on DDL that was created after the savepoint and
subsequently rolled back. Currently, such a “ROLLBACK TO SAVEPOINT” request may
not immediately invalidate a cursor that is cached in the driver. If an application is
scrolling through cached cursor data, the driver is not able to determine that the cursor
has been invalidated due to a rollback to savepoint until a server request is made for
more data.
To release a savepoint, you can use the JDBC Connection.releaseSavepoint()
implementation, which immediately executes the SQL statement ’RELEASE SAVEPOINT
savepoint_name’.
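A short sketch of the JDBC savepoint calls described above, with the SQL the driver issues for them shown in comments (the UPDATE statements are illustrative):

```java
// Sketch: JDBC 3.0 savepoints with the Universal Driver,
// using the sample table DSN8810.EMP for illustration.
conn.setAutoCommit(false);
Statement stmt = conn.createStatement();
stmt.executeUpdate("UPDATE DSN8810.EMP SET SALARY = SALARY * 1.05");

Savepoint sp = conn.setSavepoint("SVPT1");
// The driver immediately executes:
//   SAVEPOINT SVPT1 ON ROLLBACK RETAIN CURSORS
stmt.executeUpdate("UPDATE DSN8810.EMP SET BONUS = 0");

conn.rollback(sp);         // undoes only the BONUS update; cursors retained
conn.releaseSavepoint(sp); // driver executes: RELEASE SAVEPOINT SVPT1
conn.commit();
```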
Auto-generated Keys
Like many other database servers, DB2 has a mechanism that automatically
generates a new, unique key value whenever a row is inserted. In the case of DB2, you
declare a column to be an IDENTITY column.
Of course, after inserting a new row into a table containing an IDENTITY column, you
probably want to retrieve the value that DB2 generated for that column (you might need it,
for example, as a foreign key value in a dependent table). You can use the DB2-specific
function IDENTITY_VAL_LOCAL() to retrieve the last value generated.
Beginning with JDBC 3.0, however, there is a mechanism that allows an application to
retrieve the value without using vendor-specific extensions, and this API is supported by
the Universal Driver.
• The Universal Driver throws an exception if you invoke any of the following methods:
- PreparedStatement.execute (String sql, int autoGeneratedKeys)
- PreparedStatement.execute (String sql, int[] columnIndexes)
- PreparedStatement.executeUpdate (String sql, int autoGeneratedKeys)
- PreparedStatement.executeUpdate (String sql, int[] columnIndexes)
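A sketch of the JDBC 3.0 mechanism itself, using Statement.RETURN_GENERATED_KEYS and getGeneratedKeys(); the ORDERS table is hypothetical:

```java
// Sketch: retrieving a DB2-generated IDENTITY value via JDBC 3.0.
// Hypothetical table: ORDERS(ORDERNO INT GENERATED ALWAYS AS IDENTITY,
//                            CUSTNO CHAR(6)).
PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO ORDERS (CUSTNO) VALUES (?)",
    Statement.RETURN_GENERATED_KEYS);
ps.setString(1, "C00001");
ps.executeUpdate();

ResultSet keys = ps.getGeneratedKeys();
if (keys.next()) {
    // The same value IDENTITY_VAL_LOCAL() would return after this INSERT
    long orderNo = keys.getLong(1);
}
keys.close();
ps.close();
```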
Timeline of SQLJ/JDBC
(Figure: timeline showing the ANSI standardization of SQLJ Part 0 and SQLJ Part 2.)
Notes:
SQLJ is a series of specifications for ways to use the Java programming language with
SQL. It was developed by IBM, Oracle, and Tandem to provide an alternative to the
dynamic JDBC specification. SQLJ is not part of J2EE, but is part of the SQL-1999
ISO/ANSI standard.
The SQLJ specification consists of two parts:
• ISO/IEC 9075 Part 10: Object Language Bindings (SQL/OLB). (This specification is
sometimes also referred to as SQLJ Part 0.)
• ISO/IEC 9075 Part 13: Routines and Types Using the Java Programming Language
(SQL/JRT). This part is the specification for SQL routines using Java.
Although JDBC appeared before SQLJ, SQLJ support has been around for a while now. It
provides superior performance, because it uses static SQL (in contrast to JDBC, which
uses dynamic SQL), and it uses a powerful authorization model (like static SQL in other
programming languages). But prior to the Universal Driver, the development and
deployment of SQLJ applications was somewhat cumbersome.
Why use SQLJ?
• Static SQL performance for Java applications
  - Significant performance advantage over JDBC
• Productivity/manageability
  - Less code written by the application programmer
  - Resulting code is easier to maintain
• Static SQL packages for accounting/monitoring
• Static SQL locks in the access path, so that access path changes don't occur
without a conscious choice
Notes:
There are some major differences between SQLJ and JDBC, with a lot of good reasons for
using SQLJ over JDBC for Java application development.
With JDBC (dynamic SQL), users must be granted privileges directly on the tables that are
used by the application. The recipient of those privileges can do anything that is allowed
by those privileges, for example, using them outside the application for which the
authorizations were originally granted. The application cannot control what the user can
do.
Productivity/Manageability
SQLJ code is generally more compact and error-free than JDBC code.
SQLJ is easier to code
The first advantage of SQLJ over JDBC is that SQLJ is easier to code, to read, and to
maintain. This is because SQLJ is not an API but a language extension, providing
better integration of the SQL code with the Java code. The developer can concentrate
on the logic of individual SQL statements without having to worry about wrapping them in
API calls. This simplicity is helped by the ease by which host variables are defined,
maintained, and accessed within an SQLJ program.
As SQLJ statements are coded in purely SQL syntax, without the need to wrap them in a
Java method, the programs themselves are easier to read, making them easier to maintain.
Also, since some of the boilerplate code which has to be coded explicitly in JDBC is
generated automatically in SQLJ, programs written in SQLJ tend to be shorter than
equivalent JDBC programs.
We give some examples on the next couple of pages (Figure 4-17 "Comparing SQLJ and
JDBC Coding" on page 4-44).
SQLJ catches errors sooner
Not only is SQLJ typically more concise and easier to read than JDBC, it also helps you to
detect errors in your SQL statements earlier in the program development process.
JDBC is a pure call-level API. This means that the Java compiler does not know anything
about SQL statements at all — they only appear as arguments to method calls. If one of
your statements is in error, you will not catch that error until runtime when the database
complains about it.
SQLJ, on the other hand, is not an API but a language extension. This means that the
SQLJ tooling is aware of SQL statements in your program, and checks them for correct
syntax and authorization during the program development process.
It also enforces strong typing between iterator columns and host variables. In other words,
it prevents you, for example, from assigning a numeric column to a String host variable.
Common errors that will be caught earlier with SQLJ, but will only be detected at runtime by
JDBC, include these:
• Misspelled SQL statements (for example, INERT instead of INSERT)
The SQLJ translator will catch and report this error. However, the translator does not
parse the entire SQL statement, so most syntax errors will only be detected by the
profile customizer.
(The visual compares SQLJ and JDBC versions of a single-row query; only the JDBC
version is reproduced below.)
String addr; String name;
....
PreparedStatement pstmt = con.prepareStatement(
    "SELECT address FROM emp WHERE name = ?");
pstmt.setString(1, name);
ResultSet names = pstmt.executeQuery();
names.next();
addr = names.getString(1);
names.close();
pstmt.close();
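The SQLJ side of the comparison did not survive in these notes; its SELECT INTO form would look roughly like this (a sketch, not the original slide code):

```java
String addr; String name;
....
#sql { SELECT ADDRESS
       INTO :addr
       FROM EMP
       WHERE NAME = :name };
```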
© Copyright IBM Corporation 2004
Notes:
The visual above, as well as Example 4-6 below, compare how to code a single-row query,
that is, a query returning exactly one row of data.
In JDBC, we have to open a result set, advance it to the next (and only) row, and retrieve
the values using getXxx methods. Also, we have to check that exactly one row has been found.
In SQLJ, on the other hand, we can use the SELECT INTO syntax; an SQLException will
be thrown if more than one row was found.
By the way, the SQLJ version is more efficient as well. JDBC has to make four calls into
DB2 (prepare statement, fetch row, fetch row, close statement), whereas the SQLJ version
only has to do one single SELECT INTO call.
Note: The SQLJ version will only be more efficient when the program has been
customized and bound. If it is running uncustomized, it will emulate SELECT INTO by
using result sets under the covers, just like the JDBC version.
_____________________________________________________________________
As you can see, coding applications in SQLJ requires less code than coding in JDBC.
As a result, SQLJ statements embedded directly in Java programs yield more concise,
easier-to-read code than JDBC. This makes coding and code maintenance easier,
especially when done by somebody who did not write the original code.
Below are some additional coding examples.
Example 4-7 shows a multi-row query. The amount of coding is similar with JDBC and
SQLJ. Note, however, that the binding between statement and host variables in SQLJ is
much tighter than between parameter markers and the setBigDecimal methods in JDBC.
Also, JDBC uses statement handles that must be explicitly closed when you are done with
the statement. In SQLJ, the translator automatically generates the cleanup code for you.
(Iterators must still be closed explicitly, of course.)
Example 4-7. JDBC versus SQLJ: Multi-row Query
JDBC version:

PreparedStatement stmt =
  conn.prepareStatement(
      "SELECT LASTNAME"
    + " , FIRSTNME"
    + " , SALARY"
    + " FROM DSN8810.EMP"
    + " WHERE SALARY BETWEEN ? AND ?");
stmt.setBigDecimal(1, min);
stmt.setBigDecimal(2, max);
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
  lastname = rs.getString(1);
  firstname = rs.getString(2);
  salary = rs.getBigDecimal(3);
  // Print row...
}
rs.close();
stmt.close();

SQLJ version:

EmployeeIterator iter;
#sql iter = {
  SELECT LASTNAME
       , FIRSTNME
       , SALARY
    FROM DSN8810.EMP
   WHERE SALARY BETWEEN :min AND :max
};
while (true) {
  #sql { FETCH :iter
         INTO :lastname, :firstname, :salary };
  if (iter.endFetch()) break;
  // Print row...
}
iter.close();
_____________________________________________________________________
As our last example, consider the INSERT statement in Example 4-8. Again, the SQLJ
code is easier to read and to maintain.
Example 4-8. JDBC versus SQLJ: INSERT Statement
_____________________________________________________________________
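The body of Example 4-8 is not reproduced in this extract. A sketch of the typical contrast for an INSERT (column choices are illustrative):

```java
// JDBC version (illustrative columns):
PreparedStatement stmt = conn.prepareStatement(
    "INSERT INTO DSN8810.EMP (EMPNO, LASTNAME) VALUES (?, ?)");
stmt.setString(1, empno);
stmt.setString(2, lastname);
stmt.executeUpdate();
stmt.close();

// SQLJ version of the same insert:
#sql { INSERT INTO DSN8810.EMP (EMPNO, LASTNAME)
       VALUES (:empno, :lastname) };
```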
Also note that people with a (static) embedded SQL programming background, for
example, in COBOL, will find it very easy to start using SQLJ, as iterators and SELECT
INTO constructs look very much like those in embedded SQL.
Old SQLJ Preparation Process
(Figure: the .sqlj source program is input to the SQLJ translator, which produces a modified .java source file; that file is compiled into a .class file.)
Notes:
SQLJ Preparation Process using the Legacy DB2 for z/OS JDBC Driver
The visual shows the Non-Universal Driver SQLJ program preparation process. After
creating the serialized profile by means of the SQLJ translator, you have to execute the
db2profc utility to create a DBRM, and then bind the DBRM into a set of packages (one
package for each isolation level, UR, CS, RS, and RR). Even if you prefer to develop your
Java applications on a workstation, the (uncustomized) serialized profile has to be shipped
to the host before you can run the db2profc utility. This is because db2profc creates
DBRMs, which are a unique feature of DB2 UDB for z/OS and OS/390 only.
The db2profc utility also customizes the serialized profile (by updating it). Unfortunately,
after customization, the profile is no longer portable.
New SQLJ Preparation Process
(Figure: (1) the .sqlj source is input to the SQLJ translator (sqlj), which produces a .java file and serialized profile (.ser) files; (2) the .java file is compiled into .class files that the runtime environment uses; (3) the .ser files are input to the profile customizer (db2sqljcustomize), which updates them; (4) the customizer produces and binds packages on DB2 for z/OS.)
Notes:
With the new Universal Driver, DBRMs (or .bnd files) are no longer used, as shown in the
visual. Using the db2sqljcustomize command, you can customize the serialized profile
and bind the packages at the same time against the target DB2 system. With the Type 4
driver, we connect from any platform directly to the target DB2 system, do the online
checking (highly recommended), and bind the packages on the target DB2 system.
When you develop on the workstation, for example, using WebSphere Studio Application
Developer (WSAD), you may now use the Type 4 driver to bind the packages against the
DB2 UDB for z/OS system. You no longer have to ship the uncustomized profile to the
z/OS system for customization.
In addition, the new Universal Driver customizes the serialized profile in such a way that it
remains portable. You can execute the same customized program files against any
platform, as long as the db2sqljbind utility was used to connect to the new location and
bind the correct program packages.
WSAD Version 5.1 provides support for this new application development scheme used by
the Universal Driver for SQLJ and JDBC. Again, for more information, see DB2 for z/OS
and OS/390: Ready for Java, SG24-6435.
4.3 z/OS Application Connectivity to DB2 for z/OS and OS/390
(Figure: In the current configuration, WAS on one z/OS image uses the JDBC T2 driver against a local DB2 subsystem, DB2A, whose DDF routes requests to DB2B on a second z/OS image. In the configuration with IBM z/OS Application Connectivity to DB2 for z/OS and OS/390, WAS in USS uses the JDBC T4 XA driver to connect through DDF directly to DB2B, with no local DB2 subsystem required.)
Figure 4-20. z/OS Application Connectivity to DB2 for z/OS and OS/390 CG381.0
Notes:
z/OS Application Connectivity to DB2 for z/OS and OS/390 is a no-charge, optional feature
of DB2 Universal Database Server for z/OS and OS/390 Version 7, as well as DB2 for z/OS
Version 8. The FMID is HDDA210 for both DB2 versions.
• It delivers robust connectivity to the latest DB2 for z/OS and WebSphere Application
Server for z/OS (V5.0.2).
• It enables custom Java applications that do not require an application server to run in a
remote partition and connect to DB2 z/OS.
DB2 UDB for OS/390 and z/OS V7 servers also do not have built-in support for distributed
transactions that implement the XA specification. In this case, the DB2 Universal JDBC
Driver supports distributed transactions (two-phase commit) through emulation.
DB2 UDB for z/OS Version 8 has native XA two-phase commit support in DRDA.
Early measurements have shown that using the z/OS Application Connectivity to DB2 for
z/OS and OS/390 (T4 XA driver for z/OS and OS/390) provides better performance than
having to go through a local T2 driver attached to a local DB2 subsystem that routes all
SQL requests to the remote DB2 database server.
Note that the 2-phase commit support is also provided with the Universal Driver that ships
with DB2 for z/OS Version 8 (FMID JDB8812).
Notes:
For the past few years, XML has increasingly become the de facto data format on the
Internet, on corporate Intranets, and for data exchange. In DB2 UDB for OS/390 and z/OS
V7, if you need to create XML data from traditional relational databases (this is called XML
publishing or XML composition), you must create your own application that converts the
DB2 data to the XML format, or use DB2 XML Extender. DB2 V8 provides you with an
additional option. DB2 V8 contains a set of brand new built-in functions to help with XML
publishing. These DB2 built-in functions reduce your application development efforts in
generating XML data from relational data with high performance, and enable the
development of lightweight applications.
This set of built-in functions are part of a set of extensions to the SQL language, called
SQL/XML. SQL/XML is an emerging standard, and is part of the ANSI and ISO SQL
standard, describing the ways the database language SQL can be used in conjunction with
XML. The definition of SQL/XML is driven in part by the SQLX Group, of which IBM is a
very active member.
The SQL/XML publishing functions are built-in DB2 functions. They run inside the DB2
address spaces, unlike external user-defined functions (UDFs) that run in a WLM-managed
address space outside of DB2, like the ones used by DB2 XML Extender. The fact that the
XML publishing functions are built into the DB2 engine gives them better performance. In
addition, much extra work has been done inside the DB2 engine, for example, to make the
tagging as efficient as possible.
Seven new built-in functions related to XML publishing can be used with DB2 V8:
• Cast function:
- XML2CLOB
• Scalar functions:
- XMLELEMENT
- XMLATTRIBUTES
- XMLFOREST
- XMLCONCAT
- XMLNAMESPACES
• Aggregate function:
- XMLAGG
Refer to the subsequent pages within this unit for more information about how to use the
new XML functions.
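XMLAGG, the aggregate function in the list above, is not illustrated elsewhere in this extract. A sketch against the sample EMP table follows; the grouping column and ORDER BY clause are illustrative:

```sql
-- Sketch: one DEPT element per department; XMLAGG aggregates an EMP
-- subelement per employee (sample table DSN8810.EMP assumed).
SELECT XML2CLOB(
         XMLELEMENT ( NAME "DEPT",
           XMLATTRIBUTES ( E.WORKDEPT AS "NO" ),
           XMLAGG ( XMLELEMENT ( NAME "EMP", E.LASTNAME )
                    ORDER BY E.LASTNAME )
         ) ) AS "RESULT"
FROM DSN8810.EMP E
GROUP BY E.WORKDEPT;
```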
The visual above shows the power of the SQL/XML functions. The XML query statement to
generate this HTML table is shown in Example 4-9.
Example 4-9. Complex XML Query Example
Note that we use SQL/XML to generate the HTML tags necessary to be able to display the
result of the query as an HTML table.
These SQL/XML functions can be used instead of the XML publishing functions that are
provided by DB2 XML Extender. The DB2 XML Extender publishing functions will still be
available for you to use, but the SQL/XML functions provide more flexibility, and as they are
built into the DB2 engine, they are likely to provide better performance.
SQL/XML publishing functions can only be used to generate XML documents (or
fragments) from data that is stored in relational tables. They cannot be used to store XML
data in DB2 tables. For those functions, you need to use DB2 XML Extender’s XML
collection and XML column functionality. The same is true for XML transformation
functions. SQL/XML is not intended to be used for transformation. Again, you can use DB2
XML Extender for that purpose. As before, DB2 XML Extender ships as part of DB2 at no
extra charge.
Note: These SQL/XML functions do not require the XML Toolkit for z/OS.
XML2CLOB
XML2CLOB ( XML-value-expression )
Notes:
The XML data type is a new data type introduced by DB2 V8. However, it is not like any
other existing data type. It is a so-called transient data type. Transient means that this data
type only exists during query processing. There is no persistent data of this type and it is
not an external data type that can be declared in application programs. In other words, the
XML data type cannot be stored in a database or returned to an application.
To allow an application to deal with the result of a SQL/XML function (that results in a value
with an XML data type), DB2 supplies a new conversion function XML2CLOB, which
converts an XML value into a CLOB.
There are some restrictions that apply to the transient XML data type:
• A query result cannot contain this type.
• The columns of a view cannot be of this type.
• XML data cannot be used in SORT (GROUP BY and ORDER BY).
• XML data cannot be used in predicates.
• The XML data type is not compatible with any other data types. The only cast function
that may be used is XML2CLOB.
The resulting CLOB is MIXED character data and the CCSID is the mixed CCSID for
UNICODE encoding scheme UTF-8 (CCSID 1208). The maximum length of the resulting
CLOB is 2 GB -1.
XMLELEMENT
XMLELEMENT ( NAME XML-element-name [ , XML-namespaces ] [ , XML-attributes ] [ , XML-element-content , ... ] )
SELECT E.EMPNO,XML2CLOB(
XMLELEMENT ( NAME "EMP", E.FIRSTNME||' '||E.LASTNAME
)) AS "RESULT"
FROM DSN8810.EMP E;
---------+---------+---------+---------+---------+---------
EMPNO RESULT
---------+---------+---------+---------+---------+---------
000010 <EMP>CHRISTINE HAAS</EMP>
000020 <EMP>MICHAEL THOMPSON</EMP>
000030 <EMP>SALLY KWAN</EMP>
000050 <EMP>JOHN GEYER</EMP>
Notes:
The visual shows the syntax and a usage example of the XMLELEMENT built-in function.
The XMLELEMENT function returns an XML element from one or more arguments. The
arguments can be:
• An element name
• An optional collection of attributes
• Zero or more arguments that make up the element’s content.
The result type is the transient XML data type.
Let us now take a look at the components of the XMLELEMENT function:
• NAME:
NAME keyword marks the identifier that is supplied to XMLELEMENT for the element
name.
• XML-element-name:
Specifies an identifier that is used as the XML element name. (No mapping is applied to
this identifier.)
• XML-namespaces:
Specifies the XML namespace for the XML element. See the XMLNAMESPACES
function later in the unit for details.
• XML-attributes:
Specifies the attributes for the XML element. See the XMLATTRIBUTES function below.
• XML-element-content:
Specifies an expression making up the XML element content. The expression
cannot be:
- A ROWID
- A character string defined with the FOR BIT DATA attribute
- A BLOB
- A distinct type sourced on these types
If the result of the expression is an SQL value, it is mapped to the XML value according
to the mapping rules from an SQL value to an XML value (see “Mappings from SQL to
XML” on page 4-77 for details).
If multiple XML-element-contents are specified, their XML values are concatenated to
form the content of the XML element. If the result of an expression is a null value, it is
not included in the concatenation result. If all the results of the arguments are the null
value, then the result of XMLELEMENT is an element with empty content.
The result of the XMLELEMENT function cannot be null.
Refer to the SELECT statement shown on the visual for a short and simple example of the
use of the XML2CLOB and XMLELEMENT function. As you can see in the SQL statement
above, the XMLELEMENT function is used to create an element called EMP, which
contains the concatenation of the contents of columns FIRSTNME and LASTNAME.
Example 4-10 shows a more complex SELECT statement using multiple elements.
Example 4-10. Nested Elements
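The body of Example 4-10 is not reproduced in this extract; a query consistent with the output shown in Example 4-11 would look roughly like this (a reconstruction, not the original listing):

```sql
SELECT E.EMPNO, XML2CLOB(
         XMLELEMENT ( NAME "EMP",
           XMLELEMENT ( NAME "NAME", E.FIRSTNME||' '||E.LASTNAME ),
           XMLELEMENT ( NAME "HIREDATE", E.HIREDATE )
         ) ) AS "RESULT"
FROM DSN8810.EMP E;
```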
As you can see, element <Emp> itself contains two nested elements <name> and
<hiredate>. Example 4-11 shows the result of this SELECT statement:
Example 4-11. Output of Nested Elements Example
---------+---------+---------+---------+---------+---------+---------+---------+
EMPNO RESULT
---------+---------+---------+---------+---------+---------+---------+---------+
000010 <EMP><NAME>CHRISTINE HAAS</NAME><HIREDATE>1965-01-01</HIREDATE></EMP>
000020 <EMP><NAME>MICHAEL THOMPSON</NAME><HIREDATE>1973-10-10</HIREDATE></EMP>
000030 <EMP><NAME>SALLY KWAN</NAME><HIREDATE>1975-04-05</HIREDATE></EMP>
______________________________________________________________________
XMLATTRIBUTES
XML-attributes:
XMLATTRIBUTES ( XML-attribute-value [ AS XML-attribute-name ] , ... )
SELECT E.EMPNO,XML2CLOB(
XMLELEMENT ( NAME "EMP",
XMLATTRIBUTES( E.EMPNO, E.FIRSTNME||' '||E.LASTNAME AS "NAME")
) )
AS "RESULT"
FROM DSN8810.EMP E;
---------+---------+---------+---------+---------+---------+---------+--
EMPNO RESULT
---------+---------+---------+---------+---------+---------+---------+--
000010 <EMP EMPNO="000010" NAME="CHRISTINE HAAS"></EMP>
000020 <EMP EMPNO="000020" NAME="MICHAEL THOMPSON"></EMP>
000030 <EMP EMPNO="000030" NAME="SALLY KWAN"></EMP>
Notes:
This function constructs XML attributes from the arguments. It can only be used as the
second argument to the XMLELEMENT function.
• XML-attribute-value:
Specifies the value of the attribute. This expression cannot be:
- A ROWID
- A character string defined with the FOR BIT DATA attribute
- A BLOB
- A distinct type sourced on these types, or XML
The result of the expression is mapped to an XML value according to the mapping rules
from an SQL value to an XML value. If the value is null, the corresponding XML attribute
is not included in the XML element.
• AS XML-attribute-name:
Specifies an identifier that is used as the attribute name. (No mapping is applied to map
the identifier to an XML name.)
XMLFOREST
XMLFOREST ( [ XML-namespaces , ] expression [ AS XML-element-name ] , ... )
SELECT E.EMPNO,XML2CLOB(
XMLELEMENT ( NAME "EMP",
XMLATTRIBUTES(E.FIRSTNME||' '||E.LASTNAME AS "NAME"),
XMLFOREST(E.HIREDATE, E.JOB AS "PROFESSION")
) )
AS "RESULT"
FROM DSN8810.EMP E;
---------+---------+---------+---------+---------+---------+---------+--
EMPNO RESULT
---------+---------+---------+---------+---------+---------+---------+--
000010 <EMP NAME="CHRISTINE HAAS">
<HIREDATE>1965-01-01</HIREDATE><PROFESSION>PRES</PROFESSION>
</EMP>
000020 <EMP NAME="MICHAEL THOMPSON">
<HIREDATE>1973-10-10</HIREDATE><PROFESSION>MANAGER</PROFESSION>
</EMP>
000030 <EMP NAME="SALLY KWAN">
<HIREDATE>1975-04-05</HIREDATE><PROFESSION>MANAGER</PROFESSION>
</EMP>
000050 <EMP NAME="JOHN GEYER">
<HIREDATE>1949-08-17</HIREDATE><PROFESSION>MANAGER</PROFESSION>
</EMP>
Notes:
The XMLFOREST function returns a forest of XML elements from a list of expressions,
one element for each argument.
• Expression:
Specifies an expression that is used as an XML element content. The result of the
expression is mapped to an XML value according to the mapping rules from an SQL
value to an XML value. The expression cannot be:
- A ROWID
- A character string defined with the FOR BIT DATA attribute
- A BLOB
- A distinct type sourced on these types
If the result of an expression is null, then it is not included in the concatenation result for
XMLFOREST.
• AS XML-element-name:
Specifies an identifier that is used for the XML element name. (No mapping is applied to
map the identifier to an XML name.)
If XML-element-name is not specified, the expression must be a column name, and the
element name will be created from the column name. The fully escaped mapping is
used to map the column name to an XML element name.
• XML-namespaces:
Specifies the XML namespace for the XML element. See the XMLNAMESPACES
function later in the unit for details.
The figure above shows the syntax diagram and a usage example. This sample generates
an EMP element for each employee. It uses the employee name as its attribute and two
subelements that are generated from columns HIREDATE and JOB by using XMLFOREST
as its content. The element names for the two subelements are ‘HIREDATE’ and
‘PROFESSION’.
You can also produce the same result if you use two additional nested XMLELEMENT
statements instead of XMLFOREST. Example 4-12 shows a coding example for that.
Example 4-12. Nested Elements instead of XMLFOREST
SELECT E.EMPNO,XML2CLOB(
XMLELEMENT ( NAME "EMP",
XMLATTRIBUTES(E.FIRSTNME||' '||E.LASTNAME AS "NAME"),
XMLELEMENT(NAME "HIREDATE", E.HIREDATE),
XMLELEMENT(NAME "PROFESSION", E.JOB)
) )
AS "RESULT"
FROM DSN8810.EMP E;
______________________________________________________________________
Note: The generated element names are folded to uppercase. If you want them to be
lowercase or mixed case, you must use a quoted identifier (for example, “department”).
Note: In the examples used in this topic, there would be a difference between
XMLFOREST and XMLELEMENT if there were NULL values in HIREDATE and JOB.
XMLFOREST ignores the NULL value (not included in the result) and XMLELEMENT
results in an empty element.
XMLCONCAT
SELECT E.EMPNO,XML2CLOB(
XMLCONCAT (
XMLELEMENT(NAME "FIRST",E.FIRSTNME),
XMLELEMENT(NAME "LAST",E.LASTNAME)
) )
AS "RESULT"
FROM DSN8810.EMP E;
---------+---------+---------+---------+---------+--
EMPNO RESULT
---------+---------+---------+---------+---------+--
000010 <FIRST>CHRISTINE</FIRST><LAST>HAAS</LAST>
000020 <FIRST>MICHAEL</FIRST><LAST>THOMPSON</LAST>
000030 <FIRST>SALLY</FIRST><LAST>KWAN</LAST>
000050 <FIRST>JOHN</FIRST><LAST>GEYER</LAST>
Notes:
The XMLCONCAT function returns a forest of XML elements that are generated from a
concatenation of two or more arguments. A syntax diagram is shown in the figure above.
In this coding example:
• XML-value-expression:
Specifies an expression whose value is the XML data type. If the value of
XML-value-expression is null, it is not included in the concatenation.
The result type of XMLCONCAT is the transient XML data type. If all of the arguments
are null, then the null value is returned.
Example 4-13 shows an example of XMLFOREST, which produces the same result as
the XMLCONCAT example shown on the visual.
Example 4-13. XMLFOREST instead of XMLCONCAT
SELECT E.EMPNO,XML2CLOB(
XMLFOREST(E.FIRSTNME AS "FIRST", E.LASTNAME AS "LAST")
)
AS "RESULT"
FROM DSN8810.EMP E WHERE LASTNAME = 'HAAS';
______________________________________________________________________
Note: One reason for using XMLCONCAT instead of XMLFOREST is that XMLFOREST
cannot generate XML elements with attributes. To concatenate elements that have
attributes, generate them with XMLELEMENT and combine them with XMLCONCAT.
XMLAGG
XMLAGG ( XML-value-expression [ ORDER BY sort-key [ASC | DESC] , ... ] )

sort-key:
  column-name
  expression
SELECT XML2CLOB(
XMLELEMENT(NAME "PROFESSION",
XMLATTRIBUTES(E.JOB AS "NAME"),
XMLAGG(XMLELEMENT (NAME "EMP", E.LASTNAME)
ORDER BY E.LASTNAME
)))
AS "PROFESSIONLIST"
FROM DSN8810.EMP E
GROUP BY JOB;
---------+---------+---------+---------+---------+---------+---------+---------+---------+------------
PROFESSIONLIST
---------+---------+---------+---------+---------+---------+---------+---------+---------+------------
<PROFESSION NAME="ANALYST "><EMP>NATZ</EMP><EMP>NICHOLLS</EMP><EMP>QUINTANA</EMP></PROFESSION>
<PROFESSION NAME="CLERK "><EMP>JEFFERSON</EMP><EMP>JOHNSON</EMP><EMP>MARINO</EMP><EMP>MONTEVERDE</EMP>
<EMP>O'CONNELL</EMP><EMP>ORLANDO</EMP></PROFESSION>
<PROFESSION NAME="DESIGNER"><EMP>ADAMSON</EMP><EMP>BROWN</EMP><EMP>JOHN</EMP><EMP>JONES</EMP><EMP>LUTZ
</EMP><EMP>PIANKA</EMP><EMP>SCOUTTEN</EMP><EMP>WALKER</EMP><EMP>YAMAMOTO
</EMP><EMP>YOSHIMURA</EMP></PROFESSION>
Notes:
The visual shows a syntax diagram and a usage example for the XMLAGG function, which
returns a concatenation of XML elements from a collection of XML elements.
The XMLAGG function has one argument with an optional ORDER BY clause. The
ORDER BY clause specifies the ordering of the rows from the same grouping set to be
processed in the aggregation. If the ORDER BY clause is not specified, or the ORDER BY
clause cannot differentiate the order of the sort key value, the order of rows from the same
group to be processed in the aggregation is arbitrary.
• XML-value-expression:
Specifies an expression whose value is the transient XML data type. Different from
other column functions, a scalar fullselect is allowed as an argument to XMLAGG. The
function is applied to the set of values derived from the argument values by the
elimination of null values. If all inputs are null, or there are no rows, then the result of
XMLAGG is null.
• sort-key:
Specifies a sort-key that is either a column name or an expression. The ordering is
based on the SQL values of the sort keys, which may or may not be used in the XML
value expression. If the sort-key is a constant, it does not refer to the position of the
output column as in the ORDER BY clause of a SELECT statement, and it has no
impact on the ordering. You cannot use a CLOB value as a sort key. A character string
expression cannot have a length greater than 4000 bytes.
If the sort key is a character string that uses an encoding scheme other than Unicode,
the ordering might be different. For example, a column PRODCODE uses EBCDIC. For
two values "P001" and "PA01", the relationship "P001" > "PA01" is true in EBCDIC,
whereas in Unicode UTF-8 "P001" < "PA01" is true. If the same sort key values are
used in the XML value expression, use the CAST function to convert the sort key to
Unicode to keep the ordering of XML values consistent with that of the sort key.
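The collation difference can be reproduced with Python's built-in cp037 codec (one common EBCDIC code page, standing in here for the CCSID actually in use):

```python
# In EBCDIC, letters sort before digits; in UTF-8 it is the reverse.
ebcdic = sorted(["P001", "PA01"], key=lambda s: s.encode("cp037"))
utf8   = sorted(["P001", "PA01"], key=lambda s: s.encode("utf-8"))

print(ebcdic)  # EBCDIC order: PA01 before P001
print(utf8)    # UTF-8 order: P001 before PA01
```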
The result type is XML. The result can be NULL.
In this example, the employees are grouped by their job. We generate a PROFESSION
element for each job and nest all the EMP elements for the employees holding that job. In
addition, all EMP elements are ordered by LASTNAME.
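The effect of XMLAGG with GROUP BY and an ORDER BY clause can be approximated in Python; the rows and element names below are a small illustrative subset of the sample data:

```python
from itertools import groupby

rows = [("ANALYST", "NICHOLLS"), ("ANALYST", "NATZ"),
        ("CLERK", "JOHNSON"), ("CLERK", "JEFFERSON")]

# GROUP BY JOB, with the EMP elements inside each group ordered by LASTNAME
result = []
for job, group in groupby(sorted(rows), key=lambda r: r[0]):
    emps = "".join(f"<EMP>{last}</EMP>" for _, last in group)
    result.append(f'<PROFESSION NAME="{job}">{emps}</PROFESSION>')

print(result[0])
# <PROFESSION NAME="ANALYST"><EMP>NATZ</EMP><EMP>NICHOLLS</EMP></PROFESSION>
```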
XMLNAMESPACES (1 of 2)
XMLNAMESPACES ( XML-namespace-decl-item , ... )

XML-namespace-decl-item:
  XML-namespace-uri AS XML-namespace-prefix
  DEFAULT XML-namespace-uri
  NO DEFAULT
Notes:
An XML namespace is a collection of element type and attribute names. The namespace is
identified by a unique name, which is a Uniform Resource Identifier (URI). If you use XML
namespaces, any element type or attribute name can be uniquely identified by a two-part
name (also known as the expanded name): the name of its XML namespace and its local
name.
URI references are often inconveniently long, so expanded names are not used directly to
name elements and attributes in XML documents. Instead, qualified names are used. A
qualified name is a name subject to namespace interpretation. In documents conforming to
this specification, element and attribute names appear as qualified names. Syntactically,
they are either prefixed names or unprefixed names.
QName (qualified name):
• PrefixedName: prefix:local part
• UnprefixedName: local part
• Local part: local name (NCName or “non-colonized” name)
The URIs used as XML namespace names are just identifiers, which are not guaranteed to
point to schemas, information about the namespace, or anything else. That is, the XML
namespaces recommendation does not define anything except a two-part naming system
for element types and attributes.
Note: Do not confuse URIs with URLs. URIs are Uniform Resource Identifiers, which are
generally used as references rather than for lookup. In contrast, URLs are Uniform
Resource Locators, which are used, for example, to locate Web addresses.
The usage of XML namespaces allows people to do several things, such as:
• Combine fragments from different documents without any naming conflicts.
• Write reusable code modules that can be invoked for specific elements and attributes.
• Define elements and attributes that can be reused in other schemas or instance
documents without fear of name collisions.
Example 4-14 and Example 4-15 show two XML documents both using Address as XML
element name.
Example 4-14. Element Address meaning #1
______________________________________________________________________
______________________________________________________________________
Both XML documents use the same element type Address. However, the meaning of both
element types is different and should be interpreted differently by an application. This
construct is fine, as long as both element types are used in separate documents. Once
they are combined in one document, an application would not know which Address
element is to be processed.
One solution for this problem could be renaming one of the element types. Referring to the
example above, the Address element type of the first document could be renamed to
STREETAddress. This, however, is not very satisfactory. A much better long-term solution
is to assign a different XML namespace to each document.
The XML namespace declaration attribute uses the following syntax:
• xmlns:prefix=“URI”
To use a default namespace, use the following syntax:
• xmlns=“URI”
To undeclare the default namespace, use:
• xmlns=“”
In these expressions, the URI reference is the namespace name.
As a result, the combination of both documents of Example 4-14 and Example 4-15 shown
above could look as shown in Example 4-16:
Example 4-16. XML Namespace Usage
<Department>
<Name>DVS1</Name>
<addr:Address xmlns:addr="http://www.ibm.de/Essen">
<addr:Street>Theodor-Althoff-Str. 1</addr:Street>
<addr:City>Essen</addr:City>
<addr:State>Nordrhein-Westfalen</addr:State>
<addr:Country>Germany</addr:Country>
<addr:PostalCode>D-45133</addr:PostalCode>
</addr:Address>
<serv:Server xmlns:serv="http://www.ibm.de/servers">
<serv:Name>OurWebServer</serv:Name>
<serv:Address>123.45.67.8</serv:Address>
</serv:Server>
</Department>
______________________________________________________________________
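A namespace-aware parser resolves each qualified name in Example 4-16 into its expanded (namespace URI, local name) form, which is what removes the ambiguity between the two Address elements. A quick check with Python's ElementTree:

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the combined document from Example 4-16
doc = """<Department>
  <Name>DVS1</Name>
  <addr:Address xmlns:addr="http://www.ibm.de/Essen">
    <addr:City>Essen</addr:City>
  </addr:Address>
  <serv:Server xmlns:serv="http://www.ibm.de/servers">
    <serv:Address>123.45.67.8</serv:Address>
  </serv:Server>
</Department>"""

root = ET.fromstring(doc)
# ElementTree stores expanded names as {namespace-uri}local-name
tags = [el.tag for el in root.iter() if el.tag.endswith("Address")]
print(tags)
```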
The previous visual shows the syntax of the XMLNAMESPACES publishing function in
DB2 for z/OS.
XMLNAMESPACES can only be used within the XMLELEMENT or XMLFOREST function.
In XMLELEMENT, XMLNAMESPACES can only be used as the second argument. While in
XMLFOREST, it can only be the first argument.
XMLNAMESPACES (2 of 2)
SELECT E.EMPNO,
XML2CLOB(XMLELEMENT(NAME "BO:EMPLOYEE",
XMLNAMESPACES('NAMESP' AS "BO"),
XMLATTRIBUTES(E.LASTNAME, E.FIRSTNME),
XMLELEMENT(NAME "BO:HIREDATE", E.HIREDATE)))
FROM DSN8810.EMP E
WHERE E.EDLEVEL = 12;
---------+---------+---------+---------+---------+---------+---------+--
---------+---------+---------+---------+---------+---------+---------+--
000290 <BO:EMPLOYEE
xmlns:BO="NAMESP" LASTNAME="PARKER" FIRSTNME="JOHN">
<BO:HIREDATE>1980-05-30</BO:HIREDATE>
</BO:EMPLOYEE>
000310 <BO:EMPLOYEE
xmlns:BO="NAMESP" LASTNAME="SETRIGHT" FIRSTNME="MAUDE">
<BO:HIREDATE>1964-09-12</BO:HIREDATE>
</BO:EMPLOYEE>
Notes:
The visual above shows a simple example of one of the two ways to use the
XMLNAMESPACES function, that is, as the second argument of XMLELEMENT. The
name of the element is <BO:EMPLOYEE>, where BO is the namespace prefix. The element
has two attributes, LASTNAME and FIRSTNME, and one subelement, BO:HIREDATE.
The usage of XMLNAMESPACES in conjunction with XMLFOREST is shown in
Example 4-17:
Example 4-17. XMLNAMESPACES in XMLFOREST
SELECT empno,
XML2CLOB(XMLFOREST(
XMLNAMESPACES(DEFAULT 'http://hr.org', 'http://fed.gov' AS "d"),
lastname, job AS "d:job"))
FROM employee where edlevel = 12;
The result of the query would be similar to the result shown in Example 4-18:
Example 4-18. RESULT of XMLNAMESPACES in XMLFOREST
00029 <LASTNAME
xmlns="http://hr.org"
xmlns:d="http://fed.gov">PARKER
</LASTNAME>
<d:job
xmlns="http://hr.org"
xmlns:d="http://fed.gov"> OPERATOR
</d:job>
00031 <LASTNAME
xmlns="http://hr.org"
xmlns:d="http://fed.gov"> SETRIGHT
</LASTNAME>
<d:job
xmlns="http://hr.org"
xmlns:d="http://fed.gov"> OPERATOR
</d:job>
_____________________________________________________________________
These XML publishing functions complement the functionality delivered by the XML
Extender product. For more information, see DB2 UDB for z/OS Version 8 XML Extender
Administration and Programming, SC18-7431.
Mapping Character Sets
x'4A414D4553' JAMES
x'454D505F4E4F' EMP_NO
x'C2A75F313233' §_123 (read: section sign, underscore, one, two, three)
Notes:
In Example 4-19, we use a DB2 subsystem with SCCSID 37, and the CCSID of the host
emulator is also set to 37. The EBCDIC table we use is called BSXML. The table has a
single character column COL1, and contains a single row with the value ‘§_123’.
Example 4-19. Create Table
_____________________________________________________________________
The top query in Example 4-20 shows the result of using XMLELEMENT on this column.
As you can see, even though the data was in UTF-8 format while processing the SQL/XML
functions, the result gets converted back to application encoding scheme (EBCDIC), and
the result is readable.
Example 4-20. Using XMLELEMENT
-------------------------------
<EMP>§_123 </EMP>
4CDD6B6FFF44444444444444446CDD6
C547E5D123000000000000000C1547E
------------------------------
SELECT HEX(CHAR(XML2CLOB(XMLELEMENT(NAME "EMP", COL1)))) AS "RESULT"
FROM BSXML
---------+---------+---------+---------+---------+---------+----
RESULT
---------+---------+---------+---------+---------+---------+----
3C454D503EC2A75F3132332020202020202020202020202020203C2F454D503E
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100
---------+---------+---------+---------+---------+---------+----
______________________________________________________________________
The bottom query shows the result of the SQL/XML function before it is translated back to
the application encoding scheme (by converting the CLOB to CHAR, and converting the
character string to hexadecimal). There you can see that the data is in UTF-8 format, with:
• x’C2A7’ = ‘§’
• x’5F’ = ‘_’
• x’31’ = ‘1’
• x’32’ = ‘2’
• x’33’ = ‘3’
This demonstrates that the data (stored in the table in EBCDIC) is converted to UTF-8
while executing SQL/XML functions, and converted back to the application encoding
scheme to display the result.
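The hex values above can be verified in Python, with the cp037 codec standing in for EBCDIC CCSID 37:

```python
s = "§_123"

# UTF-8 form used while the SQL/XML functions execute
utf8_hex = s.encode("utf-8").hex()
print(utf8_hex)  # c2a75f313233: x'C2A7' = '§', x'5F' = '_', x'31'..'33' = '1'..'3'

# Round trip through the EBCDIC application encoding scheme
ebcdic = s.encode("cp037")
print(ebcdic.decode("cp037"))  # §_123
```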
For more information about Unicode, see Unit 5, “Unicode in DB2 for z/OS” on page 5-1.
Notes:
The SQL/XML standard specifies some rules for XML names, which do not always conform
to the rules for SQL identifiers. Therefore, this topic describes these rules and explains
their effects and impacts on your applications.
SQL/XML, and therefore DB2 V8, uses only a so-called “fully escaped mapping”, which
applies only to column names. Other SQL identifiers are not escaped.
These are the rules that apply for full escaping and therefore for SQL identifiers that are
column names:
• A colon (:) is always mapped to the string _x003A_, regardless of its position within
the SQL column name.
Refer to Example 4-21 to see how this escaping looks in the result table.
(XMLTESTTAB3 is a table with two columns, ’SGN:ME’ and CREATOR.)
Example 4-21. Escaping of Column Names
SELECT XML2CLOB(
XMLELEMENT ( NAME "VOLUMES",
(blank) _x0020_
" _x0022_
# _x0023_
& _x0026_
' _x0027_
< _x003C_
> _x003E_
{ _x007B_
} _x007D_
[ _x005B_
] _x005D_
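Fully escaped mapping can be sketched as a small Python function; this is an illustration of the table above, not the complete mapping DB2 applies:

```python
# Characters from the table above, mapped to their _xNNNN_ escape
# (NNNN = Unicode code point in hexadecimal)
ESCAPES = {":": "_x003A_", " ": "_x0020_", '"': "_x0022_", "#": "_x0023_",
           "&": "_x0026_", "'": "_x0027_", "<": "_x003C_", ">": "_x003E_",
           "{": "_x007B_", "}": "_x007D_", "[": "_x005B_", "]": "_x005D_"}

def escape_column_name(name):
    # Replace each special character; all other characters pass through
    return "".join(ESCAPES.get(ch, ch) for ch in name)

print(escape_column_name("SGN:ME"))  # SGN_x003A_ME
```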
If you want to display these escaped characters properly, for example, using a Web
browser, you must add some additional application logic to your program, which converts
them from UTF-16 big endian format to UTF-8.
See Unit 5.“Unicode in DB2 for z/OS” on page 5-1 for details on the different Unicode
encodings.
Note: As mentioned above, fully escaped mapping only applies to SQL column names. If
an SQL identifier, which is supplied in the AS clause of the SELECT statement, contains
any of the characters described above, SQLCODE -20275 occurs. Refer to
Example 4-22 for more details.
SELECT XML2CLOB(
XMLELEMENT ( NAME "VOLUMES",
XMLATTRIBUTES(SGNAME AS "COL:SG",SGCREATOR)
) )
AS "VOLUMES TO SGNAME"
FROM SYSIBM.SYSVOLUMES
---------+---------+---------+---------+---------+---------+---------+---------+
DSNT408I SQLCODE = -20275, ERROR: The XML NAME COL:SG IS NOT VALID. REASON
CODE = 2
______________________________________________________________________
SQL Data Value to XML Data Value
Restriction for: ROWID, FOR BIT DATA, BLOB
Exceptions:
TIMESTAMP yyyy-mm-dd-hh.mm.ss.nnnnnn converted to
yyyy-mm-ddThh:mm:ss.nnnnnn
Notes:
The third kind of mapping that needs to be performed when you invoke XML publishing
functions is the mapping of SQL data values to XML data values. This mapping is based on
SQL data types.
The following data types are not supported and therefore cannot be used as arguments to
XML value constructors:
• ROWID
• Character string defined with the FOR BIT DATA attribute
• BLOB
• Distinct types sourced on ROWID, FOR BIT DATA character string or BLOB
If you try to use one of the above-mentioned data types, SQLCODE -171, SQLSTATE
42815 is issued.
There are some more rules that apply for the mapping of SQL data values to XML data
values:
< &lt;
> &gt;
& &amp;
" &quot;
' &apos;
Note: The characters shown in Table 4-3 are always escaped. That is, if “&” is in a
string, it will result in “&amp;”
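The same predefined-entity escaping is available in Python's standard library, shown here only to illustrate the mapping, not DB2 internals:

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the quote characters are
# passed as extra entities to cover all five predefined entities
text = escape("""AT&T <a> "x" 'y'""", {'"': "&quot;", "'": "&apos;"})
print(text)  # AT&amp;T &lt;a&gt; &quot;x&quot; &apos;y&apos;
```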
• Numeric data types are mapped to UTF-8.
• The DATE data type is mapped to UTF-8.
• The TIME data type is mapped to UTF-8.
• The TIMESTAMP data type is mapped from yyyy-mm-dd-hh.mm.ss.nnnnnn to
yyyy-mm-ddThh:mm:ss.nnnnnn.
• For DISTINCT types, the mapping follows its source data type.
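The TIMESTAMP mapping amounts to a simple reformatting from the DB2 string form to the XML Schema dateTime form; a minimal sketch, assuming a well-formed input string:

```python
def db2_timestamp_to_xml(ts):
    # yyyy-mm-dd-hh.mm.ss.nnnnnn  ->  yyyy-mm-ddThh:mm:ss.nnnnnn
    date, time = ts[:10], ts[11:]
    hh, mm, ss, frac = time.split(".")
    return f"{date}T{hh}:{mm}:{ss}.{frac}"

print(db2_timestamp_to_xml("2004-01-15-10.30.59.123456"))
# 2004-01-15T10:30:59.123456
```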
Security Enhancements
The need for more granular security
Introduction to multilevel security
Multilevel security for access control
Multilevel security for object level access
Multilevel security with row granularity
Session variables
Encryption built-in functions
Notes:
Security has become much more important in the past few years. DB2 for z/OS Version 8
makes very substantial changes related to security, with new options with multilevel
security and row level security to meet the new e-business security demands. Other new
V8 features, such as encryption, improve security as well. They are discussed in
“Encryption Functions” on page 3-203.
Database Security and Granularity
Low level access control is increasingly critical
Web hosting
Privacy of data
Notes:
Everyone seems to be more aware of security today. Improving integration and making
security more robust and easier to manage are very important. New options for higher
security, more granularity, and more information for additional flexibility in applications and
SQL are needed.
Low level access control is increasingly critical. Good examples are Web hosting
companies that need to store data from multiple customers into a single subsystem,
database, or table. Also, security requirements and laws on privacy demand security at the
lowest (row) level.
Many customers need to extend the granularity from table level to row level, so that an
individual user is restricted to a specific set of rows.
In addition, there is an increasing need for mandatory access control. This means that
subjects (such as users and programs) cannot control or bypass security mechanisms
(such as people with install SYSADM authorization in DB2 today).
Views can limit access to selected rows and columns, but they may be cumbersome to
construct with the desired level of granularity. Views are not very effective for security when
dealing with UPDATE, INSERT, DELETE statements as well as DB2 utilities. Database
constraints, triggers, UDFs, and stored procedures are often needed for update control.
New Concepts
Subjects and objects
Objects: "things" you try to protect
Subjects: "things" that need to access objects
Multilevel security (MLS)
Mandatory access control (MAC)
Governed by SECLABELs
Notes:
Two central concepts of security are security policy and accountability. A security policy is a
set of laws, rules, and practices that regulate how an organization manages, protects, and
distributes its sensitive data. It is the set of rules that the system uses to decide whether a
particular subject can access a particular object. Accountability requires that each
security-relevant event must be able to be associated with a subject. Accountability
ensures that every action can be traced to the user who caused the action.
Let us first describe subjects and objects in a bit more detail.
Object
An object is a system resource to which access must be controlled. Examples of objects
are: data sets, z/OS UNIX files and directories, commands, terminals, printers, disk
volumes, tapes, DB2 objects such as plans and tables, and rows in a DB2 table.
Subject
A subject is an entity that requires access to system resources. Examples of subjects are:
human users, started procedures, batch jobs, or z/OS UNIX daemons. The term user
usually has the same meaning as the term subject, but sometimes implies a human
subject. In this publication, unless stated otherwise, the terms user and subject are used
interchangeably.
Multilevel Security
Multilevel security (MLS) is a security policy that allows the classification of data and users
based on a system of hierarchical security levels, combined with a system of
non-hierarchical security categories. A multilevel-secure security policy has two primary
goals. First, the controls must prevent unauthorized individuals from accessing information
at a higher classification than their authorization (read up). Second, the controls must
prevent individuals from declassifying information (write down).
In the following topics we describe some of the concepts of multilevel security. Multilevel
security is a complex matter, and describing the details of it is beyond the scope of this
publication. For more information, please refer to the z/OS Security Server publications. An
introduction can also be found in z/OS Planning for Multilevel Security, GA22-7509.
Discretionary access control is implemented using access control lists. A resource profile contains an
access control list that identifies the users who can access the resource and the authority
(such as read or update) the user is allowed in referencing the resource.
The security administrator defines a profile for each object (a resource or group of
resources), and updates the access control list for the profile. This type of control is
discretionary in the sense that subjects can manipulate it, because the owner of a
resource, in addition to the security administrator, can identify who can access the resource
and with what authority. This is what we have in today’s RACF systems.
Discretionary access control is governed by access lists.
Security Labels
Security levels
RDEFINE SECDATA SECLEVEL UACC(READ)
RALTER SECDATA SECLEVEL ADDMEM(L0/10 L1/30 L2/50)
Security categories
RDEFINE SECDATA CATEGORY UACC(READ)
RALTER SECDATA CATEGORY ADDMEM(C1 C2 C3 C4 C5)
Notes:
A security label enables an installation to classify subjects and objects according to a data
classification policy, identify objects to audit based on their classification, and protect
objects such that only appropriately-classified subjects can access them:
• Objects in a multilevel-secure system have a security label that indicates the sensitivity
of the object’s data.
• Subjects in a multilevel-secure system also have a security label. This label determines
whether the subject is allowed to access a particular object.
A security label is used as the basis for mandatory access control decisions. By assigning
security labels, the security administrator can ensure that data of a certain classification is
protected from access by a user of a lesser security classification. Security labels provide
the capability to maintain multiple levels of security within a system.
For example:
RDEF SECLABEL L1C12 SECLEVEL(L1) ADDCATEGORY(C1 C2) U(NONE)
You do not need to define a security label for every possible combination of level and
category. There is no limit on the number of security labels that can be defined.
Using Security Labels
Seclabel comparisons
Dominance
Reverse dominance
Equivalence
Null
Read up
Write down
Notes:
After security labels have been created and assigned, the security administrator can
activate the RACF SECLABEL resource class to cause the system to use the security
labels for authorization checks. Then, when a user tries to access a resource, RACF
checks whether the resource has a security label. If it does, RACF compares the security
label of the user with that of the resource (mandatory access control). If the security labels
allow access, RACF then checks the access list of the profile that protects the resource
(discretionary access control). The decision as to whether or not to allow the access is
based on both mandatory access control (MAC), and discretionary access control (DAC).
SECLABEL Verification
When verifying whether a user with a SECLABEL is allowed to access a resource with a
SECLABEL, different checks can be done.
Dominance
One security label dominates another SECLABEL when both of the following conditions are
true:
• The security level that defines the first security label is greater than or equal to the
security level that defines the second security label.
• The set of security categories that define the first security label includes the set of
security categories that defines the second security label.
You can also look at dominance in a simplistic way as one SECLABEL being “greater than”
another.
Two security labels are disjoint when each of them has at least one category that the other
does not have. Neither of the labels dominates the other. These are also called
incompatible security labels.
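Dominance and disjointness can be expressed compactly: a label dominates another when its level is at least as high and its category set is a superset. A Python sketch using the hypothetical L0/L1/L2 levels and C1..C5 categories defined earlier:

```python
def dominates(label_a, label_b):
    """A label is (security_level, set_of_categories)."""
    level_a, cats_a = label_a
    level_b, cats_b = label_b
    # Higher-or-equal level AND category superset
    return level_a >= level_b and cats_a >= cats_b

def disjoint(label_a, label_b):
    # Each label has at least one category that the other does not have
    cats_a, cats_b = label_a[1], label_b[1]
    return bool(cats_a - cats_b) and bool(cats_b - cats_a)

# Levels from the SECLEVEL definitions above: L0/10, L1/30, L2/50
l1c12 = (30, {"C1", "C2"})  # e.g. RDEF SECLABEL L1C12 SECLEVEL(L1) ADDCATEGORY(C1 C2)
l0c1  = (10, {"C1"})
l0c3  = (10, {"C3"})

print(dominates(l1c12, l0c1))  # True: L1C12 can "read down" to L0C1 data
print(disjoint(l0c1, l0c3))    # True: incompatible labels
```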
Reverse dominance
With reverse dominance access checking, the access rules are the reverse of the access
rules for dominance access checking. This type of checking is not used by DB2. In loose
terms, it can be looked at as “less than or equal to” checking.
Equivalence
Equivalence of security labels means that either the security labels have the same name,
or they have different names but are defined with the same security level and identical
security categories. The security label SYSMULTI is considered equivalent to any security
label.
You can look at this type of checking as “equal to” checking. (One way to check is if both
dominance and reverse dominance are true.)
Read-up
Multilevel security controls prevent unauthorized individuals from accessing information at
a higher classification than their authorization. It does not allow users to “read-up” or read
above their authorization level. Read-up is enforced through dominance checking.
Write-down
Multilevel security also prevents individuals from declassifying information. This is also
known as write-down, that is, writing information back at a lower level (down-level) than its
current classification. This property is sometimes called the star property, or *-property.
Write-down is prevented by doing equivalence checking.
However, there may be cases where you want to allow write-down by selected individuals.
The security administrator controls whether write-down is allowed at the system level by
activating and deactivating the RACF MLS option (using the SETROPTS command).
In addition, to allow for controlled situations of write-down, z/OS allows the security
administrator to assign a “write-down by user” privilege to individual users that allows those
users to select the ability to write down. To do so, a user has to have at least read authority
on the IRR.WRITEDOWN.BYUSER profile in the FACILITY class.
The current value of the user’s SECLABEL is available as a DB2 session variable. For
more information on session variables, see Figure 3-122, “Session Variables” on page
3-210.
[Figure: a two-by-two matrix combining the security granularity (object level or row level)
with the access control mechanism (DB2 access control or RACF access control); MLS
with DB2 access control and row granularity is the next topic.]
Notes:
You can use multilevel security for multiple purposes in conjunction with DB2, as described
in the following topics.
Multilevel security with row-level granularity with DB2 authorization
In the figure above, this is the upper right option (Security at row level / DB2 access
control). In this combination, DB2 grants are used for authorization at the DB2 object level
(database, table, and so forth). Multilevel security is implemented only at the row level
within DB2. This is discussed in Figure 4-39 "MLS with Row Granularity (1 of 2)" on page
4-101.
Multilevel security at the object level with external access control
In the figure above, this is the lower left option (Security at object level / RACF access
control). In this combination, external access control, such as the RACF access control
module, is used for authorization at the DB2 object level. The RACF access control module
has been enhanced to also use security labels to perform mandatory access checking on
DB2 objects as part of multilevel security. This option is discussed in more detail in
“Implementing Multilevel Security at the Object Level” on page 4-99.
For information about the access control authorization exit, see the new DB2 V8 manual
DB2 UDB for z/OS RACF Access Control Module Guide, SC18-7433.
Multilevel security with row-level granularity with external access control
In the figure above, this is the lower right option (Security at row level / RACF access
control). This option combines both options mentioned before. It uses multilevel security to
control the access to the DB2 objects, as well as multilevel security (SECLABELs) to
control access at the row level within DB2.
MLS with Row Granularity (1 of 2)
In RACF
Set up a security hierarchy (SECLEVEL) and categories (CATEGORY)
The SECLABEL class is active
Assign security labels to users
Notes:
With the hierarchy shown on the foil established in the security server, the system
understands that users with authority to access RAINBOW can access anything. Someone
with authority to access PASTEL information can access any row associated with BLUE,
INDIGO, VIOLET, or PASTEL. Someone with SUNSET authority can access SUNSET,
RED, ORANGE, and YELLOW.
This is a lot more powerful than just having an exact match on the security label, for
example, where the user's label must exactly match the data's label, since it has the notion
of “groups” that make security administration easier to manage.
With this additional capability, we are able to implement a hierarchical type of security
scheme without requiring the application to access the data using special views or
predicates.
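The dominance check that this hierarchy implies can be sketched in a few lines of Python. The parent table and function names here are illustrative assumptions for the RAINBOW example on the foil, not RACF internals:

```python
# Direct-parent table for the hierarchy on the foil: RAINBOW dominates
# everything; PASTEL groups BLUE, INDIGO, VIOLET; SUNSET groups RED,
# ORANGE, YELLOW.
PARENT = {
    "PASTEL": "RAINBOW", "SUNSET": "RAINBOW",
    "BLUE": "PASTEL", "INDIGO": "PASTEL", "VIOLET": "PASTEL",
    "RED": "SUNSET", "ORANGE": "SUNSET", "YELLOW": "SUNSET",
}

def dominates(user_label: str, data_label: str) -> bool:
    """True if user_label equals data_label or sits above it in the hierarchy."""
    label = data_label
    while label is not None:
        if label == user_label:
            return True
        label = PARENT.get(label)
    return False
```

With this table, `dominates("PASTEL", "BLUE")` is true while `dominates("PASTEL", "RED")` is false, matching the access behavior described above.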
Notes:
Customers asked for row-level security for applications that need more granular security or
mandatory access control. For example, an organization may want a hierarchy in which
employees can see their own payroll data, a first-line manager can see his or her own payroll
information as well as that of all the employees reporting to that manager, and so on. Security
schemes often include a security hierarchy and non-hierarchical categories.
To allow DB2 to verify access at the row level, using multilevel security with row granularity,
you must have a column that acts as the security label (SECLABEL), with a column defined
AS SECURITY LABEL. Each row value has a specific SECLABEL.
You incorporate the security label column in the table definition at CREATE TABLE time, or
add the column later.
Security labels are defined and provided by RACF. When connecting to DB2, a user’s
SECLABEL is retrieved from RACF. When rows are accessed, DB2 checks each new
SECLABEL value accessed. If access is allowed, then you can access the row. If access is
not allowed, the data is not returned, and you are not even aware it exists.
Normally, when a user performs an INSERT or UPDATE operation, or LOADs a row, the
user’s SECLABEL is stored in the table’s column that is defined with the “AS SECURITY
LABEL” attribute. However, if write-down checking is not in effect, or it is in effect but the
user has the write-down privilege, the user can specify any valid SECLABEL for that row.
This is the runtime checking of the user’s SECLABEL against the data’s security label, in
addition to DB2 GRANT and RACF PERMIT controls.
Notes:
When you CREATE a table or ALTER it, you can decide to implement row-level security by
including a column that specifies the AS SECURITY LABEL attribute, or add a column with
that attribute to an existing table.
The only way to remove row-level security from a table (one that has a column defined
with the AS SECURITY LABEL attribute) is to drop the table, table space, or database.
You can assign any name to the security label column, but the same column name cannot
be used more than once in the table. Only one security label is allowed per table.
The security label column must be data type single byte character (SBCS), CHAR(8), NOT
NULL WITH DEFAULT.
This column cannot have field procedures, edit procedures, or check constraints.
When the audit trace (class 3) is active, an audit record IFCID 0142 is created. A table with
a security label is treated like an audited table.
Accessing a Table Defined with MLS (1 of 2)
SELECT
User's seclabel is compared to the seclabel of the row
If user's seclabel dominates the data seclabel -> row returned
If user's seclabel does not dominate -> row not returned, no error
INSERT
Value of the seclabel column for inserted row is set to the value of the
user's seclabel
If user has write-down authority, the user is allowed to set the seclabel
field
Notes:
Let us now have a look at how the different operations that you normally perform against
data in a table are affected by having defined multilevel security with row granularity for the
table.
Note: In this topic and the next (MLS with row-level granularity and utilities), we assume
that write-down is in effect and that the user does not have write-down authority, unless
otherwise mentioned.
SELECT
The security rule for select is that your current security label must dominate the security
label of all the rows read. If your security label does not dominate the label of the data row,
then that row is not returned.
The user's SECLABEL is compared to the data SECLABEL of the row to be selected. If
user SECLABEL dominates the data SECLABEL, then the row is returned. If user
SECLABEL does not dominate the data SECLABEL, then the row is not included in data
returned, but no error is reported.
The user must be identified to RACF with a valid SECLABEL. If not, an authorization error
and audit record (IFCID 140) are produced, provided the audit trace is active.
INSERT
The access rules for INSERT are similar, but DB2 puts the user's current SECLABEL in the
row when the row is inserted, if a user does not have the write-down privilege. If the user
does have the write-down privilege, or write-down checking is not in effect, then he or she
can set the value of the SECLABEL column to any value.
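As a rough sketch of these SELECT and INSERT rules — with illustrative names and a deliberately tiny two-label dominance table, not DB2's implementation:

```python
# A tiny stand-in dominance relation: MGR dominates EMP; every label
# dominates itself. Purely illustrative, not RACF's SECLABEL model.
DOMINATES = {("MGR", "MGR"), ("MGR", "EMP"), ("EMP", "EMP")}

def mls_select(user_label, rows):
    """SELECT: return only rows whose SECLABEL the user's label dominates.
    Non-dominated rows are silently filtered out; no error is raised."""
    return [row for row in rows if (user_label, row["seclabel"]) in DOMINATES]

def stored_seclabel(user_label, requested_label, *,
                    writedown_checking=True, has_writedown=False):
    """INSERT: the SECLABEL value actually stored in the new row."""
    if has_writedown or not writedown_checking:
        return requested_label  # user may set any valid SECLABEL
    return user_label           # DB2 forces the user's own SECLABEL

table = [{"name": "ANNE", "seclabel": "MGR"},
         {"name": "TOM", "seclabel": "EMP"}]
```

An EMP-labeled user selecting from `table` sees only TOM's row; an MGR-labeled user sees both, with no error reported in either case.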
Accessing a Table Defined with MLS (2 of 2)
UPDATE
User's seclabel is compared to the seclabel of the row to be updated
If the seclabels are equivalent
Row is updated
Value of the seclabel in the updated row is set to the value of the
user seclabel
If user has write-down authority, then down-level rows can be accessed
and updated
DELETE
User's seclabel is compared to the seclabel of the row to be deleted
If the seclabels are equivalent
Row is deleted
If user has write-down authority, then down-level rows can be accessed
and deleted
Notes:
We now continue with UPDATE and DELETE access against a table defined with multilevel
security with row granularity.
UPDATE
The rules for UPDATE are similar, using the SELECT rules for access to the data, and
setting the SECLABEL like INSERT. Update requires equivalence for users who are not
allowed to write down.
The user's SECLABEL is compared with the SECLABEL of the row to be updated. The
update proceeds according to the following rules:
• If the security label of the user and the security label of the row are equivalent, the row
is updated and the value of the security label is determined by whether the user has
write-down privilege:
- If the user has write-down privilege or write-down control is not enabled, the user
can set the security label of the row to any valid security label.
- If the user does not have write-down privilege and write-down control is enabled, the
security label of the row is set to the value of the security label of the user.
• If the security label of the user dominates the security label of the row, the result of the
UPDATE statement is determined by whether the user has write-down privilege:
- If the user has write-down privilege or write-down control is not enabled, the row is
updated and the user can set the security label of the row to any valid security label.
- If the user does not have write-down privilege and write-down control is enabled, the
row is not updated.
• If the security label of the row dominates the security label of the user, the row is not
updated.
The user must be identified to RACF with a valid SECLABEL. If not, an authorization error
and an audit record are produced.
DELETE
Delete operations in a multilevel security with row granularity environment proceed
according to the following rules:
• If the security label of the user and the security label of the row are equivalent, the row
is deleted.
• If the security label of the user dominates the security label of the row, the user’s
write-down privilege determines the result of the DELETE statement:
- If the user has write-down privilege or write-down control is not enabled, the row is
deleted.
- If the user does not have write-down privilege and write-down control is enabled, the
row is not deleted.
• If the security label of the row dominates the security label of the user, it is not
considered a matching row, and the row is not deleted.
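The UPDATE and DELETE decision rules above can be condensed into a small sketch. The relation classification and names are assumptions for illustration, not DB2 code:

```python
# relation describes how the user's SECLABEL compares to the row's:
# "equivalent", "user_dominates" (strictly), or "row_dominates".
def update_allowed(relation, *, writedown_checking=True, has_writedown=False):
    if relation == "equivalent":
        return True  # row updated; the stored label depends on write-down
    if relation == "user_dominates":
        # updating a down-level row is a write-down operation
        return has_writedown or not writedown_checking
    return False  # row dominates the user: never updated

def delete_allowed(relation, *, writedown_checking=True, has_writedown=False):
    if relation == "equivalent":
        return True
    if relation == "user_dominates":
        return has_writedown or not writedown_checking
    return False  # not considered a matching row
```

Note how the two operations share the same shape: equivalence always succeeds, strict dominance requires the write-down privilege (or write-down control disabled), and a row that dominates the user is never touched.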
DB2 performs multilevel security with row-level granularity by comparing the security label
of the user to the security label of the row that is accessed. Because security labels can be
equivalent without being identical, DB2 uses the RACROUTE REQUEST=DIRAUTH
macro to make this comparison when the two security labels are not the same. For read
operations, such as SELECT, DB2 uses ACCESS=READ. For update operations, DB2
uses ACCESS=READWRITE.
Notes:
Here we describe how DB2 utilities operate in an environment with multilevel security
(MLS).
LOAD RESUME
Executing a LOAD RESUME utility against a table space containing tables with multilevel
security with row granularity, requires that the user be identified to RACF and have a valid
security label (SECLABEL). If the user does not have a valid SECLABEL, an authorization
error message and an audit trace record (IFCID 140) are produced, provided audit trace
class 1 is active. LOAD RESUME adheres to the same rules as INSERT. Without
write-down authorization, the SECLABEL in the table is set to the SECLABEL associated
with the user ID executing the LOAD RESUME utility. With write-down, any valid
SECLABEL can be specified.
• If write-down privilege checking is in effect, and the user has write-down, rows that are
dominated are discarded.
• If write-down privilege checking is in effect, and the user does not have write-down, the
row is not considered to be a match and the row is not discarded.
• If write-down privilege checking is not in effect, then rows that are dominated are
discarded.
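A minimal sketch of these discard rules, assuming a row whose security label is dominated by the user's label (names are illustrative):

```python
def dominated_row_discarded(*, writedown_checking, has_writedown):
    """Whether a row dominated by the user's SECLABEL is discarded."""
    if not writedown_checking:
        return True       # checking off: dominated rows are discarded
    return has_writedown  # checking on: only with the write-down privilege
```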
Notes:
Let us look at some of the MLS requirements and restrictions.
If an ALTER TABLE statement is used to add a security label column to a table with a
trigger on it, the same rules apply to the new security label column that would apply to any
column that is added to the table with the trigger on it.
When a BEFORE trigger is activated, the value of the NEW transition variable that
corresponds to the security label column is set to the security label of the user if either of
the following criteria are met:
• Write-down control is in effect and the user does not have the write-down privilege.
• The value of the security label column is not specified.
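These two criteria can be sketched as a small function; the names are hypothetical, not DB2's trigger machinery:

```python
def before_trigger_seclabel(user_label, specified_label=None, *,
                            writedown_checking=True, has_writedown=False):
    """Value of the NEW transition variable for the security label column."""
    if specified_label is None:
        return user_label  # no value specified: use the user's SECLABEL
    if writedown_checking and not has_writedown:
        return user_label  # user cannot write down: the value is overridden
    return specified_label
```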
A network security zone can contain a single IP address or any combination of IP
addresses and subnetworks. All of the IP addresses in a security zone must have the same
security label (though all IP addresses with the same security label do not have to be in the
same security zone). Security zones are defined in TCP/IP via the NETACCESS
statement.
SECLABELs in a TCP/IP Multilevel Security Environment
Remote users that access DB2 by using a TCP/IP network connection use the security
label that is associated with the RACF SERVAUTH class profile when the remote user is
authenticated. Security labels are assigned to the database access thread when the DB2
server authenticates the remote server by using the RACROUTE REQUEST = VERIFY
service.
To assign the security label in a multilevel security environment (and the MLACTIVE option
is active) for incoming TCP/IP requests, you must use the TCP/IP Network Access Control
capabilities of the z/OS IP Communication Server V1.5 in combination with the RACF
SERVAUTH class. To use the RACF SERVAUTH class and TCP/IP Network Access
Control, perform the following steps:
1. Set up and configure TCP/IP Network Access Control by specifying the NETACCESS
statement in your TCP/IP profile.
For example, suppose that you need to allow z/OS system access only to IP addresses
from 9.1.37.0 to 9.1.37.255. You want to define these IP addresses as a security zone,
and you want to name the security zone IBMITSO. Suppose also that you need to deny
access to all IP addresses outside of the IBMITSO security zone, and that you want to
define these IP addresses as a separate security zone (WORLD). To establish these
security zones, use the following NETACCESS clause:
NETACCESS INBOUND OUTBOUND
; IP Addr MASK SECZONE
9.1.37.0 255.255.255.0 IBMITSO
DEFAULT WORLD
ENDNETACCESS
For more information about the NETACCESS statement, see z/OS V1.5
Communications Server: IP Configuration Reference, SC31-8776.
2. Activate the SERVAUTH class by issuing the following TSO command:
SETROPTS CLASSACT(SERVAUTH)
3. Activate RACLIST processing for the SERVAUTH class by issuing the following TSO
command:
SETROPTS RACLIST(SERVAUTH)
4. In the SERVAUTH class you need to define the general resource profiles for the
different security zones. These profiles have to adhere to a strict naming convention,
namely:
EZB.NETACCESS.system-name.tcpip-name.security-zone
In this statement:
• EZB.NETACCESS is the RACF required qualifier for these profile names.
• system-name is the MVS system name.
• tcpip-name is the TCP/IP started task name (of your stack that your DB2 runs
under).
• security-zone is the name of the security zone. It should match the security
zone names specified in the TCP/IP NETACCESS statement.
Define the IBMITSO general resource profiles in RACF to protect the IBMITSO security
zone by issuing the following command:
RDEFINE SERVAUTH (EZB.NETACCESS.SC63.TCPIP.IBMITSO) ACC(READ) SECLABEL(CLASSFD)
5. For this permission to take effect, refresh the RACF database by using the following
command:
SETROPTS CLASSACT(SERVAUTH) REFRESH RACLIST(SERVAUTH)
Now, suppose that USER5 has an IP address of 9.1.37.3. TCP/IP Network Access Control
will determine that USER5 has an IP address that belongs to the IBMITSO security zone.
USER5 will be granted access to the system using the CLASSFD security label.
Alternatively, suppose that USER6 has an IP address of 9.1.25.37. TCP/IP Network
Access Control will determine that USER6 has an IP address that belongs to the WORLD
security zone. As there is no profile in the SERVAUTH class, access is denied.
The SECLABEL that is assigned via the SERVAUTH class will be used to access DB2
resources.
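The zone lookup and SECLABEL assignment can be approximated with Python's standard ipaddress module. The tables below are illustrative, modeled on the NETACCESS example above; the real lookup is done by the z/OS Communications Server, not by code like this:

```python
import ipaddress

# Zone table modeled on the NETACCESS example: one defined zone plus
# the DEFAULT (WORLD) zone for everything else.
ZONES = [(ipaddress.ip_network("9.1.37.0/24"), "IBMITSO")]
SECLABEL_BY_ZONE = {"IBMITSO": "CLASSFD"}  # from the SERVAUTH profile

def classify(ip: str):
    """Return (security zone, SECLABEL or None) for an inbound address."""
    addr = ipaddress.ip_address(ip)
    for network, zone in ZONES:
        if addr in network:
            return zone, SECLABEL_BY_ZONE.get(zone)
    return "WORLD", None  # DEFAULT zone; no SERVAUTH profile -> access denied
```

`classify("9.1.37.3")` yields the IBMITSO zone with the CLASSFD label, while `classify("9.1.25.37")` falls into WORLD with no label, mirroring the USER5 and USER6 outcomes described above.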
© Copyright IBM Corp. 2004 Unit 5. Unicode in DB2 for z/OS 5-1
List of Topics
Conversion basics
Unicode fundamentals
Unicode in DB2 for z/OS
How does Unicode affect me in V8?
Moving to Unicode data
Multiple CCSID set SQL statements
Notes:
DB2 UDB for OS/390 and z/OS is increasingly being used as a part of large client-server
systems. In these environments, character representations vary on clients and servers
across many different platforms and across different geographies. One area where this sort
of environment exists is in the data centers of multinational companies. Another example is
e-commerce. In both of these examples, a geographically diverse group of users interact
with a central server, storing and retrieving data.
The traditional way of encoding characters requires hundreds of different code page
systems, because no single code page was adequate for all the letters, punctuations, and
symbols in common use. These code pages also conflict with one another, because two
code pages can use the same code points for different characters.
In order to solve these problems, the support of the Unicode encoding scheme was
introduced in DB2 V7 and its use was optional. DB2 for z/OS Version 8 provides additional
support for Unicode and globalized applications. Fundamental changes such as the
conversion of the DB2 catalog from EBCDIC to Unicode, and Unicode parsing of SQL
statements, affect everybody to some degree, even when you do not plan to store any
user data in Unicode.
This unit explains the changes in V8 and helps you to understand the impact they have.
Notes:
Before we start to explain what Unicode is all about, we first describe DB2’s code page
support in general and clarify some of the terminology.
You can refer to the following Web page for a list of the IBM code pages:
http://www.ibm.com/servers/eserver/iseries/software/globalization/codepages.html
Encoding Scheme
An encoding scheme is a collection of the code pages for various languages used on a
particular computing platform. For example, the EBCDIC encoding scheme is typically
used on z/OS and iSeries (AS/400). The ASCII encoding scheme is used on Intel-based
systems (like Windows), and UNIX-based systems.
ASCII stores the character ’A’ as x’41’ and the number ’1’ as x’31’. As mentioned above,
EBCDIC stores the character ’A’ as x’C1’ and ’1’ as x’F1’. This results in a different collating
sequence in EBCDIC and ASCII.
• The collating sequence in ASCII is: space, numerics, upper case characters, lower case
characters.
• The collating sequence in EBCDIC is: space, lower case characters, upper case
characters, numerics.
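Python ships codecs for the EBCDIC code pages, so the two collating sequences are easy to demonstrate (cp037 stands in for US EBCDIC CCSID 37):

```python
# Demonstrate the ASCII versus EBCDIC collating sequences using Python's
# built-in code-point ordering (ASCII range) and the cp037 codec (EBCDIC).
chars = ["a", "A", "1", " "]

ascii_order = sorted(chars)  # str comparison follows Unicode/ASCII code points
ebcdic_order = sorted(chars, key=lambda c: c.encode("cp037"))

print(ascii_order)   # space, numerics, upper case, lower case
print(ebcdic_order)  # space, lower case, upper case, numerics
```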
You can create table spaces with different encoding schemes within one database. Within
one table space, however, you can only create tables which use the same encoding
scheme as the table space. For indexes, you cannot choose a specific encoding
scheme; indexes always inherit the table’s encoding scheme.
CCSID Set
A CCSID set comprises a single byte, mixed byte, and double byte CCSID which you can
specify on panel DSNTIPF when you install a DB2 for z/OS subsystem.
Most languages can be encoded using a single 256 code point code page (sometimes also
called a SBCS CCSID). In that case you can set up your DB2 subsystem as a MIXED=NO
system. Then the mixed and double byte CCSIDs do not apply and they default to 65534,
which is a reserved or dummy CCSID. For example, CCSIDs 1026, 273, and 37,
mentioned above, do not have CCSIDs for mixed or double byte data. Refer to Appendix A
of the DB2 Installation Guide, GC18-7418 for a list of available CCSID sets.
Languages that cannot be represented by single byte CCSIDs alone include Chinese,
Japanese, and Korean. They use double byte and mixed character sets due to the range and complexity of
their symbols. Refer to “Mixed Data Characteristics” on page 5-24 for a discussion of mixed
data.
[Figure: partial code charts for the German EBCDIC code page 273 (left) and the US
EBCDIC code page 37 (right). The letters A-Z and the digits 0-9 occupy the same code
points in both, but the national-use positions differ. For example, code page 273 places
ä, ü, Ö at x'C0', x'D0', x'E0' and Ä, Ü, ö at x'4A', x'5A', x'6A', where code page 37 has
{, }, \ and ¢, !, ¦ at those positions; characters such as [, ], |, ¬, and ^ likewise sit at
different code points in the two code pages.]
Notes:
Generally speaking, computers store all data as bytes. As shown on the visual above, the
EBCDIC encoding for the character ’A’ is always stored as x’C1’. This is independent from
the code page that has been used to bring this data into your DB2 for z/OS subsystem
EBCDIC table. So for some characters, it doesn’t matter which code page you use. They
are always represented by the same hex string. However, this is not true for all characters.
The code pages shown above are not complete. We just picked a few characters to show
you some of the differences.
Refer to Table 5-1 for a comparison of some characters, where the hexadecimal
representation of the stored data is different between the German EBCDIC code page 273,
and the US EBCDIC code page 37.
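Python's cp037 and cp273 codecs follow these two code pages, so the differences can be inspected directly:

```python
# cp037 is the US EBCDIC code page 37; cp273 is the German EBCDIC code
# page 273. Invariant characters encode identically; national-use
# characters do not.
assert "A".encode("cp037") == "A".encode("cp273") == b"\xc1"
assert "1".encode("cp037") == "1".encode("cp273") == b"\xf1"
assert "Ü".encode("cp037") == b"\xfc"  # x'FC' in code page 37
assert "Ü".encode("cp273") == b"\x5a"  # x'5A' in code page 273
```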
When comparing CCSID 37 with some other code pages, you may find more differences
than just a few characters. For example, in CCSID 290 (or CCSIDs based on CCSID 290),
the lower case characters are not in the "usual" place either.
Figure 5-4. When does Conversion Occur for Local Applications?
Notes:
Before we start talking about how character conversion is performed, we want to make you
aware of the situations which may or may not require conversion.
Generally speaking, we can say that conversion does not occur for local applications. There
are, however, some exceptions which require your data to be converted to another CCSID
even if you are working locally. As also shown on the visual above, conversion occurs
when:
• Dealing with ASCII or Unicode tables. See also Figure 5-42, "Multiple CCSID Set per
SQL" on page 5-92 for more information about when conversion occurs.
• Specified by application. The idea is that the application requests to have (some or all)
of the data returned in a specific encoding scheme.
- CCSID override in the SQLDA:
The possibility for specifying a CCSID for a string column has been around since
DB2 V2.3. For languages other than REXX, the assignment of any valid CCSID is
done through field SQLNAME of SQLVAR. For REXX, the CCSID is in the
SQLCCSID field.
- DECLARE VARIABLE:
If the SQL statement uses a host variable that has been declared in a DECLARE
VARIABLE statement, then the DB2 precompiler automatically codes the equivalent
setting in the SQLDA with a CCSID. This allows statements where a USING clause
is not allowed to indicate that the data should be returned in a specific CCSID.
- Application ENCODING bind option:
This controls the application encoding scheme that is used for all the static
statements in the plan/package.
The default is the system default APPLICATION ENCODING SCHEME specified at
installation time (DSNZPARM APPENSCH). The default package application
encoding option is not inherited from the plan application encoding option. For both
plan and package, the system default is used when the ENCODING option is not
specified during BIND.
- CURRENT APPLICATION ENCODING SCHEME special register enables an
application to specify the encoding scheme that is being used by the application for
dynamic statements.
The value returned in the special register is a character representation of a CCSID.
Although you can use the values ‘ASCII’, ‘EBCDIC’, or ‘UNICODE’ to SET the
special register, the value set in the special register is the character representation
of the numeric CCSID corresponding to the value used in the SET command. The
values ‘ASCII’, ‘EBCDIC’, and ‘UNICODE’ are not stored.
DECLARE VARIABLE, the application ENCODING bind option, and the CURRENT
APPLICATION ENCODING SCHEME special register have been introduced in DB2 with
Version 7.
[Figure: a DB2 for z/OS subsystem with SCCSID = 37 holding EBCDIC table EMP
(columns NAME, FIRSTNME, TEL; rows MÜLLER/ED/111, MILLER/TOM/222,
CROFT/ANNE/333), accessed from two 3270 emulator sessions.
The CCSID 37 session issues:
UPDATE EMP SET NAME = 'MÜLLER' WHERE NAME = 'SMITH'
so 'MÜLLER' is sent and stored as x'D4FCD3D3C5D9'.
The CCSID 273 session issues:
SELECT TEL FROM EMP WHERE NAME = 'MÜLLER'
and 'MÜLLER' is sent as x'D45AD3D3C5D9'.
Result: 0 rows!]
Notes:
Let us consider a case where we only work locally on a DB2 Version 7 subsystem using
3270 emulators; that is, we do not access data remotely through DB2 Connect. Even in this
case, we have to be aware of how character encoding is performed, although most of the
time, character conversion does not occur. For more details on when conversion occurs,
see Figure 5-4, "When does Conversion Occur for Local Applications?" on page 5-10.
The scenario on the visual above shows what happens when DB2 data is, for example,
manipulated through ISPF by terminal emulators that use different CCSIDs. The 3270
emulation for the PC shown in the upper part of the visual uses CCSID 37. Because Miss
Smith got married to Mr. Müller, and she wants to adopt her husband’s name, we update
the name of employee ’SMITH’ in the EBCDIC table EMP to ’MÜLLER’. The hexadecimal
representation of character ’Ü’ in CCSID 37 is ’FC’. That is, ’MÜLLER’ is stored as
x’D4FCD3D3C5D9’ in the data set containing EMP.
Now somebody, using a different CCSID for her/his 3270 emulation, in our case the
German CCSID 273, selects from table EMP and asks for all employees whose name
is ’MÜLLER’. Since this user is using a different CCSID for their terminal emulation, ’Ü’ is
now interpreted as the hexadecimal code point x’5A’, and the name MÜLLER looks like
x’D45AD3D3C5D9’ in hexadecimal.
DB2 compares this hexadecimal string with the data stored in EMP and does not find any
match. That is, although the user of CCSID 273 is logically searching for the exact same
person that the user of CCSID 37 just updated in the employee table, this entry is not
displayed as a result row.
The situation described is also true for QMF and batch programs that have been coded
using a different CCSID than the one that has been used to store the data.
The described situation does not cause a problem for the data that is entered and retrieved
by the same user. It is only a problem when different users using different CCSIDs have to
retrieve each other’s data.
To tackle this problem, you can force everybody to use the same CCSID on their terminal
emulation program (37 in our example), but when people are spread out over different
countries, using different languages and keyboards, that is not always an easy solution.
Another option is to use the ENCODING BIND option, also known as the application
encoding scheme. This allows you to specify the encoding scheme used by the application.
It tells DB2 how to interpret information in incoming and outgoing host variables.
If you can make sure that people using CCSID 37 in their terminal emulation program also
use an application with the ENCODING(37) BIND option, and users with 273 use
ENCODING(273), you should be OK. Users using 37 in both the emulator and the
application undergo no conversion (as before), but users using 273 in both their emulation
and ENCODING BIND options will trigger conversion between the application encoding
scheme’s CCSID (273) and DB2’s SCCSID 37. x’D45AD3D3C5D9’ is converted to
x’D4FCD3D3C5D9’ before it is compared to data in the database, and the correct row will
be retrieved. The use of the ENCODING BIND option is discussed in more detail in the next
topic.
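The conversion described above can be replayed with Python's cp037 and cp273 codecs:

```python
# Replaying the MÜLLER scenario with Python's cp037 and cp273 codecs.
stored = "MÜLLER".encode("cp037")    # what the CCSID 37 user stored
searched = "MÜLLER".encode("cp273")  # what the CCSID 273 user sends

assert stored == bytes.fromhex("d4fcd3d3c5d9")
assert searched == bytes.fromhex("d45ad3d3c5d9")
assert searched != stored  # a pure byte comparison finds no match: 0 rows

# With ENCODING(273), DB2 converts 273 -> 37 before comparing:
converted = searched.decode("cp273").encode("cp037")
assert converted == stored  # the row is now found
```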
In summary, it is extremely important that all users of a DB2 system who use applications
where the data is not tagged with a CCSID either use the same CCSID, or use
techniques like the application encoding scheme, to convert the data to the correct CCSID
before it is stored in the DBMS. Otherwise you end up with data corruption sooner or later.
[Figure: a DB2 subsystem with SCCSID = 500 holding table TAB1, accessed from two
3270 emulator sessions, one with CCSID 37 and one with CCSID 500, both using
application encoding scheme EBCDIC.
The CCSID 37 session issues:
INSERT INTO TAB1 VALUES ('ROW1','¢');
'ROW1' is sent as x'D9D6E6F1' and '¢' as x'4A'; no conversion occurs, so the bytes are
stored as is.
Select * from TAB1 where c2 = 'ROW1' then returns ROW1 ¢ in the CCSID 37 session,
but ROW1 [ in the CCSID 500 session; again, no conversion occurs.]
Notes:
Because it is very important to specify the correct CCSID for your DB2 system, as well as
the correct application encoding scheme for your applications, we now look at the potential
problems and how to solve them using the application encoding scheme in more detail.
The problem is the same as described in the previous topic. Because local DB2
applications, especially prior to V7, were not likely to do conversions, it is possible to end
up with data from multiple CCSIDs stored in the same table, for example, because different
people use a different CCSID in their terminal emulation program, and do not force proper
conversion (via SQLDA CCSID override, by using the DECLARE VARIABLE construct, or
by using the appropriate application encoding scheme).
This problem is not new. It has existed ever since day one of DB2. However, it was much
less likely to be a problem in the early days, for a number of reasons:
• Most applications then used local 3270 terminals that were connected via a local
communication control unit, and the CCSID used by these terminals was defined inside
the control unit. Therefore, it was much more difficult to change the CCSID compared to
changing the CCSID in your 3270 PC terminal emulation program.
• Unlike today’s configurations, where people from all over the world access a
consolidated mainframe environment in a single location, in the early days, most DB2
applications were only accessed by local users, in the same building, or at most spread
throughout the country.
• In addition, most people were just happy that they could store their data electronically,
and were not too worried about getting all the “language specific” characters correctly
represented.
Therefore, in many early DB2 installations, people did not worry too much about specifying
the correct system EBCDIC CCSID (DSNHDECP SCCSID) parameter. This may have led
to a situation where the data was inserted into DB2 via local applications using different
CCSIDs, and without the proper override in the SQLDA (which was the only possibility to
override the CCSID in pre-V7 local applications). The net result is that you can have a mix
of data with different CCSIDs in a single DB2 table (or in multiple tables within the same
DB2 subsystem).
This situation is illustrated in the figure above. The DB2 system uses SCCSID=500, and
two 3270 emulators, one with CCSID 37 and one with CCSID 500. Both applications are
bound with the ENCODING(EBCDIC) option. Both applications are inserting the same type
of data, the character “¢” and the “ROWx” string.
As you can see, each application is able to correctly retrieve the row that it inserted, but
sees the row that was inserted by the other application in a different (wrong) way. This is
because DB2 does not do CCSID translation in this case, and stores the data as is in the
table. But when user 1 (using CCSID 37) wants to retrieve data that was inserted by user 2
(using CCSID 500), translation should occur in order to be able to retrieve the data
correctly.
Note: Not converting is only a problem for those characters that are different between
the SCCSID code page and the code page used by the application.
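You can reproduce this mismatch outside DB2 with Python's built-in EBCDIC codecs, assuming (as we do here) that cp037 and cp500 faithfully mirror CCSIDs 37 and 500; a minimal sketch:

```python
# The byte x'4A', stored "as is" without conversion, means different
# characters depending on the code page used to interpret it:
print(b'\x4a'.decode('cp037'))   # the cent sign, as the CCSID 37 user typed it
print(b'\x4a'.decode('cp500'))   # '[' -- what the CCSID 500 user sees

# In CCSID 500 the cent sign lives at x'B0' instead:
print('\u00a2'.encode('cp500'))  # b'\xb0'
```

Because x'4A' maps to different characters in the two code pages, the same stored byte renders as ¢ on one terminal and [ on the other, exactly as in the figure.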
[Figure: The CCSID 37 emulator is now bound with application encoding scheme 37
(ENCODING(37)); the CCSID 500 emulator still uses application encoding scheme EBCDIC.
The CCSID 37 user runs: Insert into TAB1 values ('ROW2','¢'); the value is converted at
insert and stored as x'D9D6E6F2' x'B0'. Both users now retrieve ROW2 ¢ correctly. The
old row, stored earlier without conversion as x'D9D6E6F1' x'4A', is converted from 500
to 37 on retrieval (x'4A' becomes x'BA') and displays as ROW1 [ = Problem.]
Notes:
To “force” translation, you can bind the application that is used by the CCSID 37 user
with the ENCODING(37) BIND option. This triggers translation between 37 and 500, and
vice versa. We did not bind the application using CCSID 500 with the ENCODING BIND
option, because the DB2 system SCCSID is 500, and therefore there is no need to trigger
conversion for that application.
The application that uses CCSID 37 (and the ENCODING(37) BIND option) can now
correctly access all the data that was inserted by the other application, as well as all new
data that it inserted after the application started to use the ENCODING(37) BIND option.
The data inserted earlier cannot be correctly displayed, because with the ENCODING(37)
BIND option, character conversion occurs between CCSID 500 (the CCSID of the table’s
data) and 37 (the CCSID specified on the ENCODING BIND option). However, the data
that was previously inserted into TAB1 via the CCSID 37 application was in CCSID 37 and
stored as-is (because previously no conversion was done).
The application using CCSID 500 can correctly access all the data it inserted, as well as
the newly inserted data from the other application (because that data is now converted at
insert time from 37 to 500) and can be correctly retrieved by the CCSID 500 application
without conversion. The only “problem” is the old data (ROW1) that was inserted by the
CCSID 37 application before the ENCODING BIND option was implemented. To make this
solution work 100%, you must track down the rows that were inserted incorrectly and
either remove them manually or convert them from 37 to 500.
Note that this is only a problem if you have inserted data incorrectly in the past. If you start
a new application, and use the correct ENCODING BIND option where required, there is of
course no need to clean up any old data, and all data will be accessible by all applications
from day one.
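Conceptually, cleaning up the old rows amounts to decoding each byte as CCSID 37 (what the user really typed) and re-encoding it as CCSID 500 (what the table should contain). A sketch of that repair step, using Python's cp037/cp500 codecs as stand-ins for the real conversion:

```python
# Old cent sign, typed on the CCSID 37 terminal and stored unconverted
# in the CCSID 500 table:
stored = b'\x4a'
# Re-tag: interpret as CCSID 37, then re-encode as CCSID 500.
repaired = stored.decode('cp037').encode('cp500')
print(repaired)   # b'\xb0' -- now a valid CCSID 500 cent sign
```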
[Figure: A remote Windows workstation (CCSID 1252) accesses DB2 (SCCSID = 500) through
DRDA, and a local 3270 emulator uses CCSID 273. Table EMP (PNR, NAME, TEL) contains the
rows 1 MÜLLER 11, 2 JONES 22, and 3 RIGHT 33.
(1) UPDATE EMP SET NAME = 'MÜLLER' WHERE NAME = 'SMITH' renames employee SMITH (PNR 1):
the CCSID 1252 string 'MÜLLER' (x'4DDC4C4C4552') is converted to x'D4FCD3D3C5D9' before
being stored.
(2) SELECT NAME FROM EMP WHERE PNR = '1' from the workstation returns 'MÜLLER'
(x'4DDC4C4C4552') after conversion back to CCSID 1252.
(3) The local SELECT TEL FROM EMP WHERE NAME = 'MÜLLER' (x'D45AD3D3C5D9' in CCSID 273)
is not converted.]
Notes:
If you are accessing your data remotely, character conversion is more likely to occur, since
we are often dealing with different computer platforms using different encoding schemes.
Consider these two cases, for example:
• The values of host variables sent from the requester to the current server, such as
SELECT predicates or INSERT column values
• The values of result columns sent from the current server back to the requester, such as
SELECTed columns
In either case, the string data can have a different representation at the sending and
receiving system. The sample on the visual above assumes that the SCCSID for the DB2
for z/OS system is set to 500, which is the so-called International EBCDIC code page.
If you access your data through DRDA, character strings are always converted when
different CCSIDs are involved. As shown above, (1) we update the name of
employee ’SMITH’ and set her name to ’MÜLLER’. We assume that the workstation, which
is used to run this update, uses the Windows CCSID 1252. That is, the code points of
character string ’MÜLLER’ using CCSID 1252 are converted to the SCCSID EBCDIC 500
and stored in EBCDIC table EMP.
This conversion occurs the other way around when you select from table EMP (2) using a
remote request from your workstation. Because conversion occurs for both directions, you
can properly work with the data stored in table EMP.
DRDA automatically triggers character conversion when different CCSIDs are involved.
However, this is not the case for local applications. Let us assume that the SCCSID
parameter for your DB2 subsystem is set to 500. On the other hand, your 3270 emulation
program that is used to access your data in table EMP locally, is set to 273 (that is the
German EBCDIC CCSID). This is shown in the bottom part of the figure. As discussed
earlier in this unit, data is usually not converted when the data is accessed locally (unless
one of the conditions described earlier applies).
Therefore, as shown in (3), if you run a SELECT statement asking for all rows of table EMP
whose value in column NAME equals the string MÜLLER (spelled in CCSID 273), you do
not receive any result row.
The reason is that the string 'MÜLLER' is represented by x'D45AD3D3C5D9' in EBCDIC
CCSID 273. This hexadecimal string is used to compare the data that is stored in table
EMP. Since CCSID 500 was used to store the data (when the DRDA user updated it),
’MÜLLER’ has been stored as x'D4FCD3D3C5D9'. These two strings are not the same and
therefore, the SELECT statement does not return any result row.
Important: Again, this example clearly demonstrates the importance of specifying a
CCSID in DB2 that matches the CCSID used by your terminal emulators and
applications, or of using the application encoding scheme bind option (or special
register). Failure to do so can lead to data loss and data corruption.
To deal with this problem, you can use the application encoding scheme (ENCODING)
BIND option. In this case, you can bind your local application with ENCODING(273). This
triggers conversion between 273 and 500 (the DB2 SCCSID), and vice versa.
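The hexadecimal strings quoted above can be verified with Python's cp1252, cp500, and cp273 codecs (assumed here to match the IBM CCSID tables):

```python
name = 'M\u00dcLLER'                        # 'MÜLLER'
print(name.encode('cp1252').hex().upper())  # 4DDC4C4C4552 -- Windows CCSID 1252
print(name.encode('cp500').hex().upper())   # D4FCD3D3C5D9 -- as stored (SCCSID 500)
print(name.encode('cp273').hex().upper())   # D45AD3D3C5D9 -- German EBCDIC 273

# The CCSID 273 bytes never match the stored CCSID 500 bytes, so without
# conversion the local SELECT finds no rows:
print(name.encode('cp273') == name.encode('cp500'))  # False
```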
RESPONSE=SC63
CUN3000I 02.37.19 UNI DISPLAY 752
ENVIRONMENT: CREATED 04/17/2004 AT 10.08.19
MODIFIED 04/17/2004 AT 10.08.20
IMAGE CREATED 12/08/2003 AT 17.02.33
SERVICE: CHARACTER CASE NORMALIZATION COLLATION
STORAGE: ACTIVE 424 PAGES
LIMIT 51200 PAGES
CASECONV: NORMAL
NORMALIZE: DISABLED
COLLATE: DISABLED
CONVERSION: 00850-01047-ER 01047-00850-ER
00037-01200(13488)-ER 01200(13488)-00037-ER
00037-01208-ER 01208-00037-ER
00437-01208-ER 01208-00437-ER
00037-00367-ER 01252-00037-ER
00037-01252-ER 00367-00037-ER
00500-01200(13488)-ER 01200(13488)-00500-ER
01047-01200(13488)-ER 01200(13488)-01047-ER
01047-01208-ER 01208-01047-ER
01208-01200-ER 01200-01208-ER
01383-01200-ER 01200-01383-ER
00932-01200-ER 01200-00932-ER
.....
Notes:
There are two methods for character conversion support in DB2 for z/OS V8. The methods
are used in the following order:
• DB2 catalog table SYSIBM.SYSSTRINGS (this has been available since DB2 V2.3):
- If DB2 does not provide a conversion table for a certain combination of source and
target CCSIDs, you will get an error message, unless a conversion can be found by
the other conversion method.
- If the conversion is incorrect, you might get an error message or unexpected output.
To correct the problem, you need to understand the rules for assigning source and
target CCSIDs in SQL operations.
• z/OS conversion services:
z/OS conversion services have been used since V7. They are certainly used for
conversion to and from Unicode, but they also support other conversions. You must
customize z/OS Unicode support in order to have Unicode data in DB2.
Note: APAR OA04069 provides automatic configuration of a default conversion image.
When no UNI=xx is specified in the system parameters at IPL time, Unicode Services
automatically loads a pre-built image that contains all the EBCDIC and ASCII conversion
tables. Detailed information about the conversion tables can be found in the APAR text.
The pre-built image must be page fixed in storage and requires 39 megabytes (9862
pages) of real storage. (A quick way to validate the successful loading and page fixing of
the pre-built image is to issue the console command DISPLAY UNI,STORAGE. The
result should show 9862 pages of storage utilized by the active image.)
Although the prebuilt image contains all possible conversions, 39 MB of page fixed real
storage is a lot. Therefore, make sure that you have enough real storage available to
accommodate the conversion image.
In order to save storage, it is recommended that you build your own conversion image,
based on the conversions that are required in your installation. Feedback from North
American and European customers indicates that their conversion images are seldom
bigger than 2 MB, which is much smaller than the default 39 MB.
DB2 Version 7 sometimes uses conversion services provided by the Language Environment
product. LE conversions are no longer used in DB2 Version 8.
To ensure that conversions between EBCDIC and UTF-8 are as fast as possible, in some
cases DB2 V8 performs so-called "inline conversion" instead of calling the z/OS
Conversion Services. As a general rule, inline conversion can be done when a string
consists of single-byte characters in UTF-8. This conversion enhancement is not available
in V7, nor is it available for conversions between EBCDIC and UTF-16 and vice versa.
In addition, z/OS V1.4 has improved the performance of EBCDIC to UTF-8 (and vice versa)
conversions by streamlining the conversion process, and V1.4 dramatically outperforms
z/OS 1.3 conversions.
On top of that, zSeries machines have hardware instructions that assist CCSID conversion.
These instructions were first implemented on the z900, and have been enhanced on the
z990 machines.
Tip: The way to get conversions done as quickly as possible is to use DB2 V8, and use
z/OS V1.4 or above, on a z990 machine.
Notes:
The following methods are used for conversion:
• Round-trip conversion:
The integrity of all character data is maintained from the source coded character set
identifier (CCSID) to the target CCSID and back to the source.
When performing a round-trip conversion, you may see incorrect representation of the
characters displayed in the target CCSID. The integrity is preserved, however. When
the characters are converted back to the source CCSID, they regain their original
hexadecimal values and representation.
• Enforced subset conversion (substitution):
Characters that exist in both the source and target CCSID have their integrity
maintained. Characters in the source CCSID but not in the target CCSID are replaced.
Replaced values are also referred to as substitution characters. All source characters
that do not have a corresponding character in the target CCSID map to the same
substitution character. For EBCDIC encoding, these appear on most display stations as
a solid block. For ASCII encoding, these substitution characters appear differently.
This substitution is permanent when converting back to the source CCSID because it is
not possible to retrieve the original hexadecimal values.
DB2 uses a combination of RT and ES conversions. The trend, however, goes toward ES
conversions. When your SQL statement requires a conversion to or from Unicode, the
conversion is performed as follows:
• ASCII/EBCDIC to Unicode: Performed as round-trip conversions
• Unicode to ASCII/EBCDIC: Generally performed as enforced subset conversion
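The two behaviors can be imitated with Python codecs. One caveat: Python's replace error handler substitutes '?', while a real enforced subset conversion on z/OS uses the EBCDIC SUB character x'3F'; the permanence of the loss is the same either way:

```python
# Round-trip conversion: every CCSID 500 byte maps to a Unicode code point
# and back, regaining its original hexadecimal value.
original = bytes(range(0x40, 0x50))
assert original.decode('cp500').encode('cp500') == original

# Enforced subset conversion: the euro sign has no code point in CCSID 37,
# so it is replaced, and the original character cannot be recovered.
lossy = '\u20ac'.encode('cp037', errors='replace')
print(lossy.decode('cp037'))   # '?' -- the euro sign is gone for good
```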
MIXED DATA
Notes:
Mixed data can represent both SBCS and MBCS (Multi-Byte Character Set) characters.
SBCS data can be compared to mixed data without conversion to mixed, because it is a
subset of the mixed repertoire. This is true for ASCII, EBCDIC, and Unicode.
• If MIXED=YES is specified for EBCDIC data, code points x’0E’ and x’0F’ have a special
meaning. They are the shift-out and shift-in controls that bracket the double-byte
characters within a character string.
• ASCII instead uses lead bytes. If the first byte is within a certain range, say x’81’ -
x’9F’, then it is the first byte of a DBCS character. For example, x’8155’ is a DBCS
character.
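Python's standard library has no SO/SI-based mixed EBCDIC codec, but the ASCII-side lead-byte scheme can be illustrated with cp932 (a Shift-JIS variant, in which x'81'-x'9F' and x'E0'-x'FC' are DBCS lead bytes); a sketch:

```python
# A byte outside the lead-byte ranges is a complete SBCS character:
print(b'\x41'.decode('cp932'))            # A
# A lead byte such as x'81' consumes the following byte as well,
# yielding a single double-byte character:
print(len(b'\x81\x40'.decode('cp932')))   # 1 (x'8140' is one character)
print(len(b'\x81\x55'.decode('cp932')))   # 1 (x'8155' is one character)
```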
[Figure: Unicode covers scripts and blocks such as Basic Latin, Cyrillic, Mongolian,
CJK, and Dingbats.]
Notes:
The description of the basics of character conversion and CCSID usage in the previous
topics shows that dealing with EBCDIC and ASCII code pages in a multinational
environment might cause some difficulties. These difficulties exist because two encodings
can use the same hexadecimal number for two different characters, or use different
numbers for the same character. Servers, especially nowadays, need to be able to support
many different encoding schemes. This means that whenever data is passed between
different encodings or platforms, that data always runs the risk of data loss if not converted
properly.
Unicode can make a big difference here!
Unicode provides a unique code point for every character, no matter what the platform, no
matter what the program, no matter what the language is. The Unicode standard has been
adopted by such industry leaders as Apple, HP, IBM, Microsoft, Oracle, SAP, Sun, and
many others. Unicode is required by modern standards such as XML, Java, LDAP, CORBA
3.0, etc., and is the official way to implement ISO/IEC 10646. It is supported in many
operating systems, all modern browsers, and many other products. The Unicode standard,
and the availability of tools supporting it, are among the most significant recent global
software technology trends.
Incorporating Unicode into client-server or multi-tiered applications and Web sites offers
significant cost savings over the use of legacy character sets. Unicode enables a single
software product or a single Web site to be targeted across multiple platforms, languages,
and countries without re-engineering. It allows data to be transported through many
different systems without corruption.
Since Unicode is still pretty new for DB2 for z/OS users, and quite different from what most
DB2 people are used to, we discuss and describe the different Unicode standards in a bit
more detail in the next few topics. For more information about Unicode, see also:
http://www.unicode.org
Unicode Myths
Notes:
Since the enhanced Unicode support is one of the main changes in DB2 for z/OS Version
8, a lot has been written and said about it. However, there seem to be some myths
surrounding Unicode that are hard to dispel.
Myth 1: You Have to Convert Your Application’s Stored Data to Unicode
Version 8 enhances the infrastructure to support storing and retrieving data in Unicode
format. This includes the conversion of the DB2 catalog to Unicode. However, the intention
is not for all user data to be stored as Unicode, but to provide a choice. There is no need for
you to convert all your data from EBCDIC to Unicode. With the support for multiple CCSIDs
in a single SQL statement, there is even less need to convert all your data to Unicode than
there is in Version 7. (However, you should be aware that there may be performance
implications when using SQL statements that involve multiple CCSID sets.)
Myth 2: Unicode Always Doubles Your Storage Requirement
There is a common misunderstanding that the conversion of your data to Unicode always
doubles its size. The DB2 Family uses the Unicode Transformation Formats UTF-8 and UTF-16.
A detailed description of this format follows within the next few pages. It is important to
know that in UTF-8, the first 128 code points (x’00’ to x’7F’) are the same as those of 7-bit
ASCII, with one byte being used for characters such as A-Z, a-z, and 0-9. Other characters
are stored as one to four bytes, with accented characters often taking two bytes, and Far
Eastern characters taking three to four bytes. Therefore, the actual increase for your
system depends on the nature of your data.
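A quick check with any UTF-8 codec confirms the variable width; for example, in Python:

```python
# Byte lengths in UTF-8 grow with the code point; they are not uniform:
samples = {'A': 1,            # basic Latin: 1 byte (same as 7-bit ASCII)
           '\u00e9': 2,       # 'é', accented: 2 bytes
           '\u4e2d': 3,       # '中', Far Eastern: 3 bytes
           '\U0001d11e': 4}   # musical symbol beyond the BMP: 4 bytes
for ch, expected in samples.items():
    print(ch, len(ch.encode('utf-8')), expected)
```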
Myth 3: Unicode Does Not Affect Me
Everyone is impacted by the changes in DB2 Version 8 regardless of whether user data is
to be stored as Unicode. Most of the remainder of this unit describes in detail why and
where you are affected.
Notes:
The enhanced Unicode support is one of the biggest changes within DB2 for z/OS V8. With
V8, Unicode is around everywhere. Refer to the following list for a quick impression about
which areas are impacted by the enhanced Unicode support:
• The DB2 catalog is stored in Unicode
• The parsing of your SQL statements is performed in Unicode
• The DB2 precompiler generates DBRMs, which are stored in Unicode
• If you use SQL statements containing multiple CCSID sets, the comparison of character
values is very often done in Unicode
• The results of SQL statements are often returned to you in Unicode
• The Unicode collating sequence is different from the one you are used to for EBCDIC
data.
• Unicode data may take more space. This is important for the storage of your data within
DB2 and also for your application programs.
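The different collating sequence mentioned in the list above is easy to demonstrate: sort the same values by Unicode code point and by their CCSID 500 byte values (using Python's cp500 codec as a stand-in for EBCDIC):

```python
data = ['a', 'A', '1']
# Unicode order: digits < uppercase < lowercase
print(sorted(data))                                   # ['1', 'A', 'a']
# EBCDIC order: lowercase < uppercase < digits
print(sorted(data, key=lambda s: s.encode('cp500')))  # ['a', 'A', '1']
```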
More detail regarding the topics listed above can be found in Figure 5-26, "How Does
Unicode Affect Me?" on page 5-58.
Unicode Fundamentals
• Current Unicode standard: 4.0
- Contains character codes for 96248 characters
• A unique number is assigned to every character
- The assignment between this number and the character is the same for all
encodings, but the number is stored in different formats (UTF-8, UTF-16, ...)
• Three forms of Unicode encoding:
- UTF-8: Unicode Transformation Format in 8 bits
- UTF-16: Unicode Transformation Format in 16 bits
- UTF-32: Unicode Transformation Format in 32 bits (replaces UCS-4)
Notes:
Historically, two projects started independent attempts to create a single unified character
set. Those two projects were:
• ISO 10646 of the International Organization for Standardization (ISO)
• Unicode project
In the early nineties, both projects fortunately realized that two different unified character
sets did not make sense. They joined their efforts and agreed to keep the code tables
of the Unicode and ISO 10646 standards compatible. Therefore, Unicode 3.0 corresponds
to ISO 10646-1:2000, Unicode 3.2 to ISO 10646-2:2001, and the latest Unicode standard
4.0 to ISO 10646:2003.
As stated previously, Unicode is a standard for coding every available character
world-wide. The latest version of this standard, Version 4.0, contains 96248 graphic and
format characters. The number of characters that are defined by the Unicode standard
continues to grow and will probably never be complete. This is especially true if you
consider that apart from living languages, some very old and dead languages are also part
of the Unicode standard.
All Unicode versions since 2.0 are compatible. No existing characters will be removed, and
only new characters will be added.
A unique number is assigned to every single character. You can refer to
http://www.unicode.org to find out more about Unicode. The code charts, which show
you which number is assigned to which character within the Unicode standard, are
available at http://www.unicode.org/charts/. If you prefer to have a list of all available
characters on your workstation, so that you do not have to be online every time you want
to find a specific character in the Unicode code page, you can download a ’Unibook’ from
http://www.unicode.org/unibook/.
The international standard ISO 10646 defines the Universal Character Set (UCS). UCS
and Unicode are just code tables that assign integer numbers to characters. Several
alternatives exist for how a sequence of such characters or their respective integer values
can be represented as a sequence of bytes.
Currently, the following three forms of Unicode encoding are probably the most used:
• UTF-8:
Unicode Transformation Format in 8 bits.
• UTF-16:
Unicode Transformation Format in 16 bits.
• UTF-32:
Unicode Transformation Format in 32 bits. This encoding replaces UCS-4.
Refer to the following pages to find out more about how these encodings differ.
There are some additional Unicode encoding forms that we will only mention here for
completeness. They are not used by DB2 to store data.
• UCS-2:
Universal Character Set coded in 2 octets. UCS-2 was published as a part of the
original Unicode standard. UCS-2 is a fixed-width 16-bit encoding standard, with a
range of 2^16 code points. The Unicode standard originally hoped this many code
points would be more than enough, and hoped to stay within this range. UCS-2 is a
subset of UTF-16.
• UCS-4:
Universal Character Set coded in 4 octets. UCS-4 was also published as a part of the
original Unicode standard. UCS-4 is a fixed-width 32-bit encoding standard, with a
range of 2^31 code points. The 2^31 code points are grouped into 2^15 planes, each
consisting of 2^16 code points. The planes are numbered from 0. Plane 0, the Basic
Multilingual Plane (BMP), corresponds to UCS-2 above.
Note: The assignment between the Unicode number and a character is unique for all
forms of Unicode encodings such as UTF-8, UTF-16, and so on.
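The relationship between these encodings can be checked with Python's codecs: a BMP character occupies two bytes in UTF-16 (identical to UCS-2), a character beyond U+FFFF needs a four-byte surrogate pair, and UTF-32 is always four bytes:

```python
print('A'.encode('utf-16-be').hex())           # 0041 -- two bytes, same as UCS-2
print(chr(0x10000).encode('utf-16-be').hex())  # d800dc00 -- surrogate pair
print(chr(0x10000).encode('utf-32-be').hex())  # 00010000 -- fixed four bytes
```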
UTF-8 Encoding (1 of 2)

Unicode range             UTF-8 bit pattern
U+00000000 - U+0000007F   0xxxxxxx
U+00000080 - U+000007FF   110xxxxx 10xxxxxx
U+00000800 - U+0000FFFF   1110xxxx 10xxxxxx 10xxxxxx
U+00010000 - U+001FFFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Examples:
Notes:
UTF-8 is one of the available ways to encode Unicode characters. As you can see from the
upper table in the figure above, all Unicode characters from U+00000000 to U+0000007F
can be represented in UTF-8 using one byte. In this byte, the high order bit is always set to
0. For this range, the UTF-8 encoding is identical to that of ASCII.
For all Unicode characters from U+00000080 to U+000007FF, the UTF-8 representation
needs two bytes. For characters from U+00000800 to U+0000FFFF, three bytes are used,
and for U+00010000 to U+001FFFFF UTF-8 takes up 4 bytes.
To become a little bit more familiar with the way the encoding is done, let us look at some
encoding examples:
Example 1:
The Unicode code point for character ’A’ is U+00000041. Since this number is within the
first range in the upper table above, you can easily tell that the UTF-8 encoding for ‘A’ is
also x’41’.
You can also perform the following steps to come to the same conclusion:
1. Determine the bit representation of x’41’. The resulting bit pattern is 100 0001 as shown
in the lower table on the visual above. Since the high order bit of a UTF-8 encoded
character that takes up one byte must always be 0, only bit patterns with a maximum of
111 1111 can be represented. As you probably can tell, binary 111 1111 is equal to x’7F’.
2. Join the 7 digit bit stream 100 0001 with the leading 0. The resulting 8 bit binary value is
0100 0001.
3. To easily derive the hexadecimal value, you must now split the 8 bits (1 byte) into two
half bytes. The first four bits, 0100, are equal to hexadecimal 4, and the second four bits
are equal to hexadecimal 1.
4. As a result, you can now tell that the UTF-8 hexadecimal representation for character
’A’ is x’41’.
Example 2:
Now let us look at the second row on the example shown on the visual above.
1. As mentioned before, you can find the code point for this character on the Web site
http://www.unicode.org/charts/. The code point for the paragraph sign (’§’) is
U+000000A7, which is larger than U+0000007F. Therefore the representation of this
character in UTF-8 needs two bytes. The binary representation of x’A7’ is 1010 0111,
which is eight bits.
2. Refer to the upper table on the visual above. You can see that if two bytes are used, the
first three bits of the first byte are predefined to 110. The number of leading 1 bits
indicates how many bytes make up the character (two leading 1 bits mean a two-byte
character), so the first byte of a multi-byte character is always in the range of x’C0’ to
x’FD’. For the second byte, the first two bits are predefined to 10, so it is always in the
range of x’80’ to x’BF’.
3. Now take the bit sequence 1010 0111 and fill the “available” x bits from the back. For
our example, the second byte ends up as binary 1010 0111; that is, the first two bits are
the predefined ones and the remaining six bits are taken from the bit sequence
1010 0111 (x’A7’).
The first byte ends up as 1100 0010; that is, the first three bits are the predefined ones,
and the remaining two bits are placed on bit positions 7 and 8.
4. Now we must take byte one and two, and cut them into half bytes; that is, 1100 0010
1010 0111 has the following four half bytes:
- 1100, which is 12 decimal, that is, x’C’
- 0010, which is 2 decimal, that is, x’2’
- 1010, which is 10 decimal, that is, x’A’
- 0111, which is 7 decimal, that is, x’7’
The combination of these half bytes results in x’C2A7’, which is the UTF-8 encoding for the
paragraph sign.
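Both examples can be double-checked against any UTF-8 implementation, for instance Python's built-in codec:

```python
# Example 1: 'A' (U+0041) encodes to the single byte x'41'.
print('A'.encode('utf-8').hex().upper())        # 41
# Example 2: the paragraph sign (U+00A7) encodes to the two bytes x'C2A7'.
print('\u00a7'.encode('utf-8').hex().upper())   # C2A7
```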
UTF-8 Encoding (2 of 2)

Example:
Character U+1156; Unicode bit pattern (binary): 1 0001 0101 0110
UTF-8 (binary): 11100001 10000101 10010110
Half bytes in hexadecimal: E 1 8 5 9 6 => x'E18596'

Encoding ranges:
Unicode                   UTF-8
U+00000000 - U+0000007F   x'00' - x'7F'
U+00000080 - U+000007FF   x'C080' - x'DFBF'
U+00000800 - U+0000FFFF   x'E08080' - x'EFBFBF'
U+00010000 - U+001FFFFF   x'F0908080' - x'F7BFBFBF'
Notes:
The upper part of the visual above shows a third example of how to encode a character in
UTF-8.
The Unicode code point of the character used above is U+00001156. The x’1156’ is equal
to the binary number 1 0001 0101 0110. As shown on the previous visual, this number
must now be adjusted to the UTF-8 bit pattern. The result is a 3-byte string. The final
hexadecimal number, which is used to store the character above in the UTF-8 column, is
x’E18596’.
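The bit-pattern rules above translate directly into a small encoder. The following sketch (not DB2 code, just an illustration of the algorithm) reproduces x'E18596' for U+1156:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit patterns."""
    if cp <= 0x7F:                          # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

print(utf8_encode(0x1156).hex().upper())                    # E18596
print(utf8_encode(0x1156) == chr(0x1156).encode('utf-8'))   # True
```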
The table in the lower part of the visual shows a different way to look at the encoding in
UTF-8. As you can see, Unicode code points U+00000000 - U+0000007F are always
encoded using one byte. The hexadecimal values for this byte are in the range between
x’00’ and x’7F’.
The hexadecimal values for those characters whose Unicode code points are between
U+00000080 and U+000007FF always lay in the range between x’C080’ and x’DFBF’, and
so on. This table helps you to check if hexadecimal values you find for your character data
are in a valid range, and it also helps you if you try to decode hexadecimal values for UTF-8
columns back to the Unicode code points. Refer to the next page to learn more about how
to decode UTF-8 data.
5-38 DB2 UDB for z/OS V8 Transition © Copyright IBM Corp. 2004
V3.1
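The encoding-range table above can be checked empirically by encoding one sample code point from each row and looking at the resulting length. A Python sketch (illustrative only; the sample code points are arbitrarily chosen):

```python
# One sample code point per range from the table above, with the expected UTF-8 length.
samples = [(0x0041, 1), (0x00DF, 2), (0x1156, 3), (0x10338, 4)]
for cp, expected_len in samples:
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:08X} -> x'{encoded.hex().upper()}' ({len(encoded)} bytes)")
    assert len(encoded) == expected_len   # matches the encoding-range table
```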
Decoding UTF-8
The following data is stored as UTF-8 in a table:
x'50C39CC39F'
Notes:
On the previous two pages, we explained how the encoding from a Unicode code point to
its UTF-8 representation format is done. Encoding and decoding are usually done by DB2.
What is important for you to know is how to work with Unicode hexadecimal constants in
the future.
To analyze your data on a hexadecimal basis, the first thing you must be aware of is which
type of Unicode encoding was used to store the data (UTF-8 or UTF16 are the only two
formats used by DB2). This is important, because you must first find out if the characters
you are currently looking at have been encoded using one, two, three, or even four bytes.
Once you know the encoding format, you can recalculate the Unicode code points from the
hexadecimal values.
Refer to the example shown on the visual above. Let us assume that your DSN1PRNT
output shows string x’50C39CC39F’ and you know that the table storing this data uses
Unicode, UTF-8.
Start looking at this string byte-wise.
The first byte, x’50’, lies within the range of x’00’ and x’7F’. This means that we have a
one-byte representation of a character, where the hexadecimal value is the same as the
Unicode code point. In our example, you can find the first character on position
U+00000050, which is reserved for a ’P’.
The next byte, x’C3’, is greater than x’7F’. Therefore, it cannot be a one-byte representation
of a character. If we include the next byte, there is a two-byte string of x’C39C’, which is
within the range between x’C280’ and x’DFBF’. This is a valid Unicode character.
X’C39C’ equals binary 1100 0011 1001 1100. If we remove the bits that are a fixed part of
the two-byte UTF-8 string, 000 1101 1100 remains. 1101 1100 binary is equal to x’DC’. If
you look at the Unicode code page (for example at http://www.unicode.org/charts/),
you can see that U+000000DC represents the character ’Ü’.
The next remaining two bytes also start with x’C3’, indicating another two-byte character.
The x’C39F’ is equal to 1100 0011 1001 1111. Again we must remove the fixed part of the bit
pattern and end up with 000 1101 1111, which is equal to x’DF’. Code point U+000000DF in
Unicode represents the character ’ß’.
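The whole decoding walk-through can be reproduced with Python's UTF-8 codec (shown only as a cross-check of the manual procedure, not as part of the DB2 environment):

```python
raw = bytes.fromhex("50C39CC39F")   # the DSN1PRNT string from the example
decoded = raw.decode("utf-8")
print(decoded)                      # PÜß
for c in decoded:
    print(f"U+{ord(c):08X}")        # U+00000050, U+000000DC, U+000000DF
```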
UTF-16 Encoding
Notes:
As described previously, UTF-8 encoding uses one to four bytes to represent a Unicode
character, that is, a Unicode code point. The first 128 characters, whose representation in
UTF-8 uses only one byte, are the main characters used in many languages. Only a few
characters which are part of the EBCDIC CCSIDs need two or even more bytes for the
encoding.
For languages such as Japanese, Korean, and Chinese, the character encoding in
EBCDIC was DBCS, that is, all characters need two bytes for the representation of every
single character. With UTF-8, most of those characters are represented using three bytes.
In UTF-16 encoding, characters are represented using either two or four bytes. The first
65536 characters, that is, those with code points below x’10000’, can be represented with two bytes.
This means that characters in languages such as Japanese, Korean, and Chinese can be
represented in two bytes (as their code points are before U+10000). This is a big
advantage for these languages, because otherwise, if they use UTF-8, their storage needs
would increase by 50 percent.
(There is an extension mechanism in UTF-16, called surrogates, which allows about a
million additional characters to be coded as two successive 16-bit code units, that is, 2 times 2 bytes.)
The visual above gives you a formal description of how Unicode code points are
represented in UTF-16. For all characters whose Unicode code points are less than
U+00010000, the encoding is just the two-byte hexadecimal value.
For characters whose Unicode code points are greater than or equal to U+00010000 (and
up to U+0010FFFF), the encoding is a bit more complicated, as described on the visual.
(Characters beyond U+0010FFFF cannot be represented in UTF-16.) You can calculate
the hexadecimal representation in UTF-16 as follows:
1. Calculate a value U’ where U’ = U - U+00010000.
2. Take the hexadecimal value of U’ and determine the binary value for it. Consider U’ to
be something like yyyyyyyyyyxxxxxxxxxx.
3. Assume two binary strings:
- W1= 110110yyyyyyyyyy
- W2 = 110111xxxxxxxxxx
4. Fill up W1 and W2 with the y and x values taken from the binary string of U’, that is,
assign the 10 high-order bits of the 20 bit U’ to the 10 low-order bits of W1, and the 10
low-order bits of U’ to the 10 low-order bits of W2.
5. W1 contains 16 bits, that is, two bytes or four half bytes. The same is true for W2.
Convert W1 and W2 to hexadecimal.
6. Combine the eight hexadecimal values to one value, which is now the UTF-16 encoding
of the corresponding character.
This description is very formal and may not be so easy to understand. Refer to the next
visual for an example of how to encode a Unicode code point in UTF-16.
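The six steps above can be condensed into a few lines of Python. The helper name utf16_encode is invented for this sketch; it is not a DB2 or standard-library function:

```python
def utf16_encode(cp: int) -> str:
    """Big-endian UTF-16 hex string for a code point, following the steps above."""
    if cp < 0x10000:
        return f"{cp:04X}"                  # two-byte form: just the hex value
    u = cp - 0x10000                        # step 1: U' = U - x'10000' (20 bits)
    w1 = 0b1101100000000000 | (u >> 10)     # steps 3/4: 110110 + 10 high-order bits
    w2 = 0b1101110000000000 | (u & 0x3FF)   # steps 3/4: 110111 + 10 low-order bits
    return f"{w1:04X}{w2:04X}"              # steps 5/6: combine into one value

print(utf16_encode(0x0041))    # 0041
print(utf16_encode(0x200D0))   # D840DCD0
```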
Notes:
As shown in the visual above, for characters such as ’A’, or any other character
represented by a Unicode code point less than U+00010000, the Unicode number equals
the hexadecimal value in UTF-16, which is used to store the data.
For the character that is represented by U+000200D0, the encoding works as follows:
• Calculate x’200D0’ - x’10000’. The result is x’100D0’.
Tip: The easiest way to do all these hexadecimal and binary calculations is to use a
scientific calculator that comes with your Windows operating system, for example. You
can simply select between the different bases which you want to use for your
calculations, and you can also use Edit -> Copy to copy long binary strings.
• The binary value for x’100D0’ is 1 0000 0000 1101 0000 (17 bits). Add enough leading
zeros so that the string length is 20 bits, that is, the string we start looking at is
0001 0000 0000 1101 0000.
• Now you must calculate the first two bytes of the hexadecimal value, which is used to
encode the character shown above. To do that, first take the bit pattern W1 as shown on
the previous visual and replace the y placeholders with the first 10 digits of the 20-bit string that we have
just calculated. You end up with a 16-bit string, which can be divided into four half bytes:
- 1101 = x’D’
- 1000 = x’8’
- 0100 = x’4’
- 0000 = x’0’
These hexadecimal values can be combined to x’D840’ and represent the first two
bytes of the four-byte encoding.
You can now do the same with W2. You end up with x’DCD0’, which are the last two
bytes of the four-byte representation.
• As a last step, you must combine the two two-byte strings, that you have just calculated.
The value which is stored in your DB2 table is x’D840DCD0’.
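As a cross-check of the worked example, Python's UTF-16 codec produces the same surrogate pair (illustrative only):

```python
# U+200D0 needs a surrogate pair; 'A' (below U+10000) is a plain two-byte value.
print(chr(0x200D0).encode("utf-16-be").hex().upper())   # D840DCD0
print("A".encode("utf-16-be").hex().upper())            # 0041
```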
UTF-16 Decoding
Check the value of the first two bytes. Let this value be W1.
Notes:
As with UTF-8, you do not only need to know how UTF-16 is encoded. Probably
more often, you must decode whatever you find in your DB2 data sets.
If you know that your data is encoded in UTF-16, you must start looking at your data
somewhat differently than for UTF-8, because UTF-16 always uses either two or four
bytes. Instead of just checking the value of the first byte, you must check whether the first
two bytes are less than x’D800’ or greater than x’DFFF’. If this is the case, the decoding is
very simple, because the hexadecimal value of these two bytes equals the Unicode code
point.
If the first two bytes are between x’D800’ and x’DFFF’, you must check whether there is
another pair of bytes available. If this is not the case, the encoding of the first two bytes is
not a valid UTF-16 encoded character.
Otherwise (and this should be the case), you must continue to check the next two bytes
(W2). Their value should be between x’DC00’ and x’DFFF’. If the second group of two
bytes have any other value, the four bytes you are currently looking at are not a valid
UTF-16 representation.
If the first group of two bytes, W1, is between x’D800’ and x’DFFF’, and the second group
of two bytes, W2, is in the range between x’DC00’ and x’DFFF’, you must construct a 20-bit
long bit stream. Take the 10 low-order bits of W1 as the 10 high-order bits of the Unicode
code point, and take the 10 low-order bits of W2 as the 10 low-order bits of the Unicode
code point. Convert this binary stream to hexadecimal and add x’10000’ to this value. The
hexadecimal value you receive this way is the Unicode code point. You can now use any
listing of Unicode characters to search for the corresponding character.
Let us check out the next visual for a sample of the decoding procedure described above.
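The decoding rules above can be sketched as a small Python function. The name utf16_decode_first is invented for this illustration, and it operates on 16-bit values rather than raw bytes:

```python
def utf16_decode_first(units):
    """Decode the first character from a list of 16-bit values, per the rules above.

    Returns (code_point, units_consumed); raises ValueError for invalid input.
    """
    w1 = units[0]
    if w1 < 0xD800 or w1 > 0xDFFF:
        return w1, 1                   # the value itself is the Unicode code point
    if len(units) < 2:
        raise ValueError("no second pair of bytes: not a valid UTF-16 character")
    w2 = units[1]
    if not 0xDC00 <= w2 <= 0xDFFF:
        raise ValueError("W2 outside x'DC00' - x'DFFF': invalid UTF-16")
    # 10 low-order bits of W1, then 10 low-order bits of W2, plus x'10000'
    return ((w1 & 0x3FF) << 10 | (w2 & 0x3FF)) + 0x10000, 2

print(hex(utf16_decode_first([0xD800, 0xDF38])[0]))   # 0x10338
```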
1. x'D800' is not less than x'D800' and is not greater than x'DFFF'
2. x'D800' is between x'D800' and x'DFFF'
3. There are at least two more bytes, which are x'DF38'
4. x'DF38' is between x'DC00' and x'DFFF'
5. x'D800' = 1101100000000000 (binary)
6. x'DF38' = 1101111100111000 (binary)
7. 00000000001100111000 (binary) = x'338'
8. Calculate x'338' + x'10000' = x'10338'
9. U+10338 represents GOTHIC LETTER THIUTH, which is:
Notes:
After looking at the formal way to decode UTF-16 encoding, we provide an example to
illustrate the process.
Assume that you look at some data, for example, from a DSN1PRNT output, and find the
following hexadecimal string: x’D800DF381820D840’. You know that the table space
you are looking at only contains UTF-16 data, and you want to know, first of all, whether the
data that is stored is valid UTF-16 encoding, and second, which characters are stored here.
To find the answers to your questions, you must perform the steps described on the visual.
• Since UTF-16 encoding always uses two or four bytes to represent a character, you
must start looking at the first two bytes. The first two bytes are x’D800’. If they are less
than x’D800’ or greater than x’DFFF’, their hexadecimal value equals the Unicode
number of the represented character. This is not the case in our example.
• Next, check whether x’D800’ is between x’D800’ and x’DFFF’. This is true for our
example, because “between” includes the upper and lower ranges. The fact that the
value is in the specified range indicates that this might be a UTF-16 character, which
uses four bytes for the character representation.
• If, in UTF-16, a character is represented using four bytes, the second two bytes must
always be in a range between x’DC00’ and x’DFFF’. If this is not the case, the four
bytes you are currently looking at are invalid and do not represent a UTF-16 encoded
character.
• In our example, the second two bytes are x’DF38’, which is definitely within the
specified range.
• Now we need three more steps to identify the Unicode code point. Convert x’D800’,
which are the first two bytes, to a binary representation.
• Do the same for bytes three and four.
• Take the 10 low-order bits of bytes one and two as the 10 high-order bits of a new binary
string, and continue with the 10 low-order bits of bytes three and four as shown under
step 7 on the visual above.
• Now add x’10000’ to the hexadecimal value of the binary string you have just
constructed. (Remember that when we encoded a UTF-16 string, we subtracted
x’10000’, so for decoding we need to add it back.)
• The result of the prior addition now equals the Unicode number, which you can just look
up in appropriate lists of Unicode characters. In our case, the Gothic letter THIUTH is
stored here.
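Steps 5 through 9 can be verified in a couple of lines, using Python only as a calculator:

```python
import unicodedata

w1, w2 = 0xD800, 0xDF38
low20 = (w1 & 0x3FF) << 10 | (w2 & 0x3FF)   # steps 5-7: the 20-bit string, x'338'
cp = low20 + 0x10000                        # step 8: add back x'10000'
print(hex(cp))                              # 0x10338
print(unicodedata.name(chr(cp)))            # GOTHIC LETTER THIUTH
```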
x'D800DF381820D840'
1. x'D840' is not less than x'D800' and is not greater than x'DFFF'
2. x'D840' is between x'D800' and x'DFFF'
3. There are no more bytes in the string
This is not a valid UTF-16 character
x'D800DF381820'
= (the Gothic letter THIUTH followed by the Mongolian letter A)
Notes:
Now that we have successfully identified the first character, we can continue with the next
two bytes. The next two bytes contain x’1820’.
This is a very easy case, because x’1820’ is less than x’D800’, which means that the
Unicode number equals this one. If you look up U+1820, you can see that it represents the
Mongolian Letter A.
Now let us check the remaining part of the hexadecimal string, which is x’D840’.
• x’D840’ is between x’D800’ and x’DFFF’. That is, it seems to be the first two bytes of a
four-byte encoded character.
• Unfortunately there are no more hexadecimal values available, which means that
x’D840’ is not valid.
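A library decoder flags the same problem: the trailing x'D840' has no second pair of bytes. A Python cross-check (illustrative only):

```python
raw = bytes.fromhex("D800DF381820D840")
try:
    raw.decode("utf-16-be")
except UnicodeDecodeError as err:
    print("invalid UTF-16:", err.reason)    # the lone trailing x'D840'

# The string without the dangling x'D840' decodes cleanly:
print(bytes.fromhex("D800DF381820").decode("utf-16-be"))
```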
Character Encoding Comparison
Character                ASCII                UTF-8      UTF-16 (Big Endian)  UTF-32 (Big Endian)
Å (A with Ring accent)   'C5'x                'C385'x    '00C5'x              '000000C5'x
U+9860                   'CDDB'x (CCSID 939)  'E9A1A0'x  '9860'x              '00009860'x

Note: 'C5'x becomes a double byte in UTF-8.
Note: UCS-2/UTF-16 and UCS-4/UTF-32 use a technique called Zero Extension.
Notes:
Refer to the figure above for a summary example of the different encoding schemes and
formats. As you can see again, UTF-8 data varies between one and four bytes. UTF-16
always uses at least two bytes.
Today, countries that use DBCS CCSIDs, such as CCSID 939, need two bytes
for the encoding of their characters. This is also true if they use UTF-16 for the encoding of
their Unicode data, whereas if they used UTF-8, almost all of their characters would need three
bytes for the encoding.
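The storage difference described above is easy to demonstrate (a Python sketch; U+9860 is the DBCS character from the comparison table):

```python
for cp in (0x0041, 0x00C5, 0x9860):
    c = chr(cp)
    print(f"U+{cp:04X}: UTF-8 {len(c.encode('utf-8'))} bytes, "
          f"UTF-16 {len(c.encode('utf-16-be'))} bytes")
# U+9860 takes 3 bytes in UTF-8 but only 2 in UTF-16
```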
For UTF-16, an additional note indicates that the Big Endian format is used here.
Endianness only affects UTF-16 and UTF-32; UTF-32 is not covered in this unit. There is a
distinction between these types:
• Big Endian:
Used by pSeries, zSeries, iSeries, Sun, HP.
For a four byte word, the byte order is 0,1,2,3. For a 2-byte word it is 0,1.
The encoding of character ’A’ on the visual above is a 2-byte word, which is stored as
x’0041’ in Big Endian format.
• Little Endian:
Used by Intel based machines, including xSeries.
For a four byte word, the byte order is 3,2,1,0. For a 2-byte word it is 1,0.
The encoding of character ’A’ in the Little Endian Format would be x’4100’.
Since this course deals with DB2 for z/OS, within DB2, only the Big Endian format is used.
Within a byte, the Endianness does not matter. A byte is always ordered from the leftmost
(most significant) bit to the rightmost (least significant) bit. Bit order within a byte is always
7,6,5,4,3,2,1,0.
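The two byte orders can be shown with Python's explicit-endianness codecs (illustrative only; DB2 for z/OS itself always uses Big Endian):

```python
print("A".encode("utf-16-be").hex())   # 0041     (Big Endian, as on zSeries)
print("A".encode("utf-16-le").hex())   # 4100     (Little Endian, as on Intel)
print("A".encode("utf-32-be").hex())   # 00000041 (zero extension to four bytes)
```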
Unicode CCSIDs within DB2
DSNHDECP - USCCSID/UMCCSID/UGCCSID
Notes:
Until now, we have only described Unicode in general and talked about the two encoding
formats, UTF-8 and UTF-16, which are used by DB2. Let us now look in more detail at how
Unicode can be used within your DB2 for z/OS subsystem.
in a mixed environment). The reason for this is that the default setting for Unicode data is
also the mixed CCSID. This is even true if your setting for the parameter MIXED is set to
NO.
You will probably not use the Unicode SBCS 367 very often, because you can just encode
128 characters, that is, the seven-bit ASCII characters, using this CCSID. The advantage of
SBCS data in a Unicode environment is that it will perform faster for certain types of
processing. LIKE predicates for example, will perform faster.
Notes:
With DB2 for z/OS Version 8, Unicode is ubiquitous. Everybody is affected by the changes
regarding Unicode that come with Version 8.
In this topic, we describe the following areas that are affected by Unicode:
• You must now specify a valid SCCSID in your DSNHDECP.
• Most parts (character columns) of the catalog table spaces, and one of the directory
table spaces, are stored in Unicode.
• Data rows may become longer when stored in Unicode. This can impact your
application programs as well as your database design.
• The CREATE TABLE ... LIKE SQL statement is one of several areas of impact regarding DDL.
• The collating sequence of Unicode data is different from EBCDIC. This can affect
ORDER BY results, as well as range predicates.
• Literals are affected by Unicode.
• You need to keep an eye on your buffer pool, EDM pool, and so on, since data stored as
Unicode may require more space to store, and you may have to increase the size of
these pools.
• Utility Unicode statements also need consideration.
We now look at these areas in more detail.
• The precompiler will not run when it finds a DSNHDECP with SCCSID=0, ASCCSID=0,
or both.
• DB2 will make sure that you do not change the CCSIDs in the system by accident, as
changes in the subsystem’s CCSIDs are not supported. In V8, the CCSID information is
also recorded in the BSDS (and can be displayed by the display log map utility), and at
startup time DB2 checks to make sure they match the values in the DSNHDECP. If the
values do not match, message DSNT108I will be issued, and DB2 startup processing
will terminate (DB2 does not start successfully).
Accessing the Unicode Catalog (1 of 2)
---------+---------+---------+---------+-----
NAME HEXNAME
---------+---------+---------+---------+-----
DSN8G810 44534E3847383130
SABI 53414249
Z!L\ 5A214C5C
| 7C
Notes:
To enable Unicode names and literals in DDL, most parts of the DB2 catalog are converted
to UTF-8 in DB2 Version 8. The only two exceptions are table SYSIBM.SYSDUMMY1 in
the newly created table space DSNDB06.SYSEBCDC, and SYSIBM.SYSCOPY in
DSNDB06.SYSCOPY. This conversion to Unicode is done at the same time as the
enabling of the long names support during enabling-new-function mode (ENFM)
processing.
Although the way the catalog data is stored changes “under the covers”, you can still
access the catalog tables using SPUFI, for example. As long as the character that is stored
in the catalog tables can be represented in the CCSID used by the application encoding
scheme, and can also be represented in the CCSID used by your 3270 emulation, the data
is displayed as for V7, that is, in EBCDIC (SPUFI’s encoding scheme) — CCSID 37 in our
case.
You can see this in the upper SELECT statement on the visual above. The hexadecimal
representation (HEXNAME column) shows that the column data is really stored in Unicode,
but the application encoding scheme is used to convert all characters to a valid character in
CCSID 37, which is also used by our terminal emulation.
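You can reproduce the HEXNAME values from the visual with any UTF-8 encoder; Python's cp037 codec also shows what the same name would look like in EBCDIC CCSID 37 (a sketch, not a DB2 query):

```python
name = "DSN8G810"
print(name.encode("utf-8").hex().upper())   # 44534E3847383130, as stored in the catalog
print(name.encode("cp037").hex().upper())   # C4E2D5F8C7F8F1F0, the same name in CCSID 37
```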
Accessing the Unicode Catalog (2 of 2)
Notes:
If, however, you start using more “fancy” characters for your DB2 objects, like table names,
index names, and so on, you may not be able to represent those characters from all clients.
This is what we did in the example above. We used the DB2 Control Center to create a
table with the keyboard set to URDU and entered the Arabic letter TTEH as table name
enclosed in apostrophes (delimited table name). The creation of the table was successful,
and since we did not use a specific table space name, DB2 chose L. If we now select from
SYSTABLES where the table space name equals ‘L’, the resulting row looks like the one
shown on the visual above. Since the arabic letter TTEH cannot be displayed using
EBCDIC CCSID 37 (our terminal emulation’s CCSID), you just see a ‘.’ instead. From the
hexadecimal value (HEXNAME column) you can see that it really is the Arabic letter TTEH
which is stored as table name. The hexadecimal value is x’D9B9’. Follow the procedure
described in Figure 5-18, "Decoding UTF-8" on page 5-40, if you want to check again how
to decode the hexadecimal value x’D9B9’ into the Unicode code point U+0679, which
represents TTEH.
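The two-byte decoding can again be cross-checked with a library codec (illustrative sketch):

```python
import unicodedata

ch = bytes.fromhex("D9B9").decode("utf-8")
print(f"U+{ord(ch):04X}")      # U+0679
print(unicodedata.name(ch))    # ARABIC LETTER TTEH
```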
Tip: Although possible, we recommend that you do not use characters for your DB2
object names which are not part of a common subset that is representable on all clients.
For some objects, whose names must be passed to z/OS, the characters used must be
convertible to EBCDIC. Here are a few examples:
• Database names are part of the data set qualifier.
• Table space names are part of the data set qualifier.
• DBRM names are PDS member names.
• UDF and stored procedures (when no explicit external name is specified), exits, and
FIELDPROCs are PDS member names.
Data Length Issues
Notes:
Conversions can cause the length of a string to change. This can be true in both directions!
Let us first have a look at expanding conversions, those in which the conversion from one
CCSID to another causes the data to become longer. As stated previously, all characters
that can be represented by the first 128 ASCII characters can be stored using just one byte
in UTF-8. If you refer to the visual above, you can see that this covers the major part of
letters used in many countries today. If, however, you are also using characters that require
code points greater than U+7F, the representation of these characters requires at least two
bytes.
As an example, you may want to use the character é, which is very common in French.
Once you start using this character for your DB2 objects or in your application data with a
Unicode encoding scheme, you must take into account that the storage of these characters
requires more space than it used to previously in EBCDIC or ASCII.
Attention: Always remember that you must allocate the width of your data columns
according to the storage length and not according to their display length. For example, the
storage of string ‘René’ in UTF-8 requires five bytes instead of the four bytes which one
could assume by just looking at it. Therefore, using varying-length strings may be
appropriate.
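The 'René' example can be checked directly; a Python sketch making the point that display length and storage length differ:

```python
s = "René"
print(len(s))                            # 4 characters on the screen
print(len(s.encode("utf-8")))            # 5 bytes of UTF-8 storage: é needs two bytes
print(s.encode("utf-8").hex().upper())   # 52656EC3A9
```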
This, however, does not affect your application programs if you do not start using
application encoding scheme Unicode. As long as you stick to EBCDIC, for example, DB2
converts the data to the program’s CCSID based on the application encoding scheme and
the precompiler CCSID. Refer to Figure 5-34, "Application Programming Issues" on page
5-74 for more details.
Strings May Exceed 255 bytes
BUT .. OK in V8 NFM!
String can be 32704 bytes
Notes:
If you are using strings in your SQL statements, it is possible that those strings that were
valid in V7 are invalid in V8 compatibility mode. This is due to the potential increase in the
number of bytes when they must be converted to Unicode. These statements might be
flagged as too long in Version 8 (exceeding 255 bytes), and you will receive message
DSNH102I (during a precompilation) or SQLCODE -102 (otherwise).
This, however, is only a problem while you are running your DB2 V8 subsystem in
compatibility and enabling-new-function mode. Once you enter the new-function mode, the
255 byte limit is raised to 32704 bytes.
MIXED = NO subsystem
(Visual: an EBCDIC or Unicode database containing an EBCDIC table space and a
Unicode table space. CREATE TB ... LIKE unicodetab is allowed against the EBCDIC
table space only if the Unicode table has only SBCS columns; against the Unicode table
space it is allowed without restriction.)
Notes:
In this topic, we look at Unicode and EBCDIC objects. We do not explicitly mention ASCII,
because it is similar to what we discuss for EBCDIC.
If you have a database defined with CCSID Unicode, you can create both EBCDIC and
Unicode table spaces in this database. This is also true for a database defined with CCSID
EBCDIC. The CCSID that you have specified in your CREATE DATABASE statement
becomes the default encoding scheme for this database. This means that if you
subsequently create table spaces in this database, they inherit the database’s default
encoding scheme. You can, however, override this default by specifying another CCSID in
the CREATE TABLESPACE statement. If you do not specify a CCSID clause in your
CREATE DATABASE statement, the option defaults to the value of field DEF ENCODING
SCHEME on installation panel DSNTIPF (ENSCHEME DSNHDECP).
Table 5-2 shows information you can see in the DB2 catalog table
SYSIBM.SYSDATABASE, depending on which CCSID is associated to the database either
explicitly or implicitly through the system default encoding scheme.
Table 5-2 (excerpt):
           ENCODING_SCHEME  SBCS_CCSID  DBCS_CCSID  MIXED_CCSID
EBCDIC     E                37          0           0
DSNDB04    blank            0           0           0
If you create an EBCDIC database and your DB2 subsystem has mixed data disabled
(MIXED NO), the system’s default encoding scheme is stored in column SBCS_CCSID and
no CCSID is stored in DBCS and mixed columns. For Unicode databases, all three CCSID
columns are always filled with the exact same values as shown in Table 5-2.
Database DSNDB04 is neither EBCDIC nor UNICODE and no CCSIDs are associated with
the default database. We will refer to this special behavior a little bit later in this topic.
When you create a table space, unless you explicitly specify the CCSID clause on the
CREATE TABLESPACE statement, its CCSID is the same as the CCSID of the database.
As you can see in the figure above, for both Unicode and EBCDIC table spaces, there are
some restrictions regarding tables which can be defined in them. First we describe the
behavior of EBCDIC table spaces.
If you do not use the CCSID parameter in your CREATE TABLE statement, the table
space’s encoding scheme is the default. Therefore, it is not very surprising that you can
create EBCDIC tables in EBCDIC table spaces. In contrast to that, you cannot create a
table using CCSID UNICODE in an EBCDIC table space. All objects in a table space must
have the same encoding scheme.
If your DB2 subsystem is defined with MIXED=NO, all characters that are stored in
EBCDIC tables take up one byte of storage and they are all encoded in the specific
encoding scheme, which is associated with the table space. In addition to that, only the
SBCS_CCSID can be used to code your data.
As shown in Table 5-2, an EBCDIC database (or table space) created in a MIXED=NO
subsystem does not have a mixed CCSID associated with it and therefore cannot be used
for tables containing mixed data.
Tip: CREATE TABLE TEST1 LIKE SYSIBM.SYSDATABASE works fine. The implicitly
created table space is now being created in the default database DSNDB04. As shown in
Table 5-2, DSNDB04 does not have any encoding scheme associated with it. In this
case, the implicitly created table space can inherit the LIKE table’s encoding scheme,
which is Unicode in case of SYSIBM.SYSDATABASE. Since mixed is the default for
Unicode table spaces, the definition of the table and the storage of the data would match.
Table SYSIBM.SYSDUMMY1 used to be part of table space DSNDB06.SYSSTR. The
SYSSTR table space is encoded in Unicode starting with DB2 V8. In order to provide
consistent behavior when working with this “dummy” table, it has been moved to a separate
EBCDIC table space called DSNDB06.SYSEBCDC.
5-70 DB2 UDB for z/OS V8 Transition © Copyright IBM Corp. 2004
Collating Sequence
[Figure: the same set of names (DSN8G810, SABI, SABIGRP1, SHOW11 through
SHOW55, SHOWAA through SHOWDD, SYSDEFLT, baba) listed twice with the
hexadecimal value of each name: on the left in Unicode (UTF-8) order, where digits sort
before upper case and baba comes last; on the right in EBCDIC order, where baba comes
first, SHOWAA through SHOWDD sort before SHOW11 through SHOW55, and
SYSDEFLT comes last.]
Notes:
As you can see from the visual above, whenever you access Unicode data, the collating
sequence resulting from an ORDER BY clause is different from the one in EBCDIC. The
Unicode collating sequence is equivalent to the ASCII behavior, as the first 128 ASCII code
points (x'00' through x'7F') are identical to their Unicode counterparts.
This means that whenever you run a query on your V8 NFM catalog, the results may return
in a different sequence. However, this is only an issue if the columns being ordered contain
a mix of numeric, uppercase, lowercase, and special characters.
See Table 5-3 for an overview of how the characters are ordered in Unicode versus
EBCDIC.
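The effect can be sketched in a few lines of Python. This is an illustration, not DB2 itself: cp037 (EBCDIC CCSID 37) stands in for the old EBCDIC catalog encoding, and plain string comparison gives the Unicode (code point) order:

```python
# Sort the same names by Unicode code points and by their EBCDIC
# (CCSID 37, Python codec "cp037") byte values.
names = ["SHOW11", "SHOWAA", "SYSDEFLT", "baba", "DSN8G810"]

# Python compares str values by code point, which matches the
# Unicode/ASCII collating sequence: digits < upper case < lower case.
unicode_order = sorted(names)

# Sorting by the EBCDIC encoding reproduces the old catalog order:
# lower case < upper case < digits.
ebcdic_order = sorted(names, key=lambda s: s.encode("cp037"))

print(unicode_order)  # ['DSN8G810', 'SHOW11', 'SHOWAA', 'SYSDEFLT', 'baba']
print(ebcdic_order)   # ['baba', 'DSN8G810', 'SHOWAA', 'SHOW11', 'SYSDEFLT']
```

Note how SHOW11 sorts before SHOWAA in Unicode but after it in EBCDIC, exactly as in the visual above.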
Range predicates and MIN/MAX functions may also be impacted where they span sets of
characters. Refer to “ORDER BY Sort Order” on page 5-113 for a more detailed
description.
Tip: A possible workaround for the ORDER BY issue is the usage of the CAST function
to convert the data to EBCDIC format, as this will result in the original sort order. The
following statement returns the data in EBCDIC sequence:
SELECT CAST(NAME AS CHAR(20) CCSID EBCDIC) AS E_NAME
FROM SYSIBM.SYSTABLES
WHERE NAME LIKE 'T%'
ORDER BY E_NAME
Note, however, that DB2 cannot use an index to avoid the ORDER BY sort in this case.
The use of this construct may impact performance, especially on large result sets.
Utility Unicode Parser
//DSNUPROC.SYSIN DD *
[Figure: ISPF hex-on display of the SYSIN data set, showing a utility control statement
coded in EBCDIC and the same statement coded in UTF-8 (which appears garbled when
the terminal interprets the UTF-8 bytes as EBCDIC), for example
//*QUIESCE TABLESPACE DSNDB06.SYSEBCDC.]
Notes:
As you can see in the figure above, starting with V8, you can specify your utility control
statement in EBCDIC or UTF-8. By utility control statement, we mean those input data sets
provided to the DSNUTILB program with DD names SYSIN, SYSLISTD or SYSTEMPL or
the contents of the UTSTMT field passed to the DSNUTILU stored procedure.
Refer to the visual above. The output on the SYSPRINT data set will continue to be in
EBCDIC. This is also true for messages displayed on the console.
For a more detailed description of the new utility stored procedure interface DSNUTILU for
Unicode, also refer to Figure 8-36, "Utility Unicode Statements" on page 8-82 and
subsequent pages.
Be prepared!
Notes:
Starting with DB2 V8, Unicode is everywhere. This also requires some changes to the way
programs can access DB2 data. These changes are described in detail on the next few
pages. They include:
• An explanation of why the SQL parser needs to be able to read Unicode data
• A list of changes to the program preparation steps
• A discussion about the concept of application encoding schemes
• A description of how Unicode changes the behavior of functions, routines, and stored
procedures
• A reminder regarding the way you must start working with your DBRMs and catalog
tables SYSIBM.SYSSTMT, and SYSIBM.SYSPACKSTMT
Unicode Parser
[Figure: a user on a 3270 emulation with EBCDIC CCSID 297 issues CREATE TABLE
TAB#1. The table name arrives as x'E3C1C2B1F1'. In V8, with the application encoding
also set to EBCDIC 297, the Unicode parser interprets the x'B1' as '#', that is, code point
U+23.]
Notes:
The use of an EBCDIC parser in a DB2 subsystem that is used for global e-commerce
creates a few interesting problems:
• One problem is that you cannot include string constants from multiple character sets in
a single source SQL statement. To solve this problem, you must use one of the
following techniques:
- Host variables (or parameter markers for dynamic SQL) and DECLARE VARIABLE
(or a descriptor) to specify the different CCSIDs that you need to use
- Hexadecimal string constants
Both techniques are obviously not very convenient.
• A second problem is that various EBCDIC code pages are inconsistent regarding code
points of various special characters. Those characters include “$@#|¬, as well as
Katakana lower case characters. This makes it very difficult for an EBCDIC parser to
correctly process SQL statements that use these characters. Even though the DB2 V7
parser tries to recognize these characters in different locations in different code pages,
it does not always work out.
For example, some EBCDIC CCSIDs represent the ‘#’ character as different hex code
points. This can cause parsing errors. The figure above shows that in case you are
using the French EBCDIC CCSID 297, you cannot use character ‘#’ for the creation of a
table in V7, and you get an SQLCODE -104.
To avoid some of these problems, you can sometimes use a different way to code your
SQL statements. For example:
COLA ¬= 5 can be coded as COLA <> 5
This avoids the use of the ¬ (NOT) sign that tends to jump around in different EBCDIC
code pages. A similar problem exists with the | (CONCAT) operator. For example:
COLA || COLB can be coded as COLA CONCAT COLB
DB2 V8 introduces the Unicode SQL parser. This adds some of the key functionality which
leads DB2 from a basic Unicode implementation (data only), to enhanced Unicode
exploitation. Unicode parsing in V8 transforms the traditional EBCDIC parser into a parser
which accepts the syntax regardless of the EBCDIC CCSID. The Unicode parser converts
all SQL statements that are not currently encoded as Unicode UTF-8, to that format before
parsing.
This technique solves both problems discussed before.
Because you can code your SQL statement in Unicode, and the statement can also be
processed (parsed and interpreted) in Unicode (instead of having to convert to EBCDIC,
before DB2 could work on it), you can code the literals in Unicode as well. This way, you
can code any existing character as part of a literal in your Unicode SQL statement.
The second problem we discussed is illustrated in the figure above. In this example we
assume that the user’s 3270 emulation is set to the French EBCDIC code page 297, and
we want to create a table named ‘TAB#1’. This is a valid name.
In V7, the creation fails, because DB2 is not always able to interpret the ‘#’ (x’B1’ in CCSID
297) as a ‘#’ because of its different locations in certain EBCDIC code pages. The CREATE
TABLE statement fails with the following message:
CREATE TABLE TAB#1 (COLA CHAR(5))
DSNT408I SQLCODE = -104, ERROR: ILLEGAL SYMBOL " ". SOME SYMBOLS THAT
MIGHT BE LEGAL ARE: ( LIKE
In V8, when using application encoding scheme (297), which is the same as the terminal
emulator’s CCSID (as it should be), the statement works fine. When you type a ‘#’ in your
3270 emulator with EBCDIC 297, it is coded as x’B1’. The value x’B1’ is now converted to
Unicode using the application encoding scheme, which is set to EBCDIC 297 in our
example. x’B1’ is converted to Unicode code point U+23, which is represented as x’23’ in
UTF-8. This is also illustrated in Example 5-1.
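The conversion step can be illustrated in Python. Python ships no codec for the French EBCDIC CCSID 297, so this sketch uses CCSID 37 (codec cp037), where '#' happens to be stored as x'7B' rather than x'B1'; the principle is the same — whichever EBCDIC byte carries '#', it maps to the single code point U+0023:

```python
# In CCSID 37 the '#' character is stored as x'7B'
# (in CCSID 297 it would be x'B1'; Python has no cp297 codec).
ebcdic_byte = b"\x7b"
char = ebcdic_byte.decode("cp037")
print(char)            # '#'

# Converted to Unicode, '#' is always code point U+0023,
# which is the single byte x'23' in UTF-8.
utf8 = char.encode("utf-8")
print(utf8.hex())      # '23'
```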
NAME HEX_NAME
-----------------------------------------
TAB#1 5441422331
______________________________________________________________________
Unicode Precompiler
[Figure: the program source code goes through the precompiler, which converts it to
UTF-8. The modified source is converted back to the original CCSID, while the DBRM is
produced in either Unicode UTF-8 or EBCDIC, depending on the NEWFUN parm.]
Notes:
When migrating your DB2 UDB for OS/390 and z/OS from V7 to V8, you must go through
three different modes. Irrespective of the mode in which you are currently running your DB2
subsystem, DB2 V8 always uses a Unicode precompiler (or precompiler services).
As you can see from the visual above, the Unicode precompiler converts the program
source code to Unicode UTF-8, performs the precompilation, and then converts all
statements, including the generated and modified statements, back to the system CCSID
as specified on the panel DSNTIPF during installation (unless you specify the CCSID
precompiler option, in which case the source is interpreted as being in this CCSID, and is
converted back to that CCSID after the precompilation is done).
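Conceptually, the precompiler's conversion is a lossless round trip. This Python sketch uses cp037 as a stand-in for the system CCSID chosen on DSNTIPF:

```python
source_ccsid = "cp037"  # stand-in for the system CCSID (DSNTIPF)
statement = "SELECT NAME FROM SYSIBM.SYSTABLES"

as_host_bytes = statement.encode(source_ccsid)                # source as stored
as_utf8 = as_host_bytes.decode(source_ccsid).encode("utf-8")  # the parser's view
back = as_utf8.decode("utf-8").encode(source_ccsid)           # modified source

assert back == as_host_bytes  # nothing is lost in the round trip
```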
Apart from the modified source, the Unicode precompiler also generates the corresponding
DBRM. If you do not specify any additional precompiler options, the DBRM is generated in
EBCDIC as long as your subsystem is not running in new-function mode (as explained in
more detail in the next topic).
Program Preparation
[Figure: program preparation flow. The application source goes through the Unicode
precompiler with NEWFUN(YES|NO); depending on the mode, either an EBCDIC DBRM
or a Unicode DBRM is produced. The modified source, together with the listing and
messages, feeds the compiler and linkage editor; dynamic SQL follows the same parsing
path.]
Notes:
If you precompile a program with NEWFUN set to YES, the resulting DBRM is
marked as V8-dependent and is therefore not compatible with V7. This happens regardless
of whether the program contains new V8 syntax or not. As a consequence, it cannot be
bound on a V7 subsystem, nor on a V8 subsystem that is not yet running in NFM. The
DBRM that is produced as a result of the precompilation is in Unicode.
The default value for the NEWFUN precompiler parameter is set to NO during compatibility
and enabling-new-function modes. For a new V8 subsystem or for a subsystem which has
successfully been converted to new-function-mode, the default changes to YES. The
advantage of changing the default is that there is no need to change all the precompile jobs
once you get to NFM.
This behavior is also illustrated on the visual above. As you can see, regardless of whether
NEWFUN is set to YES or NO, DB2 invokes the V8 SQL parser. The V8 SQL parser uses
Unicode UTF-8 for parsing. If the source program’s SQL statements are not in UTF-8, the
precompiler converts them to UTF-8 for parsing (as explained in the previous topic).
You can also refer to Table 5-4. This is a slightly different way to explain the dependencies
between the different modes and the value used for the NEWFUN option.
Table 5-4 Dependencies and binding for DB2 V8
[Table flattened in these notes; its surviving row reads "Value of NEWFUN keyword: NO |
YES | YES". As described above, the default is NO in compatibility and
enabling-new-function modes and YES in new-function mode.]
For more information, see Figure 11-49, "Application Programming" on page 11-118.
Functions and Routines
Functions
LENGTH, SUBSTR, POSSTR, LOCATE
For SBCS and MIXED (UTF-8) they are byte-oriented
For DBCS (UTF-16) they are double-byte-oriented
New character based functions (and enhanced existing functions)
New CHARACTER_LENGTH, SUBSTRING, POSITION
New parm indicating how to count (CODEUNITS32, CODEUNITS16,
OCTETS)
Cast functions
Unicode generally accepted where CHAR is accepted
For CHAR functions, UTF-8 is result data type
Routines
UDFs, UDTs, and stored procedures allow Unicode parameters
Parameters converted as necessary between CHAR(UTF-8) and
GRAPHIC(UTF-16)
Date, time, timestamp passed as UTF-8 (ISO format)
Notes:
Functions and routines have been enhanced to be able to deal with Unicode data.
If you use the LENGTH function, DB2 returns the actual storage length; in other words, it
counts the number of bytes.
To handle the problem of varying-length UTF-8 characters, DB2 has enhanced a number of
existing functions and introduced three new functions (CHARACTER_LENGTH, POSITION,
and SUBSTRING). You can now indicate to these functions how you want DB2 to count,
byte-wise or character-wise. You can specify the following new keywords:
• CODEUNITS32
• CODEUNITS16
• OCTETS
For more information on these new functions and how to use the new options, refer to
Figure 3-121, "Character-based String Functions" on page 3-207.
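The three ways of counting can be mimicked in Python. This illustrates only the counting units, not DB2's SQL syntax:

```python
# One 2-byte UTF-8 character, one ASCII character, and one character
# outside the Basic Multilingual Plane (4 bytes in UTF-8).
s = "Ä1" + "\U0001F600"

octets = len(s.encode("utf-8"))                # OCTETS: count bytes
codeunits16 = len(s.encode("utf-16-be")) // 2  # CODEUNITS16: UTF-16 code units
codeunits32 = len(s)                           # CODEUNITS32: code points

print(octets, codeunits16, codeunits32)        # 7 4 3
```

The three counts differ as soon as the data contains non-ASCII characters, which is exactly why the new keywords exist.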
Routines, UDFs, and stored procedures allow for the specification of Unicode parameters.
Parameters are converted when required between CHAR (UTF-8) and GRAPHIC
(UTF-16). The character representations of date, time, and timestamp values are passed
as UTF-8.
TEXT in SYSIBM.SYSSTMT
Notes:
Once you are in NFM, and the parameter NEWFUN is set to YES, the text of your DBRMs,
which is stored in column TEXT of catalog table SYSIBM.SYSSTMT, or the STMT column
of SYSIBM.SYSPACKSTMT, is encoded in Unicode. These columns are marked with the
FOR BIT DATA attribute. This means that if you look at the column, using SPUFI for
example, you are no longer able to easily read the contents, because columns with the
FOR BIT DATA attribute are never converted, and these columns are now in Unicode.
If you see readable information in this column as shown on the visual above in the blue
(upper) frame, this package has either never been bound since DB2 has been migrated to
V8, or the NEWFUN option was set to NO. In this case, the information in column TEXT is
still stored in EBCDIC.
You can also use the V8 Visual Explain (VE) tool to look at those statements. VE takes care
of the conversion for you, or you can use a tool like DB2 Administration Tool.
A “quick and dirty” approach is to download the SPUFI output to your PC in binary and look
at it using the NOTEPAD editing tool. Because the first 128 code points in Unicode are the
same as in ASCII, most of the statement text is readable there.
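If you only have the hexadecimal value of such a column (from SPUFI with hex display on, for example), a couple of lines of Python can decode it as well. The hex string below is a made-up fragment, not real catalog output:

```python
# Hypothetical hex value of a fragment of a Unicode STMT column.
hex_text = "53454C454354"
print(bytes.fromhex(hex_text).decode("utf-8"))  # SELECT
```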
Unicode DBRMs
[Figure: a Unicode DBRM downloaded in binary and displayed in NOTEPAD.]
Notes:
The same technique can be used for your DBRMs. Once you start precompiling your
programs with the parameter NEWFUN set to YES, the DBRMs are stored in Unicode and
you cannot read them easily any more. The workaround described in the previous topic,
that is, downloading the output and looking at it using NOTEPAD, also applies here.
Move to Unicode?
[Figure: an EBCDIC-based shipping company weighing a move to Unicode.]
Notes:
The decision whether to store data as Unicode in the database should be considered
carefully on a case-by-case basis. Obviously the main benefit to be realized is the ability to
store multilingual data in a single DB2 system, database, or even table. If you are using
Unicode-based technologies such as Java and XML, you may also wish to store data as
Unicode to avoid conversion issues.
If you plan to start storing your data as Unicode, you must remember that storage size does
not equal the displayed size. This means that you may have to increase column length to
cater to maximum storage lengths. (This is also true in mixed EBCDIC systems because of
the use of shift-in and shift-out characters.)
In addition to potential column length increase, the data sets may also grow in size, which
requires additional DASD. The amount of increase that you can expect depends upon the
nature of the data as discussed previously.
Note: You do not have to move your data to Unicode if you do not have any special
needs for it. For the time being, you can easily continue to work with EBCDIC. However,
with the V8 enhancements that allow you to have multiple CCSID sets in a single SQL
statement (discussed next), you can now gradually migrate your EBCDIC data to
Unicode data. In DB2 V7, you were almost always forced to move all the data that was
touched by an application to Unicode in one conversion operation. If you shared tables
between applications, which is very often the case, you may have to convert multiple
applications at the same time, or potentially the entire DB2 subsystem. In V8 you can
take it one step at a time.
Important: However, bear in mind that there are cases where multiple CCSID SQL
statements do not perform as well as single CCSID statements.
Notes:
Even though you have been able to store data in ASCII since DB2 for MVS V5, and in
Unicode since DB2 for z/OS and OS/390 V7, DB2 V7 does not allow you to reference
multiple table objects defined with different encoding schemes in the same SQL statement.
With DB2 V8, this restriction is removed. Once your DB2 subsystem is running in
enabling-new-function mode (ENFM), it allows you to access multiple CCSID sets per SQL
statement. (This functionality is not available in compatibility mode.) With character
columns in the DB2 V8 catalog being in Unicode, it became an absolute necessity to
implement this feature. Otherwise, your applications, or vendor products, that join DB2
catalog tables with your own EBCDIC tables, or vendor EBCDIC tables, would run into
errors.
Most of the other new features of DB2 V8 are only available once you successfully migrate
to new-function mode. This feature is an exception. It is also available in
enabling-new-function mode, because the catalog tables are migrated to Unicode step by
step during enabling-new-function mode.
The visual above lists the cases that make an SQL statement become a multiple CCSID
set statement.
Note: When an SQL statement has a single CCSID set, the “old” rules apply; for
example, all constants and special registers referenced in the statement use the CCSID
set of the statement, where the CCSID set of the statement is the CCSID of the table (or
view) you are running your query against. If no table (or view) is present in the statement,
the default EBCDIC encoding scheme is used for the string constant. The CCSID is the
appropriate character string CCSID for the encoding scheme.
If a statement is considered to be a multiple CCSID set SQL statement, the CCSID set
associated with a string constant or special register always comes from the application
encoding scheme.
A simplified syntax diagram is shown in Figure 5-1. We only show the character-related
data types in the data type specification, as the CCSID clause can only be specified on
those. Numeric data types or the BLOB data type can of course be specified in the CAST
function, but cannot use the CCSID clause.
When you use the CCSID clause in a CAST function, this makes the statement a multiple
CCSID set statement, and the multiple CCSID string comparison rules apply. These are
explained in more detail in Figure 5-46, "String Comparison (1 of 2)" on page 5-104.
To illustrate the use of the CCSID clause in the CAST function, let us look at the following
example. Assume we have the following tables defined in the catalog: TA, TB, T1, T2.
Example 5-4 shows the retrieval of these table names, ordering the result by table name.
Example 5-4. SELECT from Catalog with ORDER BY
SELECT NAME
FROM SYSIBM.SYSTABLES
WHERE NAME LIKE 'T%'
ORDER BY NAME
______________________________________________________________________
When you are in V8 compatibility mode (or in V7), the catalog is in EBCDIC, and the result
of the query is:
TA
TB
T1
T2
Once the SYSTABLES catalog table is converted to Unicode during enabling-new-function
mode processing, the result looks like this:
T1
T2
TA
TB
To return the result of this query in EBCDIC collating sequence, which most people are
used to after all these years, you can rewrite the query as shown in Example 5-5.
Example 5-5. SELECT using CAST with CCSID
SELECT CAST(NAME AS CHAR(20) CCSID EBCDIC) AS E_NAME
FROM SYSIBM.SYSTABLES
WHERE NAME LIKE 'T%'
ORDER BY E_NAME
______________________________________________________________________
Table UDF
The decision about which table UDF DB2 uses follows the method of finding the best fit, as
described in the DB2 SQL Reference, SC18-7426. In order to find a UDF, DB2
must resolve the parameters. Once DB2 has decided which UDF to use, the UDF is
processed using its application encoding scheme.
Depending on which encoding scheme is used to find the best fitting UDF, the length of the
character strings may vary and therefore the best fit could be a different one depending on
which encoding scheme is used. Using the default application encoding scheme to
evaluate the character strings gives you more flexibility than V7, where DB2 always used
EBCDIC to resolve the character strings for a UDF.
Notes:
When a statement references table objects with multiple CCSID sets, there is a need to
determine which CCSID set to use in the various semantic rules. This need further requires
every string expression, such as a string constant, a special register, etc., in the statement
to have a CCSID associated with it.
All of the following examples use the naming conventions shown in Table 5-5.
Table 5-5 Naming Conventions for Examples in This Section
Name Meaning
[Table body not reproduced in these notes; as used below, ET1 denotes an EBCDIC table,
UT1 a Unicode table, and AT1 an ASCII table.]
Example 5-6 shows the importance of knowing the CCSID associated with the string
constants in the predicates.
Example 5-6. Multiple CCSID SQL Statement
SELECT *
FROM ET1, UT1
WHERE ET1.C1 = X'C1C2C3'
AND UT1.C1 = X'414243'
_____________________________________________________________________
The statement joins two tables, an EBCDIC table ET1 and a Unicode table UT1. The
EBCDIC column ET1.C1 is compared with a hexadecimal constant X’C1C2C3’. The
Unicode column UT1.C1 is compared with a hexadecimal constant X’414243’. (Note that
x’C1C2C3’ in EBCDIC represents the same letters as x’414243’ in Unicode.)
Should the hexadecimal constants use EBCDIC or Unicode? Or should DB2 use EBCDIC
for the comparison of ET1.C1=X’C1C2C3’ and Unicode for UT1.C1=X’414243’ ? This
decision influences the result set and must be consistent for all those decisions which need
to be made by DB2.
In fact, coding the statement as shown in Example 5-6 is probably not going to give you the
result that you want. Let us look at the rules which govern multiple CCSID set SQL
statements.
Assume that the application encoding scheme for the package which issues this statement
is EBCDIC. In this case, DB2 uses the EBCDIC SBCS CCSID set to “interpret” both
hexadecimal constants. If the hexadecimal constants are as shown in Example 5-6, this
converts to an SQL statement shown in Example 5-7.
Example 5-7. Conversion of Hexadecimal Constants
______________________________________________________________________
This is not what we intended with this query. The purpose was to compare both C1
columns with value ’ABC’. In EBCDIC, x’C1C2C3’ is ’ABC’, but x’414243’ equals ’ âä’. In
order to have the Unicode and the EBCDIC column evaluated against ’ABC’, your query
must look like Example 5-8.
Example 5-8. Correct Encoding of Hexadecimal Values
______________________________________________________________________
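The byte values quoted in these examples are easy to verify in Python, using cp037 as a representative EBCDIC SBCS code page:

```python
# 'ABC' in EBCDIC (CCSID 37) versus UTF-8.
assert "ABC".encode("cp037") == bytes.fromhex("C1C2C3")
assert "ABC".encode("utf-8") == bytes.fromhex("414243")

# Interpreted as EBCDIC instead, x'414243' is not 'ABC' at all:
print(bytes.fromhex("414243").decode("cp037"))  # '\xa0âä' (NBSP, â, ä)
```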
Rule: When a statement contains table objects with different CCSID sets, the CCSID set
associated with a string constant or a special register is determined by the application
encoding scheme.
Why You Need to Know More
Character conversion can influence:
The length of your result rows
[Figure: the SELECT statement and its result, repeated as Example 5-9 in the notes
below.]
Notes:
Before we describe how DB2 does data comparison and concatenation in more detail, it is
important to know what happens to your data when you start using multiple CCSIDs in your
SQL statements.
Let us assume that you are working in the following environment:
• UT1 - Unicode table with columns WORKDEPT and LASTNAME. The only row in this
table has the following values:
- WORKDEPT: A00, defined as CHAR(3), SBCS
- LASTNAME: HAAS
• AT1 - ASCII table whose column PROJECT, defined as CHAR(4), SBCS, contains a
single row with value ÄÄÄÄ.
Length of Result Rows - Changes
Example 5-9 shows a SELECT statement producing a result row with concatenated
columns from a Unicode table UT1 and an ASCII table AT1. If you look at the result row, it
looks exactly as you would have expected it to look.
Example 5-9. Select with Concatenating Two Columns
SELECT HEX(U.WORKDEPT||A.PROJECT),U.LASTNAME
FROM AT1 A,UT1 U
WHERE A.LASTNAME=U.LASTNAME AND A.LASTNAME = 'HAAS';
---------+---------+---------+---------+---------+---------
LASTNAME
---------+---------+---------+---------+---------+---------
413030C384C384C384C384 HAAS
The value for WORKDEPT is ‘A00’ and the value for PROJECT is ‘ÄÄÄÄ’.
______________________________________________________________________
As you can see, the representation of ’A00ÄÄÄÄ’ now takes up 11 bytes instead of the
original seven. This is because DB2 has to convert the ASCII data to Unicode.
Depending on which Unicode CCSID DB2 uses in a particular conversion, the number of
bytes used to store a character may increase from one to four bytes. In Example 5-10, the
representation of character ’Ä’ in Unicode (UTF-8) takes up two bytes x’C384’. Refer to
Figure 5-16, "UTF-8 Encoding (1 of 2)" on page 5-36 to learn more about Unicode UTF-8
encoding.
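The byte counts from Example 5-9 can be reproduced in Python, with cp437 standing in for the ASCII table's CCSID:

```python
workdept = "A00"   # SBCS column from the Unicode table
project = "ÄÄÄÄ"   # CHAR(4) column from the ASCII (CCSID 437) table

# In ASCII 437 each 'Ä' is the single byte x'8E', so PROJECT fits in 4 bytes.
assert project.encode("cp437") == b"\x8e" * 4

# Converted to UTF-8, each 'Ä' becomes the two bytes x'C384'.
utf8 = (workdept + project).encode("utf-8")
print(utf8.hex().upper())  # 413030C384C384C384C384
print(len(utf8))           # 11 bytes instead of the original 7
```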
Encoding Scheme of Your Result Rows - Changes
Apart from the changes in length, the encoding scheme of your result data can also change
if you reference more than one CCSID in your SQL statement. Example 5-11 below shows
the hexadecimal representation of ’ÄÄÄÄÄÄÄÄÄÄ’ in ASCII CCSID 437. Here, x’8E’ is
used to represent the character ’Ä’, whereas in the result row shown above in
Example 5-10, x’C384’ is used.
Example 5-11. Single CCSID SELECT Statement - Hex Representation
---------+---------+----
8E8E8E8E8E8E8E8E8E8E
______________________________________________________________________
If, for some reason, you must look at your result data using the hexadecimal
representation, you must know which encoding scheme is used for the table in order to be
able to interpret the byte sequence correctly. According to the examples shown above, and
after reading the Unicode unit, you might be able to convert x’C384’ back from UTF-8 to
ASCII CCSID 437, but without knowing that what you see is UTF-8, you cannot tell which
character is represented here. In this case, you could have also read x’C384’ as characters
“Cd” (where x’C3’ stands for “C”, and x’84’ stands for “d” in EBCDIC code page 37).
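The ambiguity is easy to demonstrate in Python: the same two bytes decode to entirely different characters depending on the encoding you assume (cp037 stands in for EBCDIC code page 37 here):

```python
raw = bytes.fromhex("C384")

print(raw.decode("utf-8"))  # 'Ä'  - one two-byte UTF-8 character
print(raw.decode("cp037"))  # 'Cd' - two EBCDIC (code page 37) characters
```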
Notes:
Before we can go into more detail regarding how DB2 deals with multiple CCSID sets in
SQL statements, we must first look at the terminology.
For the purpose of CCSID determination, string expressions in SQL statements are divided
into six types, called operand types. Refer to Table 5-6 for a more detailed description of
each operand type and its associated CCSID.
Derived value based on a column
CCSID: the CCSID of the source of the derived value.
A derived value based on a column is an expression, other than a column, a
constant, a special register, or a host variable, whose source is directly or
indirectly based on a column. The CCSID of such an expression is the CCSID of
its source.
Example: SUBSTR(column_1,5,length(column_2))

Derived value not based on a column
CCSID: the CCSID of the source of the derived value.
A derived value not based on a column is an expression, other than a column, a
constant, a special register, or a host variable, whose source is not directly
or indirectly based on any column. The CCSID of such an expression is the
CCSID of its source.
Example: SUBSTR(‘ABCD’,1,length(‘AB’))
String Comparison (1 of 2)
[Visual: string comparison table; only fragments survive. The legible rows show
that when the first operand is a string constant, a special register, or a host
variable, the second operand supplies the CCSID, and Note 1 applies in the
remaining columns.]
Notes:
You can use the table shown on the visual to find out which operand supplies the CCSID
for the character conversion, if character conversion is required.
Typically the evaluation of semantic rules involves two operands, operand 1 and operand
2. CCSID conversion is required if all of the following are true:
• The CCSID of operand 1 is different from the CCSID of operand 2.
• Neither CCSID is x’FFFF’, that is, neither operand is defined as FOR BIT DATA or
BLOB.
• The operand selected for conversion is neither NULL, nor empty.
Note 1: If the CCSID sets are different, both operands are converted, if necessary, to
Unicode.
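The three conditions above can be modeled in a few lines. The following Python sketch is illustrative only (the function name and the simplified operand representation are assumptions, not DB2's actual logic):

```python
BIT_DATA = 0xFFFF  # CCSID x'FFFF' marks FOR BIT DATA / BLOB operands

def conversion_needed(ccsid_1, ccsid_2, selected_value):
    # Sketch of the three conditions listed above.
    if ccsid_1 == ccsid_2:
        return False              # same CCSID: no conversion required
    if BIT_DATA in (ccsid_1, ccsid_2):
        return False              # bit data is never converted
    if selected_value is None or selected_value == "":
        return False              # selected operand is NULL or empty
    return True

print(conversion_needed(1208, 37, "ABC"))    # True: Unicode vs. EBCDIC
print(conversion_needed(37, 37, "ABC"))      # False: same CCSID
print(conversion_needed(1208, 0xFFFF, "A"))  # False: FOR BIT DATA
```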
Multiple CCSIDs - Example
[Visual: sample SELECT statement joining ASCII, EBCDIC, and Unicode tables.]
Notes:
Let us now look at an example to become more familiar with the “string comparison” table
provided earlier.
The visual shows a sample SELECT statement with multiple CCSIDs involved. The
statement joins three tables.
Important: The evaluation for CCSID conversion in (most) comparisons and
concatenations is a pair-wise evaluation.
The following explanation demonstrates how comparisons are done, and how to use the
table from the previous topic to find out which operand supplies the CCSID for the
comparison or concatenation.
1. AT1.C1 || x’C1’ is evaluated first. AT1.C1, which is the first operand, is a column. X’C1’,
which is the second operand, is a string constant. If you refer to the table on the
previous visual, you see that the first operand supplies the CCSID for this
concatenation. The result of this first operation is a derived value based on a column,
whose associated CCSID is ASCII. Therefore, the resulting CCSID of the expression
AT1.C1 || X’C1’ is ASCII.
2. The expression AT1.C1 || X’C1’ (ASCII) becomes the first operand for the second
operation. The second operand (UT1.C1) is a column with a Unicode CCSID, because the
table’s encoding scheme is Unicode. The concatenation of these two operands ends up
in column one, row one on the string comparison table (“column value or derived value
based on a column” for both operands), which means that since the encoding schemes
of both operands are different, they are both converted to Unicode.
3. AT1.C1 is an ASCII column. ET1.C1 is an EBCDIC column. Since two columns with
different encoding schemes are compared here, they are both converted to Unicode.
4. AT1.C1 is an ASCII column. X’C1C2C3’ is a hexadecimal constant, which is interpreted
in the application encoding scheme, which is EBCDIC. Since the column value is
always favored, the comparison table shows ’first’ here, which means that the
hexadecimal value x’C1C2C3’ is converted to ASCII.
5. UX’414243’ is a Unicode constant. The hexadecimal values are interpreted as
Unicode values, that is, it converts to ’ABC’. X’C1C2C3’ is a hexadecimal constant and
is therefore interpreted according to the application encoding scheme, which is EBCDIC
in our example. The string comparison table shows a ’1’, which means that in case of
different encoding schemes, both operands are converted to Unicode.
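The correspondence in step 5 can be verified with Python's codecs (cp037 standing in for EBCDIC CCSID 37; an approximation, not part of the course material):

```python
# 'ABC' under the encoding schemes used in the example:
print("ABC".encode("cp037").hex())  # 'c1c2c3' - EBCDIC, i.e. X'C1C2C3'
print("ABC".encode("ascii").hex())  # '414243' - ASCII
print("ABC".encode("utf-8").hex())  # '414243' - UTF-8 (identical for A-Z)

# So X'C1C2C3' read as EBCDIC denotes the same string as UX'414243':
print(bytes.fromhex("c1c2c3").decode("cp037"))  # 'ABC'
```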
String Comparison (2 of 2)
                     second operand
first operand    SBCS Data    Mixed Data    DBCS Data
SBCS Data        N/A          second (1)    second
Mixed Data       first (1)    N/A           second
DBCS Data        first        first         N/A
Notes:
In the previous example we looked at the general encoding scheme of the data that is a
candidate for conversion. In addition, conversion can also force the data to change,
for example, from SBCS to mixed or DBCS data.
The table above shows which operand is selected for conversion when comparing
operands with different SBCS, mixed, or DBCS attributes.
As shown in the very beginning of this topic, the length of your result data may change if a
statement contains multiple CCSID sets. The string then becomes a varying-length string.
That is, the data type of the string becomes VARCHAR, CLOB, VARGRAPHIC, or
DBCLOB.
Refer to Table 5-7 to determine the result length of CCSID conversion. The “x” in Table 5-7
represents LENGTH (string in bytes).
[Table 5-7, result length of CCSID conversion; only partially legible. The
target columns are grouped into SBCS, Mixed, and DBCS CCSIDs. For the "Mixed"
source row, the legible result lengths are x, x*1.8, x*2, x, x, x*2, x, x*3,
and x*2; three cells carry footnote 1.]
1. These conversions are not allowed because of the high probability of data loss; SQLCODE -332 is returned.
Tip: The catalog table SYSIBM.SYSCOLUMNS contains the necessary information for
you to find out which encoding scheme applies to which column and whether it is SBCS,
mixed, or DBCS. You can refer to the CCSID column to find out the encoding scheme.
The FOREIGNKEY column shows which subtype is associated with a specific column.
Note: Column CCSID is new in V8.
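The length changes that Table 5-7 describes can be reproduced with Python's codecs (cp437 standing in for ASCII CCSID 437; an approximation, since Python code pages only model host CCSIDs):

```python
s = "Ä" * 10  # the string from Examples 5-10 and 5-11

print(len(s.encode("cp437")))    # 10 bytes in ASCII CCSID 437 (one per char)
print(len(s.encode("utf-8")))    # 20 bytes in UTF-8: 'Ä' expands to 2 bytes
print(len("€".encode("utf-8")))  # 3 bytes: some characters triple in size
```

This is why a fixed-length CHAR involved in CCSID conversion becomes a varying-length result.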
CCSIDs in Views
Notes:
In this topic, we discuss how to deal with CCSIDs in views. First, we discuss the case
where the view only contains a single CCSID. Next, we look at views with multiple CCSIDs.
If DB2 did not have this information available in SYSIBM.SYSVIEWS, it would interpret
x’C1’ as “Á” (A with accent), which is the ASCII character for code point x’C1’, and would
therefore not return any rows.
Note: If you access your tables directly, instead of using the view, as shown in “3”, and
you are using an application encoding scheme that is different from the one in column
APP_ENCODING_CCSID in catalog table SYSIBM.SYSVIEWS, you may receive
different results than when you use the view to select from.
Note also that this can create some interesting challenges when a view needs to be
dropped and re-created as part of a schema change to any of the underlying tables.
Important: The described situation can lead to differences in results between working
with views, or working with tables directly. If you use views to select your data, you
always receive the exact same result independent of the application encoding scheme
you are using. If you access your tables directly and use comparison operands that
depend on the application encoding scheme, such as hexadecimal constants, you are
likely to receive different results.
Multiple CCSIDs in Views
Appl. enc. scheme = EBCDIC (SCCSID = 37)

1  CREATE VIEW MULTI_V1(VC1,VC2) AS
     SELECT UT1.C1, ET1.C2
     FROM UT1, ET1
     WHERE UT1.C1 = ET1.C2
     AND ET1.C1 >= X'C1';

   SYSVIEWS:   APP_ENCODING_CCSID => 37
   SYSCOLUMNS: CCSID VC1 => 1208
               CCSID VC2 => 37

[The visual also shows sample contents (columns C1 and C2) of tables UT1 and ET1.]
Notes:
If you create a view which refers to tables with multiple CCSIDs, every column of the view
has a specific CCSID associated with it.
Case "1"
View MULTI_V1, in the figure above, refers to a Unicode and an EBCDIC table. In
SYSIBM.SYSCOLUMNS, DB2 stores CCSID 1208 for column VC1 and CCSID 37 for
column VC2.
Case "2"
If you select VC1 and VC2 in conjunction with the hex values of the returned characters,
you can see that the characters keep their encoding scheme. The values returned for
column VC1 are represented as Unicode values, whereas the characters in column VC2
are shown as EBCDIC.
Case “3”
The fact that DB2 remembers that the two columns use different encoding schemes can
also be verified if you look at case "3" in the figure above. On the one hand, DB2 needs to
know that the contents of column VC1 are stored in Unicode. On the other hand, if DB2
did not treat the comparison as it does in a multiple CCSID statement (that is, interpret
the hexadecimal constant x'C1' as EBCDIC, the application encoding scheme, and
convert it to Unicode later for the actual comparison), the SELECT statement would not
return a single row, because none of the hexadecimal values stored in column VC1
equals x'C1'. As you can see from the visual above, DB2 returns one result row, because
x'C1' has been interpreted as the EBCDIC character 'A'. Prior to the comparison with the
values in column VC1, this character is converted to Unicode, which is x'41'. As you can
see from the resulting table in part "2", x'41' is present in VC1.
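The conversion chain for this predicate is easy to check with Python's codecs (cp037 standing in for EBCDIC CCSID 37; an approximation offered as a sketch, not course material):

```python
# EBCDIC 'A' is x'C1'; converted to Unicode (UTF-8), 'A' is x'41'.
print("A".encode("cp037").hex())            # 'c1'
print(bytes.fromhex("c1").decode("cp037"))  # 'A'
print("A".encode("utf-8").hex())            # '41'
```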
ORDER BY with Multiple CCSIDs
Notes:
When talking about SQL statements which contain multiple CCSID sets, there are some
additional things that you must take into account.
BETWEEN Predicate and Multiple CCSIDs
Notes:
Generally, the BETWEEN predicate can be decomposed into basic predicates connected
by the AND logical operator, that is, the two statements shown in Example 5-12 are
equivalent in Version 7.
Example 5-12. BETWEEN - Equivalent Statements
is equivalent to:
This is also true for V8, as long as you do not have multiple CCSID sets in your SELECT
statements. The logical equivalence is only true when the CCSIDs used in the comparison
are the same between the original predicate and the decomposed one.
In the example on the visual above, the statement joins two tables, an EBCDIC table ET1
and an ASCII table AT1. In order to avoid a CCSID conversion conflict on the left-hand side
(LHS) operand, BETWEEN predicates are processed in the following two steps:
1. Evaluate the high and low bound values.
2. Evaluate the result of step 1 against the LHS operand.
In the case of the example shown in the figure, this means that the following evaluation
steps are performed:
1. Evaluate: AT1.SC1 AND UX’00410042’
The CCSID of UX’00410042’ is always Unicode 1200, which is a DBCS CCSID.
If you refer to the string comparison tables shown in Figure 5-46, "String Comparison (1
of 2)" on page 5-104 and Figure 5-48, "String Comparison (2 of 2)" on page 5-107, the
value UX’00410042’ is converted to ASCII DBCS.
2. Now the LHS operand, ET1.MC1, which is EBCDIC mixed, is evaluated. This
evaluation ends up in a conversion to the Unicode DBCS CCSID.
As a consequence, for example, the collating sequence that is used for the comparison is
the Unicode collating sequence, which is different from the EBCDIC collating sequence. To
be more concrete, UX’00410042’ has code points x’41’ and x’42’.
Let us now look at the values that are checked if you decompose the BETWEEN predicate.
Refer to the second SQL statement in the figure above. The following values are evaluated
here:
1. WHERE ET1.MC1 >= AT1.SC1
Here we have two columns with different CCSIDs. Following the conversion rules, those
are converted to Unicode mixed CCSIDs.
2. AND ET1.MC1 <= UX’00410042’
UX’00410042’ is converted to EBCDIC DBCS.
Again, as the collating sequence for EBCDIC is different from the one in Unicode, in case 1
all rows qualify for the request, whereas the decomposed statement results in just one row.
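The collating-sequence difference that causes this behavior can be illustrated in Python (cp037 standing in for EBCDIC; an approximation, not part of the course material): sorting by EBCDIC byte values orders lowercase before uppercase before digits, while the Unicode/ASCII sequence is the reverse.

```python
values = ["a", "A", "1"]

# Unicode / ASCII collating sequence: digits < uppercase < lowercase
print(sorted(values))                                   # ['1', 'A', 'a']

# EBCDIC collating sequence: lowercase < uppercase < digits
print(sorted(values, key=lambda c: c.encode("cp037")))  # ['a', 'A', '1']
```

A range predicate therefore qualifies different rows depending on which sequence the comparison uses.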
The same CCSID consideration applies to the transitive closure rule. If the generated
predicate is a range predicate, that is, the operator is >, >=, <, or <=, and would use a
different CCSID in CCSID conversion, the transitive closure predicate is not generated.
This might lead to performance degradation.
SET Assignment
Notes:
The SET assignment, VALUES INTO, SET special register, VALUE, and CALL statements
are statements that, prior to Version 8, do not have a table context. Beginning with DB2 V8
ENFM, they are treated as multiple CCSID set statements, which means that the new
CCSID conversion rules apply.
Refer to the visual above:
• In case number 1, DB2 behaves exactly the same as it does in V7, that is, the
application encoding scheme is used to interpret the hexadecimal string.
• In case number 2, following the regular conversion rules, the string constant would
inherit the encoding scheme from the EBCDIC table. That is, x’4142’ would be
interpreted as the appropriate characters for the EBCDIC CCSID, and not as ‘AB’.
Since, as mentioned above, the SET assignment statement is considered to be a
multiple CCSID statement, the string constant is interpreted according to the
application encoding scheme, which is ASCII in our example. Therefore, x’4142’ is
correctly interpreted as ‘AB’.
• In case number 3, apart from the SET assignment statement, the subselect itself is a
multi-CCSID statement. Therefore, the hexadecimal constant is interpreted using the
application encoding scheme.
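The effect of choosing the right encoding scheme for x’4142’ can be reproduced with Python's codecs (ascii and cp037 standing in for the ASCII and EBCDIC CCSIDs; an approximation, not part of the course material):

```python
raw = bytes([0x41, 0x42])

# Interpreted in the ASCII application encoding scheme, x'4142' is 'AB':
print(raw.decode("ascii"))          # 'AB'

# Interpreted as EBCDIC (the table's scheme), it is not 'AB':
print(raw.decode("cp037") == "AB")  # False

# The EBCDIC representation of 'AB' is x'C1C2':
print("AB".encode("cp037").hex())   # 'c1c2'
```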
CCSIDs in Assignment
Notes:
In the evaluation of assignment and cast rules, there is a concept of a source operand and
a target operand. The CCSID set of the target operand determines the CCSID set in
CCSID conversion.
• For assignment, the CCSID value of the target operand is used in the actual
conversion.
• For cast, when DBCS CCSID is cast to SBCS or mixed CCSID, it is always converted to
the mixed CCSID.
Refer to the example shown in the visual. Although the INSERT statement contains
multiple CCSIDs, the values which are inserted into the EBCDIC table ET1 are all
converted to EBCDIC, and then inserted. For example, if you want to insert “SU” into SC2,
you must code SU in ASCII (the application encoding scheme), exactly as it has been
done in the example. “SU” is then converted to EBCDIC and stored as
x’E2E4’.
This is a little bit different for Unicode constants, because those are always interpreted as
Unicode and then, as shown in our example, converted to whatever CCSID column GC1
uses. If GC1 is EBCDIC DBCS, the constant is converted to that CCSID; if it is mixed, it is
converted to mixed.
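The assignment conversion for “SU” can be checked with Python's codecs (ascii and cp037 standing in for the ASCII and EBCDIC CCSIDs; a sketch under that assumption):

```python
# 'SU' coded in ASCII, the application encoding scheme ...
print("SU".encode("ascii").hex())  # '5355'

# ... is converted to the EBCDIC target CCSID on assignment and stored as:
print("SU".encode("cp037").hex())  # 'e2e4', i.e. x'E2E4'
```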
Multiple CCSID Sets Restrictions
The view-name in the following statements must be a single CCSID set view:
CREATE TABLE LIKE view-name
CREATE GLOBAL TEMPORARY TABLE LIKE view-name
DECLARE GLOBAL TEMPORARY TABLE LIKE view-name
The result columns of the fullselects below must be in a single
CCSID set
DECLARE GLOBAL TEMPORARY TABLE AS (fullselect) WITH NO DATA /
DEFINITION ONLY
CREATE TABLE AS (fullselect) WITH NO DATA / DEFINITION ONLY
ALTER TABLE ADD (fullselect) WITH NO DATA / DEFINITION ONLY
CREATE TABLE AS (fullselect) refreshable-table-options
ALTER TABLE ADD (fullselect) refreshable-table-options
The fullselect SELECT list columns in an MQT must be in a single CCSID set and
must be the same CCSID set as the table space specified in the IN
clause
Notes:
There are some restrictions when using DDL statements with multiple CCSID sets.
Views
You cannot use the following DDL statements, if the columns of the underlying views are
referencing multiple CCSID sets:
• CREATE TABLE LIKE view-name
• CREATE GLOBAL TEMPORARY TABLE LIKE view-name
• DECLARE GLOBAL TEMPORARY TABLE LIKE view-name
This restriction applies because different encoding schemes are not allowed within
one table. That is, all columns within one table have to be either EBCDIC, ASCII, or
Unicode. The statements fail with SQLCODE -873: “ERROR: The statement
referenced data encoded with different encoding schemes or ccsids in an invalid context”.