IBM Db2 13 for z/OS Performance Topics
Data and AI
Redbooks
IBM Redbooks
January 2023
SG24-8536-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page xi.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Chapter 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Overview of Db2 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Key Db2 13 features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 High-level performance expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Synergy with the IBM Z platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 IBM zSystems hardware leverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.2 Synergy with the IBM z/OS operating system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Db2 for z/OS and its ecosystem performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.1 IBM Db2 Analytics Accelerator and IBM Integrated Synchronization . . . . . . . . . 10
1.4.2 IBM Db2 AI for z/OS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 How to use this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.1.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
5.2 Group buffer pool cast-out-related changes in Db2 13 . . . . . . . . . . . . . . . . . . . . . . . . 194
5.2.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
5.2.2 Performance measurements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
5.3 Data sharing with distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
5.3.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
5.3.2 Performance measurement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
5.3.3 Monitoring information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Chapter 10. Performance enhancements for IDAA for z/OS and IBM Db2 for z/OS Data
Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
10.1 IBM Integrated Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.1.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.1.2 Performance measurement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.1.3 Monitoring information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
10.2 Cluster load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
10.2.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.2.2 Performance measurement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.2.3 Usage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
10.2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
10.3 COLJOIN 1 enablement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
10.3.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
10.3.2 Performance measurement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
10.3.3 Usage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
10.4 Direct I/O for IBM Db2 Analytics Accelerator on IBM Z. . . . . . . . . . . . . . . . . . . . . . . 315
10.4.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.4.2 Performance observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.5 Automatic statistics collection and collection improvement. . . . . . . . . . . . . . . . . . . . 316
10.5.1 Early statistics collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.5.2 Copy Table Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
10.5.3 Explicit RUNSTATS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
10.5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
10.6 Query workload comparison between Db2 12 and Db2 13 for z/OS. . . . . . . . . . . . . 321
10.6.1 Performance measurement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
10.7 Performance impact of AT-TLS enablement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
10.7.1 AT-TLS configuration for IBM Integrated Analytics Accelerator . . . . . . . . . . . . 322
10.7.2 Results and usage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
10.8 Performance overview for IBM Db2 for z/OS Data Gate. . . . . . . . . . . . . . . . . . . . . . 323
10.8.1 IBM Db2 for z/OS Data Gate deployment options . . . . . . . . . . . . . . . . . . . . . . 323
10.8.2 Performance test strategies and results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
10.8.3 Performance measurement: Load throughput and synchronization latency . . . 325
10.8.4 Performance measurement: Query. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
10.8.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Chapter 13. IBM Db2 AI for z/OS benefits and capacity requirement. . . . . . . . . . . . . 365
13.1 How Db2ZAI helps reduce application cost: SQL optimization . . . . . . . . . . . . . . . . . 366
13.2 How Db2ZAI helps with Db2 subsystem management: System assessment and
performance insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
13.2.1 System assessment and performance insights use case example. . . . . . . . . . 372
13.2.2 System assessment in-depth recommendations . . . . . . . . . . . . . . . . . . . . . . . 375
13.3 How Db2ZAI helps manage inbound network traffic: DCC . . . . . . . . . . . . . . . . . . . . 376
13.3.1 Db2ZAI DCC use case: Setting up profiles to monitor and control distributed
connections. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
13.3.2 Db2ZAI 1.5 DCC use case: Visualizing distributed workload activities with the
dashboard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
13.4 Summary of capacity requirements and general recommendations . . . . . . . . . . . . . 385
13.5 Detailed capacity evaluations for Db2ZAI 1.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
13.5.1 High-level capacity evaluation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
13.5.2 Detailed capacity evaluation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
13.6 Monitoring information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
AIX®, CICS®, Cognos®, Db2®, DS8000®, FICON®, GDPS®, HyperSwap®, IBM®, IBM Cloud®,
IBM Cloud Pak®, IBM Research®, IBM Spectrum®, IBM Watson®, IBM Z®, IBM z13®, IBM z14®,
IBM z16™, InfoSphere®, Netezza®, OMEGAMON®, RACF®, Redbooks®, Redbooks (logo)®,
Tivoli®, WebSphere®, z/OS®, z13®, z15®, z16™, zEnterprise®
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Windows and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
Red Hat and OpenShift are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the
United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
IBM® Db2® 13 for z/OS delivers innovations that can improve your ability to make
well-informed business decisions. As the industry's first relational database that integrates
artificial intelligence (AI) into SQL queries, Db2 13 combines deep-learning capabilities with
advanced IBM Z® technologies to reveal hidden relationships across tables and views in your
Db2 data.
As with earlier releases of Db2, Db2 13 continues to enhance availability, scalability, security,
and resiliency, and applies optimization to improve the performance of your operational
processes. These improvements include the following ones:
New business insights without complex AI application deployment
Scalability and performance improvements through smarter optimization without tuning
actions
Highly available (HA) system management with greater resiliency and flexibility
Greater insights into managing complex enterprise systems
Db2 13 is built on the continuous development delivery model that was introduced in Db2 12
for z/OS, and it continues to leverage a tight synergy between IBM zSystems hardware and
IBM z/OS®.
This IBM Redbooks® publication provides performance comparisons between Db2 13 and
Db2 12, including high-level performance expectations when you migrate to Db2 13. It also
provides measurement results and usage considerations for using the new capabilities that
are delivered in Db2 13. Because Db2 13 is built on Db2 12 and the functions that were
delivered through the continuous delivery stream, measurements are provided for the key
performance features that were delivered in Db2 12 and the Db2 ecosystem since IBM Db2
12 for z/OS Performance Topics, SG24-8404 was published in September 2017. These
features include various IBM Z platform synergy items and new features that were added in
IBM Db2 Analytics Accelerator (IDAA), IBM Db2 for z/OS Data Gate, and IBM Db2 AI for z/OS
(Db2ZAI).
The performance measurements in this book were generated at the IBM Silicon Valley
Laboratory under specific and tightly controlled conditions. Your results are likely to vary due
to differences in your conditions and workloads.
For the purposes of this publication, it is assumed that you are familiar with the performance
aspects of Db2 for z/OS. For more information about the functions that were delivered in Db2
13, see the Db2 13 for z/OS documentation and IBM Db2 13 for z/OS and More, SG24-8527.
Neena Cherian is a member of the Db2 for z/OS Performance team at the IBM Silicon Valley
Laboratory. She has been with IBM for over 30 years, with 14 years that have been devoted to
Db2 for z/OS performance. Her areas of expertise include Online Transaction Processing
(OLTP), direct access storage device (DASD) storage-related system performance, and SQL
Data Insights.
Nguyen Dao is a member of the Db2 for z/OS performance team at the IBM Silicon Valley
Laboratory. He has been with IBM for 22 years, 13 of which have been spent working on Db2
for z/OS performance. His areas of expertise include system-level and distributed
environment performance.
Reynaldo Fajardo is a Software Engineer at the IBM Silicon Valley Laboratory. He has a year
of experience in Db2 for z/OS performance. He holds a degree in Computer Science from the
University of California, Santa Cruz. His areas of expertise include query performance and
analysis for the IDAA.
Akiko Hoshikawa is a Distinguished Engineer in IBM Data and AI at the IBM Silicon Valley
Laboratory. She is a Db2 for z/OS architect with overall responsibility for the performance and
strategic integration of AI within Db2, which includes SQL Data Insights and many other
optimizations in Db2. She does hands-on work with customers to provide performance
evaluation, tuning, benchmarks, and to design performance features to solve customer
challenges. Akiko has been involved with or written IBM Redbooks publications about Db2 for
z/OS Performance Topics since 1998.
Peng Huang is a member of the Db2 for z/OS performance team at the IBM China
Development Laboratory. She has been working for the IBM Db2 for z/OS team for 10 years,
7 of which have been focused on Db2 for z/OS performance in the area of query optimization.
Maggie Ou Jin is a member of the Db2 for z/OS performance team at the IBM Silicon Valley
Laboratory. She has been with IBM for 20 years. Her areas of expertise include Db2 utilities,
IDAA performance, and query optimization.
Ping Liang is a member of the Db2 for z/OS performance department at the IBM China
Development Laboratory. He has been working for the IBM Db2 for z/OS team for 16 years, with
eight years that have been devoted to Db2 for z/OS performance. His areas of expertise
include application optimization, system performance, and spring-boot application design and
development.
Jie Ling is a member of the Db2 for z/OS performance team at the IBM China Development
Laboratory. She has been with IBM for 14 years, 7 of which have been focused on Db2 for
z/OS performance. Her areas of expertise include query optimization and high insert
performance.
Todd Munk is a Senior Software Engineer at the IBM Silicon Valley Laboratory. He has been
with IBM for 27 years with the Db2 for z/OS Performance team. His areas of expertise include
distributed connectivity and stored procedures performance. He is spending much of his time
on Scala back-end development for IBM Db2ZAI.
Jasmi Thekveli is a member of the SAP on IBM Z performance team that is based in
Poughkeepsie, NY. She has been with IBM for 20 years, with over 2 years in SAP on IBM Z
performance, and 8 years with the Db2 for z/OS performance team. Her areas of expertise
include SAP performance on Db2 for z/OS, IDAA performance, and query optimization.
Lingyun Wang is a member of the Db2 for z/OS performance team at the IBM Silicon Valley
Laboratory. She has been with IBM for 18 years, 15 of which have been spent working on Db2
for z/OS performance. Her areas of expertise include query optimization, IDAA performance,
and replication.
Chung Wu is a member of the Db2 for z/OS performance team at the IBM Silicon Valley
Laboratory. He has been with IBM for more than 30 years, with over 20 years that have been
devoted to Db2 for z/OS performance. His areas of expertise include system-level
performance, and OLTP.
Chongchen Xu is a member of the Db2 for z/OS performance team at the IBM Silicon Valley
Laboratory. He has been with IBM Db2 for z/OS for 17 years. His areas of expertise include
Db2 performance regression, batch processing, and OLTP performance.
Huiya Zhang is a Performance Engineer at the IBM Silicon Valley Laboratory. Before joining
the Db2 for z/OS performance development team, she had more than 10 years of experience
working in the Db2 for z/OS Client Success team with a focus on Db2 Relational Data System
(RDS).
Xiao Wei Zhang is a Senior Software Engineer at the IBM China Development Laboratory.
He has over 15 years of experience with database development and performance at IBM.
Xue Lian Zhang is a member of the Db2 for z/OS performance team at the IBM China
Development Laboratory. She started her mainframe career in the finance industry in 2007
and joined IBM in 2012. She has been working on Db2 for z/OS since 2016 and focuses on
the OLTP areas.
Special thanks to the following people for their unique contributions to this publication and the
overall project:
Leilei Li and Huiya Zhang for leading teams across organizations and geographies to
ensure the timely delivery of this publication.
Akiko Hoshikawa and Bart Steegmans for their generous technical guidance throughout
the entire writing and reviewing process.
Eric Radzinski and Paul McWilliams for providing technical editing, writing guidance, and
improving the format and style of this publication.
Steven Brazil and Mo Townsend for their management support.
Thanks to the following people for their contributions to this project:
Gayathiri Chandran, Rick Chang, Julie Chen, Ying-lin Chen, Tammie Dang, Thomas Eng,
Sarbinder Kallar, Tran Lam, Allan Lebovitz, John Lyle, Manvendra Mishra, Randy Nakagawa,
Steven Ou, Sharon Roeder, Michael Shadduck, Tom Toomire, Frances Villafuerte, Bituin
Vizconde, Sherry Yang
IBM Db2 for z/OS Development
Craig Friske, Koshy John, Laura Kunioka-Weis, Patrick Malone, Ka-Chun Ng,
IBM Db2 for z/OS Utilities - Rocket Software
Christian Michel
IBM Db2 Analytics Accelerator
Pooja Bhargava, Brian Lee, Nicholas Marion, Jose Neves, Dale Riedy, Guo Ran Sun, Dave
Surman
IBM zSystems Development
Rajesh Bordawekar
IBM Research®
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
[email protected]
Mail your comments to:
IBM Corporation, IBM Redbooks
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Chapter 1. Introduction
Scalability and performance continue to be important themes in IBM Db2 for z/OS and its
ecosystem. Performance optimization in Db2 13 for z/OS is built on continuous delivery of
Db2 12 for z/OS and includes in-memory optimization and autonomic learning behavior
based on prior executions.
This chapter provides an overview of the key features of Db2 13 and describes the Db2 13
performance improvements and expectations (based on laboratory measurements) for
different types of workloads when you migrate to Db2 13 from Db2 12. You can realize many
performance benefits, including potential CPU or cost reductions, by migrating to Db2 13 and
rebinding applications, but other enhancements might require extra actions or memory
allocation to leverage them. A summary of IBM Z platform synergy and key updates in
ecosystem products is also provided.
Db2 13 is built on continuous development and delivery that was introduced in Db2 12 for
z/OS and continues to leverage a tight synergy with both IBM zSystems hardware and z/OS.
The Db2 ecosystem includes IBM zSystems hardware synergy and extra features and
components. It includes IBM Db2 Analytics Accelerator (IDAA) and IBM Db2 for z/OS Data
Gate, both of which are tightly integrated with Db2 for z/OS to support business-critical
reporting and analytic workloads with speed. It also includes IBM Db2 AI for z/OS (Db2ZAI) to
help improve the operational performance and maintain the health of Db2 systems by using
AI capability.
This publication provides performance comparisons of Db2 13 against Db2 12 for various
types of workloads when you migrate to Db2 13. It also provides measurement results for the
new features and considerations for using these features.
Because Db2 13 is built on Db2 12 continuous delivery, it also includes measurements for the
key performance features that were delivered in the Db2 12 continuous delivery stream and
the Db2 ecosystem since IBM Db2 12 for z/OS Performance Topics, SG24-8404 was
published in September 2017. These features include IBM Z platform synergy items, such as
the following ones:
The leveraging of group buffer pool (GBP) residency time in IBM z16™
Coupling Facility (CF) scalability and coupling short link improvement in IBM z16
System Recovery Boost (SRB) in IBM z15® and later
IBM Integrated Accelerator for IBM Z Sort (Z Sort) in IBM z15 and later
Large object (LOB) compression improvement in IBM z15 and later
Huffman compression in IBM z14® and later
Asynchronous cross-invalidation in IBM z14 and later
IBM zHyperLink in IBM z14 and later
Transparent data set encryption improvement in IBM z14 and later
Most of the Db2 instrumentation data, including accounting, statistics, and performance
traces, were formatted by using IBM Tivoli® OMEGAMON® for Db2 Performance Expert on
z/OS 5.5.0.
The performance measurements in this book were generated primarily by members of the
Db2 for z/OS Performance Team by using the z/OS system environment at the IBM Silicon
Valley Laboratory. To illustrate the performance characteristics and usage considerations of
Db2 12 and Db2 13, the performance measurements were conducted in a highly controlled
environment. Although we strive to provide the best comparison between the baseline and
new features, performance results depend on many variables, including the measurement
environment, configuration setting, conditions, and workload characteristics that are used. As
with any performance discussions, your results can vary.
The SQL Data Insights user interface and training services provide a quick way to trigger a
simple training process, referred to as Enable AI Query, against Db2 tables and views. After
training is complete, you can invoke three AI functions (AI_SIMILARITY, AI_SEMANTIC_CLUSTER,
and AI_ANALOGY) from any SQL interface to perform semantic queries based on Natural
Language Processing to discover the relationship between the relational rows and columns.
Both the training and query processes leverage IBM zSystems Integrated Information
Processors (zIIPs) and invoke the IBM Z Deep Neural Network Library (zDNN) to leverage
IBM zSystems hardware optimization to speed up the operations. Because the capability to
embed the semantic query into SQL did not exist previously, the performance evaluation of
this feature focuses on the scalability and factors that influence the performance of the Enable
AI Query function and AI query execution.
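For illustration, the following query is a sketch of how AI_SIMILARITY might be invoked after a table is AI-enabled. The CUSTOMER table, the CUSTOMER_ID column, and the literal key value are hypothetical; verify the function syntax against the Db2 13 SQL reference.

SELECT C.CUSTOMER_ID,
       AI_SIMILARITY(C.CUSTOMER_ID, '1000234') AS SIMILARITY_SCORE
FROM CUSTOMER C
WHERE C.CUSTOMER_ID <> '1000234'
ORDER BY SIMILARITY_SCORE DESC
FETCH FIRST 10 ROWS ONLY;

The query returns the 10 customers whose learned vector representations are closest to customer '1000234', with scores nearer to 1.0 indicating stronger semantic similarity.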
Since the introduction of Db2 12, both through the continuous delivery stream and in Db2 13,
fast index traversal has been enhanced to increase the candidate indexes that are eligible for
FTB. We started by extending FTB support to unique indexes with a maximum key length of
64 bytes. Later, we improved how indexes with “include” columns are stored in FTBs, and
added support for non-unique indexes through the FTB_NON_UNIQUE_INDEX subsystem
parameter in V12R1M508 (function level (FL) 508). To restrict FTB usage to specific indexes,
a new option was added for defining these selective indexes.
In Db2 13, non-unique indexes become FTB candidates by default. In Db2 13 FL 500, FTB
support is further extended to indexes with up to a 128-byte key length for unique indexes and
120 bytes for non-unique indexes.
The optimization through FTBs is effective for workloads that access databases through
indexes in a random key order. Therefore, the FTB feature tends to benefit Online Transaction
Processing (OLTP) workloads with many simple lookups or random key inserts or updates.
For workloads that use sequential index access, an optimization that is called index
look-aside can be used to improve performance. When Db2 determines that index access is
performed sequentially, Db2 caches index leaf pages and uses the cached pages to avoid
repeated access to the same pages. Db2 13 enhanced the triggering decision for the index
look-aside feature for INSERT, UPDATE, and DELETE based on the real-time execution statistics
instead of catalog statistics.
With these two optimizations in Db2 13, index access in both sequential and random order
should see an excellent performance improvement.
The scalability of partition-by-range (PBR) table spaces that use relative page numbering
(RPN) is enhanced to reduce global contention when you use row-level locking in a
data-sharing environment. With this enhancement, you benefit from more flexible and
scalable partition management.
Db2 13 uses a z/OS 2.5 enhancement that moved a portion of data-set-related control blocks
from 31-bit memory to 64-bit memory. As a result, you can keep more Db2 data sets open
concurrently. Furthermore, you can leverage the new allocation option of z/OS scheduler
work blocks (SWBs) in Db2 13 and further increase the number of data sets that you can
open concurrently per Db2 subsystem.
In addition to application-level lock concurrency, a new profile RELEASE option was introduced
to control the package release behavior for both local and remote applications. This
enhancement increases the likelihood of Data Definition Language (DDL) break-in or
maintenance operations, and reduces the impact to other running applications.
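The following SQL is a minimal sketch of how such a profile might be defined, assuming the RELEASE_PACKAGE keyword and the column subset shown; the profile ID, collection, and package names are hypothetical, and the exact keyword and attribute values should be verified against the Db2 13 profile table documentation.

INSERT INTO SYSIBM.DSN_PROFILE_TABLE
       (PROFILEID, COLLID, PKGNAME, PROFILE_ENABLED)
VALUES (101, 'MYCOLL', 'MYPKG', 'Y');

INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
       (PROFILEID, KEYWORDS, ATTRIBUTE1)
VALUES (101, 'RELEASE_PACKAGE', 'COMMIT');

-- Activate the profile tables by issuing the -START PROFILE command.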
Availability improvements
Db2 12 for z/OS at FL 508 delivered the online conversion from a multi-table table space to
PBG table spaces to reduce application outages and simplify the conversion process for
DBAs.
To improve availability further, Db2 13 delivers online conversion from a PBG to a PBR table
space. Db2 13 also eliminates the disruption when deleting the active logs from the bootstrap
data set (BSDS). You can use the new -SET LOG REMOVELOG command to delete unneeded
active log data sets without stopping Db2 subsystems.
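For example, a command of the following form removes one active log data set from the BSDS while Db2 stays up. The data set name is hypothetical, and the exact operands should be confirmed in the Db2 13 command reference.

-SET LOG REMOVELOG(DB2A.LOGCOPY1.DS03)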
The maximum size of Db2 directory table spaces SPT01 and SYSLGRNX was increased to
256 GB to allow more Db2 applications and data sets activities and reduce the need for
frequent REORG statements after Db2 directory clean-ups.
Simplified migration
The process of migrating to Db2 13 is related to the Db2 12 continuous delivery process,
which minimizes the disruptions from the upgrade process as much as possible. In Db2 13,
the functional upgrade and catalog upgrade are controlled separately. You can migrate to
Db2 13 without catalog updates and start taking advantage of the enhancements
immediately.
Continuous compliance
In a hybrid cloud environment, providing evidence for security compliance becomes an
important task for many Db2 customers. Db2 13 is enhanced to generate System
Management Facility (SMF) type 1154 trace records for the recommended system security
parameter settings. This information is used by the IBM Z Security and Compliance Center.
Db2 13 with z/OS 2.5 caches successful execution of plan authorization checks when using
the access control authorization exit (DSNX@XAC). In addition, the way information is stored
in the plan authorization cache is enhanced to allow more entries to be stored for the same
plan authorization cache size. These enhancements reduce the performance impact of plan
authorization checking.
The default statistics interval was changed from 60 to 10 seconds through the STATIME_MAIN
subsystem parameter to record more granular system-level performance information.
Db2 statistics trace information now includes information about the residency time for pages
in the GBP in the CF with the IBM z16™, Coupling Facility Control Code Level (CFCC Level)
25 and later. You can use this information to adjust the size of GBP.
A new exception trace record, Instrumentation Facility Component Identifier (IFCID) 396, was
added to the Db2 statistics to report on elongated index split operations. Combined with
several new fields that are related to index split operations in RTS, you can easily identify the
objects that are impacted by index split processing and possibly address the issue.
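A query along the following lines can surface indexes that experience frequent or exception split activity. The column names REORGTOTALSPLITS, REORGSPLITTIME, and REORGEXCSPLITS are our reading of the new RTS fields and should be checked against the SYSIBM.SYSINDEXSPACESTATS catalog documentation.

SELECT DBNAME, NAME, REORGTOTALSPLITS, REORGSPLITTIME, REORGEXCSPLITS
FROM SYSIBM.SYSINDEXSPACESTATS
WHERE REORGEXCSPLITS > 0
ORDER BY REORGEXCSPLITS DESC;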
Distributed connections and thread monitoring are also enhanced. The Distributed
Connection Control (DCC) feature of Db2ZAI 1.5 uses this new information to filter and
control the distributed threads thresholds at the client application level or client user ID level.
Accounting trace (IFCID 3) provides new serviceability fields to record the longest lock or
latch suspension and the resource for which the thread is waiting. This enhancement speeds
up the analysis of slowdowns when the thread is holding or waiting for certain resources.
Utility history
Monitoring utility run times becomes easier and simpler by using the new Db2 13 utility history
feature that collects essential information about Db2 utility run times in real time. Before
Db2 13, monitoring and analyzing utility run times was challenging and cumbersome because
the job logs from each run time were the main source of analysis. Utility processes write Db2
accounting trace information, but critical information such as the type of utilities that are
running is missing. By using the new utility history feature, you can monitor utility run times for
failures or delays and take appropriate corrective actions. The historical information can be
used to tune or balance utility workloads.
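Assuming that the history rows are externalized in the SYSIBM.SYSUTILITIES catalog table, a query of the following form lists recent utility executions; the column names shown are an assumption and should be verified against the Db2 13 catalog documentation.

SELECT NAME, UTILID, STARTTS, ENDTS, RETURNCODE
FROM SYSIBM.SYSUTILITIES
WHERE STARTTS > CURRENT TIMESTAMP - 7 DAYS
ORDER BY STARTTS DESC;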
At a high level, compared to the same workload running on Db2 12 with default subsystem
parameters (other than the parameters that are related to the cache/pool), you might see the
following improvements:
OLTP: Equivalent, or up to a 5% CPU reduction by using the default subsystem
parameters
Relatively complex set of queries: Equivalent
Update intensive batch processing: Equivalent, or up to a 10% CPU reduction
Utility: Equivalent for most utilities, and up to a 60% CPU reduction for online REORG by
using default subsystem parameters
Querying workloads
Earlier Db2 releases introduced major improvements in access path optimization for complex
queries. In Db2 13, we expect no significant access path changes when you migrate to
Db2 13. A few optimizations are done in a Db2 Relational Data System (RDS) sort that do not
influence the access path, as described in Chapter 8, “Query performance” on page 271.
Unless you have sort-intensive queries, we expect equivalent performance between Db2 12
and Db2 13 with or without rebinding.
Update-intensive workloads
We expect noticeable improvement for sequential batch insert, delete, or update processes
from the index look-aside improvements. Batch jobs that trigger random index access might
benefit from the expanded fast index traversal feature.
If the batch involves inserts into PBG table spaces that cross partitions, you might see a
reduction in getpage and corresponding CPU time reduction, as described in Chapter 7,
“Application concurrency” on page 241.
Db2 utilities
We observed improvement with REORG INDEX because Db2 automatically enables the
NOSYSUT1 feature in Db2 13. As described in Chapter 9, “IBM Db2 for z/OS utilities” on
page 279, the NOSYSUT1 enhancement can provide elapsed and CPU time reduction by
avoiding the usage of the SYSUT1 data set for REORG INDEX operations. This feature was
introduced in Db2 12, but is disabled by default.
If you are using inline statistics for REORG or LOAD, page-level sampling is supported and
enabled based on the default value of the STATPGSAMP subsystem parameter in Db2 13. The
default value of SYSTEM triggers page sampling that can result in performance improvement.
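As a sketch with hypothetical object names, the first statement below is a REORG INDEX that picks up the NOSYSUT1 behavior by default in Db2 13, and the second is a REORG TABLESPACE with inline statistics that becomes eligible for page-level sampling when STATPGSAMP is SYSTEM.

REORG INDEX DBA001.IX_ORDERS SHRLEVEL CHANGE

REORG TABLESPACE DBTEST01.TSORDERS SHRLEVEL CHANGE
  STATISTICS TABLE(ALL) INDEX(ALL)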
IBM z16
The IBM z16 brings AI and cyber resiliency to your hybrid cloud. In addition, the CF has
several important improvements that affect Db2 performance in a data-sharing environment.
The following list describes the key features from IBM z16 that Db2 leverages explicitly or
implicitly:
Improved single instruction, multiple data (SIMD) and on-chip AI accelerator
SQL Data Insights in Db2 13 takes advantage of improved AI capability in IBM z16. SQL
Data Insights training operations use the SIMD capability on IBM zSystems processors for
vector processing. Db2 AI queries invoke the IBM Z AI Optimization library (zAIO) and
IBM z/OS OpenBLAS libraries to leverage AI optimization between SIMD and on-chip
accelerator based on the request.
GBP residency time
CFCC level 25, which was delivered in IBM z16, reports new cache residency times to its
clients. These metrics show how long the data and directory entries remain in the cache
structure. This information can be used as guidance to tune the GBP size. Db2 13
generates the information in the Db2 statistics trace records.
CF scalability and coupling short link improvement
IBM z16 CF images can scale higher than CF images on IBM z15. Extra performance
improvements with CF short link (CS5) can produce CPU reduction for Db2 data-sharing
configurations.
SRB enhancements on z16 support the middleware restart boost in addition to initial
program load (IPL) boost. Db2 can leverage the middleware restart boost.
IBM z15
The IBM z15 delivers resiliency and performance. Db2 12 and Db2 13 for z/OS take
advantages of the following features that are delivered in IBM z15:
The IBM Integrated Accelerator for IBM Z Sort (Z Sort) feature is added in IBM z15 and
later to speed up sort-intensive workloads. Both z/OS DFSORT and IBM Db2® Sort for
z/OS leverage this feature, and the Db2 REORG utility can leverage it. Separately, Db2 RDS
sort (ORDER BY and GROUP BY) also leverages Z Sort through SORT LISTS (SORTL)
instructions starting with Db2 12 and later.
DEFLATE-Conversion Facility (on chip compression) on IBM z15 replaces
IBM zEnterprise® Data Compression (zEDC) Express. It is used by Db2 LOB
compression with improvements in performance and scalability compared to zEDC
Express.
SRB at IPL offers extra capacity during system restart, including Db2 restart during the
boost periods.
Encrypted Diagnostic Data on IBM z15 provides encryption of diagnostic information.
Db2 12 and later takes advantage of this feature to encrypt buffer pool contents in a Db2
dump.
1.4.1 IBM Db2 Analytics Accelerator and IBM Integrated Synchronization
IBM Integrated Synchronization delivers a replication technology that is used by IDAA.
Integrated Synchronization offers a performance advantage by reducing both the replication
latency and CPU time during the log read operations. The same technology is used by IBM
Db2 for z/OS Data Gate to replicate and synchronize Db2 for z/OS data to the target Db2 or
Db2 Warehouse tables in the cloud in near real time. For more information about IDAA and
IBM Db2 for z/OS Data Gate performance, see Chapter 10, “Performance enhancements for
IDAA for z/OS and IBM Db2 for z/OS Data Gate” on page 307.
For more information about Db2ZAI and its capacity requirements, see Chapter 13, “IBM Db2
AI for z/OS benefits and capacity requirement” on page 365.
The performance measurements in this book are meant to show how the various
enhancements in Db2 13 and its ecosystem perform. The topics that are covered are not
limited only to performance enhancements. This book also describes usage considerations
that were observed through the measurements, reasonable expectations for the
enhancements, and a general understanding of the new functions.
All measurements were conducted in a laboratory environment, and as such they might be
atypical because they were run in a dedicated and controlled environment with no other
workloads that could potentially cause resource contentions, and in some cases focus on
specific functions. Extrapolation and generalization require judgment.
The focus of the book is performance, and it does not cover all the features that are available
in Db2 13. For a technical overview of Db2 13, see IBM Db2 13 for z/OS and More,
SG24-8527.
Chapter 13, “IBM Db2 AI for z/OS benefits and capacity requirement” on page 365
This chapter describes how Db2ZAI can improve operational efficiency, and the Db2ZAI
capacity requirement.
Appendix A, “IBM Db2 workloads” on page 403
This appendix describes the details of the workloads that are used to evaluate Db2
performance.
Appendix B, “Artificial intelligence semantic queries” on page 409
This appendix covers the AI semantic queries that run in the evaluation that is described in
Chapter 6, “SQL Data Insights” on page 211.
Appendix C, “IBM Db2 Analytics Accelerator for z/OS workloads” on page 421
This appendix covers the workloads that run in the evaluation that is described in
Chapter 10, “Performance enhancements for IDAA for z/OS and IBM Db2 for z/OS Data
Gate” on page 307.
As the number of concurrently running threads increases, the system can run low on BTB
storage. Therefore, moving agent local BTB storage to an above-the-bar (ATB) storage area
lifts these local agent storage constraints, allowing Db2 subsystems to handle a larger
number of concurrent users so that you no longer need to worry about application or system
outages due to a BTB storage shortage.
In Db2 13, the SQL statement text and attribute string for PREPARE and EXECUTE IMMEDIATE are
stored in agent-local private ATB storage to help support more concurrent threads. The
amount of storage that is allocated for the SQL statement text is based on its length. For any
specific thread, multiple dynamic SQL statements can be running, depending on the nesting
level. Although the maximum SQL statement length is 2 MB, much more storage can be
allocated in ATB storage. This enhancement ensures that Db2 remains available, reliable, and
resilient.
Besides the local and distributed dynamic and static workloads (WL1), two special workloads
also were used. The first one, WL5, is a modified version of the distributed dynamic SQL
workload (WL1) that uses a large SQL statement text. The second one, WL6, is a modified
version of the native stored procedure workload (WL2) that uses a large variable (2 MB) to
evaluate the benefit of this feature.
The main metrics that were used were the CPU time per commit, internal throughput rate
(ITR), and BTB storage usage of the Db2 DBM1 and DIST address spaces.
Table 2-1 on page 15 presents the percentage difference between Db2 12 and Db2 13 for
these six workloads.
Table 2-1 Percentage difference between Db2 12 and Db2 13 for the six workloads (%)
Total CPU per commit       0.00   0.00   0.07  -0.55    1.68  -18.92
Total DBM1 Storage BTB    -1.99  -1.56  -1.73   0.91  -96.62  -96.07
Total DIST Storage BTB    -2.33   0.00   0.00   0.00   -0.64  -20.86
2.1.2 Conclusion
By moving the SQL statement text and attribute string to ATB storage, DBM1 BTB storage
usage on Db2 13 is reduced. The larger the SQL statement text, or the larger the variable in a
stored procedure to store the SQL text, the more DBM1 storage reduction can be expected.
With this feature, you can scale up the number of application threads and have more
concurrent threads without having to worry about storage resources.
Measurement environment
The workload was run by using the following system configuration:
IBM z15 LPAR with three general processors, two IBM zSystems Integrated Information
Processors (zIIPs), and 32 GB of memory.
z/OS 2.5.
Db2 non-data-sharing environment running the following levels of Db2:
– Db2 12 function level (FL) 510
– Db2 13 FL 501
Db2 statistics trace class(*) and the monitor trace for Instrumentation Facility Component
Identifier (IFCID) 124 are turned on (the corresponding commands are shown after this list).
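The trace setup corresponds to commands of the following form (a user-defined monitor class is assumed for the IFCID 124 trace):

-START TRACE(STAT) CLASS(*)
-START TRACE(MON) CLASS(30) IFCID(124)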
Measurement results
The workload shows a consistent increase in real storage and thread storage as the number
of concurrent threads increases. In the tested workload, Db2 12 uses only a small amount of
ECSA, so the reduction in Db2 13 was minimal.
2.2.2 Conclusion
By moving some IFC structures and DDF storage from ECSA to HVCOMMON and ATB,
Db2 13 offers some reduction of BTB storage usage. The storage that is allocated for IFI
buffers to monitor Db2 performance is moved outside of ECSA for the customers with multiple
online monitoring products.
In Db2 13, the storage manager component monitors the virtual storage usage by the Db2
subsystem in BTB private storage areas and in ECSA. When certain thresholds are
exceeded, a storage management daemon starts contracting the corresponding storage
pools until the amount of virtual storage that is allocated falls below the thresholds. Efficient
BTB and ECSA storage contraction helps decrease the likelihood of Db2 subsystems hitting
storage limitations and allows more concurrent threads to run in a Db2 subsystem.
In Db2 12, the default setting of the REALSTORAGE_MANAGEMENT subsystem parameter is AUTO.
With this setting, Db2 issues IARV64 REQUEST=DISCARDDATA requests to contract 64-bit thread
storage in the following situations:
At thread deallocation
If the thread is reused, every 120 commits during normal operation
After 30 commits, if the number of available real storage frames of the LPAR is low and
the Db2 subsystem is running in “contraction mode”
For an explanation of Db2 12 real storage management behavior in detail, and best practices
for using the REALSTORAGE_MANAGEMENT system parameter, see Db2 Real Storage
Management and Recommendations.
In Db2 13, the REALSTORAGE_MANAGEMENT subsystem parameter is removed. Db2 13 does not
issue the IARV64 REQUEST=DISCARDDATA request during thread deallocation or after any
specific number of commits for threads that are reused. Instead, Db2 keeps the 64-bit storage
that is used by the terminated threads and reuses it when other threads need 64-bit storage.
This change helps Db2 to avoid causing large bursts of concurrent DISCARDDATA requests. To
return unused ATB thread storage to z/OS in Db2 13, a timer-controlled storage contraction
process issues DISCARDDATA requests and releases unused real frames back to the operating
system periodically.
With this new way of handling ATB thread storage in Db2 13, there should be fewer
DISCARDDATA requests because of the more efficient storage reuse. With the expectations of
fewer DISCARDDATA requests, the Db2 ssnmMSTR address space is expected to show some
CPU savings. With proper storage reuse and centrally controlled storage contractions,
Db2 13 should not consume much more ATB real storage than Db2 12 running with
REALSTORAGE_MANAGEMENT=AUTO.
2.3.1 Requirements
All the storage management changes take effect when you start running with Db2 13 code
without dependencies on FLs.
Measurement environment
The workload was run by using the following system configuration:
IBM z14 LPAR with 16 general processors, four zIIPs, and 512 GB of memory
z/OS 2.5
Coupling Facility (CF): Two CFs at CF LEVEL 23, with three dedicated CPUs each
Two-way Db2 data-sharing environment with two members running on the same LPAR
running the following Db2 levels:
– Db2 12 FL 510
– Db2 13 FL 100
Figure 2-1 Storage manager above-the-bar contraction changes: key measurement metrics
With the significant reduction in DISCARDDATA operations, the ssnmMSTR address space
SRB CPU time was reduced 11 - 12% when comparing the Db2 13 measurement to the
Db2 12 REALSTORAGE_MANAGEMENT=AUTO measurement.
Compared to the Db2 12 REALSTORAGE_MANAGEMENT=OFF measurement, which does not
perform DISCARDDATA at all, the Db2 13 measurement’s ssnmMSTR SRB time increased
by approximately 1 - 4%. Because the ssnmMSTR SRB time is typically a small portion of
the total CPU time that is used by a Db2 workload, the overall impact for Db2 12
REALSTORAGE_MANAGEMENT=OFF users should be minor.
Note: Because the LPAR real storage is not under stress, none of the three
measurements trigger KEEPREAL=NO DISCARDDATA operations.
Without IFCID 503 turned on, the DISCARDED STORAGE ELIGIBLE FOR STEAL is always
recorded as 0. The total real storage in use numbers (REAL + AUX - DISC STORAGE IN USE)
still include the discarded pages and reflect the maximum amount of real storage that
the Db2 subsystems used at any point during the measurements.
Figure 2-2 Storage areas that make up total real storage in use for the Db2 members
Summary of results
The performance tests show that changes in Db2 13 to optimize the storage management of
ATB thread storage by reducing unnecessary DISCARDDATA processing are efficient, and
overall real storage usage is similar to running Db2 12 with REALSTORAGE_MANAGEMENT=AUTO.
The DSMAX subsystem parameter specifies the maximum number of data sets that can be
open concurrently. A DSMAX value of 200000 is the maximum that is allowed in both Db2 11 and
Db2 12. This limit does not satisfy customer needs.
Two factors govern the limit of the number of concurrent open data sets in Db2:
The required 31-bit DBM1 private memory per open data set
The amount of 31-bit private memory remaining in DBM1 after excluding the required
common memory (ECSA)
z/OS 2.5 provides an enhancement that moves a portion of the control blocks that are
required for open data set to ATB storage. Most of these moves are transparent with Db2. In
addition to these moves, in z/OS 2.5, dynamic allocation processing supports scheduler work
blocks (SWBs) for data sets to use 64-bit storage, which reduces BTB storage usage. Moving
SWBs to ATB storage requires functions that are provided in Db2 13. As a result, more
concurrent data sets can be opened with z/OS 2.5 and Db2 13. Additionally, Db2 13
performance might be improved when opening many data sets concurrently by using
z/OS 2.5.
2.4.1 Requirements
The following z/OS versions, Db2 version, and FLs are employed in this performance
evaluation:
IBM z14 LPAR with eight general processors, two zIIPs, and MT=1, with 4 TB of real
storage
z/OS 2.5 and z/OS 2.4
Db2 13 at FL V13R1M100
Figure 2-3 Db2 13 function level 100 DBM1 BTB available storage as data sets are opened by using z/OS 2.4 and z/OS 2.5
When Db2 DBM1 BTB storage is nearly depleted and only approximately 5% of the extended
region size remains, Db2 issues a DSNV508I warning message (DSNV508I DSNVMON - DB2
DBM1 BELOW THE BAR STORAGE alert-level) to protect Db2 from failing. Any subsequent request
to open a data set receives an intentional Db2 abend with a reason code of 00E20003 or
00E20016.
DBM1 BTB storage usage is equal to the sum of the following fields in the DBM1 and MVS
Storage Below 2 GB section of the OMPE Db2 Statistics trace report:
31-Bit Extended Low Private
31-Bit Extended High Private
Figure 2-4 shows that, with the same Db2 version, z/OS 2.5 allows more open data sets than
z/OS 2.4.
Figure 2-4 Db2 13 DBM1 BTB storage usage as opened data sets while using z/OS 2.4 and z/OS 2.5
Figure 2-5 shows the Db2 13 FL 100 DBM1 64-bit storage in use, excluding the storage that
is used by the Virtual Buffer Pool, for z/OS 2.4 and z/OS 2.5.
Figure 2-5 Db2 13 DBM1 64-bit storage usage excluding the virtual buffer pool
There is no appreciable difference between the two z/OS releases, except that the size stops
growing around 370,000 open data sets for z/OS 2.4. The reason is that the DBM1 BTB
storage is exhausted around this point for z/OS 2.4. For z/OS 2.5, ATB storage for the DBM1
address space continues to grow linearly until the limit of 400,000 open data sets is reached.
We excluded the memory that is used by virtual buffer pools in Figure 2-5 to demonstrate a
clear view of 64-bit storage usage.
Figure 2-6 CPU cost of opening data sets on the Db2 side (reflected in DBM1 TCB time)
The elapsed time of Db2 13 FL 100 opening data sets by using z/OS 2.4 and
z/OS 2.5 is shown in Figure 2-7. Again, there is no appreciable difference, except that
z/OS 2.5 reaches the maximum of 400,000 data sets, and z/OS 2.4 falls short of the limit due
to exhaustion of BTB DBM1 storage.
Figure 2-7 Elapsed time of opening data sets in z/OS 2.4 and z/OS 2.5
Figure 2-8 Histogram of LPAR CPU busy: Db2 13 function level 100 on z/OS 2.5
The following new DSNB280I message is issued during Db2 13 start or restart if the new
dynamic allocation function that supports SWB blocks for data sets in 64-bit storage is
successfully enabled:
DSNB280I DYNAMIC ALLOCATION SUPPORT FOR 64-BIT RESIDENT SWB BLOCKS IS ENABLED
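z/OS 2.5 also provides an installation-level control over SWB placement through the SWBSTORAGE keyword of the ALLOCxx parmlib member. The following line is a sketch under our assumption of that syntax; verify it in the z/OS MVS Initialization and Tuning Reference.

SYSTEM SWBSTORAGE(ATB)

The equivalent dynamic change, under the same assumption, is:

SETALLOC SYSTEM,SWBSTORAGE=ATB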
2.4.5 Conclusion
To summarize, with Db2 13, 400,000 data sets can be opened concurrently by using
z/OS 2.5, and approximately 370,000 data sets can be open on z/OS 2.4 before Db2 fails.
SWBs are placed in 64-bit storage in z/OS 2.5, which reduces the DBM1 BTB storage
requirement per open data set.
The estimated BTB storage requirement is approximately 3 KB per open data set for
z/OS 2.5 and approximately 4 KB per open data set for z/OS 2.4.
Nontrivial collateral costs, such as CATALOG address space CPU time consumption, GRS
address space storage growth, and sysplex SYSIGGCAS_ECS and ISGLOCK structure activity,
should not be overlooked.
Besides the enhancement in z/OS 2.5 that moves the SWBs to ATB storage, there are other
enhancements that reduce the amount of storage that is needed to represent an open data
set. As a result, Db2 12 running on z/OS 2.5 also can open more data sets than with z/OS 2.4.
However, because Db2 12 cannot leverage SWBs in ATB storage, expect fewer concurrently
open data sets than when using Db2 13.
You can still specify the CACHESIZE bind option on the BIND PLAN subcommand to control the
plan authorization cache size at the plan level (and override the default plan authorization
cache size of 4 K).
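For example, a bind of the following form sets a 2 KB plan authorization cache for a hypothetical plan PLANA (the collection name is also hypothetical):

BIND PLAN(PLANA) PKLIST(COLLA.*) CACHESIZE(2048) ACTION(REPLACE)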
The following counters were added in Db2 13. These new counters are set regardless of
whether you use Db2 native or external security controls.
QTAUCNOT tracks the number of times that the plan EXECUTE privilege was checked and not
found in the plan authorization cache.
QTAUCOW1 tracks the number of times that Db2 had to swap out an authorization ID in the
plan authorization cache due to lack of space.
The improvement that is seen can vary depending on the number of authorization IDs that are
involved and whether the authorization IDs are found in the cache.
At Db2 12 FL 508, the following two improvements were introduced to expand FTB support to
cover more indexes:
The columns in the INCLUDE list of unique indexes are no longer built into the FTB
structures, and their lengths do not count toward the index key size limit, which enables
more unique indexes to be eligible for FTB usage.
FTB functions also were expanded to support non-unique indexes. A non-unique index is
deemed an FTB candidate when its key size for the columns is 56 bytes or less. The Db2
system task that monitors unique index FTB eligibility now also monitors candidate
non-unique indexes to automatically allocate and deallocate FTB structures for these
indexes. A new system parameter FTB_NON_UNIQUE_INDEX was introduced to control
whether non-unique index FTB usage is enabled for the Db2 subsystem. In Db2 12, the
default value for FTB_NON_UNIQUE_INDEX is NO, which means non-unique indexes do not
have FTBs that are created for them.
In Db2 13 at FL 500 or later, two more changes are introduced to further expand FTB
eligibility:
The key size limit is raised from 64 bytes to 128 bytes for unique indexes, and from
56 bytes to 120 bytes for non-unique indexes.
The default value of the FTB_NON_UNIQUE_INDEX subsystem parameter is changed from NO
in Db2 12 to YES in Db2 13, which means that FTB support for non-unique indexes is
enabled by default in Db2 13 FL 500 or later.
Usability and monitoring enhancements for the FTB function also were introduced over the
years. For more information, see 2.6.2, “Performance measurements for an FTB key size
increase” on page 32.
Measurement environment
The following environment was used for the FTB non-unique index support feature’s
performance evaluations under non-data sharing:
One IBM z14 LPAR with two general processors, one zIIP, and 64 GB of real storage
A non-data-sharing Db2 12 subsystem running at FL 504
z/OS 2.3
A DS8870 direct access storage device controller
The following test environments were for the two-way data-sharing measurements:
Two IBM z14 LPARs with two general processors, one zIIP for each LPAR, and 64 GB of
real storage for each LPAR
Two-way data-sharing Db2 12 subsystems running at FL 504
z/OS 2.3
All of the random select, update, and insert workloads were run under a non-data-sharing
configuration. The random insert workload was also run in a two-way data-sharing configuration.
The random select workloads perform 200 random selects per commit. There are three
scenarios with a different number of data rows for each index key value:
One row for an index key
Ten rows for an index key
One hundred rows for an index key
The random insert workloads have three scenarios with a different number of data rows for a
single index key value:
One row for an index key. The test inserts one row per commit.
Ten rows for an index key. The test inserts 100 rows per commit.
One hundred rows for an index key. The test inserts 100 rows per commit.
Results
CPU improvements were observed for all the performance evaluations when FTBs were
allocated for the non-unique index. This section describes the results in two parts: the results
for the non-data-sharing environment are presented, and then followed by the results for the
two-way data-sharing environment.
For the non-data-sharing tests comparing when FTB is used for the non-unique index to when
FTB is not used, the following observations were noted:
Getpage reductions of up to 72% for the test workloads.
With the getpage reductions, there are Db2 total CPU usage savings of up to 17%. The
total Db2 CPU time that is used includes the class 2 CPU time and Db2 address space
CPU time that is spent on general central processors (CPs). The CPU time that is spent
on zIIPs is not included in the analysis. These FTB measurements barely affect zIIP usage
for the workloads: all the zIIP time comes from the Db2 address spaces, and it showed only
small changes.
Figure 2-9 Non-unique index 1 row per key CPU and getpage changes: non-data sharing
– With 10 rows of data for every index key value without FTB, in our test there are four
getpages on the 4-level index to locate the index key and six getpages on the table
space to retrieve all 10 rows of data for each select. With FTB enabled for the index, for
each select, there is one getpage on the index and six getpages on the table space.
Therefore, in theory, the total getpage reduction that FTB brings for a single select is 30%.
The random select getpage reduction of 29.4% that is shown in Figure 2-10 confirms this theory.
Figure 2-10 Non-unique index 10 rows per key CPU and getpage changes: non-data sharing
– When there are 100 rows of data for every index key value without FTB, in our test
there are 4.2 getpages on the 4-level index (because there are 100 RIDs for a single
index key, and one index key’s data can span two pages) and 100 getpages on the
table space for each select. With FTB enabled for the index, we see the index getpages
reduced to 1.5 for a single select. As Figure 2-11 on page 31 shows, the total getpage
reduction is 2.7%.
FTB data structures are allocated in a 64-bit variable storage pool. The biggest increase
that we see in 64-bit variable storage pool usage is 294 MB for an FTB structure with a
size of 119,921 KB, as the output of the -DISPLAY STATS(IMU) command shows.
For the two-way data-sharing insert tests (see Figure 2-12), the following observations were
noted when comparing FTB used for the non-unique index to FTB not used:
Figure 2-12 Non-unique index insert workload CPU and getpage changes: two-way data-sharing
In the two-way data-sharing configuration, the insert workload showed similar degrees of
getpage reduction as the non-data-sharing tests. The getpage reduction is as high as
72%.
Db2 total CPU saving is observed for all the measurements with 1, 10, or 100 rows per
index key value. The Db2 CPU reduction is in the range of 2.1% - 6.9%.
The biggest 64-bit variable storage usage increase for a single Db2 member was 300 MB
for an FTB structure with a size of 95,936 KB, as the output of the -DISPLAY STATS(IMU) command shows.
Notify messages between the two Db2 members increase up to 14 times compared to the
test runs without FTB enabled. This increase occurs because index splits happen
occasionally as data is inserted. With FTB enabled for the index, the Db2 member handling
the index split sends notify messages to the other members to let them know about the FTB
structure changes. This situation causes the random insert workload’s class 3 suspension
time for notify messages to increase. We also observed a slight increase of ssnmIRLM
address space SRB time. The increase is far less than the class 2 CPU reduction.
Figure 2-13 Non-unique index insert workload notify message differences: two-way data-sharing
Summary of results
Allowing non-unique indexes to be accessed through FTB structures can help reduce
getpage operations that are performed on the non-leaf index pages and save CPU time for
workloads that access indexes randomly.
Although the performance evaluations that are described here cover only limited non-unique
index usage scenarios, they are not the only tests that were run to vet FTB support for
non-unique indexes. When the feature was first introduced, the Db2 for z/OS performance team
also measured its performance with the more regular workloads, such as the DPRB, TPC-E,
relational transaction workloads (RTWs), and query workloads, and made sure that no
regression was seen for any of them.
In Chapter 4, “Workload-level measurements” on page 149, you can see various workloads
showing performance improvements after migrating to Db2 13 with the FTB_NON_UNIQUE_INDEX
subsystem parameter set to the default value of YES, which turns on FTB support for
non-unique indexes.
The Db2 for z/OS performance team conducted a series of tests to make sure that this FTB
enhancement meets its functional and performance target and does not cause performance
regression.
Measurement environment
The performance measurement was conducted by using the following components:
One IBM z14 LPAR
z/OS 2.4
Six general processors and three zIIPs
For the non-data-sharing tests, which compare when FTB is used to when FTB is not used,
the following observations were noted:
The performance results for test cases with an FTB index key size of 24 bytes and
48 bytes showed that this FTB enhancement introduced no noticeable performance
degradation for the existing support and all the differences are in the noise range.
Non-data sharing for an index key size of greater than 64 bytes: For an index with a key
size greater than 64 bytes, such as 96 bytes, the index is not eligible for FTB with the
existing FTB support, but it is eligible for FTB with this FTB enhancement. In this
measurement, the total number of rows that are inserted is 20 million and the number of
index levels is 4. The performance results for random select with index-data access are
shown in Figure 2-14. This FTB enhancement improves CPU time by approximately 8%
for a case with the unique index of 96 bytes, and 4 - 6% CPU time improvement is seen for
a case with the non-unique index of 96 bytes. The CPU savings occur because with this
enhancement, an FTB structure is created for the index, and for one random index access,
only one getpage operation is needed instead of four without FTB.
Figure 2-14 Class 2 CPU time improvement for random select with index-data access
For two-way data sharing for an index key size of greater than 64 bytes, the performance
results for tests with a unique index with a key length of 96 bytes are shown in Figure 2-15 on
page 35. Up to 13% CPU time improvement is observed for random index-data access, and
there is no significant overhead for sequential insert, update, and delete operations.
The performance results for a non-unique index are shown in Figure 2-16. Up to an 8% CPU
time improvement is observed for random index-data access, and there is no significant
overhead for sequential insert, update, and delete operations.
Figure 2-16 Db2 CPU time improvement for a 96-byte non-unique index
For monitoring, new IFCID 2 fields and the new -DISPLAY STATS(INDEXTRAVERSECOUNT)
command were introduced in Db2 12 with new-function APARs to help system administrators
better monitor and manage FTB functions.
APAR PI72330 introduced six new FTB-related fields for IFCID 2, as shown in Example 2-1.
OMPE on z/OS 5.4.0 and later supports these new fields and prints them in the
MISCELLANEOUS section of the Db2 statistics trace report, as shown in Example 2-2.
This command helps database administrators to better manage FTB functions for individual
indexes. For more information about how to use the command, see -DISPLAY STATS (Db2).
Example 2-3 on page 37 shows a sample command output with DBNAME specified. As the
example shows, the indexes are ordered by traverse count, with the biggest one at the top and
the smallest at the bottom.
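For example, a command of the following form (with a hypothetical database name) limits the
output to the indexes of a single database:
-DISPLAY STATS(INDEXTRAVERSECOUNT) DBNAME(TESTDB01)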
To give Db2 system administrators more control over the usage of the FTB capability for
specific indexes, Db2 12 APAR PH23238 introduced a new SELECTED option for the
INDEX_MEMORY_CONTROL subsystem parameter. With this option, instead of letting Db2
automatically decide which indexes are suitable for using FTB structures for index access,
only the indexes from SYSIBM.SYSINDEXCONTROL table rows with ACTION='A' (Automatic FTB
creation) are considered for FTB eligibility. For more information about how to use the
SELECTED option, see the INDEX MEMORY CONTROL field.
How Db2 determines which indexes are suitable for FTB is usually effective. But if you decide
to use the SELECTED option for the INDEX_MEMORY_CONTROL subsystem parameter, you
can use the -DISPLAY STATS(ITC) command to get a better understanding of which indexes
have the highest number of traverse counts. Choosing from those top indexes to insert into
the SYSIBM.SYSINDEXCONTROL table with ACTION='A' is a good starting point.
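For illustration, an SQL statement along the following lines registers one index for automatic
FTB creation. The subsystem ID, index creator, and index name are placeholders, and this
sketch assumes that the time-window columns of SYSIBM.SYSINDEXCONTROL can be left at their
defaults so that the row applies at all times; verify the full column list in the catalog table
description before using it:
INSERT INTO SYSIBM.SYSINDEXCONTROL
       (SSID, PARTITION, IXNAME, IXCREATOR, TYPE, ACTION)
VALUES ('DB2A', 0, 'ORDER_IX1', 'PRODSCHM', 'F', 'A');
-- TYPE 'F' identifies a fast index traversal (FTB) rule, and ACTION 'A' requests
-- automatic FTB creation; PARTITION 0 is assumed here to cover the whole index.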
Also, in Db2 12, APAR PH41751 eliminates message DSNI070I to reduce the number of
console messages that are issued when fast index traversal (FTB) is used. The removed
DSNI070I message reported changes to the number of objects that use FTB and other
FTB-related status information for a Db2 subsystem. Customers reported seeing too many of
these messages.
To obtain detailed information about FTB usage in a Db2 subsystem, you can use IFCID 389
trace information. IFCID 389 also is part of statistics trace class 8. You also can issue the
-DISPLAY STATS command with the INDEXMEMORYUSAGE option.
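The following catalog query illustrates one way to estimate each index's key length by totaling
the key columns' lengths and adding 2 bytes for VARCHAR columns, 4 bytes for VARBINARY
columns, and 1 byte for nullable columns. The query was truncated in this draft; the clauses
that follow the comment inside it are a reconstructed sketch that joins SYSTABLES, SYSINDEXES,
SYSKEYS, and SYSCOLUMNS, so verify it against your catalog before relying on the result.
Indexes whose computed key length is within the FTB key size limits that are described
earlier in this section are the candidates of interest.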
SELECT * FROM
(
SELECT
  SUBSTR(A.CREATOR,1,10) AS TABLE_CREATOR,
  SUBSTR(A.NAME,1,10) AS TABLE_NAME,
  SUBSTR(B.CREATOR,1,10) AS INDEX_CREATOR,
  SUBSTR(B.NAME,1,10) AS INDEX_NAME,
  SUM(D.LENGTH
    + CASE D.COLTYPE
        WHEN 'VARCHAR' THEN 2
        WHEN 'VARBIN' THEN 4
        ELSE 0
      END
    + CASE D.NULLS
        WHEN 'Y' THEN 1
        ELSE 0
      END) AS KEY_LENGTH
-- The rest of this query was truncated in this draft. The FROM, WHERE,
-- GROUP BY, and ORDER BY clauses below are a reconstructed sketch.
FROM SYSIBM.SYSTABLES A, SYSIBM.SYSINDEXES B, SYSIBM.SYSKEYS C, SYSIBM.SYSCOLUMNS D
WHERE B.TBCREATOR = A.CREATOR AND B.TBNAME = A.NAME
  AND C.IXCREATOR = B.CREATOR AND C.IXNAME = B.NAME
  AND D.TBCREATOR = A.CREATOR AND D.TBNAME = A.NAME AND D.NAME = C.COLNAME
GROUP BY A.CREATOR, A.NAME, B.CREATOR, B.NAME
) AS X
ORDER BY KEY_LENGTH DESC;
2.6.4 Conclusion
We hope that the information that is provided in this section conveys a strong message that
the fast index traversal function is a powerful performance improvement feature. From its
original introduction in 2016 at the GA of Db2 12, it has gradually matured and been improved
over the years. With the expanded FTB capability to support non-unique indexes and indexes
with longer key sizes, you can expect to see more indexes being accessed by using fast index
traversal and enjoy more CPU cost savings.
The storage relief from this enhancement greatly depends on the number of communications
buffers that your workload needs. The performance measurements did not show any
performance regression after this feature was added in Db2 13.
While not processing a client request, a DBAT does not need to be attached to the client
connection awaiting the next request from a client (CMTSTAT=INACTIVE). This capability allows
Db2 to service many client connections with a relatively small population of DBAT server
threads.
When a DBAT is not being used to service a connection request, the DBAT is “pooled” and
considered disconnected from the connection. If a pooled DBAT is not used within the amount
of time that is specified by the POOLINAC system parameter, it is terminated. A DBAT also is
terminated at the end of processing after it is used cleanly to process up to 500 transactions.
The -STOP DDF MODE(SUSPEND) command also terminates the disconnected pooled DBATs.
A high-performance DBAT also terminates automatically after it is used 500 times by its
client connection, or if it does not receive a new transaction request within the POOLINAC
amount of time. However, a high-performance DBAT does not honor a POOLINAC time of 0: if
POOLINAC is set to 0, a value of 120 seconds is assumed. Therefore, setting POOLINAC to 0 does
not prevent Db2 from terminating a DBAT that was reused for over 500 requests.
When a (large) spike of incoming work over the network occurs, it requires Db2 to create
many DBATs to handle the extra work. If the workload spike is short, there is a good chance
that these additional DBATs are reused only a couple of times and mostly sit in the pool
waiting to be reused. However, after being in the pool for more than the POOLINAC value, Db2
terminates those DBATs. Because the spike was short, many DBATs that are in the pool for
longer than POOLINAC become eligible for termination concurrently.
To provide some immediate relief, Db2 12 delivered PH36114 to reduce the number of DBAT
terminations that occur concurrently.
Db2 thread storage contraction was improved. With these changes, which are intended to
improve DBAT termination behavior in Db2, you see a reduction in the overall frequency and
number of DBAT terminations, which also helps to reduce the number of concurrent DBAT
terminations that are often caused by a short workload spike.
Performance measurement
To evaluate the new DBAT termination behavior, a client workload with 2000 threads that are
connected to the Db2 subsystem is used. After the workload runs stably for 2 minutes, all
threads of this workload stop performing SQL work while they are still connected to the Db2
subsystem. To analyze the DBAT termination behavior, we use the ACTIVE_DBATS metric. It is
the total number of allocated DBATs (in use or in the pool).
Figure 2-17 shows the ACTIVE_DBATS metric from the Db2 statistics trace. The data shows the
number of active DBATs gradually decreasing at a rate of 50 DBAT terminations per cycle, as
expected.
Measurement environment
To conduct the measurements, we used Db2 13 for z/OS with the same z/OS level, either
z/OS 2.4 or z/OS 2.5, on both systems. The Db2 CPU time, including both general-purpose
processors and IBM zSystems Integrated Information Processors (zIIPs), was the comparison point.
Figure 3-1 on page 45 shows the percentage of CPU time improvement running on IBM z16
compared to IBM z15. The workloads that are used here are described in Appendix A, “IBM
Db2 workloads” on page 403.
The analysis indicates that the wide range of variability is due to the difference in the
processor cache-hit behavior. IBM z16 processors use 7-nm chip technology with an updated
cache hierarchy on the central processor complex (CPC) drawer. IBM z16 has a larger L2
cache than IBM z15, with up to 32 MB of virtual private cache, and the L3 and L4 caches are
implemented as virtual shared caches on the IBM z16.
In general, workload performance is sensitive to how deep into the memory hierarchy the
processor must go to retrieve the instructions and data to run the workloads. This
characteristic is described as relative nest intensity (RNI).
The workloads that mostly use the L2 cache can be characterized as low-RNI workloads.
They are typically Db2 batches or queries. Conversely, workloads that require a deeper level
of cache hierarchy are called high-RNI workloads. Typically, transactional workloads with
many concurrent threads with transaction managers running on z/OS fall into this category.
Due to the cache hierarchy updates, a workload with high RNI might see more improvement
compared to a low-RNI workload that runs on IBM z16.
For more information about the cache level differences, see IBM z16 (3931) Technical Guide,
SG24-8951.
We also observed more improvement for the IBM brokerage transactional workload running in
a data-sharing configuration compared to a non-data-sharing configuration. This difference is
attributed to the performance optimization of the IBM z16 coupling link latency. For more
information about this improvement, see 3.2, “Data-sharing overhead reduction with short
reach coupling links” on page 46.
These improvements result in improved synchronous CF request service times, which reduce
Db2 data sharing overhead.
3.2.1 Requirements
To leverage these improvements, you must be using IBM z16 with CFLEVEL 25 or later.
IBM z15 running CFLEVEL 24 running the same workload is used as a basis for comparison.
No specific Db2 version or function level (FL) is required for a data-sharing group to leverage
the benefits of this z16 enhancement. If there are Db2 CF structures on CFs at CFLEVEL 25
and connected to the z/OS logical partitions (LPARs) (at z/OS 2.3 or later with PTFs for APAR
OA60650 applied) through CS5 type CF links, there should be data-sharing processing
savings.
Measurement environment
Two sysplexes are employed: one on IBM z15 hardware, and the other on IBM z16. Each
sysplex consists of two Internal Coupling Facilities (ICFs) and two LPARs. Separate CS5 links
are used to link LPARs and ICFs. Another separate CS5 link also connects the two ICFs. The
tests use asynchronous duplexing for the Db2 lock structure and synchronous duplexing of
the shared communications area (SCA) structure. All GBPs are duplexed with the structures
that are spread over the two ICFs.
The IBM brokerage transaction workload is the workhorse that is used to evaluate this
feature. The workload is driven by a Linux on IBM Z driver, which is outside of the sysplex but
within the same IBM z15 or IBM z16 box as the sysplex.
Results
Table 3-1 on page 47 highlights data from each collected measurement.
Non-data sharing
As indicated in Table 3-1, using IBM z16 along with CFLEVEL 25 shaves off
3.2 percentage points of the (two-way) data-sharing overhead compared to IBM z15 that uses
CFLEVEL 24.
Looking at the RMF CF activity reports, the average SYNC CF service time is
5.75 microseconds for IBM z15 and 3.85 microseconds for IBM z16, an improvement of 33%,
even though both IBM z15 and IBM z16 processors run at a 5.2 GHz clock speed.
CF access accounts for most, if not all, of the Db2 data-sharing cost. CF synchronous access
times are directly related to CPU time, so enhancing CF SYNC access time directly reduces
the data-sharing cost.
In this evaluation, the total CF SYNC time in the two-way data-sharing measurement is
484 IBM z15 CPU seconds and 336 IBM z16 CPU seconds, a reduction of 30%, which is in
line with the average SYNC CF service time difference.
Because the data-sharing cost is much lower on IBM z16 than on IBM z15, the data-sharing
measurement (CPU time or ITR) improves more than the non-data-sharing measurement when
moving from IBM z15 to IBM z16.
However, the performance of Internal Coupling (ICP) links and Long Reach Coupling (CL5)
links is not enhanced in CFLEVEL 25.
We want to emphasize that this section describes our experiences with migrating the IBM
brokerage workload from CFLEVEL 24 (IBM z15) to 25 (IBM z16). We make this information
available so that you can avoid the same issues that we experienced. It is not intended to
provide any technical advice that the Db2 community must follow.
The following excerpt, which is taken from the section titled CF structure sizing increases for
CFLEVEL 25 in the Family 3931+01 IBM z16 announcement letter, provides the official
advice for CFLEVEL 25:
“CF structure memory size requirements often increase when migrating to a higher
CFLEVEL. IBM z16 CFLEVEL 25 is no exception to this general rule, and in fact the structure
size increases might be more noticeable going to CFLEVEL 25 than for some previous
CFLEVEL migrations, particularly for structures whose absolute size is small (for example,
100 MB or less). Clients are urged to carefully resize their CF structures as part of migrating
to CFLEVEL 25.”
Our experience is a result of having a small SCA structure for the workload.
3.3.2 Results
The Db2 SCA list structure was defined with a SIZE setting of 10M when using CFLEVEL 24.
When using CFLEVEL 25, Db2 at FL 501 failed to start with a 10M size SCA.
The SCA structure has a different number of directory entries and data elements that are
available in CFLEVEL 25 compared to CFLEVEL 24. Table 3-2 shows the allocated size,
number of directory entries, and number of data elements of the SCA structures that we used
under CFLEVEL 24 and 25.
DSNCEB0_SCA
As shown in Table 3-2, the SCA structure uses 20% more storage in CFLEVEL 25 than 24,
yet both the number of directory entries and data elements are fewer than CFLEVEL 24.
Because we used a small SCA structure for CFLEVEL 24, we were unlucky that without
making it larger on CFLEVEL 25, there was not enough space in the structure for Db2 to
restart successfully, and this situation drew our attention to the memory size requirement
increase for CFLEVEL 25.
Other data-sharing workloads with larger SCA structure sizes, such as the relational
transaction workload (RTW) workload, which uses a SIZE (80M) SCA structure, did not
encounter the same SCA storage shortage issue after migrating to CFLEVEL 25 with the
same structure size.
We also examined the GBP structures to understand whether the number of directory entries
and data elements change after migrating to CFLEVEL 25 without structure size changes.
Table 3-3 Db2 GBP structure for 4 K buffer pools: CFLEVEL 25 versus CFLEVEL 24
DSNCEB0_GBP0 is the smallest GBP (63M in size), and GBP8 is the largest (26G in size) that
corresponds to 4 K local buffer pool BP8. The shrinkage in directory entries and data
elements, from CFLEVEL 24 to 25, is 0.4% - 1.3% for the IBM brokerage workload. On
CFLEVEL 24, we do not see any directory reclaims for the GBP structures. However, with
CFLEVEL 25, because of reductions in total available directory entries, we see a small
number of directory reclaims for some of the GBP structures, although they do not seem to
impact performance.
Although most local buffer pools are 4 K size buffer pools, there is one 32 K local buffer pool
that requires a corresponding GBP structure DSNCEB0_GBP32K. Table 3-4 illustrates the
GBP32K comparison between CFLEVEL 25 and 24.
Table 3-4 Db2 GBP structure for 32 K buffer pools; CFLEVEL 25 versus CFLEVEL 24
With the same allocated size for GBP32K, both the number of directory entries and data
elements are 27% fewer in CFLEVEL 25 than in CFLEVEL 24. This difference might impact
system performance if GBP32K is heavily accessed.
DSNCEB0_LOCK1
On the positive side, the geometry of the Db2 lock structure remains constant between
CFLEVEL 24 and 25 for the IBM brokerage workload.
Summary of results
Although this study is not an exhaustive or comprehensive evaluation of CFLEVEL 25, it did
reveal that when the IBM brokerage data-sharing workload was moved to an IBM z16
machine and the same CFRM policy settings that were running fine on CFLEVEL 24 CFs were
applied to the CFLEVEL 25 CFs, we observed a significant drop in the number of entries that
can be stored in the SCA structure and in some of the GBP structures with relatively small
sizes.
Two new statistics are added to the GBP statistics storage areas: One is the average time that
a data element stays in a GBP before it is reclaimed, and the other is the average time that a
directory entry is in a GBP before it is reclaimed. The values are returned to Db2 from the CF
through z/OS. They are moving weighted averages that approximate the residency time in
microseconds averaged on a per-cache-storage-class basis.
These two metrics can be obtained through the Instrumentation Facility Component Identifier
(IFCID) 230, 254 record traces, or by issuing the -DISPLAY GROUPBUFFERPOOL command with
the GDETAIL option. For more information about this enhancement, see 12.4, “Group buffer
pool residency time enhancements ” on page 356.
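For example, a command of the following form returns the group detail statistics, which now
include the two residency time values, for one GBP since the last interval; substitute your
own GBP name:
-DISPLAY GROUPBUFFERPOOL(GBP12) GDETAIL(INTERVAL)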
For this feature to work, Db2 must be running on an IBM z16 LPAR with z/OS 2.4 or later that
has the necessary PTFs for APAR OA60650 to support the cache residency time metrics.
Also, the CF must be running at CFLEVEL 25 or later.
Measurement environment
The IBM brokerage workload that is used during this evaluation was run in the following
environment:
An IBM z16 sysplex consisting of two MVS LPARs
Two ICFs running CFLEVEL 25
z/OS 2.5
Db2 13 at FL 501
Results
A series of nine two-way data-sharing measurements were collected to understand the
performance effect of the GBP residency time for a data-sharing environment. In particular,
the size of four GBPs was gradually increased to observe the changes in their data and
directory residency times along with the data-sharing group’s ITR, which is a yardstick for
system performance.
The nine measurements were designated as state I - IX in Table 3-6. Each state used
different GBP sizes for GBP12, GBP21, GBP23, and GBP24.
State GBP12 GBP21 GBP23 GBP24
I 0.25 1 1 0.25
II 0.50 2 2 0.50
III 1 4 4 1
IV 2 8 8 2
V 3 12 12 3
VI 5 17 17 5
VII 20 17 17 20
VIII 20 40 40 20
IX 20 80 80 20
Figure 3-2 on page 53 through Figure 3-5 on page 54 depict the GBP residency time for data
and directory entries at the end of each measurement for the respective GBPs.
Note: If there are no data or directory reclaims for a GBP structure after the structure is
allocated, the residency time that is returned is 0. The residency time is calculated after a
cache structure is allocated. The current allocation’s residency time is not affected by
previous allocations.
The Db2 data-sharing group ITR for states I - IX are shown in Figure 3-6 on page 55.
The distribution of the group’s ITR is 15,200 - 15,700 commits per second, which is a mere
3.3% difference and barely above the measurement noise level. However, it is surprising to
see that the group ITR of state IX is not the best one. With zero GBP residency time in both
data and directory entries, which means no reclaim is performed for any of the GBP
structures, we expected state IX to be among the best performing scenarios. Further
investigations were conducted to better understand this unexpected behavior.
Figure 3-8 CF utilization of the CF with primary GBPs across nine states
It is evident that the CF utilization of states VIII and IX is much higher than in the other
measurements, and higher CF utilization leads to longer CF SYNC service times.
Figure 3-9 on page 57 through Figure 3-12 verify that more CF requests lead to higher CF
utilization, which results in longer CF SYNC service times.
Armed with this information, we proceeded to calculate each state's total group CF SYNC
service time during the measurement period as the sum over all GBPs of (GBPxx CF SYNC
service time per request x total SYNC requests during the measurement duration). Figure 3-13
on page 59 shows the outcome.
It is clear from Figure 3-13 that state IX has the highest group CF SYNC time, 135 CPU
seconds, which is a 36% increase compared to state I, which has 99 CPU seconds CF SYNC
time.
The root cause of the group ITR of state IX not being the best is the higher than average
number of CF requests during the measurement interval, which leads to higher CF utilization
and results in longer CF SYNC service times. Figure 3-13 shows more CF SYNC CPU time in
state IX than in the rest of the states, so the group ITR of state IX is not the best one, although
it is close.
Summary of results
Drawing on the experiences from this exercise, we can summarize the following items:
Small GBP residency times can mean frequent reclaims, such that directory or data
entries do not stay long in the GBP, which might cause more synchronous I/Os because
data is less likely to be found in the GBP when it is needed. The reclaim process itself also
incurs a CPU cost. This is what happens in state I in this evaluation, which results in the
lowest group ITR among the nine states.
Large GBP residency times can mean infrequent reclaims on either directory or data
entries, such that they stay in the GBP for a long time, the chance that the required
data is found in the GBP is higher, and the workload performs fewer synchronous I/Os.
Such a GBP is on the verge of zero GBP residency time, as shown in state VI or VIII in this
exercise.
Zero GBP residency time is expected to be the best performer because both directory and
data entries stay in the GBP indefinitely: there are no reclaims at all, and there are the fewest
synchronous I/Os. However, as state IX in this exercise shows, this benefit can be offset by
higher CF utilization and longer CF SYNC service times.
Overall, the group ITR distribution in this exercise is 15,200 - 15,700 commits per second,
which is a mere 3.3% difference and barely beats the measurement noise level. This situation
occurred because reclaim processing was relatively small in the workload that we evaluated.
Continue to monitor GBP shortage messages (DSNB325A or DSNB319A) in addition to the GBP
residency time when tuning the GBP size.
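When directory entry reclaims rather than data entry reclaims dominate, one tuning option
besides increasing the structure size is to adjust the directory-to-data ratio. A sketch of the
command follows; the GBP name and ratio value are illustrative only, and the new ratio takes
effect the next time the GBP structure is allocated or rebuilt:
-ALTER GROUPBUFFERPOOL(GBP12) RATIO(10)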
Our evaluation of SRB focuses on the zIIP boost only. The duration of a zIIP boost is 1 hour.
SRB stage 3 is delivered with IBM z16 (driver 51). It enhances SRB in the following areas:
Middleware Region Startup Boost
SRB can potentially be activated for 5 minutes to accelerate the restart of any started task.
However, there is a limit of 30 minutes of Recovery Boost time per system per day. The
Workload Manager (WLM) classification rules were enhanced to let you select those
started tasks whose startup triggers the activation of SRB for that system. Because our
tests were conducted on IBM z15, we did not test this feature; the inner workings are the
same, but the time and scope of what is boosted differ from the tests that are described.
IBM HyperSwap® configuration load boost, which is not a focus in our evaluation.
SVC Dump boost, which is not a focus in our evaluation.
In addition, SRB usability is improved, which allows enabling and disabling the SRB feature
on a system, monitoring the status of SRB, and so on.
For more information about SRB, see Introducing IBM Z System Recovery Boost,
REDP-5563.
3.5.1 Requirements
To leverage SRB, you must be using the following minimum levels of hardware and software:
IBM z15 or later hardware
Db2 12 for z/OS at FL 100 or later
Results
The following sections describe the results for each of the scenarios.
Table 3-7 Benefit of SRB to the Db2 restart after unplanned LPAR outage
The no-boost configuration uses 2 CPs and 2 zIIPs; the with-boost configuration uses 2 CPs
and 2 zIIPs with 4 reserved zIIPs. All timings are in seconds, and the delta column shows the
difference between the two.
The timings for the different events in Table 3-7 are as follows:
IPL: From “beginning” to “TSO Login”
Db2 restart time is the sum of the following times:
– Log initialization: From “-STA Db2” to message DSNJ099I
– Current status rebuild: From message DSNJ099I to message DSNR004I
– Forward log recovery: From message DSNR004I to message DSNR005I
– Backward log recovery: From message DSNR005I to message DSN9022I
The impact of SRB on the Db2 forward log recovery phase during Db2 restart is significant in
this scenario. SRB shortens the elapsed time of this phase. As a result, it speeds up the Db2
restart after an unplanned LPAR outage.
Figure 3-14 WLM service time for Db2 restart after unplanned LPAR outage
The forward log recovery phase of the Db2 restart process is designed to have fast log apply
tasks working in parallel. Because there are more available zIIPs when SRB is used, which
allow more tasks to be working concurrently, the results are an improved forward log recovery
phase during Db2 restart.
Table 3-8 Benefit of SRB for a Db2 restart after a planned LPAR outage
The no-boost configuration uses 2 CPs and 2 zIIPs; the with-boost configuration uses 2 CPs
and 2 zIIPs with 4 reserved zIIPs. All timings are in seconds, and the delta column shows the
difference between the two.
The timings for the events that are listed in Table 3-8 are as follows:
IPL: From “beginning” to “TSO Login”
Db2 restart is the sum of the following times:
– Log initialization: From “-STA Db2” to message DSNJ099I
– Current status rebuild: From message DSNJ099I to message DSNR004I
– Forward log recovery: From message DSNR004I to message DSNR005I
– Backward log recovery: From message DSNR005I to message DSN9022I
Figure 3-15 illustrates the workload external throughput rate (ETR) over time for a scenario
without SRB, where approximately 10 minutes after the system undergoes IPL, Db2 is
restarted to work on eliminating the backlog that is accumulated while the system was down.
It is evident that the ETR gradually drops as time goes on, meaning less and less backlog is
cleared as time passes.
Figure 3-15 Throughput rate when eliminating the backlog without SRB
Concurrently, the transaction elapsed time is getting longer, as shown in Figure 3-16.
Figure 3-16 Application elapsed time when eliminating the backlog without SRB
Figure 3-17 General processor utilization when eliminating the backlog without SRB
Figure 3-18 zIIP processor utilization when eliminating the backlog without SRB
When the backlog is being cleared in a CPU-bound environment, the workload is starved for
processing power.
Now, look at a system that undergoes IPL with SRB capability. With SRB, two CPs and six
zIIPs are working for 1 hour after IPL (the boost period) to clear the backlog.
Almost immediately, a reduction in various Db2 internal latch suspension times is observed.
Db2 page latch suspension time is also reduced, as shown in Figure 3-19 on page 65.
Reducing both Db2 internal latch and page latch suspension time leads to more stability of
the transaction elapsed time, as shown in Figure 3-20.
Figure 3-20 Application elapsed time when eliminating the backlog with and without SRB
Figure 3-21 External throughput when eliminating the backlog with and without SRB
After 1 hour, SRB is no longer active, and the system goes back to a two CPs and two zIIPs
configuration. Thereafter, the Db2 application elapsed time and workload ETR exhibit the
same behavior.
Figure 3-22 shows the LPAR (CP and zIIP) MVS busy percentage from the time the system
undergoes IPL until 65 minutes after IPL.
Figure 3-22 MVS busy when eliminating the backlog with and without SRB
The LPAR (CP and zIIP) is less stressed because it is relieved by SRB, which provides extra
zIIPs working together through the boost period to clear the backlog.
One hour later, the following messages indicate that the SRB period ended:
IEA678I All IPL boosts have ended
IWM063I WLM POLICY WAS REFRESHED
IWM064I BOOST ENDED
APAR PH34550 addresses a Db2 RLF issue with SRB. Make sure that the PTF for this APAR
is applied when Db2 is restarted across the SRB period.
3.5.5 Conclusion
The focus of this study is on exploring the effects of the zIIP boost option of the SRB feature.
It provides extra processing power by allowing zIIP processors to work like general
processors during a 1-hour SRB period after a system IPL. Therefore, the extra processing
capability helps clients to work more quickly through a backlog of work that accumulated
while the system was down.
When Db2 was restarted after a planned LPAR outage, SRB had minimal performance
impact because the Db2 restart is mostly I/O-bound in this case, and adding more CPU
power does not speed up the restart time.
3.6 REORG table space with variable length records on IBM z15
with IBM Z Sort
IBM z15 provides a new on-chip sort accelerator that is known as the IBM Integrated
Accelerator for IBM Z Sort (Z Sort). IBM z15 also provides the new SORT LISTS (SORTL)
instruction, which allows DFSORT to take the hardware-accelerated approach to sort, which
reduces the CPU cost and shortens the elapsed time. The Db2 REORG utility implements the
interface to leverage the IBM z15 features when DFSORT is invoked to reorganize a Db2
table space with variable length records (VLRs).
3.6.1 Requirements
To leverage this enhancement, you need the following hardware and software:
IBM z15 or later
z/OS 2.3 or later
Db2 12 for z/OS at FL 500 or later
Db2 12 APAR PH28183
DFSORT APAR PH03207
Measurement environment
The performance evaluation was done on both IBM z14 and IBM z15 systems. Both systems
were set up with z/OS 2.3, Db2 12, four general-purpose processors, one zIIP engine, and
512 GB of memory.
Then, the REORG TABLESPACE utility is run with the SORTDATA, SHRLEVEL CHANGE, and
KEEPDICTIONARY options. Two sorts are performed during the utility run time: sorting data VLR,
and sorting when building the indexes (fixed-length record (FLR)). Both sorts use DFSORT,
but only the VLR part is Z Sort eligible.
The IBM z15 LPAR is equipped with enough memory so that Z Sort can contain the sort
entirely in memory in this test.
For both scenarios, measurements are repeated: the first run uses a short clustering key
length (33 bytes), and the second run uses a longer clustering key length (165 bytes).
The performance numbers that are used for comparison are ELAPSE, SORTCPU, and
STEPCPU from the System Management Facility (SMF) 16 and SMF 30 records, and
above-the-bar (ATB) real memory usage from the job outputs.
Results
The measurement results can be summarized as follows:
Scenario 1 (IBM z15 with Z Sort versus IBM z14)
– Short clustering key length
For the VLR part, the improvement is up to 17% in ELAPSE and 40% in SORTCPU
time. For the overall REORG job, the improvement is up to 11% in ELAPSE and 17% in
STEPCPU time. z15 with Z Sort uses 91% more ATB real memory.
– Longer clustering key length
For the VLR part, the improvement is up to 23% in ELAPSE and 34% in SORTCPU
time. For the overall REORG job, the improvement is up to 15% in ELAPSE and 20% in
STEPCPU time. z15 with Z Sort uses 66% more ATB real memory.
Scenario 2 (IBM z15 with Z Sort versus IBM z15 no Z Sort)
– Short clustering key length
For the VLR part, the improvement is up to 7% in ELAPSE and 32% in SORTCPU
time. For the overall REORG job, the improvement is up to 4% in ELAPSE and 9% in
STEPCPU time. IBM z15 with Z Sort uses 91% more ATB real memory.
– Longer clustering key length
For the VLR part, the improvement is up to 3% in ELAPSE and 25% in SORTCPU
time. For the overall REORG job, the improvement is up to 1% in ELAPSE and 8% in
STEPCPU time. IBM z15 with Z Sort uses 66% more ATB real memory.
Db2 also has the UTILS_USE_ZSORT subsystem parameter (YES or NO, with NO as the default) to
control the usage of Z Sort for the REORG TABLESPACE utility. When UTILS_USE_ZSORT=YES, if the
Db2 REORG TABLESPACE utility detects at run time that Z Sort is enabled and that the hardware
and DFSORT support Z Sort, then Db2 uses Z Sort.
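For reference, the REORG invocation that is described in the measurement environment
corresponds to a control statement of the following general form; the database and table
space names are placeholders, and SHRLEVEL CHANGE assumes that a mapping table is
available or can be created automatically. With UTILS_USE_ZSORT=YES in effect, such a job can
use Z Sort for the VLR data sort when the hardware and DFSORT support are present:
REORG TABLESPACE DBVLR01.TSVLR01
  SORTDATA
  SHRLEVEL CHANGE
  KEEPDICTIONARY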
The following two messages in the job outputs are helpful for determining Z Sort status:
ICE267I indicates whether Z Sort is used.
ICE399I indicates how much memory is used.
3.6.4 Conclusion
When you are using IBM z15 and later, Z Sort for DFSORT is a good feature to enable and
test. Currently, only the Db2 REORG TABLESPACE utility with VLR uses it, but Db2 might expand
the usage to REORG TABLESPACE with FLR or other utilities, such as LOAD, in the future.
Even when the general requirements for SQL sorts to use SORTL are met, not all sorts benefit
from using SORTL. An individual sort operation can leverage SORTL when the following
conditions are met:
It must be part of a Db2 12 plan or package.
The sort key size is <= 136 bytes, and the data size is <= 256 bytes. These sizes are not
“hard rule” numbers. They can be adjusted to be larger depending on the setting of the
SRTPOOL subsystem parameter and when the number of sort records is known.
The sort key size is equal to the data size (all the data columns are key columns with no
variable length fields).
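As a hypothetical illustration of the last condition, a query such as the following, where every
selected column is also a fixed-length sort key column, keeps the sort key size equal to the
data size and is therefore the kind of sort that can be considered for SORTL (the table and
column names are invented for this example):
SELECT ORDER_ID, ORDER_DATE
FROM ORDERS
ORDER BY ORDER_ID, ORDER_DATE;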
3.7.1 Requirements
For any SQL sort to be considered for Z Sort usage, the system must meet the following
requirements:
IBM z15 and later
Db2 12 at FL 100 or later
Db2 APARs PH31684 and PH37239
Results
The performance measurements showed an average 8% (0 - 30%) class 2 CPU time
reduction and average 0.3% class 2 elapsed time reduction with qualified sort-intensive
queries. The measurements show that the percentage of the performance improvement is
affected by the following factors, which are ordered by the significance of their impact on the
query performance:
1. The sort pool size, which is determined by the SRTPOOL subsystem parameter
2. The number of rows sorted
3. The sort key size
Generally, for a qualified query with the same number of rows sorted and the same sort key
size and sort data size, the larger the sort pool size, the greater the reduction in CPU and
elapsed time. With the default SRTPOOL setting (10000KB) in Db2 12, the more rows sorted, the
more performance savings. The smaller the sort key size is, the more performance savings.
However, with an SRTPOOL size of 128000, the impact of the number of rows and sort key size
is not as obvious as with smaller SRTPOOL sizes.
Table 3-9 and Table 3-10 on page 73 provide more details about the performance
improvements when SORTL is used for different SRTPOOL sizes, which varies the number of
rows that are sorted, and the sort key length.
Table 3-9 Db2 class 2 CPU time improvement (%) for SRTPOOL size 10 M
Number of rows sorted Sort key size less than 40 bytes Sort key size more than 40 bytes
10 Up to 13% Up to 8%
100 Up to 8% Up to 7%
10 Up to 30% Up to 24%
100000 Up to 19% Up to 6%
Figure 3-25 OMPE Db2 accounting statistics report for SORTL usage
However, when this storage is freed, threads that use SORTL can incur extra ssnmMSTR SRB
or ssnmDIST SRB system CPU time from thread deallocation on systems that specify the
default setting of AUTO (or ON) for the REALSTORAGE_MANAGEMENT subsystem parameter. Because
the storage discard processing happens only at thread deallocation time, this situation can be
alleviated by optimal thread reuse. In addition, ATB storage management was enhanced in
Db2 13, which eliminates the need for the REALSTORAGE_MANAGEMENT subsystem parameter.
SORTL usage can help improve performance for qualified queries. One of the primary
requirements for using Z Sort is providing enough virtual, real, and auxiliary storage. To
provide the most benefit from SORTL usage, the default SRTPOOL subsystem parameter setting
for Db2 13 is changed from 10000KB to 20000KB.
Good candidates for LOB compression are LOB columns that store data in formats such as
DOCX, TXT, XML, and HTML. For binary files with a low compression ratio, Db2 recognizes
that no benefit is gained from compression, and it does not compress the data.
You can use the DSN1COMP stand-alone utility to estimate the compression ratio and the
number of data pages that are saved before the compression of the LOBs is implemented.
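A minimal JCL sketch for such an estimate follows. The load library and table space data set
names are placeholders, and the LOB keyword tells DSN1COMP that SYSUT1 points to a LOB table
space; check the DSN1COMP documentation for the exact data set naming and option
requirements in your environment:
//ESTLOB   EXEC PGM=DSN1COMP,PARM='LOB'
//STEPLIB  DD DISP=SHR,DSN=DB2A.SDSNLOAD
//SYSPRINT DD SYSOUT=*
//* SYSUT1 must point to the underlying VSAM data set of the LOB table space.
//SYSUT1   DD DISP=SHR,DSN=DB2CAT.DSNDBC.MYLOBDB.MYLOBTS.I0001.A001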
For insert operations, LOB compression works up to 70% better on IBM z15 for most common
file types. With the mixed data files scenario, the insert operations perform 70% better on
IBM z15 hardware than they do on IBM z14 with zEDC Express cards. For select operations,
IBM z15 hardware offers 27% better performance in decompression time for mixed file type
scenarios. With low compression ratio LOBs or TIF files, there is no improvement on
decompression.
Compression time reduction helps to improve LOB compression throughput on IBM z15.
Figure 3-27 shows the results from the measurements of the workload with 10, 20, and 40
concurrent users that perform insert operations with the mixed data files scenario. It shows
the trend that you can achieve better throughput with IBM z15 hardware.
Figure 3-27 Throughput of LOB compression on IBM z14 and IBM z15
When 20 concurrent threads are used to perform insert operations, IBM z15 achieves a
throughput of 1418 MBps, but IBM z14 achieves only 1143 MBps. Overall, the Integrated
Accelerator on IBM z15 offers 21% - 39% more throughput in MBps than the zEDC on
IBM z14.
Conversely, the INPUT/OUTPUT PROCESSORS section of the RMF I/O QUEUING ACTIVITY
report shows the activity for the Integrated Accelerator for zEDC. Figure 3-29 shows that on
IBM z15, the compression busy % is 12%, but there is no queue. Also, the compression chip is
only 0.92% busy, so the Integrated Accelerator for zEDC can handle more compression work
than the zEDC card on IBM z14.
Figure 3-29 RMF report for the Integrated Accelerator for zEDC on IBM z15
3.8.2 Conclusion
The Integrated Accelerator for zEDC on IBM z15 improves the compression ratio across
well-known file types such as TXT, PDF, JPG, PNG, PPT, and TIF. It can achieve up to 70%
faster compression and decompression operations in some cases. Overall, compression time
is reduced by 56% - 77% for insert operations, and decompression time by 13.5% - 75% for
select operations on LOB data. The Integrated Accelerator for zEDC also reduces CPU cost
and can handle much more compression work than the zEDC PCIe card on IBM z14
hardware.
In general, when you can achieve a good compression ratio (better disk space saving), you
tend to observe both a CPU and elapsed time reduction by compressing the LOB. However,
with low compression ratio files, you might experience some regression for both elapsed and
CPU time when you enable LOB compression.
Before this feature, both Write And Register (WAR) and Write And Register Multiple (WARM)
requests ran synchronously, which means that Db2 waits on the command from the time that it
is issued until it completes. The operation includes the data being written to the GBP and XI
signals traveling to the rest of the data-sharing members to invalidate local buffers. Only when
both operations are complete can the Db2 member that issued the command proceed to its
next task.
For a sysplex over distance, when the CFs and LPARs are separated by long distance (order
of kilometers), these commands can take a while to complete due to the XI signals traveling a
long distance.
With the asynchronous cross-invalidation feature, Db2 gets control back as soon as the WAR
or WARM command is issued. To avoid data inconsistency, Db2 issues a sync-up call to
check whether the commands completed in the meantime. Only the last token must be
checked. If the command is not yet completed when it is checked, Db2 is suspended until it
completes. This behavior allows multiple asynchronous WAR or WARM commands to run
simultaneously, which results in performance improvements.
3.9.1 Requirements
To leverage the asynchronous cross-invalidation feature, you must be using the following
environment:
IBM z14.
GBP allocated in a CF on IBM z14 and CF Level 23 or later
z/OS 2.3 or later (or z/OS 2.2 with APAR OA54688)
Db2 12 at FL 100 or later
Db2 12 APAR PH10577
Different combinations of these factors were tested by using batch jobs to determine the
threshold in which asynchronous processing of the XI request outperforms a synchronous
call. Update commit suspend time is the criteria that we used to determine the threshold.
Under that threshold, synchronous calls show shorter suspend times; however, above the
threshold, asynchronous calls show shorter suspend times, which provide a performance
benefit.
Table 3-13 on page 81 shows the update commit suspend times for two different distances
between the CF and LPARs (local and 10 km apart) from a test that uses a 4 K page size.
The data shows that for a 4 K page size, when the number of pages in a WARM command is
below the internal threshold of X pages, Db2 issues the WARM command in synchronous
fashion, which is the same behavior as without asynchronous cross invalidation. If the number
is X or greater, Db2 issues the WARM command asynchronously regardless of the distance
between CF and MVS LPARs, which results in better performance.
We repeated this exercise for 8 K, 16 K, and 32 K page sizes to determine the threshold value
for the respective page sizes. Unfortunately, no threshold value can be found for a 32 K page
size. So, Db2 does not allow the use of asynchronous XI for cross-invalidation of 32 K pages.
This limitation is documented in APAR PH10577.
Table 3-14 illustrates the performance improvement of using asynchronous XI when CF and
LPARs are a minimal distance apart. These results showcase batch updating eight 4 K pages
in a commit scope.
Table 3-15 illustrates the performance improvement of using asynchronous XI when CF and
LPARs are 10 km apart. These results showcase batch updating eight 4 K pages in a commit
scope. The same batch jobs that were used for the test in Table 3-14 are used in this test.
Fields QBGLWX, QBGLSU, and QBGLAS were added to the Db2 statistics trace record IFCID 2 to
make this information available for regular subsystem performance monitoring.
3.9.5 Conclusion
The asynchronous cross-invalidation feature addresses the elongated CF response time in a
sysplex where a long distance is involved. By invoking the asynchronous cross-invalidation
feature, when it is appropriate, update commit suspend time is reduced, which results in
improved elapsed time without sacrificing CPU time. The threshold to trigger the
asynchronous cross-invalidation feature depends on the page size. Through a series of
experiments, a threshold was found for 4 K, 8 K, and 16 K page sizes, but not the 32 K page
size. The internally designated threshold values are transparent to users, and 32 K pages are
not eligible for asynchronous cross-invalidation calls. When the number of changed pages
that is written to GBP is below the threshold value, Db2 continues to issue calls that use
synchronous XI, so no regression occurs.
The sync-up call that follows an asynchronous cross-invalidation call drives more requests to
the GBP. That cost is reflected in Db2 MSTR SRB time, which is not zIIP eligible. The cost of
the sync-up call might be noticeable with a remote CF.
Overall, the goal of asynchronous cross-invalidation, which is improving batch update elapsed
time by sending “write and cross-invalidation” asynchronously, is achieved.
When I/O cannot be avoided for response time-critical applications, Db2 administrators find it
hard to achieve the wanted transaction response times due to high I/O latency. In such cases,
transaction elapsed time can be drastically reduced by Db2 leveraging
IBM zHyperLink technology for I/O without changing the application. Db2 leverages
zHyperLink for database synchronous random read I/O and active log force write I/O, in which
Db2 is suspended until I/O completion.
During zHyperLink I/O, the CPU spins while waiting for the I/O to complete. The CPU spin is a
fixed clock time. In IBM zSystems hardware with higher clock speed processors, that fixed
spin time corresponds to a higher number of CPU cycles. As a result, the
drastically reduced I/O response time for zHyperLink comes with higher CPU cost for
IBM zSystems with faster processors. However, in a busy system where traditional I/O is
impacted by delays that are associated with asynchronous I/O, the CPU increase with
zHyperLink might be acceptable for achieving substantially reduced transaction elapsed time
for critical applications.
Db2 leverages zHyperLink technology for both database synchronous random read I/O and
active log force write I/O.
To identify all hardware and software requirements, see “zHyperLink Prerequisites” in Getting
Started with IBM zHyperLink for z/OS, REDP-5493. PTFs for zHyperLink can be found by
searching for the IBM.Function.zHyperLink fix category with APAR keyword HYPERL/K.
In order for Db2 systems to successfully run zHyperLink I/O, zHyperLink must be enabled at
the z/OS LPAR level, the storage class level, and the Db2 level.
Note: The default behavior is not to use ZHYPERLINK for the SMS storage class.
To enable zHyperLink for read I/O operations, set zHyperLink Eligible for Read to YES.
To enable zHyperLink for write I/O operations, set zHyperLink Eligible for Write to YES.
Db2 level
Use the Db2 ZHYPERLINK subsystem parameter to enable zHyperLink at the Db2 level. The
new ZHYPERLINK subsystem parameter can be updated online. The default setting is DISABLE.
Set ZHYPERLINK to one of the following three options, which control the scope of the
zHyperLink protocol for I/O requests:
ZHYPERLINK=DATABASE (enables zHyperLink for database read only)
ZHYPERLINK=ACTIVELOG (enables zHyperLink for Active Log Write I/O only)
ZHYPERLINK=ENABLE (enables zHyperLink for both database read I/O and Active Log Write
I/O)
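Because ZHYPERLINK can be updated online, a typical sequence is to update and reassemble
the subsystem parameter module with the new ZHYPERLINK value and then load it without
recycling Db2. The module name in this sketch is the default DSNZPARM; your installation might
use a different name:
-SET SYSPARM LOAD(DSNZPARM)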
During zHyperLink I/O, the CPU spins while waiting for the I/O to complete. zHyperLink can
reduce the time that is required to complete the I/O because the major processing
components for standard I/O operations, such as the dispatching, interrupt handling, and
CPU queue time, are no longer necessary, and Db2 code is not evicted from the processor
cache by other activities in the system. Consequently, zHyperLink read I/O time is included as
part of Db2 class 2 CPU time (SQL processing time), which is similar to synchronous CF
requests, such as GBP reads, where the CPU time for CF access is charged to Db2 class 2
CPU time.
To estimate the benefit of zHyperLink, it is important to understand the disk cache hit rate.
Db2 and DFSMS added the additional instrumentation to indicate whether a disk cache hit
occurred or not.
The new fields are also available in the SMF42 type 5 record. APAR PI87082 must be
installed in Db2 12 or APAR PI99235 in Db2 11 to identify eligible candidates for synchronous
reads. The DFSMS APARs that are listed in Table 3-16 should also be installed for the
zHyperLink SMF42 counters to take effect.
Table 3-16 DFSMS APARs that are required for zHyperLink SMF42 counters
The IBM zSystems Batch Network Analyzer (zBNA) tool can be used to identify eligible
synchronous read candidates for zHyperLink. This tool can be downloaded from IBM Z Batch
Network Analyzer (zBNA) Tool. The tool provides a top candidate list for zHyperLink and
estimates the benefit of zHyperLink I/O. Before SMF42 type 6 records for zBNA zHyperLink
eligibility analysis can be collected, the new Db2 ZHYPERLINK subsystem parameter must be
set to DISABLE or the zHyperLink feature must be disabled at the input/output supervisor (IOS)
level. The default setting of the ZHYPERLINK subsystem parameter in Db2 is DISABLE.
Figure 3-32 zBNA top data set candidates for zHyperLink read I/O
An example of the zBNA Data Filtering Capability report is shown in Figure 3-33.
Figure 3-33 zBNA total I/O rate and zHyperLink eligible I/O rate
3.10.3 Requirements
To leverage zHyperLink for Db2 database read I/O, you must be using the following
environment:
IBM z14 or later
IBM DS8880 or later storage subsystem
z/OS 2.2 or later (You can find the PTFs for zHyperLink by searching for Fix Category
IBM.Function.zHyperLink with APAR keyword HYPERL/K.)
Db2 11 (with APAR PH01123) or later
Measurement environment
Performance measurements for zHyperLink reads were performed by using the following
environment:
IBM z14 M05-3907-7E7 and z14 zR1 connected to IBM DS8886
z/OS 2.2 with the necessary maintenance
Results
The first set of measurements illustrates the benefit of reduced transaction elapsed time with
zHyperLink enabled for database read I/O, and shows that the more database I/Os that are
DASD cache hits, the larger the reduction in elapsed time.
Figure 3-35 compares the average transaction elapsed time for the same workload, one with
zHyperLink enabled for database read I/O and the other with standard I/Os that use
high-performance FICON. The total size of buffer pools is set as 10 GB, and there are an
average of 28 random database read I/Os per transaction.
Because the database read I/O wait time is reduced dramatically with zHyperLink, the total
elapsed time is reduced by half.
Figure 3-35 Average elapsed time with a 10 GB bpool with zHyperLink read
Again, due to the reduction of the database read I/O wait time, the transaction elapsed time is
reduced by 23% with zHyperLink, as shown in Figure 3-36. The benefit is less than the 10 GB
buffer pool test because there are fewer database I/Os due to the higher buffer pool hit ratio
with larger buffer pools.
Figure 3-36 Average elapsed time with a 70 GB bpool with zHyperLink read
zHyperLink I/O latency is a fixed wall clock time. Although the I/O latency benefit is the same, the
zHyperLink CPU cost depends on the following factors:
The speed of processors: The faster the processors, the faster they spin. Therefore,
waiting for I/O completion becomes more expensive.
The path reduction and processor cache benefit: Path length savings are a result of
eliminating I/O interrupt processing and dispatching delay, and not missing the processor
cache during I/O operation. Although the path length savings are consistent, the benefit
from cache miss depends on the relative nesting intensity (RNI) of your workloads (see
LSPR workload categories.
Figure 3-37 CPU cost of zHyperLink reads relative to the processor speed
The next set of measurements demonstrates CPU usage when zHyperLink is used for
database read I/O.
With 70 GB buffer pools, we observed a 5% CPU increase at the LPAR level and a 13.8%
CPU increase with a 10 GB buffer pool size. Most of the cost is in Db2 class 2 time. Because
there are more cache hits in the Db2 buffer pool and fewer zHyperLink I/Os with 70 GB buffer
pools compared to 10 GB buffer pools, we observe less elapsed time benefit and less CPU
impact.
The measurements in Figure 3-38 on page 93 were done on the LPAR where we ran only the
Db2 transaction workload.
Although the measurement results in Figure 3-38 are clean, lab-controlled results, the next
set of measurements are likely more realistic and better reflect real-world Db2 application
usage. For the measurements that are shown in Figure 3-39, we ran several background Db2
applications, such as Db2 batch queries, in addition to the online transaction workload.
The last set of measurements demonstrates the impact of processor speed on CPU usage.
These measurements were taken by using IBM z14 Model ZR1 processors with a lower CPU
cycle speed and by using similar workloads but at a smaller transaction rate because the
processor speed is reduced compared to a full-speed IBM z14 system. Interestingly, as we
lower the processor speed from ZR1 Z06 to ZR1 I06, the CPU impact of zHyperLink shifts
from negative to positive, as shown in Figure 3-40. This result clearly illustrates that the CPU
cost and savings cross over at some point, and that the cost depends on many factors.
Summary of results
These sets of measurements that use the IBM brokerage transaction workload demonstrate
that the average transaction response time can be reduced up to 50% when zHyperLink is
used for database random read I/O compared to using zHPF. The elapsed time benefit
depends on the number of database random read I/Os that are eligible for zHyperLink. The
CPU impact, shown as ITR loss, depends on the processor speed and might not be
substantial; with a slow processor speed, the ITR can even improve by up to 6.4%. The CPU
impact also can be less when other background work is running on the LPAR.
Figure 3-41 DISPLAY BPOOL with zHyperLink I/O at the buffer pool level
Figure 3-42 DISPLAY BPOOL with zHyperLink I/O at the page set level
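The counters that are shown in Figure 3-41 and Figure 3-42 are produced by the -DISPLAY
BUFFERPOOL command at the buffer pool level and at the page set level. A minimal sketch,
assuming that BP1 is the buffer pool of interest, follows:

-DISPLAY BUFFERPOOL(BP1) DETAIL
-DISPLAY BUFFERPOOL(BP1) LSTATS(*)

The DETAIL option reports the buffer pool level counters, and the LSTATS option reports the
page set level statistics.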
The Synchronous I/O Device Activity report, which is shown in Figure 3-44, shows
zHyperLink (Synch) I/O per second and average response times. More information about
zHyperLink (synch) I/O activity, such as average transfer rate in megabytes, % request
success, % link busy, % cache miss and % rejects, also is provided.
DASD link level (RMF ESS Synchronous I/O Link Statistics report)
The RMF ESS Synchronous I/O Link Statistics report, which is shown in Figure 3-46, shows
zHyperLink from a Storage System view. It shows the link type and the I/O operations per
second, bytes per operation, response time, and success %.
The Synchronous I/O Link Activity section of the report, which is shown in Figure 3-48,
shows the detailed link activity for each zHyperLink PFID (not all columns are shown).
Figure 3-49 The RMF PCIe: Synchronous I/O Response Time Distribution
The solution that provides the greatest performance improvement is to invest in more
memory for larger buffer pools. However, zHyperLink read support can help if you
need to improve or scale I/O-bound transactions without increasing your memory footprint.
Only the data that is found in the disk control unit cache is eligible for zHyperLink read
I/Os. Therefore, workloads with many synchronous random reads that are cache hits in
the DASD subsystem are generally good candidates for using zHyperLink read. Such
workloads show substantial transaction elapsed time improvements because of the
dramatic reduction in read I/O latency. Enabling zHyperLink if the disk cache hit rate is
less than 75 - 80% provides little to no performance benefit.
zHyperLink limits database random read I/O to a 4 K Control Interval (CI). Therefore, 8 K,
16 K, and 32 K CIs are not eligible for zHyperLink read I/O. The zHyperLink I/O-eligible
reads might include I/Os from a buffer pool other than 4 K. For example, when a
compressed index uses a buffer pool page size greater than 4 K, the page on disk for this
index data set is always 4 K; therefore, the CI size is 4 K. Db2 reads compressed index
pages from disk as 4 K pages and expands them to the buffer pool page size.
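Because only 4 K CIs qualify, a simple catalog query such as the following sketch can help
identify table spaces with a 4 K page size (and therefore a 4 K CI size); the query is
illustrative only and does not cover the compressed index case that is described above:

SELECT DBNAME, NAME, BPOOL, PGSIZE
  FROM SYSIBM.SYSTABLESPACE
  WHERE PGSIZE = 4
  ORDER BY DBNAME, NAME;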
The large reduction in elapsed time when low-latency zHyperLink I/O is used comes with a
CPU cost. Although the zHyperLink I/O latency is a fixed clock time, the CPU cost varies
depending on the processor speed. The faster the processor, the more CPU cycles are spent
spinning during that fixed clock time. The negative CPU impact is smaller on a slower
processor.
In a busy system in which CPU utilization is high (over 85%), the CPU cost of processing
traditional asynchronous I/O might be high. In traditional I/Os, the CPU cost of the interrupt
handling process and dispatching delay, along with the associated Level1/Level2
processor cache misses, increases the code path length. zHyperLink I/O latency is low
and it eliminates the interrupt handling process and dispatching delays. Therefore, in a busy
system, the CPU impact of zHyperLink might not be significant when compared to
traditional asynchronous I/O.
The shorter transaction response time results in an increase in CPU cost for high-end
IBM zSystems. However, the CPU usage increase for the target workload is a reasonable
tradeoff for the tremendous reduction in transaction elapsed time when 95% of the database
reads are cache hits and the LPAR CPU is busy with other background work.
For mid-range systems, such as IBM z14 ZR1, which has subcapacity processors, a
reduction in transaction elapsed time can be attained with minimal to even a positive effect on
CPU usage. zHyperLink I/O is a good option for database random read I/O for mid-range
systems such as ZR1 in which memory expansion is limited and increasing buffer pool size
might not be an option.
The low latency of zHyperLink I/O makes it possible to achieve improved transaction
response time with no changes to user applications.
In 2019, zHyperLink write was introduced only for simplex and synchronous copy (Metro
Mirror).
In 2020, for z/OS 2.3 and later, zHyperLink write support was extended to asynchronous copy
(Global Mirroring but not for Extended Remote Copy (XRC)). For more information about the
zHyperLink write capability for synchronous and asynchronous mirroring, see zHyperLink
Write Support.
Initially, when Db2 leveraged zHyperLink for active log writes, it did not run parallel writes in a
dual active log configuration. In 2021, z/OS DFSMS Media Manager added support that allows
Db2 to write to both active logs in parallel with a single interface call.
The Db2 writes to active logs are processed under an enclave SRB that runs in the Db2
system services (ssnmMSTR) address space. The CPU cost of spinning while waiting for the
active log write I/O to complete is 100% zIIP eligible and does not add to operational cost.
The reduced log write latency with no extra operational cost and no changes to user
applications makes zHyperLink Write I/O a viable feature to improve Db2 transaction
performance.
Measurement environment
Performance measurements for using zHyperLink write I/O for active log writes are run in the
following environments:
Db2 12 at FL 508, running on z/OS 2.4 with necessary maintenance.
Db2 13 at FL 501, running on z/OS 2.5 with necessary maintenance.
IBM z14 M05-3907-7E7 connected to an IBM DS8886 system.
IBM z15 connected to an IBM DS8886 system.
INSERT-intensive workload on Db2 12 FL 508 and IBM z14.
IBM brokerage transaction workload on Db2 13 FL 501 and IBM z15.
None of the measurements use striped active logs unless otherwise noted in the test
scenario.
Results
Workloads that are update-intensive with frequent commits benefit the most from
implementing zHyperLink for Db2 active log writes.
Using zHyperLink I/O to write to dual active logs results in a 54% reduction in class 3 active
log write I/O wait time in Db2. This reduction improved transaction latency by 24% compared
to active log writes that use traditional I/O through zHPF links.
When zHyperLink write was used for dual active logs, we observed a 63% reduction in log
write I/O wait time in Db2, and then a 41% reduction in Db2 latch time. The drastically
reduced active log write zHyperLink I/O time and fewer Db2 latch contentions improved
transaction latency time by 40% when compared to using zHPF for writing to dual active logs.
The IOS component of z/OS limits zHyperLink writes to two tracks. Db2 active log writes can
be up to 512 K (128 4-K records) or up to 12 tracks. Therefore, when the log write size gets
larger, more I/O requests occur when zHyperLink is used. Because the I/O latency is
drastically reduced with zHyperLink, the average transaction latency is still reduced
compared to using a zHPF link.
The next set of measurements shows a reduction in transaction response time despite a large
log write size when zHyperLink is used. The same insert-intensive workload was run after
enabling zHyperLink write for active log data sets with a stripe size of 4 on uncompressed
data to increase the log write size to 128 4-K records. We increased the log write size by
increasing the number of inserts per commit.
Figure 3-54 The active log writes with different commit frequency
Using zHyperLink for active log writes also results in higher CPU usage because the CPU
spins until the I/O operation completes. However, unlike zHyperLink database random read
I/O, active log write is charged to the Db2 System Services (MSTR) address space CPU
under an SRB, and this work is zIIP eligible. zIIP usage does not incur extra charges.
Workloads with a high read I/O percentage might not always show improvement in
transaction rates if the active log write I/O time is not a major factor that is impacting
transaction elapsed time. The next set of measurements demonstrates that for these
workloads, zHyperLink writes did not result in a noticeable transaction elapsed time
improvement, as shown in Figure 3-56.
The results from these measurements demonstrate the benefit of different zHyperLink
enablement options (that is, different ZHYPERLINK subsystem parameter settings) on improving
transaction elapsed time.
When zHyperLink is enabled for active log write I/O only with the ACTIVELOG option, we
observed an approximately 1.9% reduction in elapsed time compared to using zHPF for log
write I/O.
When zHyperLink is enabled for database random read I/O with the DATABASE option, elapsed
time is substantially reduced by 25.9% compared to using zHPF for read I/O.
When zHyperLink is enabled for both active log write I/O and database random read I/O by
using the ENABLE option, we observed a further reduction of 27.4% in elapsed time compared
to using zHPF for both reads and writes.
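For reference, these enablement options correspond to the values of the ZHYPERLINK
subsystem parameter (DISABLE, ACTIVELOG, DATABASE, and ENABLE). A minimal sketch,
assuming that you change the value through your usual DSNTIJUZ subsystem parameter
module job and then reload it online, follows (DATABASE is only an example value):

ZHYPERLINK=DATABASE
-SET SYSPARM RELOAD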
IBM also conducted a performance study that involved the SAP Core Banking Account
Settlement workload with Db2 12 that used zHyperLink for active log write I/O. This
workload simulates account balancing by calculating the interest and charges of all the
accounts in the test database. The Db2 12 database that was used in this study contains a
total of 100 million banking accounts, which is comparable to some of the largest banks in the
world. The result of this evaluation also shows a 20% reduction in total batch elapsed time
when zHyperLink is used for active log write I/O compared to when zHPF link is used for
active log write I/O. For more information, see Evaluation of zHyperLink Write for Db2 Active
Logging with SAP Banking on IBM Z.
During our measurements with striped active log data sets, we increased the log write size by
increasing the number of inserts per commit. We observed that the elapsed time improvement
decreased as the log write size grew, but we still observed a minimum of 20% elapsed time
improvement.
A CPU increase for the Db2 ssnmMSTR address space occurs when zHyperLink is used for
active log writes. This CPU time is fully zIIP eligible, which does not add to operational cost.
No increase in the transaction class 2 CPU time was observed.
The evaluation of the different options of the ZHYPERLINK subsystem parameter with the IBM
brokerage transaction workload illustrates that not all workloads benefit equally from using
zHyperLink. Therefore, before setting ZHYPERLINK, determine where the bottlenecks to
achieving lower transaction latency are in your environment.
When dual active logging is used, both log data sets must be on zHyperLink enabled DASD to
be eligible for zHyperLink write.
zHyperLink writes can write up to two tracks per I/O request due to a z/OS IOS limitation. This
limitation can result in more log I/Os for the same workloads when zHyperLink for active log is
enabled. zHyperLink write support for synchronous mirroring is available on the IBM DS8880
and IBM DS8900. For Metro Mirror environments, the devices on which the Db2 log data sets
are on must be in the duplex state and in a HyperSwap configuration, and zHyperWrite must
be enabled. All IBM DS8880 devices in a Metro Mirror environment must be within
150 meters, and all of them must be connected through zHyperLink. If Metro Mirror (PPRC) is
enabled for the active log data sets, zHyperLink uses zHyperWrite technology to write to the
secondary. When you enable ZHYPERLINK write, z/OS enables zHyperWrite for peer write
regardless of the value that you specify for the zHyperWrite subsystem parameter
(REMOTE_COPY_SW_ACCEL) if the distance between primary and secondary is within 150 meters.
zHyperLink write support for asynchronous mirroring is available only on the DS8900.
CPU spin time while waiting for zHyperLink active log write completion is accounted for in
System Services (MSTR) SRB time, which is zIIP eligible. Therefore, you must plan for extra
zIIP capacity before you enable zHyperLink for active log write I/O.
When implementing zHyperLink for Db2, evaluate your workloads to assess zHyperLink
eligibility and benefits. zBNA is a useful tool for evaluating the eligibility and benefits of
leveraging zHyperLink I/O.
Huffman compression, also referred to as entropy encoding, uses the same Ziv-tree-based
mechanism, but it accounts for the sequence frequency. It uses variable-length codes to
replace sequences: more frequent sequences are replaced with shorter symbols. This
replacement is done by adding a symbol translation stage, during which the fixed-length node
index is mapped to a variable-length symbol that ranges from 1 to 16 bits. By using such a
methodology, more data can be replaced with shorter symbols, which achieves a better
compression ratio.
A better compression ratio also means that the same number of records can be stored in
fewer data pages, which results in less getpage activity during query execution. This
reduction saves elapsed time, especially when large amounts of data, typically brought into
the buffer pool through sequential prefetch, are involved. If you want to reduce the run time of
your queries, you can choose Huffman compression for these objects to reduce the number
of getpages that must be processed.
Starting at Db2 12 FL 509, you can also specify Huffman compression at the individual table
space or partition level by using the COMPRESS YES HUFFMAN option.
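For example, the following hedged sketch enables Huffman compression for an existing table
space and for a single partition; the database, table space, and partition identifiers are
placeholders, and a subsequent REORG is needed before the Huffman dictionary is built:

ALTER TABLESPACE MYDB.MYTS COMPRESS YES HUFFMAN;
ALTER TABLESPACE MYDB.MYTS ALTER PARTITION 1 COMPRESS YES HUFFMAN;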
If any of the prerequisites are not met, a classic dictionary is created, and the data is compressed
by using the classic (fixed-length) algorithm. If the prerequisites are not met but a Huffman
dictionary already exists for the table space or table space partition, the record is stored
uncompressed.
Compression on insert
When the requirements are met, a Huffman dictionary can be built by inserting data into a
table space or partition. After the inserted data volume exceeds a threshold (which is
determined by Db2), a Huffman dictionary is built, and when subsequent data is inserted (or
updated), it is compressed by using this dictionary.
Starting at Db2 12 FL 509, Huffman compression can be specified at the table space or table
part level. If there is existing data in a table space that is compressed with the fixed-length
algorithm (that is, a fixed-length dictionary exists) and you alter the table space to COMPRESS
YES HUFFMAN, subsequent inserts continue to use the fixed-length dictionary to compress rows
until you do a REORG to convert the dictionary to a Huffman dictionary.
Compression by LOAD
Huffman compression can also be used with the LOAD utility. When you load data into an
empty table space or partition without specifying KEEPDICTIONARY, or when you load data with
REPLACE without KEEPDICTIONARY, Db2 builds a compression dictionary while the records are
loaded. After the dictionary is built, subsequent loaded data is compressed by using this
dictionary.
Compression by REORG
The REORG utility can be used to convert a table space or partition that was compressed by
using the classic fixed-length algorithm to Huffman compression, or to convert uncompressed
data to Huffman compressed data. To do so, ensure that the Huffman compression
prerequisites are met, and REORG the table space or partition without specifying
KEEPDICTIONARY.
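A minimal REORG sketch for such a conversion follows (the object names are placeholders);
because KEEPDICTIONARY is not specified, the compression dictionary is rebuilt, and it becomes
a Huffman dictionary when the prerequisites are met:

REORG TABLESPACE MYDB.MYTS
  SHRLEVEL REFERENCE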
Information about compression savings is also available in Db2 catalog tables if the RUNSTATS
utility or inline statistics were run against the table space. The PAGESAVE column in
SYSIBM.SYSTABLEPART contains the percentage of pages that are saved.
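After RUNSTATS or inline statistics are collected, a query similar to the following sketch
(object names are placeholders) shows the percentage of pages that are saved for each
partition:

SELECT DBNAME, TSNAME, PARTITION, PAGESAVE
  FROM SYSIBM.SYSTABLEPART
  WHERE DBNAME = 'MYDB'
    AND TSNAME = 'MYTS'
  ORDER BY PARTITION;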
You can also use the DSN1COMP stand-alone utility to estimate the amount of compression that
you can expect without compressing the table space or partition. For more information, see
3.11.5, “Using DSN1COMP for Huffman compression estimation” on page 125.
Measurement environment
The following system configuration was used to conduct these measurements:
IBM z14 (software decompression on IBM z13) with six general CPs, one zIIP, and 64 GB
of memory
z/OS 2.2
Stand-alone Db2 12 at FL 504
Compression ratio
Because Huffman compression uses a more efficient algorithm, the compression ratio also is
expected to improve when compared to using fixed-length compression. The LOAD and REORG
utilities were used to evaluate the compression ratio of Huffman compression against the set
of workloads.
During our measurements, Huffman compression achieved a 60% - 95% compression ratio,
with an average of 80%.
The results showed no significant difference between using fixed-length compression and
Huffman compression in terms of the overall elapsed time and total CPU time of the entire
workload. We also observed that Huffman compression saved more data pages than
fixed-length compression.
However, individual packages and jobs showed differences. The elapsed time difference
ranges from -73% to 380%, and the CPU time difference ranges from -15% to 101%. Our tests
revealed the following additional findings at the sub-workload level.
Utility workload
The tables were populated by using the LOAD utility. The number of data pages that were
saved for fixed-length compression is 54% and for Huffman compression is 59%. The data in
Table 3-18 shows a slight improvement in total elapsed time and no noticeable total CPU time
difference for the overall utility workload.
For individual utility jobs, we recorded the following observations, which are shown in
Table 3-19 on page 119:
For the COPY utility, we observed an approximately 10% elapsed time improvement and a
9% CPU time improvement. Improvement exists in both COPYR and COPYW phases,
and data getpage activity was reduced by approximately 10%.
For the RECOVER utility, we observed an approximately 12% elapsed time improvement and
a 10% CPU time improvement. Improvement exists in the RESTORER, RESTOREW, and
LOGAPPLY phases.
For other utility jobs such as LOAD, UNLOAD, REBUILD INDEX, and REORG, no significant
difference was observed in either elapsed time or CPU time.
OLTP workload
We observed a slight elapsed time degradation and no noticeable CPU time difference, as
shown in Table 3-20.
Query workload
As shown in Table 3-21, we observed a slight total elapsed time degradation and no
noticeable total CPU time difference. However, compared to fixed-length compression, the
impact of Huffman compression on the elapsed time and CPU time performance of individual
queries is significantly different and reveals a wide range of variation:
Elapsed time changed in the range of -59% to 380%.
CPU time changed in the range of -12% to 30%.
Table 3-22 and Table 3-23 list the query types that were used during this test.
Type A $SEL15: Select column 15, where column 15=xx, 0 rows qualified.
Type B $SELB: Select columns 1, 2, 3, 4, and 5, where column 34=xx, 1000 rows qualified.
$SELG: Select columns 35, 36, 37, 38, and 39, where column 34=xx, 1000 rows qualified.
$SELH: Select columns 1, 2, 3, ... 30, 32, 33, and 34, where column 30=xx, 1000 rows qualified.
The results show that expanding Huffman-compressed data adds approximately 4% - 65%
extra CPU compared to fixed-length compressed data, and it saves approximately -3% to 38%
elapsed time depending on the query type.
Figure 3-62 on page 121 and Figure 3-63 on page 121 compare the CPU cost and elapsed
time that is associated with expansion for different types of queries for fixed-length expansion,
Huffman expansion, and uncompressed data.
Figure 3-63 Db2 class 2 elapsed time of table space scan queries
To expand objects that were compressed by the fixed-length algorithm, Db2 might use a
technique called partial expansion (partial decompression). Partial decompression is effective
for records with many columns because it expands only the necessary part of the record
instead of the entire record.
Db2 does not support partial expansion when the Huffman algorithm is used. Therefore, the
CPU cost for different types of queries varies for fixed-length expansion (depending on the
number of columns that are selected and the predicates evaluated), but it is consistent for
Huffman expansion.
To evaluate expansion cost, we evaluated the same types of queries that are listed in the previous
section. Our measurements show that Huffman expansion on IBM z13 requires 600% - 900%
more CPU time than using fixed-length compression. A query that uses count(*) without
predicates is an exception because the data does not need to be expanded for the evaluation.
In terms of elapsed time, Huffman expansion on IBM z13 requires 200% - 300% more
elapsed time when compared with fixed-length expansion, but it saves approximately 10%
elapsed time compared to uncompressed data.
Figure 3-64 on page 123 and Figure 3-65 on page 123 show the class 2 CPU and elapsed
time respectively for these measurements.
Figure 3-65 Db2 class 2 elapsed time of table space scan on IBM z13
As shown in Figure 3-65, four types of queries were tested. They all use a pure table space
scan (rscan), which is the worst case scenario.
Type A: A 924% increase in CPU and a 325% increase in elapsed time compared to
fixed-length expansion on IBM z13.
Type B: An 858% increase in CPU and a 316% increase in elapsed time compared to
fixed-length expansion on IBM z13.
Performance comparison
For the SAP Account Settlement workload (batch processing), the total batch elapsed time
was shortened by 6%, and the database server ITR improved by 3% for Huffman
compression versus the fixed-length baseline. For the SAP Day Posting workload (OLTP), the
average response times are in the subsecond range and comparable for both encoding
techniques, but the database server ITR was down by 4% for the Huffman compression
versus the fixed-length baseline.
For more information about this measurement, see Performance Evaluation of Huffman
Compression with SAP Banking on IBM Z.
Compression ratio
We observed a better compression ratio when building the dictionary with the REORG utility
than with INSERT processing or the LOAD utility. Consider running the REORG utility to gain the
full benefit of Huffman compression.
Performance of expansion
Our measurements indicate that the performance of expansion highly depends on the data
access pattern and the compression ratio. In a typical OLTP environment with heavy index
access, the cost of the data page decompression tends to be small, and you do not see a
significant difference between Huffman and fixed-length compression.
For most of the queries that we tested, the elapsed time was reduced because fewer getpage
and prefetch requests were needed when processing large amounts of data, such as when
performing a large table space scan. To estimate the impact of converting from fixed-length
compression to Huffman compression, inspect the queries and their access paths in your
applications to determine which of the tested query types they correspond to.
Figure 3-66 DSN1COMP output of compression statistic by using the ALL option
To use DSN1COMP to estimate the efficiency of Huffman compression in Db2 12, you must have
the PTF for APAR PH19242 applied.
By default, DSN1COMP estimates the compression savings that the LOAD utility
would achieve. If you specify the REORG keyword, DSN1COMP estimates the
compression savings that the REORG utility would achieve.
To avoid spending too much time and processor resources on compression estimation,
DSN1COMP provides a ROWLIMIT option to limit the number of rows that are scanned. A sample
of the first 5 - 10 million rows of the table space or partition suffices. You can calculate the
proper ROWLIMIT based on the row length of the table for which you are doing the estimation.
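A hedged JCL sketch of such a DSN1COMP run follows; the load library and the VSAM data set
name of the table space are placeholders, and REORG and ROWLIMIT are the options that are
discussed above:

//ESTCOMP  EXEC PGM=DSN1COMP,
//         PARM='REORG,ROWLIMIT(10000000)'
//STEPLIB  DD DISP=SHR,DSN=prefix.SDSNLOAD
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DISP=SHR,DSN=catname.DSNDBD.MYDB.MYTS.I0001.A001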
Table 3-24 DSN1COMP estimated and actual savings with the LOAD option
The table columns are: table name; original total data size (KB); actual data size after
compression (KB); actual savings (all rows scanned); DSN1COMP estimated savings; actual data
size scanned (KB); and DSN1COMP estimated savings with limited rows (10 million rows
scanned).
These results show that DSN1COMP provides an accurate estimate of the data compression
savings even when the ROWLIMIT option is used.
TB13 is an exception, which might be because of its data distribution and data characteristics.
TB12 is not compressed because the data was encrypted by the application.
3.12 Encrypting Db2 data with z/OS data set encryption and CF
structure encryption
Data security is a core part of the modern IT industry. Db2 for z/OS takes every opportunity to
improve its data protection capabilities. When z/OS DFSMS introduced the pervasive
encryption of data-at-rest feature that is called z/OS data set encryption, Db2 for z/OS was
among the first products to leverage this new feature and offered users the option to
transparently encrypt Db2 data sets. When a data set is encrypted, unauthorized users cannot
view its contents in the clear when it is on disk.
3.12.1 Requirements
Before you can start using DFSMS data set encryption with Db2 related data, your system
must meet the hardware and software requirements.
For more information about requirements and considerations for encrypting Db2 data sets,
see Encrypting your data with z/OS DFSMS data set encryption.
CF structure encryption
To encrypt Db2 GBP and SCA data in the CF, your system must meet the following
requirements:
Hardware requirements:
– IBM zEnterprise EC12 or IBM zEnterprise BC12 or later servers
– Crypto Express3 Coprocessor or later
– Feature 3863 CPACF
z/OS 2.3 and later.
ICSF installed and configured with a CKDS and AES master key loaded.
Db2 version: All supported versions.
To illustrate how to identify the data decryption cost, if you have a table space with a data set
that is encrypted and your workload does synchronous database I/Os against this table
space, then each database I/O requires data decryption when the data is read from disk into
the Db2 local buffer pool. Because data decryption for VSAM data sets is part of the I/O
interrupt handling, the CPU cost is accounted for under the ssnmDBM1 address space. By
looking at the statistics report formatted by OMPE, you are likely to see an increase in “CPU
FOR I/O” time (QWSAIIPT) for the database services (ssnmDBM1) address space in the “CPU
TIMES 2” trace block.
To explain where the data encryption cost goes, take active logs as an example. If your active
logs are created as encrypted data sets, then each log write I/O causes the corresponding log
CIs to be encrypted on writing to the disk. Because the system services (ssnmMSTR)
address space is responsible for log write activities, this data encryption cost is accounted for
as a part of ssnmMSTR PREEMPT SRB time and is zIIP eligible.
The elapsed time and CPU time components that are affected by data set encryption, and the
instrumentation that is available to assess the impact of reading and writing encrypted Db2
data sets and log data sets, are described in more detail next.
Measurement environment
A series of tests was designed to evaluate the overhead of running workloads with various
types of encrypted data sets, including table space data sets, log data sets, utility data sets,
and so on. The data set encryption cost is related to I/O operations, so the actual cost that
you see for your workloads depends on the I/O characteristics of the data sets that you
choose to encrypt, and they also depend on your hardware configuration.
Measurement results
This section presents the measurement results that can help you understand the changes
that you might see in different performance areas after implementing data set encryption.
Figure 3-67 shows the performance measurement numbers that reflect these observations.
By using these measurements and comparing the scenarios in which the table space data
sets were encrypted versus not encrypted, we made the following observations:
OTHER READ time increased by as much as 28%. This increase caused class 2 elapsed
time to increase by approximately 11%.
Most of the decryption CPU cost for sequential prefetch goes to the ssnmDBM1 I/O
interrupt time, and the increase is as large as 13 times. Concurrently, the ssnmDBM1
preemptive SRB time shows a small reduction.
The total Db2 CPU time, including class 2 CPU time and the CPU time of all Db2 address
spaces, increased by approximately 24%.
Figure 3-69 on page 133 shows the performance measurement numbers that reflect these
observations.
From the measurement results, as shown in Figure 3-70 and Figure 3-71 on page 134, we
observed the following facts that are related to the encryption cost with data set encryption for
Db2 user data.
The data encryption CPU cost is registered under ssnmDBM1 SRB time and is zIIP
eligible for both deferred write and cast-out operations.
As we were looking at deferred write performance, our test triggered heavy deferred write
activity. This heavy deferred write activity resulted in long page latch suspension for the
workload. When the data is encrypted, deferred writes must do extra work to encrypt the
data, so longer class 3 suspension time for existing “pressure points” (such as page latch
suspension time in our scenario) should be expected when the underlying data sets are
encrypted.
Encrypting 32 K pages costs more CPU per page than encrypting 4 K pages because
32 K CIs are being encrypted for each write:
– As shown in Table 3-26 on page 130, the 4 K deferred write test writes 6.1 million
pages, and the ssnmDBM1 IIP SRB time increase is 6.905 seconds for data
encryption, which calculates to approximately 1.13 seconds per 1 million pages written.
– The 32 K deferred write test writes 0.8 million pages, and the ssnmDBM1 IIP SRB time
increase is 2.802 seconds for data encryption, which calculates to approximately
3.5 seconds per 1 million pages written.
Figure 3-73 Active log decryption, log offload: MSTR address space CPU
For archive log data set decryption, we designed a large rollback workload that reads 3.5 GB
of archive logs. We first ran an MRI batch workload without committing to fill a 4 GB active log
to 88% full, and then we suspended the program. Next, we issued the -ARCHIVE LOG
command to offload the log data to archive logs and made sure that the data was no longer
available on any active log data set. After that, we canceled the suspended MRI batch job,
which required the rollback to read log data from the archive logs.
The measurement details are shown in Figure 3-74, Figure 3-75, and Figure 3-76 on
page 137.
Figure 3-74 Archive log encryption, log offload: CPU and elapsed
Note: The rollback elapsed time is calculated as the elapsed time between the cancel job
command and job cancel success time.
Summary of results
Let us summarize the knowledge that we gained so far about the encryption and decryption
cost of running Db2 workloads with different types of Db2 data sets encrypted.
Encrypting user and catalog table spaces and indexes:
– Data decryption for table spaces and index spaces happens during the read I/Os, such
as synchronous database I/Os and prefetch I/Os, and the CPU cost is accounted for as
part of the ssnmDBM1 address space’s I/O interrupt time.
– Data encryption for table spaces and index spaces occurs during the write I/Os, such
as deferred writes and GBP cast-out operations, and the data encryption cost is
accounted for as part of ssnmDBM1 PREEMPT SRB time and is zIIP eligible.
Encrypting Db2 logs
– For active logs, writing to encrypted log data sets causes the ssnmMSTR address
space to spend more zIIP eligible PREEMPT SRB time to encrypt the log CIs. When
Db2 must read active logs for offloading, rollbacks, recovery, and so on, the
ssnmMSTR address space decrypts the log CIs, and the CPU cost is accounted for as
part of the I/O interrupt service time of the ssnmMSTR address space.
– For archive logs, encryption happens during the offload process, and decryption
happens when operations such as rollback and recovery must read log records from
(encrypted) archive logs. The archive log encryption cost is charged to MSTR
NONPREEMPT SRB time and is not zIIP eligible. The cost of archive log decryption
also is charged as part of MSTR NONPREEMPT SRB time and is not zIIP eligible.
Measurement environment
The following configuration was used for these measurements:
IBM z14 processors.
Two Crypto6S crypto cards that are configured to the LPAR.
512 GB of LPAR storage.
DS8870 disk controller.
z/OS 2.3.
Db2 12 FL 501 in a one-way data-sharing environment.
Test scenario Z14REONN is the pre-conversion baseline, with no data sets encrypted.
1. Pre-conversion, as a baseline measurement: The REORG utility was run without any data
sets being encrypted.
2. Conversion: Run the REORG utility after configuring the data sets with the proper encryption
configuration (a RACF data set profile was used to assign the encryption key label; a sample
RACF statement is shown after this list). The Db2 log data sets also are encrypted.
3. Post-conversion: With the table space and its indexes using encrypted data sets, measure
the REORG utility performance with all data sets encrypted.
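The RACF definition that is referenced in step 2 is conceptually similar to the following sketch;
the profile name and key label are placeholders, and your installation might assign key labels
differently (for example, through SMS data class definitions):

ALTDSD 'TPCEB0.**' DFP(DATAKEY(DB2.TS.ENCRKEY.01))
SETROPTS GENERIC(DATASET) REFRESH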
The REORG utility tests were run when no other applications were accessing the table space.
The REORG utility was run with following commands (we used the LOG YES option to stress the
log activities):
TEMPLATE UNLDDS UNIT(SYSDA)
DSN(TPCEB0.DSNDBC.TPCEB100.&TS..P&PART.)
SPACE (3000,3000) CYL DISP(NEW,DELETE,CATLG)
LISTDEF REORGLST INCLUDE TABLESPACES
TABLESPACE TPCEB100.TEHHTS PARTLEVEL(1:1000)
REORG TABLESPACE LIST REORGLST UNLDDN UNLDDS
SORTDEVT 3390 SORTNUM 4 LOG YES PARALLEL 40
The REORG utility job was invoked with the STATSLVL(SUBPROCESS) parameter, and we used the
utility statistics and OMPE reports to analyze the performance results.
The utility job uses 2% more class 1 CP CPU. Most of the CPU increase is from the
RELOAD and BUILD phases, during which Db2 must read from the encrypted UNLDDS data
sets. For more information, see Figure 3-77 and Figure 3-78.
ssnmDBM1 shows an increase of 5% in its IIP SRB time for writing data to encrypted table
space and index data sets. For more information, see Figure 3-79 on page 141.
With both active logs and archive logs encrypted, the ssnmMSTR address space shows a
72% increase in total CPU usage. However, 76% of this CPU increase is used on a zIIP
engine. For more information, see Figure 3-79.
We observed the following results when we compared the post-conversion REORG TABLESPACE
utility test (Z14REOEE) to the pre-conversion test (Z14REONN):
The utility job runs approximately 4% longer. From the utility statistics, UNLOAD, RELOAD, and
SORTBLD all had longer elapsed times. For more information, see Figure 3-77 on page 140
and Figure 3-78 on page 140.
The utility job uses 4% more class 1 CP CPU. Most of the CPU increase is from the
RELOAD and BUILD phases, during which Db2 must read from encrypted UNLDDS data sets
and write to encrypted table space and index data sets. For more information, see
Figure 3-77 on page 140 and Figure 3-78 on page 140.
ssnmDBM1 shows an increase of 6% in IIP SRB time for writing data into encrypted table
space and index data sets. For more information, see Figure 3-79.
With both active logs and archive logs encrypted, the ssnmMSTR address space shows a
77% increase in total CPU usage. However, 72% of this CPU increase is used on a zIIP
engine. For more information, see Figure 3-79.
For more information about setting up encrypted CF structures, see Encrypting Coupling
Facility structure data.
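Conceptually, CF structure encryption is controlled in the CFRM policy with the ENCRYPT
keyword, as in the following hedged sketch in which the structure name, sizes, and CF names
are placeholders:

STRUCTURE NAME(DSNDB2G_GBP1)
          INITSIZE(2048M)
          SIZE(4096M)
          PREFLIST(CF01,CF02)
          ENCRYPT(YES)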
Like the z/OS data set encryption feature, the CF encryption feature also is an important part
of the IBM Z pervasive encryption solution. To understand how Db2 workload performance
can change by encrypting GBP structures, we designed and ran many workloads with heavy
GBP CF structure read or write operations. This section describes the results of these tests.
Because SCA CF structure activity rates are usually low, we did not measure the SCA CF
structure’s encryption cost.
Db2 writes to the GBP CF structures when the GBPCACHE option requires that pages of
GBP-dependent objects be cached in the GBP structure; these writes are triggered either by
deferred writes or at transaction commit time.
To evaluate GBP prefetch read performance with CF encryption enabled, we used a full table
space scan workload that reads data pages from the GBP through prefetch.
To evaluate how GBP read with CF encryption enabled performs without prefetch (which
means that the GBP read requests are performed under the application thread), we used a
semi-random select workload that reads one data page per select from the GBP.
The table spaces that we used for these tests used a 4 K page size. Table 3-29 describes the
workload characteristics.
GBP READ for 4 K pages through prefetch: full table space scan with data read from the GBP (rate: 248.7).
GBP READ for 4 K pages under the application thread: semi-random select with data read from the GBP (rate: 1046.5).
The results of the table space scan tests, which involve reading and decrypting the GBP data
from the CF structures through prefetch, are shown in Figure 3-80.
The results of the semi-random select tests, which involve reading and decrypting the GBP
data from the CF structures under the application thread, are also shown in Figure 3-80. In
summary, we made the following observations:
Most of the CF structure decryption cost is charged to the class 2 CPU time of the
application. With every data page read from the GBP CF structure, we observed a 6%
increase in class 2 CPU time.
The Db2 address spaces did not show a notable CPU usage difference.
Table 3-30 on page 145 summarizes where the CPU usage is charged when encrypted GBP
CF structures are used for the most common scenarios.
GBP READs under the application thread: the CPU is charged to the application thread as class 2 CPU time.
We used the IBM brokerage transaction workload to evaluate the performance impact of
pervasive encryption on a real-life complex OLTP workload. This workload provides a better
reference for users with OLTP workloads, which sets more appropriate expectations about the
cost of running Db2 workloads with encrypted data sets and CF structures.
The IBM brokerage transaction workload runs various SQL/PL stored procedures, such as
look up, update, insert, and reporting queries to simulate transactions in a brokerage firm. It
accesses more than 17,000 objects (that is, tables and indexes), totaling 1.3 TB of data.
Measurement environment
The performance measurements were collected by using a two-way data-sharing
configuration. Two types of CPUs, IBM z13 and IBM z14, were used to evaluate the
encryption cost that is associated with each type of CPU. We used two environments to
obtain our measurements:
Environment 1:
– Two IBM z13 LPARs with six dedicated general CPs each with a Crypto5S for secure
key protection. Each LPAR has 512 GB of storage.
– Two IBM z13 ICFs with three dedicated CPs and 256 GB storage each.
– Four ICPs and four coupling over InfiniBand (CIB) links are shared between two LPARs
and two ICFs.
– z/OS 2.2 and CFCC Level 21 for the tests with only data sets encrypted.
– When the data sets are encrypted, this situation is referred to as Transparent Data
Encryption (TDE), or z/OS data set encryption.
– z/OS 2.3 and CFCC Level 21 for the tests with both data sets and CF structures
encrypted.
Measurement results
During the measurement period, transactions run at a rate of approximately 5,000 per second
for the data-sharing group with two members. In terms of the amount of work that is done by
encryption and decryption, it translates into the following metrics:
25,000 4 K pages were decrypted per second. This metric is the result of five sync I/Os
per transaction against encrypted objects with a 4 K page size.
10,000 4 K pages were decrypted per second. This metric is the result of two cast-out
pages per transaction against encrypted objects with a 4 K page size.
7,000 encrypted CIs per second were written to active log data sets by two members of
the data-sharing group.
10,000 CIs per second were decrypted and encrypted through the archive log process by
reading from encrypted active log data sets and writing to encrypted archive log data sets
by two members.
There were no I/Os to the Db2 directory or catalog during the measurement interval.
Therefore, no encryption or decryption work is needed.
Encrypting and decrypting 70,000 4 K pages per second was done by CF encryption and
decryption for all GBPs and SCA structures.
Overall, the rate of encryption and decryption by data set encryption alone for this workload
was approximately 52,000 4 K pages per second (a rate of 122,000 4 K pages per second
when CF encryption was enabled).
In Figure 3-82 on page 147 and Figure 3-83 on page 147, the “Base” test is the measurement
that is run without any data set or CF structures that are encrypted. The “TDE” test is the
measurement with only data sets, including all active and archive logs, and the selected table
space and index data sets are encrypted. The “TDE with CF” test is the measurement that is
done with the same data sets encrypted as the “TDE” test, and with all the GBP and SCA
structures encrypted.
On z14 processors, the encryption cost for TDE, in terms of ITR, is observed at 0.4%, but the
ITR loss is at 2.4% when both TDE and CF encryption are in effect. Again, minimal
differences in transaction elapsed time were observed. Figure 3-83 illustrates the encryption
cost, in terms of ITR loss, for this workload running on an IBM z14 processor.
CF structure encryption is transparent to Db2, and Db2 does not provide extra traces to track
GBP or SCA structure encryption statuses and performance.
The -DISPLAY GROUP, -DISPLAY LOG, and -DISPLAY ARCHIVE commands can be used to display
the encryption key labels that the Db2 data-sharing group, the current active log data sets,
and archive log data sets are using when you are running with Db2 12 at FL 502 and later.
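For example, the following commands report the key label information; the output message
content depends on your Db2 function level and maintenance:

-DISPLAY GROUP
-DISPLAY LOG
-DISPLAY ARCHIVE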
Protecting data comes with a price. Before implementing data set encryption and CF
structure encryption, you can use the no-charge zBNA tool that is provided by IBM to
estimate the cost of encryption based on your current workload characteristics. An IBM
Community article, zBNA for Pervasive Encryption Estimation, describes how the tool is used
to estimate encryption cost for data sets and CF structures. To download the tool, see IBM Z
Batch Network Analyzer (zBNA) Tool.
If your Db2 subsystems have data that is sensitive enough that it needs more protection, with
all the information that is provided in this section, you can start planning to implement data set
encryption and CF structure encryption with a data-set-by-data-set and structure-by-structure
approach.
3.12.8 Conclusion
When running Db2 workloads with encrypted data sets and CF structures, there are some
CPU and elapsed time costs. The CPU cost mostly goes to Db2 address spaces, that is,
ssnmMSTR and ssnmDBM1. Class 2 CPU time usually is not impacted. Class 2 elapsed time
can see some degree of increase because each I/O operation and CF data read/write
operation needs more time to encrypt or decrypt data.
Our in-lab tests show that CPU and elapsed time costs are lower when running with a newer
IBM zSystems processor (IBM z14 or later) that includes a pervasive encryption solution,
rather than on older IBM zSystems processor models (IBM z13 or earlier).
The following workloads were used to evaluate the overall performance of Db2 13 for z/OS
compared to Db2 12 for z/OS:
Performance regression bucket
IBM brokerage transaction workload (version 1)
IBM brokerage transaction workload (version 2): Effect of REBIND (one-way data sharing)
on performance
Distributed IBM Relational Warehouse Workload
High-insert batch workloads
Relational transaction workload (CICS and Db2)
IBM query regression workload
SAP banking workloads
For more information about the workloads that were used, see Appendix A, “IBM Db2
workloads” on page 403.
As illustrated in Figure 4-1 on page 151, the following results were measured at the
sub-workload level:
For DDL, BIND, column, query, XML, and LOB workloads, Db2 13 is almost equivalent to
Db2 12.
For batch workloads, on average Db2 13 has about 2.5% CPU time improvement
compared to Db2 12. Inside the batch sub-workload, the CPU time improvement for
individual batch jobs can be up to 20% (see Figure 4-2). The improvement is mostly due to
the index buffer pool getpage reduction, which results from the index look-aside
optimization for sequential insert, update, and delete processing or the fast index traversal
enhancements, including fast traversal blocks (FTB) enablement for non-unique indexes
by default in Db2 13, and the index key size increase for FTB eligibility. For more
information, see 2.6, “Fast index traversal enhancements” on page 27.
Figure 4-2 CPU time improvement for batch jobs for Db2 12 versus Db2 13
Figure 4-3 Elapsed and CPU time improvement for the utility subworkload for Db2 12 versus Db2 13
A large improvement was seen in the set of REORG utility test cases, as shown in Figure 4-3.
This improvement is due to improvements in REORG INDEX performance, as shown in
Table 4-1. This performance improvement is the result of the NOSYSUT1 behavior for
REORG INDEX being the default through a system parameter. The NOSYSUT1 enhancement
(described in more detail in 9.7, “REORG INDEX NOSYSUT1” on page 303) avoids the use of a
SYSUT1 data set, and it enables utility subtasks to unload and rebuild the index keys in
parallel.
Table 4-1 Elapsed and CPU time improvement for REORG for Db2 12 versus Db2 13
Parameter Elapsed time improvement (%) CPU time improvement (%)
Real storage usage for the DBM1 address space increased by about 400 MB (about 2%) in
Db2 13 compared to Db2 12. The storage increase comes from more FTB usage and the
REORG INDEX NOSYSUT1 performance improvement.
Performance measurements were captured for both Db2 12 FL 510, which serves as the
baseline, and Db2 13 FL 501 to demonstrate performance enhancements.
Table 4-2 Db2 CPU time per transaction enhancement and getpage reduction for two-way data sharing
Db2 version: Db2 12 (FL 510), Db2 12 (FL 510), Db2 13 (FL 501), Db2 13 (FL 501), Delta
Table 4-3 shows the results for the one-way data-sharing measurements.
Table 4-3 Db2 CPU time per transaction enhancement and getpage reduction for one-way data sharing
Db2 version Db2 12 (FL 510) Db2 13 (FL 501) Delta
Table 4-4 Db2 CPU time per transaction enhancement and getpage reduction for non-data sharing
Db2 version Db2 12 FL 510 Db2 13 FL 501 Delta
Table 4-5 shows the effect of the increased number of FTB eligible indexes on the notify
traffic, which results in higher IRLM CPU usage.
Table 4-5 Increased traffic in Notify Message with more FTB eligible objects
Db2 version: Db2 12 (FL 510), Db2 12 (FL 510), Db2 13 (FL 501), Db2 13 (FL 501), Delta
Many internal optimizations and enhancements, which apply to Db2 13 at FL 501 and are not
applicable to Db2 12, contribute to performance improvement. However, fast index traversal,
also known as FTB, is one of the most important contributors to performance
enhancement for this workload in Db2 13 at FL 501. For more information, see 2.6, “Fast
index traversal enhancements” on page 27.
Table 4-2 on page 154, Table 4-3 on page 154, and Table 4-4 show a reduction in getpage
count per commit in Db2 13 at FL 501 compared to Db2 12 FL 510: a reduction of 48.1% for
two-way data sharing and 49.8% for both one-way data sharing and non-data sharing. This
reduction translates into a reduction in transaction class 2 CPU time, which results in a
workload performance improvement of 0.8% for two-way data sharing, 5.0% for one-way data
sharing, and 2.9% in a non-data-sharing environment.
The RMF XCF activity report is monitored to ensure that sysplex signaling remains healthy
even with the increased notify traffic.
The Db2 13 FL 501 DBM1 storage footprint is expected to increase because more index
objects are eligible to use FTB than in Db2 12. Monitoring of this workload determined that
FTB storage usage is about 1.5 GB more per Db2 member in Db2 13 FL 501 than in Db2 12.
Specifically, with Db2 12 FL 510, FTB total memory allocation is 427 MB out of 5070 MB of
DBM1 real storage usage, excluding a virtual buffer pool. With Db2 13 at FL 501, the FTB
total memory allocation is 1900 MB out of 7120 MB of DBM1 real storage usage other than a
virtual buffer pool. The PTF for APAR PH46301, which addresses an excessive DBM1
storage usage issue, is applied in the measurement.
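FTB storage usage of this kind can be checked at any time with the DISPLAY STATS command,
as in the following minimal sketch; the command lists the indexes that currently use FTB
structures and the storage that they consume:

-DISPLAY STATS(INDEXMEMORYUSAGE)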
4.2.3 Conclusion
In summary, migration of the IBM brokerage transaction workload from Db2 12 at FL 510 to
Db2 13 FL 501 in two-way data-sharing, one-way data-sharing, and non-data-sharing
environments, shows performance improvement, as indicated by a decrease in Db2 CPU
time per transaction. DBM1 real storage usage in Db2 13 at FL 501 increased compared to
Db2 12, due to the expanded FTB usage in Db2 13.
4.3.1 Requirements
The following levels of Db2 are required for running this workload:
A Db2 13 FL 100 subsystem
A Db2 12 FL 510 subsystem to compare against
Measurement environment
All performance measurements are performed by using the following environment:
IBM z15 LPAR with eight general CPs
z/OS 2.5
Figure 4-4 CPU improvement for IBM brokerage workload (one-way data sharing)
After migration to Db2 13 FL 100 without running a REBIND of the stored procedure packages,
the CPU time is reduced by 2.6% compared to the Db2 12 baseline test.
Then, a REBIND is run for all packages specifying APREUSE(ERROR) to ensure that the same
access paths are used for all the packages. In this scenario, CPU time is reduced by 2.7%
compared to the Db2 12 baseline.
The next measurement is performed after activating Db2 13 FL 500 and a REBIND by using
APREUSE(ERROR) for all the packages. A further improvement for a Db2 class 2 CPU time of
5.7% was observed compared to the Db2 12 baseline.
The final measurement is performed after activating Db2 13 FL 501 and a REBIND by using
APREUSE(ERROR) for all packages. This measurement resulted in a further slight improvement
of 5.9% in Db2 class 2 CPU time compared to the Db2 12 baseline.
Most of the reduction in Db2 class 2 CPU time for IBM brokerage transaction workload in Db2
13 can be attributed to the reduction in buffer pool getpage from more indexes being eligible
for in-memory fast index traversal.
In Db2 13, the default parameter setting of FTB_NON_UNIQUE_INDEX is changed from NO to YES,
thus making more indexes eligible for in-memory fast index traversal.
Furthermore, in Db2 13 at FL 500, the key size of eligible unique indexes was increased from
64 bytes to 128 bytes, and the key size of eligible non-unique indexes was increased from
56 bytes to 120 bytes. With more indexes eligible for in-memory fast index traversal, fewer
buffer pool getpages are needed for random index access, as shown in Figure 4-5 on
page 159. The figure shows the buffer pool getpages for the IBM brokerage workload
(one-way data sharing) at each Db2 13 migration step.
After migration from Db2 12 to Db2 13 at FL 100, the buffer pool getpages per commit
dropped from 375 to 305. Then, after activating Db2 13 FL 500, the buffer pool getpages per
commit dropped further to 174.
Figure 4-6 shows the total real storage usage by the IBM brokerage workload (one-way data
sharing) at each Db2 13 migration step. With more indexes eligible for in-memory fast index
traversal in Db2 13, we also observed an increase in real storage usage during different
stages of migration, as shown in Figure 4-6.
Figure 4-6 Total real storage usage by the IBM brokerage workload (one-way data sharing)
Then, after Db2 13 FL 500 was activated, total real storage usage increased to about
4,200 MB compared to Db2 12. The further increase in total real storage occurs because
more indexes become eligible for in-memory fast index traversal due to the increased key size
for eligible unique and non-unique indexes.
4.3.4 Conclusion
After migrating to Db2 13 FL 100 without rebinding your packages, you can expect a
reduction in CPU usage if non-unique indexes are eligible to use fast index traversal. Starting
at Db2 13 FL 500, you might see further reduction in CPU as more indexes become eligible
for fast index traversal. As more indexes qualify for fast index traversal, a reduction in buffer
pool getpage occurs, which amounts to less CPU usage. You can expect real storage usage
to grow as more indexes use in-memory fast index traversal.
All workloads are multiple thread applications and consist of seven transactions. Each
transaction runs one or multiple SQL statements to perform business functions in a
predefined mix, as shown in Table 4-6.
Total transaction CPU time per commit and internal throughput rate (ITR) are the two main
metrics that are used for comparing transactions running on Db2 13 against Db2 12.
The values in Table 4-7 represent the percentage difference between the measurements on
Db2 12 FL 510 and Db2 13 FL 501.
Table 4-7 Distributed workloads: Delta CPU% and ITR% with different PKGREL options in effect
Workload PKGREL(COMMIT) PKGREL(BNDOPT)
Note: Unless indicated otherwise, the elapsed and CPU times in the figures in this section
are in seconds.
As shown in Figure 4-7, the Db2 class 2 CPU time results demonstrate that Db2 13 can
improve class 2 CPU time by up to 6.56%:
For PBR table spaces with APN, the CPU time improved by 5.11% when inserting data
into the populated tables.
For PBR table spaces with relative page numbering (RPN), the class 2 CPU time was
equivalent when inserting data into the populated tables.
For PBG tables spaces, the CPU time improved by 6.56% when inserting data into the
populated tables.
Figure 4-7 Archive sequential insert: Class 2 CPU time for PBR+APN, PBR+RPN, and PBG
Figure 4-8 Archive sequential insert: System-level service time for PBR+APN, PBR+RPN, and PBG
The general improvement comes from the index look-aside enhancement, which results in more
than a 90% getpage reduction on index non-leaf pages for all the table space types that were
tested. More improvement is seen for PBR RPN due to the reduction in false contention.
The average class 2 elapsed time results, which are shown in Figure 4-9 on page 165, show
less than a 1% improvement in Db2 13 compared with Db2 12.
The throughput of rows that are inserted is divided by the CPU utilization to obtain the ITR.
Figure 4-10 shows the ITR for the archive sequential insert scenario test run on Db2 13 and
Db2 12 with the same user load and software and hardware configuration. It shows that
Db2 13 performs 4.16%, 5.58%, and 5.46% better on PBR with APN, PBR with RPN, and
PBG respectively.
Figure 4-10 Archive sequential insert: Internal throughput rate for PBR+APN, PBR+RPN, and PBG
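As a simple illustration of this calculation (the numbers are invented for the example and are not from these measurements), the ITR normalizes throughput to a fully busy processor:
ITR = external throughput rate (ETR) / CPU utilization
For example, a run that sustains 10,000 inserted rows per second at 80% CPU utilization has an ITR of 10,000 / 0.80 = 12,500 rows per second, which allows fair comparisons between runs that did not reach the same utilization level.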
The Db2 class 2 CPU time results, which are shown in Figure 4-11, show no significant CPU
time changes for the journal table insert scenario in Db2 13 compared to Db2 12.
Figure 4-11 Journal table insert scenarios: Class 2 CPU time for Db2 12 versus Db2 13
The total CPU service time that was consumed on the sysplex, taken from the RMF workload
activity reports and shown in Figure 4-12 on page 167, demonstrates that Db2 13 can improve
CPU time by 8.99% and 10.49% in workloads 1 and 2 respectively.
A major factor that contributed to the result was an 84.9% reduction of false contention
occurrences for workloads 1 and 2. Another factor was the index look-aside enhancement,
which can result in up to an 88% getpage reduction for index non-leaf pages for workloads 1
and 2. In workloads 3 and 4, Db2 13 performed almost the same as Db2 12. Because fewer
indexes were created, there was less opportunity for index look-aside and false contention
savings.
The measurement results in Figure 4-13 show that an average class 2 elapsed time reduction
of up to 5.1% was observed between Db2 12 and Db2 13.
Figure 4-13 Journal table insert scenarios: Class 2 elapsed time for Db2 12 versus Db2 13
Figure 4-14 Journal table insert scenarios: Internal throughput rate for Db2 12 versus Db2 13
Db2 13 uses IAG1 with the default setting for the DEFAULT_INSERT_ALGORITHM subsystem
parameter. The results in Figure 4-15 and Figure 4-16 on page 169 show the average class 2
elapsed time and CPU time for workloads 4 and 5.
These workloads insert data into three PBR tables with one index for each table. The average
class 2 CPU time was 2.17% more when using IAG2 than when using IAG1 in Db2 13. The
average class 2 elapsed time is 5.52% less when using IAG2 than when using IAG1 in Db2 13.
Although using IAG2 helped to reduce data page latch suspension time, we saw Db2 latch
suspension from index page split activities as the next bottleneck.
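For reference, the insert algorithm can be chosen at the subsystem level through the DEFAULT_INSERT_ALGORITHM subsystem parameter or overridden for an individual MEMBER CLUSTER universal table space in its DDL. The following statement is a sketch with a hypothetical database and table space name:
ALTER TABLESPACE DB1.TS1 INSERT ALGORITHM 2;
A value of 2 requests insert algorithm 2 (IAG2), 1 requests the basic insert algorithm (IAG1), and 0 defers to the DEFAULT_INSERT_ALGORITHM setting.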
Figure 4-17 Message queue insert and delete scenarios: Class 2 CPU time for Db2 12 versus Db2 13
The total CPU service time that was consumed on the sysplex, taken from the RMF workload
activity reports and shown in Figure 4-18, shows that the insert performance was almost the
same in Db2 13 compared to Db2 12.
Figure 4-18 Message queue insert and delete scenarios: System-level service time
The results in Figure 4-19 on page 171 show that less than a 0.5% difference was observed
in class 2 elapsed time between Db2 12 and Db2 13.
The throughput of rows inserted was divided by the CPU utilization to obtain the ITR.
Figure 4-20 shows the ITR for the message queue insert and delete scenario tests that ran on
Db2 13 and Db2 12 with the same user load and software and hardware configuration. Again,
it shows Db2 13 performs almost the same as Db2 12.
Figure 4-20 Message queue insert and delete scenarios: Internal throughput rate
During the test runs, the table is first seeded with 40 million rows. Then, 4 million rows are
randomly inserted. Concurrently, 2.7 million rows are randomly deleted. The inserts are done
concurrently by 180 threads, and the deletes are done concurrently by 120 threads that are
spread across the two Db2 data-sharing members.
The Db2 class 2 CPU time results, as shown in Figure 4-21, show the following results:
For PBR table spaces, a 4.9% class 2 CPU time saving was observed between the Db2
13 and Db2 12 test runs.
For PBG table spaces, an 82% class 2 CPU time saving was observed between the
Db2 13 and Db2 12 test runs. A 28% getpage reduction was observed due to the index
look-aside enhancement on index non-leaf pages. The enhanced insert candidate page
searching algorithm is another contributor. For more information about this enhancement,
see 7.1.2, “Retrying a search of previously failed partitions” on page 242.
Figure 4-21 Random insert and delete scenarios: Class 2 CPU time for PBR and PBG
Figure 4-22 Random insert and delete scenarios: System-level service time for PBR and PBG
The Db2 class 2 elapsed time results, which are shown in Figure 4-23, show the following
results:
For PBR table spaces, no notable change was observed in Db2 13 compared to Db2 12.
These transactions are not CPU bound so no significant increase in throughput rate was
expected.
For PBG table spaces, a 75% class 2 elapsed time saving was observed between the Db2
13 and Db2 12 test runs.
Figure 4-23 Random insert and delete scenarios: Class 2 elapsed time for PBR and PBG
Figure 4-24 Random insert and delete scenarios: Internal throughput rate for PBR and PBG
The CPU time results, as shown in Figure 4-25 on page 175, show a comparable class 2
CPU time for the two-way data-sharing benchmarks in Db2 13 compared to Db2 12.
The total CPU service time that was consumed on the sysplex, taken from the RMF workload
activity reports and shown in Figure 4-26, shows that the overall CPU usage differs by 3.54%
between Db2 13 and Db2 12.
Figure 4-26 Massive simple row insert: System-level service time for Db2 13 versus Db2 12
Figure 4-27 Massive simple row insert: Throughput rate for Db2 13 versus Db2 12
Because this application evenly spreads threads that insert data into different partitions and
because no index is created on the table, there is less opportunity for savings that are related
to false contentions and the index look-aside enhancement, so the CPU usage is close between
Db2 12 and Db2 13. Meanwhile, the external throughput rate from the statistics report in Db2
13 is 1.95% greater than for Db2 12.
The throughput of rows that were inserted was divided by the CPU utilization to obtain the
ITR. Figure 4-28 on page 177 shows the ITR for the massive simple row insert scenario test
runs on Db2 13 and Db2 12 with the same user load and software and hardware
configuration. It shows that Db2 13 performs almost the same when compared with Db2 12.
The environment that was used by the two-way data-sharing RTW workload was as follows:
Two IBM z15 LPARs with eight general CPs and two zIIPs each
Eight CICS regions running CICS TS 5.6 with four regions running on each of the LPARs
z/OS 2.5
DS8870 DASD controller
TPNS running on another IBM z15 LPAR with four general CPs to drive the workload
Two IBM z15 Internal Coupling Facility (ICF) LPARs running CFLEVEL 25 CFCC with
each connected to the z/OS LPARs through two CS5 channels
Results
This section describes the measurement results of the RTW workload for both non-data
sharing and two-way data sharing.
Figure 4-29 shows more details about the RTW non-data-sharing measurements.
Figure 4-29 RTW: Non-data-sharing measurement results for Db2 13 versus Db2 12
Figure 4-30 shows more detail about the two-way data-sharing measurements of the RTW.
Figure 4-30 RTW: Two-way data-sharing measurement results comparing Db2 13 to Db2 12
Summary of results
After migrating from Db2 12 to Db2 13, some performance improvement still occurs for
OLTP-type workloads because more indexes are supported by fast index traversal (FTB). Also,
with the storage management optimization of above-the-bar (ATB) storage, the ssnmMSTR
address space SRB CPU time might show some reduction.
A total of nine Db2 internal query regression workloads were measured to compare Db2 13 to
Db2 12. Those workloads were combined with several sets of queries for complex
transactional processing and real-time analytics in Db2.
The IBM query regression workload was measured by using a non-data-sharing configuration
with Db2 13 at FL 501. Db2 12 at FL 505 measurements were used as the baseline.
Figure 4-31 IBM complex query workload: CPU time improvements Db2 13 versus Db2 12
In Figure 4-31, the Db2 class 2 CPU time for the IBM query workloads in Db2 13 is almost the
same as with Db2 12.
In this performance measurement, there are no access path changes in Db2 13 compared
with Db2 12.
Many Db2 users tend to use APREUSE for their static SQL statements to keep their known
stable access paths during migration to a new Db2 release. Even though Db2 13 introduces
no access path enhancements, this approach is still a best practice to reduce the risk from
the migration.
In Db2 13, more indexes are eligible for in-memory fast index traversal because the default
setting of the FTB_NON_UNIQUE_INDEX subsystem parameter changed from NO to YES and the
index key size limits were increased. Although the IBM complex query workloads that are
described here did not leverage fast index traversal, it is possible to see a reduction in buffer
pool getpages in other (customer) query workloads because more indexes are now eligible for
in-memory fast traversal. This set of query workloads did not leverage FTB because the
objects are restarted at every query to evaluate an accurate impact from any access path
updates.
SAP Day Posting workload is an interactive OLTP workload. The workload simulates users
interactively running 15 dialog steps. The key metrics are the ITR and response time. The
workload has intensive insert, update, delete, and simple SQL statements. The data access
pattern is random.
SAP Account Settlements workload is a batch workload that balances all the accounts in the
database. The key metrics are elapsed time and ITR. The workload has intensive insert,
update, delete, and SQL statements that are simple to complex. The data access pattern is
mostly sequential.
The SAP banking database that is used for these tests consists of 100 million accounts,
which is on a scale comparable to the largest banks in the world.
The CPU utilization for each of the core configurations is shown in Figure 4-33. It shows that
for a core configuration, the CPU utilization difference between Db2 13 and Db2 12 is 0 - 2%.
The SAP Day Posting and SAP Account Settlement workloads run with Db2 13 FL 500 and
Db2 12 FL 510 by using this configuration.
The z/OS LPAR CPU utilization and Db2 class 2 CPU in Table 4-8 shows that the difference
between Db2 13 and Db2 12 is within noise level.
Table 4-8 SAP Day Posting CPU utilization and response time
Item Db2 12 (FL 510) Db2 13 (FL 500)
Figure 4-35 Account Settlement elapsed time and internal throughput rate
The z/OS LPAR CPU utilization and Db2 class 2 CPU time are also similar for the two
versions during these runs, as shown in Table 4-9.
4.8.5 Conclusion
Db2 must perform well with business-critical SAP applications. Running SAP workloads with
Db2 12 and Db2 13 shows that Db2 13 continues to perform well with respect to CPU time,
elapsed time, throughput, and scalability.
To improve the performance and scalability of PBR RPN table spaces for applications that
use RLL, the algorithm for generating the resource hash values for page P-locks was
redesigned in Db2 13. With the new algorithm, the resource hash values are better distributed
with higher levels of uniqueness, and false contentions are minimized.
Important: Applications that do not generate many page P-locks, such as applications that
use PBR RPN table spaces with page-level locking, do not seem to have high false
contention. For these applications, using PBR RPN table spaces does not cause a
performance issue.
5.1.1 Requirements
This enhancement is available for Db2 13 data-sharing groups that are operating at function
level (FL) 500 or later.
For PBR RPN table spaces that are created in Db2 12 and Db2 13 FL 100, the REORG utility
with the SHRLEVEL REFERENCE or CHANGE option, or the LOAD utility with the REPLACE option must
be run to use the new hash values. For more information, see 5.1.3, “Usage considerations”
on page 193.
The workload also ran with a PBR table space that was defined with absolute page
numbering (APN). The results show that completing the same number of updates in Db2 13
by using a PBR RPN table space has similar performance as when running with a PBR APN
table space.
For the Db2 12 measurement, the table space was created in FL 510 and populated at the
same FL. This measurement is the base one, and the PBR RPN table space uses the old
hash algorithm for page P-locks.
For the Db2 13 measurement, the table space was created and populated in FL 500. The
PBR RPN table space uses the new hash algorithm for page P-locks.
To understand how the workload performs with a PBR RPN table space compared to a PBR
table space that used absolute page numbering (APN) in Db2 13, we also ran another
Db2 13 measurement with a PBR APN table space that is created in FL 500 and populated
with the same data.
Results
The measurement results are described in two parts:
The first part covers observations of the workload performance when using a PBR RPN
table space and comparing Db2 13 and Db2 12 measurements.
The second part covers observations comparing the workload performance of Db2 13
measurements when using a PBR RPN table space versus a PBR APN table space with
the same data in the table space.
Figure 5-1 PBR RPN Db2 12 versus 13 random update workload CPU usage of LPAR, Db2, and XCF
We also observed the following reductions in elapsed time and global contention suspension
time, as shown in Figure 5-2 on page 191:
Total class 2 elapsed time of the workload is reduced by 77%.
Total class 3 global contention suspension time, and the number of suspension events, is
reduced by 99%.
All these improvements result from the vast reduction in the number of false contention
suspensions, from 19,610,916 to a mere 412.
An increase of 20% was observed in class 2 CPU usage, as shown in Figure 5-2 on
page 191, and an increase of 98% in ssnmDBM1 IIP SRB (service request block) time was
observed when the workload was run under Db2 13 (Db2 12 with 134.276 seconds and
Db2 13 with 259.319 seconds). However, the class 2 CPU and ssnmDBM1 IIP SRB time
increases are justifiable tradeoffs for the greater reduction in total Db2 CPU usage and for
achieving a 4.7 times increase in the update rate compared to the Db2 12 measurement.
Figure 5-2 PBR RPN Db2 12 versus Db2 13 random update workload key elapsed time metrics
Comparing the performance of the random update workload between using PBR with RPN
versus APN in Db2 13, there is no CPU or elapsed time regression. All the key performance
metrics, such as class 2 CPU time, elapsed time, Db2 address space CPU usage, and update
rates, had differences within a range of -1% to 1%.
Note: The workload also ran by using a Db2 12 data-sharing group with a PBR APN table
space, and the performance is like the Db2 13 PBR APN measurement.
Figure 5-3 Db2 13 PBR APN and RPN random update workload CPU usage of LPAR, Db2, and XCF
Figure 5-4 Db2 13 PBR APN versus RPN random update workload key elapsed time metrics
All new PBR table spaces that are created in Db2 13 with FL 500 and later with PAGENUM
RELATIVE will be formatted to support the new hash algorithm.
Existing PBR table spaces that were created before FL V13R1M500 with PAGENUM RELATIVE
do not support the new hash algorithm automatically. A conversion is needed to format the
table space header pages with the required settings.
After you are at FL V13R1M500 or later, there are two ways to convert an existing PBR
table space or individual partitions to use the new hash algorithm for RPN (sample utility
statements follow this list):
Run the REORG utility with SHRLEVEL REFERENCE or CHANGE.
Run the LOAD utility with the REPLACE option.
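The following utility statements are a sketch with hypothetical database, table space, and partition names; either approach formats the header pages so that the new page P-lock hash algorithm is used:
REORG TABLESPACE DB1.TS1 SHRLEVEL REFERENCE
REORG TABLESPACE DB1.TS1 PART 3 SHRLEVEL CHANGE
Alternatively, a LOAD job that specifies the REPLACE option against the same table space achieves the same result, with the difference that the existing data must be reloaded.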
To see whether you can benefit from this enhancement, complete the following steps:
1. If you have PBR RPN table spaces that were created with Db2 12 or with Db2 13 before
FL 500 in your workloads, check whether they were created with LOCKSIZE ROW (a sample
catalog query follows this list).
2. Check the DATA SHARING section of the IBM OMEGAMON for Db2 Performance Expert
(OMPE) DB2PM accounting reports of the applications that access these table spaces.
3. If you see a high number of P-lock requests, and the number of FALSE CONTENTIONS is
greater than the number of SUSPENDS – IRLM (global internal resource lock manager
(IRLM) contentions), consider converting these PBR RPN table spaces to use the new
page P-lock hash values after you migrate to Db2 13 FL 500. The conversions should
allow your applications to run with fewer false contentions.
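For step 1, a catalog query similar to the following sketch lists candidate table spaces; LOCKRULE = 'R' indicates row-level locking, and PAGENUM = 'R' indicates relative page numbering:
SELECT DBNAME, NAME, LOCKRULE, PAGENUM
  FROM SYSIBM.SYSTABLESPACE
  WHERE PAGENUM = 'R'
    AND LOCKRULE = 'R';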
5.1.4 Conclusion
If you are considering leveraging the PBR table spaces with RPN for their traits, such as a
single partition growing up to 1 TB, or if you are planning to convert your existing
partition-by-growth (PBG) table spaces to PBR, there is no better time to do it than with
Db2 13. After activating FL V13R1M500, you can convert PBG table spaces into PBR RPN
table spaces with minimal application impact, as described in Converting tables from
growth-based to range-based partitions. With the performance improvement for PBR RPN
table spaces that are described in this section, you can expect your workloads to run with
optimum performance by using PBR RPN table spaces.
In Db2 13, the interval at which Db2 evaluates the GBP cast-out threshold (GBPOOLT) has
been reduced to 0.2 of the original value. The purpose of this change is to trigger GBP
cast-out actions more frequently. This change can improve transaction performance by
ensuring that enough storage is available in the GBP through more efficient cast outs.
Another change in Db2 13 that is related to GBP cast out is reducing the time that a
transaction waits after encountering a GBP write failure condition before resuming and
retrying the GBP write. The wait time allows GBP cast-out operations to free enough storage
in the GBP. As GBP cast-out efficiency improves, storage can be freed in the GBP structure
more quickly than before. Reducing wait time (from 1 second in Db2 12 to 100 milliseconds in
Db2 13) after the transaction experiences a GBP write failure helps shorten the transaction’s
update commit time because the transaction can retry the failed GBP write more quickly.
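This behavior requires no configuration change, but you can review and adjust the cast-out thresholds themselves with standard commands. The following sketch uses GBP1 as an illustrative group buffer pool name:
-DISPLAY GROUPBUFFERPOOL(GBP1) GDETAIL
-ALTER GROUPBUFFERPOOL(GBP1) CLASST(5) GBPOOLT(25)
The GBPOOLT value remains the percentage of changed pages in the GBP that triggers cast out; Db2 13 changes only how frequently that threshold is evaluated and how long a transaction waits before retrying a failed GBP write.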
5.2.1 Requirements
This enhancement is available for Db2 13 systems that are operating at FL 100 or later.
Measurement environment
The performance measurements that were conducted used the following setup:
IBM z15 hardware with eight general CPs and two zIIPs on each LPAR
512 GB memory on each LPAR
2 CFs at CF LEVEL 24, each with three dedicated CPUs
z/OS 2.4
Db2 12 FL 510 with 2-way data-sharing
Db2 13 FL 500 with 2-way data-sharing
We tuned the workload so that GBP cast out is triggered frequently in Db2 12.
Results
With the cast-out changes in Db2 13, we see solid improvements of the workload
performance. Comparing the performance numbers of the Db2 13 measurement to the Db2
12 measurement, we see the following key results:
The GBP cast-out threshold is triggered more frequently. In Db2 12, it is triggered
36 times, and in Db2 13, it is triggered 113 times.
The number of pages being cast out for the workload increases by about 2%, which is a slight
increase that was expected. Some pages might have been updated more times in Db2 12
than in Db2 13 before being cast out because cast outs happen less frequently in Db2 12.
Because cast outs are being triggered more frequently and with a similar number of pages
being cast out, the ssnmDBM1 address space shows a 25% IIP SRB time savings
because of the improved cast-out efficiency.
34 GBP write failures were recorded in the Db2 13 test, and 87 were recorded in the
Db2 12 test.
Update commit time for the application is reduced by 28% because of the shorter
transaction suspension time when GBP write failures happen and also because there are
fewer GBP write failures.
Class 2 elapsed time is shortened by 14%, mainly because of the update commit time
improvement.
Class 2 CPU time has a slight increase of 0.7%, which is within the measurement noise
level.
We see 15% more synchronous database I/Os, which is the result of more
SYN.READ(XI)-NO DATA RETURN in the Db2 13 measurement. This increase occurred
because the update rate is higher in Db2 13 and the data in GBP is being replaced at a
quicker pace, so there is a greater chance of data not being returned from the GBP.
Figure 5-5 Random update and select workload major contributors to total Class 2 elapsed time
Figure 5-6 Random update and select workload major contributors to total Db2 CPU time
This section is not related to any of the new features of Db2 13. However, because this
performance information is of interest to customers who are using or are considering using
data sharing over a distance, we decided to add this information.
5.3.1 Requirements
There are no specific requirements for data sharing in a sysplex with distance.
Measurement environment
A sysplex, which is composed of two MVS LPARs and three CFs, is used to run a two-way
Db2 data-sharing workload.
Initially, all these components are in close vicinity. One CF, which is the focus of this study, is
connected to both MVS LPARs through coupling over InfiniBand (CIB) links. The Db2 lock
structure, shared communications area (SCA) structure, and primary GBPs are in this CF.
This configuration serves as the baseline for this exercise.
In our testing, we increased the distance between one MVS LPAR and the CF by completing
the following steps:
1. We replaced CIB links with Long Reach Coupling (CL5) links.
2. We configured CL5 links plus dense wavelength division multiplexing (DWDM) equipment
with zero distance.
3. We configured CL5 links plus DWDM equipment with 25 km distance.
4. We configured CL5 links plus DWDM equipment with a 49 km distance.
All processors, including MVS LPARs and CFs, were IBM z14 processors. The CF link types
that were used during the tests were CIB, CL5, and DWDM with distances of 0, 25, and
49 km.
The IBM brokerage transaction workload was used for this study. It ran in a Db2 12
data-sharing environment. For each of these configurations, a measurement was conducted
to study the performance of this workload to better understand the performance
characteristics of data-sharing workload in a sysplex over a distance.
The CF links were the focus of this study, and the goal was to understand how the CF service
time, which is a critical factor in data-sharing workload performance, varied in these different
CF link configurations.
This configuration was the same as configuration D, but with extra links that were defined.
This configuration was the same as configuration E, but with extra links that were defined.
Measurement results
The CF has three structure types: lock, cache, and list structures. Db2 data sharing employs
all three types: _LOCK1 (lock), _GBPs (cache), and _SCA (list).
From a performance perspective, _LOCK1 and _GBPs are more important than the _SCA
because the _SCA structure is less frequently accessed during normal operation.
The average CF service times to the Db2 lock structure for configurations A - E are shown in
Figure 5-14 on page 203.
The CF service time for _LOCK1 is relatively stable between STLAB1A and ICF0AZE1
because four CIBs are used to connect them and no distance is involved here. However,
between STLAB1D and ICF0AZE1, the CF service time varies from 5 microseconds in
configuration A to 500 microseconds in configuration E, where MVS LPAR STLAB1D is
49 km away from CF ICF0AZE1.
The CF service time of the Db2 list structure is depicted in Figure 5-16.
The CF service time for _SCA is relatively stable between STLAB1A and ICF0AZE1 from
configurations A - E because 4 CIBs are connecting them in every configuration. The service
time between STLAB1D and ICF0AZE1 varies from 40 microseconds in configuration A to the
mid-500s of microseconds in configuration E, where MVS LPAR STLAB1D is 49 km away
from CF ICF0AZE1. Again, during a normal operation, access to the _SCA structure is
infrequent compared to either _LOCK1 or _GBPs. Therefore, a long CF service time does not
significantly impact workload performance.
The overall CF service time for STLAB1A and STLAB1D respectively is shown in Figure 5-17.
As for the CF service time between STLAB1D and ICF0AZE1, it varies from single-digit
microseconds in configuration A to low-500s microseconds in configuration E, where MVS
LPAR STLAB1D is 49 km away from CF ICF0AZE1.
Due to the high CF service times in configurations D and E, the number of CF links was
increased in an attempt to drive down the CF service time for configurations D or E. Therefore,
we created configurations F and G, which are depicted in Figure 5-11 on page 201 and
Figure 5-12 on page 201.
Figure 5-18 and Figure 5-19 show the overall CF service time of configurations D and F and
configurations E and G.
The additional CL5 link between STLAB1D and ICF0AZE1 stays neutral, with no improved or
degraded performance. Therefore, it is not beneficial from a performance perspective to have
two extra CL5 links.
Overall, the CF service times for configurations A - E are reasonable. Every kilometer of
distance adds about 10 microseconds of service time to each request. Therefore, an increase
in CF service time is expected as the distance goes up (the speed of light remains unchanged).
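As an illustrative calculation, this estimate follows from signal propagation in fiber: roughly 5 microseconds per kilometer one way, or about 10 microseconds per kilometer for the request round trip. Applying that rule of thumb to the longest configuration:
49 km x 10 microseconds/km = approximately 490 microseconds of added CF service time
which is consistent with the low-500s of microseconds that were observed for STLAB1D in configuration E.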
Next, we look at how varying CF service times impacts Db2 transaction performance.
The IBM brokerage transaction workload that is running on Db2 12 is used for this study.
Rather than looking at transaction performance at the data-sharing group level, we looked at
the transactions that were run by each Db2 data-sharing member (or by the MVS LPAR
because only one member is running on each LPAR).
Transaction class 2 CPU time for members CEB1 (on STLAB1A) and CEB2 (on STLAB1D) is
shown in Figure 5-20.
Figure 5-20 Class 2 CPU time of members CEB1 and CEB2 for configurations A - E
Transaction class 2 CPU time for configuration A is approximately the same between CEB1
and CEB2, that is, about a 2% difference. This result is expected because both STLAB1A and
STLAB1D are connected to ICF0AZE1 with 4 CIBs.
For configuration B, transactions that run on STLAB1D use about 15% more class 2 CPU
time than the ones on STLAB1A because the CF service time of STLAB1D is nearly twice that
of STLAB1A (see Figure 5-17 on page 204), even though the location (distance) is unchanged.
The fact that requests over CL5 links take longer than requests over CIB links explains this
difference.
For configuration C, having DWDM equipment with no distance in between does not
significantly alter CF service time compared to configuration B, which explains the similarity of
class 2 CPU time between configurations B and C.
For configuration E, with a 49 km distance between STLAB1D and ICF0AZE1, only 0.6% of
the CF requests are processed synchronously (synchronous requests are the ones that tie up
the processor). This situation is similar to configuration D, except that it is more pronounced.
Our general finding is that comparable class 2 CPU times are observed between CEB1 and
CEB2.
Transaction class 2 elapsed time for members CEB1 (on STLAB1A) and CEB2 (on
STLAB1D) is shown in Figure 5-21.
Figure 5-21 Class 2 elapsed time and global contention suspend time of members CEB1 and CEB2
Transaction class 2 elapsed time is approximately the same for configurations A, B, and C, but
for configurations D and E, transaction elapsed time rises sharply for member CEB2, which is
located 25 km and 49 km away from CF ICF0AZE1 respectively. Analysis of the class 3
suspension time indicates that this increase is largely driven by an increase in global
contention suspend time due to the longer CF service time that is associated with distance.
This suspension time comes from CF sync requests being converted to asynchronous, not
from actual resource contention. Therefore, because z/OS converts these requests to
asynchronous, the long CF service time does not impact CPU usage. However, the long CF
service time does affect transaction elapsed time.
Furthermore, global contention is closely related to IRLM processing, so more CPU time is
consumed.
Figure 5-22 IRLM CPU time of members CEB1 and CEB2 for configurations A, B, C, D, and E
It is evident that IRLM CPU consumption is closely related to the CF service time or global
contention suspend time that were observed in this study. IRLM CPU consumption is not zIIP
eligible.
Internal throughput rate (ITR) is relevant to Online Transaction Processing (OLTP) workload
performance. Again, rather than looking at the ITR at the data-sharing group level, this study
looked at the ITR for each Db2 data-sharing member (or MVS LPAR).
Figure 5-23 shows workload ITR on STLAB1A and STLAB1D for configurations A - E.
5.3.4 Conclusion
Measuring the performance of an OLTP workload running on a sysplex in which the LPARs
and CFs are initially in close vicinity but are gradually pulled apart until 49 kilometers separate
one LPAR from the CF that hosts the Db2 _LOCK1, primary _GBPs, and _SCA structures
presented some unique challenges. However, the measurements that we obtained are
valuable for planning configurations in which LPARs that host Db2 members are
geographically distant from the CFs that host the CF structures.
In essence, CF link performance is studied here. CIB and CL5 types, when coupled with the
distances of 0, 25, and 49 kilometers, allow us to have seven configurations, though two of
them did not provide any additional insights from a performance perspective.
The type of CF link and the distance between the LPAR and the CF determine the CF service
time. Short CF service times allow CF requests to be completed synchronously, which slightly
increases CPU time. Longer CF service times drive CF requests asynchronously, which keeps
transaction class 2 CPU time the same. However, the longer CF service times also inevitably
drive up global contention (suspension) time, which elongates Db2 class 2 elapsed time. The
asynchronous conversion of the IRLM lock requests results in more suspend and resume
activity by IRLM, and the consequence is increased IRLM CPU time. MVS (XES) decides
whether to process CF requests synchronously or asynchronously, so Db2 cannot control this
behavior.
In summary, workload ITR is lower on the “remote” site as expected. Transaction class 2 CPU
time is comparable on the remote site with a longer elapsed time, and the remote LPAR
endures higher IRLM CPU time.
After installing the SQL DI GUI, you identify the tables or views that you want to run AI queries
against as AI objects, and then you enable AI query functions on the AI objects. Enabling AI
query on an object trains an unsupervised neural network model, which is called data
embedding, on the object and stores the generated ML model in the same Db2 system where
the AI object is. After the model is created for the object and loaded into Db2, you can run
semantic queries by using SQL DI built-in functions against the object. Currently, the following
three SQL DI built-in functions are supported for semantic queries:
The AI_SIMILARITY function finds similar or dissimilar entities in a table or view.
The AI_SEMANTIC_CLUSTER function checks what other entities exist in a table or view that
could belong to a cluster of up to three entities.
The AI_ANALOGY function determines whether the relationship of a pair of entities applies to
a second pair of entities.
Without any AI skills, you can use these Db2 SQL DI built-in functions in queries to infer
hidden relationships between different entities in a table or view.
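For example, the following query is a sketch only: the CHURN table, the CUSTOMERID column, and the customer identifier '0280' are illustrative names rather than objects from this study, and the exact syntax should be verified in the AI_SIMILARITY documentation. The query lists the five entities whose overall profile is most similar to customer '0280':
SELECT CUSTOMERID,
       AI_SIMILARITY(CUSTOMERID, '0280') AS SIMILARITY_SCORE
  FROM CHURN
  WHERE CUSTOMERID <> '0280'
  ORDER BY SIMILARITY_SCORE DESC
  FETCH FIRST 5 ROWS ONLY;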
Using SQL Data Insights consists of the following two major steps:
Enabling AI query
Running an AI query
During the pre-processing phase, Db2 columns that are selected for the Enable AI Query
process are converted to a textual format. Numeric data type values are clustered, and all
numeric values are replaced with cluster identifiers. In the textual format of data, every
numeric data type value is tagged with its column name and associated cluster identifier, and
every categorical data type value is tagged with its column name.
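As a purely illustrative sketch (the column names, value, and cluster identifier are invented and not from this study), a row that contains AGE = 34 and CITY = 'AUSTIN' might appear in the textual format as follows, where c5 is the cluster identifier that the numeric value 34 was assigned to:
AGE_c5 CITY_AUSTIN
Each value becomes one tagged token, and this tagged text is the input to the model training that is described next.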
During model training, an unsupervised model is trained on this textual representation of the
Db2 data, and numeric vectors are generated for every unique value in the textual format data
set. This set of unique values is referred to as vocabulary in the model. This neural network
model, which is a collection of numeric vectors, is loaded into a Db2 table by using ZLOAD. The
numeric vectors represent inter-column and intra-column semantic relationships between the
different unique values in the AI object.
SQL DI has an embedded Spark cluster that is used for Enable AI Query processing. The
Spark cluster is used during the preprocessing phase while transforming Db2 data to the
textual format. The interface to train the model is provided by the z/OS components IBM Z
Deep Neural Network Library (zDNN), IBM Z AI Optimization (zAIO), and IBM Z AI Data
Embedding Library.
Besides all the Spark processing, which is eligible to run on IBM zSystems Integrated
Information Processors (zIIPs), the numeric clustering during pre-processing and the model
training that do not run on Spark are also eligible to run on IBM zIIPs.
6.1.1 Requirements
To enable AI query processing, the requirements that are described in the following
subsections must be in place.
Hardware requirements
An IBM z16, IBM z15, IBM z14, IBM z13, or IBM zEnterprise EC12 system.
Software requirements
Db2 13 with function level (FL) 500 or later
SQL Data Insights FMID HDBDD18
z/OS 2.5 or 2.4:
– zDNN:
• For z/OS 2.5 with APARs OA62901
• For z/OS 2.4 with APARs OA62849
– IBM Z AI Optimization Library (zAIO):
• For z/OS 2.5 OA62902
• For z/OS 2.4 OA62886
– IBM Z AI Data Embedding Library:
• For z/OS 2.5 OA62903
• For z/OS 2.4 OA62887
In all the performance measurements, the following terms that are referenced in this
document have a specific meaning:
AI object A Db2 table or view that is added as an AI object and on which Enable AI
Query processing is performed.
AI object text data An AI object that is transformed to a textual data format with selected
columns of Enable AI Query processing. This data is the input to
model training.
Number of words in the training text file (AI object text data)
All the values in every categorical and numeric column that are
selected for model training in the AI object.
Vocabulary The total number of unique values in the selected categorical
columns and the number of clusters that are associated with the
selected numeric columns in the AI object for the Enable AI Query
process.
Model (Vector) table The output of model training, which is stored as numeric vectors.
Each row in the table represents a numeric vector. There is a numeric
vector for every word in the vocabulary.
Measurement environment
The SQL DI database DSNAIDB2 is set up in storage group DSNAIDSG with large enough
capacity volumes to store multiple, trained ML models.
Because it is important that the SQL Data Insight processes are assigned the correct
priorities, a separate SQL DI workload was defined in z/OS Workload Manager (WLM) on the
logical partition (LPAR) where the performance measurements ran. The service classes that
were defined in the SQL DI workload include the SQL DI application address space and
Spark address spaces that are used by SQL DI. The service class for the Spark address
spaces was defined with a lower priority than the SQL DI application address space. The SQL
DI application address space was defined with a lower priority than Db2 address spaces.
In all the performance measurement scenarios that are outlined here, all the columns in the
AI object were selected for Enable AI Query processing.
Results
This section describes the results for the four measurement scenarios.
Scenario 1
The first set of measurements ran to enable AI query by increasing the size of the AI object by
expanding the number of rows from 500 K to 101 million and keeping the number of columns
constant (29 columns). The number of unique values (vocabulary) is expected to increase as
the number of rows increases in this set of measurements. An overview of the measurement
results, showing the resource consumption when AI query is enabled on AI objects with the
same number of columns but with a different number of rows, is provided in Table 6-1 on
page 215.
Number of numeric and categorical columns that are selected in the AI object to Enable AI Query: 14 / 15 (the same for all six AI objects in Table 6-1)
As shown in Figure 6-1, when the size of the AI object is increased by adding more rows, the
size of the textual format of the AI object data increases at the same rate. All columns in the AI
object were selected for Enable AI Query processing.
Figure 6-1 The size of AI object textual format data increasing with the size of the AI object
The time to enable AI query and the CPU usage by SQL DI and Db2 also increased at
approximately the same rate as the number of rows, as shown in Figure 6-2 on page 217.
The CPU usage by SQL DI with eight training threads is almost all zIIP eligible.
The maximum memory that was used and extra DASD storage usage by the zFS file system
also increased with the number of rows in the AI object, as shown in Figure 6-3.
Figure 6-3 zFS space usage and maximum memory usage during Enable AI Query processing
Figure 6-4 The size of the model versus the size of the vocabulary
Scenario 2
To analyze the influence of the number of columns in determining the resource requirements
for training, the second set of measurements ran for Enable AI Query processing on Db2
views with the same number of rows (500 K), varying the number of columns 10 - 60. This
measurement demonstrates that as you add more columns for training, the increase in the
number of unique values (vocabulary) of the model might not increase at the same rate. The
increase in vocabulary size is data-dependent. If the additional columns have only a couple of
unique values (for example, YES or NO), the increase in vocabulary size is minuscule. The
columns that were added in this set of measurements to increase the size of the Db2 view do
not have many unique values.
Table 6-2 Resource consumption of Enable AI Query with a different number of columns
Enable AI Query                                                         VIEW_1      VIEW_2      VIEW_3      VIEW_4
Number of words in the training text file (AI object text data)         5,508,074   10,515,414  15,522,754  30,544,774
Size of the model table (extra DASD storage that is used by Db2) (MB)   892.04      892.33      896.07      896.27
Elapsed time to train the model and load the model to Db2 (minutes)     0.85        2.53        5.23        9.23
All Db2 address spaces CPU service time (seconds)                       2.943       3.501       4.654       7.389
SQL DI WLM CPU service time (minutes)                                   6.98        20.31       41.54       73.46
Maximum memory that is used by Enable AI Query (MB)                     6,435       6,748       7,480       12,059
Extra DASD storage that is used by zFS (MB)                             1,008       1,107       1,264       1,650
a. Blue text represents the characteristics of the input to Enable AI Query.
b. Green text represents the characteristics of the preprocessed data.
c. Red text represents the output of the model.
d. Mauve text represents the time, memory, and storage that is used (resource consumption) by Enable AI Query.
As we increased the size of the AI object by increasing the number of columns, the size of the
textual format of the AI object data also increased at the same rate, as shown in Figure 6-5.
During our tests, we used different views with a different number of columns, and we selected
all columns in the AI object for Enable AI Query.
Figure 6-5 Results showing the size of AI object textual format data increasing with the size of the AI
object
Figure 6-6 Elapsed time and the total CPU time breakdown of Enable AI Query processing
Maximum memory usage and zFS storage that is used by SQL DI increased only gradually
as more columns were added to the AI object, as shown in Figure 6-7. This low rate of
increase is because the size of the vocabulary did not increase at the same rate as the
number of columns, as shown in Figure 6-8 on page 221.
Figure 6-7 zFS space usage and maximum memory usage during Enable AI Query processing
Because the size of the vocabulary (unique values) grew only by a few words as we increased
the number of columns, the change to the size of the model table is minimal. The size of the
vocabulary for this set of AI objects increased at a slower pace as more columns were added
because the added columns have few unique values.
Scenario 3
A third set of measurements ran to enable AI query on different sizes of Db2 AI objects with a
varying number of columns and rows. The first AI object is a VIEW with 29 columns and
500 K rows. The second AI object is a different table with eight columns and 1.3 million rows.
This measurement demonstrates that the number of columns or the rows of the input data (AI
object) for training cannot solely be used to determine the system resources that are needed
for Enable AI Query processing. All columns in VIEW_1 and TABLE_1 were selected for
model training.
Based on the results in Table 6-3, you cannot plan for all resource requirements based only
on the number of rows or number of columns in an AI object. VIEW_1 has less than half the
number of rows compared to TABLE_1 but consumed more memory and storage resources.
VIEW_1 has 29 columns and TABLE_1 has eight columns. Having more columns in VIEW_1
generated a larger input AI object text file for model training because in an AI object text file,
every column value is tagged with its column name. A larger vocabulary (unique values)
resulted in a much larger model for VIEW_1. The number of unique values determines the
size of the vocabulary in model training and the size of the model or vector table that is stored
in Db2. However, because the number of rows in the AI object TABLE_1 is greater than the
number of rows in VIEW_1, the total elapsed time and CPU time that was needed to enable
AI query on TABLE_1 is more than the total elapsed time and CPU time that was needed to
enable AI query on VIEW_1.
Table 6-3 Characteristics of different AI objects and the resource consumption with Enable AI Query
Enable AI Query                                                                                     VIEW_1       TABLE_1
Number of numeric and categorical columns that are selected in the AI object for Enable AI Query   14 / 15      1 / 7
Number of words in the training text file (AI object text data)                                    15,022,020   12,347,712
Size of the model table (extra DASD storage that is used by Db2) (MB)                              891.95       262.71
Elapsed time to train the model and load the model to Db2 (minutes)                                3.97         4.65
All Db2 address spaces CPU service time (seconds)                                                  4.997        1.710
Number of numeric and categorical columns that are selected in the AI object for Enable AI Query   1 / 7        1 / 7
Number of words in the training text file (AI object text data)                                    12,347,712   12,347,712
Size of the model table (extra DASD storage that is used by Db2) (MB)                              262.71       262.71
a. Blue text represents the characteristics of the input to Enable AI Query.
b. Green text represents the characteristics of the preprocessed data.
c. Red text represents the output of the model.
The elapsed time and resource consumption for Enable AI Query processing for a single and
two identical tables concurrently are shown in Table 6-5.
Table 6-5 Elapsed time and resource consumption running two Enable AI Query processes concurrently
Enable AI Query                                       Single Enable AI Query   Two concurrent Enable AI Query processes
All Db2 address spaces CPU service time (seconds)     2.177                    3.902
Figure 6-9 The elapsed time of concurrent versus sequential Enable AI Query
Summary of results
The results of our testing demonstrated that as we keep the number of columns constant and
increase the number of rows for an AI object, the size of the textual data grows at the same
rate when all columns are selected to enable AI query. The time to train the model and the
CPU usage by SQL DI and Db2 also increased at approximately the same rate. The CPU
usage by SQL DI is almost all zIIP eligible. Memory usage and extra DASD usage by the
UNIX System Services file system also increased with the number of rows in the AI object.
The size of the model increased as the size of the vocabulary expanded. In one instance, the
size of the vocabulary increased at a slower pace when the number of rows increased from
10 million rows (VIEW_4) to 45 million rows (VIEW_5). We observed that this increase is
because the unique values in the additional rows in VIEW_5 are mostly duplicates of the
unique values in VIEW_4.
As we increased the size of the AI object by adding more columns and keeping the number of
rows constant, the size of the textual data expanded at the same rate when all columns were
selected to enable AI query. The time to train the model and the CPU usage by SQL DI and
Db2 also increased at approximately the same rate. The CPU usage by SQL DI is almost all
zIIP eligible.
The performance evaluation of different Enable AI Query scenarios demonstrates that as the
size of the table or view increases, either by number of columns or rows, the number of words
in the textual file increases at the same rate when all columns are selected for Enable AI
Query. The time to run the Enable AI Query for AI function and the CPU usage increases at
the same rate as the AI object text data size increases. Extra memory usage and DASD
storage usage by zFS are a function of both the AI object text data size and vocabulary size.
The size of the model and extra storage that is needed by Db2 is directly correlated to the size
of vocabulary.
Now, the RMF Workload Activity Report can be used to obtain CPU time and zIIP usage or
eligibility of SQL Data Insights work. For more information about setting up WLM, see
Configuring z/OS workload management for Apache Spark.
The starting time and ending time of Enable AI Query processing is captured in the
SYSAIDB.SYSAITRAININGJOBS table. You can use the following query to extract the start and
end time of Enable AI Query processing and calculate the total elapsed time:
SELECT * FROM SYSAIDB.SYSAITRAININGJOBS
ORDER BY MODEL_ID;
You can identify the MODEL_ID of the AI object by querying the SYSAIDB.SYSAIOBJECTS table.
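As a sketch only, the elapsed time can also be computed directly in SQL. The timestamp column names that are used here are assumptions, so verify the actual column names in SYSAIDB.SYSAITRAININGJOBS before running the query:
SELECT MODEL_ID,
       TIMESTAMPDIFF(4, CHAR(END_TIMESTAMP - START_TIMESTAMP)) AS ELAPSED_MINUTES
  FROM SYSAIDB.SYSAITRAININGJOBS   -- END_TIMESTAMP and START_TIMESTAMP are assumed names
  ORDER BY MODEL_ID;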
You can monitor the logs that capture the breakdown of the elapsed time, the size of the
vocabulary, and memory usage during the model training phase. These logs are in the UNIX
System Services Spark driver directory that is associated with the Enable AI Query run
($SQLDI_HOME/spark/worker). The following three logs contain useful information:
base10cluster-local-timestamp.log contains information about the clusters that are
generated for each numeric column that is selected.
ibm_data2vec-local-timestamp.log contains information about the model training.
stdout contains time completion information of each phase of Enable AI Query. To view
this file, run the following command:
vi -W filecodeset=utf-8 stdout
SQL DI is not required to be configured on the same LPAR where Db2 is. The CPU, memory,
and DASD storage resource consumption can be high depending on the size of the AI object.
Therefore, plan for the SQL DI configuration to be on an LPAR with workloads that are not
mission-critical.
Plan for sufficient zIIP capacity because the high CPU usage by SQL DI is zIIP eligible.
SQL DI can consume all the CPU and memory resources that are available to use. Therefore,
limit the use of these resources for the SQL DI workload in z/OS WLM.
Before you decide to start Enable AI Query processing, complete the following steps:
1. Identify the columns in the AI object that will never be used or columns whose values do
not influence business decisions. By omitting these fields and by using fewer columns,
less CPU and storage resources are used, and model training finishes faster.
2. Identify any columns that should be selected as a Key type by using the Analyze Data –
Data Statistics window of the AI object. If the number of unique values of a column is the
same as the number of rows in the AI object, as a best practice, select this column as a
Key type when enabling an AI query.
3. Modify the settings in the Settings SQL DI window, as shown in Figure 6-10.
6.1.5 Conclusion
Enabling AI query is the first step toward using AI functions in SQL queries.
Although this step is not run frequently, the system resource consumption by this task can be
intensive as the size of the AI object increases.
However, this demand on system resources is less than the alternative option of hiring a team
of data scientists to design ML models and constantly move the data. Also, the CPU that is
consumed by SQL DI is mostly zIIP eligible and cost-effective.
Db2 13 provides the following three built-in scalar functions that can be run against tables or
views that are enabled for AI query to gain hidden insight into the data:
AI_SIMILARITY The AI_SIMILARITY function determines how similar two entities
are within a context (table or view) and returns a similarity
score. The cosine distance between the two numeric vectors
(entities) is calculated to determine similarity (a score closer to
1.0) or dissimilarity (a score closer to -1.0).
AI_SEMANTIC_CLUSTER The AI_SEMANTIC_CLUSTER function can use a cluster of up to
three entities as input and returns a score for other entities in
the context (table or view) that have affinity to this cluster. A
score closer to 1.0 has a stronger affinity to the cluster.
AI_ANALOGY The AI_ANALOGY function can be used to compare relationships
across two pairs of entities. This function infers the relationship
between the first pair, and then assesses the relationship of the
second pair to other entities within the context (table or view)
and returns a score. A higher score indicates a better analogy.
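The following query is a sketch with illustrative names: the CHURN table, the CUSTOMERID column, and the three customer identifiers are not from this study, and the exact argument syntax should be verified in the AI_SEMANTIC_CLUSTER topic that is referenced later in this chapter. The query scores how well every customer fits the cluster that is formed by three known customers:
SELECT CUSTOMERID,
       AI_SEMANTIC_CLUSTER(CUSTOMERID, '0280', '0862', '1169') AS CLUSTER_SCORE
  FROM CHURN
  ORDER BY CLUSTER_SCORE DESC
  FETCH FIRST 10 ROWS ONLY;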
All three SQL DI built-in scalar functions are zIIP eligible through a mechanism that uses a
child task for query CPU parallelism. This parallelism technique is triggered for SQL DI built-in
functions regardless of the parallel degree setting.
6.2.1 Requirements
The requirements that are shown in the following subsections must be in place to run
semantic SQL queries against AI-enabled objects.
Hardware requirements
An IBM z16, IBM z15, IBM z14, IBM z13, or zEnterprise EC12 system.
Software requirements
Db2 13 (5698-DB2 or 5698-DBV) with FL 500 or later
SQL Data Insights FMID HDBDD18
z/OS 2.5 or 2.4:
– zDNN:
• For z/OS 2.5 with APARs OA62901
• For z/OS 2.4 with APARs OA62849
– zAIO:
• For z/OS 2.5 OA62902
• For z/OS 2.4 OA62886
– IBM Z AI Data Embedding Library:
• For z/OS 2.5 OA62903
• For z/OS 2.4 OA62887
IBM OpenBLAS with the following APARs:
– PH44479 (for both z/OS 2.5 and 2.4)
– PH45672 (for z/OS 2.5)
– PH45663 (for z/OS 2.4)
Objects must be enabled for AI query before you can run semantic SQL queries with AI
built-in functions.
Measurement environment
The following environment was used to conduct this semantic SQL AI performance study:
Hardware: IBM z16.
CPUs: Three CPs and three zIIPs.
LPAR memory: 4 TB.
Results
This section describes the measurement results for the four semantic SQL query scenarios.
Scenario 1
Before creating the BiFs in Db2 13, we experimented with UDF SQL DI functions to
understand the performance benefit of using BiFs instead of UDFs. The table that was used
in the semantic query had eight columns with a cardinality of 1.3 million rows, and its trained
model table had 80 K vectors (vocabulary size). All AI semantic SQL queries that were used
in this set of measurements are provided in Appendix B, “Artificial intelligence semantic
queries” on page 409.
Comparing BiFs to UDFs for SQL DI functions, as shown in Figure 6-12 on page 231, we see
that the elapsed time reduction with BiFs is 70% - 89% for the queries that were tested in this
experiment.
Comparing BiFs to UDFs for SQL DI functions shows that the CPU usage reduction with BiFs
is 58% - 86% for the queries that were tested in this experiment, as shown in Figure 6-13.
Figure 6-13 Average CPU time when using BiFs and UDFs for SQL DI functions
The performance evaluation of the queries that were tested demonstrates that the elapsed
time per query by using BiFs can be reduced by up to 89% compared to using UDFs, and the
total class 1 CPU usage can be reduced by up to 86%. Based on these results, we decided to
deliver the functions as BiFs in the Db2 engine.
Scenario 2
AI queries with different complexities ran by using all three supported Db2 SQL DI built-in
functions on the same table to demonstrate zIIP usage when invoking SQL DI functions. The
table that was used by the semantic queries has eight columns with a cardinality of 1.3 million
rows, and its trained model table has 80 K vectors (size of the vocabulary). All AI semantic
SQL queries that were used in this set of measurements are shown in Appendix B, “Artificial
intelligence semantic queries” on page 409.
The performance results of the different queries that ran by using the AI_SIMILARITY function
show that zIIP utilization can be up to 98% based on the number of SQL DI function
invocations in the query and the complexity of the queries that ran. Query T00SIMX has only
one invocation of the SQL DI function in the query, which means that there is less zIIP
eligibility for this query.
The details of the AI_SIMILARITY test results are shown in Figure 6-14.
The performance results of the different queries that ran by using the AI_SEMANTIC_CLUSTER
function (Figure 6-15 on page 233) show that zIIP utilization can be up to 98% based on the
number of entities that are specified for the cluster in the input of the SQL DI function
invocation. The SQL DI CPU usage increases with the number of entities that are specified in
the input cluster.
Query T00SEM1 used one entity in the input cluster, T00SEM2 used two entities in the input
cluster, and T00SEM3 used three entities in the input cluster of the SQL DI function.
The performance results of the two different queries that ran by using the AI_ANALOGY function
show that zIIP utilization can be up to 97% based on the number of SQL DI function
invocations by the query, as shown in Figure 6-16.
Query T01ANAX has only one invocation of the SQL DI function, which means that there is
less zIIP eligibility for the query.
Scenario 3
This set of measurements ran to evaluate the impact on query-elapsed time and CPU usage
as AI semantic queries ran concurrently.
AI semantic queries ran by using all three supported SQL DI built-in functions on the same
table with 1.3 million rows, eight columns, and a model size of 80 K vectors or rows. In these
measurements, each DDF thread ran the same query against the same table concurrently for
the 2-threads and 4-threads tests.
Figure 6-17 Application elapsed time and special engine (zIIP) CPU time per query
The performance results in Figure 6-18 demonstrate that when concurrent AI queries that
invoke the AI_SEMANTIC_CLUSTER function run, the average elapsed time per query also
increases because of waiting for parent and child task synchronization. This increase might
be due to contention for CPU resources. The zIIP usage per query increases while running
concurrent queries. Most of the zIIP CPU usage increase is from running the SQL DI function.
CP CPU time is not shown here because it is not a significant part of the total CPU time for
this query.
Figure 6-18 Application elapsed time and special engine (zIIP) CPU time per query
Figure 6-19 Application elapsed time and special engine (zIIP) CPU time per query
Scenario 4
The last set of performance measurements analyzes the application-elapsed time and CPU
usage as we increase the size of the AI object by adding more rows. AI semantic queries ran
by using all three supported SQL DI built-in functions on six different views with the number of
rows varying from 500 K - 101 million. The evaluation of the model training on these views is
shown in “Scenario 1” on page 214. The AI semantic queries that are used here are provided
in Appendix B, “Artificial intelligence semantic queries” on page 409.
Figure 6-20 Application elapsed time and special engine (zIIP) CPU time per query
The performance results in Figure 6-21 on page 237 show that the elapsed time and zIIP
utilization of AI queries that use the AI_SEMANTIC_CLUSTER function scale upwards when the
number of rows increases until 10 million rows. For 45 million rows, the elapsed time and
CPU usage did not increase because the optimizer chose a different access path by using a
different index for one of the base tables for the view with 45 million rows, which yielded
reduced elapsed time and CPU usage.
The performance results in Figure 6-22 show that the elapsed time and zIIP utilization of an
AI query with the AI_ANALOGY function scales upwards with an increase in the number of rows.
Figure 6-22 Application elapsed time and special engine (zIIP) CPU time per query
Summary of results
The performance evaluation of semantic AI queries that use both UDFs and BiFs for the SQL
DI functions indicated that SQL DI functions should be implemented as BiFs. By using a
built-in function, Db2 can invoke IBM zSystems hardware optimization for processors.
Concurrently running AI queries takes longer to finish than running a single query. The
increase in elapsed time is due to waiting for parent and child task synchronization. This
increase might be due to contention for CPU resources. The zIIP usage per query also
increases slightly while running queries concurrently. Most of the increase in zIIP CPU usage
is from running SQL DI functions.
The elapsed time and CPU usage scale upwards as we increase the number of rows of the AI
object for the queries and test cases that are used here. One exception is the AI query when
using the AI_SEMANTIC_CLUSTER function on the view with 45 million rows. In this case, the
Db2 optimizer used a different access path by using a different index on one of the base
tables, which yielded better elapsed time and zIIP CPU usage. This result might occur
because the view with 45 million rows is defined over all the rows in the base tables.
You can use SQL DI functions in SQL statements in the SELECT clause or as part of the WHERE
clause.
The syntax and restrictions for writing queries by using the new SQL DI scalar functions can
be found in the following topics:
AI_SIMILARITY
AI_SEMANTIC_CLUSTER
AI_ANALOGY
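To illustrate the usage pattern only, the following sketch shows an AI_SIMILARITY invocation in
both the SELECT list and the WHERE clause. The CUSTOMER table, its columns, and the literal
values are hypothetical and assume that AI query is enabled on the table; see the topics above
for the exact syntax, including the USING MODEL COLUMN clause.

-- Hypothetical example: rank customers by semantic similarity to a known customer
SELECT C.CUSTOMER_ID,
       AI_SIMILARITY(C.CUSTOMER_ID,
                     '1066_KINGSTON' USING MODEL COLUMN CUSTOMER_ID) AS SCORE
FROM   CUSTOMER C
WHERE  AI_SIMILARITY(C.CITY,
                     'SAN JOSE' USING MODEL COLUMN CITY) > 0.5
ORDER BY SCORE DESC
FETCH FIRST 10 ROWS ONLY;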
SQL DI functions may run on zIIPs, so they are cost-effective. However, plan for zIIP capacity
to accommodate the CPU resources that are needed for complex queries with multiple
invocations of SQL DI functions or for running multiple AI queries concurrently.
With SQL DI built-in functions, you can run AI semantic SQL queries directly on tables or
views that use the learned information in the vector model to gain insights into the data. You
can make faster business decisions without offloading the data from Db2 and running queries
in a data warehouse.
6.3 Summary
The SQL Data Insights feature, which is available in Db2 13, brings AI capability directly to the
Db2 engine without the need to hire experienced data scientists for designing ML models of
Db2 data.
Enabling AI query on any Db2 table or view triggers the building of a neural network model of
the data in the Db2 object and stores the resulting vector model in the same Db2 subsystem. We
demonstrated that the elapsed time to enable AI query and the resource consumption in
terms of CPU, memory, and disk storage usage can increase as the size of the input data for
model training expands. We also showed that the performance of enabling AI query might
depend on both the number of columns that are selected for enabling AI query and the
cardinality of the input data. The size of the vector model that is stored in Db2 is
data-dependent. The number of unique values in the input data directly correlates to the size
of the model.
Although the system resource consumption by the model training task can be intensive as the
size of the AI object increases, this step is not frequently run, and the CPU that is used during
this phase is mostly zIIP eligible and cost effective.
Semantic AI queries that operate on any Db2 data on which you enabled AI query provide a
deeper understanding of the data to make effective business decisions faster, which can be
achieved in a cost-effective manner because the SQL DI functions are eligible to run on zIIPs.
Additionally, because the data can remain on the Db2 subsystem where it is, the overhead
that is required to move that data is eliminated. All SQL DI built-in functions use zAIO to
choose the optimal hardware during semantic AI query execution.
7.1.1 Requirements
To leverage these improved algorithms, you must be using Db2 13 at function level (FL)
V13R1M100 or later.
Figure 7-1 illustrates this problem. The target partition, partition 3, is full. Db2 continues to
check partition 4 but cannot get the conditional lock, which is commonly caused by a utility
concurrently running on this PBG table space or another application exclusively locking the
partition. When partition 5 is tried, the same thing occurs. Db2 goes back to search
partition 1, and then partition 2, which are both full. Now that Db2 checked all partitions (and
did not manage to insert the row), the INSERT statement fails with SQLCODE -904 with reason
code 00C9009C, which does not give enough information to understand the reason for the
failure. A new partition is not created because the insert operation could not access
partitions 4 and 5, so it lacks the space information that is needed to determine whether
those partitions are full. This behavior avoids adding extra partitions when existing partitions
are not fully used.
Performance measurements
To evaluate this enhancement, a performance study was conducted by using a PBG table
space that is defined with and without the MEMBER CLUSTER clause.
Scenario description
To test the new algorithm, we prepared a PBG table space that is defined with MAXPARTITIONS
120, and with 23 partitions filled with data (DSSIZE 1G). Each partition has a small amount of
space for new inserts. The test run inserts 200,000 rows by using 10 concurrent threads on
each member of a two-way data-sharing group (one row per commit). Concurrent jobs are
used to mimic other applications that are holding locks on partitions randomly. The PBG table
space is defined with page-level locking, APPEND NO, and it uses insert algorithm 1.
Measurement environment
The following system configuration was used to make these measurements:
IBM z14 hardware with two Coupling Facilities (CFs) at CF LEVEL 23, each with three
dedicated CPUs
Logical partition (LPAR) CPU: Eight general CPs
LPAR memory: 48 GB (for each of the two LPARs that are used in this testing)
z/OS 2.4
Db2: Two-way data-sharing group on two LPARs, one running Db2 12 at FL 509 and one
running Db2 13 at FL 500
With the retry logic, Db2 13 can insert 26.5% more rows into the table, and the number of
errors is reduced by 99.9%. The 93 errors are all SQLCODE -913 instead of -904, and each
of them provides detailed lock holder and victim information that you can use to identify the
application that is causing the locking problems.
No obvious extra space is required when the new algorithm is used. Db2 13 tends to insert
more rows into the last partition or newly grown partition.
Figure 7-5 shows the average Db2 class 2 CPU per commit and average class 2 elapsed time
per commit for the test scenario against a PBG table space with MEMBER CLUSTER. Compared
to Db2 12, Db2 13 provides an improvement in CPU time. The CPU reduction comes from a
reduced amount of getpage activity because for a MEMBER CLUSTER table space, the new
algorithm keeps a retry list for a prepared INSERT statement. When the next row is inserted,
rather than searching every part, Db2 tries the partitions in the retry list first. In Db2 12, the
high getpage activity is due to the concurrent lock part jobs, which cause Db2 to keep
searching for a part that is not locked and has space for an insert.
Figure 7-5 Class 2 CPU, elapsed time, and getpage against PBG with MEMBER CLUSTER
Figure 7-6 Total rows that are inserted and error count against the PBG table space without MEMBER
CLUSTER
Testing revealed an 18% savings in class 2 CPU time for the PBG without MEMBER CLUSTER
scenario, although some increase in elapsed time because of longer global contention time
was observed, as shown in Figure 7-7. With the Db2 13 retry logic, Db2 can wait longer to get
a partition lock for some rows. A slight elapsed time increase is acceptable considering the
drastic reduction in the number of insert failures because those failures typically drive
rollbacks and retries in applications, which are much more disruptive.
Figure 7-7 Class 2 CPU, elapsed time, and getpage against PBG without MEMBER CLUSTER
Figure 7-8 shows how such a situation can result from using descending partition search.
Because the search is performed in ascending or descending order randomly, empty
partitions can be present in the middle of the table space when the descending search is
used. For example, suppose that the candidate page is in partition p7, but there is no
available space. Db2 (randomly) performs a descending search, and because partitions p6 -
p1 are full, Db2 wraps around to partition p18. This partition is empty and has free space, so
Db2 inserts the row in p18. Now, the last partition p18 is in use, but previous partitions p13 -
p17 are empty.
Figure 7-8 Ascending and descending search during PBG cross-partition space search
To avoid this situation, Db2 13 enhances the descending search algorithm so that when a
descending search reaches partition 1 and there is no space, instead of going directly to the
last physical partition, Db2 determines the next best target partition to search by referring to
real-time statistics (RTS) and by caching the usage information. This process is shown in
Figure 7-9. When the descending partition search reaches p1, it switches to p12 (if partition
12 has enough space).
The internal tracking information is kept in memory and updated when an insert operation
traverses through existing partitions since the last physical open of the page set. Therefore,
each data-sharing member can have different tracking information based on the workload
activity on each member. When a page set is newly opened, insert performance gradually
improves until Db2 caches the last non-empty partition. With this new algorithm, even if there
are trailing empty partitions, Db2 does not go to the last physical partition, so there are no
empty partitions in the middle of the table space.
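The partition-level usage information that this algorithm relies on is also externalized in the
real-time statistics tables, so you can inspect it yourself. The following query is a sketch with
hypothetical database and table space names; the column list is limited to commonly used RTS
columns.

SELECT PARTITION, TOTALROWS, NACTIVE, SPACE, EXTENTS
FROM   SYSIBM.SYSTABLESPACESTATS
WHERE  DBNAME = 'MYDB'          -- hypothetical database name
  AND  NAME   = 'MYPBGTS'       -- hypothetical PBG table space name
ORDER BY PARTITION;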
Performance measurements
To evaluate this enhancement, a performance study was conducted by using a PBG table
space that was defined with and without the MEMBER CLUSTER keyword.
Then, the test inserted 2 million rows by using 50 concurrent threads on each member of a
two-way data-sharing group. In parallel, 50 delete threads on each member deleted 2 million
existing rows. The table space was defined with page-level
locking, APPEND NO, and insert algorithm 1. Testing was done by using both a MEMBER CLUSTER
PBG table space and a non-MEMBER CLUSTER PBG table space. The initial target page (based
on the clustering index) for the insert workload was mainly in partition 1.
Measurement environment
The following system configuration was used to make these measurements:
IBM z14 hardware with two CFs at CF LEVEL 23, each with two dedicated CPUs
LPAR CPU: Eight general CPs
LPAR memory: 48 GB (for each of the two LPARs)
z/OS 2.4
Db2: Two-way data-sharing group on two LPARs, one running Db2 12 at V12R1M509 and
one running Db2 13 at V13R1M500
Figure 7-11 shows the class 2 CPU and elapsed time per commit of the insert threads and
the table space getpage activity of the test run by using the MEMBER CLUSTER PBG table space.
Note the 14.6% improvement in CPU time, which is mainly because of the reduction in table
space getpage activity. Because concurrent delete activity is occurring, the lock retry strategy
(see 7.1.2, “Retrying a search of previously failed partitions” on page 242) is triggered, and
Db2 retries the parts in the retry list rather than searching each partition every time. The
result is a reduction in the number of getpages. The performance of the insert activity can
vary depending on the nature of the workload, for example, whether there is concurrent
delete activity, how many parts are filled with data, and how full these partitions are. As
always, your results might vary.
Figure 7-11 Class 2 CPU, elapsed time, and getpage against PBG with MEMBER CLUSTER
Figure 7-12 Data distribution across partitions for a PBG without MEMBER CLUSTER
From a performance perspective, the non-MEMBER CLUSTER PBG table space behavior is
different than the MEMBER CLUSTER case. With the non-MEMBER CLUSTER attribute, Db2 must
start from the target partition based on the candidate page that is determined by the
clustering index (partition 1 in this test) for every row that is inserted, which means that Db2
must go through the entire cross-partition search process. No obvious performance
differences were observed when comparing these results against the Db2 12 test results.
Figure 7-13 Class 2 CPU, elapsed time, and getpage against PBG without MEMBER CLUSTER
For this scenario, in which the candidate page (based on the clustering index) is in partition 1,
Db2 12 directly goes to partition 100 during a descending search, and it inserts rows in that
partition, establishing index clustering there. Db2 13 tries to find the last non-empty partition
(partition 17 in this scenario) during a descending search. When the page set is first opened,
this last non-empty partition is unknown to Db2. It takes some time for Db2 to learn that the
last partition with data is partition 17 (before partition 18 is filled with data) and establish a
clustering order there, making partition 17 the target partition (based on the candidate page).
Further testing showed that when the last non-empty partition (partition 17 in this case) is
reached and a cluster order is created there, Db2 13 outperforms Db2 12, with up to a 34%
CPU time improvement and a 23% getpage reduction.
7.1.4 Conclusion
In a highly concurrent workload environment, Db2 13 can achieve a higher insert success
rate against tables in PBG table spaces. With the enhanced retry insert logic, Db2 13 reduces
the possibility of failing to get partition locks and reduces the effort that is needed to diagnose
the remaining errors, at no extra CPU cost and with only a slight increase in elapsed time.
The improved descending search algorithm for PBG table spaces eliminates the behavior of
inserting into the last physical partition when pre-allocated partitions exist, which previously
left empty partitions in the middle of the table space. Performance is not affected overall,
except that when a non-MEMBER CLUSTER table page set is newly opened, it takes some time
for Db2 to identify the last non-empty partition and perform the descending partition search
efficiently.
Insert performance depends on many factors. The lab tests that are described in this section
try to explain how the improved algorithm works. You might observe different results for your
workloads, depending on the PBG configuration, space usage, data distribution, insert
pattern, concurrent workloads that run together with the insert activity, and so on. The insert
behavior and performance at run time can vary.
Index look-aside is always enabled for read operations (select and fetch). For write activities
before Db2 13, index accesses for SQL insert and SQL delete operations can use index
look-aside only for clustering indexes and for non-clustering indexes with high cluster ratios
based on catalog statistics. Before Db2 13, index look-aside is not supported for SQL update
operations.
With Db2 13, index look-aside support is expanded in scope. Index look-aside can be enabled
for all indexes during SQL insert, update, and delete operations regardless of the cluster ratio
in catalog statistics. Db2 13 monitors each index’s look-aside efficiency for insert, update, and
delete operations by using look-aside hit counters within a commit scope. With these
counters, Db2 turns on look-aside for an index when look-aside can be beneficial and
disables look-aside for an index when it fails to meet certain criteria. This dynamic
management of look-aside for each index ensures that look-aside is used efficiently without
depending on catalog statistics, which can be inaccurate.
Index look-aside support for operations other than insert, update, and delete is unchanged in
Db2 13 and behaves the same as in Db2 12.
Measurement environment
The following setup was used for these measurements:
IBM z15 hardware
Two CFs at CF LEVEL 24, each with three dedicated CPUs
LPAR memory: 512 GB (for each of the two LPARs for the data-sharing tests)
LPAR CPU: Eight general CPs and two IBM zSystems Integrated Information Processors
(zIIPs) (for each of the two LPARS for the data-sharing tests)
z/OS 2.4
Db2:
– Base measurements: Db2 12 at FL 510
– Db2 13 measurements: Db2 13 at FL 100
The indexes that were used by the workloads and their characteristics are listed in Table 7-1.
The base table space that the workloads use is a PBR table space with 32 partitions.
Index            IXPBR01  IXPBR02  IXPBR03  IXPBR04  IXPBR05  IXPBR06
Cluster ratio %  98       88       68       48       18       0
Unique?          Yes      No       No       No       No       No
Measurement results
The index look-aside optimization that was introduced in Db2 13 provides solid improvements
of the workload performance. The performance results are described in three parts:
Insert workloads
Update workloads
Delete workloads
Note: The performance results that were observed by using the workloads are highly
dependent on the workload behavior, the cluster ratios of the indexes, data characteristics,
testing environments, and so on. The performance results from this Db2 13 enhancement
are likely to be different in your environment when running your workloads.
By analyzing the getpage counts of the indexes with different cluster ratios for the insert
workloads, the following results were observed:
Getpage counts for IXPBR01 are similar between the Db2 12 and Db2 13 tests because
index look-aside is enabled for both Db2 12 and Db2 13 for clustering index IXPBR01.
Without RUNSTATS, although IXPBR02 has a high cluster ratio of 88%, in Db2 12, index
look-aside is not enabled for this index. However, in Db2 13, the dynamic monitoring
algorithm ensures that index look-aside is enabled for IXPBR02 when suitable, and the
number of getpages for IXPBR02 is reduced by close to 80%.
When RUNSTATS ran and the high cluster ratio of IXPBR02 was updated in the catalog statistics
for the index, index look-aside is also enabled for IXPBR02 in Db2 12, and the getpage counts
are similar between the Db2 12 and Db2 13 tests.
For indexes IXPBR03, IXPBR04, and IXPBR05 with lower cluster ratios, with or without
RUNSTATS, Db2 12 does not enable index look-aside for them. In Db2 13, with the dynamic
enablement of index look-aside, getpages were reduced for all three of these indexes. The
higher the cluster ratio, the greater the getpage reduction for that index.
For index IXPBR06, which has a cluster ratio of 0, the getpage counts of the Db2 12 and
Db2 13 tests are almost the same. Because index look-aside for IXPBR06 does not
provide any benefits, Db2 13 dynamically turns off look-aside at run time, but Db2 12
never enabled index look-aside for IXPBR06.
Running the same test with or without FTBs allocated results in similar percentages of
getpage reduction for the same indexes.
Running the same test by using data-sharing and non-data-sharing configurations also
results in a similar reduction in the number of getpages for the same indexes.
Figure 7-15 Insert workloads: Total index getpages in a two-way data-sharing environment
For the non-data-sharing tests, the following results were observed when comparing the Db2
13 performance results to the Db2 12 results:
For the tests without FTBs allocated, with a total getpage reduction of 30 - 41% for all the
indexes (depending on whether RUNSTATS ran for the Db2 12 tests), we observed a 5 - 9%
total class 2 CPU time reduction. We also see some amount of class 2 elapsed time
reduction. For more information, see Figure 7-16.
Figure 7-16 Insert workloads: Class 2 CPU and class 2 elapsed time changes (Db2 13 versus Db2 12)
For the tests with FTBs allocated for all the indexes, we observed a total getpage
reduction of 37% for all the indexes and a total class 2 CPU time saving of approximately
2%. For more information, see Figure 7-16.
Note: When FTBs are allocated for the indexes, we see fewer CPU savings because
fast index traversal already helps to reduce the index tree traversal cost. The index
look-aside function is further reducing CPU cost, in addition to fast index traversal, by
performing fewer index leaf page getpage requests.
Note: The data-sharing tests showed fewer CPU savings with the Db2 13 index
look-aside enhancements than the non-data-sharing tests, even though they have
similar reductions in getpage activity. This result is expected because in a data-sharing
environment, the getpage cost makes up less of the cost of the overall transaction
compared to a non-data-sharing environment.
We also checked the Db2 system address space CPU usage and storage usage, and no
notable differences were observed between the Db2 12 and Db2 13 tests.
Figure 7-16 on page 256 depicts the class 2 CPU time and class 2 elapsed time changes for
the insert workloads. The results from testing with Db2 13 are compared to the results of
testing with Db2 12. For workload characteristics details for scenarios 1, 2, 3, and 4, see
“Measurement scenario description ” on page 252.
To check for possible regression, we designed a random update workload, which is referred to
as update workload scenario 1, which is described in “Measurement scenario description ” on
page 252.
To evaluate whether index look-aside support for UPDATE operations is beneficial, we also
created a sequential update workload, which is referred to as update workload scenario 2.
Analyzing the getpage activity of the indexes with a different cluster ratio for these update
workloads shows a reduction in the getpage count for all indexes with index look-aside
enabled:
For the random update workload, the getpage activity of the clustering index IXPBR01
was reduced by approximately 19% when fast index traversal (FTB) is not used.
For the random update workload, except for the clustering index IXPBR01, all indexes
show a getpage reduction of 29% when fast index traversal is not used.
For the sequential update workload, when fast index traversal is not used, getpage activity
was reduced 47% - 85% for each index, depending on their clustering ratio. The higher the
clustering ratio, the greater the getpage reduction.
For both update tests, when FTBs are allocated, the getpage reduction percentages are
slightly less for all the indexes than the reductions for the same indexes when fast index
traversal is not enabled for them.
The index getpage reductions are similar for the test runs that use a non-data-sharing and
a data-sharing configuration.
Note: In Figure 7-17, w/o FTB means that FTB structures are not used for the indexes.
W/FTB means that FTB structures are used for the indexes.
With all indexes experiencing different levels of getpage reductions, the workload CPU usage
also shows different degrees of savings. The sequential update tests without fast index
traversal enabled for the indexes show the largest class 2 CPU savings among all the tests,
at over 16%.
The following list summarizes a few key observations about changes in CPU usage for the
update workloads when comparing the Db2 13 runs to the Db2 12 runs, as shown in the top
half of Figure 7-18 on page 259:
For update workload 1 (random updates), more than a 2% class 2 CPU saving was
observed when fast index traversal is not used. The minor increase in class 2 CPU time for
one of the non-data-sharing tests with FTB enabled for the indexes is within the
measurement noise level.
The update workload 2 (sequential update) shows a class 2 CPU time savings of 16%
when fast index traversal is not used in a non-data-sharing environment. In a 2-way
data-sharing environment, the same test shows a class 2 CPU saving close to 12%.
With fast index traversal enabled for the indexes, update workload 2 shows a class 2 CPU
saving of 4% in a non-data-sharing environment, and 3% when using data sharing, which
is good.
The class 2 CPU time makes up less than 30% of the class 2 elapsed time for all the
measurements, so the class 2 CPU time changes do not affect the class 2 elapsed time
numbers much. As shown in the bottom half of Figure 7-18 on page 259, the class 2
elapsed time differences between the Db2 12 and Db2 13 tests are all within
measurement noise level.
We also verified the Db2 address space CPU and storage usage for these measurements,
and we observed no notable differences between the Db2 12 and Db2 13 tests.
Analyzing the performance results of the different test runs shows overall positive results for
the Db2 13 tests.
The following list summarizes the differences in getpage activity for the indexes:
All non-clustering indexes with a cluster ratio greater than 0 show a getpage count
reduction from 15% to 84%. The greater the cluster ratio, the bigger the percentage of the
reduction.
The getpage counts for clustering index IXPBR01 are similar between Db2 12 and Db2 13
tests because index look-aside is used for this index in both Db2 12 and Db2 13.
For index IXPBR06, which has a cluster ratio of 0, getpage counts are similar between the
Db2 12 and Db2 13 tests because index look-aside does not help avoid getpages for this
index.
For all six indexes, without FTBs allocated, Db2 13 performs 39% fewer getpages for
these indexes in total. With FTBs allocated, Db2 13 performs 34% fewer index getpages in
total.
As a result of the reduction in getpage activity, we see CPU time savings for three out of the
four measurements, as shown in Figure 7-20:
As with the insert and update test runs, we see more CPU savings from the Db2 13
enhanced index look-aside capability for the delete tests when fast index traversal is not
enabled. We get more than an 8% class 2 CPU savings for the non-data-sharing test.
When the sequential delete workload is run in a data-sharing environment, and all the
indexes have FTBs that are allocated, there is a 0.4% class 2 CPU usage increase, which
is small enough to be considered measurement noise.
The Db2 class 2 elapsed time of the delete application was reduced for all the sequential
delete scenarios with Db2 13, as shown at the right of Figure 7-20.
Figure 7-20 Delete Workload total class 2 CPU and total class 2 elapsed time changes
We also checked the Db2 address space CPU and storage usage for the delete workload
measurements, and we observed no notable differences between the Db2 12 and Db2 13
tests.
Summary of results
The results of this performance study show that the optimized index look-aside capability in
Db2 13 can improve index insert, update, and delete efficiency for indexes with low cluster
ratios in all types of workloads that were tested. With index look-aside being dynamically
monitored and turned on and off for individual indexes, Db2 13 makes sure that look-aside is
enabled only when it benefits an index and that it is turned off when there is no benefit.
Fast index traversal (FTB) is a powerful Db2 performance feature to reduce index getpages.
When fast index traversal is in effect for an index, the index access is greatly optimized.
Understandably, the performance benefits of using index look-aside for an FTB-enabled index
are not as prominent as when index look-aside is used for an index that does not have FTBs
allocated.
We also observed performance benefits from the Db2 13 enhanced index look-aside
capability in other workloads that were not designed for evaluating index look-aside
performance. Some of the sequential high-insert batch workloads (HIWs) also experienced
notable CPU time savings with reduced index getpage activity. Some batch subworkloads in
the performance regression bucket (PRB) workload experienced performance improvements
from this enhancement too. For more information about HIW and PRB, see Chapter 4,
“Workload-level measurements” on page 149.
To mitigate the overhead that is involved with diagnosing index split problems, Db2 13
introduces a new Instrumentation Facility Component Identifier (IFCID) 396 trace record and
adds three new columns to the SYSIBM.SYSINDEXSPACESTATS catalog table to track index splits
and abnormal index splits:
REORGTOTALSPLITS Total number of index splits since the last reorganization or rebuild
REORGSPLITTIME Total (aggregated) elapsed time of index splits since the last reorganization
or rebuild
REORGEXCSPLITS Total number of abnormal index splits (such as elapsed time > 1 s) since
the last reorganization or rebuild
Testing showed that the cost of this trace is minimal, even if you have a large volume of
exceptional index splits. In this case, the only thing you might notice is an increase in the trace
volume for the trace destination that is used by IFCID 396. Alternatively, you can turn on
IFCID 359, which records every index split event in Db2 12 and prior releases. However,
IFCID 359 can generate a large volume of trace data in heavy insert workloads and be
cumbersome to use when identifying the records with high wait times. The new IFCID 396
improves productivity by recording only the problematic splits.
IFCID 396 is available starting at Db2 13 FL 100, and contains the fields that are shown in
Table 7-3.
QW0396GBPD CHAR(1) Whether the current index page set is marked as GBP
dependent. ‘Y’ = GBP dependent.
QW0396ElapseTime FIXED(32) Total elapsed time of current abnormal index split (in
milliseconds).
When insert activity seems slow and you suspect that long or frequent index splits might be
the cause, you can use the following process to diagnose the situation:
1. Review the Db2 accounting data, and check whether Db2 latch suspension time is one of
the major contributors and takes more time than normal, as shown in Figure 7-21.
You can also use the TOP command in the IBM Tivoli OMEGAMON for Db2 Performance
Expert (OMPE) reporting tool to report transactions for the main contributors.
Figure 7-21 Sample accounting trace record with high Db2 LATCH suspension time
Figure 7-22 Sample statistic trace record showing high LC06 and LC254 contention
2. Starting in Db2 13, you can also print the IFCID 396 records during a slowdown period, which
provide information about long (> 1 second) index split events. Using this information, you can
tell which application or applications and which index or indexes contributed to the index split
exceptions. IFCID 396 also provides the URID and page number, which can help you or the
IBM service team to zoom in on the problem. A sample (formatted) IFCID 396 record is
shown in Figure 7-23.
3. Using the index name from the IFCID 396 trace, you can query the
SYSIBM.SYSINDEXSPACESTATS catalog table for this problem index, confirming the number of
splits, the time they took, and the number of problematic splits. In our example, which is
shown in Figure 7-24, we are evaluating index FALCDB01.PK_UIX_ORDER.
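A query along the following lines retrieves the three split-related columns for a specific index.
The qualification on CREATOR and NAME is an assumption for the index FALCDB01.PK_UIX_ORDER
that is used in the example; adjust the predicates to match how the index is qualified in your
catalog.

SELECT DBNAME, INDEXSPACE, PARTITION,
       REORGTOTALSPLITS, REORGSPLITTIME, REORGEXCSPLITS
FROM   SYSIBM.SYSINDEXSPACESTATS
WHERE  CREATOR = 'FALCDB01'      -- assumption: index qualifier
  AND  NAME    = 'PK_UIX_ORDER'
ORDER BY PARTITION;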
The CURRENT LOCK TIMEOUT special register can be set at the application level to control the
number of seconds that the application waits for a lock before it times out. Db2 13 must be at
FL 500 or later, and the application must be set up to use application compatibility
(APPLCOMPAT) V13R1M500 or later.
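As a minimal illustration (the value 30 is arbitrary), an application that can tolerate only a short
lock wait might bracket its critical unit of work as follows:

SET CURRENT LOCK TIMEOUT = 30;       -- wait at most 30 seconds for a lock
-- ... run the critical SQL statements here ...
SET CURRENT LOCK TIMEOUT = NULL;     -- revert to the subsystem timeout default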
To leverage this new function, install the PTF for IRLM APAR PH43770.
Results
No performance regression was observed for either workload. For more information about
these workloads, see Appendix A, “IBM Db2 workloads” on page 403.
In addition, a new trace record, IFCID 437, was added to record every use of the SET CURRENT
LOCK TIMEOUT statement.
For more information about these instrumentation changes, see 12.6, “IFCID changes for
application timeout and deadlock control” on page 357.
For local threads, during the period that the profile is active, each time a local package is
loaded for running, Db2 performs a look-up in the profile tables to find the correct profile
based on the filtering criteria that are specified in DSN_PROFILE_TABLE. A package is loaded
when the first SQL statement in that package runs. If the criteria match, Db2
performs the specified actions in DSN_PROFILE_ATTRIBUTES for the matching profile, for
example, SET CURRENT LOCK TIMEOUT = 200. If the package is released at COMMIT due to the
RELEASE(COMMIT) bind option, the next SQL statement from that package that runs after the
COMMIT loads the package again.
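For illustration only, the following statements sketch how such a profile might be defined and
activated for a local package. The profile ID, collection, and package names are assumptions,
and the exact set of filtering columns that you populate depends on your filtering criteria.

INSERT INTO SYSIBM.DSN_PROFILE_TABLE
       (PROFILEID, COLLID, PKGNAME, PROFILE_ENABLED)
VALUES (101, 'MYCOLL', 'MYPKG', 'Y');          -- hypothetical collection and package

INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
       (PROFILEID, KEYWORDS, ATTRIBUTE1)
VALUES (101, 'SPECIAL_REGISTER', 'SET CURRENT LOCK TIMEOUT = 200');

-START PROFILE     -- Db2 command that loads and activates the profile tables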
For remote threads, profiles are evaluated and SET statements are processed only when the
first package loads, and the first non-SET statement within that package runs. Subsequent
package loads do not cause the profile to be re-evaluated.
At the time of writing, only the SET CURRENT LOCK TIMEOUT,
DEADLOCK_RESOLUTION_PRIORITY, and RELEASE_PACKAGE options are supported for local
applications. No other special registers or global variables are supported for local profiles.
The system profile table support for local or remote applications requires Db2 to be started
with the DDF subsystem parameter set to AUTO or COMMAND.
Test case 1
This test case measures the CPU time overhead by using a simple, single-thread query:
SELECT 1 FROM SYSDUMMY1
The application package is bound with RELEASE(COMMIT). It repeats the same query
100 times and commits after each query. Both SET CURRENT LOCK TIMEOUT and
SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY are specified in the profile tables.
Before the test starts, a distributed Java program populates the Db2 internal hash table that
stores the profiles with 50,000 different connections, each with a distinct Db2 application
name.
Test case 2
This test case measures the CPU time overhead with a long-running query with a single table
access that returns 15,000 rows, and then issues a commit that is followed by issuing 100
searched UPDATE statements with a commit after each UPDATE statement. The target updated
table has 4.5 million rows. The application package is bound with RELEASE(COMMIT). The SET
CURRENT LOCK TIMEOUT statement is specified in the profile table. The following test scenarios
are used:
Profiling is active with matching profile attributes.
Profiling is active with non-matching profile attributes.
Profiling is deactivated.
Test case 4
This test case evaluates the CPU time overhead and successful DDL break-in rate by using
the distributed SQLJ IRWW 400 Warehouse (32 GB) OLTP workload for remote profile
support. The detailed workload description can be found in Appendix A, “IBM Db2 workloads”
on page 403.
The following three test scenarios are measured with and without the new remote profile
enhancement implementation (feature on versus feature off):
Profiling is active with matching profile attributes with DDF PKGREL BNDOPT.
Profiling is deactivated with MODIFY DDF PKGREL set to COMMIT.
Profiling is deactivated with MODIFY DDF PKGREL set to BNDOPT.
Measurement environment
The performance measurements for test case 1 and test case 2 were conducted by using an
IBM z14 with six general-purpose central processors (GPCs), two zIIPs, 80 GB of
memory, and DS8800 disk storage.
Measurement results
The measurements from test case 1 revealed the following key results:
Profiling activated with matching attribute versus profiling deactivated
Up to 10% class 2 CPU time overhead was observed with a difference in absolute
numbers of about 100 ms.
Profiling activated with non-matching attribute versus profiling deactivated
Up to 5% class 2 CPU time overhead was observed with a difference in absolute numbers
of about 60 ms.
The measurements from test case 2 indicated that due to the complexity of the SQL
statements that are issued by the application, the class 2 CPU overhead is within the noise
range for both comparisons (the same comparisons that were used for test case 1).
The measurements from test case 3 revealed the following key results:
An average class 2 CPU time overhead around 1.37% at the (complete) workload level
when comparing profiling activated with profiling deactivated
The successful DDL break-in rate increased from 45% (54 out of 120) with profiling
deactivated to 100% with profiling activated.
For test case 4, the results of the different measurement scenarios are shown in Table 7-4.
“Feature off” means that the Db2 code to support this enhancement was not installed, and
“Feature on” means that the code for this enhancement was installed.
Table 7-4 Performance results of using RELEASE_PACKAGE profile support with remote applications
Profile and PKGREL    DDL successful break-in rate (%)    Total CPU time in microseconds/commit
The measurements in Table 7-4 show that enabling high-performance database access
thread (DBAT) behavior (PKGREL BNDOPT to trigger RELEASE(DEALLOCATE)) reduces the
average CPU time per transaction in this particular workload (comparison of the first two rows).
With the new remote profile enhancement, the thread that is performing the DDL operations
can break in and successfully run the DDL statements when the system profile is on (with a
matching profile that specifies the RELEASE_PACKAGE keyword). Because the packages are
changed to RELEASE(COMMIT) behavior by using this feature, the benefits of high-performance
DBATs are unavailable, as indicated in the CPU time in the last row in the table. For this
reason, it is a best practice that you turn on a profile only when you must, and after achieving
the goal (DDL completed successfully), you turn off the profile again.
Summary of results
More detailed analysis of the CPU time showed that the extra CPU time overhead comes
from Db2 loading the package, matching the profile, and performing the specified action in the
profile tables. The percentage of the CPU time overhead can be workload-dependent. Db2
system profiles are efficient and easy to use and can help you to temporarily change existing
application behavior to achieve a higher level of application concurrency without modifying
existing applications.
7.5.3 Conclusion
A slight performance overhead is incurred when using system profiles because Db2 must
search the internal hash table to find the matching profile and apply the specified action or
actions in the DSN_PROFILE_ATTRIBUTES table at package load time. The percentage of the
CPU time overhead is workload-dependent. Generally, the more complex the statements in a
package, the smaller the observed overhead percentage.
Using system profiles provides flexibility and an easy way for you to change application
behavior temporarily, without paying the cost of modifying the existing applications, to achieve
a higher level of concurrency in some circumstances, such as running concurrent DDL
statements.
8.1.1 Requirements
To use this feature, you must be using Db2 13 at function level (FL) 100 or later. No rebind is
needed.
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with four general
processors and three IBM zSystems Integrated Information Processors (zIIPs), with 32 GB of
storage, and running z/OS 2.4 and Db2 13. IBM DS8800 (disk) storage devices were used.
Example 8-1 Queries that can benefit from Db2 13 sort enhancements
SELECT C8,C9,C16,C32,
SUM(INT1),SUM(DECM)
FROM SORTTAB3
WHERE ROWNUM <= 10000
GROUP BY GROUPING SETS ((C8,C9),
(C16,C32)) ;
SELECT C16,
SUM(DISTINCT INT1),SUM(DISTINCT INT2),COUNT(*)
FROM SORTTAB3
WHERE ROWNUM <= 10000
GROUP BY C16 ;
Summary of results
Performance measurements showed up to a 40% reduction in class 2 CPU time and up to a
67% reduction in the number of getpage requests for the sort work file buffer pool for qualified
queries. Generally, the more of the data that is already in sorted order when it arrives at the
sort component, the greater the CPU time savings that can be expected.
8.1.3 Conclusion
Workloads with heavy usage of qualified queries can expect a reduction in both CPU time and
number of getpage requests for sort buffer pool activity.
8.2.1 Requirements
To use this feature, you must be using Db2 13 at FL 100 or later. No rebind is required.
Measurement environment
The performance measurements were conducted on IBM z14 hardware with four general
processors, three zIIPs, 32 GB of storage, and running z/OS 2.4 and Db2 13. DS8800 (disk)
storage devices were used.
Results
The following improvements were observed:
The number of sort buffer pool getpage requests were reduced by 12% - 50%.
Db2 class 2 CPU time was reduced by 11% - 60%.
Up to 40% class 2 elapsed time reduction was observed.
Summary of results
Performance measurements showed up to a 60% class 2 CPU time reduction and up to a
50% reduction in the number of getpage requests for the sort buffer pool for qualified queries.
The greater the internal buffer storage savings from using SUBSTR, the greater the CPU time
and elapsed time savings. The use of the SUBSTR function also reduces the size of the record
that is written to the logical work file that is stored in the local buffer pool. This reduction allows
more entries to be stored on a single page and contributes to the reduction in getpage activity.
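A query of the following general shape is the kind of qualified query that can benefit. The table
and column names are hypothetical, and LONG_DESC is assumed to be a long VARCHAR column
of which only the first 20 bytes are needed, so only that prefix is carried through the sort.

SELECT SUBSTR(LONG_DESC, 1, 20) AS SHORT_DESC,
       COUNT(*)
FROM   ORDER_HISTORY                -- hypothetical table
GROUP BY SUBSTR(LONG_DESC, 1, 20)
ORDER BY SHORT_DESC;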
8.2.3 Conclusion
Workloads with heavy usage of qualified queries are expected to see reductions in both CPU
time and elapsed time, and a reduction in the number of getpage requests for the sort work
file buffer pools.
8.3.1 Requirements
To use this feature, you must be using Db2 13 at FL 100 or later. No rebind is required.
Measurement environment
The performance measurements were conducted on IBM z14 hardware with four general
processors, three zIIPs, 32 GB of storage, and running z/OS 2.4 and Db2 13. DS8800 (disk)
storage devices were used.
Measurement scenario
SELECT queries that include an ORDER BY clause in which the defined length of the last column
in the sort key is greater than 100 bytes were used to evaluate this enhancement. The queries
use actual data sizes that are smaller than the defined column length. In Example 8-3, the
VARCH1 column is the candidate column that can leverage this performance enhancement if
its defined length is greater than 100 bytes.
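A query of the shape that was measured might look like the following sketch, in which the table
name is hypothetical and VARCH1 is assumed to be defined as VARCHAR(200) while the stored
values are much shorter:

SELECT C1, C2, VARCH1
FROM   SORTTAB4                     -- hypothetical table
ORDER BY C1, C2, VARCH1;            -- VARCH1 is the last (long VARCHAR) sort key column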
Results
The following improvements were observed for the second and subsequent runs of the
qualified queries:
Class 2 CPU time was reduced by up to 10%.
The number of sort buffer pool getpage requests was reduced by up to 24%.
Summary of results
In general, the greater the difference between the actual data length and the defined length,
the more memory storage and work file usage reduction for qualified queries can be
expected, which results in more CPU time and elapsed time savings.
Usage considerations
If the data in the table is updated or inserted with longer values for the long VARCHAR
column than Db2 recorded during the previous run, then on the next run of the same query
the Db2 sort component detects this change and uses the new, larger VARCHAR lengths,
which can cause performance to degrade compared to the previous run of the query. If this
situation happens, the performance enhancement is disabled for this query. However, the
performance of the following query run should be the same as the first time that the query
ran.
Figure 8-1 Db2 statistics report formatted by IBM OMEGAMON for Db2 Performance Expert
During REBIND, the Db2 optimizer evaluates all the possible access plans to find the most
optimal plan. This evaluation process requires virtual and real memory to store Db2 internal
data structures for costing different join sequences. The larger the number of tables that must
be joined, the more memory that is needed.
When the APREUSE(ERROR) or APREUSE(WARN) option is specified, Db2 first tries to retrieve the
saved access plan by using a plan hint and attempts to reuse the saved plan. If the access
path is reused successfully, the Db2 query optimizer stops further processing. If Db2 fails to
reuse the saved plan, the query optimizer falls back to using the normal cost evaluation
process.
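For reference, a REBIND that requests access path reuse might be issued as follows; the
collection and package names are placeholders:

REBIND PACKAGE(MYCOLL.MYPKG) APREUSE(WARN) EXPLAIN(YES)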
Because the size of the specific Db2 data structures that are allocated is based on the
complexity of the query (such as the number of tables and the number of predicates), we
expect that the more tables that are joined in one query block and the more complex the
query, the greater the reduction in memory storage.
8.4.1 Requirements
To use this feature, you must be using Db2 13 at FL 100 or later.
Measurement environment
The performance measurements were conducted on IBM z14 hardware with four general
processors, three zIIPs, 32 GB of storage, and running z/OS 2.4 and Db2 13. DS8800 (disk)
storage devices were used.
Measurement scenarios
The measurements compare the REBIND performance of static packages that have queries
with a different number of tables that are joined. The results are compared with and without
the feature enabled.
Scenario 1 measures a successful APREUSE with different numbers of joined tables with
APREUSE(WARN). Each measurement was run with 10 concurrent threads.
Scenario 2 includes three different APREUSE scenarios for rebinding a package with a query
that joins 200 tables in one query block. Each measurement is a single thread. The three
APREUSE scenarios that were compared are as follows:
– REUSE SUCCESS: REBIND APREUSE(WARN) and the access path was reused successfully.
– REUSE FAIL: REBIND APREUSE(WARN) and the access path was not reused. A warning
was issued.
– REUSE NO: REBIND APREUSE(NO).
Measurement results
The measurement results for both scenarios are provided in the following subsections.
Table 8-1 2 GB above-the-bar shared real storage savings for different number of tables that are joined
Number of tables joined    Old design (MB)    New design (MB)    Savings (MB)    Saving %
Table 8-2 2 GB above-the-bar shared real storage savings for different APREUSE results
Reuse result    Old design (MB)    New design (MB)    Savings (MB)    Saving %
Summary of results
Successful APREUSE usage for queries in which many tables are joined in one query block
benefits greatly from this enhancement. The storage reduction can be verified by looking at the
2 GB ATB shared real storage counters in the Db2 statistics report.
8.4.3 Conclusion
The memory storage reduction for a successful APREUSE of static packages, especially ones
with complex queries that join many tables in one query block, helps overall system
performance when many REBIND operations run concurrently with other Db2 applications in a
memory-constrained system.
The following two extra enhancements are included in this feature to further improve LOAD
PRESORT performance:
The use of a block-level interface to and from IBM Db2 Sort for z/OS (Db2 Sort), instead of
passing rows one by one, to save CPU time when Db2 Sort is used. This enhancement applies
to Db2 Sort only.
Skipping the index key sort if the keys of an index match the clustering keys because the
data is in order.
9.1.1 Requirements
To use this feature, your Db2 environment must meet the following minimum requirements:
Db2 12 at function level (FL) 500 or later or Db2 13
For Db2 12, APARs PH23105 and PH34323
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with four general
processors, one IBM zSystems Integrated Information Processor (zIIP), and running z/OS 2.3
and Db2 12.
The following three scenarios were used to evaluate the performance improvements for the
LOAD PRESORT utility enhancements:
Scenario 1 does a LOAD followed by REORG. LOAD is using INDEXDEFER ALL, and REORG is
using SORTDATA.
Scenario 2 sorts the input source data in a separate job by using DFSORT, and then uses
LOAD with PRESORTED YES specified.
Scenario 3 uses LOAD PRESORT.
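A minimal sketch of a LOAD control statement for scenario 3 is shown below. The data set,
table, and column definitions are placeholders, and the exact placement of the PRESORT
keyword should be verified against the LOAD utility syntax:

LOAD DATA INDDN SYSREC LOG NO
  PRESORT                              -- placeholder position for the PRESORT keyword
  REPLACE
  INTO TABLE MYSCHEMA.MYTABLE          -- hypothetical table
   (ORDER_ID  POSITION(1)  CHAR(10),
    CUST_NAME POSITION(11) CHAR(30))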
Results
The performance comparison is between Db2 12 with the feature and Db2 12 without the
feature.
The metrics for comparison are total elapsed time, CPU time, and zIIP time, which are shown
in Figure 9-1.
Summary of results
When the results of scenario 3 are compared with scenario 1, the total elapsed time
improvement is up to 27%, and an improvement of up to 23% in total CPU + zIIP time is
observed for the combination of 30 input source data sets and DFSORT. An improvement of up
to 8% in total CPU + zIIP time is observed for a single input source data set combined with
DFSORT. The Db2 Sort improvement is either equivalent to DFSORT or better.
When the results of scenario 3 are compared with scenario 2, the total elapsed time
improvement is up to 9%, and an improvement of up to 7% in total CPU + zIIP time is
observed for 30 input source data sets combined with DFSORT. An improvement of up to 28%
in total elapsed time is observed when a single input source data set is used with DFSORT.
The Db2 Sort improvement is either equivalent to DFSORT or better.
9.1.4 Conclusion
LOAD PRESORT helps to simplify the load process if you want to load tables in clustering key
order when the source data is not in clustering key order. Also, you might benefit from
reduced end-to-end elapsed times and a reduction of GP CPU time.
Before the availability of this feature, you had to unload the data, drop the PBG source table
space, create a new PBR table space and a new target table, and load data into the new
target PBR table space.
With this new Db2 13 feature, you can convert the table space from a PBG to a PBR table
space (relative page numbering (RPN) only) by using a simple ALTER TABLE ALTER
PARTITIONING TO PARTITION BY RANGE statement. If the data set of the PBG table space is not
defined, the conversion is immediate. Otherwise, the ALTER TABLE statement puts the PBG
table space in an advisory reorg (AREOR) pending state, and a subsequent REORG
TABLESPACE with either SHRLEVEL REFERENCE or CHANGE is required to materialize the change.
9.2.1 Requirements
To use this feature, your Db2 environment must meet the following minimum requirements:
Db2 13 at FL 500 or later
APPLCOMPAT level V13R1M500 or later
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with four general
processors, one zIIP, and running z/OS 2.4 and Db2 13.
Example 9-1 shows an ALTER TABLE statement that converts the partitioning scheme of table
PBG_TABLE from PBG to PBR. The target PBR table has three partitions with limit keys
value_1, value_2, and MAXVALUE. The target PBR table’s partition column is PARTITION_COL.
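Based on that description, the statement has the following general form (value_1 and value_2
are the placeholder limit-key values from the description):

ALTER TABLE PBG_TABLE
  ALTER PARTITIONING TO PARTITION BY RANGE (PARTITION_COL)
   (PARTITION 1 ENDING AT ('value_1'),
    PARTITION 2 ENDING AT ('value_2'),
    PARTITION 3 ENDING AT (MAXVALUE));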
After the ALTER TABLE statement runs, the corresponding information is recorded in the
SYSIBM.SYSPENDINGDDL catalog table, and the PBG table space is placed in the AREOR
pending state. No other change occurs until the REORG TABLESPACE is done to materialize the
change.
The following three scenarios were used to evaluate the performance improvements that are
associated with enhancements to the online conversion from PBG to PBR process:
Scenario 1 is a regular REORG TABLESPACE of the PBG table space with SORTDATA YES and
STATISTICS TABLE ALL INDEX ALL. This scenario is the baseline for the comparison.
Scenario 2 is a REORG TABLESPACE to convert the PBG table space to a 3-partition PBR
table space. SORTDATA YES is enforced, and the default inline statistics with TABLE ALL
INDEX ALL are collected.
Scenario 3 is a REORG TABLESPACE to convert the PBG table space to a 30-partition PBR
table space. SORTDATA YES is enforced, and the default inline statistics with TABLE ALL
INDEX ALL are collected.
All three measurements were conducted with Db2 13. The elapsed time and the sum of CPU
and zIIP time are used as metrics for the comparisons.
Summary of results
The following list summarizes the key points of this performance assessment:
The materializing REORG scenarios (scenarios 2 and 3) have a higher cost in the UNLOAD
phase because REORG must extract the limit key and evaluate the new partition number in
the target PBR table space for each record that is unloaded. Another contributing factor is
the building of the compression dictionaries: instead of one dictionary, a separate
compression dictionary is built for each PBR partition.
The materializing REORG has a higher cost for collecting table statistics and performing the
INLINE COPY when the number of PBR partitions is higher.
Although the materializing REORG records higher values for the LOG and SWITCH
phases, the times remain less than 1 second in the test.
However, page sampling was not supported for inline statistics. This Db2 13 feature
fills that gap by adding page sampling support to REORG TABLESPACE and LOAD REPLACE when
gathering inline statistics for cardinality, frequency, distribution, and histogram statistics.
To invoke the page sampling method, the keyword TABLESAMPLE SYSTEM is added to the REORG
and LOAD syntax with the following options:
AUTO Db2 determines the sampling rate based on the size of the table: the
larger the table, the smaller the sampling rate. There is an internal
threshold of 500,000. If the table has fewer rows than the threshold,
page sampling is not used, and all the table’s pages go through the
statistics collection process.
NONE No page sampling occurs.
A numeric-literal A user-provided positive number 1 - 100 that is used as the sampling
rate.
The SAMPLE keyword existed for sampling inline statistics before this enhancement. This
keyword is intended for row-level sampling in which all the records are passed to the statistics
subtask, and the statistics subtask selects a portion of the records for statistics collection.
Page sampling should result in a greater cost reduction than row-level sampling because
fewer rows are passed to the statistics subtask and fewer rows must be sorted, if sorting is
required for distribution or column statistics.
Similar to the RUNSTATS utility, the existing STATPGSAMP subsystem parameter determines
whether to use page sampling by default for inline statistics. STATPGSAMP provides the following
options:
SYSTEM (default) or YES RUNSTATS or inline statistics always run as though
TABLESAMPLE SYSTEM AUTO is specified, unless TABLESAMPLE
SYSTEM NONE or a numeric-literal value is specified in the
statement. Even when the SAMPLE keyword is specified in
the inline statistics statement, row-level sampling is
ignored, and page sampling is used instead.
NO RUNSTATS or inline statistics do not use page sampling by
default. Any sampling is determined by what is specified in
the statement.
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with four general
processors, one zIIP, and running z/OS 2.4 and Db2 13.
In each scenario, the test is performed on a PBR table space with 30 logical partitions and a
PBG table space with three physical partitions. Both the PBR and the PBG table spaces
contain approximately 180 million records with a record length of 142 bytes. There
is no index on either of the tables because page sampling applies only to the table space, not
the index.
For each scenario, the tests collect two types of statistics: the first collects simple table
statistics, and the other collects more complex statistics (table statistics plus 10 frequencies
and 10 histogram quantiles for each of the seven column groups).
Results
The results of this evaluation are presented in two parts. The measurements for REORG
TABLESPACE STATISTICS with page-level sampling are shown first, followed by the same
measurements for LOAD REPLACE STATISTICS with page-level sampling.
Figure 9-4 shows the REORG measurements for a PBG table space with inline statistics that
uses page-level sampling. These results are similar to the performance improvements when
using a PBR table space.
Figure 9-4 Performance results for page-level sampling of inline statistics with REORG on a PBG table
Figure 9-5 Page-level versus row-level sampling with REORG on a PBG table space
Figure 9-6 Performance results for page-level sampling of inline statistics with LOAD REPLACE on a PBR table space
Figure 9-7 Performance results for page-level sampling of inline statistics with LOAD REPLACE on a PBR table space
As shown in Figure 9-6 on page 288 and Figure 9-7, using inline statistics with page-level
sampling with the LOAD utility produces performance improvements that are similar to the ones
for the REORG utility.
Summary of results
Using page sampling with a sampling rate of 10 at the inline statistics phase in LOAD and
REORG saves approximately 90% of CPU + zIIP time compared to no sampling, regardless of
whether “simple” statistics or “complex” statistics are collected.
When the inline statistics collection phase uses a page-level sampling rate of 10, CPU + zIIP
time for “simple” statistics is reduced by approximately 10%, and CPU + zIIP time for
“complex” statistics is reduced by approximately 90% when compared to using row-level
sampling.
Because the cost of collecting complex statistics, such as distribution or column group
statistics, weighs more in the overall REORG or LOAD job, the overall CPU and elapsed time
savings from using page-level sampling for the entire job is more significant.
Important: If the existence or accuracy of the statistics is crucial to your access path
selection, consider the following options:
Use TABLESAMPLE SYSTEM numeric-literal to specify a higher sample rate.
Specify TABLESAMPLE SYSTEM NONE in the inline statistics statement.
Set the STATPGSAMP subsystem parameter to NO to disable page sampling.
9.3.4 Conclusion
Because inline statistics with page-level sampling can result in reductions in CPU and
elapsed time compared to no sampling or row-level sampling, it is a best practice to use
page-level sampling whenever possible with existing inline statistics jobs. When the
STATPGSAMP subsystem parameter is set to SYSTEM (the default), page-level sampling is enabled
automatically. If you do not want to use page-level sampling, consider setting STATPGSAMP to
NO or specifying TABLESAMPLE SYSTEM NONE in the inline statistics statements to disable it.
You can enable this feature by setting the UTILITY_HISTORY subsystem parameter to UTILITY
(NONE is the default value) directly in the DSNTIJUZ installation job, or you can specify it on the
DSNTIP63 installation panel (Db2 utilities parameters panel 4). This feature is available at FL
V13R1M501 or later.
The utilities history records are stored in the new SYSIBM.SYSUTILITIES catalog table. For
each utility run, a row is inserted and updated in this table to record the job name, utility
name, starting and ending timestamp, elapsed time, CPU time, zIIP time, and return code.
You can query this table to retrieve information about a certain utility job or to do some
analysis about past runs of certain utilities.
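For example, a query similar to the following sketch lists recent REORG executions. The column names (NAME, JOBNAME, STARTTS, ENDTS, ELAPSEDTIME, CPUTIME, ZIIPTIME, and RETURNCODE) are assumptions that are based on the description above; verify them against the SYSIBM.SYSUTILITIES catalog table documentation:

  -- List the most recent REORG executions from the utility history table
  SELECT NAME, JOBNAME, STARTTS, ENDTS,
         ELAPSEDTIME, CPUTIME, ZIIPTIME, RETURNCODE
    FROM SYSIBM.SYSUTILITIES
   WHERE NAME LIKE 'REORG%'
   ORDER BY STARTTS DESC;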
You can collect history records for the following Db2 utilities:
BACKUP SYSTEM
CATMAINT
CHECK DATA
CHECK INDEX
CHECK LOB
COPY
COPYTOCOPY
LOAD
MERGECOPY
MODIFY RECOVERY
MODIFY STATISTICS
QUIESCE
REBUILD INDEX
RECOVER
REORG
REPAIR
RUNSTATS
UNLOAD
9.4.1 Requirements
To use this feature, you must be using Db2 13 at FL 501 or later.
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with four general
processors, one zIIP, and running z/OS 2.4 and Db2 13.
The performance test of this feature includes the following seven utilities:
LOAD
REBUILD INDEX
COPY
REORG
RUNSTATS
RECOVER
UNLOAD
Because only one record is inserted and updated in the SYSIBM.SYSUTILITIES table for each
utility job, you can expect a slight amount of overhead when utility history information is
collected. The main purpose of this test is to ensure that the performance-related information
(such as elapsed time, CPU time, and zIIP time) in SYSIBM.SYSUTILITIES matches the
numbers in accounting reports.
The numbers from SYSIBM.SYSUTILITIES (the first group of columns from the right) match the
numbers from the Db2 accounting reports (second group of columns from the right).
9.4.4 Conclusion
DBAs and system administrators who run Db2 utilities will find that the utility history feature is a convenient tool that relieves them of the burden of manually tracking past utility runs.
To help alleviate the difficulties that are associated with recovering database objects, Db2
added a redirected recovery option for table spaces and index spaces to the Db2 RECOVER
utility. To leverage this feature, you must have a target table space and index space with the
same definition as the definition of the source, and both must be in the same Db2 subsystem.
Redirected recovery can recover the target object to the current state or to a point in time. If
an error occurs during the recovery (such as a missing image copy), it is revealed during the
redirected recovery process. If the elapsed time of the redirected recovery does not meet
your RTO, certain actions can be taken to improve recovery performance, such as taking
more frequent image copies to make sure that the RTO can be met if actual recovery is
needed.
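As a minimal sketch with placeholder object names (verify the exact FROM clause syntax in the RECOVER utility documentation), a redirected recovery control statement might look like the following example:

  -- Recover the source object's data into the separately defined target table space
  RECOVER TABLESPACE DB1.TSTARGET FROM DB1.TSSOURCE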
9.5.1 Requirements
To use this feature, your Db2 environment must meet the following minimum requirements:
Db2 12 at FL 500 or later or Db2 13
For Db2 12, APAR PH27043 for table spaces and APAR PH32566 for index spaces
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with four processors, no
zIIPs, and running z/OS 2.3 and a Db2 12 non-data-sharing system.
Buffer pool 0 (BP0) is used for the Db2 catalog objects. Both the source and target table
spaces and index spaces are assigned to the 4 K buffer pools with VPSIZE 1,000,000. The
Db2 log buffer (OUTBUF) is defined as 30,000 buffers.
After the image copy is taken for the source object, 10 million records are updated in the
source table.
The scenarios that were used to evaluate redirected recovery and their results are presented
in two parts. Information about redirected recovery of the target table space is shown first,
followed by information about redirected recovery of indexes.
Scenario 2: Recover the table space to a point-in-time by using the TORBA recovery option.
– Run RECOVER TABLESPACE TORBA on the source table space by using the Db2 12 base
code level.
– Run RECOVER TABLESPACE TORBA on the source table space by using a code level that
supports redirected recovery.
– Run RECOVER TABLESPACE TORBA from the source table space to the redirected target
table space by using a code level that supports redirected recovery.
Figure 9-10 on page 295 shows that no performance degradation is observed when the
results of recovering the source object by using RECOVER TORBA with the Db2 base code
level are compared against the results of recovering the source object by using a code
level that supports redirected recovery.
Figure 9-10 Performance of RECOVER TABLESPACE TORBA on a source versus redirected target
Scenario 3: Recover a table space list of 100 partitions to the current state.
– Run a RECOVER TABLESPACE list of 100 partitions on the source table space by using the
Db2 12 base code level.
– Run a RECOVER TABLESPACE list of 100 partitions on the source table space by using a
code level that supports redirected recovery.
– Run a RECOVER TABLESPACE list of 100 partitions from the source table space to the
redirected target table space by using a code level that supports redirected recovery.
Figure 9-11 RECOVER TABLESPACE list of 100 partitions on a source versus redirected target
Scenario 4: Recover a table space list of 100 partitions by using the TOLOGPOINT recovery
option.
– Run a RECOVER TABLESPACE list of 100 partitions with TOLOGPOINT on the source table
space by using the Db2 12 base code level.
– Run a RECOVER TABLESPACE list of 100 partitions with TOLOGPOINT on the source table
space by using a code level that supports redirected recovery.
– Run a RECOVER TABLESPACE list of 100 partitions with TOLOGPOINT from the source table
space to the redirected target table space by using a code level that supports
redirected recovery.
As shown in Figure 9-12 on page 297, no performance degradation is observed when the
results of running RECOVER TABLESPACE with TOLOGPOINT on a list of 100 partitions by using
the base code library are compared against the results of using a code level that supports
redirected recovery.
Comparing the results of running RECOVER TABLESPACE with TOLOGPOINT on the source table
space list of 100 partitions (at base code level) against the results of running a redirected
recovery with TOLOGPOINT on the target table space list of 100 partitions (by using a code
level that supports redirected recovery) shows an overall similar CPU time and elapsed
time.
Figure 9-13 Performance of RECOVER NPI index on a source versus a redirected target
Figure 9-14 Performance of RECOVER NPI index TORBA on a source versus a redirected target
Figure 9-16 Performance of RECOVER PI index TORBA on a source versus a redirected target
Summary of results
The elapsed and CPU time when using RECOVER from the source to the target table space or
index space are similar to the elapsed time and CPU time of recovery on the source object
when the object is recovered to the current state. However, point-in-time recovery of the source object might use more CPU time and elapsed time than a redirected point-in-time recovery of the target object when there are many log records to undo. The difference arises because recovery of the source object writes compensation log records during the LOGUNDO phase when backing out uncommitted work, whereas redirected point-in-time recovery of the target object does not.
A new utility phase, TRANSLAT, is added to perform conversion of the OBIDs in the pages of
the target objects from the source to the target. The cost of the TRANSLAT phase is small.
9.5.3 Conclusion
Redirected recovery provides Db2 system programmers and database administrators a
useful method to test recoveries on the production system without affecting production
objects and applications.
This section describes a method for moving the tables from a multi-table table space to
separate PBG table spaces, with minimal impact and without having to regrant authorization
privileges and re-create dependent objects.
To simplify the process of migrating multi-table or segmented table spaces, the ALTER
TABLESPACE statement is enhanced with a MOVE TABLE option. The MOVE TABLE option enables
you to specify the name of the table that is to be moved and the name of the target PBG table
space.
To leverage this process, the target PBG table space must be created with DEFINE NO and
MAXPARTITIONS 1. When a table is moved this way, all existing authorization privileges, object
dependencies, referential relationships, and associated objects are retained and unaffected.
If the data set of the source table space exists (is defined), the alter is run as a pending
definition change that is materialized by a subsequent online REORG TABLESPACE with either
SHRLEVEL REFERENCE or CHANGE. Otherwise, the alter is an immediate change. Dependent
packages are invalidated only for the moved tables. If not all tables are moved, packages that
depend on the tables that remain in the (original) source table space are not invalidated.
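The following sketch outlines the flow with placeholder names; verify the exact syntax against the CREATE TABLESPACE, ALTER TABLESPACE, and REORG TABLESPACE documentation:

  -- 1. Create the target PBG table space with DEFINE NO and MAXPARTITIONS 1
  CREATE TABLESPACE TSNEW IN DB1 MAXPARTITIONS 1 DEFINE NO;

  -- 2. Request the move as a (usually pending) definition change
  ALTER TABLESPACE DB1.TSOLD MOVE TABLE SCHEMA1.TB1 TO TABLESPACE DB1.TSNEW;

  -- 3. Materialize the pending change with an online REORG of the source table space
  REORG TABLESPACE DB1.TSOLD SHRLEVEL REFERENCE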
9.6.1 Requirements
To use this feature, your Db2 environment must meet the following minimum requirements:
Db2 12 at FL 508 or later or Db2 13
For Db2 12, APPLCOMPAT level 508 or later
For Db2 12, APAR PH29392
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with four general
processors, no zIIPs, and running z/OS 2.3 and Db2 12.
A REORG TABLESPACE SHRLEVEL CHANGE was run (and measured) by using the following five
different sets of tables in the source segmented table space:
Ten tables in which each table has 10 million rows and no index
Twenty tables in which each table has 5 million rows and no index
Fifty tables in which each table has 2 million rows and no index
One hundred tables in which each table has 1 million rows and no index
Five hundred empty tables with no index
Results
The measurement results are shown in Figure 9-17, Figure 9-18, and Figure 9-19 on
page 302.
Figure 9-17 REORG TABLESPACE of the source table space between base and APAR
Figure 9-18 REORG TABLESPACE with pending move tables versus no pending move table
Summary of results
The following list summarizes the key points of this performance assessment:
No performance regression is observed when the results of running REORG TABLESPACE by
using a code level that supports the MOVE TABLE option are compared to the Db2 12 base
code level.
CPU and elapsed time increase when the number of tables in pending move is 50 or more.
The increase in elapsed time is mainly class 3 (suspension) time for the open, close, delete, rename, and define processing of the shadow data sets.
The REORG switch time, which directly affects the duration of the outage during REORG
materialization, increases when the number of tables in the “move pending” status
increases.
When many tables are moved in a single REORG, extra shadow data sets are created and opened for each moved table during the REORG. As a result, you might encounter the DSMAX limitation (the DSMAX subsystem parameter controls the maximum number of data sets that can be open at one time).
9.6.4 Conclusion
UTS is the strategic table space type, and this feature provides a convenient method for
moving existing tables from multiple table non-UTS table spaces to individual PBG table
spaces with minimal impact.
REORG INDEX with SHRLEVEL NONE is not affected by this new behavior.
9.7.1 Requirements
To use this feature, your Db2 environment must meet the following minimum requirements:
Db2 12 at FL 100 or later.
APAR PH25217.
Either the NOSYSUT1 keyword must be specified on the REORG INDEX statement, or the
REORG_INDEX_NOSYSUT1 subsystem parameter must be set to YES (NO is the default setting).
If you want to use Db2 13, the following function-level considerations apply:
If you are using Db2 13 at FL 100, either the NOSYSUT1 keyword must be specified on the
REORG INDEX statement, or the REORG_INDEX_NOSYSUT1 subsystem parameter must be set
to YES (NO is the default setting).
If you are using Db2 13 at FL 500 or later, REORG INDEX SHRLEVEL REFERENCE or CHANGE
enforces the NOSYSUT1 option by default, which becomes the only available behavior
starting at FL 500.
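As a minimal sketch with a placeholder index name, a control statement that explicitly requests the NOSYSUT1 behavior on a lower function level might look like the following example:

  REORG INDEX SCHEMA1.IX1 SHRLEVEL CHANGE NOSYSUT1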
Measurement environment
The performance measurements were conducted on IBM z14 hardware with six general
CPs, three zIIPs, and 56 GB of real storage, and running z/OS 2.3 and Db2 12.
The indexes that are evaluated include PI, DPSI, NPI, NPI defined with multiple columns, and
NPI defined with a mix of char and varchar columns. REORG INDEX statements are either using
SHRLEVEL REFERENCE or SHRLEVEL CHANGE.
Results
The performance results in Figure 9-20 show that with this enhancement the elapsed time
improves up to 86% and the CPU + zIIP time improves up to 61%. With this enhancement,
REORG INDEX is up to 99% zIIP eligible.
Figure 9-20 Performance comparison for REORG INDEX with and without NOSYSUT1
9.7.5 Conclusion
This new feature improves the elapsed time (up to 86%) and CPU time (up to 61%) of REORG
INDEX SHRLEVEL REFERENCE or CHANGE. It also improves the zIIP eligibility (up to 99%) for REORG
INDEX.
Additionally, IBM Db2 for z/OS Data Gate was recently introduced, which provides Db2 for
z/OS users with a modern approach to accelerating queries in a hybrid cloud environment. As
with IDAA for z/OS, with IBM Db2 for z/OS Data Gate, you can redirect queries away from
IBM zSystems, but instead of running within an IDAA for z/OS system, IBM Db2 for z/OS Data
Gate allows those queries to run in a hybrid cloud environment. Performance results and
other information about IBM Db2 for z/OS Data Gate are provided in 10.8, “Performance
overview for IBM Db2 for z/OS Data Gate” on page 323.
For more information about the query workloads that were used in this chapter, see
Appendix C, “IBM Db2 Analytics Accelerator for z/OS workloads” on page 421.
10.1.1 Requirements
To use the IBM Integrated Synchronization feature, your Db2 environment must meet the
following minimum requirements:
Db2 12 at function level (FL) V12R1M500 or later or Db2 13
APAR PH06628 (PTF UI63356) applied
IDAA for z/OS 7.5 or later
Measurement environment
The environment in which the performance evaluation was conducted consists of the
following components:
IBM z14 hardware with six general-purpose central processors (GPCs) and two zIIPs
96 GB of memory
10 GB network to the accelerator
z/OS 2.2
Db2 12 for z/OS
IDAA 7.5
Measurement scenario
This section describes the usage scenarios that were used to evaluate the performance of
IBM Integrated Synchronization:
The test workload uses 500 partitioned tables with concurrent insert and update activity
from 250 threads. The number of rows that are inserted and updated per commit is
20 rows. The maximum Db2 transaction rates, which are driven by a distributed Java
program, are as follows:
– 76.4 K rows per second inserted.
– 76.4 K rows per second updated.
The total amount of log read activity (based on the Db2 statistics trace) for inserts and
updates:
– READS SATISFIED-OUTPUT BUFF: 311.0 million.
– READS SATISFIED-ACTIVE LOG: 2626.3 K.
– LOG RECORDS READ: 285 086 600.00.
The following test scenarios were run, measured, and analyzed:
– Capture and apply with concurrent Db2 transactions.
– Capture and apply the pre-populated log.
Results
This section describes the results of both scenarios.
Performance summary for capture and apply with concurrent Db2 transactions
Comparing IBM Integrated Synchronization to traditional CDC reveals the following general
results:
IBM Integrated Synchronization provides a reduction in maximum replication latency (see
Figure 10-1).
IBM Integrated Synchronization log reading is counted in the Db2 DIST address space
(see Figure 10-2).
Figure 10-2 CPU time in seconds for capture and apply with concurrent Db2 transactions
The maximum zIIP eligibility in the DIST address space is 99% (see Figure 10-2).
A concurrent influx of new changes (from concurrently running transactions) makes it
difficult to isolate the replication CPU usage for load-reading tasks.
Figure 10-3 CPU time in seconds for capture and apply with pre-populated Db2 logs
Figure 10-4 IBM Integrated Synchronization log reader CPU time in the Db2 statistics report
A multiple-node accelerator has a node cluster that consists of a head node and several
worker nodes (also known as data nodes). Each node is running multiple database partitions,
which are multiple logical nodes (MLNs). Data that is received from the Db2 for z/OS
subsystem is loaded into the accelerator back-end Db2 Warehouse database by using INSERT
statements. Before this enhancement was available, the accelerator server connected only to the head node’s MLNs for the load INSERT statements. As a result, CPU utilization on the head node was much higher than on the worker nodes (the worker nodes used little CPU). To avoid excessive CPU usage from load activities on the head node, the accelerator server has internal logic to halt load activities when the head node’s CPU utilization reaches 80%, which limits the load throughput. This behavior was observed in many internal tests and customer large-table loads in the past.
With the cluster load feature, the accelerator server connects to every MLN on all the nodes
for the load INSERT statements, which means that the workload on the head node is reduced
and more balanced among the nodes in the cluster. With less work being directed to the head
node, the CPU utilization on the head node is reduced, which results in fewer occurrences of
load activities being halted and an improvement in load throughput.
Because the cluster load feature reduces CPU usage on the head node, the accelerator server can accept more parallel connections between the Db2 for z/OS subsystem and the accelerator, leveraging the freed head node CPU capacity to further improve load throughput. In the following tests, the stored procedure Workload Manager (WLM) environment variable AQT_MAX_UNLOAD_IN_PARALLEL is increased from 10 to 20, and the load throughput improves when IDAA for z/OS 7.1.9 with cluster load is compared to IDAA for z/OS 7.1.8.
10.2.1 Requirements
This enhancement is available in IDAA for z/OS 7.1.9 or later. No specific Db2 version or FL is
required.
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with six general
processors, one zIIP, 80 GB of memory, and running z/OS 2.3 and Db2 12.
The IDAA for z/OS system is a 7-node full rack Sailfish system.
Results
The results of both scenarios are documented in the following sections.
Sequential load with two different data volumes with a cluster load
The data in Figure 10-5 compares an IDAA for z/OS 7.1.8 non-cluster load that uses
AQT_MAX_UNLOAD_IN_PARALLEL=10 to an IDAA for z/OS 7.1.9 cluster load that uses
AQT_MAX_UNLOAD_IN_PARALLEL=20. Using a cluster load shows a 48% and 37% throughput
improvement when loading the respective tables.
Figure 10-5 Comparing the sequential load between a cluster load and a non-cluster load
Figure 10-6 Comparing the concurrent reload performance between a cluster load and a non-cluster
load
Figure 10-7 shows that without a cluster load, the host CPU utilization rate reaches the 80%
cap while the CPU utilization rate of the worker nodes is low. With a cluster load, CPU utilization is better balanced across the host and worker nodes, and host node CPU utilization no longer reaches 80%, which means that load activity no longer halts and a higher load throughput is achieved.
Figure 10-7 Accelerator host and worker CPU utilization comparison with versus without a cluster load
10.2.4 Conclusion
This feature enables you to achieve a better balance between the accelerator catalog (head) node and the worker or data nodes for load or reload workloads. As a result, catalog node CPU
utilization is not as much of a bottleneck as it was before. When a cluster load is used, the
catalog node has more capacity to take extra parallel connections from the Db2 for z/OS side
to speed up load or reload processes and improve the throughput.
10.3 COLJOIN 1 enablement
With the continuing pursuit of improving query performance, certain Db2 Warehouse settings
are enabled by default for IDAA for z/OS usage too. One such setting is COLJOIN 1, which
improved query performance in IBM internal tests.
COLJOIN 1 decompresses and expands the result set during a hash join. Memory usage increases because the uncompressed data must be processed, but the join itself is processed faster overall. Testing revealed that the increased memory usage did not adversely affect the performance results.
10.3.1 Requirements
This enhancement is available in IDAA for z/OS 7.5.7 or later. No specific Db2 version or FL is
required.
Results
The results of the TPC-H workload measurements are shown in Figure 10-8 on page 315,
and the results of the Customer FD workload are shown in Figure 10-9 on page 315.
Figure 10-9 Customer FD workload results comparing COLJOIN 1 to the baseline measurements
Results summary
The TPC-H and Customer FD results show general improvements or equivalent performance compared to the previous default settings. For the TPC-H workload, most queries saw neither regression nor improvement. However, some long-running queries
had better access paths because COLJOIN 1 was enabled, which resulted in an improvement
in overall elapsed time.
One of the longest-running queries in the TPC-H workloads improved, which reduced the overall elapsed time for the 5 TB data size workload. Because of its long run time, this query was excluded from the multi-threaded workloads in previous versions, so the multi-threaded results do not show the same amount of improvement as the single-threaded results.
10.4.1 Requirements
IDAA for z/OS 7.5.6 and later is required to use this setting.
Additionally, no adverse effects were observed after changing the I/O scheme. All queries that
ran during testing performed on-par or better with direct I/O enabled.
Because statistics are necessary for good query performance, collecting them promptly is a priority. Previously, statistics were collected by a single-threaded daemon in the Db2 Warehouse engine. Because this daemon is single-threaded, it can take nearly 2 hours before statistics collection even starts, not including the additional time that collecting the statistics takes. To reduce this delay, a new method was developed in which table statistics are collected automatically immediately after an initial load completes.
Starting in IDAA for z/OS V7.5.2, the collection process is changed to provide better plan
stability over time. By using this new method, the RUNSTATS statement still runs with a
sampling rate that is predetermined by the system. When the RUNSTATS statement with the
computed sampling rate completes, the system checks whether the (computed) sampling rate
that was used is below a minimum threshold. By default, this threshold is 30%, but an IBM Client Success agent can change it through the RUNSTATS_MINIMUM_SAMPLING_RATE_PERCENTAGE parameter. If the calculated sampling rate is below this threshold, a
second RUNSTATS statement runs with the minimum sampling rate set.
The status of this statistics collection can be monitored by using the IDAA for z/OS Data
Studio plug-in, in which a “warning sign” indicates that statistics are still being collected on that table; the warning disappears when the process finishes.
Requirements
To leverage the original statistics collection process (without the sampling rate calculation),
you must be using IDAA for z/OS 7.5.1 or later.
To use the updated early stats collection process, you must be using IDAA for z/OS 7.5.2 or
later.
Usage considerations
Although the statistics collection begins as soon as possible, it can take a few hours before
the statistics collection is finished. Again, this process occurs after an initial table load, and
full reloads of tables can benefit from the Copy Table Statistics feature that is described in
10.5.2, “Copy Table Statistics” on page 317.
The Copy Table Statistics feature helps to alleviate this problem by saving the table’s statistics
from before the reload and copying them over after the reload completes. Therefore, queries
that are submitted shortly after the reload completes perform as well as before the full table
reload occurred.
Requirements
To use this feature, you must be using IDAA for z/OS 7.5.0 or later.
Performance measurement
To validate the performance of the Copy Table Statistics feature, several baseline
measurements were taken first. The term “Regular Statistics” refers to the normal statistics
that are collected after a table is initially loaded, and the term “Copy Statistics” refers to the
statistics that are provided from the Copy Table Statistics feature after a reload. The test
scenario consists of the following steps:
1. With the Copy Table Statistics setting disabled, perform an initial load of the TPC-H
LINEITEM table. After the table is loaded, wait for statistics to be collected.
2. Take a baseline measurement of the query workload’s elapsed time in which all queries
are using the LINEITEM table (Perf 1 – Regular Statistics).
3. The LINEITEM table is then reloaded, and the query workload runs without waiting for
statistics to be collected and without Copy Table Statistics (Perf 2 – Reloaded
performance, No Copy or Regular Statistics).
4. Remove the LINEITEM table, then do another initial load of the LINEITEM table with Copy
Table Statistics enabled and run the query workload again (Perf 3 – Second Regular
Statistics).
5. The LINEITEM table is reloaded with Copy Table Statistics enabled, and the query
workload is run and measured again (Perf 4 – Copy Table Statistics measurement).
6. Another measurement is conducted by using the query workload with normal statistics
and Copy Table Statistics enabled (Perf 5 – Regular Statistics after Copy Table Statistics).
During each run, the statement cache is cleared to ensure that all runs are as clean as
possible.
Results
The results of these measurements are summarized in Figure 10-10.
Figure 10-10 Detailed query performance breakdown of all steps of the performance measurement process
Usage considerations
The Copy Table Statistics feature works only when a full table reload occurs. Partial table
reloads do not receive the benefit of the Copy Table Statistics feature.
To alleviate this situation, IDAA for z/OS 7.5.4 introduced a stored procedure that is called
ACCEL_COLLECT_TABLE_STATISTICS. By using it, you can explicitly collect statistics. You can
specify multiple tables in a table set and RUNSTATS runs concurrently for each table, which can
potentially cause increased CPU and memory usage on the accelerator.
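The following call is illustrative only: the parameter list (accelerator name, a table set specification, and an output message) is an assumption, so check the stored procedure reference for the actual signature before using it:

  -- Hypothetical parameter list; verify against the ACCEL_COLLECT_TABLE_STATISTICS reference
  CALL SYSPROC.ACCEL_COLLECT_TABLE_STATISTICS(
       'ACCEL1',                 -- accelerator name (placeholder)
       :tableSetSpecification,   -- XML specification of the tables to process (assumed)
       :message);                -- output message (assumed)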
A performance study was conducted to assess the impact of this feature on the accelerator
resources and to evaluate the impact on query performance when multiple explicit RUNSTATS
run concurrently.
Requirements
To use this feature, you must be running IDAA for z/OS 7.5.4 or later. No specific version of
Db2 for z/OS is required.
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with six general
processors, one zIIP, 80 GB of memory, and running z/OS 2.3 and Db2 12.
The 50 tables that are used in these tests are all AOTs with five different table definitions, data volumes, and numbers of records:
Tables 1 - 10 have a data volume of 23.43 GB and contain 15,000,000 records.
Tables 11 - 20 have a data volume of 26.07 GB and contain 200,000,000 records.
Tables 21 - 30 have a data volume of 115.66 GB and contain 800,000,000 records.
Tables 31 - 40 have a data volume of 160.98 GB and contain 1,500,000,000 records.
Tables 41 - 50 have a data volume of 722.15 GB and contain 5,999,989,709 records.
Measurement results
For the first scenario, all the explicit RUNSTATS were successful.
As shown in Figure 10-11, when the number of tables in the table set increases, the elapsed
time of explicit RUNSTATS becomes longer but is still within a reasonable range. For example,
the elapsed time of the explicit RUNSTATS for table 1 is 20.4 seconds with table set=5 and
36.6 seconds with table set=50.
Figure 10-11 Elapsed time in seconds of each table’s explicit RUNSTATS with different numbers of tables in the table set
As shown in Figure 10-12 on page 321, the host CPU usage on the accelerator has spikes
during the first test scenario. When table set=50 is used, the host CPU usage reaches 70%. A
best practice is not to have too many explicit RUNSTATS running concurrently.
For the second scenario, 10 tables have explicit RUNSTATS running concurrently. The query
workload degrades less than 4% in total elapsed time compared to the same workload
without the explicit RUNSTATS running.
10.5.4 Summary
This section introduces how and when statistics are automatically triggered and collected
after a table is initially loaded or fully reloaded in an accelerator. You can also explicitly run
RUNSTATS on the accelerator if extra or specific statistics are needed.
Measurement environment
The performance evaluation was conducted on IBM z14 hardware with six general
processors; one zIIP; 80 GB of memory; and running z/OS 2.3, Db2 12 FL V12R1M506, and
Db2 13 FL V13R1M501.
Measurement workloads
The following workloads were used during testing:
TPC-H 1 TB and 5 TB data size, single-thread workloads
TPC-H 1 TB and 5 TB data size, multi-threaded workloads
Initial LOAD Performance workloads
Customer FD workloads
Results summary
The single-thread TPC-H results indicate that Db2 13 provides an improvement of
approximately 12% for the 5 TB workloads while staying on-par for the 1 TB workloads. The
Customer FD workloads running with Db2 13 do not regress for the same workloads
compared to Db2 12.
Results for the multi-threaded TPC-H workloads on Db2 13 show that the performance was
also on-par compared to the Db2 12 results.
Initial table load elapsed time was also similar to the Db2 12 results.
Overall, Db2 13 offers equivalent performance for query workloads and initial loads, and
provides an improvement for workloads with larger tables.
Therefore, if you are considering enabling TLS, consider the potential impact on concurrent
tasks when running a TLS-enabled load.
The key synchronization technology that IBM Db2 for z/OS Data Gate uses is IBM Integrated
Synchronization, which is described in 10.1, “IBM Integrated Synchronization” on page 308.
IBM Db2 for z/OS Data Gate 2.1 includes many enhancements and features that can improve
performance.
IBM Db2 for z/OS Data Gate 2.1 and IDAA 7.5.8 share and use the same set of stored
procedures so that you can more easily maintain environments that include both products.
This synergy also ensures consolidated optimized performance on the stored procedures
side.
Although IBM Db2 for z/OS Data Gate and IDAA share some technologies, IBM Db2 for z/OS
Data Gate has its own set of typical use cases, which are illustrated in Figure 10-13. When
you use IBM Db2 for z/OS Data Gate, data is synchronized from Db2 for z/OS data sources to
target databases on IBM Cloud Pak for Data. For more information, see Overview of IBM
Cloud Pak for Data.
Figure 10-13 IBM Db2 for z/OS Data Gate use cases
The following sections provide more details about each type of use case.
Measurement environment
The Db2 for z/OS side uses the following setup:
IBM z13 hardware with eight general processors and eight zIIPs
750 GB of memory
A 20 TB direct access storage device (DASD)
10 Gbps network
z/OS V2R2
Db2 12 for z/OS
The following IBM Db2 for z/OS Data Gate, Db2, and Db2 Warehouse instances resource
allocation on IBM Cloud Pak for Data were used for each of the use cases:
Transactional (use case 1): IBM Db2 for z/OS Data Gate instance (10 vCPU and 32 GB of
memory); Db2 instance (20 vCPU and 192 GB)
Analytical (use case 2): IBM Db2 for z/OS Data Gate instance (10 vCPU and 16 GB of
memory); Db2 Warehouse instance (20 vCPU and 128 GB of memory)
Workload definitions
The following table and workload setup was used:
Load: Modify the TPCH.LINEITEMS table definition. Change the length of column L_COMMENT to VARCHAR(1044). Define 100 range partitions on Db2 for z/OS. Populate the source table with a total of 1 TB of data.
Synchronization: Define seven tables that contain many CHAR and VARCHAR columns. Duplicate these seven tables to 70 tables with different schema names. The largest table contains a total of 44 columns and 28 character columns. There are no “nullable” key columns.
Application: One JDBC Type 4 driver application runs in the UNIX System Services layer on the same z/OS LPAR on which Db2 for z/OS is running. The JDBC application performs insert, update, and delete operations on Db2 for z/OS tables with a specified workload pressure. Workload pressure is defined in terms of the number of row changes per second and is adjustable. There are 50 concurrent threads that submit insert, update, and delete operations with a batch size of 20 rows. The Db2 for z/OS system that is used in the measurement environment described above can handle up to 200 K row changes per second.
Results
The results for the load throughput are shown in Figure 10-14.
The synchronization latency results for different workload pressures are shown in
Figure 10-15 on page 327.
The Mix workloads in Figure 10-15 include insert, update, and delete operations.
Results summary
Testing revealed the following key results:
Load achieves a 2 TB per hour throughput rate. IBM Db2 for z/OS Data Gate uses
different technologies to optimize load throughput. For Db2, NOT LOGGED INITIALLY
(NLI) and ADAPTIVE compression are used. With NLI, any changes that are made on a
table (including insert, delete, update, or create index operations) in the same unit of work
that creates the table are not logged. ADAPTIVE compression, which incorporates classic
row compression and also works on a page-by-page basis to further compress data, offers
the most opportunities for storage savings. Both technologies reduce transaction
log-writing bottlenecks. For Db2 Warehouse, reduced logging and the usage of
column-stored tables also help to avoid I/O bottlenecks.
Network bandwidth is a key factor for load performance. The test is performed by using a
10 Gbps network. You can expect better results with higher network bandwidth.
The synchronization test results show that IBM Db2 for z/OS Data Gate can maintain a
seconds-level latency, even at a pressure of 200 K row changes per second. The record
length and compression ratio are two key factors for synchronization throughput,
especially for the Db2 target. The test ran with rows with a maximum record length of
approximately 330 bytes. If your table record length is longer, you might see lower
synchronization throughput.
The underlying storage affects performance. Unlike with IDAA, you can choose the
storage type for the Db2 and Db2 Warehouse instance on IBM Cloud Pak for Data. Using
HostPath is a best practice for best performance. You can choose other storage types,
such as Red Hat OpenShift Data Foundation (ODF) or IBM Spectrum® Scale, if your
workload pressure is not high.
The test results were generated from an IBM Cloud Pak for Data x86-64 platform. Tests
were also run with Linux on IBM Z. The Linux on IBM Z results are on-par with the x86-64 platform results but use fewer CPU resources.
10.8.4 Performance measurement: Query
To conduct the query performance measurements, the following setup is used for use case 3
(Query acceleration).
Measurement environment
A comparable test environment with IDAA 7.5.8 was used for the query performance testing.
The IDAA performance test was run by using one 7-node full rack IAS, with 168 physical
cores, 3.5 TB of memory, and SAN storage with a 40 Gb FC network.
Workload definitions
Query tests use the same workloads as the IDAA for z/OS 7.5.8 performance tests.
Results
When comparable resources are allocated for the Db2 Warehouse MPP instance, IBM Db2
for z/OS Data Gate query performance is on-par with IDAA for z/OS 7.5.8, as shown in
Figure 10-16.
Figure 10-16 TPCH 1 TB query workload ET: IDAA for z/OS versus IBM Db2 for z/OS Data Gate
Furthermore, some performance-related subsystem parameters were modified along with the
various performance-related enhancements that were introduced in Db2 13.
When you migrate to Db2 12 from Db2 11, the single migration job DSNTIJTC (CATMAINT)
performs all the changes that are necessary to tailor the Db2 11 catalog and directory
objects. After you run DSNTIJTC, the Db2 catalog reaches the catalog level of V12R1M500.
Catalog and directory objects at the V12R1M500 level can be used by Db2 12 (regardless of
the FL) and by Db2 11 with the fall-back SPE applied. As a result, if the fall-back SPE is in
place, falling back to Db2 11 or coexisting with Db2 11 members in a data-sharing
configuration is possible after the catalog changes to the Db2 12 format.
In Db2 12, you do not run a job to enable a new function. Instead, you use the new Db2
ACTIVATE command to activate a new function. Because only one step is needed to update
the catalog during migration, catalog availability is improved. As a result, applications that
access the catalog can expect less interruption during the migration.
Although there are no changes to the installation process in Db2 13 when installing a new
Db2 subsystem, the migration process from Db2 12 to Db2 13 changed to provide
performance improvements and a better user experience by eliminating catalog and directory
updates during the migration. Eliminating catalog and directory updates was achieved by
requiring Db2 12 to be updated and activated at FL V12R1M510 (FL 510) before migrating to
Db2 13. When you activate FL 510 in Db2 12, Db2 validates the necessary catalog level and
structure updates. Db2 13 still uses the initial release migration job DSNTIJTC (CATMAINT), but
in Db2 13 CATMAINT no longer changes the structure of the Db2 catalog. It does not create any
tables or indexes, and it does not add columns to existing catalog tables. This initial CATMAINT
process sets internal information to indicate that the catalog level is now V13R1M100 and
that the FL is V13R1M100 (FL 100). Unlike in Db2 12, CATMAINT functions as a switch to
indicate migration completion without modifying any catalog or directory objects, which speeds up the migration process and improves subsystem availability.
After migrating a Db2 subsystem to Db2 13 FL 100 and running it with an expected level of
stability for a period, you can activate FL V13R1M500 (FL 500). Activating FL 500 also does
not result in any catalog changes. However, after you activate FL 500, falling back to Db2 12
or release coexistence in a data-sharing environment is no longer possible. Release
coexistence between Db2 12 and Db2 13 is available only while FL 100 is the highest
activated FL in Db2 13.
11.1.1 Requirements
Db2 13 requires the following hardware and software:
IBM zEC12 or later
z/OS 2.4 or later
Db2 12 with FL 510 and with fall-back SPE (PH37108) applied
Measurement environment
The migration performance measurements were conducted by using the following
environment:
z15
z/OS 2.5
Four general-purpose central processors (GPCs)
Db2 one-way data sharing
A Db2 catalog size of 343 MB in Db2 10
Results
Elapsed time and CPU time of the migration jobs are provided for each version migration of a
Db2 subsystem migrating from Db2 10 to Db2 13:
Db2 10 NFM migration to Db2 11 NFM
Db2 11 NFM migration to Db2 12 FL 500
Db2 12 FL 500 mid-release migration to Db2 12 FL 510
Db2 12 FL 510 migration to Db2 13 FL 500
Db2 13 FL 500 migration to Db2 13 FL 501
Table 11-1 Elapsed and CPU time of the migration steps from Db2 10 NFM to Db2 11 NFM
Migration steps Elapsed time (seconds) CPU time (seconds)
The results show that ENFM uses the most CPU and takes longer to complete, which is a
result of the online REORG of the table spaces.
Table 11-2 Elapsed and CPU time of the migration from Db2 11 NFM to Db2 12 FL 500
Migration steps Elapsed time (seconds) CPU time (seconds)
Table 11-3 Elapsed and CPU time of the migration from Db2 12 FL 500 to Db2 12 FL 510
Migration steps Elapsed time (seconds) CPU time (seconds)
Mid-release migration from Db2 V12R1M500 to Db2 12 FL 510 takes 3.244 seconds to
complete and consumes 0.59 CPU seconds. During CATMAINT UPDATE from FL 500 to FL 510,
catalog changes are applied for FL 502, FL 503, FL 505, FL 507, and FL 509.
Table 11-4 Elapsed and CPU time of the migration from Db2 12 FL 510 to Db2 13 FL 500
Migration steps Elapsed time (seconds) CPU time (seconds)
Migration from Db2 12 FL 510 to Db2 13 FL 100 completes quickly with hardly any CPU
consumption compared to the migration from Db2 11 NFM to Db2 12 FL 500 because no
catalog changes are made during this migration phase. Unlike Db2 12, the DSNTIJTC job
performs a CATMAINT UPDATE to FL 100 in Db2 13 and not to FL 500.
Table 11-5 Elapsed and CPU time of the migration from Db2 13 FL 500 to Db2 13 FL 501
Migration steps Elapsed time (seconds) CPU time (seconds)
A mid-release migration from Db2 V13R1M500 to Db2 V13R1M501 takes 3.175 seconds to complete and consumes 129 milliseconds of CPU time. This phase involves changes to the
Db2 catalog.
Summary of results
The evaluation and comparison were done by using a small Db2 subsystem to illustrate the
relative performance difference between the migration of different Db2 versions. The
migration of a large Db2 subsystem with several databases and packages takes more
elapsed time and CPU time (for more information, see IBM Db2 12 for z/OS Performance Topics, SG24-8404).
Comparing the different migrations from Db2 10 NFM to Db2 13 FL 500 shows that the overall
elapsed time and CPU usage improved for migrations to Db2 12 and Db2 13 when compared
to the previous version.
Migration from Db2 11 NFM to Db2 V12R1M500 completes 79% faster and uses 83% less
CPU when compared to migration from Db2 10 NFM to Db2 11 NFM.
Migration from Db2 V12R1M510 to Db2 V13R1M500 completes 61% faster and uses 92%
less CPU when compared to migration from Db2 11 NFM to Db2 V12R1M500.
You can query the SYSLEVELUPDATES catalog table to see the history of catalog changes and
FL activations.
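For example, a simple query such as the following sketch returns that history; because the exact column names can vary, select all columns rather than assuming specific names:

  -- Show the history of catalog level and function level changes
  SELECT * FROM SYSIBM.SYSLEVELUPDATES;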
Before you migrate from Db2 12 to Db2 13, you must complete the following steps:
1. Db2 12 must be activated at FL 510.
2. Before Db2 12 can be successfully activated at FL 510, packages that were bound earlier
than Db2 11 and were active in the last 18 months must be rebound.
Note: Rebinding an SQL PL stored procedure rebinds only the SQL statements and not the control statements. Therefore, regenerate SQL PL stored procedures by using the ALTER PROCEDURE... REGENERATE statement. Run this statement for SQL PL procedures that were generated before Db2 11 to avoid an autobind in Db2 13. For more information, see
Rebind old plans and packages in Db2 12 to avoid disruptive autobinds in Db2 13.
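As a minimal sketch with a placeholder procedure name (verify the REGENERATE clause options in the ALTER PROCEDURE documentation):

  ALTER PROCEDURE SCHEMA1.MYPROC REGENERATE ACTIVE VERSION;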
11.1.5 Conclusion
As demonstrated by the measurements, the Db2 migration process has made great strides in
improving both the user experience and performance since Db2 11. The Db2 13 migration
took another leap in improving the performance of the migration process by eliminating
changes to the catalog structure during the CATMAINT migration step.
UTILITY_HISTORY
The UTILITY_HISTORY subsystem parameter specifies whether utility history information is
collected. This subsystem parameter is ignored in FL V13R1M500 and earlier.
For more information about performance measurement data with the new UTILITY_HISTORY
subsystem parameter, see Chapter 9, “IBM Db2 for z/OS utilities” on page 279.
SPREG_LOCK_TIMEOUT_MAX
The SPREG_LOCK_TIMEOUT_MAX subsystem parameter controls the maximum value that can be
specified for the SET CURRENT LOCK TIMEOUT statement, and whether -1 can be specified for
the special register to indicate that there is no limit for how long an application can wait for a
lock. If -1 is specified, any valid value for the SET CURRENT LOCK TIMEOUT statement can be
specified.
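For example, assuming that the subsystem parameter permits these values, an application might issue the following statements:

  SET CURRENT LOCK TIMEOUT = 30;   -- wait up to 30 seconds for a lock
  SET CURRENT LOCK TIMEOUT = -1;   -- wait indefinitely (only if -1 is allowed by SPREG_LOCK_TIMEOUT_MAX)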
DSMAX
The DSMAX subsystem parameter specifies the maximum number of data sets that can be open at one time by a Db2 subsystem. The highest possible value of this parameter is increased
from 200000 to 400000.
For more information about the performance tests that were conducted to measure the effect
of changes to DSMAX, see Chapter 2, “Scalability and availability” on page 13.
FTB_NON_UNIQUE_INDEX
The FTB_NON_UNIQUE_INDEX subsystem parameter specifies whether fast index traversal (fast
traversal blocks (FTB)) is enabled for non-unique indexes.
The default value of FTB_NON_UNIQUE_INDEX is changed from NO to YES in Db2 13, which
means that non-unique indexes are automatically eligible for FTB processing if they meet
other eligibility criteria, such as the maximum key length.
You might experience an increase in internal resource lock manager (IRLM) CPU time for
data-sharing systems because more indexes are likely to be using FTB after this subsystem
parameter change enables non-unique indexes for FTB processing. However, this increase
should be offset by a decrease in transaction class 2 CPU time from using FTB processing for
non-unique indexes. For more information about the actual measurement numbers, see
Chapter 2, “Scalability and availability” on page 13.
PAGESET_PAGENUM
The PAGESET_PAGENUM subsystem parameter specifies whether partition-by-range (PBR) table
spaces and associated partitioned indexes (PIs) are created to use absolute page numbering
(APN) across partitions or relative page numbering (RPN).
The default setting of PAGESET_PAGENUM is changed from ABSOLUTE to RELATIVE in Db2 13. As a
best practice, all members of a data-sharing group should use the same value.
For more information about the performance measurements of converting from PBG to PBR,
see Chapter 9, “IBM Db2 for z/OS utilities” on page 279.
STATIME_MAIN
The STATIME_MAIN subsystem parameter specifies the time interval, in seconds, for the
collection of those interval-driven statistics that are not collected at the interval that is
specified by the STATIME subsystem parameter.
In Db2 13, the default setting of STATIME_MAIN is changed from 60 seconds to 10 seconds. As
a best practice, all members of a data-sharing group should use the same STATIME_MAIN
value.
The Db2 12 default STATIME_MAIN statistics interval setting of 60 seconds makes it difficult for
database administrators and Db2 system programmers to identify true workload peaks by
using Db2 statistics for subsystem level performance tuning and planning. Similarly, when
slowdowns in the range of 5 - 15 seconds occur, diagnosing the problem becomes difficult by
using 60-second-interval data. Using a more granular statistics collection default of
10 seconds helps to speed up performance diagnosis.
OUTBUFF
The OUTBUFF subsystem parameter specifies the size of the output log buffer, in kilobytes, for
writing active log data sets.
The previous default of 4000 did not reflect current best practices, so the default was changed to 102400 (equivalent to 100 MB). Having a large enough OUTBUFF is critical for update (insert, update, and delete) performance.
If you are using an OUTBUFF size that is greater than 100 MB, you do not need to consider
updating your definition.
SRTPOOL
The SRTPOOL subsystem parameter specifies the maximum amount of storage, in kilobytes,
that is allowed for the Relational Data System (RDS) sort pool (for an individual sort).
The default value was changed from 10000 to 20000 to reflect best practices for reducing CPU by reducing the number of sort runs. If you are using an SRTPOOL size that is more
than 20000, you do not need to consider updating your definition.
For more information about the performance test results that are related to the SRTPOOL
subsystem parameter change, see Chapter 3, “Synergy with the IBM Z platform” on page 43.
REORG_INDEX_NOSYSUT1
The REORG_INDEX_NOSYSUT1 subsystem parameter specifies whether REORG INDEX SHRLEVEL
REFERENCE or CHANGE uses the NOSYSUT1 behavior.
MAXCONQN
If the number of active in-use database access threads (DBATs) reaches the MAXDBAT
threshold, the MAXCONQN subsystem parameter specifies the maximum number of inactive or
new connection requests that can be queued waiting for a DBAT to process the request.
The default value was changed from OFF to ON to reflect best practices for improving application availability.
MAXCONQW
The MAXCONQW subsystem parameter specifies the maximum length of time that a client
connection waits for a DBAT to process the next unit-of-work or a new connection request.
The default value was changed from OFF to ON to reflect best practices for improving application availability.
EDM_SKELETON_POOL
The EDM_SKELETON_POOL subsystem parameter determines the maximum size, in kilobytes,
that is used for the environmental descriptor manager (EDM) skeleton pool. This storage is
above the 2 GB bar.
The default value was changed from 51200 to 81920 to improve the performance of accessing
Db2 plans and packages. If you are using an EDM_SKELETON_POOL size that is more than 81920,
you do not need to consider updating your definition.
EDMDBDC
The EDMDBDC subsystem parameter determines the maximum amount of EDM storage, in
kilobytes, that is used for database descriptors (DBDs). This storage pool is above the 2 GB
bar.
The default value was changed from 23400 to 40960 to improve the performance of accessing
the DBDs. If you are using an EDMDBDC size that is more than 40960, you do not need to
consider updating your definition.
MAXSORT_IN_MEMORY
The MAXSORT_IN_MEMORY subsystem parameter specifies the maximum storage allocation, in
kilobytes, for a query that contains an ORDER BY clause, a GROUP BY clause, or both. The
storage is allocated only during the processing of the query. Increasing the value in this field
can improve performance of such queries but might require a larger amount of real storage
when several such queries run simultaneously.
The default value was changed from 1000 to 2000 to expand the opportunity of using
in-memory sort to reduce the CPU time and work-file demand from the sort-intensive SQL
statements. If you are using a MAXSORT_IN_MEMORY size that is more than 2000, you do not
need to consider updating your definition.
NUMLKTS
The NUMLKTS subsystem parameter specifies the default maximum number of page, row, large
object (LOB), or XML locks that an application can hold simultaneously in a table or table
space. If a single application exceeds the maximum number of locks in a single table or table
space, lock escalation occurs.
NUMLKUS
The NUMLKUS subsystem parameter specifies the maximum number of page, row, LOB, or
XML locks that a single application can hold concurrently for all table spaces.
The default value was changed from 10000 to 20000 to reduce the application impact. If you
are using a NUMLKUS value that is more than 20000, you do not need to consider updating your
definition.
IRLMRWT
The IRLMRWT subsystem parameter controls the number of seconds that are to elapse before
a resource timeout is detected.
In previous Db2 versions, the following IRLM command could be used to dynamically change
the IRLM timeout value:
F irlmproc,SET,TIMEOUT=xxx,subsystem-name
However, this approach was not ideal, and it is not recommended because using the
command does not update the value of the IRLMRWT subsystem parameter.
STATPGSAMP
The STATPGSAMP subsystem parameter specifies whether the RUNSTATS utility, or other utilities
with inline statistics, use page-level sampling by default for universal table spaces (UTSs).
In Db2 FL V13R1M500 and later, STATPGSAMP also applies to the collection of inline statistics
when the LOAD or REORG TABLESPACE utilities run with the STATISTICS keyword.
REALSTORAGE_MANAGEMENT
Before Db2 13, the REALSTORAGE_MANAGEMENT subsystem parameter was used to specify how
Db2 should manage its real storage usage. In Db2 12, when REALSTORAGE_MANAGEMENT is set
to AUTO, Db2 pro-actively frees unused 64-bit real storage frames by issuing z/OS IARV64
DISCARDDATA requests when a thread deallocates, or every 120 commits when the thread is
reused.
Db2 13 uses enhanced automatic behavior, which avoids causing unnecessary z/OS Real
Storage Manager (RSM) serialization from DISCARDDATA requests, which are not available in
Db2 12. Therefore, the REALSTORAGE_MANAGEMENT subsystem parameter is obsolete.
AUTHCACH
Before Db2 13, the AUTHCACH subsystem parameter was used to specify the size, in bytes per
plan, of the authorization cache that is used if no CACHESIZE is specified on the BIND PLAN
subcommand.
In Db2 13, the plan authorization cache size is 4096 by default. AUTHCACH was made obsolete
for better authorization cache management. If you still want to change the size or eliminate
the cache, you can do so at the plan level by specifying the BIND/REBIND CACHESIZE
parameter.
For more information about the performance measurements with plan authorization cache,
see Chapter 2, “Scalability and availability” on page 13.
PARA_EFF
Before Db2 13, the PARA_EFF subsystem parameter was used to control the efficiency that
Db2 assumes for parallelism during access path selection. For best results, the default value
of 50 was recommended.
In Db2 12, the value that you specified for this parameter is acknowledged, but its use was
discouraged. Support for this subsystem parameter was removed in Db2 13.
For more information about this new IFCID 396 trace record and how to use it, along with
information about three new catalog fields, see Chapter 7, “Application concurrency” on
page 241.
Note: It is a best practice to collect and externalize only one of the DDF statistics trace
records (IFCID 365, 411, or 412). You can start all three, but doing so increases the
number of trace records that are written to the trace destination.
QLAPAPPN CHAR(16) The name of the application that is running at the remote site.
QLAPPRID CHAR(8) The product ID of the remote location from which the remote application
connects.
QLAPCOMR BIGINT The number of commit requests that are received from the requester
(single-phase commit protocol), and the number of committed requests that are
received from the coordinator (two-phase commit protocol).
QLAPABRR BIGINT The number of abort requests that are received from the requester (single-phase
commit protocol), and the number of back-out requests that are received from the
coordinator (two-phase commit protocol).
QLAPNREST BIGINT The number of times that the application reported a connection or application
condition from a REST service request.
QLAPNSSR BIGINT The number of times that the application reported a connection or application
condition from setting a special register through a profile.
QLAPNSGV BIGINT The number of times that the application reported a connection or application
condition from setting a global variable through a profile.
QLAPHCRSR BIGINT The number of times that the application used a cursor that was defined as WITH
HOLD and was not closed. That condition prevented Db2 from pooling database
access threads (DBATs).
QLAPDGTT BIGINT The number of times that the application did not drop a declared temporary table.
That condition prevented Db2 from pooling DBATs.
QLAPKPDYN BIGINT The number of times that the application used a KEEPDYNAMIC package. That
condition prevented Db2 from pooling DBATs.
QLAPHIPRF BIGINT The number of times that the application used a high-performance DBAT. That
condition prevented Db2 from pooling DBATs.
QLAPHLOBLOC BIGINT The number of times that the application had a held large object (LOB) locator.
That condition prevented Db2 from pooling DBATs.
QLAPSPCMT BIGINT The number of times that a COMMIT was issued in a stored procedure. That
condition prevented Db2 from pooling DBATs.
QLAPNTHDPQ BIGINT The number of times that a thread that was used by a connection from the
application was queued because a profile exception threshold was exceeded.
QLAPNTHDPT BIGINT The number of times that a thread that was used by a connection from the
application was terminated because a profile exception threshold was exceeded.
QLAPNTHDA BIGINT The number of times that a thread that was used by a connection from the
application abended.
QLAPNTHDC BIGINT The number of times that a thread that was used by a connection from the
application was canceled.
QLAPNTHD INTEGER The current number of active threads for the application.
QLAPHTHD INTEGER For a statistics trace, the highest number of active threads during the current
statistics interval; for a READS request, the highest number of active threads since
DDF was started.
QLAPTHDTM BIGINT The number of threads that were queued because the MAXDBAT subsystem
parameter value was exceeded.
QLAPTHDTI BIGINT The number of threads that were terminated because the IDTHTOIN subsystem
parameter value was exceeded.
QLAPTHDTC BIGINT The number of threads that were terminated because the CANCEL THREAD
command was issued.
QLAPTHDTR BIGINT The number of threads that were terminated because a profile exception
condition for idle threads was exceeded.
QLAPTHDTK BIGINT The number of threads that were terminated because the threads were running
under KEEPDYNAMIC refresh rules, and the idle time exceeded the
KEEPDYNAMICREFRESH idle time limit (20 minutes).
QLAPTHDTF BIGINT The number of threads that were terminated because the threads were running
under KEEPDYNAMIC refresh rules, and the time that the threads were in use
exceeded the KEEPDYNAMICREFRESH in-use time limit.
QLAPTHDTN BIGINT The number of threads that were terminated due to network termination.
If the trace identifies more than 50 applications to be reported, Db2 generates multiple IFCID
411 records. For Instrumentation Facility Interface (IFI) READS requests, only those IFCID 411
records that fit into the buffer are returned. The interval at which the IFCID 411 statistics trace
records are produced is controlled by the STATIME_MAIN subsystem parameter.
These counters are maintained for a server site only at the client application level. When
IFCID 411 is written as part of a statistics trace, the high-water mark (HWM) is the maximum
value that is observed since the last statistics trace interval. When IFCID 411 is created for a
READS request, the HWM is the maximum value that is observed since DDF was started.
IFCID 411 information is also available in Db2 12 for z/OS after applying the PTF for
PH40244.
12.2.2 New IFCID 412 for recording DDF client user ID statistics
IFCID 412 was introduced to record detailed statistics about the client user IDs that are
involved in communicating with Db2 subsystems by using the DRDA protocol. IFCID 412 can
be started by using statistics trace class 11.
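For example, assuming SMF as the trace destination, the following command starts the class 11 statistics trace that produces IFCID 412 (in line with the earlier note, start only the DDF statistics trace record that you intend to analyze):
-START TRACE(STAT) CLASS(11) DEST(SMF)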
QLAUUSRI CHAR(16) The name of the client user ID under which the connection from the remote
application to the local site is established.
QLAUPRID CHAR(8) The product ID of the remote application from which the application connects.
QLAUCOMR BIGINT The number of commit requests that were received from the requester
(single-phase commit protocol), and the number of committed requests that were
received from the coordinator (two-phase commit protocol).
QLAUABRR BIGINT The number of abort requests that were received from the requester
(single-phase commit protocol), and the number of back-out requests that were
received from the coordinator (two-phase commit protocol).
QLAUNREST BIGINT The number of times that an application that is run by the specified client user ID
reported a connection or application condition from a REST service request.
QLAUNSSR BIGINT The number of times that an application that is run by the specified client user ID
reported a connection or application condition from setting a special register
through a profile.
QLAUNSGV BIGINT The number of times that an application that is run by the specified client user ID
reported a connection or application condition from setting a global variable
through a profile.
QLAUHCRSR BIGINT The number of times that an application that is run by the specified client user ID
used a cursor that was defined as WITH HOLD and was not closed. That condition
prevented Db2 from pooling DBATs.
QLAUDGTT BIGINT The number of times that an application that is run by the specified client user ID
did not drop a declared temporary table. That condition prevented Db2 from
pooling DBATs.
QLAUKPDYN BIGINT The number of times that an application that is run by the specified client user ID
used KEEPDYNAMIC packages. That condition prevented Db2 from pooling DBATs.
QLAUHIPRF BIGINT The number of times that an application that is run by the specified client user ID
used a high-performance DBAT. That condition prevented Db2 from pooling
DBATs.
QLAUHLOBLOC BIGINT The number of times that an application that is run by the specified client user ID
used a held LOB locator. That condition prevented Db2 from pooling DBATs.
QLAUSPCMT BIGINT The number of times that a COMMIT was issued in a stored procedure that was
called by the specified client user ID. That condition prevented Db2 from pooling
DBATs.
QLAUNTHDPQ BIGINT The number of times that a thread that was used by a connection from the
application that is run by the specified client user ID was queued because a profile
exception threshold was exceeded.
QLAUNTHDPT BIGINT The number of times that a thread that was used by a connection from the
application that is run by the specified client user ID terminated because a profile
exception threshold was exceeded.
QLAUNTHDA BIGINT The number of times that a thread that was used by a connection from an
application that is run by the specified client user ID abended.
QLAUNTHDC BIGINT The number of times that a thread that was used by a connection from an
application that is run by the specified client user ID was canceled.
QLAUNTHD INTEGER The current number of active threads for the application that is run by the specified
client user ID.
QLAUHTHD INTEGER For a statistics trace, the highest number of active threads during the current
statistics interval. For a READS request, the highest number of active threads since
DDF was started.
QLAUTHDTM BIGINT The number of threads that are associated with the specified client user ID that
was queued because the MAXDBAT subsystem parameter value was exceeded.
QLAUTHDTI BIGINT The number of threads that are associated with the specified client user ID that
was terminated because the IDTHTOIN subsystem parameter value was
exceeded.
QLAUTHDTC BIGINT The number of threads that are associated with the specified client user ID that
was terminated because the CANCEL THREAD command was issued.
QLAUTHDTR BIGINT The number of threads that are associated with the specified client user ID that
was terminated because a profile exception condition for idle threads was
exceeded.
QLAUTHDTK BIGINT The number of threads that are associated with the specified client user ID that
were terminated because the threads were running under KEEPDYNAMIC refresh
rules, and the idle time exceeded the KEEPDYNAMICREFRESH idle time limit
(20 minutes).
QLAUTHDTF BIGINT The number of threads that are associated with the specified client user ID that
were terminated because the threads were running under KEEPDYNAMIC refresh
rules, and the time that they were in use exceeded the KEEPDYNAMICREFRESH in-use
time limit.
QLAUTHDTN BIGINT The number of threads that are associated with the specified client user ID that
were terminated due to network termination.
The information in IFCID 412 can help identify users who are using resources inefficiently in
the following types of situations:
A user is not releasing a connection in a timely manner.
A user is causing unexpected thread termination.
A user is monitored by a profile, but thresholds must be adjusted (they are too high or too
low).
A user is monopolizing resources (such as opening multiple threads) and should be
monitored.
If the trace identifies more than 50 client user IDs to be reported, Db2 generates multiple
IFCID 412 records. For IFI READS requests, only those IFCID 412 records that fit into the
buffer are returned. The interval at which the IFCID 412 statistics trace records are produced
is controlled by the STATIME_MAIN subsystem parameter.
These counters are maintained only for server sites and only at the client user ID level. When
IFCID 412 is written as part of a statistics trace, the HWM is the maximum value that is
observed since the last statistics trace interval. When IFCID 412 is created for a READS
request, the HWM is the maximum value that was observed since DDF was started.
IFCID 412 information also is available in Db2 12 for z/OS after applying the PTF for
PH40244.
12.2.3 New fields in IFCID 365 for recording DDF location statistics
IFCID 365 records provide detailed location statistics about the remote locations that a Db2
subsystem communicates with by using the DRDA protocol. IFCID 365 can be started by
using statistics trace class 7.
Example 12-1 shows the new fields that were added to IFCID 365 since Db2 12 became
generally available. They provide information about DBATs from remote locations.
Note: QLSTNTPLH and QLSTNTILS are available in Db2 13 only, but the other fields also
are available in Db2 12.
As with IFCID 411 and 412, if the trace identifies more than 50 locations to be reported, Db2
generates multiple IFCID 365 records. For IFI READS requests, only those IFCID 365
records that fit into the buffer are returned. The interval at which the IFCID 365 statistics trace
records are produced is controlled by the STATIME subsystem parameter.
Because these new fields are part of the QLST data section, and the DSNDQLST section
also is included in IFCID 1, these extra fields also are available in the standard Db2 IFCID 1
statistics record.
QDSTNAKD INTEGER The current number of DBATs that are active due to the usage of packages that
are bound with KEEPDYNAMIC(YES).
QDSTMAKD INTEGER The maximum number of DBATs that are active due to the usage of packages that
were bound with KEEPDYNAMIC(YES) since DDF was started.
QDSTNDBT INTEGER The number of DBATs that terminated since DDF was started.
QDSTNTPL INTEGER The number of DBATs that terminated after remaining in pool longer than
POOLINAC since DDF was started.
QDSTNTRU INTEGER The number of DBATs that terminated after being reused more times than the limit
since DDF was started.
QDSTDBPQ INTEGER The current number of DBATs that were suspended due to a profile exception.
QDSTMDPQ INTEGER The maximum number of DBATs that were suspended due to a profile exception
since DDF was started.
Note: The two new fields QDSTDBPQ and QDSTMDPQ, which were introduced by APAR PH47626, become available when FL V13R1M500 or later is activated in Db2 13. Also, the existing QDSTNDBA field was updated in Db2 13. QDSTNDBA represents the number of times that a DBAT was created. Previously, this value did not include DBATs that were created to replace disconnected (pooled) DBATs that terminated because they reached their reuse limit. The DBAT creation statistics counter now always counts the number of DBATs that are created.
In Db2 AI for z/OS 1.4 (Db2ZAI 1.4), the Distributed Connection Control (DCC) component
recommends profiles (MONITOR THREADS/CONNECTIONS) based on IP address (requester) by
using the information that is collected in IFCID 365. With the more granular client information
that is provided by IFCID 411 or 412, Db2ZAI 1.5 DCC can recommend profiles (MONITOR
THREADS) based on extra filtering categories (CLIENT_APPLNAME and CLIENT_USERID), as shown
in Figure 12-1, Figure 12-2, and Figure 12-3 on page 352.
Figure 12-1 Db2ZAI new DCC settings for client application name
12.3 IFCID 003: Recording the longest wait time for certain
suspension types
To maintain data consistency and concurrency control, Db2 uses locks and latches to
serialize access to resources. However, there are cases when the resources are held for too
long, which can cause performance issues that must be addressed. Before Db2 13, finding these long latch and lock suspensions required extra and often high-volume traces. These extra traces sometimes resulted in thousands of records, which made it difficult to analyze the information.
To narrow down the searches, it is important to know when the longest lock or latch
suspensions occurred and for how long. Therefore, Db2 13 added an IFCID 3 section that
includes the longest suspension time for a select number of suspensions that occurred during
the life of the thread. Besides information about the longest lock or latch wait, Db2 also
records the longest service task wait information, and the longest page latch wait information.
The longest wait time for each of these suspension types is recorded, and information about the resource that experienced the contention also is reported.
A new section, QWA01REO, was added to IFCID 3 and is mapped by DSNDQLLL to provide
this extra information. This control block contains information about the longest internal
resource lock manager (IRLM) lock, Db2 latch, log, or database I/O wait; the longest service
task wait; and the longest page latch wait.
APAR PH46371 adds an extra field that is called QLLLTYP, which makes it easier to identify
the type of suspension that is reported on in the first part of the QLLL section. APAR
PH46372 adds support for roll-up records to QLLL. When ACCUMAC >= 2, the roll-up records
record the largest wait times for parts 1, 2, and 3 independently. This information is recorded
each time that a roll-up accumulation occurs.
Two new metrics were added to relevant GBP statistics storage areas and are accessible
through extra fields in IFCID 230 and IFCID 254 trace records, as shown in Figure 12-4 on
page 355 and Table 12-5.
Table 12-4 New group buffer pool residency statistics fields in IFCID 230
Column name Data type Description
QBGBART CHAR(8) Data Area Residency Time: The weighted average, in microseconds, of the
elapsed time that a data area is in a GBP before the data area is reclaimed.
QBGBERT CHAR(8) Directory Entry Residency Time: The weighted average, in microseconds, of the
elapsed time that a directory entry is in a GBP before the directory entry is
reclaimed.
Table 12-5 New group buffer pool residency statistics fields in IFCID 254
Column name Data type Description
QW0254AR CHAR(8) Data Area Residency Time: The weighted average, in microseconds, of the elapsed time that a data area is in a GBP before the data area is reclaimed.
QW0254ER CHAR(8) Directory Entry Residency Time: The weighted average, in microseconds, of the elapsed time that a directory entry is in a GBP before the directory entry is reclaimed.
You also can display the same information by using the -DISPLAY GROUPBUFFERPOOL
command.
When the -DISPLAY GROUPBUFFERPOOL command includes the GDETAIL option, the output
includes statistics on the residency times for items in the GBP with a new DSNB820I message.
Example 12-3 shows a sample DSNB820I message.
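For example, a command similar to the following (GBP0 is a placeholder group buffer pool name) returns the DSNB820I residency statistics as part of the detailed output:
-DISPLAY GROUPBUFFERPOOL(GBP0) GDETAIL(*)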
For more information about the measurements that were conducted to understand the
performance effects of GBP residency time in a data-sharing environment, see Chapter 3,
“Synergy with the IBM Z platform” on page 43.
This enhancement is also available in Db2 12 for z/OS after applying the PTF for PH43916.
Two new counters were added to IFCID 2 (statistics trace record) and IFCID 3 (accounting
trace record), as shown in Example 12-4.
IFCID 106 (subsystem parameters trace record) was modified to include a new field,
QWP4LTMX, to capture the setting of the new SPREG_LOCK_TIMEOUT_MAX subsystem
parameter, as shown in Example 12-5.
A new flag, QW0172WAS, was added to IFCID 172 (deadlock trace record) to record whether the
“worth value” was set by using the global variable or another method. The new field is shown
in Example 12-6.
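The global variable in question is, to our understanding, the SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY built-in global variable that was introduced in Db2 13. An application can set it with a statement similar to the following sketch, where a higher value makes the application less likely to be chosen as the victim when a deadlock is resolved:
SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY = 10;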
IFCID 196 (timeout trace record) was updated to record the timeout interval and the source of
the timeout interval setting, as shown in Example 12-7.
IFCID 437 is a new trace record that records the usage of the SET CURRENT LOCK TIMEOUT
statement, and whether it is set directly by the application or by using a system profile.
Example 12-8 shows the layout of the new trace record.
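For example, an application can override the lock timeout for its own connection with a statement such as the following one, where the value is in seconds and is subject to the SPREG_LOCK_TIMEOUT_MAX limit that is described above:
SET CURRENT LOCK TIMEOUT = 30;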
A new WQLSPTRG flag is defined in the qualification area fields of WQLS. When you turn on this
flag, the mapping of WQLSDBPS in WQLS is replaced with the new mapping of WQLSDBPP.
The WQLSDBPP mapping contains two new fields that are called WQLSPPART_LOW and
WQLSPPART_HIGH, which can be specified only once for each table space.
Table 12-6 lists and describes the qualification area fields of WQLSDBPP. This area is
mapped by the assembly language mapping macro DSNDWQAL. This DSECT maps the
area that is used to filter IFCID 306 records when WQLSTYPE is set to 'DBPP'.
For a partitioned table space, you can set WQLSPPART_LOW and WQLSPPART_HIGH to
any value of low and high available partition numbers. If WQLSPPART_LOW is ‘0000’x and
WQLSPPART_HIGH is ‘0000’x or ‘FFFF’x, the entire partitioned table space qualifies for the
IFCID 306 process.
For a non-partitioned table space, you must set both WQLSPPART_LOW and
WQLSPPART_HIGH to zero.
The elapsed and CPU times are included in regular class 2 elapsed and CPU times and
represent only those portions that are spent running the built-in scalar SQL Data Insights
functions.
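As an illustration only (the CHURN table and CUSTOMERID column are hypothetical AI-enabled objects), a query that drives one of these functions might look like the following example; the time that is spent inside AI_SIMILARITY is what these accounting fields isolate:
SELECT CUSTOMERID,
       AI_SIMILARITY(CUSTOMERID, '3668-QPYBK') AS SIMILARITY
FROM CHURN
ORDER BY SIMILARITY DESC
FETCH FIRST 10 ROWS ONLY;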
For more information about SQL Data Insights, see Chapter 6, “SQL Data Insights” on
page 211.
You also can display the FTB Factor value by using the -DISPLAY STATS(INDEXTRAVERSECOUNT) command, as shown in the DSNT830I message in Example 12-12.
The fields that are shown in Example 12-13 are no longer used and were removed from the QSST section of IFCID 1.
Example 12-14 New storage manager large object contraction-related IFCID 1 fields
0001 QSST_P64PCTRCT (S) NUMBER OF 64-BIT POOLS THAT WERE CONTRACTED.
0001 QSST_P64DISNO (S) NUMBER OF TIMES THAT IARV64
REQUEST=DISCARDDATA,KEEPREAL(NO) WAS ISSUED.
0001 QSST_P64DISYES (S) NUMBER OF TIMES THAT IARV64
REQUEST=DISCARDDATA,KEEPREAL(YES) WAS ISSUED.
These fields track the number of contractions of ATB storage pools and the number of actual
DISCARDDATA requests that are performed, either with KEEPREAL(NO) or KEEPREAL(YES).
For more information about the performance tests that were conducted to measure the effect
of the ATB storage contraction optimization in Db2 13, see Chapter 2, “Scalability and
availability” on page 13.
Db2 added many new latch types over the years. Several latch classes reached or are close
to reaching the maximum number of latch types, which can cause latch hierarchy violations
and result in errors and abends.
Db2 13 increased the number of latch classes from 32 to 64 to help avoid latch hierarchy
violations and provide better diagnostics.
The following IFCIDs were changed to support this latch classes expansion:
In IFCID 1, the DSNDQVLS sections were expanded to include the new latch classes from
33 to 64.
In IFCID 3, in the DSNDQLLL section, the latch class field QLLLLC was extended from
FIXED(08) to FIXED(16).
IFCID 51 extended the latch class in the Q0051LC field from FIXED(8) to FIXED(16).
IFCID 52 extended the latch class in the Q0052LC field from FIXED(8) to FIXED(16).
IFCID 56 extended the latch class in the Q0056LC field from FIXED(8) to FIXED(16).
IFCID 57 extended the latch class in the Q0057LC field from FIXED(8) to FIXED(16).
IFCID 148 extended the latch class in the Q0148LC field from FIXED(8) to FIXED(16).
The latest IFCID trace field mappings file (DSNWMSGS) now also is available online with more
frequent updates. As in earlier Db2 releases, the DSNWMSGS file also remains available in a
data set that is included with the Db2 product. However, any updates to the DSNWMSGS file in this
location require APARs, and for various practical reasons, these APARs cannot be released
with the same frequency as the related enhancements in continuous delivery. So, the
DSNWMSGS file is now available in two different locations, but the online version is always more
current:
Customers who have Db2 for z/OS licenses that are associated with their IBMid can
download the latest DSNWMSGS file in PDF format from Db2 13 for z/OS IFCID flat file
(DSNWMSGS).
Customers can use the TSO or ISPF browse function to look at the field descriptions in the
prefix.SDSNIVPD(DSNWMSGS) data set that is provided with Db2. However, this version is
updated only infrequently by APARs that consolidate changes from continuous delivery.
12.14 Summary
Db2 13 introduces some new instrumentation and serviceability capabilities to support
changes to your overall Db2 for z/OS environment, including support for SQL Data Insights,
Db2ZAI, and the ability to monitor GBPs by leveraging changes to IBM z16. Db2 13 also
expands on existing instrumentation and serviceability capabilities to keep pace with new and
enhanced Db2 features in the areas of INSERT performance, thread monitoring, locks and
latches, application timeouts and deadlocks, replication operations, plan authorization cache,
fast index traversal, and large objects (LOBs).
All these changes enable you to gain insights into the behavior of your Db2 subsystems at a
level that was not possible before.
To learn more about Db2ZAI 1.5.0, see IBM Db2 13 for z/OS and More, SG24-8527, which
provides an overall introduction to the product, or see Db2 AI for z/OS 1.5.0.
After baseline data collection completes, Db2ZAI SQL optimization starts training with the
data and feeds the learned results to the Db2 optimizer to choose the most efficient access
paths for each SQL statement.
There are currently three types of models that Db2ZAI can use for SQL access path optimization:
Host variable model With the learned host variables and parameter marker values
from baseline SQL runs, Db2ZAI helps the Db2 optimizer to get
more accurate filter factor estimations for choosing better access
paths.
Screen scrolling model With the learned screen-scrolling behavior from baseline SQL
runs, Db2ZAI helps the Db2 optimizer to optimize the access
paths for retrieving some rows of data.
Parallelism model Db2ZAI provides the Db2 optimizer with query parallelism inputs
to activate proper levels of parallelism for best runtime
performance.
Db2ZAI can automatically stabilize simple dynamic statements, which helps applications reduce prepare overhead.
Also, Db2ZAI has a powerful feature that is built on its baseline learning about SQL statements: it can detect and resolve performance regressions that are caused by access path changes. This feature is fully automated with Db2 12 function level (FL) 505 and later, and it is available for both static and dynamic SQL statements. (Before FL V12R1M505, only regression detection is automated, without automatic resolution of the regressions.)
To evaluate the benefits of SQL optimization, we randomly selected 154 dynamic queries
from our query performance regression workloads and compared their performance when we
ran them with access paths that were not optimized by Db2ZAI against their performance
when we ran them with access paths that the Db2 optimizer chose with Db2ZAI models.
Measurement environment
The target Db2 subsystem used the following environment:
An IBM z14 z/OS logical partition (LPAR) with 16 general CPs, four IBM zSystems
Integrated Information Processors (zIIPs), and 512 GB of memory
z/OS 2.4
One Internal Coupling Facility (ICF), one external Coupling Facility (CF), and CFCC level
23, each with three dedicated CPUs
An IBM DS8870 direct access storage device (DASD) controller
One-way data sharing with Db2 12 at FL 510
As the base measurement, we ran the query jobs after the Db2 subsystem was restarted with ML stopped, which ensured that all the SQL statements ran with access paths that were chosen without the influence of Db2ZAI.
Then, to measure how the SQL statements perform with the access paths that Db2ZAI
helped to choose, we restarted Db2 and started ML, and then ran the same set of query jobs.
With ML started, the query SQL statements used the access paths that Db2ZAI helped to
optimize.
Results
From the Db2ZAI UI for SQL optimization dashboard page for all dynamic statements, as
shown in Figure 13-1, 14 out of the 154 (9%) dynamic SQL statements improved, and these
statements are estimated to save 66% CPU on average.
Figure 13-1 Db2ZAI 1.5.0 SQL optimization benefits dashboard: Dynamic statements
By using the accounting traces of the selected dynamic SQL statements, we validated the statements’ performance changes and saw that they are in line with the results that are displayed in the Db2ZAI UI.
Figure 13-2 shows the class 2 elapsed time and CPU time of the SQL statements that were
run with the access paths that Db2 chose without Db2ZAI models influencing the decisions,
and the access paths that the Db2 optimizer chose by using the Db2ZAI models. Most
statements show similar performance, although with Db2ZAI, a few statements have notably shorter class 2 elapsed time and CPU time (where the orange line is below the blue line).
Figure 13-2 Dynamic SQLs class 2 elapsed time and CPU without versus with Db2ZAI
Figure 13-3 shows the average class 2 elapsed time and class 2 CPU time of the 154
statements, comparing their performance when they were run without Db2ZAI and with
Db2ZAI. The average class 2 elapsed time was reduced by 63.6% for the sample SQL
statements. The average class 2 CPU time was reduced by 16.8%.
Figure 13-3 Dynamic SQLs average class 2 elapsed time and CPU without versus with Db2ZAI
Figure 13-4 Improved SQLs average class 2 elapsed time and CPU without versus with Db2ZAI
Figure 13-5 shows a Db2ZAI UI screen capture of the statement details for a dynamic SQL
statement (the batch job name is K01Q0030) with improved performance after implementing
the access paths with the Db2ZAI host variable model. Db2ZAI calculated the improvement
based on the SQL statement’s runtime history, and estimated it to have a 65% CPU
improvement and a 94.9% elapsed time improvement.
Note: The Statement CPU runtime plot graph shows only the most recent runtime history records, up to 500 of them. In Figure 13-5, the baseline runtime history is not displayed because it was not among the most recent 500 run times.
Figure 13-5 Db2ZAI 1.5 statement details for an improved SQL statement
When we examined the accounting trace for the same query, the results of which are shown in Figure 13-6 on page 371, we saw that the class 2 elapsed time and CPU time of the query improved by 94.6% and 61.7%, which is similar to the estimates of 94.9% and 65% that Db2ZAI derived from the history data.
For more information on how to use Db2ZAI to optimize SQL statements, see Optimizing SQL
access paths.
In Db2ZAI 1.5.0, a new feature that is called performance insights was introduced to provide an at-a-glance overview of Db2 health, with deeper drill-down capability for quick analysis of the different types of Db2 connections (CICS, BATCH, DIST, UTILITY, and so on).
This section provides a few examples of how SA and performance insights can help you tune
and monitor Db2 subsystem performance. The target Db2 subsystems are a two-way
data-sharing group running Db2 12 at FL 510.
However, with the help of performance insights and SA, we were able to quickly identify the application that was causing the problem.
The following steps describe how to pinpoint the “bad” SQL statements:
1. Log in to the Db2ZAI UI and go to the Performance insights page to examine the Db2
Elapsed Time plot under Response Time - Subsystem with the start and end date and
time set to between 09/22/2022 00:30 and 09/22/2022 01:00. Because the main symptom
of the performance incident is a response time increase for distributed applications, we
focus on connection type DIST, as shown in Figure 13-7.
We see that the average response time of DIST transactions is unusually high between
00:40 and 00:50.
Figure 13-7 Performance insights: Db2 elapsed time for DIST connections - Per Quantity
2. To check what might be causing the elapsed time increase, we look at the Db2
Suspension time, and we see that global contention and lock and latch suspensions are
the main contributors to the suspension time, as shown in Figure 13-8.
Figure 13-8 Performance insights: Db2 suspension time for DIST connections - Per Quantity
The distributed applications running on our target Db2 subsystems usually run quickly.
Why are there such long global contention suspensions?
3. We check other connection types and find that BATCH also shows unusually long global
contention and lock and latch suspension times at a similar time, as shown in Figure 13-9.
Figure 13-9 Performance insights: Db2 suspension time for BATCH connections - Per Quantity
4. We click Assessment on demand to trigger an SA with start and end times of 09/22/2022
00:30 and 09/22/2022 01:00. After the on-demand assessment completes, we go to the
SA page to see whether there are any exceptions that match our performance issue. We
see that the assessment flagged the CL3_GLOBAL_CONTENTION_ALLC metric, which
had exceptions between 00:42 and 00:49, as shown in Figure 13-10. The exceptions
match what we learned from the performance insights drill-down.
5. We want to find which high CPU-consuming SQL statements were running during the
abnormal period. Using our cursor, we click and highlight the time range 00:42 - 00:49.
Then, we right-click the CL3_GLOBAL_CONTENTION_ALLC graph and click View top
CPU consuming SQLs, which opens the window that is shown in Figure 13-12 on
page 375.
We see that three SQL statements are flagged as the top CPU consumers. Clicking the SQL
text shows that they are doing updates on table TBPBR02 with large data ranges (in fact,
each of the UPDATE statements updated 1.5 million rows). Such a large update unit of recovery locks many rows of data for a long period, which explains the unusual global contention that the distributed workload experienced (some of the distributed transactions also update table TBPBR02).
Figure 13-13 lists a few of the recommendations that might help you tune your Db2
subsystems without extensive performance studies. These windows are all screen captures
from actual SAs.
If you schedule SAs daily, Db2ZAI helps evaluate the wellness of your Db2 systems and detects exceptions within 24 hours of their occurrence, with minimal human intervention and extra cost. With the provided recommendations, you can decide how to address the exceptions so that your Db2 subsystems run better.
Db2 administrators often lack the knowledge of remote application behaviors and might not
be able to implement enough protection for the Db2 subsystems to prevent a flood of
distributed connections, which can have a system-level impact in serious cases. A balance is
required to control Db2 distributed connections.
The connection controls can be set either at the subsystem level with Db2 subsystem
parameters such as MAXDBAT or MAXCONQN, or they can be managed through more granular
means such as by using the Db2 profile tables. Setting the connection controls too high can
cause Db2 subsystems to be flooded by some rogue applications sending in too many
requests in a short period. Setting them too low can prevent the distributed applications from
doing their work efficiently enough, so that they are always queuing for available database
access threads (DBATs) or being rejected altogether.
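As a minimal sketch of the profile table approach (the profile ID, IP address, and thresholds are placeholders, and Db2ZAI can generate equivalent entries for you, as described in the next sections), monitoring the connections from one requester might look like the following statements, after which the -START PROFILE command loads the profile tables into memory:
INSERT INTO SYSIBM.DSN_PROFILE_TABLE
       (PROFILEID, LOCATION, PROFILE_ENABLED)
VALUES (101, '192.0.2.10', 'Y');

INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
       (PROFILEID, KEYWORDS, ATTRIBUTE1, ATTRIBUTE2)
VALUES (101, 'MONITOR CONNECTIONS', 'WARNING_DIAGLEVEL1', 80);

INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
       (PROFILEID, KEYWORDS, ATTRIBUTE1, ATTRIBUTE2)
VALUES (101, 'MONITOR CONNECTIONS', 'EXCEPTION_DIAGLEVEL1', 120);

-START PROFILE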
This section provides a few use case examples by using our Db2ZAI 1.5 installation. The
target environment is a Db2 12 two-way data-sharing group running at FL 510.
13.3.1 Db2ZAI DCC use case: Setting up profiles to monitor and control
distributed connections
If you want to implement more granular control over your distributed workloads but find it
difficult to gather all the information that you need to define proper profile table entries, the
Db2ZAI DCC function will help you. The Db2ZAI product documentation describes a typical
flow of how to use the Db2ZAI DCC function to create profile recommendations by training
with history statistics data of distributed connections and threads. For more information and
Db2ZAI 1.5 instructions, see Monitoring and controlling distributed connections.
This section describes the process. We use some screen captures that show how we set up
distributed connection profile monitoring. We use Db2ZAI 1.5 to illustrate this process. The
target Db2 subsystems are a two-way data-sharing group running Db2 12 at FL 510.
Then, we make sure that ML is started on our target Db2 subsystems and that IFCIDs 365, 402, and 412 are active, which ensures that Db2ZAI collects the information that these IFCIDs produce and writes the data into the Db2ZAI tables for DCC training and data presentation.
We set the Db2 STATIME_MAIN subsystem parameter to 10 (seconds), and we set the MAXDBAT
and CONDBAT subsystem parameters to 2000 and 10000.
To indicate to DCC where to focus, we change the priority levels of the locations and client
user IDs from the initial UA (Unaccounted for) to SP (Standard priority), or HP (High priority)
for the important ones, before running the training. Assigning priority levels other than UA
allows Db2ZAI to generate unique profile recommendations for the locations and client user
IDs. Otherwise, they are included in the general profile of location 0.0.0.0.
Figure 13-15 shows how to change the priority levels of the different locations that Db2ZAI
identified. Similarly, use the Client application name and Client user ID tabs to assign
priorities to the client application names and user IDs that Db2ZAI identified.
Click Confirm to run the DCC training process. The training might take a while to complete,
depending on how much statistics data must be processed.
Note: You can get a training status “Warning” if the sum of high-water mark (HWM) values
of all the connection profiles that were generated for a Db2 member exceeds its CONDBAT
value, or if the sum of HWM values of threads of user ID (or application name) profiles for a
Db2 member exceeds its MAXDBAT value. In such cases, you must decide whether the
warning is acceptable or update the MAXDBAT or CONDBAT setting and retrain.
As shown in Figure 13-18, the location with IP address 192.168.129.101 had a connection
HWM value of 80 on member DC1N. The profile recommendations on the connection
warning and exception thresholds for this location on member DC1N are 81 and 123.
With client user ID traces active, the training produces profile recommendations for warning
and exception thresholds at the thread level for each client user ID. (If the client application
name is chosen from the DCC settings, the recommendations are for each client application
name). In the following example with a thread HWM of 106 on DC1N, Db2ZAI recommends
that the USRT001 profile use warning and exception thresholds of 107 and 287.
Activating recommended profiles
After evaluating the profile items and changing them as needed, we activate the profiles.
Figure 13-19 shows the green check boxes that indicate that both profile warnings and
exceptions are active.
We are ready to let the profiles do their work and monitor distributed work coming in from the
monitored locations and running under the monitored client user IDs.
Note: If your workload has major changes after you complete the profile evaluation and
activation process (such as new applications being introduced, work coming in from new
locations, or peak workload increases), you must retrain and repeat the profile evaluation
and activation process.
13.3.2 Db2ZAI 1.5 DCC use case: Visualizing distributed workload activities
with the dashboard
Db2ZAI 1.5 introduced a dashboard for DCC that serves as an entry point for Db2 administrators to quickly get an idea of how the distributed workloads are running on the target Db2 systems.
The dashboard is the first page that you are presented with after opening the Distributed
connections web page from the Db2ZAI web UI.
This section provides some examples of what the dashboard looks like and how you can go to
different sections of the dashboard to discover more information about your distributed
workloads.
From here, you can right-click to focus on a single Db2 member, as shown in Figure 13-21, or right-click again to investigate thread-level statistics by location, user ID, or application name.
Figure 13-21 DCC dashboard: DBAT High Water Mark for a single Db2 member
Click User ID Threads to open a graph of the top 20 user IDs with the largest number of
threads. As shown in Figure 13-22, clicking a user ID in the graph shows the thread statistics
of the specific user ID, where you can select different metrics to view.
Figure 13-22 DCC dashboard: DBAT high water mark and drilling down to thread statistics
Figure 13-23 DCC dashboard: Distributed connections and thread usage metrics
Clicking the WLB circle displays the top 20 locations that are not using sysplex WLB, as
shown in Figure 13-24. We can click each of the locations to check their connection statistics.
Profile exceptions and warning alerts
The third section of the dashboard shows a summary of the DCC profile exception and
warning alerts for the last 7 days.
As shown in Figure 13-25, we can do a drill-down analysis from the dashboard to get to the
list of alerts, and then display the alert details web page where we can check various
connection and thread statistics scorecards for the monitored resources.
Figure 13-25 DCC dashboard: Profile exception and warning alerts drill down
Drilling down into the profile exceptions and warning alerts can help DBAs determine whether
the alert is due to something erratic happening or if the workload has grown over time and the
profile must be adjusted or retrained to accommodate workload growth.
Figure 13-26 on page 384 is a high-level depiction of the Db2ZAI architecture, which reflects both the Db2ZAI 1.4 and 1.5 designs.
This architectural diagram shows the six areas where Db2ZAI consumes system resources to
fulfill its functions. The following list explains the function and the CPU cost that is associated
with each of these six areas and provides some general recommendations for managing and
monitoring them through z/OS Workload Manager (WLM) report and service classes.
When Db2ZAI users log on to the UI and invoke UI operations, the data access of the Db2ZAI tables is performed at the primary target Db2 member to which Db2ZAI is connected. These accesses are DRDA requests whose CPU cost is 60% zIIP eligible, and they run under the user’s UI login user ID.
The recommended setup is to use a separate WLM report class and possibly a separate
service class for DDF enclave SRB processing for the user IDs that perform Db2ZAI UI
operations.
The next three areas are on the Db2ZAI UI and REST services side:
Db2ZAI uses Node.js to support the UI web interfaces. The default names of the Node.js
address spaces are ZAIND*. They are UNIX System Services (OMVS) processes that
consume general central processor (CP) CPU time.
The recommended setup is to use a dedicated WLM report class for these address
spaces for better monitoring of their resource consumption.
The main Db2ZAI infrastructure functions run within an IBM WebSphere® Liberty server.
The Liberty server runs under UNIX System Services, and its default address space name
is ZAILBTY. The Liberty services are zIIP eligible. During SAs, the Liberty server can
consume up to 2.7 zIIP processors in our measurements with STATIME_MAIN=10 specified
for the target Db2 systems.
The recommended setup is to use a dedicated WLM service class for the Liberty address
space and assign it an importance lower than other more critical tasks that demand zIIP
capacity on the LPAR.
SA training runs under z/OS Spark services (the default address space names are
ALNSPK* for WMLz 2.4) through WMLz (the default address space names are ALN*). The
z/OS Spark services are zIIP eligible. Because SA training should be infrequent, the CPU
demand for the z/OS Spark services is not high.
The recommended setup is to use WLM report classes for the WMLz and Spark address spaces to monitor the training cost. If frequent SA training is expected, consider setting up a separate service class to control zIIP usage.
Note: The Db2 subsystem that is used to store Db2ZAI metadata is not reflected in the
architecture depiction. Db2ZAI accesses the metadata through DRDA requests only
occasionally. The system resource requirement for hosting the Db2ZAI metadata is small
and can be ignored.
Figure 13-27 on page 387 shows some examples of WLM service and report classes for the various processes and address spaces that are involved in a Db2ZAI installation.
Note: For the WMLz, z/OS Spark, and the Db2ZAI Liberty and Node.js address spaces, if
they are started as started tasks, use subsystem type STC to assign their WLM service
and report classes.
Disk requirements on the target Db2 systems side are mainly for the Db2ZAI tables. Plan for 200 GB of available disk space for them. In our Db2ZAI 1.5 installation with a two-way data-sharing target Db2 system, we consumed 120 GB of disk space without triggering any Db2ZAI table cleanups. A best practice is to monitor your disk usage for the Db2ZAI objects and to set up proper retention periods for the different types of Db2ZAI data, as described in Scheduling Db2ZAI table cleanups. Running a periodic online REORG of the Db2ZAI objects is also a best practice.
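For example, an online REORG of one of the Db2ZAI table spaces can be run with a utility control statement similar to the following sketch (the database and table space names are placeholders):
REORG TABLESPACE ZAIDB.ZAITS SHRLEVEL CHANGE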
Disk requirements on the Db2ZAI UI side are for the product installation and the working
directories for the Db2ZAI services. Plan for 20 GB of disk space for Db2ZAI usage, which is
also enough to cover the disk space that WMLz 2.4 requires to support Db2ZAI.
This section provides the capacity evaluation test results from our in-house measurements for
Db2ZAI 1.5 running with Db2 12 target subsystems that host the mixed workload.
Measurement environment
The target Db2 subsystem consisted of the following setup:
One IBM z14 LPAR with 16 general CPs, four zIIPs, and 512 GB of memory
z/OS 2.4
One ICF, one external CF, and CFCC level 23, each with three dedicated CPUs
An IBM DS8870 disk controller
One Linux on IBM Z LPAR with six general CPs and 24 GB of memory to drive DRDA
workloads
Two-way data sharing, at Db2 12 FL 510, with both members running on the same LPAR
We defined separate WLM report classes for the address spaces and DDF tasks that are
involved, and we used RMF workload activity reports to obtain the CPU usage and real
storage usage performance numbers. On the target Db2 side, we also used OMPE reports for
some of the Db2 accounting and statistics numbers.
The background workload that runs on the target Db2 subsystems is made up of the following
subworkloads:
Dynamic and static SQL statements that are invoked through batch jobs.
A subset of DRDA IBM Relational Warehouse Workload (IRWW) transactions:
– One set running with 40 different IP addresses
– Another set running with 40 different user IDs and 40 different application names
A random update batch workload.
A two-way data-sharing relational transaction workload (RTW) CICS-Db2 workload that
adds more complexity to the system performance insights drill-downs.
Because we expect more clients to start running Db2 subsystems with STATIME_MAIN=10,
which is the default for Db2 13 for z/OS, we ran our tests with both target Db2 members with
STATIME_MAIN=10.
Before conducting the measurements, we accumulated Db2ZAI data for over 6 weeks without
cleanup. The Db2ZAI tables take up about 120 GB of disk space. There were about 2000
SQL statements in scope (meaning Db2ZAI monitors their performance and analyzes their
behavior for AI access path optimization). At the time of the measurements, most of the
in-scope SQL statements were in “Learning complete” status, which means that Db2ZAI
evaluated their access paths and created proper ML models for them if beneficial.
We evaluated the system resource cost for four major types of Db2ZAI usage scenarios:
Basic Db2ZAI data collection and aggregations
Performing UI operations
SA training
SAs
Here are the characteristics of the configuration that we used during the capacity requirement
studies for Db2ZAI 1.5.
Number of Db2 members: 2
Number of SQL statements in scope: ~2000
Number of packages in scope: ~1000
SQL complexities: All complexity levels
Number of full prepares: 0
Number of local BPs per Db2 member: 34
STATIME_MAIN setting: 10
Number of group BPs per Db2 member: 16
Number of IP addresses: 42
Number of work periods per Db2 member: 13
Number of remote applications: 47
Number of remote client IDs: 41
Number of Db2 connection types: 3
Number of Db2ZAI table rows added per hour: ~550,000
We conducted the following measurements to understand the costs that are associated with
Db2ZAI processing:
We ran the background workload for 60 minutes after we issued the -STOP ML command
and used this run as the base measurement for evaluating Db2ZAI data collection and
aggregation cost. We call this measurement NOML.
We ran the background workload again for 60 minutes after we issued the -START ML
command. We made sure that no Db2ZAI UI operations were performed during the
measurement period. We call this measurement NOUI. The performance differences
between the NOUI and NOML measurements reflect the Db2ZAI data collection and
aggregation cost.
We ran the third measurement with the background workload running for 60 minutes after
we issued the -START ML command, and concurrently we performed a typical set of
various Db2ZAI UI operations. We call this measurement UI. The performance differences
between the UI and NOUI measurements should reflect the Db2ZAI UI operation cost.
For evaluating SA training performance, we ran two SA training processes while the background workload ran with ML started. During this time, we ensured that no other Db2ZAI UI operations were performed. The first training (measurement TR1) was run with 1 week of the statistics data that Db2ZAI collected. The second training (measurement TR2) was run with 2 weeks of statistics data. By comparing TR1 and TR2 data with the NOUI performance data, we can understand how SA training performs and how the training performance scales as the data volume increases.
To evaluate SA performance, we ran three SAs while the background workload ran with
ML started and with no other Db2ZAI UI operations. The three assessments assessed
1 day, 2 days, and 3 days of data, and they are assigned measurement IDs of SA1, SA2,
and SA3. Comparing the SA1-3 performance data with NOUI, we calculated how SAs
perform and how the assessment performance scaled as the data volume increased.
The CPU cost of each scenario is expressed as the percentage of a single CPU’s capacity
that is used during the measurement periods. To illustrate the results, look at the numbers for
basic Db2ZAI data collection and aggregation.
For this scenario, approximately 36 seconds of CPU time is spent on general CPs and approximately 104 seconds on zIIP processors (about 140 seconds in total) for the measurement duration of 60 minutes. The CPU cost on general CPs is calculated as 36 / (60 × 60) = 1% of a single CPU’s capacity. Similarly, the CPU cost on zIIP processors is calculated as 104 / (60 × 60) = 2.9%. Therefore, the total CPU cost is 1% + 2.9% = 3.9% of the capacity of a single processor.
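Expressed generally, in the same style as the memory calculation that is shown later in this chapter:
CPU_cost_in_percent_of_one_CP = CPU_seconds_used / measurement_interval_in_seconds × 100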
Table 13-2 Db2ZAI 1.5 capacity evaluation: High-level overview of CPU cost
Db2ZAI functions | Processing on | CPU cost (% of a single CPU’s capacity) | How much of the CPU cost is zIIP eligible? | Main processing under
Basic Db2ZAI data collection and aggregations | Target Db2 side | 3.9% | > 70% | ssnmDBM1
Basic Db2ZAI data collection and aggregations | Db2ZAI UI side | Ignorable | N/A | N/A
Note: The idling cost of the Db2ZAI services on the LPAR where the WMLz, Node.js, and
the Liberty servers are running is approximately 2% of a single CPU (less than 72 seconds
of CPU time per hour). Idling means that the services are started but are not processing
real work.
For the memory usage, we looked at the RMF paging activity reports and used the CENTRAL STORAGE FRAMES counts in the “Frame and Slot Counts” section to calculate the maximum real storage that was used during the measurement periods. We used the following equation (each frame is 4 KB):
maximum_memory_used_in_GB = (CENTRAL STORAGE FRAMES TOTAL - CENTRAL STORAGE FRAMES AVAILABLE (MIN)) × 4 / 1024 / 1024
As shown in Table 13-3, for our measurements, only SA training used a significant amount of
real storage to run, which is 8 GB. Other Db2ZAI functions did not use a significant amount of
real storage.
Table 13-3 Db2ZAI 1.5 capacity evaluation: High-level overview of memory usage
Db2ZAI functions | Processing on | Memory usage
SA training | Db2ZAI UI side | 8 GB
Note: The idling real storage cost of the Db2ZAI services on the LPAR where the WMLz,
Node.js, and the Liberty servers are running is approximately 4 GB. Idling means that the
services are started but are not processing real work.
With the information that is in Table 13-2 on page 391 and Table 13-3, you can see that supporting Db2ZAI is not expensive. With our configuration, on the LPAR where the target Db2 subsystems run, the maximum CPU capacity that we added with all the Db2ZAI functions should be less than 15% of a single processor. On the LPAR where the Db2ZAI and WMLz services run, less than 5% of the CPU capacity of a single processor is enough to handle functions other than SA training and assessments. To support SA training, up to two zIIP processors might be required during the training. For SAs, up to three zIIP processors might be required while the assessments run.
Then, we ran the workload again after issuing the -START ML command on the target Db2
subsystem. During this 1-hour measurement period, we did not perform any UI operations on
the Db2ZAI UI. By comparing this measurement with the base measurement, we can see that
the difference in CPU and memory usage reflects the Db2ZAI data collection and data
aggregation cost.
Based on our analysis, which is shown in Figure 13-28, we made the following general observations about the Db2ZAI 1.5 data collection and aggregation overhead that was incurred with our configuration.
Figure 13-28 Db2ZAI 1.5 data collection and aggregation CPU usage on target Db2 subsystems
The CPU cost occurs mainly on the target Db2 side.
– Figure 13-28 on page 393 shows that a total of approximately 140 seconds of CPU
time was used to perform Db2ZAI data collection and data aggregations during our
1-hour measurement period, and 104 seconds of the CPU were spent on zIIP
processors, which is 74% of the total CPU cost.
– Most of the direct data collection and aggregation CPU is zIIP eligible and charged as
ssnmDBM1 SRB time.
– DCC data aggregation runs as DRDA requests to the primary Db2 member (the Db2
member to which Db2ZAI services connect). The DRDA requests run under the
Db2ZAI scheduler ID. The total DCC data aggregation CPU cost was only a few
seconds during the 60-minute measurement run.
– We observed an increase in the ssnmDBM1 I/O interrupt time on the Db2 member to
which Db2ZAI services connect. The increase is attributed to the data handling on the
Db2ZAI tables, such as data prefetch activities and synchronous I/Os.
– The CPU cost for Db2ZAI 1.5 data collection and aggregations in our measurement amounts to less than 4% of a single CPU’s processing time, and over 70% of the CPU cost can be offloaded to zIIPs.
– To evaluate real storage usage, we examined both the Db2 statistics reports and RMF
reports, which showed no noticeable real storage usage difference between the two
measurements. RMF paging activity reports show real storage usage of approximately
117 GB on the LPAR where the target Db2 subsystems run.
The Db2ZAI UI side spends only a few seconds of CPU to maintain connectivity with the
Db2 target systems and trigger DCC data aggregation activities.
Performing UI operations
To measure the cost of Db2ZAI UI operations, we ran our background workload for 1 hour
with ML started on the Db2 target subsystems without any UI operations that were performed
from the Db2ZAI UI. This measurement is used as the base measurement.
Then, we ran the background workload again with ML started while we performed all the
major Db2ZAI 1.5 UI operations for 1 hour, including typical UI operations for SA,
performance insights, SQL optimization, and DCC scenarios. In total, the test required
approximately 400 mouse clicks. The performance differences between this measurement
and the base measurement should reflect the cost of performing Db2ZAI 1.5 UI operations.
Based on our analysis, we made the following general observations about Db2ZAI 1.5 UI
operation system resource consumption:
On the Db2 target system side:
– DRDA requests are sent to the Db2 member that Db2ZAI directly connects to for the UI
operations that must retrieve data from the Db2ZAI tables. Db2ZAI code is designed for
efficiency and avoids repeated data retrieval, and it uses efficient queries that return
data quickly with optimum access paths.
– Figure 13-29 on page 395 shows the class 2 elapsed time distribution of the DRDA
requests for the UI user ID. For 1 hour of Db2ZAI UI operations, only 52.2 seconds
were spent for general CPU time and 74.8 seconds were spent on the zIIP processors.
These amounts total 3.5% of a single CPU’s capacity on average over the 60-minute measurement period.
– Approximately 400 mouse clicks generated approximately 1400 DRDA transactions
with a total class 2 elapsed time of 224.4 seconds, which averages to 0.16 second of
class 2 elapsed time per DRDA transaction.
Figure 13-29 Db2ZAI 1.5 UI operations elapsed time and CPU usage for UI user ID on the target Db2
Note: A best practice is to use dedicated buffer pools for the Db2ZAI objects. This
approach makes monitoring Db2ZAI object access efficiency easier and enables you to
tune the buffer pool attributes more easily.
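A minimal sketch of that setup, assuming an otherwise unused buffer pool BP40 and a Db2ZAI table space ZAIDB.ZAITS (both names are placeholders), might look like the following command and statement:
-ALTER BUFFERPOOL(BP40) VPSIZE(20000)
ALTER TABLESPACE ZAIDB.ZAITS BUFFERPOOL BP40;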
On the Db2ZAI UI side:
– The Node.js address spaces (ZAIND*) and the Liberty applications handle the web
interactions and send DRDA requests to the target Db2 subsystem to retrieve data.
– The CPU time that is spent in the Node.js address spaces is not zIIP eligible, but the
CPU that is spent in the Liberty server is almost 100% zIIP eligible. Figure 13-30
depicts the CPU cost that is incurred on the LPAR where Db2ZAI is installed and
running during our UI operations measurement of 1 hour.
– In our configuration, the total CPU cost is small (no more than 2% of the capacity of a
processor).
– There was no real storage usage increase on the LPAR that hosts the Db2ZAI services
when performing UI operations. Both the base measurement and the measurement for
the UI operations have a similar LPAR real storage usage of approximately 27 GB.
Figure 13-30 Db2ZAI 1.5 UI operations: CPU usage on the Db2ZAI UI side
Table 13-4 lists some of the important factors that affect training performance.
Table 13-4 Db2ZAI 1.5 system assessment training study: Numbers that affect training performance
For each Db2 member Data volume
10 36 12 24 7 120839
10 36 12 24 14 236857
By analyzing the performance data from the two SA training processes with different data
volume, we made the following observations:
On the Db2 target system side:
– On the target Db2 member to which Db2ZAI directly connects, DRDA requests are
sent from the Db2ZAI services to retrieve the data for training and for writing the
training results back to the SA tables.
– As shown in Figure 13-31, the DRDA requests that are related to the SA training
processes use a small amount of CPU and zIIP processing time. We see an increase
in ssnmDBM1 IIP SRB time (the preemptible SRB time that is spent on zIIP) as more data is
prefetched for the training test with 2 weeks of data. As a result, class 3 other read
suspension time for the 2-week data training test also increases compared to the
1-week data training. The 4 KB and 16 KB buffer pools that we use for the Db2ZAI objects
have a VPSIZE of 80000 and 20000 buffers, respectively. These buffer pools can be made
larger to achieve better buffer pool hit ratios and improve the efficiency of training data
prefetch (see the example commands after Figure 13-31).
– All the extra CPU that is used on the target Db2 subsystems for the SA training process
represents less than 4% of a zIIP processor and less than 2% of a general processor
engine.
– There is no real storage usage increase on the LPAR where the target Db2
subsystems run during the SA training tests.
Figure 13-31 Db2ZAI 1.5 SA training elapsed time and CPU usage for scheduler ID on the target Db2
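As a hypothetical example of such tuning, the following commands enlarge the 16 KB pool from 20000 to 40000 buffers and then display its statistics so that the hit ratio can be rechecked. The buffer pool name and VPSIZE value are placeholders; choose values that fit your real storage budget.

-ALTER BUFFERPOOL(BP16K1) VPSIZE(40000)
-DISPLAY BUFFERPOOL(BP16K1) DETAIL(*)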
On the Db2ZAI UI side:
– The CPU cost for running the SA training processes is incurred under the z/OS Spark
address spaces. The default address space names for Spark are ALNSP* with
WMLz 2.4 installations. Approximately 98% of the CPU time that is spent by Spark
address spaces is zIIP eligible. The SA training function scales well as the data volume
increases in terms of elapsed time and CPU usage.
– Figure 13-32 shows the detailed performance numbers for our tests. The “Total LPAR
CPU time” is the total CPU that was used by all address spaces running on the LPAR
for the durations of the SA training processes. “Spark CPU time” is the CPU time that
all the Spark address spaces consumed during the SA training processes. Similarly,
“Total LPAR IIP time” and “Spark IIP time” show the IIP time that was consumed by all
the address spaces on the LPAR and by Spark address spaces during the SA training
processes. Because there are no zIIPs that are configured on the LPAR where the
Db2ZAI instance runs, we calculated the IIP time based on the IIPCP numbers from
the RMF workload activity reports.
– The processor resource demand ranges from 131.5% to 163.7% of a single CPU across
our measurements. If the Db2ZAI instance runs on an LPAR on which other critical
applications run, run the SA training during the period of lowest activity to avoid
affecting those applications. SA training processes do not need to be run frequently.
Typically, you need to rerun them only after your target Db2 subsystems go through
changes that affect the baseline metrics or after you change the work period definitions
of the target Db2 subsystems.
Consider putting the Spark address spaces (ALNSP*) into a separate WLM service
class with lower importance than other critical applications.
– The real storage usage requirement for the training processes is mainly above-the-bar
(ATB) storage that is used by the Spark workers. From the RMF paging activity report,
the maximum LPAR real storage usage increases by approximately 8 GB during the SA
training process compared to the baseline in which the Db2ZAI and WMLz services were
not started on the LPAR.
Figure 13-32 Db2ZAI 1.5 SA training elapsed time and CPU usage on the Db2ZAI UI side
For the Db2ZAI 1.5 SA capacity evaluation, we measured the performance of three different
assessments that processed 24, 48, and 72 hours of data.
The base measurement ran our mixed workload on the target Db2 subsystems with ML
started, but with no UI operations performed on the Db2ZAI UI.
The data volume that was processed for the three assessments, as recorded in
assessmentMessages.log, and the elapsed time of the three assessments are shown in
Table 13-5.
Table 13-5 Db2ZAI 1.5 system assessment study: Assessment data volume and elapsed time
Data volume                  1-day count     2-day count     3-day count
dfWorkPeriods                24              24              24
On the target Db2 side:
– The statistics data in the Db2ZAI tables is retrieved through DRDA requests under the
connection’s scheduler ID from the Db2 member to which Db2ZAI connects. The
assessment results are written back to the Db2ZAI SA exception tables with DRDA
requests.
– As shown in Figure 13-33, the CPU cost of SAs on the Db2 target side scales well as
data volume increases.
– The general CPU overhead from the DRDA requests is below 1.5% of a processor, and
the zIIP processor overhead is below 2.2% of a processor.
– There is no real storage usage increase on the LPAR where the target Db2
subsystems run during the SA tests.
Figure 13-33 Db2ZAI 1.5 system assessment elapsed time and CPU usage on target Db2
Summary of results
The following key findings summarize the results for the series of capacity evaluation tests
that we did for Db2ZAI 1.5 and Db2 12 for z/OS:
On the target Db2 subsystem side, running Db2 workloads with Db2ZAI (ML started) is
inexpensive. After considering the CPU savings that the SQL optimization function can
potentially bring to your workloads, the overhead will be further reduced, if not canceled
entirely. Also, with proper WLM service class and report class definitions in place, running
with ML started does not affect your existing applications.
On the Db2ZAI UI side, where the Db2ZAI services and WMLz run, the major CPU
overhead comes from the SA training processes and the SAs. However, because this CPU
overhead is close to 100% zIIP eligible, the cost is containable. If you have other
zIIP-dependent workloads on the same LPAR, proper scheduling and properly defined WLM
service classes ensure that their performance is not affected.
- HV/OFNR Simple Model Training : MLSIM > RUNNING
- Package Automation : MLPACK > RUNNING
- Performance Data Processing : MLPERF > RUNNING
- Dynamic Sql Monitoring : MLDSMN > RUNNING
SYSTEM ASSESSMENT &
DISTRIBUTED CONNECTION CONTROL = STARTED
- ML Generating Metrics : MLMETX > RUNNING
DISPLAY ML REPORT COMPLETE.
DSN9022I -DB2A DSNXODPM 'DISPLAY ML' NORMAL COMPLETION
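The preceding report is produced by the Db2 DISPLAY ML command, for example:

-DB2A DISPLAY ML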
You also can obtain similar information from the Db2ZAI UI if you are using Db2ZAI 1.5 by
following the instructions in Checking the status of Db2ZAI processes on your Db2
subsystems.
You can monitor the statuses of the Db2ZAI services from the UI by using the instructions in
Checking the status of the Db2ZAI services.
For more information and complete instructions for installing, administering, and using
Db2ZAI 1.5, see Db2 AI for z/OS 1.5.0.
Traditionally, performance workloads started as heavy-duty batch jobs, which stressed the
CPU usage and I/O of the system. For example, the original non-distributed OLTP
IBM Relational Warehouse Workload (IRWW) (referred to as the classic IRWW in this
publication), which was created at approximately the same time as Db2 4, is based on a retail
environment that uses IMS and later CICS as the transaction monitor. In later years, more
workloads that were related to distributed environments, complex OLTP, query processing,
and data warehousing were added.
The following list summarizes the different types of workloads that were used to evaluate the
performance of the features in this book. Not all workloads were used with all features.
A set of the classic IRWW workloads represents simple, legacy-type OLTP workloads
that are typically run through IMS or CICS as the transaction server.
IRWW distributed workloads are the same Db2 workloads as the classic IRWW workloads
but run through remote clients, such as JDBC or remote stored procedure calls.
The brokerage OLTP workload represents a complex OLTP workload with a mixture of
lookup, update, insert, and reporting queries.
The SAP day-posting workload is an OLTP workload with relatively simple SQL
statements that are run along with intensive insert, update, and delete statements against
an SAP banking database.
The SAP account settlements workload is a batch workload that runs simple to complex
SQL statements along with intensive insert, update, and delete statements against an
SAP banking database.
High-insert workloads are concurrent inserts that are run through more than 100 JCC T4
connections in data sharing.
Special high-insert batch workloads (HIWs) are designed to compare physical design
options and different structures to minimize contention.
TSO batches represent a batch workload that is running various batch operations in a
loop, including SELECT, FETCH, UPDATE, INSERT, and DELETE.
The transactions that are used in the classic IRWW workload are listed here:
Delivery Performs various SELECT, UPDATE, and DELETE operations that support the
delivery of a group of orders and runs as 2% of the total transaction mix.
New Order Performs various SELECT, FETCH, UPDATE, and INSERT operations that
support the receipt of new customer orders and runs as 22% of the total
transaction mix.
Order Status Performs various SELECT and FETCH operations that support providing the
order status and runs as 25% of the total transaction mix.
Payment Performs SELECT, FETCH, UPDATE, and INSERT operations that support
received customer payments and runs as 21% of the total transaction mix.
Price Change Performs an UPDATE operation that supports changing the price of an item
and runs as 1% of the total transaction mix.
Price Quote Performs various SELECT operations that support providing the price of a
set of items and runs as 25% of the total transaction mix (see the sketch
after this list).
Stock Level Performs a JOIN and various SELECT operations that support providing the
current stock level of an item and runs as 4% of the total transaction mix.
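For illustration only, a Price Quote-style lookup could resemble the following sketch; the table and column names are hypothetical and do not reflect the actual IRWW schema:

-- Hypothetical price quote for a set of items (names are not the real IRWW schema)
SELECT I_ID, I_NAME, I_PRICE
  FROM ITEM
 WHERE I_ID IN (?, ?, ?, ?, ?);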
The IRWW database contains several inventory stock warehouses and sales districts. The
front-end transaction manager is typically IMS, but CICS is used sometimes.
Historically, the classic IRWW workload used IMS Fast Path (IFP) regions for transaction
runs. IFPs bypass IMS queuing, which allows more efficient processing, including thread
reuse. For the measurements in this book, the workload runs in IMS message processing
program (MPP) regions instead.
Note: Because of this change, previously published IRWW results (for example, from Db2
10 for z/OS benchmarks) must not be compared to IRWW runs that use Db2 13 for z/OS.
The processing limit count (PLCT) parameter is used to control the maximum number of
messages that are sent to the application program by the IMS control program for processing
before the application program is reloaded in the message processing program (MPP) region.
This value is set to 65535 for two-thirds of the transactions and to 0 for the remaining
transactions.
PLCT(0) means the maximum number of messages that are sent to the application is one,
and the application program is reloaded into the MPP region before receiving a subsequent
message. PLCT(65535) means that no limit is placed on the number of messages that are
processed at a single program load.
The following environment configuration is used for the non-data-sharing RTW workload:
One z15 logical partition (LPAR) with eight general-purpose processors (CPs) and two
IBM zSystems Integrated Information Processors (zIIPs)
Four CICS regions running CICS Transaction Server (CICS TS) 5.6
A non-data-sharing Db2 subsystem
z/OS 2.5
An IBM DS8870 direct access storage device (DASD) controller
Teleprocessing Network Simulator (TPNS) on another z15 LPAR with four general CPs to
drive the workload
For more information about the seven transaction types, a brief description of each, and the
percentage of the transaction mix, see “IBM Relational Warehouse Workload ” on page 404.
This OLTP workload accesses 33 tables that contain approximately 15,000 Db2 database
objects. The brokerage workload represents transactions of customers in the financial market
who are updating account information. The transactions are implemented as native SQL
stored procedures (a minimal sketch follows the resource list below). The workload uses the
following resources:
1.3 TB of data
Thirteen buffer pools
Thirty-three table spaces
Sixty-eight indexes
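As a minimal sketch of the native SQL procedure style that is used for these transactions (the procedure, table, and column names are hypothetical and are not the actual workload objects):

-- Hypothetical native SQL procedure; names do not reflect the actual workload
CREATE PROCEDURE BROKER.UPDATE_ACCOUNT_BALANCE
      (IN P_ACCT_ID BIGINT,
       IN P_DELTA   DECIMAL(15,2))
  LANGUAGE SQL
BEGIN
  -- Apply the trade amount to the customer account
  UPDATE BROKER.ACCOUNT
     SET BALANCE = BALANCE + P_DELTA
   WHERE ACCT_ID = P_ACCT_ID;
END

The client starts such a procedure with a CALL statement, for example, CALL BROKER.UPDATE_ACCOUNT_BALANCE(1234, 250.00).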
Two versions of the IBM brokerage transaction workload are used. Version 1 excludes the
brokerage and data maintenance transactions from the workload. Version 2 contains the full
set of transactions.
T00SIMX
SELECT DISTINCT VENDOR_NAME, SIMILY
FROM
(
SELECT
VENDOR_NAME,
AI_SIMILARITY (
VENDOR_NAME, 'VERIZON' USING MODEL COLUMN VENDOR_NAME)
AS SIMILY
FROM USRT031.VIRG1TB
)
WHERE SIMILY IS NOT NULL
AND TRIM(VENDOR_NAME) <> 'VERIZON'
ORDER BY SIMILY DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;
T01SIMX
SELECT
VENDOR_NAME,
SIMILY,
SUM(AMOUNT) AS TOTAL_AMOUNT,
AVG(AMOUNT) AS AVG_AMOUNT,
MIN(AMOUNT) AS MIN_AMOUNT,
MAX(AMOUNT) AS MAX_AMOUNT,
COUNT(*) AS COUNT_TXNS
FROM
(
SELECT
VENDOR_NAME,
AMOUNT,
AI_SIMILARITY(
VENDOR_NAME, 'VERIZON' USING MODEL COLUMN VENDOR_NAME
) AS SIMILY
FROM USRT031.VIRG1TB
WHERE AI_SIMILARITY(
VENDOR_NAME, 'VERIZON' USING MODEL COLUMN VENDOR_NAME
) IS NOT NULL
) TMP
GROUP BY VENDOR_NAME, SIMILY
WITH UR;
T02SIMX
SELECT
DISTINCT
EXP.AGY_AGENCY_KEY,
AGY.AGY_AGENCY_NAME,
AI_SIMILARITY(
EXP.AGY_AGENCY_KEY, 125 USING MODEL COLUMN EXP.AGY_AGENCY_KEY
) AS SIMILY
FROM USRT031.VIRG1TB EXP
INNER JOIN USRT031.VIRGAGY AGY
ON EXP.AGY_AGENCY_KEY = AGY.AGY_AGENCY_KEY
WHERE
AI_SIMILARITY(
EXP.AGY_AGENCY_KEY, 125 USING MODEL COLUMN EXP.AGY_AGENCY_KEY
) IS NOT NULL
AND EXP.AGY_AGENCY_KEY <> 125
ORDER BY 3 DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;
T03SIMX
SELECT
YEAR(VOUCHER_DATE) AS YR,
MONTH(VOUCHER_DATE) AS MTH,
SIMILAR.AGY_AGENCY_KEY,
SIMILAR.AGY_AGENCY_NAME,
SUM(AMOUNT) AS TOTAL_AMOUNT,
RANK() OVER (PARTITION BY
YEAR(VOUCHER_DATE),
MONTH(VOUCHER_DATE)
ORDER BY SUM(AMOUNT) DESC
) AS RANKING
FROM
USRT031.VIRG1TB EX,
(
SELECT
DISTINCT
EXP.AGY_AGENCY_KEY,
AGY.AGY_AGENCY_NAME,
AI_SIMILARITY(
EXP.AGY_AGENCY_KEY, 125 USING MODEL COLUMN EXP.AGY_AGENCY_KEY
) AS SIMILY
FROM USRT031.VIRG1TB EXP
INNER JOIN USRT031.VIRGAGY AGY
ON EXP.AGY_AGENCY_KEY = AGY.AGY_AGENCY_KEY
WHERE
AI_SIMILARITY(
T04SIMX
SELECT * FROM
(
SELECT DISTINCT
'DISSIMILAR' AS SIMILARITY_TYPE,
EXP.AGY_AGENCY_KEY,
AGY.AGY_AGENCY_NAME,
AI_SIMILARITY(
EXP.AGY_AGENCY_KEY USING MODEL COLUMN EXP.AGY_AGENCY_KEY,
'COUNTY WASTE LLC' USING MODEL COLUMN EXP.VENDOR_NAME) AS SIMILY
FROM USRT031.VIRG1TB EXP,
USRT031.VIRGAGY AGY
WHERE
EXP.AGY_AGENCY_KEY = AGY.AGY_AGENCY_KEY AND
AI_SIMILARITY(
EXP.AGY_AGENCY_KEY USING MODEL COLUMN EXP.AGY_AGENCY_KEY,
'COUNTY WASTE LLC' USING MODEL COLUMN EXP.VENDOR_NAME)
IS NOT NULL
ORDER BY 4
FETCH FIRST 10 ROWS ONLY
) DISSIMILR
UNION ALL
SELECT * FROM
(
SELECT DISTINCT
'SIMILAR' AS SIMILARITY_TYPE,
EXP.AGY_AGENCY_KEY,
AGY.AGY_AGENCY_NAME,
AI_SIMILARITY(
EXP.AGY_AGENCY_KEY USING MODEL COLUMN EXP.AGY_AGENCY_KEY,
'COUNTY WASTE LLC' USING MODEL COLUMN EXP.VENDOR_NAME) AS SIMILY
FROM USRT031.VIRG1TB EXP,
USRT031.VIRGAGY AGY
WHERE
EXP.AGY_AGENCY_KEY = AGY.AGY_AGENCY_KEY AND
AI_SIMILARITY(
EXP.AGY_AGENCY_KEY USING MODEL COLUMN EXP.AGY_AGENCY_KEY,
'COUNTY WASTE LLC' USING MODEL COLUMN EXP.VENDOR_NAME)
IS NOT NULL
ORDER BY 4 DESC
FETCH FIRST 10 ROWS ONLY
) SIMILR
WITH UR;
T00SEM1
SELECT
EXP.VENDOR_NAME,
SUM(EXP.AMOUNT) AS NET_EXPENDITURE,
VLIST.CLUSTER_PROXIMITY
FROM
USRT031.VIRG1TB EXP,
(
SELECT DISTINCT VENDOR_NAME,
AI_SEMANTIC_CLUSTER(VENDOR_NAME,
'COUNTY OF FAIRFAX'
) AS CLUSTER_PROXIMITY
FROM
USRT031.VIRG1TB EXP
WHERE
AI_SEMANTIC_CLUSTER(VENDOR_NAME,
'COUNTY OF FAIRFAX'
) IS NOT NULL
AND VENDOR_NAME NOT IN (
'COUNTY OF FAIRFAX')
ORDER BY 2 DESC
FETCH FIRST 10 ROWS ONLY
) VLIST
WHERE EXP.VENDOR_NAME = VLIST.VENDOR_NAME
GROUP BY EXP.VENDOR_NAME, VLIST.CLUSTER_PROXIMITY
ORDER BY VLIST.CLUSTER_PROXIMITY DESC
WITH UR;
T00SEM2
SELECT
EXP.VENDOR_NAME,
SUM(EXP.AMOUNT) AS NET_EXPENDITURE,
VLIST.CLUSTER_PROXIMITY
FROM
USRT031.VIRG1TB EXP,
(
SELECT DISTINCT VENDOR_NAME,
AI_SEMANTIC_CLUSTER(VENDOR_NAME,
'COUNTY OF HENRICO',
'COUNTY OF FAIRFAX'
) AS CLUSTER_PROXIMITY
FROM
USRT031.VIRG1TB EXP
WHERE
AI_SEMANTIC_CLUSTER(VENDOR_NAME,
'COUNTY OF HENRICO',
T00SEM3
SELECT
EXP.VENDOR_NAME,
SUM(EXP.AMOUNT) AS NET_EXPENDITURE,
VLIST.CLUSTER_PROXIMITY
FROM
USRT031.VIRG1TB EXP,
(
SELECT DISTINCT VENDOR_NAME,
AI_SEMANTIC_CLUSTER(VENDOR_NAME,
'COUNTY OF LOUDOUN',
'COUNTY OF HENRICO',
'COUNTY OF FAIRFAX'
) AS CLUSTER_PROXIMITY
FROM
USRT031.VIRG1TB EXP
WHERE
AI_SEMANTIC_CLUSTER(VENDOR_NAME,
'COUNTY OF LOUDOUN',
'COUNTY OF HENRICO',
'COUNTY OF FAIRFAX'
) IS NOT NULL
AND VENDOR_NAME NOT IN (
'COUNTY OF LOUDOUN',
'COUNTY OF HENRICO',
'COUNTY OF FAIRFAX')
ORDER BY 2 DESC
FETCH FIRST 10 ROWS ONLY
) VLIST
WHERE EXP.VENDOR_NAME = VLIST.VENDOR_NAME
GROUP BY EXP.VENDOR_NAME, VLIST.CLUSTER_PROXIMITY
ORDER BY VLIST.CLUSTER_PROXIMITY DESC
WITH UR;
T00ANAX
SELECT
DISTINCT OBJ_OBJECT_NAME, VENDOR_NAME,
AI_ANALOGY(
'132' USING MODEL COLUMN EXP.OBJ_OBJECT_KEY,
'TREASURER OF VA MEDICAID ACCTS' USING MODEL COLUMN VENDOR_NAME,
'343' USING MODEL COLUMN EXP.OBJ_OBJECT_KEY,
VENDOR_NAME) AS ANALOGY_SCORE
FROM USRT031.VIRG1TB EXP, USRT031.VIRGOBJ OBJ
WHERE
EXP.OBJ_OBJECT_KEY = OBJ.OBJ_OBJECT_KEY
AND AI_ANALOGY(
'132' USING MODEL COLUMN EXP.OBJ_OBJECT_KEY,
'TREASURER OF VA MEDICAID ACCTS' USING MODEL COLUMN VENDOR_NAME,
'343' USING MODEL COLUMN EXP.OBJ_OBJECT_KEY,
VENDOR_NAME) IS NOT NULL
ORDER BY 3 DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;
T01ANAX
SELECT DISTINCT VENDOR_NAME, SIMILARITY
FROM
(SELECT VENDOR_NAME, AI_ANALOGY(
'7006' USING MODEL COLUMN FNDDTL_FUND_DETAIL_KEY,
'COUNTY OF ARLINGTON' USING MODEL COLUMN VENDOR_NAME,
'514' USING MODEL COLUMN FNDDTL_FUND_DETAIL_KEY,
VENDOR_NAME) AS SIMILARITY
FROM USRT031.VIRG1TB
)
WHERE SIMILARITY > 0.0
ORDER BY SIMILARITY DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;
T00SIMX
SELECT DISTINCT VENDOR_NAME, SIMILY
FROM
(
SELECT
VENDOR_NAME,
AI_SIMILARITY (
VENDOR_NAME, 'VERIZON' USING MODEL COLUMN VENDOR_NAME)
AS SIMILY
FROM USRT031.VIRG1TB
)
WHERE SIMILY IS NOT NULL
AND TRIM(VENDOR_NAME) <> 'VERIZON'
ORDER BY SIMILY DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;
T00SEM3
SELECT
EXP.VENDOR_NAME,
SUM(EXP.AMOUNT) AS NET_EXPENDITURE,
VLIST.CLUSTER_PROXIMITY
FROM
USRT031.VIRG1TB EXP,
(
SELECT DISTINCT VENDOR_NAME,
AI_SEMANTIC_CLUSTER(VENDOR_NAME,
'COUNTY OF LOUDOUN',
'COUNTY OF HENRICO',
'COUNTY OF FAIRFAX'
) AS CLUSTER_PROXIMITY
FROM
USRT031.VIRG1TB EXP
WHERE
AI_SEMANTIC_CLUSTER(VENDOR_NAME,
'COUNTY OF LOUDOUN',
T01ANAX
SELECT DISTINCT VENDOR_NAME, SIMILARITY
FROM
(SELECT VENDOR_NAME, AI_ANALOGY(
'7006' USING MODEL COLUMN FNDDTL_FUND_DETAIL_KEY,
'COUNTY OF ARLINGTON' USING MODEL COLUMN VENDOR_NAME,
'514' USING MODEL COLUMN FNDDTL_FUND_DETAIL_KEY,
VENDOR_NAME) AS SIMILARITY
FROM USRT031.VIRG1TB
)
WHERE SIMILARITY > 0.0
ORDER BY SIMILARITY DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;
Table B-1 shows the views that are used by each of the SQL DI function types.
Table B-1 Views that are used by each SQL DI function type
View name    Cardinality of the view (rows)    Cardinality of the view's base table (rows)    Cardinality of the associated model table (rows)
F00SIMX
SELECT DISTINCT LOAN_SEQ_NUM, SIMILY,
CREDIT_SCORE, CURRENT_LOAN_DLQCY_STATUS,LOAN_PURPOSE
FROM
(
SELECT
LOAN_SEQ_NUM,
AI_SIMILARITY(
LOAN_SEQ_NUM, 'F19Q10000002' USING MODEL COLUMN LOAN_SEQ_NUM)
AS SIMILY,
CREDIT_SCORE,
CURRENT_LOAN_DLQCY_STATUS,
LOAN_PURPOSE
FROM FMACE1.VIEW1
)
WHERE SIMILY IS NOT NULL
AND LOAN_SEQ_NUM <> 'F19Q10000002'
ORDER BY SIMILY DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;
F00SEMX
SELECT LOAN_SEQ_NUM, SIMILARITY
FROM
(SELECT LOAN_SEQ_NUM,
AI_SEMANTIC_CLUSTER
(LOAN_SEQ_NUM,'F19Q10000097','F19Q10000098','F19Q10000099')
AS SIMILARITY
FROM FMACE1.VIEW1
WHERE
LOAN_SEQ_NUM NOT IN ('F19Q10000097','F19Q10000098','F19Q10000099')
GROUP BY LOAN_SEQ_NUM
)
WHERE SIMILARITY > 0
ORDER BY SIMILARITY DESC, LOAN_SEQ_NUM ASC
FETCH FIRST 5 ROWS ONLY
WITH UR;
F00ANAX
SELECT DISTINCT LOAN_SEQ_NUM,PROPERTY_TYPE,SIMILARITY
FROM
(SELECT LOAN_SEQ_NUM,PROPERTY_TYPE, AI_ANALOGY(
'P' USING MODEL COLUMN OCCC_STATUS,
'SF' USING MODEL COLUMN PROPERTY_TYPE,
'I' USING MODEL COLUMN OCCC_STATUS,
PROPERTY_TYPE) AS SIMILARITY
FROM FMACE1.VIEW1
)
WHERE SIMILARITY > 0.0
ORDER BY SIMILARITY DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;
One of those benchmarks is the TPC-H decision support benchmark for database systems. It
tests the ability of a system to handle large volumes of data by using complex queries.
Therefore, our own test workload follows the general TPC-H framework, with some small
modifications.
For more information about the TPC-H benchmark, including details about queries, tables,
and so on, see the TPC home page.
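For example, a query in the general style of the TPC-H pricing-summary report looks like the following sketch, which uses the public TPC-H schema and is not one of the actual workload queries:

SELECT L_RETURNFLAG, L_LINESTATUS,
       SUM(L_QUANTITY) AS SUM_QTY,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS SUM_DISC_PRICE,
       AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
       COUNT(*) AS COUNT_ORDER
  FROM LINEITEM
 WHERE L_SHIPDATE <= DATE('1998-09-02')
 GROUP BY L_RETURNFLAG, L_LINESTATUS
 ORDER BY L_RETURNFLAG, L_LINESTATUS;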
Customer FD workload
The Customer FD performance workload is a comprehensive test suite with over 1000
queries. These queries are based on customer queries from numerous Db2 proof-of-concept
(PoC) projects with extra sets of scripts for extract-transform-load (ETL)-like workloads. It
consists of a star schema with snowflake dimension tables, one large and one small fact
table, and a bitemporal table. The queries of the Customer FD performance workload are
organized into groups, as shown in Table C-1.
G1 Plain table scans with different numbers of selected columns and no joins.
G3 Scans with one or more joins, and tests the performance of joins.
G7 IN list.
G8 Compares filtering methods, including IN list, Snowflake, CTE, and Temp table (see the sketch after this list).
G10 Foreign key, join filtering, and zonemap-aware joins. Specialized on MAKO.
G21 Correlation.
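For example, the G8-style comparisons contrast different ways of expressing the same filter, such as an IN list versus a common table expression (CTE); the table and column names in this sketch are hypothetical:

-- IN-list form
SELECT COUNT(*)
  FROM FACT_SALES F
 WHERE F.REGION_ID IN (10, 20, 30);

-- Equivalent CTE form
WITH R (REGION_ID) AS
  (SELECT 10 FROM SYSIBM.SYSDUMMY1
    UNION ALL SELECT 20 FROM SYSIBM.SYSDUMMY1
    UNION ALL SELECT 30 FROM SYSIBM.SYSDUMMY1)
SELECT COUNT(*)
  FROM FACT_SALES F, R
 WHERE F.REGION_ID = R.REGION_ID;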
The publications that are listed in this section are considered suitable for a more detailed
description of the topics that are covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topics in
this document. Some publications that are referenced in this list might be available in softcopy
only.
Getting Started with IBM zHyperLink for z/OS, REDP-5493
IBM Db2 13 for z/OS and More, SG24-8527
IBM Db2 12 for z/OS Performance Topics, SG24-8404
IBM z16 (3931) Technical Guide, SG24-8951
Introducing IBM Z System Recovery Boost, REDP-5563
You can search for, view, download, or order these documents and other Redbooks,
Redpapers, web docs, drafts, and additional materials, at the following website:
ibm.com/redbooks
Online resources
These websites are also relevant as further information sources:
Db2 13 for z/OS documentation
https://www.ibm.com/docs/en/db2-for-zos/13
Db2 13 for z/OS - What's New?
https://www.ibm.com/docs/en/SSEPEK_13.0.0/pdf/db2z_13_wnewbook.pdf
Db2 Real Storage Management and Recommendations
https://community.ibm.com/community/user/hybriddatamanagement/blogs/paul-mcwilliams1/2022/04/12/db2-real-storage-management?CommunityKey=621c2a2a-01f9-4b57-992f-36ed7432e3bb
Evaluation of zHyperLink Write for Db2 Active Logging with SAP Banking on IBM Z
https://www.ibm.com/support/pages/evaluation-zhyperlink-write-db2-active-logging-sap-banking-z-0
IBM Data Management Community
https://community.ibm.com/community/user/hybriddatamanagement/home
IBM Db2 AI for z/OS 1.5.0 documentation
https://www.ibm.com/docs/en/db2-ai-for-zos/1.5.0