AIX Tuning For Oracle DB
IBM Power Systems Technical University Dublin 2012
Copyright IBM Corporation 2011
Agenda
CPU
Power 7
Memory
AIX VMM tuning
Active Memory Expansion
IO
Storage consideration
AIX LVM Striping
Disk/Fiber Channel driver optimization
Virtual Disk/Fiber channel driver optimization
AIX mount option
Asynchronous IO
NUMA Optimization
Other Tips
CPU : Power 7
Power 7 (Socket/Chip/Core/Threads)
(Diagram: hardware vs. software view of one POWER7 chip: 8 cores per chip, each core running 4 SMT threads, and each SMT thread seen by AIX as one logical CPU.)
32 sockets = 32 chips = 256 cores = 1024 SMT threads = 1024 AIX logical CPUs
Use SMT4
Gives a CPU performance boost by handling more concurrent threads in parallel.
Disable HW prefetching
Usually improves performance for database workloads on large SMP Power systems (> Power 750).
# dscrctl -n -b -s 1 (dynamically disables HW memory prefetching and keeps the setting across reboots)
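The SMT mode can be checked and, if needed, changed with smtctl; a minimal sketch (the target mode and the persistence option are to be validated on your AIX level):
# smtctl (shows the current SMT mode of each processor)
# smtctl -t 4 -w boot (switches to SMT4 and keeps the setting across reboots)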
Memory : AIX VMM tuning
Objective: tune the VMM to protect computational pages (programs, SGA, PGA) from being paged out and force the LRU daemon (lrud) to steal pages from the FS cache only.
(Diagram: 1 - at system startup, memory holds the kernel, the programs and the SGA + PGA; 2 - as DB and file-system activity grows, the FS cache competes with the computational pages and can push them out to paging space, causing performance degradation.)
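A commonly quoted set of VMM tunables for this objective is sketched below; the values are assumptions taken from general AIX/Oracle tuning practice, not from this presentation, and most of them are already defaults or restricted tunables on AIX 6.1/7.1:
# vmo -p -o minperm%=3 -o maxperm%=90 -o maxclient%=90 (favour stealing from the FS cache)
# vmo -p -o lru_file_repage=0 (AIX 5.3/6.1 only; removed on AIX 7.1)
# vmo -p -o page_steal_method=1 (list-based LRU, the default on recent AIX levels)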
Scenario: memory_target can be increased dynamically to 11 GB, but real memory is only 12 GB, so it needs to be increased as well.
Initial configuration: real memory = 12 GB, memory_max_size = 18 GB, memory_target = 8 GB (SGA + PGA), the remainder used by AIX + free memory.
Final configuration: real memory = 15 GB, memory_max_size = 18 GB, memory_target = 11 GB.
Memory allocated to the system has been increased dynamically, using AIX DLPAR.
Memory allocated to Oracle (SGA and PGA) has been increased on the fly (memory_target=11GB;).
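A minimal sketch of the two steps from the shell, assuming an HMC-managed LPAR; the managed-system and partition names, the 3 GB increment and the sqlplus invocation are illustrative assumptions:
# chhwres -r mem -m <managed_system> -o a -p <lpar_name> -q 3072 (on the HMC: dynamically add 3072 MB to the running partition)
# echo "ALTER SYSTEM SET memory_target=11G SCOPE=BOTH;" | sqlplus -s "/ as sysdba" (on the LPAR: grow the Oracle memory target on the fly)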
Memory : Active Memory Expansion
Sample Config
(Diagram: sample AME memory configurations, with true/expanded memory splits of 50% / 50% and 66% / 33%.)
TEST   Nb CPU   Physical Memory   AME Factor   BATCH Duration   CPU Consumption
       24       120 GB            none         124 min
       24       60 GB             2.0          127 min
       24       40 GB             3.0          134 min
The impact of AME on batch duration is very low (<10%), with little CPU overhead (~7%), even with 3 times less memory.
Note: This is an illustrative scenario based on using a sample workload. This data
represents measured results in a controlled lab environment. Your results may
vary.
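Before enabling AME, the amepat planning tool can be run against the live workload to estimate a workable expansion factor; the 60-minute monitoring duration below is an assumption:
# amepat 60 (monitors the running workload for 60 minutes and reports candidate AME factors with their estimated CPU cost)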
IO : Storage consideration
IO : Database Layout
Oracle recommendation: S.A.M.E. (Stripe And Mirror Everything)
Manual file-by-file data placement is time-consuming, resource-intensive and iterative.
Additional advice for implementing SAME:
apply the SAME strategy to data and indexes
if possible, separate the redologs (+ archivelogs)
Storage
RAID-5 vs. RAID-10 Performance Comparison
I/O Profile        RAID-5      RAID-10
Sequential Read    Excellent   Excellent
Sequential Write   Excellent   Good
Random Read        Excellent   Excellent
Random Write       Fair        Excellent
(Diagram: LUN 1-4 built on hardware striping across the array's physical disks.)
IO : AIX LVM Striping
AIX LVM Striping
(Diagram: an AIX Volume Group built on hdisk1-hdisk4, each backed by LUN 1-4 that are hardware-striped in the storage array.)
1. LUNs are striped across the physical disks (stripe size of the physical RAID: ~64k, 128k, 256k).
2. Create AIX Volume Group(s) (VG) with LUNs from multiple arrays.
3. Create Logical Volumes striped across the hdisks (stripe size: 8M, 16M, 32M, 64M), as sketched below.
=> Each read/write access to the LV is well balanced across the LUNs and uses the maximum number of physical disks for best performance.
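A minimal sketch of building such a striped layout; the VG name, LV name, 16 MB strip size, LP count and hdisk names are illustrative assumptions:
# mkvg -S -y datavg hdisk1 hdisk2 hdisk3 hdisk4 (scalable VG built from the four LUNs)
# mklv -y oradatalv -t jfs2 -S 16M -u 4 datavg 256 hdisk1 hdisk2 hdisk3 hdisk4 (LV striped across the four disks with a 16 MB strip size)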
IO : Disk/Fiber Channel driver optimization
In general, a generic device definition gives far from optimal performance, since it doesn't properly customize the hdisk devices:
Example: hdisks are created with queue_depth=1.
1. Contact your storage vendor or go to their web site to download the correct ODM definition for your storage subsystem. It will set up the hdisks properly for your hardware, for optimal performance.
2. If AIX is connected to the storage subsystem with several Fiber Channel cards for performance, don't forget to install a multipath device driver or path control module:
- sdd or sddpcm for IBM DS6000/DS8000
- PowerPath for EMC disk subsystems
- HDLM for Hitachi, etc.
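A quick way to verify that the ODM definition has been picked up is to check how the disks are identified and which queue_depth they received; a sketch:
# lsdev -Cc disk (disks showing up only as a generic "Other FC SCSI Disk Drive" usually indicate a missing ODM definition)
# lsattr -El hdisk1 -a queue_depth (shows the queue_depth actually configured on hdisk1)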
AIX LVM Striping : Pbuf
(Diagram: each hdisk (hdisk1-hdisk4) in the striped volume group has its own pbuf pool, the pinned buffers used by the LVM layer for pending disk I/O.)
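Pbuf shortages can be checked per volume group with lvmo; a minimal sketch, assuming the VG is named datavg:
# lvmo -v datavg -a (a pervg_blocked_io_count that keeps growing suggests the VG is short of pbufs)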
AIX LVM Striping : queue_depth
(Diagram: application I/O fans out across hdisk1-hdisk4; each hdisk has its own pbuf pool and service queue in front of the physical LUN.)
iostat -D output: avgtime appears under the queue section, avgserv under the read/write sections (example values from the slide: 0.2 ms and 2.2 ms).
1. Each AIX hdisk has a queue whose depth is set by the queue_depth attribute. This parameter sets the number of parallel requests that can be sent to the physical disk.
2. If you have:
avgserv < 2-3 ms => the storage is behaving well (it can handle more load),
and avgtime > 1 ms => the disk queue is full and I/Os are waiting to be queued,
=> INCREASE the hdisk queue depth (# chdev -l hdiskXX -a queue_depth=YYY)
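A sketch of collecting those numbers for one disk; the interval and sample count are assumptions:
# iostat -D hdisk1 10 6 (avgserv appears in the read/write sections, avgtime in the queue section)
# lsattr -El hdisk1 -a queue_depth (current setting, before changing it with chdev)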
AIX LVM Striping : queue_depth and num_cmd_elems
(Diagram: application I/O passes through the hdisk queues (queue_depth), is spread by MPIO/sddpcm over the FC adapters fcs0 and fcs1, each with its own queue (num_cmd_elems), and then reaches the external storage.)
1. Each FC HBA adapter has a queue sized by num_cmd_elems. This queue plays the same role for the HBA as queue_depth does for the disk.
2. Increasing it uses more memory and must be done with caution; check the adapter statistics first with: # fcstat fcsX
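A sketch of the check and the change; the value 1024 is an assumption, and -P defers the change until the adapter is reconfigured (for example at reboot):
# fcstat fcs0 | grep -i "No Command Resource Count" (a non-zero, growing counter suggests num_cmd_elems is too small)
# chdev -l fcs0 -a num_cmd_elems=1024 -P (applies the larger value at the next reconfiguration)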
IO : Virtual Disk/Fiber channel driver optimization
Virtual SCSI
Virtual I/O helps reduce hardware costs by sharing disk drives
POWER5 or later
(Diagram: the AIX client LPAR sees generic SCSI disks through a virtual SCSI adapter; the VIOS owns the FC adapters connected through the SAN to the EMC and DS8000 storage.)
On the client LPAR the disks appear as generic virtual devices:
# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
(Diagram: the hdisk queue_depth exists on both the client hdisk and the corresponding VIOS hdisk; MPIO/sddpcm runs on the VIOS, whose fcs adapter queues carry the I/O toward the SAN.)
Install the storage subsystem driver on the VIOS.
N-Port ID Virtualization
POWER6 or later
(Diagram: the AIX client LPAR sees the EMC and DS8000 disks directly through virtual FC adapters; the VIOS shares its physical FC adapters, connected through the SAN.)
N-Port ID Virtualization vs. Virtual SCSI
(Diagram: with virtual SCSI on POWER5 or POWER6, the client sees generic SCSI disks virtualized by the VIOS; with NPIV on POWER6 or later, the client sees the real EMC/DS8000 disks through a shared FC adapter, the VIOS simply passing the traffic to the SAN.)
On the client LPAR the real disk type is visible:
# lsdev -Cc disk
hdisk0 Available MPIO FC 2145
The storage driver must be installed on the LPAR; the default qdepth is set by the driver.
Monitor svctime / wait time with nmon or iostat to tune the queue depth.
Adapt num_cmd_elems on the client virtual FC adapter and check it with fcstat fcsX.
On the VIOS (vfchost0 -> fcs0):
Monitor FC activity with nmon (interactive: option ^ ; recording: option -^).
Adapt num_cmd_elems and check it with fcstat fcsX; it should be = the sum of the num_cmd_elems of the virtual FC clients connected to the backend device.
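A sketch of recording FC activity with nmon as suggested above; the interval and sample count are assumptions:
# nmon -f -^ -s 60 -c 60 (writes a recording file including the Fibre Channel sections, 60 samples taken 60 seconds apart)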
IO : AIX mount option
(Diagram: with a normal file system, database buffer writes pass through the OS file system cache; with Direct IO (DIO) and Concurrent IO (CIO), they bypass the FS cache and go straight to disk.)
Benefits :
1. Avoids double caching: some data is already cached in the application layer (SGA).
2. Gives faster access to the backend disks and reduces CPU utilization.
3. Disables the inode lock, allowing several threads to read and write the same file (CIO only).
Restrictions :
1. Because data transfers bypass the AIX buffer cache, JFS2 prefetching and write-behind can't be used. These functionalities can be handled by Oracle:
(Oracle parameter) db_file_multiblock_read_count = 8, 16, 32, ..., 128 according to the workload.
2. When using DIO/CIO, I/O requests made by Oracle must be aligned with the JFS2 block size to avoid demoted I/O (falling back to normal I/O after a direct I/O failure).
=> When you create a JFS2 file system, use the mkfs -o agblksize=XXX option to match the FS block size to the application needs (see the sketch below).
Rule: I/O request size = n x agblksize.
Examples: if the DB block size >= 4k, then jfs2 agblksize=4096.
Redologs are always written in 512-byte blocks, so their jfs2 agblksize must be 512.
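A minimal sketch of creating file systems with a matching agblksize and mounting them with CIO; the VG name, mount points and sizes are illustrative assumptions:
# crfs -v jfs2 -g datavg -m /oradata -a size=50G -a agblksize=4096 (datafile file system, 4 KB blocks)
# crfs -v jfs2 -g datavg -m /oraredo -a size=10G -a agblksize=512 (redolog file system, 512-byte blocks)
# mount -o noatime,cio /oradata
# mount -o noatime,cio /oraredo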
(Diagram, animation: the application issues 512-byte CIO writes against a file system with agblksize=4096; the CIO write fails because the I/O is not aligned with the FS block size, and the request is demoted to a normal, cached I/O.)
IO : Direct IO demoted
Extract from Oracle AWR (test made in Montpellier with Oracle 10g)
Waits on redolog (with demoted IO, FS blk=4k):
Waits 2,229,324 ; %Time-outs 0.00 ; Total Wait Time (s) 62,628 ; Waits/txn 1.53
Waits on redolog (without demoted IO):
Waits 494,905 ; Waits/txn 1.00
File type              Basic mount options   Recommended mount options
Oracle binaries        mount -o rw           mount -o noatime
Oracle Datafiles       mount -o rw           mount -o noatime,cio
Oracle Redologs        mount -o rw           mount -o noatime,cio
Oracle Archivelogs     mount -o rw           mount -o noatime,rbrw
Oracle Control files   mount -o rw           mount -o noatime
IO : Asynchronous IO
IO : Asynchronous IO (AIO)
Allows multiple I/O requests to be sent without having to wait until the disk subsystem has completed the physical I/O.
Using asynchronous I/O is strongly advised whatever the type of file system and mount option implemented (JFS, JFS2, CIO, DIO).
(Diagram: the application posts I/Os to the AIO queue; aioserver processes pick them up and issue them to the disks.)
POSIX vs. Legacy: since AIX 5L V5.3, two types of AIO are available, Legacy and POSIX. For the moment, the Oracle code uses the Legacy AIO servers.
Rules of thumb :
maxservers should be = (10 * <number of disks accessed concurrently>) / <number of CPUs>
maxreqs (a multiple of 4096) should be > 4 * <number of disks> * queue_depth
Monitoring :
In Oracle's alert.log file, if maxservers is set too low: "Warning: lio_listio returned EAGAIN"; performance degradation may be seen.
The number of aioservers in use can be monitored with ps -k | grep aio | wc -l, iostat -A, or nmon (option A).
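On AIX 6.1/7.1 the legacy AIO tunables are exposed through ioo; a sketch of checking and raising them (the values are illustrative assumptions, to be derived from the rules above):
# ioo -a | grep -E "aio_maxservers|aio_maxreqs" (current values)
# ioo -p -o aio_maxservers=64 -o aio_maxreqs=65536 (persistent change; aio_maxservers is a per-CPU value)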
With fsfastpath, I/Os are queued directly from the application into the LVM layer, without any aioserver kproc involvement:
Better performance compared to non-fastpath
No need to tune the min and max aioservers
No aioserver processes => ps -k | grep aio | wc -l is not relevant; use iostat -A instead
(Diagram: application -> AIX kernel -> disk.)
ASM : enable the asynchronous IO fastpath:
AIX 5L : chdev -a fastpath=enable -l aio0 (default since AIX 5.3)
AIX 6.1 : ioo -p -o aio_fastpath=1 (default setting)
AIX 7.1 : ioo -p -o aio_fastpath=1 (default setting + restricted tunable)
NUMA Optimization
NUMA architecture
(Diagram: on a NUMA system, memory accesses fall into local, near and far latency classes depending on where the memory sits relative to the executing core.)
The test is done on a POWER7 machine with the following CPU and memory distribution (dedicated LPAR). It has 4 domains with 8 cores (32 logical CPUs) and more than 27 GB of memory each. If the lssrad output shows unevenly distributed domains, fix the problem before proceeding.
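The domain layout itself can be displayed with lssrad; a sketch:
# lssrad -av (lists each SRAD with its memory size and logical CPU ranges)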
lssrad output (MEM in MB, CPU as logical CPU ranges), one affinity domain per future rset:
SRAD   MEM        CPU
0      27932.94   0-31
1      31285.00   32-63
2      29701.00   64-95
3      29701.00   96-127

We will set up 4 rsets, namely SA/0, SA/1, SA/2, and SA/3, one for each domain:
# mkrset -c 0-31 -m 0 SA/0
# mkrset -c 32-63 -m 0 SA/1
# mkrset -c 64-95 -m 0 SA/2
# mkrset -c 96-127 -m 0 SA/3
Before starting the DB, let's set vmo options to cause process private memory to be allocated locally:
# vmo -o memplace_data=1 -o memplace_stack=1
The following messages are found in the alert log: Oracle finds the 4 rsets and treats them as NUMA domains.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NUMA system found and support enabled (4 domains - 32,32,32,32)
Starting up Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
The shared memory segments: there are a total of 7, one of which is owned by ASM. The SA instance has 6 shared memory segments instead of 1.
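The segment count can be checked from the shell; a sketch, assuming the instance owner is the oracle user:
# ipcs -m | grep oracle (lists the shared memory segments owned by oracle)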
(Diagram, non-NUMA layout: a single SGA shared memory segment holds the shared pool, streams pool, large pool, Java pool, database buffer cache and redo log buffer; server processes 1 and 2 and the background processes each have their own PGA.)
(Diagram, NUMA layout: the database buffer cache plus a NUMA pool are replicated in each of SRAD0-SRAD3; the redo log buffer lives in one SRAD; server and background processes, with their PGAs, are spread across the domains.)
(Diagram, partitioned connections: user groups 1-4 connect through listeners 1-4, each bound to one domain; each of SRAD0-SRAD3 holds its database buffer cache + NUMA pool, with the redo log buffer in one SRAD.)
Four Oracle users, each having its own schema and tables, are defined.
The 4 schemas are identical except for the name.
Each user connection performs queries using random numbers as keys and repeats the operation until the end of the test.
The DB cache is big enough to hold all 4 schemas entirely; therefore, it is an in-memory test.
All test cases are the same except for the domain-attachment control. Each test runs a total of 256 connections, 64 for each Oracle user.
Relative Performance
                       Case 0   Case 1        Case 2
NUMA config            No       Yes           Yes
Connection affinity    No       RoundRobin*   Partitioned**
Relative performance   100%     112%          144%
The relative performance shown applies only to this individual test and can vary widely with different workloads.
Other Tips
Oracle Configuration
Version 11gR2 :
parallel_degree_policy = auto in the spfile
optimizer_features_enable set to the exact version number of the Oracle engine
Calibrate I/O through DBMS_RESOURCE_MANAGER.CALIBRATE_IO when there is no activity on the database (see the sketch below)
Update statistics
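A minimal sketch of running the calibration from the shell while the database is idle; the disk count and latency values are assumptions to adapt to your storage:
# sqlplus -s "/ as sysdba" <<'EOF'
SET SERVEROUTPUT ON
DECLARE
  l_iops PLS_INTEGER; l_mbps PLS_INTEGER; l_lat PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(num_physical_disks => 16, max_latency => 20,
                                     max_iops => l_iops, max_mbps => l_mbps,
                                     actual_latency => l_lat);
  DBMS_OUTPUT.PUT_LINE('max_iops='||l_iops||' max_mbps='||l_mbps||' latency='||l_lat);
END;
/
EOF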
(Diagram: rset1 groups the CPU and RAM of a domain, shown in the context of in-memory parallel execution (IMPX).)
Various Findings
Slow access to time operations (such as sysdate) in Oracle when using Olson TZ on AIX 6.1.
Workaround is to set TZ using POSIX values.
Example:
Olson: TZ=Europe/Berlin
POSIX: TZ=MET-1MST-2,M3.5.0/02:00,M10.5.0/03:00
Database performance progressively degrades over time until the instance is restarted.
The issue is exposed by a change in RDBMS 11.2.0.3.
Triggered by a large number of connections + terabyte segments.
Fixed in AIX 7.1 TL1 SP5.
Workaround for earlier versions: disable terabyte segments.
Session Evaluations
PE129
Win prizes by submitting evaluations online. The more evaluations submitted, the greater the chance of winning.
Our customer benchmark center is the place to validate the proposed IBM solution in a simulated production environment, or to focus on specific IBM Power / AIX technologies.
Standard benchmarks: dedicated infrastructure, dedicated technical support.
Light benchmarks: shared (mutualized) infrastructure, second-level support.
IBM Montpellier Products and Solutions Support Center
Request a benchmark : http://d27db001.rchland.ibm.com/b_dir/bcdcweb.nsf/request?OpenForm#new
Questions ?
[email protected]
[email protected]
Thank You