Back To The Roots Oracle Database IO Management
About me
Jérôme Witt
Senior Consultant
+41 79 961 27 73
jerome.witt[at]dbi-services.com
20.11.2018
Who we are
Based In Switzerland
100% self-financed Swiss company
Over CHF 10.5 mio. Turnover
Agenda
1. Quiz
2. I/O request types
3. Oracle I/O management
4. Troubleshooting
5. Core Message
Safe harbor statement
We won’t have time to dig into all Oracle I/O management topics
User and System I/O wait classes details
Tracing events (tkprof), Cost Based Optimizer (system statistics)
Hidden parameters, DB_FILE_MULTIBLOCK_READ_COUNT
Lost Write Protection
PGA tuning
The presentation focuses on Linux, and all demos are based on simple reverse engineering of
system calls using strace and perl
Output of strace program sent to a FIFO pipe
FIFO pipe read from a perl program
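The strace-to-FIFO pipeline above can be sketched as follows. The demos in this deck use perl; this is a rough Python equivalent, not the presenter's script. The regular expression, the function names and the sample lines (including the datafile path) are assumptions made for illustration.

```python
import re
from collections import Counter

# Syscall name at the start of an strace line, optionally preceded by a
# PID (strace -f) and/or a timestamp (strace -t / -tt).
SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:[\d:.]+\s+)?([a-z_][a-z0-9_]*)\("
)

def count_syscalls(lines):
    """Count system calls by name; signal and exit lines are ignored."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def follow_fifo(path):
    """Yield lines from a named pipe until the writer (strace) closes it."""
    with open(path, "r", errors="replace") as fifo:
        for line in fifo:
            yield line

# Hypothetical sample lines standing in for real strace output.
sample = [
    'open("/u01/oradata/DB12/system01.dbf", O_RDWR|O_DSYNC) = 256',
    'pread64(256, "..."..., 8192, 8192) = 8192',
    '12345 pread64(256, "..."..., 8192, 16384) = 8192',
    '--- SIGALRM {si_signo=SIGALRM} ---',
]
print(count_syscalls(sample))
```

In a live setup, one would create the pipe with mkfifo, point `strace -o` at it, and pass `follow_fifo(path)` to `count_syscalls` instead of the sample list.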
Oracle architecture
Background processes
Performance tuning advanced skills
Quiz
Questions
Feel free!
Get out of this room!
Quiz
Questions
Which background process wakes up regularly and writes cold, dirty buffers to disk?
Database Writer (DBWR)
Under which circumstances?
When a server process cannot find a clean reusable buffer within the database buffer cache
LRU touch count algorithm, DBWR WriteQueue
Advance database checkpoint position (piggy backing)
Quiz
Compulsive tuning
I/O request types
I/O software layers
Buffered I/O
Direct I/O
Blocking vs Non-blocking I/O
Asynchronous I/O
I/O request types
I/O software layers (Linux/Unix)
I/O request types
Buffered I/O concept (Linux)
Software layers (top to bottom)
User space: PostgreSQL, Oracle, MySQL
Standard (GNU) C libraries (glibc)
System calls (open, close, read, write, …)
Virtual File System
xfs, acfs, …, ext4
Concept
Linux maintains (O.S) blocks in a “cache”
Cache shared across all physical block devices
All kinds of blocks, new & unused
Unused blocks cleaned regularly from cache
I/O request types
Non-buffered I/O (directIO)
Direct I/O is a feature of the file system whereby file reads and writes go directly from the
applications to the storage device
An application invokes direct I/O by opening a file with the O_DIRECT flag.
O_DIRECT (since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous, that is, at the completion of a read(2) or write(2), data is guaranteed to have been transferred. See NOTES below for further discussion.
Source: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/global_file_system/s1-manage-direct-io
Linux manpage “man open”
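As an illustration of opening a file with the O_DIRECT flag, here is a minimal Python sketch. It assumes Linux and a filesystem that supports O_DIRECT; the function name and the 4096-byte block size are assumptions, and the function returns None rather than failing where direct I/O is unavailable (for example on some tmpfs mounts).

```python
import mmap
import os
import tempfile

def read_first_block_direct(path, block_size=4096):
    """Read the first block of `path` with O_DIRECT; None if unsupported."""
    o_direct = getattr(os, "O_DIRECT", None)  # flag exists on Linux only
    if o_direct is None:
        return None
    try:
        fd = os.open(path, os.O_RDONLY | o_direct)
    except OSError:
        return None  # e.g. EINVAL: filesystem refuses O_DIRECT
    try:
        # An anonymous mmap is page-aligned, which satisfies the buffer
        # alignment that O_DIRECT transfers usually require.
        buf = mmap.mmap(-1, block_size)
        try:
            n = os.readv(fd, [buf])
            return buf[:n]
        except OSError:
            return None
        finally:
            buf.close()
    finally:
        os.close(fd)

# Demo on a throwaway file (assumed block size: 4096 bytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
try:
    block = read_first_block_direct(tmp.name)
    print("O_DIRECT not available here" if block is None
          else f"read {len(block)} bytes, bypassing the page cache")
finally:
    os.unlink(tmp.name)
```

The aligned buffer is the main design point: with O_DIRECT the kernel transfers data straight to and from the user space buffer, so buffer address, offset and length typically have to match the device's block alignment.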
I/O request types
Blocking vs Non-blocking I/O
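Blocking I/O (application / kernel): the application issues a system call and stays blocked while the kernel performs the data movement; it resumes when the response arrives.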
I/O request types
Blocking vs Non-blocking I/O
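Non-blocking I/O (application / kernel): the application issues a system call and immediately gets EAGAIN / EWOULDBLOCK while no data is ready; it repeats the system call until the kernel performs the data movement and returns the response.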
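The EAGAIN behaviour can be reproduced with a pipe. This is a small sketch rather than anything Oracle-specific; the helper name `try_read` is an assumption, and Python surfaces EAGAIN / EWOULDBLOCK as BlockingIOError.

```python
import os

# A pipe stands in for any descriptor that may have no data ready.
r, w = os.pipe()
os.set_blocking(r, False)  # sets O_NONBLOCK on the read end

def try_read(fd, size=1024):
    """Return the data read, or None when the call would have blocked."""
    try:
        return os.read(fd, size)
    except BlockingIOError:  # errno EAGAIN / EWOULDBLOCK
        return None

first = try_read(r)        # nothing written yet: returns None at once
os.write(w, b"block")
second = try_read(r)       # data is now available

print(first, second)
os.close(r)
os.close(w)
```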
I/O request types
Asynchronous I/O
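Sequence (application / kernel): the application issues the select() system call and waits for the response "data available"; it then issues the read() system call, and the kernel performs the data movement.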
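The select() then read() sequence can be sketched with a pipe as well. The timeouts and the payload are arbitrary choices for the illustration.

```python
import os
import select

r, w = os.pipe()

# Nothing has been written yet, so a zero-timeout select() reports
# no readable descriptor.
ready_before, _, _ = select.select([r], [], [], 0)

os.write(w, b"redo")

# select() now reports the read end as readable; only then do we read().
ready_after, _, _ = select.select([r], [], [], 1.0)
data = os.read(r, 1024) if r in ready_after else b""

print(ready_before, ready_after, data)
os.close(r)
os.close(w)
```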
I/O request types
Asynchronous I/O
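Sequence (application / kernel): the application issues the system call and gets an immediate response; the kernel performs the data movement in the background and notifies the application with a signal or callback.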
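The submit-then-notify pattern can be illustrated as follows. Note the swap: this sketch emulates the completion callback with a worker thread from Python's standard library, it does not use kernel asynchronous I/O (aio_write or io_submit). The names `on_complete` and `completed` are assumptions.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

completed = []

def on_complete(future):
    """Callback run once the write has finished."""
    completed.append(future.result())  # number of bytes written

fd, path = tempfile.mkstemp()
try:
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Submit the write and return immediately, without waiting for it.
        future = pool.submit(os.pwrite, fd, b"dirty buffer", 0)
        future.add_done_callback(on_complete)
        # The caller is free to do other work here.
    # Leaving the with-block joins the worker thread, so the callback
    # has run by this point.
finally:
    os.close(fd)
    os.unlink(path)

print(completed)
```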
Oracle I/O management
Initialization Parameters
Filesystem vs Oracle ASM vs Oracle dNFS vs ODM
Quiz: ACID paradigm & database processes
System calls
Oracle I/O management
Initialization Parameters
References
Init.ora Parameter "FILESYSTEMIO_OPTIONS" Reference Note (Doc ID 120697.1)
Things To Consider For setting filesystemio_options And disk_asynch_io (Doc ID 1987437.1)
File System's Buffer Cache versus Direct I/O (Doc ID 462072.1)
Oracle I/O management
Filesystem vs Oracle ASM vs Oracle dNFS vs ODM
Oracle I/O management
FILESYSTEMIO_OPTIONS
Controls whether asynchronous and/or direct I/O is attempted for Oracle files accessed
through a filesystem
The parameter is bypassed for raw devices, Oracle ASM, VxFS with ODM, and Oracle dNFS
DB_WRITER_PROCESSES is increased automatically based on CPU_COUNT
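The four documented values of FILESYSTEMIO_OPTIONS (NONE, ASYNCH, DIRECTIO, SETALL) can be summarised as a small lookup table. This is a reading aid, not Oracle code; the `IoMode` type and the `io_mode` helper are hypothetical names.

```python
from typing import NamedTuple

class IoMode(NamedTuple):
    asynch: bool  # asynchronous I/O attempted on filesystem files
    direct: bool  # direct I/O attempted (filesystem cache bypassed)

# Value of FILESYSTEMIO_OPTIONS -> I/O behaviour requested by Oracle.
FILESYSTEMIO_OPTIONS = {
    "NONE": IoMode(asynch=False, direct=False),
    "ASYNCH": IoMode(asynch=True, direct=False),
    "DIRECTIO": IoMode(asynch=False, direct=True),
    "SETALL": IoMode(asynch=True, direct=True),
}

def io_mode(value):
    """Look up a parameter value, ignoring case and surrounding spaces."""
    return FILESYSTEMIO_OPTIONS[value.strip().upper()]

for name, mode in FILESYSTEMIO_OPTIONS.items():
    print(f"{name:<9} asynch={mode.asynch!s:<5} direct={mode.direct}")
```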
Oracle I/O management
ACID paradigm & database processes: Quiz
Oracle I/O management
System calls – big picture
Figure: write paths from an O.S. process to the datafiles and redo logs. Labels: “Normal write”, “Direct write”, “DB ASYNC I/O write”; aio_write(); Unix kernel ASYNC I/O threads; O_DSYNC flag; O_DIRECT flag; filesystem cache; ACK.
Oracle I/O management
System calls - filesystems
Oracle dNFS provides the same properties as ODM, over an Oracle-optimized Remote
Procedure Call (RPC) network protocol
> NFSv3, NFSv4 and NFSv4.1 are supported starting with Oracle 12cR1
connect(32, {…,sin_port=htons(2049),sin_addr=inet_addr("192.168.101.111")},
sendmsg(32, {msg_name(0)=NULL, msg_iov(1)=[{"1"..., 152}],
poll([{fd=32, events=POLLIN}], 1, 500) = 1 ([{fd=32, revents=POLLIN}])
recvmsg(32, {msg_name(0)=NULL, msg_iov(1)=[{
Troubleshooting
Case studies
Troubleshooting
Oracle dNFS (directIO + Asynch. I/O)
Customer’s infrastructure
NetApp full flash storage
Dedicated storage LAN (FCoE)
RHEL6 x86_64
NIC all part of bonds
> mode active/passive
6x Veritas active/active cluster (single instance databases)
200 Oracle database instances
Oracle dNFS
# top
top - 15:52:18 up 58 days, 19:01, 11 users, load average: 3.66, 3.58, 3.60
Tasks: 2428 total, 3 running, 2425 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.6 us, 4.2 sy, 0.0 ni, 91.2 id, 0.0 wa, 0.0 hi, 2.9 si, 0.0 st
KiB Mem : 39582355+total, 15929764+free, 21915532+used, 17370568 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 17346779+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
83 root 20 0 0 0 0 R 70.8 0.0 87:03.47 [ksoftirqd/2]
3254 oracle 20 0 28.364g 36524 29824 S 7.2 0.0 0:59.17 oracleDB12 (LOCAL=NO)
20381 oracle 20 0 20.354g 36744 27888 S 3.6 0.0 2:02.19 oracleDB12 (LOCAL=NO)
....
Troubleshooting
Oracle dNFS (directIO + Asynch. I/O)
Root cause: network bandwidth saturation! All servers had the same NIC active
within the bond used for FCoE (i.e. all traffic went through the same fabric port)
Troubleshooting
I/O Scheduler
Customer’s infrastructure
IBM XIV storage system
FC network
SLES 11 x86_64
EXT3 filesystem
> Huge filesystem buffer cache (> 4 TB)
FILESYSTEMIO_OPTIONS = none
ALL databases protected through Oracle Data Guard physical standby
Troubleshooting
I/O Scheduler
# Wait event : log file parallel write
Troubleshooting
FILESYSTEMIO_OPTIONS = NONE to Oracle ODM
New CPUs
2800 MHz / 4 cores
2100 MHz / 16 cores
Core Message