Tuning RHEL For Databases

This document provides tuning tips for databases on Red Hat systems. It discusses tuning aspects like I/O, memory, CPU and network. For I/O tuning, it recommends choosing appropriate storage hardware, configuring I/O elevators like deadline or CFQ, using direct I/O, and placing hot database components like logs and temporary files on low latency devices. For memory tuning, it suggests using NUMA architecture, huge pages and avoiding unnecessary cache flushing. For CPU tuning, it discusses power management settings and their impact on database performance. Graphs are provided to show performance improvements from various tuning techniques.


Tuning your Red Hat System for Databases

Sanjay Rao, Principal Software Engineer, Red Hat
May 06, 2011

Objectives of this session

Share tuning tips

Bare metal

Aspects of tuning
Tuning parameters
Results of the tuning

Virtual Machines (RHEL)

Tools

Reading Graphs

A green arrow on each graph shows the direction of better results

What To Know About Tuning

Proactive or reactive
Understand the trade-offs
No silver bullet
You get what you pay for

What To Tune

I/O
Memory
CPU
Network

This session covers I/O and memory extensively

I/O Tuning - Hardware

Know Your Storage


SAS or SATA?
Fibre Channel, Ethernet or SSD?
Bandwidth limits


Device-mapper multipath - provides multipathing capabilities and LUN persistence
Low-level I/O tools - dd, iozone, dt, etc. (see the example below)
Generate I/O representative of the database implementation

Multiple HBAs
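A rough way to baseline the storage with the low-level tools above; the multipath device name, file paths, sizes and thread count are only illustrative:

# Sequential write and read throughput with dd, bypassing the page cache
# (destructive: writes directly to the raw device)
dd if=/dev/zero of=/dev/mapper/mpatha bs=64k count=100000 oflag=direct
dd if=/dev/mapper/mpatha of=/dev/null bs=64k count=100000 iflag=direct

# Random I/O with iozone, sized and threaded to resemble the database workload
iozone -I -r 8k -s 4g -t 4 -i 0 -i 2 -F /data/f1 /data/f2 /data/f3 /data/f4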

How To

I/O Tuning - Understanding I/O Elevators

Deadline

Two queues per device, one for reads and one for writes
I/Os dispatched based on time spent in the queue

CFQ

Per-process queues
Each process queue gets a fixed time slice (based on process priority)

Noop

FIFO
Simple I/O merging
Lowest CPU cost

I/O Tuning - Configuring I/O Elevators

Boot-time

Grub command line: elevator=deadline|cfq|noop

Dynamically, per device

echo deadline > /sys/class/block/sda/queue/scheduler

ktune service (RHEL 5)
tuned (RHEL 6 utility)


tuned-adm profile throughput-performance
tuned-adm profile enterprise-storage
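A short sketch combining the approaches above; the kernel version in the grub entry and the device name are illustrative:

# Boot-time: append the elevator to the kernel line in /etc/grub.conf
#   kernel /vmlinuz-2.6.32-131.el6.x86_64 ro root=/dev/vg00/lvroot elevator=deadline

# Runtime, per device; the active scheduler is shown in [brackets]
echo deadline > /sys/class/block/sda/queue/scheduler
cat /sys/class/block/sda/queue/scheduler

# RHEL 6: let tuned set deadline plus related I/O and power settings
tuned-adm profile enterprise-storage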

CFQ vs Deadline - 1 thread per device (4 devices)

[Graph: Comparison CFQ vs Deadline, 1 thread per multipath device (4 devices). Bars: throughput in MB/s (scale 0-450M) for the CFQ and Deadline multipath devices; line: % diff CFQ vs Deadline (right axis -4 to 4). X axis: 8K, 16K, 32K and 64K sequential and random reads and writes.]

CFQ vs Deadline - 4 threads per device (4 devices)

[Graph: Comparison CFQ vs Deadline, 4 threads per multipath device (4 devices). Bars: throughput in MB/s (scale 0-600M) for CFQ and Deadline; line: % diff CFQ vs Deadline (right axis -50 to 250). X axis: 8K, 16K, 32K and 64K sequential and random reads and writes.]

I/O Tuning - Elevators, OLTP (Sybase)

[Graph: Sybase I/O scheduler testing on RHEL 5.5. OLTP transactional throughput on a 4-socket quad-core 2.5 GHz host with 96 GB of memory. Deadline, CFQ and Noop finish within a few percent of each other, roughly in the 162K-165K range.]

Impact of I/O Elevator - OLTP Workload

[Graph: transactions/min (scale up to 120K) vs. user count (10U, 20U, 40U, 60U) for CFQ, Deadline and Noop.]

DSS Workload

[Graph: Comparison CFQ vs Deadline, Oracle DSS workload at different parallel degrees (16, 32). Bars: elapsed time (h:mm) for CFQ and Deadline; line: % diff. Deadline reduces elapsed time by roughly 48-58% relative to CFQ.]

I/O Tuning - File Systems

Direct I/O
  Avoid double caching
  Predictable performance
  Reduce CPU overhead

Asynchronous I/O
  Eliminate synchronous I/O stalls
  Critical for I/O-intensive applications

Configure read ahead
  Database (parameters to configure read ahead)
  Block devices (getra, setra)

Turn off I/O barriers (RHEL 6 and enterprise storage only)
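A small illustration of the read-ahead and barrier settings above; the device and mount point are hypothetical, and nobarrier is only safe on battery-backed enterprise storage:

# Query and set block-device read-ahead (value is in 512-byte sectors)
blockdev --getra /dev/mapper/mpatha
blockdev --setra 8192 /dev/mapper/mpatha

# Remount an ext4 filesystem without write barriers (RHEL 6)
mount -o remount,nobarrier /data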

I/O Tuning - Effect of Direct I/O and Asynchronous I/O

[Graph: OLTP workload on a 4-socket, dual-core host with 16 GB of memory and mid-level Fibre Channel storage. Trans/min (scale up to 80K) compared for: setall (DIO + AIO), DIO only, AIO only, and no AIO/DIO.]

I/O Tuning Database Layout

Separate files by I/O type (data, logs, undo, temp)
OLTP - data files / logs
DSS - data files / temp files
Use low-latency / high-bandwidth devices for hot spots

I/O Tuning - OLTP Logs on Fusion-io

[Graph: OLTP workload, single instance, logs on Fibre Channel vs. Fusion-io. Trans/min (scale up to 400K) at 10U, 40U and 80U; placing logs on Fusion-io improves throughput by roughly 21-24%.]

I/O Tuning - Storage (OLTP database)

[Graph: OLTP workload, 4 database instances, 4 Gb Fibre Channel vs. Fusion-io. Trans/min, scale up to 1400K.]

I/O Tuning - OLTP Database (vmstat)

4 database instances - Fibre Channel
[vmstat excerpt: block writes (bo) roughly 136K-156K, I/O wait 9-14%, CPU idle 45-69%.]

4 database instances - solid state PCI devices (Fusion-io)
[vmstat excerpt: block writes (bo) roughly 267K-574K, I/O wait 0-1%, CPU user time 88-90%.]

I/O Tuning - DSS Temp Files

[Graph: DSS workload, sort-merge table create, elapsed time (smaller is better, scale 0 to 1:55) with temp space on 4 Gb Fibre Channel vs. Fusion-io.]

I/O Tuning - RHEL 6 DSS Workload, Sybase IQ 15.2

[Graph: Sybase IQ 15.2 DSS workload, 2 FC arrays vs. Fusion-io, across different measurement metrics. Bars fall roughly in the 44K-51K and 77K-98K ranges for the two storage configurations.]

What To Tune

I/O
Memory
CPU
Network

Memory Tuning

Dense memory
Based on architecture - NUMA
Huge pages

Understanding NUMA (Non Uniform Memory Access)

Multi-socket, multi-core architecture


NUMA required for scaling
RHEL 5 / 6 completely NUMA aware
Additional performance gains by enforcing NUMA placement

How to enforce NUMA placement


numactl - CPU and memory pinning
taskset - CPU pinning
cgroups (RHEL 6 only)
libvirt for KVM guests - CPU pinning
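A minimal sketch of enforcing placement with numactl; the node number and the database startup command are hypothetical:

# Show the node / CPU / memory layout
numactl --hardware

# Run a database instance with CPUs and memory bound to node 0
numactl --cpunodebind=0 --membind=0 <database startup command>

# Pin an already-running process to a CPU list with taskset
taskset -pc 0-7 <pid>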

Memory Tuning - Huge Pages

2 MB pages vs. the standard 4 KB Linux page
Virtual-to-physical page map is 512 times smaller
TLB can map more physical pages, resulting in fewer misses
Traditional huge pages are always pinned
Transparent huge pages in RHEL 6
Most databases support huge pages
How to configure huge pages (16 GB):

echo 8192 > /proc/sys/vm/nr_hugepages
vi /etc/sysctl.conf (vm.nr_hugepages=8192)
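A slightly fuller sketch of the same configuration, with the sizing arithmetic spelled out and a verification step:

# 16 GB of huge pages at 2 MB per page = 8192 pages
echo 8192 > /proc/sys/vm/nr_hugepages

# Persist across reboots
echo "vm.nr_hugepages = 8192" >> /etc/sysctl.conf

# Verify (HugePages_Total, HugePages_Free, Hugepagesize)
grep -i huge /proc/meminfo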

Memory Tuning - Huge Pages, Sybase OLTP

[Graph: Sybase huge pages testing on RHEL 5.5. OLTP transactional throughput on a 4-socket quad-core 2.5 GHz host with 96 GB of memory: roughly 165K with default 4 KB pages vs. 175K with huge pages.]

OLTP Workload - Effect of NUMA and Huge Pages

[Graph: multi-instance OLTP workload, Trans/min (scale up to 1400K) for non-NUMA, NUMA, non-NUMA + huge pages, and NUMA + huge pages. Relative to the non-NUMA baseline, the tuned configurations gain roughly 8.2%, 11.7% and 17.8%, with NUMA + huge pages at the top.]

NUMA and Huge Pages

Huge page allocation takes place uniformly across NUMA nodes
Make sure that database shared segments are sized to fit
Workaround: allocate huge pages, start the DB, then de-allocate the unused huge pages

Example: 128 GB physical memory across 4 NUMA nodes
  80 GB of huge pages = 20 GB in each NUMA node
    A 24 GB DB shared segment using huge pages fits
    A 24 GB DB shared segment using NUMA and huge pages does not fit within a single node
  100 GB of huge pages = 25 GB in each NUMA node
    The 24 GB shared segment now fits within one node

Tuning Memory - Flushing Caches

Drop unused cache
  Frees unused file-cache memory
  If the DB relies on the file cache, you may notice a slowdown

Free pagecache: echo 1 > /proc/sys/vm/drop_caches
Free slabcache: echo 2 > /proc/sys/vm/drop_caches
Free pagecache and slabcache: echo 3 > /proc/sys/vm/drop_caches
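A small sketch; drop_caches only discards clean objects, so running sync first writes dirty pages out before the drop:

sync
echo 3 > /proc/sys/vm/drop_caches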

CPU Tuning

CPU performance

Clock speed
Multiple cores
Power-savings mode


cpuspeed governors
  off
  performance
  ondemand
  powersave

How To

echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Best of both worlds - cron jobs to switch the governor mode
ktune (RHEL 5)
tuned-adm profile server-powersave (RHEL 6)
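A sketch of applying the governor to every CPU, plus illustrative cron entries for the "best of both worlds" approach (times and profiles are examples only):

# Set the performance governor on all CPUs
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done

# Example cron entries: full speed during business hours, powersave overnight
# 0 8  * * 1-5  root  tuned-adm profile throughput-performance
# 0 20 * * 1-5  root  tuned-adm profile server-powersave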

Tuning CPU - Impact of Power Settings

[Graph: RHEL 6 database OLTP workload under different cpuspeed settings (frequency range 2.27 GHz - 1.06 GHz). Trans/min (scale up to 450K) at 10U and 20U for cpuspeed off, performance, ondemand and powersave.]

Tuning CPU - Effect of Power Settings - DSS

[Graph: I/O-intensive DSS workload, elapsed time (lower is better, scale 0 to 10:05) for the performance, ondemand and powersave governors.]

[vmstat excerpt during the test: block writes (bo) roughly 168K-248K, CPU user 4-5%, system 1-2%, idle 86-90%, I/O wait 5-7%.]

Network Tuning

Network Performance

Separate networks for different functions
If on the same network, use arp_filter to prevent ARP flux

echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

10GigE

Infiniband
  Supports RDMA with the RHEL 6 High Performance Networking package

Packet size
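A small sketch of persisting the arp_filter setting and raising the packet size on a dedicated interface; the interface name and MTU value are illustrative:

# Persist arp_filter and apply it now
echo "net.ipv4.conf.all.arp_filter = 1" >> /etc/sysctl.conf
sysctl -p

# Jumbo frames on a dedicated storage / interconnect network
ip link set dev eth1 mtu 9000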

Database Performance

Application tuning

Design
Reduce locking / waiting
Database tools (optimize regularly)

Tuning - Virtualization, KVM (RHEL guests)

Virtualization Tuning - Caching

Cache = none
  I/O from the guest is not cached on the host

Cache = writethrough
  I/O from the guest is cached and written through on the host
  Works well on large systems (lots of memory and CPU)
  Potential scaling problems with multiple guests (host CPU is used to maintain the cache)
  Can lead to swapping on the host

How to configure: set the I/O cache per disk on the qemu command line or in libvirt

Effect of I/O Cache Settings on Guest Performance

[Graph: guest OLTP throughput (Trans/min, scale up to 1000K) with cache=none vs. cache=writethrough for 1 guest and 4 guests; cache=none leads by roughly 5.8% with 1 guest and 31.7% with 4 guests.]

Configurable per device:
  Virt-Manager - drop-down option under Advanced Options
  Libvirt xml file - driver name='qemu' type='raw' cache='writethrough' io='native'
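For reference, a hedged sketch of the same settings on a qemu-kvm command line; memory size, CPU count and the device path are illustrative:

# Per-disk cache and AIO mode set directly on the qemu-kvm command line
qemu-kvm -m 4096 -smp 4 \
    -drive file=/dev/mapper/mpatha,if=virtio,cache=none,aio=native \
    -net nic,model=virtio -net tap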

AIO Native vs Threaded (default)


[Graph: guest OLTP throughput (Trans/min, scale up to 1200K) at 10U and 20U, AIO native vs. the threaded default.]

Configurable per device (only via the xml configuration file):
  Libvirt xml file - driver name='qemu' type='raw' cache='writethrough' io='native'

Virtualization Tuning - I/O Elevators, OLTP

Host running Deadline

[Graph: Trans/min (higher is better, scale up to 300K) for guest elevators CFQ, Deadline and Noop with 1, 2 and 4 guests.]

Virtualization Tuning - I/O Elevators, DSS

Host running Deadline

[Graph: elapsed time (lower is better, scale 0 to 20:10) for guest elevators CFQ, Deadline and Noop with 1, 2 and 4 guests.]

Virtualization Tuning - Using NUMA

[Graph: aggregate throughput of 4 guests (24 vCPU / 56 GB configuration, stacked bars, scale up to 400K) without and with NUMA placement; NUMA placement improves the total by roughly 28.6%.]

Virtualization Tuning - Network

VirtIO
  VirtIO drivers for the guest network

vhost_net (low latency, close to line speed)
  Bypasses the qemu layer

PCI pass-through
  Bypasses the host and passes the PCI device to the guest
  Can be passed to only one guest

SR-IOV (Single Root I/O Virtualization)
  Pass-through to the guest
  Can be shared among multiple guests
  Limited hardware support
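A minimal sketch of enabling vhost_net for a virtio guest NIC; the netdev id and device names are illustrative:

# Load the in-kernel vhost backend on the host
modprobe vhost_net

# Guest NIC options for the qemu-kvm command line
#   -netdev tap,id=net0,vhost=on
#   -device virtio-net-pci,netdev=net0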

Latency Comparison - RHEL 6

[Graph: network latency by guest interface method, guest receive (lower is better). Latency in microseconds (0-400) vs. message size in bytes for host RX, virtio RX, vhost RX and SR-IOV RX.]

vhost-net improves latency, bringing it close to bare metal

Performance Setting Tool

tuned for RHEL6

Configure system for different performance profiles


laptop-ac-powersave
spindown-disk
latency-performance
laptop-battery-powersave
server-powersave
throughput-performance
desktop-powersave
enterprise-storage
default
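Basic usage, for reference:

# List profiles, switch, and confirm the active profile
tuned-adm list
tuned-adm profile enterprise-storage
tuned-adm active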

Performance Monitoring Tools

Monitoring tools
  top, vmstat, ps, iostat, netstat, sar, perf

Kernel tools
  /proc, sysctl, AltSysRq

Networking
  ethtool, ifconfig

Profiling
  oprofile, strace, ltrace, systemtap, perf
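A few typical invocations while a workload is running (intervals are illustrative):

vmstat 5          # CPU, memory, swap and I/O overview every 5 seconds
iostat -dmx 5     # per-device throughput, queue depth and utilization
sar -n DEV 5      # per-interface network traffic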

Wrap Up Bare Metal

I/O
  Choose the right elevator
  Eliminate hot spots
  Direct I/O or asynchronous I/O
  Virtualization - caching

Memory
  NUMA
  Huge pages
  Swapping
  Managing caches

RHEL has many tools to help with debugging / tuning

Wrap Up Bare Metal (cont.)

CPU
  Check cpuspeed settings

Network
  Separate networks
  arp_filter
  Packet size

Wrap Up Virtualization

VirtIO drivers
aio (native)
NUMA
Cache options (none, writethrough)
Network (vhost-net)
