Introduction to low-level

profiling and tracing


EuroPython 2019 / Basel 2019-07-11
Christian Heimes
Principal Software Engineer

@ChristianHeimes
Our systems block every 5 minutes.
We are losing money. Fix it!

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0


Is it me, or the GIL?

Christoph Heer
EuroPython 2019



requests.get('https://www.python.org')



2 ½ use cases
for tracing tools
Who am I?

● from Hamburg/Germany

Python and C developer

Python core contributor since 2008

maintainer of ssl and hashlib module

Python security team



Professional life

● Principal Software Engineer at Red Hat



Security Engineering

FreeIPA Identity Management

Dogtag PKI



Agenda & Goals
Agenda

introduction

ptrace (strace, ltrace)

kernel & hardware tracing tools

Summary

~ 5 minutes Q&A



Special thanks

Brendan Gregg (Netflix)

Victor Stinner (CPython, Red Hat)
https://vstinner.readthedocs.io/benchmark.html

Dmitry Levin (strace)
“Modern Strace”, DevConf.CZ 2019

Michal Sekletár (Red Hat)
“Tracing Tools for Systems Engineers”, DevConf.CZ 2019



Introduction
Terminology
Debugging – The process of identifying and removing bugs.

active, expensive, intrusive, slow-down

deploy new version

attach debugger

Tracing – observing and monitoring behavior



passive, non-intrusive, and fast*

Profiling – gathering and analyzing metrics



byproduct of tracing with counting and timing



Methodology
Application level tracing

debug builds

performance logs (MySQL Slow Query Log, PHP xdebug)

trace hooks (Python: sys.settrace())
User space tracing

LD_PRELOAD, ptrace
Kernel space tracing

ftrace, eBPF, systemtap, LTTng, ...
Hardware tracing

hardware performance counter, PMU, ...
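
The Python trace hook listed under application level tracing, sys.settrace(), can be sketched roughly as follows. This is a minimal illustration, not a profiler: it only counts Python-level function calls while a workload runs. The names count_calls, helper and workload are made up for this example.

```python
import sys
from collections import Counter


def count_calls(func, *args, **kwargs):
    """Run func under a sys.settrace() hook and count Python-level calls."""
    calls = Counter()

    def tracer(frame, event, arg):
        # The global trace function sees a 'call' event for each new frame.
        if event == "call":
            calls[frame.f_code.co_name] += 1
        # Returning None switches off per-line tracing inside that frame.
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the previous hook
    return result, calls


def helper(n):  # hypothetical workload, for illustration only
    return n * 2


def workload():
    total = 0
    for i in range(3):
        total += helper(i)
    return total


result, calls = count_calls(workload)
print(result, dict(calls))
```

Functions implemented in C (such as range) do not produce 'call' events, which is one reason this kind of hook gives only an application level view.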



Prerequisites

installation of extra packages

permissions

root or special group

CAP_SYS_PTRACE, CAP_SYS_ADMIN, CAP_SYS_MODULE

recent Kernel (BCC, eBPF)

debug symbols
dnf debuginfo-install package, apt install package-dbg

compiler/linker flags (-g, -fno-omit-frame-pointer, BIND_LAZY)

disable secure boot (stap)

Kernel Live Patching or dynamic modules


“Lies, damned lies,
and statistics”

“Wer misst, misst Mist.” (German: “Whoever measures, measures rubbish.”)



benchmarks / statistics

average, median, standard deviation, percentile

observational error

systematic error

random error

systemic bias, cognitive bias (human factor)

misleading presentation (Vatican City has 2.27 popes / km²)

sampling profiler: quantization error, Nyquist–Shannon theorem

“Producing Wrong Data Without Doing Anything Obviously Wrong!”


(Mytkowicz, Diwan, Hauswirth, Sweeney)
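
To make the difference between average, median, standard deviation and percentile concrete, here is a small sketch using Python's statistics module. The sample values are invented for illustration and contain one deliberate outlier; they are not measurements from this talk.

```python
import statistics

# Hypothetical wall-clock samples in milliseconds, with one outlier on purpose.
samples_ms = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4, 9.8, 10.1, 10.2, 25.0]

mean = statistics.mean(samples_ms)
median = statistics.median(samples_ms)
stdev = statistics.stdev(samples_ms)  # sample standard deviation

# quantiles(n=100) returns the 99 cut points P1..P99; index 94 is P95.
# method="inclusive" interpolates within the observed range.
p95 = statistics.quantiles(samples_ms, n=100, method="inclusive")[94]

print(f"mean={mean:.2f} median={median:.2f} stdev={stdev:.2f} p95={p95:.2f}")
```

A single slow run pulls the mean well above the median here, which is why reporting only an average can mislead.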



Computers are noisy


CPU power states (P-states, C-states, TurboBoost, thermal throttling)

caches (L1 CPU, file system, JIT warm-up, DNS/HTTP)

hardware IRQs, Address Space Layout Randomization (ASLR)

https://vstinner.readthedocs.io/benchmark.html



env vars, path length, hostname, username
Benchmark without Python virtual environment

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 28.78    0.000438           0      1440           read
 27.33    0.000416           1       440        25 stat
  9.72    0.000148           1       144           mmap
...
  0.79    0.000012           1        11           munmap

Benchmark inside a virtual environment

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.12    0.099023           2     61471           munmap
 41.87    0.072580           1     61618           mmap
  0.23    0.000395           1       465        27 stat


https://mail.python.org/pipermail/python-dev/2019-February/156522.html

https://homes.cs.washington.edu/~bornholt/post/performance-evaluation.html



Let's profile
import time

start = time.time()

with open('/etc/os-release') as f:
    lines = f.readlines()

print(time.time() - start)
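
A single time.time() measurement like the one above is easily dominated by noise. As a hedged variation, the sketch below repeats the measurement with time.perf_counter() and reports the spread. The helper name time_read and the temporary file standing in for /etc/os-release are assumptions made so the example runs on any system.

```python
import os
import statistics
import tempfile
import time


def time_read(path, repeat=50):
    """Return per-run durations in seconds for opening and reading path."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()  # monotonic, high-resolution clock
        with open(path) as f:
            f.readlines()
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical stand-in for /etc/os-release so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("NAME=Example\nVERSION=1\n")

try:
    durations = time_read(tmp.name)
finally:
    os.unlink(tmp.name)

print(f"min={min(durations):.6f}s "
      f"median={statistics.median(durations):.6f}s "
      f"max={max(durations):.6f}s")
```

The first run is often the slowest because the file system cache is cold, so looking at the minimum and median next to the maximum gives a fairer picture than one sample.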



Examples

shell

cat

Python

open + read file

HTTPS request with requests



ptrace
ptrace

process trace syscall

introduced in Unix Version 6 (~1975)

user-space tracing

used by debuggers and code analysis tools

gdb

strace

ltrace

code coverage

...



ltrace – A library call tracer
$ ltrace -e 'SSL_CTX*@*' \
    python3 -c 'import requests; requests.get("https://www.python.org")'
_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_new(0x7fa4a1a77120, 0, 0, 0) = 0x55636123a100
_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_get_verify_callback(0x55636123a100, -2, 0x7fa4a1acd5e0, 0x7fa4a1acd5f8) = 0
_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_set_verify(0x55636123a100, 0, 0, 0x7fa4a1acd5f8) = 0
_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_set_options(0x55636123a100, 0x82420054, 0, 0x7fa4a1acd5f8) = 0x82520054
_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_ctrl(0x55636123a100, 33, 16, 0) = 20
_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_set_session_id_context(0x55636123a100, 0x7fa4af500494, 7, 0) = 1
...
_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_free(0x55636123a100, 0x7fa4af4b1e10, -3, 0x7fa4a12734c8) = 0



ltrace – count memory allocations
>>> import os, time, requests
>>> os.getpid()
6504
# attach ltrace
>>> start = time.time(); r = requests.get("https://www.python.org"); print(time.time() - start)
# without ltrace: 0.566 sec
# with ltrace: 3.396 sec

$ ltrace -e 'malloc+free+realloc@*' -c -p 6504
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 53.49    1.104804          37     29796 free
 45.78    0.945565          37     25523 malloc
  0.73    0.015069          48       309 realloc
------ ----------- ----------- --------- --------------------
100.00    2.065438                 55628 total



strace – trace system calls and signals

Paul Kranenburg in 1991

Dmitry Levin (current maintainer)

[Figure: CPU protection rings. Ring 0 is the kernel (most privileged), rings 1 and 2 are device drivers, ring 3 is applications (least privileged). A syscall crosses from ring 3 into ring 0. Image: Hertzsprung at English Wikipedia, CC BY-SA 3.0]



strace “open” syscall
$ strace -e open cat /etc/os-release >/dev/null
+++ exited with 0 +++
$ man 2 open
...
The open() system call opens the file specified by pathname.

$ strace -c cat /etc/os-release >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31,40    0,000591         590         1           execve
 13,51    0,000254          28         9           mmap
  8,67    0,000163          40         4           mprotect
  7,20    0,000135          22         6           read
  6,95    0,000131          32         4           openat
  6,36    0,000120          19         6           close
  6,21    0,000117          29         4           brk
...
------ ----------- ----------- --------- --------- ----------------
100.00    0,001882                    49         2 total
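
When comparing several runs it can help to load the strace -c summary into a script. The following is a rough sketch that parses the data rows of such a table, including the decimal commas seen in the output above. The function name parse_strace_summary and the dictionary keys are chosen for this example and are not part of strace.

```python
def _to_float(value):
    # The slide's output uses a decimal comma (locale dependent).
    return float(value.replace(",", "."))


def parse_strace_summary(text):
    """Parse the data rows of an `strace -c` summary into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        if len(fields) not in (5, 6):
            continue
        if fields[0].startswith("-") or fields[-1] == "total":
            continue  # separator line or the totals row
        rows.append({
            "percent": _to_float(fields[0]),
            "seconds": _to_float(fields[1]),
            "usecs_per_call": int(fields[2]),
            "calls": int(fields[3]),
            "errors": int(fields[4]) if len(fields) == 6 else 0,
            "syscall": fields[-1],
        })
    return rows


# Rows copied from the summary above.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31,40    0,000591         590         1           execve
 13,51    0,000254          28         9           mmap
  8,67    0,000163          40         4           mprotect
  7,20    0,000135          22         6           read
  6,95    0,000131          32         4           openat
  6,36    0,000120          19         6           close
  6,21    0,000117          29         4           brk
...
------ ----------- ----------- --------- --------- ----------------
100.00    0,001882                    49         2 total
"""

rows = parse_strace_summary(SAMPLE)
for row in rows:
    print(row["syscall"], row["calls"], row["seconds"])
```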



syscall ABI / API

open

x86_64 glibc <= 2.25: open

x86_64 glibc >= 2.26: openat

riscv: openat
$ strace -e '/^open(at)?$'

stat

fstat, fstat64, fstatat64, lstat, lstat64, newfstatat, oldfstat, oldlstat, oldstat, stat,
stat64, statx
$ strace -e %%stat
%%stat = %stat + %lstat + %fstat + statx
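
To see which names the anchored pattern in the strace filter selects, the sketch below applies the same expression with Python's re module. This only illustrates the regular expression; Python's re is used as a stand-in here and strace does its own matching. The list of names is a hand-picked assumption for the example.

```python
import re

# Same pattern as in the strace filter, without the leading slash.
OPEN_RE = re.compile(r"^open(at)?$")

# A few Linux syscall names; only the first two should match because the
# pattern is anchored at both ends.
names = ["open", "openat", "openat2", "open_by_handle_at", "mq_open"]

matched = [name for name in names if OPEN_RE.search(name)]
print(matched)
```

Without the ^ and $ anchors the pattern would also pick up names such as mq_open, which is usually not what you want when tracing file opens.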



strace “openat” syscall
$ strace -e openat cat /etc/os-release >/dev/null
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/os-release", O_RDONLY) = 3
+++ exited with 0 +++

$ strace -P /etc/os-release cat /etc/os-release >/dev/null
strace: Requested path '/etc/os-release' resolved into '/usr/lib/os.release.d/os-release-fedora'
openat(AT_FDCWD, "/etc/os-release", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=693, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, "NAME=Fedora\nVERSION=\"29 (Twenty "..., 131072) = 693
read(3, "", 131072) = 0
close(3) = 0
+++ exited with 0 +++



strace options

filter class examples

%file: syscall that take a file name as argument

%desc: file descriptor related syscalls

%net: network related syscalls

arguments

-P: path filter

-y: print path associated with file descriptor

-yy: print protocol information (e.g. ip:port)

-t/-tt/-ttt: time stamp

-T: time spent

-k: caller stack trace



strace: requests.get(...)
$ strace -e %file -p 6504
...
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=300, ...}) = 0
openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/crypto-policies/back-ends/openssl.config", O_RDONLY) = 4
openat(AT_FDCWD, "/etc/ssl/cacert.pem", O_RDONLY) = -1 ENOENT (No such file or directory)

$ strace -e %net -p 6504
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.38.5.26")}, 16) = 0
sendto(3, "\323\300\1\0\0\1\0\0\0\0\0\0\3www\6python\3org\0\0\1\0\1", 32, MSG_NOSIGNAL, NULL, 0) = 32
recvfrom(3, "\323\300\201\200\0\1\0\2\0\0\0\0\3www\6python\3org\0\0\1\0\1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.38.5.26")}, [28->16]) = 93

socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("151.101.36.223")}, 16) = 0
getsockopt(3, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
getpeername(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("151.101.36.223")}, [16]) = 0



syscall tampering
$ strace -e inject=socket:error=EMFILE -p 6504
>>> requests.get("https://www.python.org/en")
Traceback (most recent call last):
...
socket.gaierror: [Errno -2] Name or service not known
$ strace -e inject=socket:error=EMFILE:when=2+ -p 6504
Traceback (most recent call last):
...
  File "/usr/lib64/python3.7/socket.py", line 151, in __init__
    _socket.socket.__init__(self, family, type, proto, fileno)
OSError: [Errno 24] Too many open files
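
On the Python side, the injected EMFILE surfaces as an OSError whose errno attribute can be checked. The sketch below shows one way an application might recognise that case. It raises the error itself instead of running under strace, and the function names are hypothetical.

```python
import errno
import os


def open_socket_simulated():
    """Stand-in for socket.socket() failing like the injected syscall."""
    # Assumption: the error is raised here by hand, not injected by strace.
    raise OSError(errno.EMFILE, os.strerror(errno.EMFILE))


def connect_with_diagnostics():
    try:
        open_socket_simulated()
    except OSError as exc:
        if exc.errno == errno.EMFILE:
            return "file descriptor limit reached"
        raise  # any other OSError is not handled here
    return "connected"


status = connect_with_diagnostics()
print(status)
```

Fault injection with strace then becomes a way to exercise exactly this error path in the real program without exhausting file descriptors for real.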

$ dd if=/dev/zero of=/dev/null bs=1M count=10
10485760 bytes (10 MB, 10 MiB) copied, 0,00328225 s, 3,2 GB/s
$ strace -e inject=write:delay_exit=100000 -o /dev/null \
    dd if=/dev/zero of=/dev/null bs=1M count=10
10485760 bytes (10 MB, 10 MiB) copied, 1,10699 s, 9,5 MB/s
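
The throughput numbers in the dd example follow from simple arithmetic, sketched below. It assumes that dd issues one write() per 1 MiB block, so ten delayed calls, and that delay_exit is given in microseconds.

```python
size_bytes = 10 * 1024 * 1024  # bs=1M count=10

plain_s = 0.00328225           # elapsed time of the plain dd run
delayed_s = 1.10699            # elapsed time of the run under strace

plain_rate = size_bytes / plain_s      # bytes per second
delayed_rate = size_bytes / delayed_s

# Assumption: 10 data write() calls, each delayed by 100000 microseconds.
injected_delay_s = 10 * 100_000 / 1_000_000

print(f"{plain_rate / 1e9:.1f} GB/s, {delayed_rate / 1e6:.1f} MB/s, "
      f"injected delay {injected_delay_s:.1f} s")
```

Under these assumptions the injected delay accounts for about 1.0 s of the 1.1 s run; the remainder would be tracing overhead and any additional writes.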



syscall tampering
$ strace -P /tmp/foo -e inject=unlinkat:retval=0 \
    rm /tmp/foo
newfstatat(AT_FDCWD, "/tmp/foo", {st_mode=S_IFREG|0664, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/tmp/foo", {st_mode=S_IFREG|0664, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
faccessat(AT_FDCWD, "/tmp/foo", W_OK) = 0
unlinkat(AT_FDCWD, "/tmp/foo", 0) = 0 (INJECTED)
+++ exited with 0 +++
$ ls /tmp/foo
/tmp/foo



Verdict

easy to use

powerful tool for quick hacks

no extra privileges

slow

limited view (user-space only)

ltrace incompatible with optimizations
$ ltrace ls >/dev/null
+++ exited (status 0) +++
$ scanelf -a /bin/ls
 TYPE    PAX   PERM ENDIAN STK/REL/PTL TEXTREL RPATH BIND FILE
ET_DYN PeMRxS 0755 LE     RW- R-- RW- -       -     NOW  /bin/ls



Low-level tracing
What is the Kernel doing?

What is my hardware doing?



Low-level tracing

Trace inside the Kernel

file system

hardware drivers

Profile hardware

CPU cache

memory / MMU

efficient user-space tracing

single process

system-wide

learn how Kernel space and hardware works



Data sources

kprobes

uprobes

events

perf_event (HPC, PMU, page faults, TLB, ...)

clock

Kernel TRACE_EVENT, DEFINE_EVENT

user defined

dtrace probe / USDT (Userland Statically Defined Tracing)

lttng-ust



kprobes / uprobes
Kernel probes

(almost) all internal Kernel functions

/proc/kallsyms

User-space probes

(almost) all functions in binaries

applications

shared libraries



perf_event & metrics
$ perf list
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  ...
  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  ...
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  ...
  power/energy-cores/                                [Kernel PMU event]
  power/energy-gpu/                                  [Kernel PMU event]
  power/energy-pkg/                                  [Kernel PMU event]
  power/energy-psys/                                 [Kernel PMU event]
  ...
floating point:
  fp_arith_inst_retired.128b_packed_double
       [Number of SSE/AVX computational 128-bit packed double precision floating-point instructions ...]
  ...
Summary:
  CLKS
       [Per-thread actual clocks when the logical processor is active. This is called 'Clockticks' in VTune]
  CPI
       [Cycles Per Instruction (threaded)]



Kernel trace events
# find /sys/kernel/debug/tracing/events -name format | wc -l
1897

# trace-cmd record -e 'cfg80211:cfg80211_inform_bss_frame' -e 'cfg80211:cfg80211_get_bss'
# trace-cmd report
...
irq/140-iwlwifi-1834 [002] 91697.995389: cfg80211_inform_bss_frame: phy0, band: 1, freq: 5180(scan_width: 0) signal: -6600, tsb:132269768490547, detect_tsf:0, tsf_bssid: 00:00:00:00:00:00
kworker/u16:3-23176 [004] 91697.995468: cfg80211_inform_bss_frame: phy0, band: 1, freq: 5180(scan_width: 0) signal: -6600, tsb:132269768490547, detect_tsf:0, tsf_bssid: 00:00:00:00:00:00
irq/140-iwlwifi-1834 [002] 91698.002485: cfg80211_inform_bss_frame: phy0, band: 1, freq: 5180(scan_width: 0) signal: -6700, tsb:132269775585563, detect_tsf:0, tsf_bssid: 00:00:00:00:00:00
wpa_supplicant-2320 [000] 91698.198168: cfg80211_get_bss: phy0, band: 1, freq: 5180, 08:96:d7:XX:XX:XX, buf: 0x73, bss_type: 0, privacy: 2
wpa_supplicant-2320 [000] 91698.216052: cfg80211_get_bss: phy0, band: 1, freq: 5180, 08:96:d7:XX:XX:XX, buf: 0x73, bss_type: 0, privacy: 2



Advanced Tools
Advanced tools

ftrace (tracefs)

perf

BCC / eBPF tools

SystemTap

more tools

LTTng

dtrace

VTune

SysDig

...



DTrace

Sun Microsystems (2005) for Solaris

DTrace markers

operating systems

Linux (2017)

Windows (2018)

macOS

FreeBSD

NetBSD



ftrace – Function Tracer

by Steven Rostedt (2008)

tracefs /sys/kernel/debug/tracing

ring buffer / pipes

GCC profiling, live patching, trampolines

foundation for Live Kernel Patching

frontends

busybox / shell (cat, echo)

trace-cmd

KernelShark

perf



trace-cmd: NFS open (simplified)
# trace-cmd record -p function_graph -l 'nfs_*' -c python3 python_open.py
# trace-cmd report
...
<...>-56879 [006] 8398345.057032: funcgraph_entry:                   | nfs_permission() {
<...>-56879 [006] 8398345.057032: funcgraph_entry:        0.065 us   |   nfs_do_access();
<...>-56879 [006] 8398345.057033: funcgraph_entry:                   |   nfs_do_access() {
<...>-56879 [006] 8398345.057034: funcgraph_entry:        0.912 us   |     nfs_alloc_fattr();
<...>-56879 [006] 8398345.057947: funcgraph_entry:                   |     nfs_refresh_inode() {
<...>-56879 [006] 8398345.057950: funcgraph_entry:                   |       nfs_refresh_inode.part.27() {
<...>-56879 [006] 8398345.057951: funcgraph_entry:                   |         nfs_refresh_inode_locked() {
<...>-56879 [006] 8398345.057952: funcgraph_entry:                   |           nfs_update_inode() {
<...>-56879 [006] 8398345.057952: funcgraph_entry:        0.190 us   |             nfs_file_has_writers();
<...>-56879 [006] 8398345.057954: funcgraph_entry:        0.235 us   |             nfs_set_cache_invalid();
<...>-56879 [006] 8398345.057955: funcgraph_exit:         3.089 us   |           }
<...>-56879 [006] 8398345.057955: funcgraph_exit:         4.106 us   |         }
<...>-56879 [006] 8398345.057955: funcgraph_exit:         4.830 us   |       }
<...>-56879 [006] 8398345.057964: funcgraph_exit:       + 14.691 us  |     }
<...>-56879 [006] 8398345.057964: funcgraph_entry:        0.053 us   |     nfs_access_set_mask();
<...>-56879 [006] 8398345.057966: funcgraph_entry:        2.621 us   |     nfs_access_add_cache();
<...>-56879 [006] 8398345.057970: funcgraph_exit:       ! 936.572 us |   }
<...>-56879 [006] 8398345.057970: funcgraph_exit:       ! 937.907 us | }



trace-cmd: NFS open call stack
# trace-cmd record -p function --func-stack -l nfs_permission -c python3 python_open.py
# trace-cmd report
<...>-56912 [006] 8398623.394239: function:             nfs_permission
<...>-56912 [006] 8398623.394605: kernel_stack:         <stack trace>
=> nfs_permission (ffffffffc075d415)
=> inode_permission (ffffffffba2bb34e)
=> link_path_walk.part.49 (ffffffffba2bf172)
=> path_lookupat.isra.53 (ffffffffba2bfbc3)
=> filename_lookup.part.67 (ffffffffba2c18c0)
=> vfs_statx (ffffffffba2b5683)
=> __do_sys_newstat (ffffffffba2b5c29)
=> do_syscall_64 (ffffffffba00418b)
=> entry_SYSCALL_64_after_hwframe (ffffffffbaa00088)
...
<...>-56912 [006] 8398623.396069: function:             nfs_permission
<...>-56912 [006] 8398623.396093: kernel_stack:         <stack trace>
=> nfs_permission (ffffffffc075d415)
=> inode_permission (ffffffffba2bb34e)
=> link_path_walk.part.49 (ffffffffba2bf172)
=> path_openat (ffffffffba2bfdef)
=> do_filp_open (ffffffffba2c26e3)
=> do_sys_open (ffffffffba2ada36)
=> do_syscall_64 (ffffffffba00418b)
=> entry_SYSCALL_64_after_hwframe (ffffffffbaa00088)



perf – Performance analysis tools for Linux

Linux Kernel (2009)

perf_event_open() syscall

perf counters (HPC, SPC)

kprobes, uprobes, trace points

LBR (Last Branch Records) sampling on recent Intel CPUs

low-overhead sampling with callgraph

wide range analysis from CPU instructions to Python, Java, NodeJS…

perf command

privileged and unprivileged

ncurses user interface



perf stat
# perf stat -d make -j
        208.784,16 msec task-clock                #    2,599 CPUs utilized
           156.385      context-switches          #  749,028 M/sec
             7.964      cpu-migrations            #   38,145 M/sec
         3.774.260      page-faults               # 18077,343 M/sec
   611.151.444.573      cycles                    # 2927194,826 GHz                 (62,51%)
   620.124.617.748      instructions              #    1,01 insn per cycle          (75,02%)
   134.224.550.386      branches                  # 642887148,373 M/sec             (75,02%)
     4.022.428.040      branch-misses             #    3,00% of all branches        (75,00%)
   170.553.275.660      L1-dcache-loads           # 816888629,684 M/sec             (75,00%)
    10.332.035.249      L1-dcache-load-misses     #    6,06% of all L1-dcache hits  (74,99%)
     2.288.155.928      LLC-loads                 # 10959440,992 M/sec              (49,98%)
       477.323.105      LLC-load-misses           #   20,86% of all LL-cache hits   (50,01%)

      80,335503652 seconds time elapsed
     185,163504000 seconds user
      18,416851000 seconds sys

# perf stat -M Turbo_Utilization make -j
   599.287.742.470      cpu_clk_unhalted.thread   #    1,7 Turbo_Utilization
   354.672.677.325      cpu_clk_unhalted.ref_tsc

Intel i5-8265U 1.60GHz / 3.90GHz
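The derived values in the right-hand column of the perf stat output are ratios of the raw counters. The following sketch recomputes them from the numbers on the slide. The helper names `parse_count` and `parse_decimal` are made up for this example, and the locale handling assumes '.' groups thousands and ',' marks the decimal point, as in the output shown.

```python
def parse_count(text):
    """Parse a grouped integer such as '611.151.444.573' ('.' groups thousands)."""
    return int(text.replace(".", ""))


def parse_decimal(text):
    """Parse a decimal such as '208.784,16' (',' is the decimal mark)."""
    return float(text.replace(".", "").replace(",", "."))


# Raw counters copied from the `perf stat -d make -j` run.
cycles = parse_count("611.151.444.573")
instructions = parse_count("620.124.617.748")
branches = parse_count("134.224.550.386")
branch_misses = parse_count("4.022.428.040")
l1_loads = parse_count("170.553.275.660")
l1_misses = parse_count("10.332.035.249")
llc_loads = parse_count("2.288.155.928")
llc_misses = parse_count("477.323.105")
task_clock_ms = parse_decimal("208.784,16")
elapsed_s = parse_decimal("80,335503652")

# Derived metrics: each one is a simple ratio of two raw values.
ipc = instructions / cycles
cpus_utilized = (task_clock_ms / 1000) / elapsed_s
branch_miss_pct = 100 * branch_misses / branches
l1_miss_pct = 100 * l1_misses / l1_loads
llc_miss_pct = 100 * llc_misses / llc_loads

# Counters from the `perf stat -M Turbo_Utilization` run:
# unhalted thread cycles divided by reference (TSC) cycles.
turbo = parse_count("599.287.742.470") / parse_count("354.672.677.325")

print(f"insn per cycle:     {ipc:.2f}")
print(f"CPUs utilized:      {cpus_utilized:.3f}")
print(f"branch misses:      {branch_miss_pct:.2f}%")
print(f"L1-dcache misses:   {l1_miss_pct:.2f}%")
print(f"LLC misses:         {llc_miss_pct:.2f}%")
print(f"Turbo_Utilization:  {turbo:.1f}")
```

A Turbo_Utilization of about 1,7 is consistent with the CPU running well above its 1.60GHz base clock for most of the build.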



perf probe
$ sudo perf probe -x /lib64/libc.so.6 --add malloc --add realloc --add free
Added new events:
  probe_libc:malloc     (on malloc in /usr/lib64/libc-2.28.so)
  probe_libc:malloc_1   (on malloc in /usr/lib64/libc-2.28.so)
  probe_libc:realloc    (on realloc in /usr/lib64/libc-2.28.so)
  probe_libc:realloc_1  (on realloc in /usr/lib64/libc-2.28.so)
  probe_libc:free       (on free in /usr/lib64/libc-2.28.so)
  probe_libc:free_1     (on free in /usr/lib64/libc-2.28.so)

$ sudo perf stat -e 'probe_libc:malloc*' -e 'probe_libc:free*' -e 'probe_libc:realloc*' -p 18154
            25.540      probe_libc:malloc_1
                 0      probe_libc:malloc
                 0      probe_libc:free
            29.948      probe_libc:free_1
                 0      probe_libc:realloc
               309      probe_libc:realloc_1

plain   0.566 sec
ltrace  3.396 sec
perf    0.705 sec
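Assuming the three timings measure the same workload run uninstrumented, under ltrace, and with the perf probes active, the relative cost of each tool is a single division. The variable and function names below are chosen for this sketch only.

```python
# Wall-clock timings from the slide, in seconds.
plain_s = 0.566   # no instrumentation
ltrace_s = 3.396  # traced with ltrace
perf_s = 0.705    # counted with perf uprobes


def slowdown(traced, baseline):
    """Return how many times slower the traced run is than the baseline."""
    return traced / baseline


for name, seconds in (("ltrace", ltrace_s), ("perf", perf_s)):
    factor = slowdown(seconds, plain_s)
    overhead_pct = (factor - 1) * 100
    print(f"{name}: {factor:.2f}x runtime, {overhead_pct:.0f}% overhead")
```

With these numbers, ltrace makes the run about six times slower, while the perf probes add roughly a quarter to the runtime.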



perf record / report
$ perf record -Fmax --call-graph lbr \
    python3 -c "import requests; requests.get('https://www.python.org')"

$ perf report
$ perf annotate

$ perf script | stackcollapse-perf.pl > out.perf-folded-full
$ flamegraph.pl out.perf-folded-full > perf-calls-full.svg
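The pipeline above turns sampled call stacks into the "folded" text format that flamegraph.pl reads: one line per unique stack, frames joined by ';' from root to leaf, followed by a sample count. The sketch below illustrates only that folding step in Python. It is not a replacement for stackcollapse-perf.pl, the function name `fold_stacks` is hypothetical, and the sample stacks are invented from frame names that appear in the flame graph.

```python
from collections import Counter


def fold_stacks(samples, comm="python3"):
    """Fold sampled stacks into {"comm;root;...;leaf": count}.

    Each sample is a list of frame names with the leaf (innermost) frame
    first, which is the order in which `perf script` prints a stack.
    """
    folded = Counter()
    for stack in samples:
        frames = [comm] + list(reversed(stack))  # root first, leaf last
        folded[";".join(frames)] += 1
    return folded


# Three invented samples; two of them share the same stack.
samples = [
    ["_PyEval_EvalFrameDefault", "_PyFunction_FastCallKeywords", "_start"],
    ["_PyEval_EvalFrameDefault", "_PyFunction_FastCallKeywords", "_start"],
    ["PyImport_Cleanup", "Py_FinalizeEx", "_start"],
]

folded = fold_stacks(samples)
for line, count in sorted(folded.items()):
    print(line, count)
```

Identical stacks collapse into one line with a higher count, which is what determines the width of a frame in the rendered flame graph.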



[Figure: flame graph of the complete python3 run (perf-calls-full.svg). Frame labels that survive include _start, _PyEval_EvalFrameDefault, _PyFunction_FastCallKeywords, _PyEval_EvalCo.., Py_FinalizeEx, PyImport_Cleanup, _PyGC_CollectNo.. and [libpython3.7m.so.1.0].]



perf record / report (2)
$ python3
>>> import os, requests
>>> os.getpid()
19414
>>> requests.get('https://www.python.org')
<Response [200]>

$ perf record -Fmax --call-graph lbr -p 19414
^C[ perf record: Woken up 1 times to write data ]
$ perf script | stackcollapse-perf.pl > out.perf-folded-partial
$ flamegraph.pl out.perf-folded-partial > perf-calls-partial.svg



[Figure: flame graph of the attached python3 process during requests.get() (perf-calls-partial.svg), with one region annotated 39%. Frame labels that survive include [_ssl.cpython-37m-x86_64-linux-gnu.so], X509_STORE_load_locations, by_file_ctrl, X509_load_cert_crl_file, PEM_X509_INFO_read_bio, PEM_read_bio_ex, ASN1_item_d2i, asn1_item_embed_d2i, x509_name_ex_d2i, x509_name_canon, _PyEval_EvalFrameDefault and _PyFunction_FastCallKeywords.]



Advanced Analysis
in Kernel space
BCC – BPF Compiler Collection
A toolkit for creating efficient kernel tracing and manipulation programs.

Run tracing code in Kernel space

extended BPF (Berkeley Packet Filters)

eBPF JIT 2014
Usable Kernel 4.12 - 4.15, 2017

C with Python and Lua frontends

rich set of existing tools

BCC: ext4slower, tcpconnect
Slow ext4fs operations

# ./ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
18:44:18 bash           4573   R 128     0           6.45 balooctl
18:44:19 IndexedDB #198 4571   S 0       0           8.90 583651055lp.sqlite-wal
18:44:21 IndexedDB #198 4571   S 0       0          10.02 583651055lp.sqlite-wal
18:44:21 IndexedDB #198 4571   S 0       0           3.90 583651055lp.sqlite
18:44:43 mozStorage #5  4571   W 24      288         6.10 cookies.sqlite-wal

TCP connections for user id 1000

# ./tcpconnect -u 1000
PID    COMM         IP SADDR            DADDR            DPORT
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     80
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
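Because the BCC tools print plain, column-oriented text, their output is easy to post-process. As a small sketch, the snippet below counts connections per destination from tcpconnect output like the one above. The function name `connections_per_destination` is made up for this example, and the parser assumes that DADDR and DPORT are always the last two columns.

```python
from collections import Counter

TCPCONNECT_OUTPUT = """\
PID    COMM         IP SADDR            DADDR            DPORT
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     80
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
4874   Chrome_IOThr 4  192.168.7.168    107.170.8.46     443
"""


def connections_per_destination(text):
    """Count connections per (destination address, destination port)."""
    counts = Counter()
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Read from the right so a COMM containing spaces does not matter.
        daddr, dport = fields[-2], int(fields[-1])
        counts[(daddr, dport)] += 1
    return counts


counts = connections_per_destination(TCPCONNECT_OUTPUT)
for (daddr, dport), n in sorted(counts.items()):
    print(f"{daddr}:{dport} {n}")
```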



BCC: sslsniff
>>> requests.get("https://ep2019.europython.eu/", headers={'Accept-Encoding': 'identity'})

# ./sslsniff -p18888
FUNC         TIME(s)        COMM      PID    LEN
WRITE/SEND   0.000000000    python3   18888  146
----- DATA -----
GET / HTTP/1.1
Host: ep2019.europython.eu
User-Agent: python-requests/2.20.0
Accept-Encoding: identity
Accept: */*
Connection: keep-alive
----- END DATA -----

READ/RECV    0.159023041    python3   18888  8192
----- DATA -----
HTTP/1.1 200 OK
Server: nginx
Date: Sun, 07 Jul 2019 19:18:46 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 33617
Connection: keep-alive
X-Frame-Options: SAMEORIGIN
ETag: "c35dc4fdb282b799d8d77703a7553424"
Vary: Accept-Language, Cookie
Content-Language: en
Set-Cookie: django_language=en; expires=Mon, 06-Jul-2020 19:18:46 GMT; Max-Age=31536000; Path=/



BPFtrace
https://github.com/iovisor/bpftrace
$ sudo bpftrace -e \
    'tracepoint:syscalls:sys_enter_openat { printf("%s(%d) %s\n", comm, pid, str(args->filename)); }'
irqbalance(1329) /proc/interrupts

>>> requests.get("https://ep2019.europython.eu/")
$ sudo bpftrace -e 'uprobe:/lib64/libc.so.6:malloc / comm == "python3" / { @bytes = hist(arg0); }'
@bytes:
[1]                  171 |                                                    |
[2, 4)               661 |@@@                                                 |
[4, 8)               356 |@                                                   |
[8, 16)             1699 |@@@@@@@@                                            |
[16, 32)            7929 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
[32, 64)           10513 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)           1574 |@@@@@@@                                             |
[128, 256)           609 |@@@                                                 |
[256, 512)          1006 |@@@@                                                |
[512, 1K)            608 |@@@                                                 |
[1K, 2K)             362 |@                                                   |
[2K, 4K)              57 |                                                    |
[4K, 8K)              10 |                                                    |
[8K, 16K)             12 |                                                    |
[16K, 32K)            13 |                                                    |
[32K, 64K)             2 |                                                    |
[64K, 128K)            1 |                                                    |
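The hist() output groups malloc sizes into power-of-two buckets. The sketch below shows the same bucketing idea in plain Python: it is an illustration, not bpftrace's implementation, the function name `bucket` is hypothetical, and the sample sizes are invented.

```python
from collections import Counter


def bucket(value):
    """Return the power-of-two bucket [low, high) that contains value."""
    if value < 1:
        raise ValueError("this sketch only handles sizes >= 1")
    low = 1 << (value.bit_length() - 1)  # largest power of two <= value
    return low, low * 2


# Invented allocation sizes in bytes.
sizes = [1, 3, 24, 48, 56, 100, 4096]

hist = Counter(bucket(size) for size in sizes)
for (low, high), count in sorted(hist.items()):
    print(f"[{low}, {high}) {count:>3} |{'@' * count}")
```

Note that bpftrace prints the first bucket as [1] rather than [1, 2); both contain only the value 1.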



SystemTap

Initial release 2005

stap

Scripting Language for dynamic instrumentation

Kernel module

DynInst

eBPF (extended Berkeley Packet Filter) programs

additional features:

USDT / DTrace probes (Userland Statically Defined Tracing)

Cross-VM instrumentation

Prometheus



USDT (dtrace / systemtap)
$ stap -l 'process("/usr/lib64/libpython3.7m*").mark("*")'
process("/usr/lib64/libpython3.7m.so.1.0").mark("function__entry")
process("/usr/lib64/libpython3.7m.so.1.0").mark("function__return")
process("/usr/lib64/libpython3.7m.so.1.0").mark("gc__done")
process("/usr/lib64/libpython3.7m.so.1.0").mark("gc__start")
process("/usr/lib64/libpython3.7m.so.1.0").mark("import__find__load__done")
process("/usr/lib64/libpython3.7m.so.1.0").mark("import__find__load__start")
process("/usr/lib64/libpython3.7m.so.1.0").mark("line")

$ stap -l 'process.provider("php").mark("*")' -c /usr/bin/php
process("/usr/bin/php").provider("php").mark("compile__file__entry")
process("/usr/bin/php").provider("php").mark("compile__file__return")
process("/usr/bin/php").provider("php").mark("error")
process("/usr/bin/php").provider("php").mark("exception__caught")
process("/usr/bin/php").provider("php").mark("exception__thrown")
process("/usr/bin/php").provider("php").mark("execute__entry")
process("/usr/bin/php").provider("php").mark("execute__return")
process("/usr/bin/php").provider("php").mark("function__entry")
process("/usr/bin/php").provider("php").mark("function__return")
process("/usr/bin/php").provider("php").mark("request__shutdown")
process("/usr/bin/php").provider("php").mark("request__startup")

$ stap -l 'process("/usr/lib/jvm/java*/jre/lib/amd64/server/libjvm.so").mark("*")' | wc -l
521
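The Python markers listed above are only present when the interpreter was built with DTrace/SystemTap support. A quick check from inside Python might look like the sketch below. It assumes that the build records the `--with-dtrace` configure option as the `WITH_DTRACE` config variable; if the variable is missing, `sysconfig.get_config_var` returns None and the function reports False. The function name is made up for this example.

```python
import sysconfig


def has_dtrace_markers():
    """Report whether this interpreter claims to be built with --with-dtrace.

    Assumption: the build exposes the option as the WITH_DTRACE config
    variable. A missing variable (None) is treated as "no markers".
    """
    return bool(sysconfig.get_config_var("WITH_DTRACE"))


if has_dtrace_markers():
    print("USDT markers should be available in this interpreter")
else:
    print("no USDT markers reported; stap -l will likely list nothing")
```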



Tracing with stap
$ sudo stap python-import.stp -c "python3 -c pass"
Pass 1: parsed user script and 496 library scripts ...
Pass 2: analyzed script: 2 probes, 4 functions, 4 embeds, 2 globals ...
Pass 3: translated to C into "/tmp/stapz4MP9b/stap_d6d27027af93310fa02967b0e5910963_5289_src.c" ...
Pass 4: compiled C into "stap_d6d27027af93310fa02967b0e5910963_5289.ko" ...
Pass 5: starting run.
ERROR: Couldn't insert module '/tmp/stapz4MP9b/stap_d6d27027af93310fa02967b0e5910963_5289.ko': Operation not permitted
$ dmesg
[335808.816759] Lockdown: staprun: Loading of unsigned module is restricted; see man kernel_lockdown.7

$ sudo systemctl reboot --firmware-setup



Trace Python imports
global depths = 0;
global timing

# Fires when the import machinery starts to find and load a module.
probe process("python3").library("libpython3.7m.so.1.0").mark("import__find__load__start") {
    modname = user_string($arg1);
    now = local_clock_ns()
    # Remember the start time per thread and nesting depth.
    timing[tid(), depths] = now
    printf("%*s* Importing '%s' ...\n", 2*depths, "", modname);
    depths++;
}

# Fires when the import has finished; $arg2 tells whether the module was found.
probe process("python3").library("libpython3.7m.so.1.0").mark("import__find__load__done") {
    modname = user_string($arg1);
    found = $arg2;
    depths--;
    now = local_clock_ns()
    dur = now - timing[tid(), depths]
    if (found)
        printf("%*s+ Imported '%s' in %ldus\n", 2*depths, "", modname, dur/1000);
    else
        printf("%*s- Failed '%s' in %ldus\n", 2*depths, "", modname, dur/1000);
}



Trace Python imports
$ sudo stap python-import.stp -c "python3 -c pass"
* Importing 'zipimport' ...
+ Imported 'zipimport' in 125us
...
* Importing 'encodings' ...
  * Importing 'codecs' ...
    * Importing '_codecs' ...
    + Imported '_codecs' in 187us
  + Imported 'codecs' in 1273us
  * Importing 'encodings.aliases' ...
  + Imported 'encodings.aliases' in 836us
+ Imported 'encodings' in 3263us
...
* Importing 'site' ...
  ...
  * Importing 'sitecustomize' ...
  - Failed 'sitecustomize' in 149us
  * Importing 'usercustomize' ...
  - Failed 'usercustomize' in 144us
+ Imported 'site' in 28946us



Verdict

extremely detailed information

fast, efficient

apps, system-wide, hardware events

wide range from pre-built tools to custom code

extremely detailed information

learning curve

turn your 64 core server into a Commodore C64



Summary
In my humble opinion

strace, bpftrace: Swiss Army Knife for simple tasks

bcc: pre-built tools

perf: benchmarking and hot-spot profiling

SystemTap: dynamic languages (Python)

ftrace: tracing on old Kernels

future: eBPF



Resources

Brendan Gregg http://www.brendangregg.com/

eBPF

IOVisor Project https://www.iovisor.org/

Sergey Klyaus: “Dynamic Tracing with DTrace & SystemTap”
https://myaut.github.io/dtrace-stap-book/

SystemTap beginners guide
https://sourceware.org/systemtap/SystemTap_Beginners_Guide/

Eben Freeman, PyBay16: Python Tracing Superpowers
https://speakerdeck.com/emfree/python-tracing-superpowers-with-systems-tools

Slides: https://speakerdeck.com/tiran/



Questions?

@ChristianHeimes
https://speakerdeck.com/tiran/
THANK YOU