Howto Perf Profiling
Release 3.12.0
Note: Support for the perf profiler is currently only available for Linux on select architectures. Check the output of the
configure build step or check the output of python -m sysconfig | grep HAVE_PERF_TRAMPOLINE
to see if your system is supported.
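The same check can be sketched from within Python. This is a minimal sketch that assumes the build flag appears among the sysconfig configuration variables, as the grep command above suggests; the helper name has_perf_trampoline is hypothetical and not part of any Python API.

```python
import sysconfig


def has_perf_trampoline():
    """Return True if a build variable matching HAVE_PERF_TRAMPOLINE is set.

    Hypothetical helper: it mirrors `python -m sysconfig | grep
    HAVE_PERF_TRAMPOLINE` by scanning all configuration variables.
    """
    config = sysconfig.get_config_vars()
    return any(
        bool(value)
        for name, value in config.items()
        if "HAVE_PERF_TRAMPOLINE" in name
    )


if __name__ == "__main__":
    print("perf trampoline supported:", has_perf_trampoline())
```

A result of False means either that the interpreter was built without the trampoline or that the variable is not recorded on this platform.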
def foo(n):
    result = 0
    for _ in range(n):
        result += 1
    return result

def bar(n):
    foo(n)

def baz(n):
    bar(n)

if __name__ == "__main__":
    baz(1000000)
#
    91.08% 0.00% 0 python.exe python.exe [.] _start
    |
    ---_start
    |
    --90.71%--__libc_start_main
              Py_BytesMain
              |
              |--56.88%--pymain_run_python.constprop.0
              |          |
              |          |--56.13%--_PyRun_AnyFileObject
              |          |          _PyRun_SimpleFileObject
              |          |          |
              |          |          |--55.02%--run_mod
              |          |          |          |
              |          |          |           --54.65%--PyEval_EvalCode
              |          |          |                     _PyEval_EvalFrameDefault
              |          |          |                     PyObject_Vectorcall
              |          |          |                     _PyEval_Vector
              |          |          |                     _PyEval_EvalFrameDefault
              |          |          |                     PyObject_Vectorcall
              |          |          |                     _PyEval_Vector
              |          |          |                     _PyEval_EvalFrameDefault
              |          |          |                     PyObject_Vectorcall
              |          |          |                     _PyEval_Vector
              |          |          |                     |
              |          |          |                     |--51.67%--_PyEval_EvalFrameDefault
              |          |          |                     |          |
              |          |          |                     |          |--11.52%--_PyLong_Add
              |          |          |                     |          |          |
              |          |          |                     |          |          |--2.97%--_PyObject_Malloc
...
As you can see, the Python functions are not shown in the output; only _PyEval_EvalFrameDefault (the function
that evaluates the Python bytecode) shows up. Unfortunately, that’s not very useful: all Python functions use the
same C function to evaluate bytecode, so we cannot know which Python function corresponds to which bytecode-evaluating
function.
Instead, if we run the same experiment with perf support enabled we get:
$ perf report --stdio -n -g
#
    90.58% 0.36% 1 python.exe python.exe [.] _start
    |
    ---_start
    |
    --89.86%--__libc_start_main
              Py_BytesMain
              |
              |--55.43%--pymain_run_python.constprop.0
              |          |
              |          |--54.71%--_PyRun_AnyFileObject
              |          |          _PyRun_SimpleFileObject
              |          |          |
              |          |          |--53.62%--run_mod
              |          |          |          |
              |          |          |           --53.26%--PyEval_EvalCode
              |          |          |                     py::<module>:/src/script.py
              |          |          |                     _PyEval_EvalFrameDefault
              |          |          |                     PyObject_Vectorcall
              |          |          |                     _PyEval_Vector
              |          |          |                     py::baz:/src/script.py
              |          |          |                     _PyEval_EvalFrameDefault
              |          |          |                     PyObject_Vectorcall
              |          |          |                     _PyEval_Vector
              |          |          |                     py::bar:/src/script.py
              |          |          |                     _PyEval_EvalFrameDefault
              |          |          |                     PyObject_Vectorcall
              |          |          |                     _PyEval_Vector
              |          |          |                     py::foo:/src/script.py
              |          |          |                     |
              |          |          |                     |--51.81%--_PyEval_EvalFrameDefault
              |          |          |                     |          |
              |          |          |                     |          |--13.77%--_PyLong_Add
              |          |          |                     |          |          |
              |          |          |                     |          |          |--3.26%--_PyObject_Malloc
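When post-processing such a report, the Python entries can be separated from the C frames by their py:: prefix. The sketch below assumes the symbol layout py::<function>:<file>, which is inferred only from the sample output above; the helper name parse_perf_symbol is hypothetical.

```python
def parse_perf_symbol(symbol):
    """Split a 'py::<function>:<file>' symbol into (function, file).

    Returns None for symbols without the 'py::' prefix (plain C frames).
    The layout is an assumption based on the sample report, not a
    documented format.
    """
    prefix = "py::"
    if not symbol.startswith(prefix):
        return None
    # Everything up to the first ':' after the prefix is the function name.
    name, sep, path = symbol[len(prefix):].partition(":")
    if not sep:
        return None
    return name, path


if __name__ == "__main__":
    for sym in ("py::foo:/src/script.py", "_PyEval_Vector"):
        print(sym, "->", parse_perf_symbol(sym))
```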
perf profiling support can be enabled either from the start using the environment variable PYTHONPERFSUPPORT
or the -X perf option, or dynamically using sys.activate_stack_trampoline() and
sys.deactivate_stack_trampoline().
The sys functions take precedence over the -X option, and the -X option takes precedence over the environment variable.
Example, using the sys APIs in file example.py:
import sys
sys.activate_stack_trampoline("perf")
do_profiled_stuff()
sys.deactivate_stack_trampoline()
non_profiled_stuff()
…then:
$ perf record -F 9999 -g -o perf.data python ./example.py
$ perf report -g -i perf.data
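To make sure the trampoline is always switched off again, the activate/deactivate pair can be wrapped in a context manager. This is only a sketch: the name perf_trampoline is hypothetical, and the broad exception handling is an assumption made so that the code also runs on interpreters or platforms without perf support, where it simply yields False.

```python
import contextlib
import sys


@contextlib.contextmanager
def perf_trampoline(backend="perf"):
    """Activate the stack trampoline for the duration of a with-block.

    Yields True if the trampoline was activated, False otherwise.
    """
    # The sys functions may be missing on older Python versions.
    activate = getattr(sys, "activate_stack_trampoline", None)
    deactivate = getattr(sys, "deactivate_stack_trampoline", None)
    active = False
    if activate is not None and deactivate is not None:
        try:
            activate(backend)
            active = True
        except Exception:
            # Assumption: any failure means perf support is unavailable here.
            active = False
    try:
        yield active
    finally:
        if active:
            deactivate()


if __name__ == "__main__":
    with perf_trampoline() as active:
        total = sum(range(1000))  # stand-in for the code to be profiled
    print("trampoline was active:", active, "result:", total)
```

The script still has to be run under perf record for any samples to be collected; the context manager only controls which parts of the program show up with Python function names.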
2 How to obtain the best results
If a check of the build flags for no-omit-frame-pointer (for example, python -m sysconfig | grep 'no-omit-frame-pointer')
prints nothing, your interpreter has not been compiled with frame pointers, and perf may therefore not be able to show
Python functions in its output.
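The build flags can also be inspected from Python. The following is a minimal sketch that assumes a build with frame pointers records a no-omit-frame-pointer flag somewhere in its sysconfig variables; the helper name built_with_frame_pointers is hypothetical.

```python
import sysconfig


def built_with_frame_pointers():
    """Return True if any recorded build variable mentions no-omit-frame-pointer.

    Hypothetical helper: a False result only means the flag was not found
    in the recorded configuration, not proof that frame pointers are absent.
    """
    return any(
        isinstance(value, str) and "no-omit-frame-pointer" in value
        for value in sysconfig.get_config_vars().values()
    )


if __name__ == "__main__":
    print("frame pointer flags found:", built_with_frame_pointers())
```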