Two projects to optimize Python
FOSDEM 2013, Bruxelles
Victor Stinner
<victor.stinner@gmail.com>
Distributed under CC BY-SA license: http://creativecommons.org/licenses/by-sa/3.0/
Agenda
CPython bytecode is inefficient
AST optimizer
Register-based bytecode
Part I
CPython bytecode is inefficient
CPython is inefficient
Python is very dynamic and cannot be easily optimized
The CPython peephole optimizer only supports basic optimizations, like replacing 1+1 with 2
Inefficient bytecode
Given a simple function:
def func():
    x = 33
    return x
Inefficient bytecode
I get (4 instructions):
LOAD_CONST 1 (33)
STORE_FAST 0 (x)
LOAD_FAST 0 (x)
RETURN_VALUE
I expected (2 instructions):
LOAD_CONST 1 (33)
RETURN_VALUE
Or even (1 instruction):
RETURN_CONST 1 (33)
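The listings above can be reproduced with the standard dis module. This is a minimal sketch; the exact opcodes depend on the CPython version (newer releases add a RESUME instruction, and some fuse instructions, e.g. CPython 3.13 has a STORE_FAST_LOAD_FAST superinstruction), so the output will not match the 2013 listing line for line. The helper name opnames is only for illustration.

```python
import dis


def func():
    x = 33
    return x


def opnames(f):
    """Opcode names of f, skipping RESUME/CACHE bookkeeping entries (CPython 3.11+)."""
    return [ins.opname for ins in dis.get_instructions(f)
            if ins.opname not in ("RESUME", "CACHE")]


if __name__ == "__main__":
    dis.dis(func)          # full listing; exact opcodes vary between versions
    print(opnames(func))
```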
How Python works
Parse the source code
Build an Abstract Syntax Tree (AST)
Emit bytecode
Peephole optimizer
Evaluate bytecode
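Each stage listed above has a public entry point in the standard library: ast.parse() builds the AST, compile() emits (and optimizes) the bytecode, and exec() evaluates it. A minimal walk-through, assuming a recent Python 3; the optimization passes run inside compile() differ between versions, so this only shows where they happen, not what they do. The function run_pipeline is a hypothetical helper.

```python
import ast
import types

SOURCE = "x = 33\nresult = x + 1\n"


def run_pipeline(source):
    """Walk the stages of the slide by hand with stdlib entry points."""
    tree = ast.parse(source)                # parse the source and build the AST
    code = compile(tree, "<demo>", "exec")  # emit bytecode; CPython optimizes it here
    namespace = {}
    exec(code, namespace)                   # evaluate the bytecode
    return tree, code, namespace


if __name__ == "__main__":
    tree, code, namespace = run_pipeline(SOURCE)
    print(ast.dump(tree.body[0]))
    print(namespace["result"])
```

An AST optimizer such as astoptimizer hooks in between ast.parse() and compile(): it receives the tree, rewrites it, and hands the result to the compiler.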
Let's optimize!
Parse the source code
Build an Abstract Syntax Tree (AST)
→ astoptimizer
Emit bytecode
Peephole optimizer
Evaluate bytecode
→ registervm
Part II
AST optimizer
AST optimizer
The AST is high-level and contains a lot of information
Rewrite the AST to get faster code
Disable dynamic features of Python to allow more optimizations
Unpythonic optimizations are disabled by default
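Why would such optimizations be "unpythonic"? Because nothing stops a program from rebinding a builtin at runtime. The sketch below uses only the standard library (nothing from astoptimizer) to show that a call which looks constant can change its result, so folding len("abc") to 3 is only safe under an explicit assumption that len is never replaced.

```python
import builtins


def count():
    # An AST optimizer would like to replace this call with the constant 3.
    return len("abc")


def demo_shadowing():
    """Rebind the builtin `len` temporarily and call count() before and after."""
    before = count()
    original = builtins.len
    builtins.len = lambda obj: 42   # legal, if unusual, Python
    try:
        after = count()
    finally:
        builtins.len = original     # always restore the real builtin
    return before, after


if __name__ == "__main__":
    print(demo_shadowing())
```

If count() had been compiled with len("abc") folded to 3, the second call would still return 3, which is why this kind of rewrite has to be opt-in.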
AST optimizations (1)
Call builtin functions and methods:
len("abc") → 3
(32).bit_length() → 6
math.log(32) / math.log(2) → 5.0
Evaluate str % args and print(arg1, arg2, ...):
"x=%s" % 5 → "x=5"
print(2.3) → print("2.3")
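The first rewrite above can be sketched with ast.NodeTransformer. This is not astoptimizer's code: FoldLenOfConstant is a hypothetical class, it assumes Python 3.8+ (where literals parse to ast.Constant), and it assumes len is the real builtin, which is exactly the kind of assumption the tool only makes when unpythonic optimizations are enabled.

```python
import ast


class FoldLenOfConstant(ast.NodeTransformer):
    """Replace len(<str or bytes literal>) with its value.

    Assumes the name `len` is never rebound (an opt-in, "unpythonic" assumption).
    """

    def visit_Call(self, node):
        self.generic_visit(node)  # fold nested calls first
        if (isinstance(node.func, ast.Name) and node.func.id == "len"
                and len(node.args) == 1 and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, (str, bytes))):
            folded = ast.Constant(len(node.args[0].value))
            return ast.copy_location(folded, node)
        return node


def optimize(source):
    tree = FoldLenOfConstant().visit(ast.parse(source))
    return ast.fix_missing_locations(tree)


if __name__ == "__main__":
    print(ast.unparse(optimize('n = len("abc")')))  # ast.unparse needs Python 3.9+
```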
AST optimizations (2)
Simplify expressions (2 instructions => 1):
not(x in y) → x not in y
Optimize loops (Python 2 only):
while True: ... → while 1: ...
for x in range(10): ... → for x in xrange(10): ...
In Python 2, True requires a (slow) global lookup, whereas the number 1 is a constant
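The expression rewrite above, sketched as a hypothetical NodeTransformer (the Python 2 loop rewrites do not apply to Python 3, so they are left out). Only single comparisons are rewritten: for a chained comparison such as not (a in b in c), simply swapping the operator would change the meaning.

```python
import ast


class SimplifyNotIn(ast.NodeTransformer):
    """Rewrite `not (a in b)` as `a not in b` and `not (a is b)` as `a is not b`."""

    NEGATED = {ast.In: ast.NotIn, ast.Is: ast.IsNot}

    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        operand = node.operand
        if (isinstance(node.op, ast.Not) and isinstance(operand, ast.Compare)
                and len(operand.ops) == 1 and type(operand.ops[0]) in self.NEGATED):
            new_op = self.NEGATED[type(operand.ops[0])]()
            simplified = ast.Compare(left=operand.left, ops=[new_op],
                                     comparators=operand.comparators)
            return ast.copy_location(simplified, node)
        return node


if __name__ == "__main__":
    tree = SimplifyNotIn().visit(ast.parse("r = not (x in y)"))
    print(ast.unparse(tree))  # Python 3.9+
```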
AST optimizations (3)
Replace a list (built at runtime) with a tuple (constant):
for x in [1, 2, 3]: ... → for x in (1, 2, 3): ...
Replace a list with a set (Python 3 only):
if x in [1, 2, 3]: ... → if x in {1, 2, 3}: ...
In Python 3, {1, 2, 3} is converted to a constant frozenset when it is the right operand of an "in" test
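Both container rewrites above can be expressed as one hypothetical NodeTransformer. It only touches non-empty list literals whose items are all constants. The set rewrite also assumes the left operand of "in" is hashable: [1] in [1, 2] is False, but [1] in {1, 2} raises TypeError. Recent CPython versions perform similar conversions in their own compiler, so this mainly illustrates the technique.

```python
import ast


def _constant_list(node):
    """True for a non-empty list literal whose items are all constants."""
    return (isinstance(node, ast.List) and len(node.elts) > 0
            and all(isinstance(e, ast.Constant) for e in node.elts))


class ConstantContainers(ast.NodeTransformer):
    """for x in [consts] -> tuple; `a in [consts]` -> set literal."""

    def visit_For(self, node):
        self.generic_visit(node)
        if _constant_list(node.iter):
            tup = ast.Tuple(elts=node.iter.elts, ctx=ast.Load())
            node.iter = ast.copy_location(tup, node.iter)
        return node

    def visit_Compare(self, node):
        self.generic_visit(node)
        comparators = []
        for op, right in zip(node.ops, node.comparators):
            if isinstance(op, (ast.In, ast.NotIn)) and _constant_list(right):
                # Assumes the left operand is hashable (see the note above).
                right = ast.copy_location(ast.Set(elts=right.elts), right)
            comparators.append(right)
        node.comparators = comparators
        return node
```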
AST optimizations (4)
Evaluate operators:
"abcdef"[:3] → "abc"
def f(): return 2 if 4 < 5 else 3 → def f(): return 2
Remove dead code:
if 0: ... → pass
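A sketch of the operator evaluation and dead-code removal above, again as a hypothetical NodeTransformer rather than astoptimizer's implementation (Python 3.9+ AST layout). Comparisons are folded first, which turns 4 < 5 into True so that the conditional expression and the if statement can then be decided at compile time. An if whose surviving branch is empty becomes pass, so the enclosing block stays valid.

```python
import ast
import operator

_COMPARE = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
            ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def _is_const(node):
    return isinstance(node, ast.Constant)


class FoldAndPrune(ast.NodeTransformer):
    """Fold constant comparisons, string slices and conditionals; prune dead ifs."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        if (len(node.ops) == 1 and type(node.ops[0]) in _COMPARE
                and _is_const(node.left) and _is_const(node.comparators[0])):
            try:
                value = _COMPARE[type(node.ops[0])](node.left.value,
                                                    node.comparators[0].value)
            except TypeError:          # e.g. 1 < "a": leave it for runtime
                return node
            return ast.copy_location(ast.Constant(value), node)
        return node

    def visit_Subscript(self, node):
        self.generic_visit(node)
        sl = node.slice
        parts = (sl.lower, sl.upper, sl.step) if isinstance(sl, ast.Slice) else None
        if (parts is not None and _is_const(node.value)
                and isinstance(node.value.value, str)
                and all(p is None or (_is_const(p) and type(p.value) is int)
                        for p in parts)):
            bounds = [None if p is None else p.value for p in parts]
            try:
                value = node.value.value[slice(*bounds)]
            except ValueError:         # slice step cannot be zero
                return node
            return ast.copy_location(ast.Constant(value), node)
        return node

    def visit_IfExp(self, node):
        self.generic_visit(node)
        if _is_const(node.test):
            return node.body if node.test.value else node.orelse
        return node

    def visit_If(self, node):
        self.generic_visit(node)
        if _is_const(node.test):
            kept = node.body if node.test.value else node.orelse
            return kept or ast.copy_location(ast.Pass(), node)
        return node
```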
Used as a preprocessor
"if DEBUG" and "if os.name == 'nt'" have a cost at runtime
The tests can be removed at compile time:
cfg.add_constant('DEBUG', False)
cfg.add_constant('os.name', os.name)
Pythonic preprocessor: no need to modify your code, and the code still works without the preprocessor
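How such a preprocessor can work, as a sketch under stated assumptions: PreprocessorConfig is a hypothetical class whose add_constant only mimics the call shown on the slide (it is not astoptimizer's API), and Preprocess is a hypothetical transformer. It substitutes configured names, including dotted names like os.name, folds == and != between constants, and keeps only the branch of an if that can actually run. Note that os.name is frozen to the value of the machine running the preprocessor.

```python
import ast
import os


class PreprocessorConfig:
    """Hypothetical stand-in for the slide's `cfg` object (not astoptimizer's class)."""

    def __init__(self):
        self.constants = {}

    def add_constant(self, name, value):
        self.constants[name] = value


def dotted_name(node):
    """Return 'a.b' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


class Preprocess(ast.NodeTransformer):
    """Substitute configured names, fold ==/!= on constants, prune decided ifs."""

    def __init__(self, cfg):
        self.cfg = cfg

    def _substitute(self, node):
        name = dotted_name(node)
        if isinstance(node.ctx, ast.Load) and name in self.cfg.constants:
            return ast.copy_location(ast.Constant(self.cfg.constants[name]), node)
        return None

    def visit_Name(self, node):
        replaced = self._substitute(node)
        return node if replaced is None else replaced

    def visit_Attribute(self, node):
        replaced = self._substitute(node)
        if replaced is not None:
            return replaced
        self.generic_visit(node)
        return node

    def visit_Compare(self, node):
        self.generic_visit(node)
        if (len(node.ops) == 1 and isinstance(node.ops[0], (ast.Eq, ast.NotEq))
                and isinstance(node.left, ast.Constant)
                and isinstance(node.comparators[0], ast.Constant)):
            equal = bool(node.left.value == node.comparators[0].value)
            value = equal if isinstance(node.ops[0], ast.Eq) else not equal
            return ast.copy_location(ast.Constant(value), node)
        return node

    def visit_If(self, node):
        self.generic_visit(node)
        if isinstance(node.test, ast.Constant):
            kept = node.body if node.test.value else node.orelse
            return kept or ast.copy_location(ast.Pass(), node)
        return node


if __name__ == "__main__":
    cfg = PreprocessorConfig()
    cfg.add_constant("DEBUG", False)
    cfg.add_constant("os.name", os.name)
    tree = Preprocess(cfg).visit(ast.parse("if os.name == 'nt':\n    print('Windows')\n"))
    print(ast.unparse(ast.fix_missing_locations(tree)))  # Python 3.9+
```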
astoptimizer TODO list
Constant folding: experimental support (buggy)
Unroll (short) loops
Function inlining (is it possible?)
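Loop unrolling from the TODO list above could look like this hypothetical transformer. It is deliberately conservative: only a plain-name loop variable, a short literal sequence of constants, no else clause, and no break or continue anywhere inside the loop (even in a nested loop), since those are the cases where replacing the loop by repeated assignments plus copies of the body keeps the same behavior.

```python
import ast
import copy


class UnrollShortLoops(ast.NodeTransformer):
    """Unroll `for NAME in (c1, c2, ...)` over a short sequence of constants."""

    def __init__(self, max_items=4):
        self.max_items = max_items

    def visit_For(self, node):
        self.generic_visit(node)
        if not (isinstance(node.target, ast.Name)
                and isinstance(node.iter, (ast.Tuple, ast.List))
                and 0 < len(node.iter.elts) <= self.max_items
                and all(isinstance(e, ast.Constant) for e in node.iter.elts)
                and not node.orelse):
            return node
        # Conservative: any break/continue in the loop blocks unrolling.
        if any(isinstance(n, (ast.Break, ast.Continue)) for n in ast.walk(node)):
            return node
        unrolled = []
        for elt in node.iter.elts:
            assign = ast.Assign(targets=[ast.Name(id=node.target.id, ctx=ast.Store())],
                                value=ast.Constant(elt.value))
            unrolled.append(ast.copy_location(assign, node))
            unrolled.extend(copy.deepcopy(node.body))  # fresh copy of the body per item
        return unrolled
```

Remember to call ast.fix_missing_locations() on the result before compiling it, because the new Name nodes have no line numbers.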
Part III
Register-based bytecode
registervm
Rewrite instructions to use registers instead of the stack
Use static single assignment (SSA) form
Build the control flow graph
Apply different optimizations
Register allocator
Emit bytecode
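The SSA step above means renaming variables so that each one is assigned exactly once. registervm works on bytecode and a control flow graph; this toy to_ssa function (hypothetical, not registervm code) only shows the renaming on straight-line code, where no phi functions are needed. Statements are (target, expression tokens) pairs.

```python
def to_ssa(statements):
    """Rename variables so each one is assigned exactly once (straight-line code).

    `statements` is a list of (target, tokens) pairs; a token that names an
    already-assigned variable is replaced by that variable's latest version.
    """
    version = {}   # variable -> latest version number
    result = []
    for target, tokens in statements:
        # Uses are renamed before the target gets its new version,
        # so `x = x + 1` reads the previous x.
        renamed = [f"{tok}_{version[tok]}" if tok in version else tok
                   for tok in tokens]
        version[target] = version.get(target, 0) + 1
        result.append((f"{target}_{version[target]}", renamed))
    return result


if __name__ == "__main__":
    program = [("x", ["33"]), ("x", ["x", "+", "1"]), ("y", ["x", "*", "2"])]
    for target, tokens in to_ssa(program):
        print(target, "=", " ".join(tokens))
```

Because every name is now defined once, later passes can treat each version as a value and reason about it without tracking reassignments.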
Stack-based bytecode
def func():
    x = 33
    return x + 1
LOAD_CONST 1 (33)    # stack: [33]
STORE_FAST 0 (x)     # stack: []
LOAD_FAST 0 (x)      # stack: [33]
LOAD_CONST 2 (1)     # stack: [33, 1]
BINARY_ADD           # stack: [34]
RETURN_VALUE         # stack: []
(6 instructions)
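The stack comments in the listing above can be checked with a toy interpreter. The instruction tuples mirror the slide's opcode names, but this is a simplified model (named locals, no constant table), not CPython's evaluation loop.

```python
STACK_CODE = [
    ("LOAD_CONST", 33),
    ("STORE_FAST", "x"),
    ("LOAD_FAST", "x"),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]


def run_stack(code):
    """Evaluate toy stack bytecode; return (result, stack snapshot after each step)."""
    stack, local_vars, trace = [], {}, []
    for op, arg in code:
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif op == "STORE_FAST":
            local_vars[arg] = stack.pop()
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "RETURN_VALUE":
            result = stack.pop()
            trace.append(list(stack))
            return result, trace
        else:
            raise ValueError(f"unknown opcode {op}")
        trace.append(list(stack))
    raise ValueError("code fell off the end without RETURN_VALUE")


if __name__ == "__main__":
    result, trace = run_stack(STACK_CODE)
    for (op, _), stack in zip(STACK_CODE, trace):
        print(f"{op:<13} # stack: {stack}")
    print("result:", result)
```

Every intermediate value passes through the stack, which is why the value 33 is pushed, popped into x and pushed again.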
Register bytecode
def func():
    x = 33
    return x + 1
LOAD_CONST_REG 'x', 33 (const#1)
LOAD_CONST_REG R0, 1 (const#2)
BINARY_ADD_REG R0, 'x', R0
RETURN_VALUE_REG R0
(4 instructions)
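One way to get from the 6 stack instructions to the 4 register instructions above is to keep constants on a symbolic stack until an instruction consumes them, and to reuse freed temporary registers. This is a sketch of that idea with hypothetical to_registers and run_registers functions, not registervm's actual algorithm; it handles straight-line code only and does not check whether a local is overwritten while its old value is still on the symbolic stack.

```python
STACK_CODE = [
    ("LOAD_CONST", 33), ("STORE_FAST", "x"), ("LOAD_FAST", "x"),
    ("LOAD_CONST", 1), ("BINARY_ADD", None), ("RETURN_VALUE", None),
]


def to_registers(stack_code):
    """Translate toy stack bytecode into toy register bytecode."""
    out, stack, free, temps = [], [], [], set()

    def new_temp():
        if free:
            return free.pop()          # reuse a released temporary
        name = f"R{len(temps)}"
        temps.add(name)
        return name

    def materialize(operand):
        kind, value = operand
        if kind == "const":            # load the constant only when it is needed
            reg = new_temp()
            out.append(("LOAD_CONST_REG", reg, value))
            return reg
        return value                   # already a register or a local variable

    def release(reg):
        if reg in temps:               # locals such as 'x' are never released
            free.append(reg)

    for op, arg in stack_code:
        if op == "LOAD_CONST":
            stack.append(("const", arg))
        elif op == "LOAD_FAST":
            stack.append(("reg", arg))
        elif op == "STORE_FAST":
            kind, value = stack.pop()
            if kind == "const":        # LOAD_CONST + STORE_FAST -> one instruction
                out.append(("LOAD_CONST_REG", arg, value))
            else:
                out.append(("MOVE_REG", arg, value))
                release(value)
        elif op == "BINARY_ADD":
            right = materialize(stack.pop())
            left = materialize(stack.pop())
            release(left)
            release(right)
            dest = new_temp()          # may reuse an operand: operands are read first
            out.append(("BINARY_ADD_REG", dest, left, right))
            stack.append(("reg", dest))
        elif op == "RETURN_VALUE":
            out.append(("RETURN_VALUE_REG", materialize(stack.pop())))
        else:
            raise ValueError(f"unknown opcode {op}")
    return out


def run_registers(code):
    """Evaluate the toy register bytecode produced by to_registers()."""
    regs = {}
    for op, *args in code:
        if op == "LOAD_CONST_REG":
            regs[args[0]] = args[1]
        elif op == "MOVE_REG":
            regs[args[0]] = regs[args[1]]
        elif op == "BINARY_ADD_REG":
            dest, left, right = args
            regs[dest] = regs[left] + regs[right]
        elif op == "RETURN_VALUE_REG":
            return regs[args[0]]
        else:
            raise ValueError(f"unknown opcode {op}")
    raise ValueError("no RETURN_VALUE_REG")
```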
registervm optimizations (1)
Using registers allows more optimizations:
Move constant loads and global loads (slow) out of loops:
return [str(item) for item in data]
Constant propagation:
x=1; y=x; return y → y=1; return y
Remove duplicate load/store instructions: constants, names, globals, etc.
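The x=1; y=x; return y example above combines two steps: propagating the known constant through the copy y=x, then deleting the store to x that is no longer read. A toy sketch on a hypothetical mini-IR (straight-line code only); like the slide, it leaves the operand of return unchanged.

```python
def propagate_constants(statements):
    """Constant propagation followed by dead-store elimination on a toy IR.

    Statements are ("assign", target, operand) or ("return", operand), where an
    operand is an int constant or a variable name.
    """
    known = {}       # variable -> constant value known at this point
    propagated = []
    for stmt in statements:
        if stmt[0] == "assign":
            _, target, operand = stmt
            value = known.get(operand, operand) if isinstance(operand, str) else operand
            if isinstance(value, int):
                known[target] = value
            else:
                known.pop(target, None)
            propagated.append(("assign", target, value))
        else:
            propagated.append(stmt)      # keep `return y` as on the slide

    # Backward pass: drop assignments whose target is never read afterwards.
    live, result = set(), []
    for stmt in reversed(propagated):
        if stmt[0] == "assign":
            _, target, value = stmt
            if target not in live:
                continue                 # dead store
            live.discard(target)
            if isinstance(value, str):
                live.add(value)
        elif isinstance(stmt[1], str):
            live.add(stmt[1])
        result.append(stmt)
    return list(reversed(result))


if __name__ == "__main__":
    print(propagate_constants([("assign", "x", 1), ("assign", "y", "x"), ("return", "y")]))
```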
Merge duplicate loads
Stack-based bytecode:
return (len("a"), len("a"))
LOAD_GLOBAL 'len' (name#0)
LOAD_CONST 'a' (const#1)
CALL_FUNCTION (1 positional)
LOAD_GLOBAL 'len' (name#0)
LOAD_CONST 'a' (const#1)
CALL_FUNCTION (1 positional)
BUILD_TUPLE 2
RETURN_VALUE
Merge duplicate loads
Register-based bytecode:
return (len("a"), len("a"))
LOAD_GLOBAL_REG R0, 'len' (name#0)
LOAD_CONST_REG R1, 'a' (const#1)
CALL_FUNCTION_REG R2, R0, 1, R1
CALL_FUNCTION_REG R0, R0, 1, R1
CLEAR_REG R1
BUILD_TUPLE_REG R2, 2, R2, R0
RETURN_VALUE_REG R2
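The merge shown above can be sketched as a simple value-numbering pass over toy register code: an identical load seen earlier gives its register to every later use. Assumptions, all hypothetical: instructions are (opcode, dest, operands) tuples, every register is written once (SSA form), and the global len is not rebound between the two loads. The two calls are kept, as on the slide; only the loads are merged. Register reuse and CLEAR_REG come from a separate allocation step that this sketch does not attempt.

```python
LOAD_OPCODES = frozenset({"LOAD_GLOBAL_REG", "LOAD_CONST_REG"})

DUPLICATED = [
    ("LOAD_GLOBAL_REG", "R0", ("len",)),
    ("LOAD_CONST_REG", "R1", ("a",)),
    ("CALL_FUNCTION_REG", "R2", ("R0", "R1")),
    ("LOAD_GLOBAL_REG", "R3", ("len",)),
    ("LOAD_CONST_REG", "R4", ("a",)),
    ("CALL_FUNCTION_REG", "R5", ("R3", "R4")),
    ("BUILD_TUPLE_REG", "R6", ("R2", "R5")),
    ("RETURN_VALUE_REG", None, ("R6",)),
]


def merge_duplicate_loads(code):
    """Drop repeated identical loads and redirect their uses to the first register."""
    seen = {}      # (opcode, operands) -> register that already holds the value
    alias = {}     # register of a removed load -> register that replaces it
    result = []
    for op, dest, operands in code:
        operands = tuple(alias.get(o, o) for o in operands)
        if op in LOAD_OPCODES:
            key = (op, operands)
            if key in seen:
                alias[dest] = seen[key]
                continue
            seen[key] = dest
        result.append((op, dest, operands))
    return result


if __name__ == "__main__":
    for ins in merge_duplicate_loads(DUPLICATED):
        print(ins)
```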
registervm optimizations (2)
Remove unreachable instructions (dead code)
Remove useless jumps (relative jump + 0)
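Both clean-ups above can be sketched on a toy instruction list. To keep the sketch short it uses labels instead of relative offsets (an assumption, not registervm's encoding): a jump to the label that immediately follows it is the equivalent of "relative jump + 0". Instructions after an unconditional jump or a return are unreachable until the next label. The passes repeat until nothing changes, because removing dead code can expose a new useless jump.

```python
def clean_jumps(code):
    """Remove useless jumps and unreachable instructions from a toy instruction list.

    ("LABEL", name) marks a jump target, ("JUMP", name) is an unconditional
    jump and ("RETURN", value) ends the function.
    """
    changed = True
    while changed:
        changed = False
        result = []
        reachable = True
        for i, ins in enumerate(code):
            op = ins[0]
            if op == "LABEL":
                reachable = True       # a jump target may be reached from elsewhere
            elif not reachable:
                changed = True         # dead code after JUMP/RETURN: drop it
                continue
            if (op == "JUMP" and i + 1 < len(code)
                    and code[i + 1] == ("LABEL", ins[1])):
                changed = True         # jump to the very next instruction
                continue
            result.append(ins)
            if op in ("JUMP", "RETURN"):
                reachable = False
        code = result
    return code


if __name__ == "__main__":
    print(clean_jumps([("LOAD", "a"), ("JUMP", "L1"), ("LOAD", "dead"),
                       ("LABEL", "L1"), ("RETURN", "a")]))
```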
Pybench results
BuiltinMethodLookup:
fewer instructions: 390 => 22
24 ms => 1 ms (24x faster)
NormalInstanceAttribute:
fewer instructions: 381 => 81
40 ms => 21 ms (1.9x faster)
StringPredicates:
fewer instructions: 303 => 92
42 ms => 24 ms (1.8x faster)
Pybench results
Pybench is a microbenchmark
Don't expect such speedups on your applications
registervm is still experimental and emits invalid code
Other projects
PyPy and its amazing JIT
Pymothoa, Numba: JIT (LLVM)
WPython: "Wordcode-based" bytecode
HotPy 2
Shed Skin, Pythran, Nuitka: compile to C++
Questions?
https://bitbucket.org/haypo/astoptimizer
http://hg.python.org/sandbox/registervm
Contact:
victor.stinner@gmail.com
Thanks to David Malcolm for the LibreOffice template
http://dmalcolm.livejournal.com/

Faster Python, FOSDEM
