Inside The Python Virtual Machine
Obi Ike-Nwosu
This book is for sale at http://leanpub.com/insidethepythonvirtualmachine
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.
1. Introduction
4. Python Objects
  4.1 PyObject
  4.2 Under the cover of Types
  4.3 Type Object Case Studies
  4.4 Minting type instances
  4.5 Objects and their attributes
  4.6 Method Resolution Order (MRO)
5. Code Objects
  5.1 Exploring code objects
  5.2 Code Objects within other code objects
  5.3 Code Objects in the VM
6. Frames Objects
  6.1 Allocating Frame Objects
Python and CPython are used interchangeably in this text: any mention of Python
refers to CPython, the implementation of Python written in C. Other implementations
include PyPy, which is Python implemented in a restricted subset of Python, and Jython,
which is Python implemented on the Java Virtual Machine.
I like to think of the execution of a Python program as split into two or three main phases, listed
below, depending on how the interpreter is invoked. These are covered in different measures within
this write-up:
1. Initialization: This involves the setup of the various data structures needed by the Python
process. This probably only counts when a program is being executed non-interactively
through the interpreter shell.
2. Compiling: This involves activities such as parsing source code to build syntax trees, creation
of abstract syntax trees, building of symbol tables and generation of code objects.
3. Interpreting: This involves the actual execution of generated code objects within some context.
The process of generating parse trees and abstract syntax trees from source code is language agnostic,
so the same methods that apply to other languages also apply to Python; as a result, not much
on this subject is covered here. On the other hand, the process of building symbol tables and
code objects from the abstract syntax tree is the more interesting part of the compilation phase;
it is handled in a more or less Python-specific way, so attention is paid to it. The interpreting
of compiled code objects and all the data structures that are used in the process is also covered.
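The symbol table and code object steps can be poked at from Python itself. The following is a minimal sketch, assuming a hypothetical two-statement module held in SOURCE; it uses the standard symtable module and the built-in compile function, and it only shows the results of these steps, not how CPython produces them internally.

```python
import symtable

# Hypothetical module source used only for illustration.
SOURCE = """
counter = 0

def bump(step):
    total = counter + step
    return total
"""

# Build the symbol table for the whole module ("exec" mode).
module_table = symtable.symtable(SOURCE, "<example>", "exec")
module_names = sorted(module_table.get_identifiers())

# Each function body gets its own nested symbol table.
bump_table = module_table.get_children()[0]
step_is_parameter = bump_table.lookup("step").is_parameter()
total_is_local = bump_table.lookup("total").is_local()
counter_is_global = bump_table.lookup("counter").is_global()

# Compiling the same source yields the module-level code object.
code = compile(SOURCE, "<example>", "exec")

print(module_names)
print(bump_table.get_name(), step_is_parameter, total_is_local, counter_is_global)
print(type(code).__name__)
```

The nested table for bump classifies step as a parameter, total as a local and counter as a global, which is the kind of scoping decision the compiler needs before it can pick the right opcodes.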
Topics that will be touched upon include, but are not limited to, the process of building symbol tables
and generating code objects, Python objects, frame objects, code objects, function objects, Python
opcodes, the interpreter loop, generators and user-defined classes.
This material is aimed at anybody interested in gaining some insight into how the
CPython virtual machine functions. It is assumed that the reader is already familiar with Python and
understands the fundamentals of the language. As part of this exposé on the virtual machine, we go
through a considerable amount of C code, so a reader with a rudimentary understanding of C will
find it easier to follow. When all is said and done, all that is needed to get through this material is a
healthy desire to learn about the CPython virtual machine.
This work is an expanded version of personal notes taken while investigating the inner workings of
the Python interpreter. There is a substantial amount of wisdom available in PyCon videos³,
school lectures⁴ and blog write-ups⁵. This work would not be complete without acknowledging these
fantastic sources, which have been leveraged in the production of this work.
At the end of this book, a reader should be able to understand the intricacies of how the Python
interpreter executes a program. This includes the various steps involved in executing the program
and the various data structures that are crucial to that execution. We start off with
a gentle bird's eye view of what happens when a trivial program is executed by passing the module
name to the interpreter at the command line. The CPython executable can be installed from source
by following the instructions in the Python Developer's Guide⁶.
³https://www.youtube.com/watch?v=XGF3Qu4dUqk
⁴http://pgbovine.net/cpython-internals.htm/
⁵https://tech.blog.aknin.name/2010/04/02/pythons-innards-introduction/
⁶https://docs.python.org/devguide/index.html#
2. The View From 30,000ft
This chapter provides a high-level exposé on how the interpreter goes about executing a Python
program. In subsequent chapters, we zoom in on the various pieces of the puzzle and provide a more
detailed description of each. Regardless of the complexity of a Python program, this process is
the same. The excellent explanation of this process provided by Yaniv Aknin in his Python Internal
series¹ provides some of the basis and motivation for this discussion.
Given a Python module, test.py, this module can be executed at the command line by passing it
as an argument to the Python interpreter program, as in $python test.py. This is just one of the
ways of invoking the Python executable; we could also start the interactive interpreter, execute a
string as code and so on, but these other methods of execution are not of interest to us. When the module
is passed as an argument to the executable on the command line, figure 2.1 best captures the flow
of the various activities that are involved in the actual execution of the supplied module.
The Python executable is a C program just like any other C program, such as the Linux kernel or
a simple hello world program in C, so pretty much the same process happens when the Python
executable is invoked. Take a moment to grasp this: the Python executable is just another program
that runs your own program. The same argument can be made for the relationship between C and
assembly or LLVM. The standard process initialization, which depends on the platform the executable
is running on, starts once the Python executable is invoked with the module name as argument.
¹https://tech.blog.aknin.name/2010/04/02/pythons-innards-introduction/
This write-up assumes a Unix-based operating system, so some specifics may differ when a
Windows operating system is being used.
The C runtime performs all its initialisation magic - libraries are loaded, environment variables are
checked or set - then the Python executable's main function is run just like that of any other C program.
The Python executable's main function is located in the ./Programs/python.c file and it handles
some initialization, such as making copies of the program command line arguments that were passed
to the module. The main function then calls the Py_Main function located in ./Modules/main.c,
which handles the interpreter initialization process - parsing command line arguments and setting
program flags², reading environment variables, running hooks, carrying out hash randomization etc.
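Some of the effects of this start-up work are visible from Python. The sketch below is only an illustration and makes a few assumptions: it launches child interpreters through a hypothetical helper called run_python, it relies on the documented -O option and PYTHONHASHSEED environment variable, and it says nothing about how Py_Main processes them internally.

```python
import os
import subprocess
import sys

def run_python(args, extra_env=None):
    """Run a child interpreter and return its stripped standard output."""
    env = dict(os.environ)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable] + args,
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()

# The -O command line option ends up in sys.flags.optimize.
optimize_flag = run_python(["-O", "-c", "import sys; print(sys.flags.optimize)"])

# PYTHONHASHSEED=0 disables hash randomization, so str hashes repeat across runs.
hash_code = "print(hash('virtual machine'))"
first_hash = run_python(["-c", hash_code], {"PYTHONHASHSEED": "0"})
second_hash = run_python(["-c", hash_code], {"PYTHONHASHSEED": "0"})

print(optimize_flag, first_hash == second_hash)
```

Each child process goes through the whole initialization sequence again, which is why the flag and the hash seed have to be supplied on every invocation.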
As part of the initialization process, Py_Initialize from pylifecycle.c is called; this handles the
initialization of the interpreter and thread state data structures - two very important data structures.
A look at the data structure definitions for the interpreter and thread states provides some context
into the functions of these data structures. The interpreter and thread states are just structures
whose fields hold, or point to, information that is required for the execution of a program. The
interpreter state typedef (just assume that this is C lingo for type definition though this is not entirely
true) is provided in listing 2.1.
Listing 2.1: The interpreter state data structure
    typedef struct _is {

        struct _is *next;
        struct _ts *tstate_head;

        PyObject *modules;
        PyObject *modules_by_index;
        PyObject *sysdict;
        PyObject *builtins;
        PyObject *importlib;

        PyObject *codec_search_path;
        PyObject *codec_search_cache;
        PyObject *codec_error_registry;
        int codecs_initialized;
        int fscodec_initialized;

        PyObject *builtins_copy;
    } PyInterpreterState;
Anyone who has used the Python programming language long enough may recognize a few of the
fields mentioned in this structure (sysdict, builtins, codec*).
²https://docs.python.org/3.6/using/cmdline.html
1. The *next field is a reference to another interpreter instance as multiple python interpreters
can exist within the same process.
2. The *tstate_head field points to the main thread of execution. If the Python
program is multithreaded, the interpreter is shared by all threads created by the program;
the structure of a thread state is discussed shortly.
3. The modules, modules_by_index, sysdict, builtins and importlib fields are self-explanatory - they
are all defined as instances of PyObject, which is the root type for all Python objects in the
virtual machine world. Python objects are covered in more detail in the chapters that
follow.
4. The codec* related fields hold information that helps with the location and loading of encodings.
These are very important for decoding bytes.
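Several of these fields have familiar faces at the Python level. The following is a rough sketch, assuming that sys.modules, the namespace of the sys module, the builtins namespace and the codecs lookup machinery are the Python-visible counterparts of the modules, sysdict, builtins and codec* fields; the variable names are made up for illustration.

```python
import builtins
import codecs
import sys

# sys.modules is the Python-level view of the interpreter's module cache.
sys_is_cached = sys.modules["sys"] is sys

# The builtins namespace is where names such as len and print live.
len_is_builtin = builtins.__dict__["len"] is len

# The sys module's own namespace is a plain dictionary.
sysdict_has_path = "path" in vars(sys)

# Looking up a codec goes through the codec search functions and their cache.
utf8_info = codecs.lookup("utf-8")

print(sys_is_cached, len_is_builtin, sysdict_has_path, utf8_info.name)
```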
The execution of a program must occur within a thread. The thread state structure contains all the
information that is needed by a thread to execute some Python code object - a part of the thread
data structure is shown in listing 2.2.
Listing 2.2: A cross-section of the thread state data structure
The interpreter and the thread state data structures are discussed in more detail in subsequent
chapters. The initialization process also sets up the import mechanisms as well as rudimentary stdio.
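The idea that every thread carries its own execution state can be observed from Python. This is a small sketch, assuming CPython, since sys._current_frames is an implementation-specific helper; the worker function and the event names are hypothetical.

```python
import sys
import threading

frames_seen = {}
started = threading.Event()
release = threading.Event()

def worker():
    # Signal that the thread is running, then wait so its frame stays alive.
    started.set()
    release.wait()

thread = threading.Thread(target=worker, name="worker")
thread.start()
started.wait()

# sys._current_frames() maps each thread identifier to its topmost frame.
for thread_id, frame in sys._current_frames().items():
    frames_seen[thread_id] = frame.f_code.co_name

release.set()
thread.join()

main_id = threading.get_ident()
worker_id = thread.ident
print(len(frames_seen), main_id in frames_seen, worker_id in frames_seen)
```

Both threads share one interpreter, yet each has its own chain of frames, which is the kind of per-thread bookkeeping the thread state structure exists for.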
Once all the initialization is complete, the Py_Main function invokes the run_file function, also
located in the main.c module. The following series of function calls is then made:
PyRun_AnyFileExFlags -> PyRun_SimpleFileExFlags -> PyRun_FileExFlags -> PyParser_ASTFromFileObject.
The PyRun_SimpleFileExFlags function call creates the
__main__ namespace in which the file contents will be executed. It also checks if a pyc version
of the file exists - the pyc file is just a file containing the compiled version of the file being
executed. If the file has a pyc version, an attempt is made to read it in
as binary and then run it. In this case, there is no pyc file, so PyRun_FileExFlags is called and
so on. The PyParser_ASTFromFileObject function calls PyParser_ParseFileObject, which reads
the module content and builds a parse tree from it. The parse tree is then passed to
PyParser_ASTFromNodeObject, which goes ahead to create an abstract syntax tree from the
parse tree.
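The end product of this stage, the abstract syntax tree, can be inspected with the standard ast module. The sketch below assumes a hypothetical test.py that defines a single function; note that ast.parse only hands back the final tree, so the intermediate parse tree is not visible this way.

```python
import ast

# Hypothetical contents of a small module such as test.py.
SOURCE = "def square(x):\n    return x * x\n"

# ast.parse runs the parser and returns the root of the abstract syntax tree.
tree = ast.parse(SOURCE, filename="test.py", mode="exec")

function_node = tree.body[0]
return_node = function_node.body[0]

node_types = [
    type(tree).__name__,
    type(function_node).__name__,
    type(return_node).__name__,
    type(return_node.value).__name__,
    type(return_node.value.op).__name__,
]
print(node_types)
```

Calling ast.dump(tree) prints the whole tree in one go, which is handy when the source is larger than a couple of lines.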
If you have been following the actual C source code, you must have run into
Py_INCREF and Py_DECREF by now. These are memory management functions that will be
discussed later on in more detail. CPython manages the object life cycle using reference
counting; whenever a new reference to an object is made, the reference count is increased with
Py_INCREF, and whenever a reference goes out of scope, the reference count is reduced with
Py_DECREF.
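Reference counts can be watched from Python with sys.getrefcount. This is a minimal sketch, assuming CPython; the absolute numbers may include temporary references held by the call itself, so only the differences between readings are meaningful.

```python
import sys

tracked = object()
holders = []

# Baseline reading; the absolute value is not important here.
before = sys.getrefcount(tracked)

# Storing the object in a list creates a new reference (an INCREF in C terms).
holders.append(tracked)
after_append = sys.getrefcount(tracked)

# Removing it drops that reference again (a DECREF in C terms).
holders.pop()
after_pop = sys.getrefcount(tracked)

print(after_append - before, after_pop - before)
```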
The AST generated is then passed to the run_mod function. This function invokes the
PyAST_CompileObject function, which creates code objects from the AST. Do note that the bytecode generated
during the call to PyAST_CompileObject is passed through a simple peephole optimizer that carries
out low-hanging optimizations of the generated bytecode before the code objects are created.
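The same AST-to-code-object step is reachable from Python through the built-in compile function. The following sketch assumes a hypothetical one-line module; it also shows constant folding, one of the simple optimizations mentioned above, though which stage of the compiler performs the folding varies across CPython versions.

```python
import ast

SOURCE = "product = 1000 * 1000\n"

# Step 1: source text to abstract syntax tree.
tree = ast.parse(SOURCE, filename="<example>")

# Step 2: abstract syntax tree to code object.
code = compile(tree, "<example>", "exec")

# The folded product shows up among the constants of the code object.
folded = 1000000 in code.co_consts

# Step 3: run the code object in a fresh namespace.
namespace = {}
exec(code, namespace)

print(folded, namespace["product"])
```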
The run_mod function then invokes PyEval_EvalCode from the ceval.c file on the code object.
This results in another series of function calls: PyEval_EvalCode -> PyEval_EvalCodeEx ->
_PyEval_EvalCodeWithName -> _PyEval_EvalFrameEx. The code object is passed as an argument
into most of these functions in one form or another. The _PyEval_EvalFrameEx function is the actual
interpreter loop that handles the execution of code objects. It is, however, not just invoked with a
code object as argument; rather, a frame object, which has a field that references a code object, is one
of its arguments. This frame object provides the context for the execution of the code object. A very
simplified version of what happens here is that the interpreter loop continuously reads the next
instruction pointed to by the instruction counter from an array of instructions. It then executes this
instruction, adding or removing objects from the value stack in the process (where this value
stack lives is left for later chapters), till there are no more instructions to be executed in the array or
something exceptional that breaks this loop occurs.
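The shape of such a loop can be sketched in a few lines of Python. This is a toy model and not CPython's implementation: the run function, the tuple-based instruction format and the dictionary of local values are all assumptions made for illustration, and only three opcode names are handled.

```python
def run(instructions, local_values):
    """Execute a list of (opname, argument) pairs on a value stack."""
    stack = []
    counter = 0  # index of the next instruction to execute
    while counter < len(instructions):
        opname, argument = instructions[counter]
        counter += 1
        if opname == "LOAD_FAST":
            # Push the value of a local variable onto the value stack.
            stack.append(local_values[argument])
        elif opname == "BINARY_MULTIPLY":
            # Pop two operands and push their product.
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opname == "RETURN_VALUE":
            # Pop the result and break out of the loop.
            return stack.pop()
        else:
            raise ValueError("unknown instruction: " + opname)
    raise RuntimeError("ran out of instructions without returning")

program = [
    ("LOAD_FAST", "x"),
    ("LOAD_FAST", "x"),
    ("BINARY_MULTIPLY", None),
    ("RETURN_VALUE", None),
]
result = run(program, {"x": 7})
print(result)
```

The real loop works on compact bytecode rather than tuples and handles far more opcodes, but the read, dispatch and stack manipulation cycle is the same idea.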
Python provides a set of functions that one can use to explore actual code objects. For example, a
simple program can be compiled into a code object and disassembled to get the opcodes that are
executed by the python virtual machine as shown in listing 2.3.
Listing 2.3: Disassembling a python function
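A minimal sketch of such a disassembly follows, assuming a hypothetical one-argument function named square that returns x * x. The exact opcode names and offsets printed depend on the CPython version; newer releases, for instance, use a generic BINARY_OP instruction in place of BINARY_MULTIPLY.

```python
import dis

def square(x):
    return x * x

# dis.get_instructions yields one record per bytecode instruction.
instructions = list(dis.get_instructions(square))
opnames = [instruction.opname for instruction in instructions]

# Print the human-readable disassembly.
dis.dis(square)
```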
The ./Include/opcode.h header file contains a full listing of all the instructions/opcodes for the
Python virtual machine. The opcodes are pretty straightforward conceptually. Take our example
from listing 2.3, which has a set of four instructions - the LOAD_FAST opcode loads the value of its
argument (x in this case) onto an evaluation (value) stack. The Python virtual machine is a stack-based
virtual machine, which means that the values an opcode operates on are taken from a
stack and the results of an evaluation are placed back on the stack for further use by other opcodes. The
BINARY_MULTIPLY opcode then pops two items from the value stack, performs binary multiplication
on both values and places the result back on the value stack. The RETURN_VALUE
instruction pops a value from the stack, sets the return value object to this value and breaks
out of the interpreter loop. From the disassembly in listing 2.3, it is pretty clear that this rather
simplistic explanation of the operation of the interpreter loop leaves out a number of details that will be
discussed in subsequent chapters. A few of these outstanding questions are listed below.
1. Where are values such as those loaded by the LOAD_FAST instruction gotten from?
2. Where do arguments that are used as part of instructions come from?
3. How are nested function and method calls managed?
4. How does the interpreter loop handle exceptions?
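A first hint at the answers to the first two questions can be had by looking at the attributes of a code object. The sketch below assumes a hypothetical function called scale; it only lists a few of the attributes involved and leaves the full story to later chapters.

```python
def scale(x):
    factor = 1000
    return x * factor

code = scale.__code__

# Names of the arguments and other local variables of the function.
local_names = code.co_varnames

# Literal constants used by the function body.
constants = code.co_consts

# Raw bytecode: the array of instructions the interpreter loop walks over.
raw_bytecode = code.co_code

print(local_names, constants, len(raw_bytecode))
```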
After all the instructions have been executed, the Py_Main function continues its execution,
but this time around it starts the clean-up process. Just as Py_Initialize was called to perform
initialization during the interpreter startup, Py_FinalizeEx is invoked to do some clean-up work;
this clean-up process involves waiting for threads to exit, calling any exit hooks and also freeing up
any memory allocated by the interpreter that is still in use.
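The exit hook part of this clean-up is easy to observe. The following is a small sketch, assuming a hypothetical child program that registers a hook with the standard atexit module; it runs in a separate interpreter process so that the finalization of that process can be watched from outside.

```python
import subprocess
import sys

# Hypothetical child program: registers an exit hook, then finishes normally.
CHILD_SOURCE = (
    "import atexit\n"
    "atexit.register(print, 'exit hook ran')\n"
    "print('main module finished')\n"
)

completed = subprocess.run(
    [sys.executable, "-c", CHILD_SOURCE],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
output_lines = completed.stdout.splitlines()
print(output_lines)
```

The hook's output appears after the last line printed by the program itself, which matches the order described above: the code runs to completion first and the clean-up follows.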
The above is a high-level description of the processes the Python executable goes through
to execute a Python program. As noted previously, a lot of details are still left to be filled in, and in
the chapters that follow, we will dig into each of the stages that have been covered and try
to provide details on each of them. We get into action starting with a description of the
compilation process in the next chapter.