
Tutorial On Python Iterators and Generators

This document provides a tutorial on iterators and generators in Python. It begins by explaining what iterators are and why they are useful for iterating over sequences in an elegant and memory-efficient way. It then provides examples of using iterators to generate Fibonacci numbers, iterate over a "circular" array, and subclass the file class. The document also explains generators, which are similar to iterators but can be written as functions that use the yield keyword. It provides additional examples of using generators and discusses their advantages over iterators.

Uploaded by sai
Tutorial on Python Iterators and Generators

Norman Matloff
University of California, Davis
© 2005-2008, N. Matloff
February 20, 2008

Contents

1 Iterators
   1.1 What Are Iterators? Why Use Them?
   1.2 Example: Fibonacci Numbers
   1.3 Example: Circular Array
   1.4 Example: Subclassing the file Class
   1.5 The itertools Module
2 Generators
   2.1 General Structures
   2.2 Example: Fibonacci Numbers
   2.3 Example: Word Fetcher
   2.4 Multiple Iterators from the Same Generator
   2.5 Modularity/Reusability
   2.6 Don't put yield in a Subfunction
   2.7 Coroutines
      2.7.1 My thrd Class
      2.7.2 The SimPy Discrete Event Simulation Library

1 Iterators

1.1 What Are Iterators? Why Use Them?

Let's start with an example we know from our unit on Python file and directory programming (http://heather.cs.ucdavis.edu/matloff/Python/PyFileDir.pdf). Say we open a file and assign the result to f, e.g.

f = open('x')

Suppose we wish to print out the lengths of the lines of the file.

for l in f.readlines():
    print len(l)

But the same result can be obtained with

for l in f:
    print len(l)

The second method has two advantages:

(a) it's simpler and more elegant
(b) a line of the file is not read until it is actually needed

Point (a) becomes even clearer if we take the functional programming approach. The code

print map(len,f.readlines())

is not as nice as

print map(len,f)

Point (b) would be of major importance if the file were really large. The first method above would read the entire file into memory, which is very undesirable. Here we read just one line of the file at a time. Of course, we could also do this by calling readline() instead of readlines(), but not as simply and elegantly.
In our second method, f is serving as an iterator. Let's look at the concept more generally.

Recall that a Python sequence is roughly like an array in most languages, and takes on two forms: lists and tuples.[1] Sequence operations in Python are much more flexible than in a language like C or C++. One can have a function return a sequence; one can slice sequences; one can concatenate sequences; etc.

In this context, an iterator looks like a sequence when you use it, but with some major differences:

(a) you usually must write a function which actually constructs that sequence-like object
(b) an element of the sequence is not actually produced until you need it
(c) unlike real sequences, an iterator sequence can be infinitely long
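To see differences (b) and (c) in action, here is a small sketch assuming Python 3, where the iterator method is spelled __next__() and is normally invoked through the built-in function next(). It shows that iter() turns a real sequence into a one-shot iterator that hands out elements only on demand, that an exhausted iterator signals the end by raising StopIteration, and that itertools.count() gives an iterator that never runs out.

```python
import itertools

# A list is a real sequence: it can be indexed and traversed many times.
nums = [10, 20, 30]

# iter() hands back an iterator over the list; elements are produced
# one at a time, only when next() asks for them.
it = iter(nums)
first = next(it)     # 10
second = next(it)    # 20
rest = list(it)      # drains what is left: [30]

# The iterator is now exhausted; asking again raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# The list itself is untouched and can be traversed again.
again = list(nums)   # [10, 20, 30]

# An iterator can also be infinite: count(0, 2) never runs out, so we
# take only as many elements as we need.
evens = itertools.count(0, 2)
first_evens = [next(evens) for _ in range(4)]  # [0, 2, 4, 6]
```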
[1] Recall also that strings are sequences too: immutable, like tuples, but with extra properties.

1.2 Example: Fibonacci Numbers

For simplicity, let's start with everyone's favorite computer science example, Fibonacci numbers, as defined by the recursion

$$
f_n = \begin{cases}
1, & \text{if } n = 1, 2 \\
f_{n-1} + f_{n-2}, & \text{if } n > 2
\end{cases}
\qquad (1)
$$

It's easy to write a loop to compute these numbers. But let's try it as an iterator:
# iterator example; uses Fibonacci numbers, so common in computer
# science examples: f_n = f_{n-1} + f_{n-2}, with f_1 = f_2 = 1

class fibnum:
    def __init__(self):
        self.fn2 = 1  # "f_{n-2}"
        self.fn1 = 1  # "f_{n-1}"
    def next(self):  # next() is the heart of any iterator
        # note the use of the following tuple to not only save lines of
        # code but also to ensure that only the old values of self.fn1 and
        # self.fn2 are used in assigning the new values
        (self.fn1,self.fn2,oldfn2) = (self.fn1+self.fn2,self.fn1,self.fn2)
        return oldfn2
    def __iter__(self):
        return self

Now here is how we would use the iterator, e.g. to loop with it:

from fib import *

def main():
    f = fibnum()
    for i in f:
        print i
        if i > 20: break

if __name__ == '__main__':
    main()

By including the method __iter__() in our fibnum class, we informed the Python interpreter that we wish to use this class as an iterator. We also had to include the method next(), which, as its name implies, is the mechanism by which the sequence is formed. This enabled us to simply place an instance of the class in the for loop above. Knowing that f is an iterator, the Python interpreter will repeatedly call f.next(), assigning the values returned by that function to i.
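The following sketch, assuming Python 3, spells out roughly what the for loop above does behind the scenes. In Python 3 the method next() was renamed __next__(), so the class defines __next__() and keeps next as an alias; manual_loop() is a hypothetical helper written only for this illustration.

```python
class fibnum:
    """Infinite Fibonacci iterator (Python 3 spelling of the class above)."""
    def __init__(self):
        self.fn2 = 1  # "f_{n-2}"
        self.fn1 = 1  # "f_{n-1}"

    def __next__(self):
        # the right-hand side is built from the old values only
        (self.fn1, self.fn2, oldfn2) = (self.fn1 + self.fn2, self.fn1, self.fn2)
        return oldfn2

    next = __next__  # alias, so f.next() also works as in the text

    def __iter__(self):
        return self


def manual_loop(limit):
    """Roughly what 'for i in f: ...; if i > limit: break' does."""
    seen = []
    f = fibnum()
    it = iter(f)            # calls f.__iter__(), which returns f itself
    while True:
        try:
            i = next(it)    # calls it.__next__()
        except StopIteration:
            break           # a for loop ends quietly on StopIteration
        seen.append(i)      # stands in for "print i"
        if i > limit:
            break
    return seen
```

manual_loop(20) returns [1, 1, 2, 3, 5, 8, 13, 21]: like the loop in main(), it records 21 before breaking out.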
As stated above, the iterator approach often makes for more elegant code. But again, note the importance of not having to compute the entire sequence at once. Having the entire sequence in memory would waste memory and would be impossible in the case of an infinite sequence, as we have here. Our for loop above is iterating through an infinite number of iterations, and would do so if we didn't stop it as we did. But each element of the sequence is computed only at the time it is needed.

Moreover, this may be necessary, not just a luxury, even in the finite case. Consider this simple client/server pair:
# x.py, server

import socket,sys

def main():
    ls = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    port = int(sys.argv[1])
    ls.bind(('', port))
    ls.listen(1)
    (conn, addr) = ls.accept()
    while 1:
        l = raw_input()
        # raw_input() strips the newline, so put it back; the client
        # separates its input into lines at the newlines
        conn.send(l + '\n')

if __name__ == '__main__':
    main()

# w.py, client

import socket,sys

def main():
    s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    host = sys.argv[1]
    port = int(sys.argv[2])
    s.connect((host,port))
    flo = s.makefile('r',0)
    for l in flo:
        print l

if __name__ == '__main__':
    main()

(If you do not know the makefile() function, see our Python network tutorial, at http://heather.cs.ucdavis.edu/matloff/Python/PyNet.pdf.)

The server reads lines from the keyboard. It sends each line to the client as soon as the line is typed. However, if on the client side we had written

for l in flo.readlines():
    print l

instead of

for l in flo:
    print l

then the client would print out nothing until all of flo is received, meaning that the user on the server end typed ctrl-d to end the keyboard input, thus closing the connection.
Exceptions need not be thought of as mere accidents: one can use them as an elegant way to end a loop involving an iterator, using the built-in exception type StopIteration. For example:

class fibnum20:
    def __init__(self):
        self.fn2 = 1  # "f_{n-2}"
        self.fn1 = 1  # "f_{n-1}"
    def next(self):
        (self.fn1,self.fn2,oldfn2) = (self.fn1+self.fn2,self.fn1,self.fn2)
        if oldfn2 > 20: raise StopIteration
        return oldfn2
    def __iter__(self):
        return self

>>> from fib20 import *
>>> g = fibnum20()
>>> for i in g:
...     i
...
1
1
2
3
5
8
13

What happens is that the loop

>>> for i in g:

catches the StopIteration exception raised by g.next(), which makes the looping terminate, and our sequence is finite.
You can also make a real sequence out of an iterator's output by using the list() function, though you of course do have to make sure the iterator produces finite output. For example:

>>> from fib20 import *
>>> g = fibnum20()
>>> g
<fib20.fibnum20 instance at 0xb7e6c50c>
>>> list(g)
[1, 1, 2, 3, 5, 8, 13]

The built-in functions sum(), max() and min() also accept iterators, e.g.

>>> from fib20 import *
>>> g = fibnum20()
>>> sum(g)
33
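For Python 3 readers, here is a sketch of fibnum20 with the method spelled __next__(). Because the iterator is finite, list(), sum() and max() all work on it. Note that each of them consumes the iterator, so the sketch creates a fresh instance each time, and a second list() on an already exhausted instance comes back empty.

```python
class fibnum20:
    """Fibonacci numbers, stopping once a value would exceed 20."""
    def __init__(self):
        self.fn2 = 1  # "f_{n-2}"
        self.fn1 = 1  # "f_{n-1}"

    def __next__(self):
        (self.fn1, self.fn2, oldfn2) = (self.fn1 + self.fn2, self.fn1, self.fn2)
        if oldfn2 > 20:
            raise StopIteration  # ends a for loop, list(), sum(), ...
        return oldfn2

    def __iter__(self):
        return self


values = list(fibnum20())   # [1, 1, 2, 3, 5, 8, 13]
total = sum(fibnum20())     # 33
largest = max(fibnum20())   # 13

g = fibnum20()
list(g)                     # consumes g completely
leftover = list(g)          # [] because g is already exhausted
```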

1.3 Example: Circular Array

Here's an example of using iterators to make a circular array. In our tutorial on Python network programming, http://heather.cs.ucdavis.edu/matloff/Python/PyNet.pdf, we needed to continually cycle through a list cs of client sockets:[2]

while (1):
    # get next client, with effect of a circular queue
    clnt = cs.pop(0)
    ...
    cs.append(clnt)
    ...

[2] I am slightly modifying it here, by assuming a constant number of clients.

Here's how to make an iterator out of it:[3]

# circular queue

class cq:  # the argument q is a list
    def __init__(self,q):
        self.q = q
    def __iter__(self):
        return self
    def next(self):
        self.q = self.q[1:] + [self.q[0]]
        return self.q[-1]

Let's test it:

>>> from cq import *
>>> x = cq([1,2,3])
>>> x.next()
1
>>> x.next()
2
>>> x.next()
3
>>> x.next()
1
>>> x.next()
2

With this, our while loop in the network program above would look like this:

cit = cq(cs)
for clnt in cit:
    # code using clnt
    ...

The code would iterate indefinitely.

Of course, we had to do a bit of work to set this up. But now that we have, we can reuse this code in lots of different applications in the future, and the nice, clear form such as that above,

for clnt in cit:

adds to ease of programming and readability of code.
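The standard library already provides this behavior: itertools.cycle() returns an iterator that repeats the elements of its argument forever. The sketch below, assuming Python 3, puts a __next__() version of cq side by side with cycle(), using itertools.islice() to take just the first five elements of each.

```python
import itertools

class cq:
    """Circular queue iterator; the argument q is a list."""
    def __init__(self, q):
        self.q = q

    def __iter__(self):
        return self

    def __next__(self):
        # rotate left by one and hand back the element moved to the end
        self.q = self.q[1:] + [self.q[0]]
        return self.q[-1]


# Both iterators loop around indefinitely; take five elements from each.
from_cq = list(itertools.islice(cq([1, 2, 3]), 5))
from_cycle = list(itertools.islice(itertools.cycle([1, 2, 3]), 5))
```

Both lists come out as [1, 2, 3, 1, 2]. One design difference: cq rebuilds its list on every call, which costs time proportional to the number of clients, whereas cycle() simply steps through a saved copy of the elements.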

1.4 Example: Subclassing the file Class

As mentioned, one can use a file as an iterator. The file class does have member functions __iter__() and next(). The latter is what a for loop calls to obtain each successive line, and it can be overridden.

Suppose we often deal with text files whose only elements are 0 and 1, with the same number of elements per line. We can form a class file01 as a subclass of file, and add some error checking:

[3] I've also made the code more compact, independent of the change to an iterator.

import sys

class file01(file):
    def __init__(self,name,mode,ni):
        file.__init__(self,name,mode)
        self.ni = ni
    def next(self):
        line = file.next(self)
        items = line.split()
        if len(items) != self.ni:
            print 'wrong number of items'
            print line
            raise StopIteration
        for itm in items:
            if itm != '1' and itm != '0':
                print 'non-0/1 item:', itm
                raise StopIteration
        return line

def main():
    f = file01(sys.argv[1],'r',int(sys.argv[2]))
    for l in f: print l[:-1]

if __name__ == '__main__': main()

Here are some simple examples:


% cat u
1 0 1
0 1 1
% python file01.py u 3
1 0 1
0 1 1
% python file01.py u 2
wrong number of items
1 0 1
% cat v
1 1 b
1 0 1
% python file01.py v 3
non-0/1 item: b

One point to note here is that you can open any file (not just of this new special kind) by simply creating an instance of the file class. For example, this would open a file x for reading and print its lines:

f = file('x','r')
for l in f: print l

I've used that fact in main() above.

Note also that in overriding file.next(), we still need to call it, in order to read a line:

line = file.next(self)
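Python 3 no longer has a built-in file class to subclass, so the trick above does not carry over directly. A minimal sketch of the same checking under Python 3 instead wraps any iterable of lines (an open file, or an io.StringIO object for testing) in an iterator class; the name check01 is hypothetical.

```python
import io

class check01:
    """Iterator over the lines of f, checking each one as file01 does.

    f may be any iterable of text lines, e.g. an open file or io.StringIO.
    """
    def __init__(self, f, ni):
        self.f = iter(f)
        self.ni = ni  # required number of items per line

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self.f)  # at end of input, StopIteration ends the loop
        items = line.split()
        if len(items) != self.ni:
            print('wrong number of items')
            print(line, end='')
            raise StopIteration
        for itm in items:
            if itm != '1' and itm != '0':
                print('non-0/1 item:', itm)
                raise StopIteration
        return line


# Same data as the file u above, checked for 3 items per line.
ok = [l.rstrip('\n') for l in check01(io.StringIO('1 0 1\n0 1 1\n'), 3)]
```

With ni = 2 on the same data, or with the contents of the file v above, the loop stops at the first line after printing the corresponding message, as in the shell examples.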

1.5 The itertools Module

With the tools in this module, you can really treat an infinite iterator like a sequence. For instance, itertools.islice() is handy:

>>> from itertools import *
>>> from fib import *
>>> g = fibnum()
>>> list(islice(g,6)) # slice out the first 6 elements
[1, 1, 2, 3, 5, 8]

The general form of islice() is

itertools.islice(iteratorname, [start], stop, [step])

Here we get elements start, start + step, and so on, but ending before element stop. For instance, starting again from a fresh iterator:

>>> g = fibnum()
>>> list(islice(g,3,9,2))
[3, 8, 21]

There are also analogs of the map() and filter() functions, which operate on real sequences. The call

itertools.imap(f, iter1, iter2, ...)

returns an iterator producing f(iter1[0],iter2[0],...), then f(iter1[1],iter2[1],...), and so on, which one can then apply list() to. The call

itertools.ifilter(predicate, iter)

applies the boolean test predicate(x) to each element x of the iterator stream, and returns an iterator over the elements for which the test is true.
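In Python 3, itertools.imap() and itertools.ifilter() no longer exist, because the built-in map() and filter() themselves became lazy and return iterators. The sketch below, assuming Python 3, applies islice(), map() and filter() to an infinite Fibonacci generator fib() (generators are the subject of Section 2; here fib() simply plays the role of fibnum()).

```python
from itertools import count, islice

def fib():
    """Infinite Fibonacci generator: 1, 1, 2, 3, 5, 8, ..."""
    fn2, fn1 = 1, 1
    while True:
        (fn1, fn2, oldfn2) = (fn1 + fn2, fn1, fn2)
        yield oldfn2

first6 = list(islice(fib(), 6))              # [1, 1, 2, 3, 5, 8]
every_other = list(islice(fib(), 3, 9, 2))   # elements 3, 5, 7: [3, 8, 21]

# map() combines several iterators element by element, lazily, like imap()
sums = list(islice(map(lambda a, b: a + b, fib(), count()), 5))
# filter() keeps the elements for which the predicate is true, like ifilter()
evens = list(islice(filter(lambda n: n % 2 == 0, fib()), 4))
```

Here sums is [1, 2, 4, 6, 9] and evens is [2, 8, 34, 144]; neither call ever tries to build the whole infinite stream.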

2 Generators

2.1 General Structures

Generators are entities which generate iterators! Hence the name.


Speaking very roughly in terms of our goals, a generator is a function that we wish to call repeatedly, but which is unlike an ordinary function in that successive calls to a generator function don't start execution at the beginning of the function. Instead, the current call to a generator function will resume execution right after the spot in the code at which the last call exited, i.e. we "pick up where we left off."

The way this occurs is as follows. One calls the generator itself just once. That returns an iterator. This is a real iterator, with __iter__() and next() methods. The latter is essentially the function which implements our "pick up where we left off" goal. We can either call next() directly, or use the iterator in a loop.

Note the difference in approach:

- In the case of iterators, a class is recognized by the Python interpreter as an iterator by the presence of the __iter__() and next() methods.
- By contrast, with a generator we don't even need to set up a class. We simply write a plain function, with its only distinguishing feature for recognition by the Python interpreter being that we use yield instead of return.
Note, though, that yield and return work quite differently from each other. When a yield is executed, the Python interpreter records the line number of that statement (there may be several yield lines within the same generator). Then, the next time this generator function is called with this same iterator, the function will resume execution at the line following the yield.

Here are the key points:

- A yield causes an exit from the function, but the next time the function is called, we start where we left off, i.e. at the line following the yield rather than at the beginning of the function.
- All the values of the local variables which existed at the time of the yield action are still intact when we resume.
- There may be several yield lines in the same generator.
- We can also have return statements, but executing any such statement ends the iteration: the call to next() during which the return is executed raises a StopIteration exception, as does any later call.
- The yield operation has one operand (or none), which is the return value. That one operand can be a tuple, though. As usual, if there is no ambiguity, you do not have to enclose the tuple in parentheses.

Read the following example carefully, keeping all of the above points in mind:
# yieldex.py example of yield, return in generator functions

def gy():
    x = 2
    y = 3
    yield x,y,x+y
    z = 12
    yield z/x
    print z/y
    return

def main():
    g = gy()
    print g.next()  # prints x,y,x+y
    print g.next()  # prints z/x
    print g.next()

if __name__ == '__main__':
    main()

% python yieldex.py
(2, 3, 5)
6
4
Traceback (most recent call last):
File "yieldex.py", line 19, in ?
main()
File "yieldex.py", line 16, in main
print g.next()
StopIteration

Note that execution of the actual code in the function gy(), i.e. the lines
x = 2
...

does not occur until the first g.next() is executed.
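The sketch below, assuming Python 3, makes the same points observable without a traceback: it records in a list when the generator body actually runs, and it catches the StopIteration produced by the return. Since 12/3 is true division in Python 3, it uses // to keep the integer results of the original.

```python
events = []  # what the generator body has done so far

def gy():
    events.append('body started')
    x = 2
    y = 3
    yield x, y, x + y
    z = 12
    yield z // x
    events.append(z // y)  # stands in for the original "print z/y"
    return  # ends the iteration: this next() call raises StopIteration

g = gy()
before = list(events)   # calling gy() has run none of its body yet
first = next(g)         # (2, 3, 5)
second = next(g)        # 6
try:
    next(g)             # runs the rest of the body, then hits return
    stopped = False
except StopIteration:
    stopped = True
```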

2.2 Example: Fibonacci Numbers

As another simple illustration, let's look at the good ol' Fibonacci numbers again:
# fibg.py, generator example; Fibonacci numbers
# f_n = f_{n-1} + f_{n-2}

def fib():
    fn2 = 1  # "f_{n-2}"
    fn1 = 1  # "f_{n-1}"
    while True:
        (fn1,fn2,oldfn2) = (fn1+fn2,fn1,fn2)
        yield oldfn2

>>> from fibg import *
>>> g = fib()
>>> g.next()
1
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.next()
5
>>> g.next()
8

Note that we do need to resume execution of the function in the middle, rather than at the top. We certainly don't want to execute

fn2 = 1

again, for instance. Indeed, a key point is that the local variables fn1 and fn2 retain their values between calls. This is what allowed us to get away with using just a function instead of a class. This is simpler and cleaner than the class-based approach. For instance, in the code here we refer to fn1 instead of self.fn1 as we did in our class-based version in Section 1.2. In more complicated functions, all these simplifications would add up to a major improvement in readability.
This property of retaining locals between calls is like that of locals declared as static in C.[4] Note, though, that in Python we might set up several instances of a given generator, each instance maintaining different values for the locals. To do this in C, we would need arrays of the locals, indexed by the instance number.

[4] If you need review of this in the C context, make sure to check a C book, or the C portion of a C++ book. It's a very important concept.
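Here is a small illustration of that point, assuming Python 3 (next(g) rather than g.next()): two instances of the same generator advance independently, each with its own fn1 and fn2, and no per-instance arrays are needed.

```python
def fib():
    fn2 = 1  # "f_{n-2}"
    fn1 = 1  # "f_{n-1}"
    while True:
        (fn1, fn2, oldfn2) = (fn1 + fn2, fn1, fn2)
        yield oldfn2

g1 = fib()
g2 = fib()
a = [next(g1) for _ in range(5)]  # g1 advances five times: 1, 1, 2, 3, 5
b = [next(g2) for _ in range(2)]  # g2 has its own locals, starts fresh: 1, 1
c = next(g1)                      # g1 resumes where it left off: 8
```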


To implement generator-like code in C, we would still use C's ordinary return, but would put labels on the statements following our various return lines. Before each return, we would have code to record the label of the next line to execute when we later resume running this function. This record will also be in a static variable. At the top of the function, we would have a C switch statement, which would be indexed by this record variable, and which would consist of a bunch of goto statements.
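The same bookkeeping can be mimicked in Python itself, which may make the description easier to follow. The sketch below is a hypothetical illustration, not code from this tutorial: an ordinary function with no yield keeps its "locals" and a resume label in a record object (standing in for the C static variables), checks that label at the top (standing in for the switch), and updates it before returning. A separate record per instance plays the role of the C arrays indexed by instance number. FibState and fib_step are illustrative names.

```python
class FibState:
    """Stands in for C static variables: saved locals plus a resume label."""
    def __init__(self):
        self.label = 'start'  # where the next call should resume
        self.fn1 = None
        self.fn2 = None

def fib_step(st):
    """One 'call' of a hand-made generator; returns the next Fibonacci number."""
    # Dispatch on the saved label, as the C switch statement would.
    if st.label == 'start':
        st.fn2 = 1  # "f_{n-2}"
        st.fn1 = 1  # "f_{n-1}"
        st.label = 'loop'  # later calls skip the initialization
    if st.label == 'loop':
        (st.fn1, st.fn2, oldfn2) = (st.fn1 + st.fn2, st.fn1, st.fn2)
        st.label = 'loop'  # record the resume point before returning
        return oldfn2
    raise ValueError('unknown resume label: %r' % (st.label,))

# Two independent "instances", each with its own saved state.
s1 = FibState()
s2 = FibState()
seq1 = [fib_step(s1) for _ in range(6)]  # [1, 1, 2, 3, 5, 8]
seq2 = [fib_step(s2) for _ in range(2)]  # [1, 1]
```

With only one resume point the dispatch is trivial here; a function with several yields would need one label per resume point, which is exactly the work a Python generator does for us.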

2.3 Example: Word Fetcher

The following is a producer/consumer example. The producer, getword(), gets words from a text file, feeding them one at a time to the consumer.[5] In the test here, the consumer is testgw.py.
# getword.py

# the function getword() reads from the text file fl, returning one word
# at a time; will not return a word until an entire line has been read

def getword(fl):
    for line in fl:
        for word in line.split():
            yield word
    return

# testgw.py, test of getword; counts words and computes average length
# of words

# usage: python testgw.py [filename]
# (stdin is assumed if no file is specified)

from getword import *

def main():
    import sys
    # determine which file we'll evaluate
    try:
        f = open(sys.argv[1])
    except IndexError:
        # no filename given on the command line
        f = sys.stdin
    # generate the iterator
    w = getword(f)
    wcount = 0
    wltot = 0
    for wrd in w:
        wcount += 1
        wltot += len(wrd)
    print "%d words, average length %f" % (wcount,wltot/float(wcount))

if __name__ == '__main__':
    main()
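For a quick test without a file, the sketch below, assuming Python 3, runs the same getword() generator over an io.StringIO object, which supports line-by-line iteration just as an open file does. The helper word_stats() is a hypothetical name that packages the counting loop from testgw.py; it also guards against empty input, where testgw.py as written would divide by zero.

```python
import io

def getword(fl):
    """Yield the words of the text file (or file-like object) fl one at a time."""
    for line in fl:
        for word in line.split():
            yield word

def word_stats(fl):
    """Return (word count, average word length), as testgw.py computes them."""
    wcount = 0
    wltot = 0
    for wrd in getword(fl):
        wcount += 1
        wltot += len(wrd)
    avg = wltot / float(wcount) if wcount else 0.0
    return wcount, avg

count, avg = word_stats(io.StringIO('the quick brown\nfox jumps\n'))
```

Here count is 5 and avg is 4.2 (21 letters spread over 5 words).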

2.4 Multiple Iterators from the Same Generator

Note our phrasing earlier (emphasis added):

...the next time this generator function is called with this same iterator, the function will resume execution at the line following the yield

[5] I thank C. Osterwisch for this much improved version of the code I had here originally.

Suppose for instance that we have two sorted text files, one word per line, and we wish to merge them into a combined sorted file. We could use our getword() function above, setting up two iterators, one for each file. Note that we might reach the end of one file before the other. We would then continue with the other file by itself. To deal with this, we would have to test for the StopIteration exception to sense when we've come to the end of a file.
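Here is a minimal sketch of that merge, assuming Python 3 and a copy of getword() so that the block stands alone; io.StringIO objects stand in for the two sorted files, and merge_words() is a hypothetical name. Each file gets its own iterator from the same generator, and catching StopIteration from next() tells us when one file has run out, after which the remainder of the other is copied through.

```python
import io

def getword(fl):
    for line in fl:
        for word in line.split():
            yield word

def merge_words(f1, f2):
    """Merge two sorted word files into one sorted list of words."""
    it1, it2 = getword(f1), getword(f2)  # two iterators, one generator
    out = []
    try:
        w1 = next(it1)
    except StopIteration:          # first file empty: take all of the second
        return list(it2)
    try:
        w2 = next(it2)
    except StopIteration:          # second file empty: take all of the first
        return [w1] + list(it1)
    while True:
        if w1 <= w2:
            out.append(w1)
            try:
                w1 = next(it1)
            except StopIteration:  # first file done: flush the second
                return out + [w2] + list(it2)
        else:
            out.append(w2)
            try:
                w2 = next(it2)
            except StopIteration:  # second file done: flush the first
                return out + [w1] + list(it1)

merged = merge_words(io.StringIO('apple\ncherry\nplum\n'),
                     io.StringIO('banana\ncherry\n'))
```

Here merged is ['apple', 'banana', 'cherry', 'cherry', 'plum']. For what it's worth, the standard library's heapq.merge() (available since Python 2.6) performs the same kind of merge over any number of sorted iterators.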

2.5 Modularity/Reusability

You may have noticed in the last example a similarity to Unix pipes. At the shell level, we can do a lot of everyday tasks by simply chaining together several shell commands into a pipe. For instance, say I want to find out how many lines in the file x contain the word Davis. I could do this:

% grep Davis x | wc -l

The command wc (word count), with its -l option, counts lines. So, grep would find the file's lines that I want, and pass them on to wc via the pipe, after which wc would count them.

Similarly, the getword() function above is producing output which then is used as input by our program testgw.py. We could chain several generators together in a pipe.

This shows that one of the biggest advantages of using iterators, and especially generators, is modularity and reusability. Once you write a few tools like this, you can keep making use of them in lots of applications that you write. Of course, we could still do that with ordinary functions, but without the compact elegance and clarity that we get from iterators, e.g.

for wrd in w:

in the example above.
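As an illustration of such chaining, here is a sketch, assuming Python 3, of the shell pipeline grep Davis x | wc -l built from generators. The functions grep() and wc_l() are hypothetical names chosen to mirror the shell commands, and an io.StringIO object stands in for the file x. Each stage pulls one line at a time from the stage before it, so the whole file is never held in memory.

```python
import io

def grep(pattern, lines):
    """Pass along only the lines that contain pattern (like grep)."""
    for line in lines:
        if pattern in line:
            yield line

def wc_l(lines):
    """Count the lines that arrive (like wc -l)."""
    n = 0
    for _ in lines:
        n += 1
    return n

x = io.StringIO('UC Davis\nBerkeley\nDavis, CA\nSacramento\n')
count = wc_l(grep('Davis', x))  # the "pipe": wc_l consumes grep's output
```

Here count is 2. Another filter stage, say one that drops comment lines, could be slotted in between the two without changing either of them.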

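2.6 Don't put yield in a Subfunction

If you have a generator g(), and it in turn calls a function h(), don't put a yield statement in the latter expecting it to yield values on behalf of g(). A yield in h() simply makes h() a generator in its own right, so calling h() from g() just creates a new iterator, and g() itself yields nothing from it.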
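The sketch below, assuming Python 3, shows what actually happens: with a yield inside h(), calling h() from g() merely creates an iterator that is thrown away. The fix is for g() to loop over h() and re-yield its values; Python 3.3 and later also offer yield from as shorthand for that loop. The function names are illustrative.

```python
def h():
    yield 'from h'

def g_broken():
    yield 'start'
    h()             # creates a generator object, which is then discarded
    yield 'end'

def g_fixed():
    yield 'start'
    for v in h():   # drive h()'s iterator and pass its values along
        yield v
    yield 'end'

def g_yield_from():
    yield 'start'
    yield from h()  # Python 3.3+ shorthand for the loop in g_fixed()
    yield 'end'
```

Here list(g_broken()) is ['start', 'end'], while the other two versions give ['start', 'from h', 'end'].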

2.7 Coroutines

The term coroutines in computer science refers to subroutines that alternate in execution. Subroutine A will run for a while, then subroutine B will run for a while, then A again, and so on. Each time a subroutine runs, it will resume execution right where it left off before, just like Python generators.

Basically coroutines are threads, but of the nonpreemptive type. In other words, a coroutine will continue executing until it voluntarily relinquishes the CPU. (Of course, this doesn't count timesharing. We are only discussing flow of control among the threads of one program.) In ordinary threads, the timing of the passing of control from one thread to another is to various degrees random.

The major advantage of using nonpreemptive threads is that you do not need locks. This makes your code a lot simpler and cleaner, and much easier to debug. (The randomness alone makes ordinary threads really tough to debug.)

In this section, I will show you two examples of Python coroutines. The first is a library class I wrote, thrd, which serves as a Python nonpreemptive threads library. The second example is SimPy, a well-known Python discrete-event simulation library written by Klaus Müller and Tony Vignaux.

2.7.1 My thrd Class

Though most threading systems are preemptive, there are some prominent exceptions. The GNU PTH library, for instance, is nonpreemptive and supports C/C++. Another example is the threads library in the Ruby scripting language.
Generators make it easy to develop a nonpreemptive threads package in Python. The yield construct is a natural way to relinquish the CPU, and one writes the threads manager to give a thread a turn by simply calling i.next(), where i is the iterator for the thread. That's what I've done here.
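The core idea can be sketched in a few lines. The following is not the thrd library but a hypothetical, stripped-down Python 3 scheduler: each "thread" is a generator, the manager keeps a queue of their iterators, a thread's turn is one next() call, a bare yield means "pause", and a thread that finishes (raising StopIteration) simply drops out. Because control changes hands only at a yield, the shared list is updated without any lock.

```python
from collections import deque

log = []  # shared data; no lock needed, since threads switch only at a yield

def worker(name, n):
    """A 'thread' that does n steps, pausing after each one."""
    for i in range(n):
        log.append('%s%d' % (name, i))
        yield  # relinquish the CPU, like a pause

def run_all(threads):
    """Round-robin manager: give each runnable thread one turn at a time."""
    runnable = deque(threads)
    while runnable:
        t = runnable.popleft()
        try:
            next(t)               # run t until its next yield
        except StopIteration:     # t has finished; do not requeue it
            continue
        runnable.append(t)        # paused: rejoin at the end of the queue

run_all([worker('a', 3), worker('b', 2)])
```

After the call, log is ['a0', 'b0', 'a1', 'b1', 'a2']: the two threads strictly alternate until b runs out of work.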
As an example of use of thrd, we'll take the string build example presented in our units on networks and threading, available at http://heather.cs.ucdavis.edu/matloff/Python/PyNet.pdf and http://heather.cs.ucdavis.edu/matloff/Python/PyThreads.pdf. Clients send characters one at a time to a server, which accumulates them in a string, which it echoes back to the clients.

There are two major issues in the example. First, we must deal with the fact that we have asynchronous I/O; the server doesn't know which client it will hear from next. Second, we must make sure that the accumulated string is always updated atomically.

Here we will use nonblocking I/O to address the issue of asynchrony. But atomicity will be no problem at all. Again, since threads are never interrupted, we do not need locks. Here is the code for the server:
# simple illustration of thrd module

# multiple clients connect to server; each client repeatedly sends a
# letter k, which the server adds to a global string v and echos back
# to the client; k = '' means the client is dropping out; when all
# clients are gone, server prints final value of v

# this is the server

import socket
import sys
from pth import *

class glbs:  # globals
    v = ''  # the string we are building up from the clients

class serveclient(thrd):
    def __init__(self,id,c):
        thrd.__init__(self,id)
        self.c = c[0]  # socket for this client
        self.c.send('c')  # confirm connection
    def run(self):
        while 1:
            # receive letter or EOF signal from c
            try:
                k = self.c.recv(1)
                if k == '': break
                # concatenate v with k, but do NOT need a lock
                glbs.v += k
                self.c.send(glbs.v)
            except socket.error:
                # nonblocking socket: no data from this client yet
                pass
            yield 'clnt loop', 'pause'
        self.c.close()

def main():
    lstn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    port = int(sys.argv[1])  # server port number
    lstn.bind(('', port))
    lstn.listen(5)
    # initialize concatenated string, v
    glbs.v = ''
    # number of clients
    nclnt = 2
    # accept calls from the clients
    for i in range(nclnt):
        (clnt,ap) = lstn.accept()
        clnt.setblocking(0)  # set client socket to be nonblocking
        # start thread for this client, with the first argument being a
        # string ID I choose for this thread, and the second argument being
        # (a tuple consisting of) the socket
        t = serveclient('client '+str(i),(clnt,))
    # shut down the server socket, since it's not needed anymore
    lstn.close()
    # start the threads; the call will block until all threads are done
    thrd.tmgr()
    print 'the final value of v is', glbs.v

if __name__ == '__main__':
    main()

Here is the client (which of course is not threaded):

# simple illustration of thrd module

# two clients connect to server; each client repeatedly sends a letter,
# stored in the variable k, which the server appends to a global string
# v, and reports v to the client; k = '' means the client is dropping
# out; when all clients are gone, server prints the final string v

# this is the client; usage is

#    python clnt.py server_address port_number

import socket
import sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = sys.argv[1]  # server address
port = int(sys.argv[2])  # server port
s.connect((host, port))

confirm = s.recv(1)
print confirm

while(1):
    # get letter
    k = raw_input('enter a letter:')
    s.send(k)  # send k to server
    # if stop signal, then leave loop
    if k == '': break
    v = s.recv(1024)  # receive v from server (up to 1024 bytes)
    print v

s.close()  # close socket

Note that as with the Python threading module, the user must write a function named run which will override the one built in to the thrd class. As before, that function describes the action of the thread. The difference here, though, is that now this function is a generator, as you (and the Python interpreter) can tell from the presence of the yield statement.

There is a separate thread for each client. The thread for a given client will repeatedly execute the following cycle:

- Try to read a character from the client.
- Process the character if there is one.
- Yield, allowing the thread for another client to run.

Again, since a thread will run until it hits a yield, we don't need locks.

Just as is the case with Python's ordinary threads, thrd is good mainly for I/O-bound applications. While one I/O action is being done in one thread, we can start another one in another thread. A common example would be a Web server. But those applications would be too large to deal with in this tutorial, so we have the very simple toy example above.
Below is another toy example, even more contrived, but again presented here because it is simple, and because it illustrates the set/wait constructs not included in the last example. There is really no way to describe the actions it takes, except to say that it is designed to exercise most of the possible thrd operations. Just look at the output shown below, and then see how the code works to produce that output.

a1 starts
a1 x: 6
a1 pauses
a2 starts
a2 x: 7
a2 pauses
b starts
b pauses
c1 starts
c1 waits for a1-ev
c2 starts
c2 waits for a1-ev
a1 z: 19
a1 waits for b-ev
a2 z: 21
a2 waits for b-ev
b.v: 8
b sets b-ev
a1 z: 19
a1 sets a1-ev for all
c1 quits
events:
b-ev: a2
a1-ev:
c2 quits
events:
b-ev: a2
a1-ev:
b sets b-ev but stays
b quits
a2 z: 21
a2 quits
a1 quits

Here is the code:

from pth import *

class a(thrd):
    def __init__(self,thrid):
        thrd.__init__(self,thrid)
        self.x = None
        self.y = None
        self.z = None
        self.num = int(self.id[1])
    def run(self):
        print self.id, 'starts'
        self.x = 5+self.num
        self.y = 12+self.num
        print self.id, 'x:', self.x
        print self.id, 'pauses'
        yield 1,'pause'
        self.z = self.x + self.y
        print self.id, 'z:', self.z
        print self.id, 'waits for b-ev'
        yield 2,'wait','b-ev'
        print self.id, 'z:', self.z
        if self.id == 'a1':
            print 'a1 sets a1-ev for all'
            yield '2a','set_all','a1-ev'
        print self.id, 'quits'
        yield 3,'quit'

class b(thrd):
    def __init__(self,thrid):
        thrd.__init__(self,thrid)
        self.u = None
        self.v = None
    def run(self):
        print 'b starts'
        self.u = 5
        print 'b pauses'
        yield 11,'pause'
        self.v = 8
        print 'b.v:', self.v
        print 'b sets b-ev'
        yield 12,'set','b-ev'
        print 'b sets b-ev but stays'
        yield 'uv','set_but_stay','b-ev'
        print 'b quits'
        yield 'our last one','quit'

class c(thrd):
    def __init__(self,thrid):
        thrd.__init__(self,thrid)
    def run(self):
        print self.id, 'starts'
        print self.id, 'waits for a1-ev'
        yield 'cwait','wait','a1-ev'
        print self.id, 'quits'
        thrd.prevs()
        yield 'cquit','quit'

def main():
    ta1 = a('a1')
    ta2 = a('a2')
    tb = b('b')
    tc1 = c('c1')
    tc2 = c('c2')
    thrd.tmgr()

if __name__ == '__main__': main()

Now, how is all this done? Below is the code for the thrd library. First read the comments at the top of the file, and then the __init__() code. Then glance at the code for the threads manager, thrd.tmgr(). The latter repeatedly does the following:

- get the first thread in the runnable list, thr
- have it run until it hits a yield, by calling thr.itr.next()
- take whatever action (pause, wait, set, etc.) that the thread requested when it yielded

Then you should be able to follow the thrd member functions fairly easily.
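To connect the three steps just listed with running code, here is a hypothetical, much-reduced Python 3 sketch of such a manager loop. It is not the actual thrd.tmgr(): it supports only the 'pause', 'wait', 'set' and 'quit' actions, keeps per-event queues in a dictionary much as thrd.evnts does, and dispatches on the action string a thread yields. The names mini_tmgr, waiter and setter are illustrative only.

```python
from collections import deque

def mini_tmgr(threads):
    """Run generator 'threads' that yield (yieldID, action[, eventID])."""
    runlst = deque(threads)  # runnable list; its head is the running thread
    evnts = {}               # event ID -> deque of threads waiting for it
    while runlst:
        thr = runlst[0]
        try:
            yv = next(thr)              # run thr until it hits a yield
        except StopIteration:
            yv = (None, 'quit')         # falling off the end counts as quit
        action = yv[1]
        runlst.popleft()                # thr leaves the head of the list
        if action == 'pause':
            runlst.append(thr)          # rejoin at the end
        elif action == 'wait':
            evnts.setdefault(yv[2], deque()).append(thr)
        elif action == 'set':
            runlst.append(thr)          # setter rejoins at the end
            waiting = evnts.get(yv[2])
            if waiting:                 # first waiter goes to the head
                runlst.appendleft(waiting.popleft())
        elif action == 'quit':
            pass                        # thread exits; it is not requeued
        else:
            raise ValueError('unknown action: %r' % (action,))

trace = []

def waiter():
    trace.append('waiter waits')
    yield 'w1', 'wait', 'ev'
    trace.append('waiter resumes')
    yield 'w2', 'quit'

def setter():
    trace.append('setter pauses')
    yield 's1', 'pause'
    trace.append('setter sets ev')
    yield 's2', 'set', 'ev'
    trace.append('setter quits')
    yield 's3', 'quit'

mini_tmgr([waiter(), setter()])
```

Because 'set' puts the waiting thread at the head of the runnable list, the waiter resumes before the setter gets its next turn, so trace ends up as ['waiter waits', 'setter pauses', 'setter sets ev', 'waiter resumes', 'setter quits'].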
# pth.py: non-preemptive threads for Python; inspired by the GNU PTH
# package for C/C++

# typical application will have one class for each type of thread; its
# main() will set up the threads as instances of the classes, and lastly
# will call thrd.tmgr()

# each thread type is a subclass of the class thrd; in that subclass,
# the user must override thrd.run(), with the code consisting of the
# actions the thread will take

# threads actions are triggered by the Python yield construct, in the
# following format:

#    yield yieldID, action_string [, arguments]

# the yieldID is for application code debugging purposes

# possible actions:

#    yield yieldID, 'pause':
#       thread relinquishes this turn, rejoins runnable list at the end

#    yield yieldID, 'wait', eventID:
#       thread changes state to 'waiting', joins end of queue for
#       the given event

#    yield yieldID, 'set', eventID:
#       thread sets the given event, rejoins runnable list at the end;
#       the thread, if any, at head of queue for this event is inserted
#       at the head of the runnable list

#    yield yieldID, 'set_but_stay', eventID:
#       thread sets the given event, but remains at head of runnable list;
#       thread, if any, at head of queue for the event is inserted in
#       runnable list following the head

#    yield yieldID, 'set_all', eventID:
#       thread sets the given event, rejoins runnable list at the end;
#       all threads in queue for the event are inserted at the head of
#       the runnable list, in the same order they had in the queue

#    yield yieldID, 'quit':
#       thread exits

class thrd:

    runlst = []  # runnable thread list
    evnts = {}  # a key is an event ID, a string; value is a list of
                # threads waiting for that event
    waitlst = []  # waiting thread list
    didyield = None  # thread that last performed a yield op; for
                     # application code debugging purposes

    def __init__(self,id):
        self.id = id  # user-supplied string
        self.state = 'runnable'  # the other possible state is 'waiting'
        self.yieldact = ''  # action at last yield; for application code
                            # debugging purposes
        self.waitevnt = ''  # what event this thread is waiting for, if any;
                            # for application code debugging purposes
        self.itr = self.run()  # this thread's iterator
        thrd.runlst.append(self)

    def run(self):  # stub, must override
        pass

    # triggered by:  yield yieldID, 'pause'
    def do_pause(self,yv):
        del thrd.runlst[0]
        thrd.runlst.append(self)

    # triggered by:  yield yieldID, 'wait', eventID
    def do_wait(self,yv):
        del thrd.runlst[0]
        self.state = 'waiting'
        self.waitevnt = yv[2]
        # check to see if this is a new event
        if yv[2] not in thrd.evnts:
            thrd.evnts[yv[2]] = [self]
        else:
            thrd.evnts[yv[2]].append(self)
        thrd.waitlst.append(self)

    # reactivate first thread waiting for this event, and place it at
    # position pos of runlst; do nothing if no thread is waiting
    def react(ev,pos):
        if not thrd.evnts.get(ev):
            return
        thr = thrd.evnts[ev].pop(0)
        thr.state = 'runnable'
        thr.waitevnt = ''
        thrd.waitlst.remove(thr)
        thrd.runlst.insert(pos,thr)
    react = staticmethod(react)

    # triggered by:  yield yieldID, 'set', eventID
    def do_set(thr,yv):
        del thrd.runlst[0]
        thrd.runlst.append(thr)
        thrd.react(yv[2],0)
    do_set = staticmethod(do_set)

    # triggered by:  yield yieldID, 'set_but_stay', eventID
    def do_set_but_stay(thr,yv):
        thrd.react(yv[2],1)
    do_set_but_stay = staticmethod(do_set_but_stay)

    # triggered by:  yield yieldID, 'set_all', eventID
    def do_set_all(self,yv):
        del thrd.runlst[0]
        ev = yv[2]
        # wake every waiter, preserving queue order (none if no one waits)
        for i in range(len(thrd.evnts.get(ev,[]))):
            thrd.react(ev,i)
        thrd.runlst.append(self)

    # triggered by:  yield yieldID, 'quit'
    def do_quit(self,yv):
        del thrd.runlst[0]

    # for application code debugging
    # prints info about a thread
    def prthr(self):
        print 'ID: %s, state: %s, ev: %s, yldact: %s' % \
            (self.id, self.state, self.waitevnt, self.yieldact)

    # for application code debugging
    # print info on all threads
    def prthrs():
        print 'runlst:'
        for t in thrd.runlst:
            t.prthr()
        print 'waiting list:'
        for t in thrd.waitlst:
            t.prthr()
    prthrs = staticmethod(prthrs)

    # for application code debugging
    # print info on all events
    def prevs():
        print 'events:'
        for eid in thrd.evnts.keys():
            print '%s:' % eid,
            for thr in thrd.evnts[eid]:
                print thr.id,
            print
    prevs = staticmethod(prevs)

    # threads manager
    def tmgr():
        # while still have runnable threads, cycle repeatedly through them
        while thrd.runlst:
            # get next thread
            thr = thrd.runlst[0]
            # call it; next() runs the thread until a yield, with
            # the latter returning yieldvalue
            yieldvalue = thr.itr.next()
            thr.yieldID = yieldvalue[0]
            thrd.didyield = thr
            # call the function requested in the yield, e.g. the action
            # 'pause' is handled by thrd.do_pause()
            yv1 = yieldvalue[1]  # requested action
            thr.yieldact = yv1
            actftn = getattr(thrd, 'do_'+yv1)
            actftn(thr,yieldvalue)
    tmgr = staticmethod(tmgr)
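The least obvious part of the library is where a released thread lands in the runnable list. The following Python 3 sketch re-creates only that bookkeeping: a dictionary mapping each event ID to a FIFO queue of waiting threads, with runlst[0] as the running thread. The class MiniThreads and its thread names are hypothetical, and it does nothing when no thread is waiting, matching the "if any" in the comments above.

```python
from collections import deque


class MiniThreads:
    """Hypothetical re-creation of thrd's event bookkeeping (not the library)."""

    def __init__(self, names):
        self.runlst = list(names)   # runlst[0] is the thread now running
        self.evnts = {}             # event ID -> FIFO queue of waiting threads

    def do_wait(self, ev):
        # the running thread leaves runlst and joins the end of ev's queue
        self.evnts.setdefault(ev, deque()).append(self.runlst.pop(0))

    def _react(self, ev, pos):
        # move the first waiter for ev, if any, into runlst at position pos
        queue = self.evnts.get(ev)
        if queue:
            self.runlst.insert(pos, queue.popleft())

    def do_set(self, ev):
        # setter rejoins runlst at the end; first waiter goes to the head
        self.runlst.append(self.runlst.pop(0))
        self._react(ev, 0)

    def do_set_but_stay(self, ev):
        # setter stays at the head; first waiter is placed right behind it
        self._react(ev, 1)


m = MiniThreads(["c1", "c2", "b"])
m.do_wait("e")            # c1 waits for event "e"
m.do_wait("e")            # c2 waits too
print(m.runlst)           # ['b']
m.do_set_but_stay("e")    # b keeps running; c1 is placed right behind it
print(m.runlst)           # ['b', 'c1']
m.do_set("e")             # b moves to the end; c2 goes to the head
print(m.runlst)           # ['c2', 'c1', 'b']
```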

2.7.2 The SimPy Discrete Event Simulation Library

In discrete event simulation (DES), we are modeling discontinuous changes in the system state. We may
be simulating a queuing system, for example, and since the number of jobs in the queue is an integer, the
number will be incremented by an integer value, typically 1 or -1.[6] By contrast, if we are modeling a weather
system, variables such as temperature change continuously.

[6] Batch queues may take several jobs at a time, but the increment is still integer-valued.

SimPy is a widely used open-source Python library for DES. Following is an example of its use:

#!/usr/bin/env python

# MachRep.py

# SimPy example:  Two machines, which sometimes break down.  Up time is
# exponentially distributed with mean 1.0, and repair time is
# exponentially distributed with mean 0.5.  In this example, there is
# only one repairperson, so the two machines cannot be repaired
# simultaneously if they are down at the same time.

# In addition to finding the long-run proportion of up time, let's also
# find the long-run proportion of breakdowns in which the machine gets
# immediate access to the repairperson.  Output values should be about
# 0.6 and 0.67.

from SimPy.Simulation import *
from random import Random,expovariate,uniform

class G:  # globals
    Rnd = Random(12345)
    # create the repairperson
    RepairPerson = Resource(1)

class MachineClass(Process):
    TotalUpTime = 0.0  # total up time for all machines
    NRep = 0  # number of times the machines have broken down
    NImmedRep = 0  # number of breakdowns in which the machine
                   # started repair service right away
    UpRate = 1/1.0  # breakdown rate
    RepairRate = 1/0.5  # repair rate
    # the following two variables are not actually used, but are useful
    # for debugging purposes
    NextID = 0  # next available ID number for MachineClass objects
    NUp = 0  # number of machines currently up
    def __init__(self):
        Process.__init__(self)
        self.StartUpTime = 0.0  # time the current up period started
        self.ID = MachineClass.NextID  # ID for this MachineClass object
        MachineClass.NextID += 1
        MachineClass.NUp += 1  # machines start in the up mode
    def Run(self):
        while 1:
            self.StartUpTime = now()
            yield hold,self,G.Rnd.expovariate(MachineClass.UpRate)
            MachineClass.TotalUpTime += now() - self.StartUpTime
            # update number of breakdowns
            MachineClass.NRep += 1
            # check whether we get repair service immediately
            if G.RepairPerson.n == 1:
                MachineClass.NImmedRep += 1
            # need to request, and possibly queue for, the repairperson
            yield request,self,G.RepairPerson
            # OK, we've obtained access to the repairperson; now
            # hold for repair time
            yield hold,self,G.Rnd.expovariate(MachineClass.RepairRate)
            # release the repairperson
            yield release,self,G.RepairPerson

def main():
    initialize()
    # set up the two machine processes
    for I in range(2):
        M = MachineClass()
        activate(M,M.Run())
    MaxSimtime = 10000.0
    simulate(until=MaxSimtime)
    print 'proportion of up time:', MachineClass.TotalUpTime/(2*MaxSimtime)
    print 'proportion of times repair was immediate:', \
        float(MachineClass.NImmedRep)/MachineClass.NRep

if __name__ == '__main__':
    main()
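Where do the expected outputs of about 0.6 and 0.67 come from? As a cross-check (not part of the original program), one can treat the number of machines currently up (2, 1 or 0) as a continuous-time Markov chain, which is valid here because both up times and repair times are exponential. The sketch below, in Python 3, solves its balance equations exactly with breakdown rate 1 per working machine and repair rate 2 for the single repairperson.

```python
from fractions import Fraction as F

lam = F(1)   # breakdown rate per working machine (UpRate = 1/1.0)
mu = F(2)    # repair rate (RepairRate = 1/0.5), one repairperson

# Balance equations for the number of machines up:
#   flow between states 2 and 1:  pi2 * 2*lam = pi1 * mu
#   flow between states 1 and 0:  pi1 * lam   = pi0 * mu
pi2 = F(1)
pi1 = pi2 * 2 * lam / mu
pi0 = pi1 * lam / mu
total = pi2 + pi1 + pi0
pi2, pi1, pi0 = pi2 / total, pi1 / total, pi0 / total

# long-run fraction of time a given machine is up
up_fraction = (2 * pi2 + 1 * pi1) / 2

# a breakdown from state 2 finds the repairperson idle (immediate repair);
# a breakdown from state 1 finds the repairperson busy with the other machine
rate_from_2 = pi2 * 2 * lam
rate_from_1 = pi1 * lam
immediate_fraction = rate_from_2 / (rate_from_2 + rate_from_1)

print(up_fraction, immediate_fraction)   # 3/5 2/3
```

So the second printed value, the proportion of breakdowns in which repair starts immediately, should approach 2/3, while the proportion of breakdowns that must wait approaches 1/3.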

There is a lot here, but basically it is similar to the thrd class we saw above. If you were to look at the
SimPy internal code, SimPy.Simulation.py, you would see that a large amount of it looks like the code in
thrd. In fact, the SimPy library could be rewritten on top of thrd, reducing the size of the library. That
would make future changes to the library easier, and would even make it easier to convert SimPy to some
other language, say Ruby.
Read the comments in the first few lines of the code to see what kind of system this program is modeling
before going further.
Now, let's see the details.
SimPy's thread class is Process. The application programmer writes one or more subclasses of this one to
serve as thread classes. Similar to the case for the thrd and threading classes, the subclasses of Process
must include a generator method, named Run() in this example, which describes the actions of the thread.
The SimPy function activate() is used to add a thread to the run list; it is passed both the Process object
and its generator, as in activate(M,M.Run()).
The main new ingredient here is the notion of simulated time. The current simulated time is stored in the
variable Simulation._t. Each time an event is created, via execution of a statement like

yield hold, self, holdtime

SimPy schedules the event to occur holdtime time units from now, i.e. at time _t + holdtime. What I mean
by "schedule" here is that SimPy maintains an internal data structure which stores all future events, ordered
by their occurrence times. Let's call this the scheduled events structure, SES. Note that the elements in SES
are threads, i.e. instances of the class Process. A new event will be inserted into the SES at the proper place
in terms of time ordering.
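In effect, the SES is a priority queue keyed on event time. A minimal Python 3 sketch using the standard heapq module follows; SimPy's actual internal structure may differ, and the sequence counter that breaks ties between equal times is an assumption added here (it also keeps heapq from ever comparing the thread objects themselves). The names schedule() and next_event() are hypothetical.

```python
import heapq
import itertools

ses = []                    # the scheduled events structure, kept as a binary heap
seq = itertools.count()     # tie-breaker: equal times come out in insertion order


def schedule(event_time, thread_name):
    """Insert an event at its proper place in time order (O(log n))."""
    heapq.heappush(ses, (event_time, next(seq), thread_name))


def next_event():
    """Remove and return the earliest event (O(log n))."""
    event_time, _, thread_name = heapq.heappop(ses)
    return event_time, thread_name


t = 0.0                     # current simulated time
schedule(t + 2.5, "M0")
schedule(t + 0.7, "M1")
schedule(t + 2.5, "M2")     # same time as M0; comes out after it

order = [next_event() for _ in range(3)]
print(order)   # [(0.7, 'M1'), (2.5, 'M0'), (2.5, 'M2')]
```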
The main loop in SimPy repeatedly cycles through the following:

• Remove the earliest event, say v, from SES.
• Advance the simulated time clock Simulation._t to the occurrence time of v.
• Call the iterator for v, i.e. the iterator for the Run() generator of that thread.
• After Run() does a yield, act on whatever operation it requests, such as hold.

Note that this is similar to, though different from, an ordinary threads manager, due to the time element. In
ordinary threads programming, there is no way to predict which thread will run next. Here, we know which
one it will be (as long as there are no tied event times, which are rare in most applications).
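Those four steps can be sketched in a few lines of Python 3. This is a toy, not SimPy: each "thread" is a generator that yields only a hold time (there is no request or release), and the names toy_simulate() and ticker() are hypothetical.

```python
import heapq
import itertools


def toy_simulate(procs, until):
    """Toy event loop; procs maps a name to a generator yielding hold times."""
    seq = itertools.count()                           # tie-breaker for equal times
    ses = [(0.0, next(seq), name) for name in procs]  # every thread starts at t = 0
    heapq.heapify(ses)
    trace = []
    while ses:
        event_time, _, name = heapq.heappop(ses)      # remove the earliest event
        if event_time > until:
            break
        now = event_time                              # advance the simulated clock
        trace.append((now, name))
        try:
            hold = next(procs[name])                  # run the thread until it yields
        except StopIteration:
            continue                                  # thread has finished
        heapq.heappush(ses, (now + hold, next(seq), name))  # act on the "hold"
    return trace


def ticker(period):
    """A thread that repeatedly holds for `period` time units."""
    while True:
        yield period


trace = toy_simulate({"fast": ticker(1.0), "slow": ticker(2.5)}, until=5.0)
print(trace)
```

This prints [(0.0, 'fast'), (0.0, 'slow'), (1.0, 'fast'), (2.0, 'fast'), (2.5, 'slow'), (3.0, 'fast'), (4.0, 'fast'), (5.0, 'slow'), (5.0, 'fast')]. The order is fully determined by the event times; the one tie, at time 5.0, is resolved here by scheduling order.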
In simulation programming, we often need to have one entity wait for some event to occur. In our example
here, if one machine goes down while the other is being repaired, the newly-broken machine will need to
wait for the repairperson to become available. Clearly this is like the condition variables construct in most
threads packages, including the wait and set operations in thrd, albeit at a somewhat higher level.
Specifically, SimPy includes a Resource class. In our case here, the resource is the repairperson. When a
line like

yield request,self,G.RepairPerson

is executed, SimPy checks whether the repairperson is free. If so, the thread that made the request acquires
access to the repairperson, and control returns to the statement following yield request. Otherwise, the
requesting thread is added to the end of the repairperson's queue (here, with only two machines, that queue
never holds more than one thread). Later, when a statement like
yield release,self,G.RepairPerson

is executed by the thread currently accessing the repairperson, SimPy will check its queue, and if the queue
is nonempty, SimPy will remove the first thread from the queue, and have it resume execution where it left
off.[7]
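The request/release logic just described amounts to a count of free units plus a FIFO queue. Here is a Python 3 sketch of that bookkeeping; the class ToyResource and its methods are hypothetical and are not SimPy's API, although the free-unit count n plays the same role as the G.RepairPerson.n test in the program.

```python
from collections import deque


class ToyResource:
    """Hypothetical single-queue resource: free-unit count plus FIFO queue."""

    def __init__(self, capacity=1):
        self.n = capacity          # number of free units
        self.queue = deque()       # threads waiting for a unit
        self.holders = []          # threads currently holding a unit

    def request(self, thread):
        """Return True if thread gets a unit now, False if it must wait."""
        if self.n > 0:
            self.n -= 1
            self.holders.append(thread)
            return True
        self.queue.append(thread)
        return False

    def release(self, thread):
        """Free thread's unit; return the thread (if any) that gets it next."""
        self.holders.remove(thread)
        if self.queue:
            nxt = self.queue.popleft()     # first thread in line resumes
            self.holders.append(nxt)       # the unit passes directly to it
            return nxt
        self.n += 1
        return None


rep = ToyResource(1)
print(rep.request("M0"))   # True: the repairperson was free
print(rep.request("M1"))   # False: M1 joins the queue
print(rep.release("M0"))   # M1: the first queued thread gets the repairperson
print(rep.release("M1"))   # None: nobody is waiting, so the unit becomes free
print(rep.n)               # 1
```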
Since the simulated time variable Simulation._t is internal to a separate module, we do not access it
directly. Instead, SimPy includes a getter function, now(), which returns the value of Simulation._t.
Most discrete event simulation applications are stochastic in nature, such as we see here with the random
up and repair times for the machines. Thus most SimPy programs import the Python random module, as in
this example.
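One detail worth noting is that random.expovariate() takes a rate, not a mean: its argument is lambda, and the mean of the samples is 1/lambda. That is why the program passes UpRate = 1/1.0 and RepairRate = 1/0.5 rather than the means 1.0 and 0.5. A quick Python 3 check with a seeded generator, as in the program (the sample size is an arbitrary choice):

```python
from random import Random

rnd = Random(12345)                 # seeded, like G.Rnd in MachRep.py
repair_rate = 1 / 0.5               # rate 2.0 repairs per unit time

samples = [rnd.expovariate(repair_rate) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(round(mean, 3))               # close to 0.5, the mean repair time
```

Seeding the generator makes each run reproducible, which helps when debugging a simulation.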

[7] This will not happen immediately. The thread that triggered the release of the resource will be allowed to resume execution
right after the yield release statement. But SimPy will place an artificial event in the SES, with event time equal to the current time,
i.e. the time at which the release occurred. So, as soon as the current thread finishes its turn (i.e. reaches its next yield), the
awakened thread will get a chance to run again.
