Module 5 Complete [TB]
Making Things Interactive with Bokeh
Overview
In this chapter, we will design interactive plots using the Bokeh library. By
the end of this chapter, you will be able to use Bokeh to create insightful
web-based visualizations and explain the difference between two interfaces
for plotting. You will identify when to use the Bokeh server and create
interactive visualizations.
Introduction
Bokeh is an interactive visualization library focused on modern browsers and the
web. Unlike Matplotlib or geoplotlib, the plots and visualizations we are going to
create in this chapter will be based on JavaScript widgets. Bokeh allows us to create
visually appealing plots and graphs nearly out of the box without much styling. In
addition to that, it helps us construct performant interactive dashboards based on
large static datasets or even streaming data.
Bokeh has been around since 2013, with version 1.4.0 being released in November
2019. It targets modern web browsers to present interactive visualizations to users
rather than static images. The following are some of the features of Bokeh:
• Supports multiple languages: Unlike Matplotlib and geoplotlib, Bokeh has
libraries for both Python and JavaScript, in addition to several other
popular languages.
• Beautiful chart styling: The tech stack is based on Tornado in the backend
and on BokehJS, Bokeh's own JavaScript library, in the frontend. Although Bokeh
is sometimes described as being powered by D3, BokehJS does not depend on it.
Its default visuals allow us to create beautiful plots without much custom styling.
Since we are using Jupyter Notebook throughout this book, it's worth mentioning that
Bokeh, including its interactivity, is natively supported in Notebook.
Concepts of Bokeh
The basic concept of Bokeh is, in some ways, comparable to that of Matplotlib. In
Bokeh, we have a figure as our root element, which has sub-elements such as a title,
an axis, and glyphs. Glyphs, which can take on different shapes such as circles,
bars, and triangles, have to be added to a figure. The following hierarchy shows the
different concepts of Bokeh:
Interfaces in Bokeh
The interface-based approach provides different levels of complexity for users who
either want to create some basic plots with very few customizable parameters or
want full control over their visualizations to customize every single element of their
plots.
Note
The models interface is the basic building block for all plots.
The following are the two levels of this layered approach to interfaces:
• bokeh.plotting
This higher-level interface sets up a figure and its sub-elements for us. The vital
thing to note here is that even though this setup is done automatically,
we can still configure the sub-elements. When using this interface, the creation of
the scene graph used by BokehJS is handled automatically too.
• bokeh.models
This low-level interface is composed of two libraries: the JavaScript library called
BokehJS, which gets used for displaying the charts in the browser, and the core
plot creation Python code, which provides the developer interface. Internally, the
definition created in Python creates JSON objects that hold the declaration for
the JavaScript representation in the browser.
The models interface provides complete control over how Bokeh plots and
widgets (elements that enable users to interact with the data displayed) are
assembled and configured. This means that it is up to the developer to ensure
the correctness of the scene graph (a collection of objects describing
the visualization).
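To make the relationship between the two interfaces concrete, the following minimal sketch builds a plot with bokeh.plotting and then looks at the bokeh.models object and the JSON that result. The data values and the title are made up for illustration, and json_item from bokeh.embed is used here only as one convenient way to inspect the serialized scene graph.

```python
from bokeh.embed import json_item
from bokeh.plotting import figure

# High-level interface: figure() sets up axes, grids, and tools for us
plot = figure(title="Scene graph demo")  # title is an illustrative assumption

# Adding a glyph returns a renderer that wraps a low-level model object
renderer = plot.line([1, 2, 3], [4, 6, 5])  # made-up data points
glyph_model = renderer.glyph

# The glyph is an instance of a class from the models interface
print(type(glyph_model).__name__)

# The Python definition is serialized to a JSON-style dict for BokehJS
item = json_item(plot)
print(sorted(item.keys()))
```

The plotting call did the assembly work, but the objects it produced are ordinary models that can still be inspected and configured individually.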
Output
Outputting Bokeh charts is straightforward. There are three ways this can be done:
• The .show() method: The primary option is to display the plot in an HTML page
using this method.
• The inline .show() method: When using inline plotting with a Jupyter
Notebook, the .show() method will allow you to display the chart inside
your Notebook.
The most powerful way of providing your visualization is through the use of the
Bokeh server.
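As a rough sketch of the standalone HTML route, the snippet below renders a plot to an HTML string and writes it to disk instead of opening a browser. The file name standalone_plot.html and the data are assumptions made for this example; file_html and the CDN resources object come from bokeh.embed and bokeh.resources.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# A tiny plot with made-up data
plot = figure(title="Standalone output")
plot.line([1, 2, 3], [2, 4, 3])

# Render a complete HTML page that loads BokehJS from the CDN
html = file_html(plot, resources=CDN, title="Standalone output")

# Write the page to a file (hypothetical name) that any browser can open
with open("standalone_plot.html", "w", encoding="utf-8") as output:
    output.write(html)
```

In a notebook, calling output_notebook() once and then show(plot) would display the same figure inline instead.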
Bokeh Server
Bokeh creates scene graph JSON objects that will be interpreted by the BokehJS
library to create the visualization output. This process gives you a unified format for
other languages to create the same Bokeh plots and visualizations, independently of
the language used.
To create more complex visualizations and leverage the tooling provided by Python,
we need a way to keep our visualizations in sync with one another. This way, we can
not only filter data but also do calculations and operations on the server-side, which
updates the visualizations in real-time.
In addition to that, since we will have an entry point for data, we can create
visualizations that get fed by streams instead of static datasets. This design provides a
way to develop more complex systems with even greater capabilities.
Looking at the scheme of this architecture, we can see that the documents are
provided on the server-side, then moved over to the browser, which then inserts
them into the BokehJS library. This insertion will trigger the interpretation by BokehJS,
which will then create the visualization. The following diagram describes how the
Bokeh server works:
Presentation
In Bokeh, presentations help make the visualization more interactive by using
different features, such as interactions, styling, tools, and layouts.
Interactions
Probably the most exciting feature of Bokeh is its interactions. There are two types of
interactions: passive and active.
Passive interactions are actions that users can take that don't change the
dataset. In Bokeh, this is called the inspector. As we mentioned before, the inspector
contains attributes such as zooming, panning, and hovering over data. This tooling
allows the user to inspect the data in more detail and might provide better insights
by allowing the user to observe a zoomed-in subset of the visualized data points. The
elements highlighted with a box in the following figure show the essential passive
interaction elements provided by Bokeh. They include zooming, panning, and
clipping data.
Active interactions are actions that directly change the displayed data. This includes
actions such as selecting subsets of data or filtering the dataset based on parameters.
Widgets are the most prominent of active interactions since they allow users to
manipulate the displayed data with handlers. Examples of available widgets are
buttons, sliders, and checkboxes.
Referring back to the subsection about the output styles, these widgets can be
used in both the so-called standalone applications in the browser and the Bokeh
server. This will help us consolidate the recently learned theoretical concepts and
make things more transparent. Some of the interactions in Bokeh are tab panes,
dropdowns, multi-selects, radio groups, text inputs, check button groups, data tables,
and sliders. The elements highlighted with a red box in the following figure show a
custom active interaction widget for the same plot we looked at in the example of
passive interaction.
Integrating
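Embedding Bokeh visualizations can take two forms: standalone HTML documents
that run entirely in the browser, and applications backed by the Bokeh server.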
Bokeh is a little bit more complicated than Matplotlib with Seaborn and has its
drawbacks like every other library. Once you have the basic workflow down, however,
you're able to quickly extend basic visualizations with interactivity features to give
power to the user.
Note
One interesting feature of older Bokeh releases is the to_bokeh method, which
allows you to plot Matplotlib figures with Bokeh without configuration overhead.
Further information about this method is available in the version 0.12.3
documentation at https://bokeh.pydata.org/en/0.12.3/docs/user_guide/compat.html.
In the following exercises and activities, we'll consolidate the theoretical knowledge
and build several simple visualizations to explain Bokeh and its two interfaces.
After we've covered the basic usage, we will compare the plotting and models
interfaces and work with widgets that add interactivity to the visualizations.
Basic Plotting
As mentioned before, the plotting interface of Bokeh gives us a higher-level
abstraction, which allows us to quickly visualize data points on a grid.
from bokeh.io import output_notebook
output_notebook()
Before we can create a plot, we need to import the dataset. In the examples in this
chapter, we will work with a computer hardware dataset. It can be imported by using
pandas' read_csv method.
The basic flow when using the plotting interface is comparable to that of
Matplotlib. We first create a figure. This figure is then used as a container to define
elements and call methods on:
show(plot)
Once we have created a new figure instance using the imported figure() method,
we can use it to draw lines, circles, or any glyph objects that Bokeh offers. Note that
the first two arguments of the plot.line method are the x and y datasets, which must
contain an equal number of elements.
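A minimal sketch of this flow might look as follows. The values below are hypothetical stand-ins for the hardware dataset, and the title and axis labels are assumptions chosen for illustration.

```python
from bokeh.plotting import figure

# Hypothetical values standing in for the hardware dataset
index = [0, 1, 2, 3, 4]
cache_memory = [256, 32, 32, 32, 16]

# Create the figure, which acts as the container for all glyphs
plot = figure(title="Cache per Hardware",
              x_axis_label="Hardware index",
              y_axis_label="Cache Memory")

# The x and y sequences must contain the same number of elements
line_renderer = plot.line(index, cache_memory, line_width=2)

# In a notebook, show(plot) from bokeh.plotting would now display the figure
```

Swapping plot.line for plot.scatter with the same two sequences would draw the points as markers instead of a connected line.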
To display the plot, we then call the show() method we imported from the
bokeh.plotting interface earlier on. The following figure shows the output of the
preceding code:
Figure 6.5: Line plot showing the cache memory of different hardware
Since the interface of different plotting types is unified, scatter plots can be created in
the same way as line plots:
Figure 6.6: Scatter plot showing the cache memory of different hardware
show(plot)
Figure 6.7: Line plots displaying the cache memory and cycle time per
hardware with the legend
When looking at the preceding example, we can see that once we have several lines,
the visualization can get cluttered.
We can give the user the ability to mute, meaning defocus, the clicked element in
the legend.
plot.legend.click_policy="mute"
show(plot)
Figure 6.8: Line plots displaying the cache memory and cycle time per hardware with a
mutable legend; cycle time is also muted
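The mutable legend can be sketched end to end as shown below. The two data series are invented for the example, and the colors and muted_alpha value are assumptions; legend_label and click_policy are the pieces that tie each line to a clickable legend entry.

```python
from bokeh.plotting import figure

# Hypothetical values standing in for the hardware dataset
index = [0, 1, 2, 3]
cache_memory = [256, 32, 32, 16]
cycle_time = [125, 29, 29, 26]

plot = figure(title="Cache and cycle time per hardware")

# Each line gets a legend entry; muted_alpha controls how faint it is when muted
plot.line(index, cache_memory, legend_label="Cache Memory",
          line_color="blue", muted_alpha=0.2)
plot.line(index, cycle_time, legend_label="Cycle time",
          line_color="red", muted_alpha=0.2)

# Clicking a legend entry now mutes (defocuses) the matching line
plot.legend.click_policy = "mute"
```

Setting click_policy to "hide" instead would remove the clicked line from view entirely rather than fading it.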
In the next section, we will create interactive visualizations that allow the user to
modify the data that is displayed.
Adding Widgets
One of the most powerful features of Bokeh is the ability to use widgets to
interactively change the data that's displayed in a visualization. To understand the
importance of interactivity in your visualizations, imagine seeing a static visualization
about stock prices that only shows data for the last year.
If you're interested in seeing the current year or even visually comparing it to the
recent and coming years, static plots won't be suitable. You would need to create one
plot for every year or even overlay different years on one visualization, which would
make it much harder to read.
Comparing this to a simple plot that lets the user select the date range they want, we
can already see the advantages. You can guide the user by restricting values and only
displaying what you want them to see. Developing a story behind your visualization is
very important, and doing this is much easier if the user has ways of interacting with
the data.
Bokeh widgets work best when used in combination with the Bokeh server. However,
using the Bokeh server approach is beyond the scope of this book, since we would
need to work with plain Python files. Instead, we will use a hybrid approach that
only works with the Jupyter Notebook.
We will look at the different widgets and how to use them before going in and
building a basic plot with one of them. There are a few different options regarding
how to trigger updates, which are also explained in this section. The widgets that will
be covered in the following exercise are explained in the following table:
The general way to create a new widget visible in a Jupyter Notebook is to define
a new function and wrap it in an interact widget. We'll be using the "syntactic
sugar" way of doing this, that is, by adding a decorator to the function. This will
give us an interactive element that will be displayed after the executable cell, like in
the following example:
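A minimal sketch of such a cell, assuming ipywidgets is installed and the code runs inside a Jupyter Notebook. The function name text_input and the default string are illustrative choices.

```python
from ipywidgets import interact

# Passing a string as the default value asks interact for a text widget
@interact(Value="Input Text")
def text_input(Value):
    # Called again with the new content whenever the widget value changes
    print(Value)
```

Replacing the string default with a boolean or a number would make interact pick a different widget for the same function.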
In the preceding example, we first import the interact element from the
ipywidgets library. This then allows us to define a new function and decorate it
with the @interact decorator.
The Value argument tells the interact element which widget to use based on the
data type of the argument. In our example, we provide a string, which will give us a
text box widget. We can refer to the preceding table to determine which Value
data type will return which widget.
The print statement in the preceding code prints whatever has been entered in the
textbox below the widget.
Note
The methods that we can use interact with always have the same structure.
We will look at several examples in the following exercise.
Networked programs
While many of the examples in this book have focused on reading files and looking
for data in those files, there are many different sources of information when one
considers the Internet.
In this chapter we will pretend to be a web browser and retrieve web pages using
the Hypertext Transfer Protocol (HTTP). Then we will read through the web page
data and parse it.
HTTP is specified in the following document:
https://www.w3.org/Protocols/rfc2616/rfc2616.txt
This is a long and complex 176-page document with a lot of detail. If you find
it interesting, feel free to read it all. But if you take a look around page 36 of
RFC2616 you will find the syntax for the GET request. To request a document
from a web server, we make a connection, e.g. to the www.pr4e.org server on port
80, and then send a line of the form
GET http://data.pr4e.org/romeo.txt HTTP/1.0
where the second parameter is the web page we are requesting, and then we also
send a blank line. The web server will respond with some header information about
the document and a blank line followed by the document content.
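import socket
# Open a TCP connection to the web server on port 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
# Send the GET request and a blank line, encoded as bytes
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)
# Read the response in 512-byte blocks until the server closes the socket
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')
mysock.close()
# Code: https://www.py4e.com/code3/socket1.py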
HTTP/1.1 200 OK
[Figure: A socket connection. Your program uses a socket to connect to port 80 on www.py4e.com, then calls send and recv to exchange data with the web server that holds the web pages.]
The output starts with headers which the web server sends to describe the
document. For example, the Content-Type header indicates that the document is a
plain text document (text/plain).
After the server sends us the headers, it adds a blank line to indicate the end of
the headers, and then sends the actual data of the file romeo.txt.
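The header and body layout can be explored without a network connection. The sketch below splits a hand-written response at the blank line; the response bytes are a shortened, made-up example rather than real server output.

```python
# A made-up response in the same shape a web server sends
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"But soft what"
)

# The blank line (two CRLF sequences in a row) separates headers from data
header_bytes, _, body = raw_response.partition(b"\r\n\r\n")

# The first header line is the status line; the rest are "Name: value" pairs
header_lines = header_bytes.decode().split("\r\n")
status_line = header_lines[0]
headers = {}
for line in header_lines[1:]:
    name, _, value = line.partition(":")
    headers[name.strip()] = value.strip()

print(status_line)
print(headers["Content-Type"])
print(body.decode())
```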
This example shows how to make a low-level network connection with sockets.
Sockets can be used to communicate with a web server or with a mail server or
many other kinds of servers. All that is needed is to find the document which
describes the protocol and write the code to send and receive the data according
to the protocol.
However, since the protocol that we use most commonly is the HTTP web protocol,
Python has a special library specifically designed to support the HTTP protocol
for the retrieval of documents and data over the web.
One of the requirements for using the HTTP protocol is the need to send and
receive data as bytes objects, instead of strings. In the preceding example, the
encode() and decode() methods convert strings into bytes objects and back again.
The next example uses b'' notation to specify that a variable should be stored as
a bytes object. For plain ASCII text, encode() and b'' produce the same bytes object.
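A quick way to check this is to build the same request both ways and compare the results. The request text below reuses the GET line from this chapter.

```python
request_text = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'

# Convert a string to bytes with encode() (UTF-8 by default)
from_encode = request_text.encode()

# Write the same bytes directly with a bytes literal
from_literal = b'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'

print(from_encode == from_literal)
print(type(from_encode))

# decode() converts the bytes back into the original string
print(from_encode.decode() == request_text)
```

Bytes literals may only contain ASCII characters, so text with accents or other non-ASCII characters has to go through encode().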
In the above example, we retrieved a plain text file which had newlines in the file
and we simply copied the data to the screen as the program ran. We can use a
similar program to retrieve an image using HTTP. Instead of copying the
data to the screen as the program runs, we accumulate the data in a bytes object, trim
off the headers, and then save the image data to a file as follows:
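import socket
import time

HOST = 'data.pr4e.org'
PORT = 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((HOST, PORT))
mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""

# Accumulate the whole response (headers and image) in a bytes object
while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
    #time.sleep(0.25)
    count = count + len(data)
    print(len(data), count)
    picture = picture + data

mysock.close()

# Look for the end of the header (two CRLF sequences)
pos = picture.find(b"\r\n\r\n")
print('Header length', pos)
print(picture[:pos].decode())

# Skip past the header and save the picture data
picture = picture[pos+4:]
fhand = open("stuff.jpg", "wb")
fhand.write(picture)
fhand.close()
# Code: https://www.py4e.com/code3/urljpeg.py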
$ python urljpeg.py
5120 5120
5120 10240
4240 14480
5120 19600
...
5120 214000
3200 217200
5120 222320
5120 227440
3167 230607
Header length 393
HTTP/1.1 200 OK
Date: Wed, 11 Apr 2018 18:54:09 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Mon, 15 May 2017 12:27:40 GMT
ETag: "38342-54f8f2e5b6277"
Accept-Ranges: bytes
Content-Length: 230210
Vary: Accept-Encoding
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: image/jpeg
You can see that for this URL, the Content-Type header indicates that the body of the
document is an image (image/jpeg). Once the program completes, you can view
the image data by opening the file stuff.jpg in an image viewer.
As the program runs, you can see that we don’t get 5120 characters each time
we call the recv() method. We get as many characters as have been transferred
across the network to us by the web server at the moment we call recv(). In this
example, some calls return as few as 3200 characters even though we request up
to 5120 characters of data.
Your results may be different depending on your network speed. Also note that on
the last call to recv() we get 3167 bytes, which is the end of the stream, and in
the next call to recv() we get a zero-length string that tells us that the server has
called close() on its end of the socket and there is no more data forthcoming.
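The same receive loop can be tried locally. The sketch below uses socket.socketpair() to connect a sending thread to a receiver inside one program; the payload size of 20000 bytes and the helper name send_payload are arbitrary choices for this example.

```python
import socket
import threading

PAYLOAD_SIZE = 20000  # arbitrary size, larger than one recv() request

def send_payload(sock):
    # Play the role of the server: send everything, then close our end
    sock.sendall(b"x" * PAYLOAD_SIZE)
    sock.close()

sender, receiver = socket.socketpair()
worker = threading.Thread(target=send_payload, args=(sender,))
worker.start()

received = b""
chunk_sizes = []
while True:
    data = receiver.recv(5120)   # ask for up to 5120 bytes
    if len(data) < 1:            # zero length means the sender closed
        break
    chunk_sizes.append(len(data))
    received = received + data

worker.join()
receiver.close()

# Chunk sizes can vary from run to run, but the total always matches
print(chunk_sizes)
print(len(received))
```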
We can slow down our successive recv() calls by uncommenting the call to
time.sleep(). This way, we wait a quarter of a second after each call so that
the server can “get ahead” of us and send more data to us before we call recv()
again. With the delay in place, the program executes as follows:
$ python urljpeg.py
5120 5120
5120 10240
5120 15360
...
5120 225280
5120 230400
207 230607
Header length 393
HTTP/1.1 200 OK
Date: Wed, 11 Apr 2018 21:42:08 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Mon, 15 May 2017 12:27:40 GMT
ETag: "38342-54f8f2e5b6277"
Accept-Ranges: bytes
Content-Length: 230210
Vary: Accept-Encoding
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: image/jpeg
Now, other than the last call to recv(), we get 5120 characters each
time we ask for new data.
There is a buffer between the server making send() requests and our application
making recv() requests. When we run the program with the delay in place, at
some point the server might fill up the buffer in the socket and be forced to pause
until our program starts to empty the buffer. The pausing of either the sending
application or the receiving application is called “flow control.”
import urllib.request

fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    print(line.decode().strip())
# Code: https://www.py4e.com/code3/urllib1.py
Once the web page has been opened with urllib.request.urlopen, we can treat
it like a file and read through it using a for loop.
When the program runs, we only see the output of the contents of the file. The
headers are still sent, but the urllib code consumes the headers and only returns
the data to us.
As an example, we can write a program to retrieve the data for romeo.txt and
compute the frequency of each word in the file as follows:
import urllib.request

fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
counts = dict()
for line in fhand:
    words = line.decode().split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1
print(counts)
# Code: https://www.py4e.com/code3/urlwords.py
Again, once we have opened the web page, we can read it like a local file.
import urllib.request

img = urllib.request.urlopen('http://data.pr4e.org/cover3.jpg').read()
fhand = open('cover3.jpg', 'wb')
fhand.write(img)
fhand.close()
# Code: https://www.py4e.com/code3/curl1.py
This program reads all of the data in at once across the network and stores it in the
variable img in the main memory of your computer, then opens the file cover3.jpg
and writes the data out to your disk. The wb argument for open() opens a binary
file for writing only. This program will work if the size of the file is less than the
size of the memory of your computer.
However, if this is a large audio or video file, this program may crash or at least
run extremely slowly when your computer runs out of memory. In order to avoid
running out of memory, we retrieve the data in blocks (or buffers) and then write
each block to your disk before retrieving the next block. This way the program can
read any size file without using up all of the memory you have in your computer.
import urllib.request

img = urllib.request.urlopen('http://data.pr4e.org/cover3.jpg')
fhand = open('cover3.jpg', 'wb')
size = 0
# Copy the image to disk 100,000 bytes at a time
while True:
    info = img.read(100000)
    if len(info) < 1: break
    size = size + len(info)
    fhand.write(info)

print(size, 'characters copied.')
fhand.close()
# Code: https://www.py4e.com/code3/curl2.py
In this example, we read only 100,000 characters at a time and then write those
characters to the cover3.jpg file before retrieving the next 100,000 characters of
data from the web.
This program runs as follows:
python curl2.py
230210 characters copied.
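The block-by-block loop works with any file-like object, so it can be tried without a network connection. In the sketch below, io.BytesIO objects stand in for the web response and the output file, and copy_in_blocks is a hypothetical helper name introduced for this example.

```python
import io

def copy_in_blocks(source, destination, block_size=100000):
    """Copy source to destination one block at a time; return bytes copied."""
    total = 0
    while True:
        block = source.read(block_size)
        if len(block) < 1:       # an empty read means there is no more data
            break
        destination.write(block)
        total = total + len(block)
    return total

# In-memory stand-ins for the download and the file on disk
fake_download = io.BytesIO(b"\xff" * 230210)
saved_copy = io.BytesIO()

copied = copy_in_blocks(fake_download, saved_copy)
print(copied, 'characters copied.')
```

Only one block is held in memory at a time, no matter how large the source is.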
We can construct a well-formed regular expression to match and extract the link
values from HTML text as follows:
href="http[s]?://.+?"
Our regular expression looks for strings that start with “href="http://” or
“href="https://”, followed by one or more characters (.+?), followed by another
double quote. The question mark behind the [s]? indicates to search for the
string “http” followed by zero or one “s”.
The question mark added to the .+? indicates that the match is to be done in
a “non-greedy” fashion instead of a “greedy” fashion. A non-greedy match tries
to find the smallest possible matching string and a greedy match tries to find the
largest possible matching string.
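The difference is easy to see on a line that contains two links. The HTML string below is a made-up sample; the only change between the two patterns is the question mark after the plus sign.

```python
import re

# A made-up line of HTML containing two links
html = ('<a href="http://www.dr-chuck.com/page2.htm">Second</a> '
        '<a href="https://www.py4e.com/">PY4E</a>')

# Non-greedy: stop at the first closing double quote
non_greedy = re.findall(r'href="http[s]?://.+?"', html)

# Greedy: run on to the last double quote in the string
greedy = re.findall(r'href="http[s]?://.+"', html)

print(len(non_greedy), non_greedy)
print(len(greedy), greedy)
```

The non-greedy pattern finds each link separately, while the greedy pattern swallows everything from the first link through the end of the second as a single match.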
We add parentheses to our regular expression to indicate which part of our matched
string we would like to extract, and produce the following program:
# Code: https://www.py4e.com/code3/urlregex.py
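A minimal sketch of such a program is shown below. The function names extract_links and fetch_links, the sample HTML, and the use of a default certificate-verifying SSL context are assumptions made for this example, so it may differ from the code at the address above. The sample at the end runs without network access.

```python
import re
import ssl
import urllib.request

# Parentheses mark the part of the match we want back: the URL itself
LINK_PATTERN = rb'href="(http[s]?://.+?)"'

def extract_links(html):
    """Return the link values found in a bytes object of HTML."""
    return re.findall(LINK_PATTERN, html)

def fetch_links(url):
    """Download a page and return its absolute links as strings."""
    context = ssl.create_default_context()
    html = urllib.request.urlopen(url, context=context).read()
    return [link.decode() for link in extract_links(html)]

# Offline check with a made-up page fragment
sample = b'<p><a href="http://www.dr-chuck.com/page2.htm">Second Page</a></p>'
for link in extract_links(sample):
    print(link.decode())
```

Calling fetch_links with a URL typed in at a prompt would give the interactive behavior described next.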
The ssl library allows this program to access web sites that strictly enforce HTTPS.
The read method returns HTML source code as a bytes object instead of returning
an HTTPResponse object. The findall regular expression method will give us a
list of all of the strings that match our regular expression, returning only the link
text between the double quotes.
When we run the program and input a URL, we get the following output:
Enter - https://docs.python.org
https://docs.python.org/3/index.html
https://www.python.org/
https://docs.python.org/3.8/
https://docs.python.org/3.7/
https://docs.python.org/3.5/
https://docs.python.org/2.7/
https://www.python.org/doc/versions/
https://www.python.org/dev/peps/
https://wiki.python.org/moin/BeginnersGuide
https://wiki.python.org/moin/PythonBooks
https://www.python.org/doc/av/
https://www.python.org/
https://www.python.org/psf/donations/
http://sphinx.pocoo.org/
Regular expressions work very nicely when your HTML is well formatted and
predictable. But since there are a lot of “broken” HTML pages out there, a solution
only using regular expressions might either miss some valid links or end up with
bad data.
This can be solved by using a robust HTML parsing library.
# Code: https://www.py4e.com/code3/urllinks.py
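The core of such a program can be sketched on a small piece of HTML, assuming the third-party beautifulsoup4 package is installed. The sample HTML below is made up for this example, and a full program would read the page from a URL instead.

```python
from bs4 import BeautifulSoup

# A made-up page fragment with one absolute and one relative link
sample_html = """
<h1>The First Page</h1>
<p>If you like, you can switch to the
<a href="http://www.dr-chuck.com/page2.htm">Second Page</a> or
<a href="tutorial/index.html">the tutorial</a>.</p>
"""

# Parse the HTML with the parser from the standard library
soup = BeautifulSoup(sample_html, 'html.parser')

# Calling the soup object with a tag name retrieves all matching tags
tags = soup('a')
hrefs = [tag.get('href', None) for tag in tags]
for href in hrefs:
    print(href)
```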
The program prompts for a web address, then opens the web page, reads the data
and passes the data to the BeautifulSoup parser, and then retrieves all of the
anchor tags and prints out the href attribute for each tag.
When the program runs, it produces the following output:
Enter - https://docs.python.org
genindex.html
py-modindex.html
https://www.python.org/
#
whatsnew/3.6.html
whatsnew/index.html
tutorial/index.html
library/index.html
reference/index.html
using/index.html
howto/index.html
installing/index.html
distributing/index.html
extending/index.html
c-api/index.html
faq/index.html
py-modindex.html
genindex.html
glossary.html
search.html
contents.html
bugs.html
about.html
license.html
copyright.html
download.html
https://docs.python.org/3.8/
https://docs.python.org/3.7/
https://docs.python.org/3.5/
https://docs.python.org/2.7/
https://www.python.org/doc/versions/
https://www.python.org/dev/peps/
https://wiki.python.org/moin/BeginnersGuide
https://wiki.python.org/moin/PythonBooks
https://www.python.org/doc/av/
genindex.html
py-modindex.html
https://www.python.org/
#
copyright.html
https://www.python.org/psf/donations/
bugs.html
http://sphinx.pocoo.org/
This list is much longer because some HTML anchor tags are relative paths (e.g.,
tutorial/index.html) or in-page references (e.g., ‘#’) that do not include “http://”
or “https://”, which was a requirement in our regular expression.
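Relative entries like these can be turned back into full addresses. The sketch below uses urljoin from the standard urllib.parse module; the base URL and the href values are examples chosen for illustration.

```python
from urllib.parse import urljoin

# Assumed address of the page the links were found on
base_url = 'https://docs.python.org/3/'

# A mix of relative and absolute href values
hrefs = ['tutorial/index.html', 'genindex.html', 'https://www.python.org/']

# Relative paths are resolved against the base; absolute URLs pass through
resolved = [urljoin(base_url, href) for href in hrefs]
for url in resolved:
    print(url)
```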
You can also use BeautifulSoup to pull out various parts of each tag:
# Code: https://www.py4e.com/code3/urllink2.py
python urllink2.py
Enter - http://www.dr-chuck.com/page1.htm
TAG: <a href="http://www.dr-chuck.com/page2.htm">
Second Page</a>
URL: http://www.dr-chuck.com/page2.htm
Content: ['\nSecond Page']
Attrs: [('href', 'http://www.dr-chuck.com/page2.htm')]
html.parser is the HTML parser included in the standard Python 3 library.
Information on other HTML parsers is available at:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
These examples only begin to show the power of BeautifulSoup when it comes to
parsing HTML.
If you have a Linux, Unix, or Macintosh computer, you probably have commands
built in to your operating system that retrieve both plain text and binary files
using the HTTP or File Transfer (FTP) protocols. One of these commands is
curl:
$ curl -O http://www.py4e.com/cover.jpg
The command curl is short for “copy URL” and so the two examples listed earlier
to retrieve binary files with urllib are cleverly named curl1.py and curl2.py
on www.py4e.com/code3 as they implement similar functionality to the curl
command. There is also a curl3.py sample program that does this task a little more
effectively, in case you actually want to use this pattern in a program you are
writing.
A second command that functions very similarly is wget:
$ wget http://www.py4e.com/cover.jpg
Both of these commands make retrieving webpages and remote files a simple task.
12.10 Glossary