Fast Python for Data Science MEAP V10

1. MEAP VERSION 10
2. Welcome
3. 1 The need for efficient computing and data storage
4. 2 Extracting maximum performance from built-in features
5. 3 Concurrency, parallelism and asynchronous processing
6. 4 High performance NumPy
7. 5 Re-implementing critical code with Cython
8. 6 Memory hierarchy, storage and networking
9. 7 High performance Pandas and Apache Arrow
10. 8 Storing big data
11. 9 Data Analysis using GPU computing
12. 10 Analyzing big data with Dask
13. Appendix A. Setting up the environment
14. Appendix B. Using Numba to generate efficient low level code

MEAP VERSION 10
Welcome
Thank you for purchasing the MEAP for Fast Python. This is an advanced
book written for Python programmers who already have some practical
experience under their belt. You are probably already dealing with some
large problems and you would like to know how to produce solutions that
are more efficient: you want a faster solution that uses less CPU, less
storage, and less network. You are at a stage where you need to understand
a bit more about how Python works in order to write more efficient
solutions.

You know all the basic Python language features: most of its syntax and a
few of its built-in libraries. You are using, or have heard of, libraries like
NumPy, Pandas or SciPy. You might have dabbled with the multiprocessing
module, but you would definitely like to know more. You know that you
can rewrite parts of your Python code in a lower level language or system
like Cython, Numba or C. You are keen on exploring new ways to make
your code more efficient, like offloading code to GPUs.

When I started programming, more than 25 years ago, I believed that
writing code would become, as time went by, a more declarative discipline.
That is, coding would be more about modeling a problem domain than
dealing with the computer and the network. To put it mildly, I was wrong.
CPU power is growing at a much slower pace than before while data is
exploding and algorithms are becoming more sophisticated. The importance
of writing programs that take into consideration the computational platform
is increasing, not decreasing.

This book is concerned with writing Python code that delivers more
performance. Performance here means several things: it is speed of
execution, but it is also being as IO-frugal as possible, and it is surely
reducing the overall financial cost of our code by using fewer computers,
less storage, and less time. There are ways of achieving this, and I believe
that we can do it in an elegant way – more efficient code doesn't mean
uglier or less maintainable code.
The approach we will be taking is multi-faceted. We tackle pure-Python
code, multiprocessing, and writing critical parts in faster languages. Adding
to this, we will be looking at the libraries that are the bread and butter of
data analysis in Python: how can we use libraries like NumPy or Pandas in
a more performant way? And because IO is a big bottleneck in our big-data
world, we will pay close attention to persistence: we will transform data
into more efficient representations and introduce modern libraries to do
storage and IO.

It is quite important for me that all the above topics are contextualized in
their environment: the best solution to run on a single computer is probably
very different from the best solution to run on the cloud. There is no single
solution to rule them all. Therefore we will also be discussing the impact of
CPU, disks, network and cloud architectures. You will have to think
differently as your platform changes and this book, hopefully, will help you
with that.

The topics covered are complex and I know that your feedback will be
fundamental to improve this work quite substantially. Please be sure to post
any questions, comments, or suggestions you have about the book in the
liveBook discussion forum.

—Tiago Antão

1 The need for efficient computing and data storage
This chapter covers

The challenges of dealing with exponential growth of data
Comparing traditional and recent computing architectures
The role and shortcomings of Python in modern data analytics
A summary of the techniques for delivering efficient Python computing
solutions

It is difficult to think of a more common cliche than the one about how we
live in "a data deluge," but it happens that this cliche is also very true.
Software development professionals are tasked with dealing with immense
amounts of data, and Python has emerged as the language of choice to do—
or at least glue—all the heavy lifting around this deluge. Indeed Python’s
popularity in data science and data engineering is one of the main drivers of
the language’s growth, helping to push it to one of the top three most used
languages across most developer surveys. Python has its own unique set of
advantages and limitations for dealing with big data, and in this book we will
explore techniques for doing efficient data processing in Python. We will
examine a variety of angles and approaches which target software, hardware,
coding, and more. Starting with pure Python best practices for efficiency, we
then move on to how best to leverage multiprocessing, improve our use of
data processing libraries, and re-implement parts of the code in lower
level languages. We will look not only at CPU processing optimizations, but
also at storage and network efficiency gains. And we will look at all this in
the context of traditional single-computer architectures as well as newer
approaches like the cloud and GPU-based computing. By the end of this
book, you will have a toolbox full of reliable solutions for using fewer
resources and saving money, while still responding faster to computing
requirements.
In this chapter let's first take a look at a few specifics about the so-called
data deluge, to orient ourselves to what, exactly, we are dealing with. Then
we will sketch out why the old solutions, such as increasing CPU speed, are
no longer adequate. Next we'll look at the particular issues that Python faces
when dealing with big data, including Python's threading and CPython's
infamous Global Interpreter Lock (GIL). Once we've seen the need for new
approaches to making Python perform better, I’ll explain what precisely I
mean by high-performance Python, and what you’ll learn in this book.

1.1 The overwhelming need for efficient computing in Python

Several important new developments are driving the need for our code to be
more and more efficient. First, there is the increasing amount of available
data, most of which is not structured. Let’s look a little closer at the
fundamental problem of dealing with ever-increasing amounts of data.

There are many examples of exponential growth of data. There is, for
example, Edholm's law (https://en.wikipedia.org/wiki/Edholm%27s_law),
which states that data rates in telecommunications double every 18 months.
You might already be familiar with Moore's law, about the doubling of
transistor density with a period of 24 months. If we take these two
observations together we can easily see a problem: data is growing at a much
faster pace—we are talking here about data transfer rate as a proxy for data
size—than processing power. Because exponential growth can be tricky to
understand in words, I've plotted one against the other in figure 1.1.
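
To make the comparison concrete, here is a minimal sketch that uses only
the two doubling periods above (18 and 24 months) and nothing else from
the book, and computes how far data growth outpaces compute growth:

# A sketch comparing Edholm's law (data rates double every 18 months)
# with Moore's law (transistor density doubles every 24 months).
for years in range(0, 21, 5):
    data_growth = 2 ** (12 * years / 18)     # Edholm's law
    compute_growth = 2 ** (12 * years / 24)  # Moore's law
    print(f"{years:2d} years: data x{data_growth:,.0f}, "
          f"compute x{compute_growth:,.0f}, "
          f"gap x{data_growth / compute_growth:.1f}")

After 20 years the data-to-compute gap is roughly a factor of 10, which is
the widening distance sketched in figure 1.1.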

Figure 1.1. The ratio between Moore’s Law and Edholm’s law suggests that hardware will
always lag behind the amount of data being generated. Moreover the gap will increase over time.
The situation described by this graph can be seen as a fight between what we
need to analyze (Edholm's law) vs the power that we have to do that analysis
(Moore's law). The graph actually paints a rosier picture than what we have
in reality. We will see why in chapter 6, when we discuss Moore's law in the
context of modern CPU architectures.

Let's look at one example, internet traffic, which is an indirect measure of
data available. As you can see in figure 1.2 (source:
https://en.wikipedia.org/wiki/Internet_traffic), the growth of internet traffic
over the years tracks Edholm's law quite well.

Figure 1.2. The growth of Global Internet Traffic over the years measured in Petabytes per
month. (source: Wikipedia)
In addition, 90% of the data humankind has ever produced was generated in
the last two years (to read more about this see
https://www.uschamberfoundation.org/bhq/big-data-and-what-it-means).
Whether the quality of this new data is proportional to its size is another
matter altogether. The point is that data produced will need to be processed
and that processing will require more resources.

The way all this new data is represented is also changing in nature. Some
project that by 2025, around 80% of data could be unstructured (for details
see https://www.aparavi.com/data-growth-statistics-blow-your-mind/).
Simply put, unstructured data makes data processing more demanding from
a computational perspective.

How do we deal with all this growth in data? Surprisingly and sadly, it turns
out that we mostly don't. More than 99% of data produced is never
analyzed, according to an article published in The Guardian
(https://www.theguardian.com/news/datablog/2012/dec/19/big-data-study-
digital-universe-global-volume). Part of what holds us back from making use
of so much of our data is that we lack efficient procedures to analyze it.

The growth of data and the concomitant need for more processing has
developed into one of the most pernicious mantras about computing, which
goes along these lines: "If you have more data, just throw more servers at it."
An alternative approach, when we need to increase the performance of an
existing system, is to have a look at the existing architecture and
implementation and find places where we can optimize for performance. I
have personally lost count of how many times I have been able to get ten-
fold increases in performance just by being mindful of efficiency issues
when reviewing existing code.

What is crucial to understand is that the relationship between the amount of
increased data to analyze and the complexity of the infrastructure needed to
analyze it is hardly linear. This is true not just in cloud environments, but
also with in-house clusters, and even in single-machine implementations. A
few use cases will help to make this clear. For example:

Your solution requires only a single computer, but suddenly you need
more machines. Adding machines means you will have to manage the
number of machines, distribute the workload across them, and make
sure the data is partitioned correctly. You might also need a file system
server to add to your list of machines. The cost of maintaining a server
farm—or just a cloud—is qualitatively much more than maintaining a
single computer.
Your solution works well in-memory but then the amount of data
increases and no longer fits your memory. Handling the new amount of
data stored on disk will normally entail a major rewrite of your code.
And, of course, the code itself will grow in complexity. For instance, if
the main database is now on disk, you may need to create a cache
policy. Or you may need to do concurrent reads from multiple
processes. Or, even worse, concurrent writes.
You use a SQL database and suddenly you reach maximum throughput
capacity of the server. If it’s only a read capacity problem then you
might survive by just creating a few read replicas. But if it is a write
problem, what do you do? Maybe you set up sharding [1]? Or do you
decide to completely change your database technology in favor of some
supposedly better performant NoSQL variant?
If you depend on a cloud-based system built on vendor-proprietary
technologies, you might discover that the ability to scale
indefinitely is more marketing talk than technological reality. In many
cases, if you hit performance limits, the only realistic solution is to
change the technology that you are using, a change that requires
enormous time, money, and human energy.
I hope these examples make the case that growing is not just a question of
“adding more machines,” but instead entails substantial work on several
fronts to deal with the increased complexity. Even something as "simple" as
a parallel solution implemented on a single computer can bring with it all the
problems of parallel processing (races, deadlocks, and more). These more
efficient solutions can have a dramatic effect on complexity, reliability and
cost.

Finally, we could make the case that even if we could scale our infrastructure
linearly (we can't, really) there would be ethical and ecological issues to
consider: forecasts put energy consumption related to a "tsunami of data" at
20% of global electricity production (for details see
https://www.theguardian.com/environment/2017/dec/11/tsunami-of-data-
could-consume-fifth-global-electricity-by-2025), and there is also the issue
of landfill disposal as we update hardware.

The good news is that becoming computationally more efficient when
handling big data helps us to reduce our computing bill, reduce the
complexity of the architecture for our solution, reduce our storage needs,
reduce our time to market and also reduce our energy footprint. And
sometimes more efficient solutions might even come with minimal
implementation costs. For example, judicious use of data structures might
reduce computing time at no substantial development cost.

On the other hand, many of the solutions we’ll look at will have a
development cost and will add an amount of complexity themselves. When
you look at your data and forecasts for its growth, you will have to make a
judgment call on where to optimize, as there are no clear-cut recipes or one-
size-fits-all solutions. That being said, there might be just one rule that can
be applied across the board:

If the solution is good for Netflix, Google, Amazon, Apple or Facebook then
probably it is not good for you—unless, of course, you work for one of these
companies.

The amount of data that most of us will see will be substantially lower than
what the biggest technological companies use. It will still be enormous, it
will still be hard, but it will probably be a few orders of magnitude lower.
The somewhat prevailing wisdom that what works for those companies is
also a good fit for the rest of us is, in my opinion, just wrong. Generally,
less complex solutions will be more appropriate for most of us.

As you can see, this new world with extreme growth—both in quantity and
complexity—of both data and algorithms requires more sophisticated
techniques to perform computation and storage in an efficient and cost-
conscious way. Don't get me wrong, sometimes you will need to scale up
your infrastructure. But when you architect and implement your solution,
you can still use the same mindset of focusing on efficiency. It's just that the
techniques will be different.

Now that we have a broad overview of the problem, let's see how to address
it. In the next section we will look at computing architectures in general:
from what is going on inside the computer all the way to the implications
of large clusters and cloud solutions. With these environments in mind we
can, in the section afterwards, start discussing the advantages and pitfalls of
Python for high performance processing of large datasets.

[1] Sharding is the partitioning of data so that parts of it reside in different
servers.

1.2 The impact of modern computing architectures on high performance computing

Creating more efficient solutions does not happen in an abstract void. First
we have our domain problem to consider, i.e. the real problem we are
trying to solve. Equally important is the computing architecture where our
solution will be run. Computing architectures play a major role in
determining the best optimization techniques, so we have to take them into
consideration when we devise our software solutions. In this section we will
take a look at the main architectural issues that impact the design and
implementation of our solutions.

1.2.1 Changes inside the computer


Radical changes are happening inside the computer. First, we have CPUs
that are increasing processing power mostly in the number of parallel units,
not raw speed, as they did in the past. Computers can also be equipped with
Graphics Processing Units (GPUs), which were originally developed for
graphics processing only, but now can be used for general computing as
well. Indeed, many efficient implementations of AI algorithms are done for
GPUs. Unfortunately—at least from our perspective—GPUs have a
completely different architecture than CPUs: they are composed of
thousands of computing units that are expected to do the same "simple"
computation across all units. The memory model is also completely
different. These differences mean that programming GPUs requires a
radically different approach from programming CPUs.

To understand how we can leverage GPUs for data processing, we need to
understand their original purpose and architectural implications. GPUs, as
the name indicates, were developed to help with graphics processing.
Among the most computationally demanding applications are actually
games. Games, and graphic applications in general, are constantly updating
millions of pixels on the screen. The hardware architecture devised to solve
this problem has many small processing cores. It's quite easy for a GPU to
have thousands of cores, while a CPU typically has fewer than 10. GPU
cores are substantially simpler and mostly run the same code on each core.
They are thus very good for running a massive amount of similar tasks—like
updating pixels.

Given the sheer amount of processing power in GPUs, there was an attempt
to use that power for other tasks, with the appearance of General-Purpose
Computing on Graphics Processing Units (GPGPU). Because of the
way GPU architectures are organized, they are mostly applicable to tasks
that are massively parallel in nature. It turns out that many modern AI
algorithms, like ones based on neural networks, tend to be massively
parallel. So there was a natural fit between the two.

Unfortunately, the difference between CPUs and GPUs is not only in number
of cores and their complexity. GPU memory—especially on the most
computationally powerful—is separated from main memory. Thus there is
also the issue of transferring data between main memory and GPU memory.
So we have two massive issues to consider when targeting GPUs.
For reasons that will become clear in chapter 9, "GPU Computing with
Python," programming GPUs with Python is substantially more difficult and
less practical than targeting CPUs. Nonetheless, there is still more than
enough scope to make use of GPUs from Python.

While less fashionable than the advances in GPUs, monumental changes
have also come to how CPUs can be programmed. And, unlike GPUs, we
can easily leverage most of these CPU changes in Python. CPU performance
increases are being delivered in a different way by manufacturers than in the
past. Their solution—driven by the laws of physics—is to build in more
parallel processing, not more speed. Moore's law is sometimes stated as the
doubling of speed every 24 months, but that is actually not the correct
definition: it relates instead to the transistor density doubling every two
years. The linear relationship between increased speed and transistor density
broke more than a decade ago, and speed has mostly plateaued since then.
Given that data has continued to grow along with algorithm complexity, we
are in a pernicious situation. The first line of solutions coming from
CPU manufacturers is allowing more parallelism: more CPUs per computer,
more cores per CPU, simultaneous multi-threading. Processors are not really
accelerating sequential computations anymore, but allowing for more
concurrent execution. This concurrent execution requires a paradigm shift in
how we program computers. Before, the speed of a program would
"magically" increase when you changed CPU. Now, increasing speed
depends upon the programmer being aware of the shift in the underlying
architecture to the parallel programming paradigm.

There are many changes in the way we program modern CPUs, and as you
will see in chapter 6, "CPU and Memory Hierarchy," some of them are so
counter-intuitive that they are worth keeping an eye on from the onset. For
example, while CPU speeds have leveled off in recent years, CPUs are still
orders of magnitude faster than RAM. If CPU caches did not exist
then CPUs would be mostly idle, as they would spend most of the time
waiting for RAM. This means that sometimes it is faster to work with
compressed data—including the cost of decompression—than with raw data.
Why? If you are able to put a compressed block in the CPU cache then
those cycles that would otherwise be idle waiting for RAM access could be
used to decompress the data, with still cycles to spare that could be used for
computation! A similar argument works for compressed file systems:
they can sometimes be faster than raw file systems. There are direct
applications of this in the Python world: for example, by changing a simple
boolean flag regarding the choice of internal representation of NumPy arrays
you can take advantage of cache locality and speed up your NumPy
processing considerably. Table 1.1 lists access times and sizes for different
kinds of memory, including CPU cache, RAM, local disk and remote
storage. The key point here is not the precise numbers but the orders of
magnitude of difference in both size and access time.
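
As a taste of what is to come, here is a minimal sketch of the cache locality
effect mentioned above. It assumes the flag in question is NumPy's order
parameter (row-major "C" vs column-major "F" layout); the array size is
arbitrary and exact timings will vary with your hardware:

import timeit

import numpy as np

rows = np.random.rand(5000, 5000)   # order="C": rows are contiguous
cols = np.asfortranarray(rows)      # same values, order="F": columns are contiguous

# Summing along axis 1 walks each row. With C layout the row is
# sequential in memory and cache-friendly; with F layout it is not.
print(timeit.timeit(lambda: rows.sum(axis=1), number=10))
print(timeit.timeit(lambda: cols.sum(axis=1), number=10))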

Table 1.1. Memory hierarchy with sizes and access times for a hypothetical, but realistic,
modern desktop

Type                               Size      Access time

CPU
  L1 cache                         256 KB    2 ns
  L2 cache                         1 MB      5 ns
  L3 cache                         6 MB      30 ns

RAM
  DIMM                             8 GB      100 ns

Secondary storage
  SSD                              256 GB    50 µs
  HDD                              2 TB      5 ms

Tertiary storage
  NAS (Network Attached Storage)   100 TB    Network dependent
  Cloud (proprietary)              1 PB      Provider dependent

Table 1.1 introduces tertiary storage, which happens outside the computer.
There have also been changes there, which we will address in the next section.

1.2.2 Changes in the network

In high performance computing settings, we use the network both as a way
to add more storage and, especially, to increase computing power. While we
would like to solve our problems using a single computer, sometimes relying
on a compute cluster is inevitable. Optimizing for architectures with
multiple computers—be it in the cloud or on-premises—will be a part of
our journey to high performance.

Using many computers and external storage brings a whole new class of
problems related to distributed computing: network topologies, sharing data
across machines, and managing processes running across the network. For
example, what is the price of using REST APIs on services that require high
performance and low latency? How do we deal with the penalties of having
remote file systems, and can we mitigate them?

We will be trying to optimize our usage of the network stack, and for that we
will have to be aware of it at all the levels shown in figure 1.3. Outside the
network we have our code and Python libraries, which make choices about
the layers below. At the top of the network stack, a typical choice for data
transport is HTTPS with a payload based on JSON. While this is a perfectly
reasonable choice for many applications, there are more performant
alternatives for cases where network speed and lag matter. For example, a
binary payload might be more efficient than JSON. Also, HTTP might be
replaced by a direct TCP socket. But there are more radical alternatives, like
replacing the TCP transport layer: most Internet application protocols use
TCP, though there are a few exceptions like DNS and DHCP, which are both
UDP based. The TCP protocol is highly reliable, but there is a performance
penalty to be paid for that reliability. There will be times when the smaller
overhead of UDP will be a more efficient alternative, because the extra
reliability is not needed.
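
As an illustration of the payload point, here is a minimal sketch comparing a
JSON encoding of a hypothetical sensor reading with a fixed binary layout
built with the standard library's struct module (the field names and binary
format are invented for the example):

import json
import struct

# A hypothetical reading: station id, Unix timestamp, temperature.
reading = {"station": 1044099999, "ts": 1672531200, "temp": -10.0}

as_json = json.dumps(reading).encode("utf-8")
# The same record as two 64-bit integers and one double, network byte order.
as_binary = struct.pack("!qqd", reading["station"], reading["ts"], reading["temp"])

print(len(as_json), len(as_binary))  # the binary form is less than half the size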

Below the transport protocols we have the Internet Protocol (IP) and the
physical infrastructure. The physical infrastructure can be important when
we design our solutions. For example, if we have a very reliable local
network, then UDP, which can lose data, will be more of an alternative than
it would be in an unreliable network.

Figure 1.3. API calls via the network stack. Understanding the alternatives available for network
communication can dramatically increase the speed of Internet-based applications
1.2.3 The cloud

In the past, most data processing implementations were made to function on
a single computer or on an on-premises cluster maintained by the same
organization that runs the workload. Currently, cloud-based infrastructure,
where all servers are "virtual" and maintained by an external entity, is
becoming increasingly common. Sometimes, as with so-called serverless
computing, we do not even deal with servers directly.

The cloud is not just about adding more computers or network storage. It's
also about a set of proprietary extensions to how we deal with storage and
compute resources, and those extensions have consequences in terms of
performance. Furthermore, virtual computers can throw a wrench into some
CPU optimizations. For example, on a bare metal machine you can devise a
solution that is considerate of cache locality issues, but in a virtual machine
you have no way to know if your cache is being preempted by another
virtual machine being executed concurrently. How do we keep our
algorithms efficient in such an environment? Also, the cost model of cloud
computing is completely different—time is literally money—and as such
efficient solutions become even more important.

Many of the compute and storage solutions in the cloud are also proprietary
and have very specific APIs and behaviors. Using such proprietary solutions
also has consequences for performance that should be considered. As such,
and while most issues pertaining to traditional clusters are also applicable to
the cloud, sometimes there will be specific issues that will need to be dealt
with separately.

Now that we have a view of the architectural possibilities and limitations
that will shape our applications, let's turn to the advantages and perils of
Python for high performance computing.

1.3 Working with Python’s limitations


Python is widely used in modern data processing applications. As with any
language, it has its advantages and its drawbacks. There are great reasons to
use Python, but here we are more concerned with dealing with Python's
limitations for high performance data processing.

Let's not sugarcoat reality: Python is spectacularly ill-equipped to handle
high performance computing. If performance and parallelism were the only
consideration, nobody would use Python. Python has an amazing ecology of
libraries for doing data analysis, great documentation and a wonderful,
supportive community. That is why we use it, not computational
performance.

There is a saying that goes something like this: "There are no slow
languages, only slow language implementations." I hope you will allow me
to disagree. It is not fair to ask the implementors of a dynamic, high-level
language like Python (or, say, JavaScript for that matter) to compete in terms
of speed with lower level languages like C, C++, Rust or Go.
Features like dynamic typing or garbage collection will pay a price in terms
of performance. And that is fine: there are many cases where programmer
time is more valuable than compute time. But let's not bury our heads in the
sand: more declarative and dynamic languages will pay a price in
computation and memory. It's a balance.

That being said, this is no excuse for poorly performant language
implementations. In this regard, how does CPython—the flagship Python
implementation that you are probably using—fare? A complete analysis
would not be easy, but you can do a simple exercise: write a matrix
multiplication function and time it. Then, for example, run it with another
Python implementation like PyPy. Then convert your code to JavaScript (a
fair comparison, as the language is also dynamic—an unfair comparison
would be with C) and time it again.
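
If you want to try the exercise, a minimal pure-Python setup could look like
the following; run the same file under CPython and PyPy and compare the
timings (the 200x200 matrix size is arbitrary):

import random
import time

def matmul(a, b):
    # Naive triple-loop matrix multiplication in pure Python.
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

n = 200
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]

start = time.time()
matmul(a, b)
print(f"{time.time() - start:.2f}s")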

Spoiler alert: CPython will not fare well. We have a language that is
naturally slow and a flagship implementation that does not seem to have
speed as its main consideration. Now, the good news is that most of these
problems can be overcome. Actually many people have produced
applications and libraries that will mitigate most performance issues. You
can still write code in Python that will perform very well with a small
memory footprint. You just have to write code while attending to Python’s
warts.

Note

In most of the book, when we talk about Python we are referring to the
CPython implementation. All exceptions to this rule will be explicitly called
out.

Given Python's limitations with regards to performance, optimizing our
Python code will sometimes not be enough. In those cases we will end up
rewriting that part in a lower-level language—or at the very least annotating
our code so that it gets rewritten in a lower-level language by some code
conversion tool. The part of the code that we will need to rewrite is normally
very small, so we are decidedly not ditching Python. When we do this last
stage of optimization, probably more than 90% of the code will still be Python.
This is what many core scientific libraries like NumPy, scikit-learn or SciPy
actually do: their most computationally demanding parts are usually
implemented in C or Fortran.

1.3.1 The Global Interpreter Lock (GIL)

In discussions about Python's performance, its GIL, or Global
Interpreter Lock, inevitably comes up. What exactly is the GIL? While
Python has the concept of threads, CPython has a GIL, which only allows a
single thread to execute at a point in time. Even on a multi-core processor,
you only get a single thread executing at a single point in time.

Other implementations of Python, like Jython or IronPython, do not have a
GIL and can use all cores in modern multiprocessors. But CPython is still
the reference implementation, for which all the main libraries are developed.
In addition, Jython and IronPython are respectively JVM and .NET
dependent. As such, CPython, given its massive library base, ends up being
the default Python implementation. We will briefly discuss other
implementations in the book—most notably PyPy—but in practice CPython
is Queen.

To understand how to work around the GIL, it is useful to remember the
difference between concurrency and parallelism. Concurrency, you may
recall, is when a certain number of tasks can overlap in time, though they
may not be running at the same time. They can, for example, interleave.
Parallelism is when tasks are actually executed at the same time. So, in
Python, concurrency is possible, but parallelism is not… or is it?

Concurrency without parallelism is still quite useful. The best example of
this comes from the JavaScript world and Node.JS—which is
overwhelmingly used to implement the back-end of web servers: in many
server-side web tasks most of the time is actually spent waiting for IO,
which is a great time for a thread to voluntarily relinquish control so that
another thread can continue with computation. Modern Python has similar
asynchronous facilities, and we will be discussing them.
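
As a minimal sketch of concurrency without parallelism, the snippet below
uses asyncio.sleep to stand in for IO waits; three one-second "requests"
finish in about one second because the event loop interleaves them on a
single thread:

import asyncio

async def fetch(name, delay):
    # Stands in for an IO-bound call; while awaiting, other tasks run.
    await asyncio.sleep(delay)
    return name

async def main():
    results = await asyncio.gather(
        fetch("a", 1), fetch("b", 1), fetch("c", 1))
    print(results)

asyncio.run(main())  # completes in about 1 second, not 3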

But, back to the main issue: does the GIL impose a serious performance
penalty? In most cases the answer is a surprising No. There are two main
reasons for this:
Most of the high-performance code, those tight inner loops, will
probably have to be written in a lower level language as we’ve
discussed.
Python provides mechanisms for lower level languages to release the
GIL.

This means that when you enter a part of the code rewritten in a lower level
language, you can instruct Python to continue with other Python threads in
parallel with your low-level implementation. You should only release the
GIL if that is safe: for example if you do not write to objects that may be in
use by other threads.

Also, multiprocessing—running multiple processes simultaneously—is not
affected by the GIL, which only impacts threads, so there is still plenty of
space to deploy parallel solutions even in pure Python.
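
A minimal sketch of the difference: the same CPU-bound function run on a
thread pool (serialized by the GIL) and on a process pool (true parallelism).
The workload is arbitrary, and the exact numbers depend on your machine:

import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n):
    # Pure Python arithmetic: holds the GIL while it runs.
    return sum(i * i for i in range(n))

def timed(executor_class):
    start = time.time()
    with executor_class(max_workers=4) as executor:
        list(executor.map(cpu_bound, [5_000_000] * 4))
    return time.time() - start

if __name__ == "__main__":
    print(f"threads:   {timed(ThreadPoolExecutor):.1f}s")   # GIL-serialized
    print(f"processes: {timed(ProcessPoolExecutor):.1f}s")  # runs in parallel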

So, in theory the GIL is a concern with regards to performance, but in
practice it is rarely the source of problems that cannot be overcome. We will
dive deep into this subject in chapter 3.

1.4 What will you learn from this book


This book is about getting high performance from Python, but you can only
devise efficient code if you have a broader perspective of data and algorithm
demands as well as computing architectures. While it's impossible to go into
every architectural and algorithmic detail here, my aim is to help you
understand the implications of CPU design, GPUs, storage alternatives,
network protocols, cloud architectures and the other system topics depicted
in figure 1.4, so that you can make sound decisions related to the
performance of your Python code. You will be able to assess the advantages
and drawbacks of your computing architecture—whether it is a single
computer, a GPU-enabled computer, a cluster or a cloud environment—and
implement the necessary changes to take full advantage of it. In short, the
goal of this book is to introduce you to a range of solutions, and teach you
how and where each one is best applied, so you can select and implement
the most efficient solution for any problem you encounter.

Figure 1.4. The underlying hardware architectures


After reading this book you will be able to look at native Python code and
understand the performance implications of built-in data structures and
algorithms. You will be able to detect and replace inefficient structures with
more appropriate solutions—for example, replace lists with sets where a
search is being repeated on a constant list, or use non-object arrays instead of
lists of objects for speed.
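
For example, a minimal sketch of the list-versus-set replacement just
mentioned (the sizes are arbitrary):

import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)

# Membership testing: O(n) linear scan on a list vs O(1) hash lookup on a set.
print(timeit.timeit(lambda: 99_999 in haystack_list, number=1_000))
print(timeit.timeit(lambda: 99_999 in haystack_set, number=1_000))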

You will also be able to take an existing algorithm that is non-performant
and: (i) profile the code to find the pieces that are causing performance
problems, and (ii) determine the best way to optimize those pieces of code.

The book also addresses the widely used Python ecology of libraries for data
processing and analysis (such as Pandas and NumPy), with the aim of
improving how we use them. On the computing side, this is a lot of material,
so we will not discuss very high-level libraries. For example, we will not
talk about optimizing the usage of, say, TensorFlow, but we will discuss
techniques to make the underlying algorithms more efficient.

With regards to data storage and transformation, you will be able to look at a
data source and understand its drawbacks for efficient processing and
storage. Then you will be able to transform the data in a way that all required
information is still maintained but access patterns to the data will be
substantially more efficient.

Finally, you will also learn about Dask, a Python-based framework that
allows you to develop parallel solutions that can scale from a single machine
to very large clusters of computers or cloud computing solutions.

1.5 The reader for this book


This book is intended for an intermediate to advanced audience. If you skim
the table of contents, you should recognize most of the technologies and you
will probably have used quite a few of them. Except for sections on IO
libraries and GPU computing, there is little introductory material here; you
need to already know the basics. If you are already writing code to be
performant and facing real challenges in dealing with so much data in an
efficient way, then this book is for you.

The reader of this book will probably have at least a couple of years of
Python experience, and will know Python control structures and what
lists, sets and dictionaries are. You will have used some of the Python
standard libraries like os, sys, pickle or multiprocessing.

To take best advantage of the techniques I present here, you should also have
some level of exposure to standard data analysis libraries like NumPy—you
will have at least minimal exposure to arrays—and Pandas, where you will
have had some contact with data frames.

It would be helpful if you are aware—though you might have no direct
exposure—of ways to accelerate Python code, through either foreign
language interfaces to C or Rust, or alternative approaches like
Cython or Numba.

Experience dealing with IO in Python will also help you. Given that IO
libraries are less explored in the literature, we will take you from the very
beginning with formats like Apache Parquet or libraries like Zarr.

You should know the basic shell commands of Linux terminals (or MacOS
terminals). If you are on Windows, please have either a Unix-based shell
installed or know your way around the command line or PowerShell. And of
course, you need Python software installed on your computer.

Sometimes we will be providing tips for the cloud, but cloud access or
knowledge is not in any way a requirement for reading this book. If you are
interested in cloud approaches, then you are expected to know how to do
basic operations like creating instances or accessing the storage of your cloud
provider. The book presents examples using Amazon AWS, but they should
be easily transposable to other cloud providers.

While you do not have to be, at all, academically trained in the field, a basic
notion of complexity costs will be helpful. For example, the intuitive notion
that algorithms that scale linearly with data are better than algorithms that
scale exponentially.

If you plan on using GPU optimizations, you are not expected to know
anything at this stage.

1.5.1 Setting up the software

Before you continue with this book, be sure to check appendix A for a
description of options to set up your environment.

1.6 Summary
Yes, the cliche is true: there is a lot of data and we have to increase the
efficiency in processing it if we want to stand a chance to extract the
most value from it.
Increased algorithm complexity adds an extra strain to computation cost
and we will have to find ways to mitigate computational impact.
There is a large heterogeneity of computing architectures: the network
now also includes cloud-based approaches. Inside our computers there
are now powerful GPUs whose computing paradigm is substantially
different from CPUs. We need to be able to harness those.
Python is an amazing language for data analysis surrounded by a
complete ecology of data processing libraries and frameworks. But it
also suffers from serious problems on the performance side. We will
need to be able to circumvent those problems in order to process lots of
data with sophisticated algorithms.
While some of the problems that we will be dealing with can be hard, they
are mostly solvable. The goal of this book is to introduce you to plenty
of alternative solutions, and teach you how and where each one is best
applied, so you can choose and implement the most efficient solution
for any problem you encounter.
2 Extracting maximum performance from built-in features
This chapter covers

Profiling code to find speed and memory bottlenecks
Making more efficient use of existing Python data structures
Understanding Python's memory cost of allocating typical data structures
Using lazy programming techniques to process large amounts of data

There are many tools and libraries to help us write more efficient Python.
But before we dive into all the external options to improve performance,
let’s first take a closer look at how we can write pure Python code that is
more efficient, in both computing and IO performance. Indeed many, though
certainly not all, Python performance problems can be solved by being more
mindful of Python’s limits and capabilities.

To demonstrate Python's own tools for improving performance, let's use
them on a hypothetical, though realistic, problem. Let's say you are a data
engineer tasked with preparing the analysis of climate data around the world.
The data will be based on the Integrated Surface Database from the US
National Oceanic and Atmospheric Administration (NOAA), from
https://www.ncei.noaa.gov/products/land-based-station/integrated-surface-
database. You are on a tight deadline and you will only be able to use
mostly standard Python; furthermore, buying more processing power is out of
the question due to budgetary constraints. The data will start to arrive in one
month and you plan on using the time before it arrives to increase code
performance. Your task, then, is to find the places in need of optimization,
and to increase their performance.

The first thing that you want to do is to profile the existing code that will
ingest the data. You know that the code that you already have is slow, but
before you try to optimize it you need to find empirical evidence for where
the bottlenecks are. Profiling is important because it allows us to search, in a
rigorous and systematic way, for bottlenecks in our code. The most common
alternative—guesstimating—is particularly ineffective here because many
slowdown points can be quite unintuitive.

Optimizing pure Python code is the low-hanging fruit, and also where most
problems tend to reside, so it will generally be very impactful. In this chapter
we will see what pure Python offers out of the box to help us develop more
performant code. We will start by profiling the code, using several profiling
tools, to detect problem areas. Then we will focus on Python's basic data
structures: lists, sets, and dictionaries. Our goal here will be to improve the
efficiency of these data structures and to allocate memory to them in the best
way for optimal performance. Finally, we will see how modern Python lazy
programming techniques might help us improve the performance of the
data pipeline.
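
To preview the lazy techniques, here is a minimal sketch contrasting an
eager list with a generator, using a made-up line format (the real pipeline
appears later in this chapter):

# Eager: materializes every value in memory before anything is consumed.
def temperatures_eager(lines):
    return [float(line.split(",")[1]) for line in lines]

# Lazy: yields one value at a time, so arbitrarily large inputs can be
# processed with constant memory.
def temperatures_lazy(lines):
    for line in lines:
        yield float(line.split(",")[1])

lines = ["s1,10.0", "s1,-3.5", "s1,21.2"]
print(min(temperatures_lazy(lines)))  # consumes one item at a time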

This chapter will only discuss optimizing Python without external libraries,
but we will still use some external tools to help us optimize performance and
access data. We will be using SnakeViz to visualize the output of Python
profiling. We will also use line_profiler to profile code line-by-line. Finally,
we will use the requests library to download data from the Internet.

If you use Docker, the default image has all you need. If you used the
instructions for Anaconda Python from Appendix A, you are all set.

Let's now start by downloading data from weather stations and studying
the temperature at each station.

2.1 Profiling applications with both IO and computing workloads

Our first objective will be to download data from weather stations and get
the minimum temperature for a certain year at each station.

Data on NOAA's site is organized in CSV files, one per year and then per
station. For example, the file
https://www.ncei.noaa.gov/data/global-hourly/access/2021/01494099999.csv
has all entries for station 01494099999 for the year 2021. This includes,
among other entries, temperature, pressure and wind, potentially recorded
several times a day.

Let’s develop a script to download the data for a set of stations on an interval
of years. After downloading the data of interest we will get the minimum
temperature for each station.

2.1.1 Downloading data and computing minimum temperatures

Our script will have a simple command line interface, where we pass a list of
stations and an interval of years of interest. Here is the code to parse the
input (the code below can be found in 02-python/sec1-io-cpu/load.py):
import collections
import csv
import datetime
import sys

import requests

stations = sys.argv[1].split(",")
years = [int(year) for year in sys.argv[2].split("-")]
start_year = years[0]
end_year = years[1]

Here is the code to download the data from the server; to ease the coding
part, we will be using the requests library to actually get the file:

TEMPLATE_URL = "https://www.ncei.noaa.gov/data/global-hourly/access/{year}/{station}.csv"
TEMPLATE_FILE = "station_{station}_{year}.csv"

def download_data(station, year):
    my_url = TEMPLATE_URL.format(station=station, year=year)
    req = requests.get(my_url)  #1
    if req.status_code != 200:
        return  # not found
    w = open(TEMPLATE_FILE.format(station=station, year=year), "wt")
    w.write(req.text)
    w.close()

def download_all_data(stations, start_year, end_year):
    for station in stations:
        for year in range(start_year, end_year + 1):
            download_data(station, year)

The code above will write each downloaded file to disk for all the requested
stations across all years.

Now let's get all the temperatures in a single file:

def get_file_temperatures(file_name):
    with open(file_name, "rt") as f:
        reader = csv.reader(f)
        header = next(reader)
        for row in reader:
            station = row[header.index("STATION")]
            # date = datetime.datetime.fromisoformat(row[header.index('DATE')])
            tmp = row[header.index("TMP")]
            temperature, status = tmp.split(",")  #1
            if status != "1":  #2
                continue
            temperature = int(temperature) / 10
            yield temperature

Let's now get all temperatures and compute the minimum temperature per station:

def get_all_temperatures(stations, start_year, end_year):
    temperatures = collections.defaultdict(list)
    for station in stations:
        for year in range(start_year, end_year + 1):
            for temperature in get_file_temperatures(
                    TEMPLATE_FILE.format(station=station, year=year)):
                temperatures[station].append(temperature)
    return temperatures

def get_min_temperatures(all_temperatures):
    return {station: min(temperatures)
            for station, temperatures in all_temperatures.items()}
Now we can tie everything together: download the data, get all temperatures,
compute the minimum per station and print the results.

download_all_data(stations, start_year, end_year)
all_temperatures = get_all_temperatures(stations, start_year, end_year)
min_temperatures = get_min_temperatures(all_temperatures)
print(min_temperatures)

For example to load the data for stations 01044099999 and 02293099999 for
the year 2021 we do:
python load.py 01044099999,02293099999 2021-2021

The output being:


{'01044099999': -10.0, '02293099999': -27.6}

Now the real fun will start: as we want to be able to download lots of
stations for many years, we want to make the code as efficient as possible,
and for that we will use Python's built-in profiling machinery.

2.1.2 Using Python’s built-in profiling module

As we want to make sure our code is as efficient as possible, the first thing
we need to do is to find existing bottlenecks in that code. Our first port of
call will be profiling the code to check each function's time consumption. For
this we run the code via Python's cProfile module. This module is built
into Python and allows us to obtain profiling information from our code.
Make sure you do not use the profile module, as it is orders of magnitude
slower; it's only useful if you are developing profiling tools yourself.

We can run:

python -m cProfile -s cumulative load.py 01044099999,02293099999 2021-2021 > profile.txt

Remember that running python with the -m flag will execute the
module, so here we are running the cProfile module. This is Python's
recommended module to gather profiling information. We are asking for
profile statistics ordered by cumulative time. The easiest way to use the
module is by passing our script to the profiler in a module call like this.
375402 function calls (370670 primitive calls) in 3.061 seconds  #1

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    158/1    0.000    0.000    3.061    3.061 {built-in method builtins.exec}
        1    0.000    0.000    3.061    3.061 load.py:1(<module>)
        1    0.001    0.001    2.768    2.768 load.py:27(download_all_data)  #2
        2    0.001    0.000    2.766    1.383 load.py:17(download_data)
        2    0.000    0.000    2.714    1.357 api.py:64(get)
        2    0.000    0.000    2.714    1.357 api.py:16(request)
        2    0.000    0.000    2.710    1.355 sessions.py:470(request)
        2    0.000    0.000    2.704    1.352 sessions.py:626(send)
     3015    0.017    0.000    1.857    0.001 socket.py:690(readinto)
     3015    0.017    0.000    1.829    0.001 ssl.py:1230(recv_into)
[...]
        1    0.000    0.000    0.000    0.000 load.py:58(get_min_temperatures)  #3

The output is ordered by cumulative time, which is all the time spent inside a
certain function. Another output is the number of calls per function. For
example, there is only a single call to download_all_data (which takes care
of downloading all data) but its cumulative time is almost equal to the total
time of the script. You will notice two columns called percall. The first one
states the time spent on the function excluding the time spent on all sub-
calls. The second one includes the time spent on sub-calls. In the case of
download_all_data it is clear that most time is actually consumed by some
of the sub-functions.

In many cases, when you have some intensive form of I/O like here, there is
a strong possibility that I/O dominates in terms of time needed. In our case
we have both network I/O—getting the data from NOAA—and disk I/O—
writing it to disk. Network costs can vary widely, even between runs, as they
are dependent on many connection points along the way.

As network costs are normally the biggest time sink, let’s try to mitigate
those.

2.1.3 Using local caches to reduce network usage

To reduce network communication, we will save a copy for future use when
we download a file for the first time. We will build a local cache of data.

We will use the same code as above, save for the function
download_all_data. The implementation below can be found in
02-python/sec1-io-cpu/load_cache.py.

import os

def download_all_data(stations, start_year, end_year):
    for station in stations:
        for year in range(start_year, end_year + 1):
            if not os.path.exists(TEMPLATE_FILE.format(station=station, year=year)):  #1
                download_data(station, year)

The first run of the code will take the same time as the solution above, but a
second run will not require any network access. For example, given the same
run as above, it goes from 2.8s to 0.26s: more than an order of magnitude
faster. Remember that due to the high variance of network access, the time
to download files can vary substantially in your case: this is yet another
reason to consider caching network data—you get a more predictable
execution time.

python -m cProfile -s cumulative load_cache.py 01044099999,02293099999 2021-2021 > profile_cache.txt

Now the result is different in where time is consumed:

299938 function calls (295246 primitive calls) in 0.260 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    156/1    0.000    0.000    0.260    0.260 {built-in method builtins.exec}
        1    0.000    0.000    0.260    0.260 load_cache.py:1(<module>)
        1    0.008    0.008    0.166    0.166 load_cache.py:51(get_all_temperatures)
    33650    0.137    0.000    0.156    0.000 load_cache.py:36(get_file_temperatures)
[...]
        1    0.000    0.000    0.001    0.001 load_cache.py:60(get_min_temperatures)

While the time to run decreased by an order of magnitude, IO is still at the
top: now it's not the network, but disk access. This is mostly because the
computation itself is actually light.

Warning

Caches, as this example shows, can speed up code by orders of magnitude, so we will revisit caches in other parts of the book. But cache management can be problematic and is a common source of bugs. In our example the files never change over time, but there are many use cases for caches where the source might be changing. In that case the cache management code needs to be cognizant of that issue.
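
To make that concrete, here is a minimal sketch of one common invalidation strategy—a time-to-live check based on the file's modification time. This is not from the book's code; CACHE_TTL_SECONDS and is_cache_fresh are hypothetical names:

import os
import time

CACHE_TTL_SECONDS = 24 * 3600  # hypothetical freshness window of one day

def is_cache_fresh(path):
    # Treat a cached file as valid only if it exists and is recent enough;
    # otherwise the caller should re-download it.
    if not os.path.exists(path):
        return False
    age = time.time() - os.path.getmtime(path)
    return age < CACHE_TTL_SECONDS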

We are now going to consider a case where CPU is the limiting factor.

2.2 Profiling code to detect performance bottlenecks

Here we are going to look at code where CPU is the resource costing the most time in a process. We are going to take all stations in the NOAA database and compute the distance between all of them—a problem of complexity n².

In the repository you will find a file—02-python/sec2-cpu/locations.csv—with all the geographical coordinates of the stations. The code presented here is available in 02-python/sec2-cpu/distance_cache.py:

import csv
import math

def get_locations():
    with open("locations.csv", "rt") as f:
        reader = csv.reader(f)
        header = next(reader)
        for row in reader:
            station = row[header.index("STATION")]
            lat = float(row[header.index("LATITUDE")])
            lon = float(row[header.index("LONGITUDE")])
            yield station, (lat, lon)

def get_distance(p1, p2):  #1
    lat1, lon1 = p1
    lat2, lon2 = p2

    lat_dist = math.radians(lat2 - lat1)
    lon_dist = math.radians(lon2 - lon1)
    a = (
        math.sin(lat_dist / 2) * math.sin(lat_dist / 2) +
        math.cos(math.radians(lat1)) *
        math.cos(math.radians(lat2)) *
        math.sin(lon_dist / 2) * math.sin(lon_dist / 2)
    )
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    earth_radius = 6371
    dist = earth_radius * c

    return dist

def get_distances(stations, locations):
    distances = {}
    for first_i in range(len(stations) - 1):  #2
        first_station = stations[first_i]
        first_location = locations[first_station]
        for second_i in range(first_i, len(stations)):  #2
            second_station = stations[second_i]
            second_location = locations[second_station]
            distances[(first_station, second_station)] = get_distance(
                first_location, second_location)
    return distances

locations = {station: (lat, lon) for station, (lat, lon) in get_locations()}
stations = sorted(locations.keys())
distances = get_distances(stations, locations)

The code above will take a long time to run. It also takes a lot of memory. If you have memory issues, limit the number of stations that you are processing.

Let’s now use Python’s profiling infrastructure to see where most time is
spent.

2.2.1 Visualizing profiling information

Here we are again going to use Python's profiling infrastructure to find pieces of code that are delaying execution. But in order to better inspect the trace, we are going to use an external visualization tool, SnakeViz—https://jiffyclub.github.io/snakeviz/.

We start by saving a profile trace:

python -m cProfile -o distance_cache.prof distance_cache.py

The -o parameter specifies the file where the profiling information will be stored; after that comes the call to our code as usual.

A Python-provided module to analyze profiling information

Python provides the pstats module to analyze traces written to disk. You can do python -m pstats distance_cache.prof, which will start a command-line interface to analyze the cost of our script. You can find more information about this module in the Python documentation or in the profiling section of chapter 5.
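
If you prefer to stay inside a script rather than the interactive interface, a minimal sketch of reading the same trace with pstats (assuming distance_cache.prof was generated as above) could look like this:

import pstats

# Load the trace saved by cProfile and print the ten most expensive
# entries, sorted by cumulative time.
stats = pstats.Stats("distance_cache.prof")
stats.sort_stats("cumulative").print_stats(10)
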
To analyze this information we will use a web-based visualization tool called
SnakeViz. You just need to do snakeviz distance_cache.prof. This will
start an interactive browser window (Figure 2.1 shows a screenshot).

Figure 2.1. Using SnakeViz to inspect profiling information of our script.

Familiarizing yourself with the SnakeViz interface

This would be a good time to play with the interface a bit. For example, you can change the style from Icicle to Sunburst (arguably cuter, but with less information, as the file name disappears). Re-order the table at the bottom. Check the Depth and Cutoff entries. Do not forget to click on some of the colored blocks, and finally return to the main view by clicking on Call Stack and choosing the 0 entry.

Most of the time is spent inside the function get_distance, but exactly where? We are able to see the cost of some of the math functions, but Python's profiling doesn't allow us to have a fine-grained view of what happens inside each function. We only get aggregate views for each trigonometric function: yes, there is some time spent in math.sin, but given that we use it in several lines, where exactly are we paying a steep price? For that we need to recruit the help of the line profiling module.

2.2.2 Line profiling

Built-in profiling, like we used above, allowed us to find the piece of code that was causing a massive delay. But there are limits to what we can do with it. We are going to discuss those limits here and introduce line profiling as a way to find further performance bottlenecks in our code.

To understand the cost of each line of get_distance we will use the line_profiler package, which is available at https://github.com/pyutils/line_profiler. Using the line profiler is quite easy: you just need to add an annotation to get_distance:
@profile
def get_distance(p1, p2):

You might have noticed that we have not imported the profile annotation
from anywhere. This is because we will be using the convenience script
kernprof from the line_profiler package that will take care of this.

Let’s then run the line profiler in our code:

kernprof -l lprofile_distance_cache.py

Be prepared for the instrumentation required by the line profiler to slow the
code substantially, by several orders of magnitude. Let it run for a minute or
so, and after that interrupt it: kernprof would probably run for many hours
if you let it complete. If you interrupt it, you will still have a trace.
After the profiler ends, you can have a look at the results with:

python -m line_profiler lprofile_distance_cache.py.lprof

If you look at the output below, you can see that it has many lines that take quite some time. So we will probably want to optimize that code. At this stage, as we are discussing only profiling, we will stop here, but afterwards we would need to optimize those lines (and we will do so later in this chapter). If you are interested in optimizing this exact piece of code, have a look at the Cython chapter or the Numba appendix, as they provide the most straightforward avenues to increase the speed.

Listing 2.1. The output of the line_profiler package for our code

Timer unit: 1e-06 s

Total time: 619.401 s  #1
File: lprofile_distance_cache.py
Function: get_distance at line 16

Line #      Hits         Time  Per Hit  % Time  Line Contents  #2
==============================================================
    16                                          @profile
    17                                          def get_distance(p1, p2):
    18  84753141   36675975.0     0.4     5.9       lat1, lon1 = p1
    19  84753141   35140326.0     0.4     5.7       lat2, lon2 = p2
    20
    21  84753141   39451843.0     0.5     6.4       lat_dist = math.radians(lat2 - lat1)
    22  84753141   38480853.0     0.5     6.2       lon_dist = math.radians(lon2 - lon1)
    23  84753141   28281163.0     0.3     4.6       a = (
    24 169506282   84658529.0     0.5    13.7           math.sin(lat_dist / 2) * math.sin(lat_dist / 2) +
    25 254259423  118542280.0     0.5    19.1           math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *
    26 169506282   81240276.0     0.5    13.1           math.sin(lon_dist / 2) * math.sin(lon_dist / 2)
    27                                              )
    28  84753141   65457056.0     0.8    10.6       c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    29  84753141   29816074.0     0.4     4.8       earth_radius = 6371
    30  84753141   33769542.0     0.4     5.5       dist = earth_radius * c
    31
    32  84753141   27886650.0     0.3     4.5       return dist

Hopefully you will find line_profiler's output substantially more intuitive than the output from the built-in profiler.

As we've seen, overall built-in profiling is a big help as a first approach; it is also substantially faster than line profiling. But line profiling is significantly more informative, mostly because built-in Python profiling doesn't provide a breakdown inside the function. Instead, Python's profiling only provides cumulative values per function, as well as showing how much time is spent on sub-calls. In specific cases it is possible to know if a sub-call belongs to another function, but in general that is not possible. An overall strategy for profiling needs to take all this into account.

With that in mind, our profiling approach can be summarized as follows: first try the built-in Python profiling module cProfile, because it is fast and does provide some high-level information. If that is not enough, use line profiling, which is more informative but also slower. Remember, here we are mostly concerned with locating bottlenecks; later chapters will provide ways to actually optimize the code. Sometimes just changing parts of an existing solution is not enough and a general re-architecting will be necessary; we will also discuss that in due time.

Other profiling tools

There are many other utilities that can be useful if you are profiling code, but a profiling section would not be complete without a reference to one of them: the timeit module. This is probably the most common approach that newcomers take to profile code, and you can find endless examples using the timeit module on the Internet. The easiest way to use the timeit module is from IPython or a Jupyter Notebook, as these systems make timeit very streamlined. Just add the %timeit magic to what you want to profile, for example inside IPython:
In [1]: %timeit list(range(1000000))
27.4 ms ± 72.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [2]: %timeit range(1000000)
189 ns ± 22.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

This gives you the run time of several runs of the expression that you are profiling. The magic will decide how many times to run it and then report basic statistical information. Above you have the difference between range(1000000) and list(range(1000000)). In this specific case, timeit shows that the lazy version of range is several orders of magnitude faster than the eager one.

You will be able to find much more detail in the documentation of the timeit module, but for most use cases the %timeit magic of IPython will be enough to access its functionality. You are encouraged to use IPython and its magics, but in most of the rest of the book we will use the standard interpreter. You can read more about the %timeit magic here: https://ipython.readthedocs.io/en/stable/interactive/magics.html.
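
If you are working in the standard interpreter rather than IPython, a roughly equivalent measurement can be taken with the timeit module directly. This is a minimal sketch; the repetition counts below are arbitrary choices rather than the adaptive values %timeit would pick:

import timeit

# Average seconds per run over a fixed number of repetitions.
eager = timeit.timeit("list(range(1000000))", number=100) / 100
lazy = timeit.timeit("range(1000000)", number=100_000) / 100_000
print(eager, lazy)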

Now that we have introduced profiling tools, let's direct our attention to a different subject: optimizing the usage of Python data structures.

2.3 Optimizing basic data structures for speed: lists, sets, dictionaries

Here we will try to find inefficient uses of Python's basic data structures and rewrite pieces of code in a more efficient way. We will continue to use our example. In this case we are going to try to determine whether a certain temperature occurred in a station during a specified time interval.

We will re-use the code from the first section of the chapter to read the data. The code can be found in 02-python/sec3-basic-ds/exists_temperature.py.

We will be using data from station 01044099999 for years 2005-2021:

stations = ['01044099999']
start_year = 2005
end_year = 2021
download_all_data(stations, start_year, end_year)
all_temperatures = get_all_temperatures(stations, start_year, end_year)
first_all_temperatures = all_temperatures[stations[0]]

first_all_temperatures holds the list of temperatures for the station. We can get some basic stats with print(len(first_all_temperatures), max(first_all_temperatures), min(first_all_temperatures)). We have 141,082 entries, with a maximum of 27.0 C and a minimum of -16.0 C.

2.3.1 Performance of list searches

Checking if a temperature is in the list is a matter of writing temperature in first_all_temperatures. Let's get a rough estimate of how much time it takes to check if -10.7 is in the list:

%timeit (-10.7 in first_all_temperatures)

The output on my computer is:

313 µs ± 6.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Let’s try with a value that we know is not on the list:

%timeit (-100 in first_all_temperatures)

The result being:

2.87 ms ± 20.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is roughly one order of magnitude slower than our search for -10.7.

Why such low performance? The in operator does a sequential scan starting from the beginning, which means that in the worst case the whole list has to be searched. That is exactly what happens when the element we are looking for is not in the list, which explains why the search for -100 is slower than the search for -10.7. For small lists this is not relevant, but as the list grows, and with it the number of searches you might have to do, this can become burdensome. There are substantially better algorithms with regard to time complexity.

At this stage we have no number to compare against, but it's safe to assume that the millisecond, and even the microsecond, range is not very encouraging. This should be doable in orders of magnitude less time.

2.3.2 Searching using sets

But can we do even better? Let's convert our list into a set and try to do a search.
set_first_all_temperatures = set(first_all_temperatures)

%timeit (-10.7 in set_first_all_temperatures)
%timeit (-100 in set_first_all_temperatures)

With the time costs being:

62.1 ns ± 3.27 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
26.6 ns ± 0.115 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

This is several orders of magnitude faster than the solutions above! Why is this? There are two main reasons: one related to set size and another related to complexity. The complexity part will be discussed in the next sub-section.

With regard to the size, remember that the original list had 141,082 elements. But with a set, all repeated values are collapsed into a single value, and there are plenty of repeated elements in the original list. The set size, as print(len(set_first_all_temperatures)) shows, is down to 400 elements: around 350 times fewer. No wonder searching is so much faster, as the structure is much smaller.

Beyond being aware of repeated elements in a list, and the potential advantage of using a set to get a much smaller data structure, there is also a much deeper difference between the implementation of lists and sets in Python…

2.3.3 List, set and dictionary complexity in Python

The performance gain in the example above was mostly based on the de facto reduction in size of the data structure when we switched from a list to a set. What would happen if there were no repetition, and hence the list and the set were the same size? We can simulate this trivially with a range, as all elements will be different:

a_list_range = list(range(100000))
a_set_range = set(a_list_range)

%timeit 50000 in a_list_range
%timeit 50000 in a_set_range
%timeit 500000 in a_list_range
%timeit 500000 in a_set_range

We have a range of 0 to 99,999 that is implemented as both a list and a set. We search both data structures for 50,000 and 500,000. Here are the timings:

455 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
40.1 ns ± 0.115 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
936 µs ± 9.37 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
28.1 ns ± 0.107 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The set implementation that we used has much better performance. That is because in Python (to be more precise, in CPython) a set is implemented with a hash table. Finding an element thus has the cost of a hash lookup. Hash functions come in many flavors and have to deal with many design issues, but when comparing lists and sets we can assume that set lookup is roughly constant—it will perform as well with a collection of size 10 as with 10 million. This is not strictly correct, but as an intuition to compare against list lookups it is reasonable.

A set is mostly implemented like a dictionary without values, which means that when you search on a dictionary key, you get the same performance as searching in a set.
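
As a quick illustration of that point, here is a sketch reusing the temperature data from above (the exact timings will of course vary on your machine):

dict_temperatures = {t: None for t in first_all_temperatures}

%timeit (-10.7 in dict_temperatures)
%timeit (-10.7 in set_first_all_temperatures)

Both lookups should land in the same tens-of-nanoseconds range, since both are hash lookups.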

However, sets and dictionaries are not the silver bullet that they might seem here. For example, if you want to search an interval, then an ordered list is substantially more efficient: in an ordered list you can find the lowest element and then traverse from that point up until you find the first element above the interval, and then stop. In a set or dictionary you would have to do a lookup for each element in the interval. So, if you know the value you are searching for, then a dictionary can be extremely fast. But if you are looking in an interval, then it suddenly stops being a reasonable option: an ordered list with a bisection algorithm would perform much better, as sketched below.
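
Here is a minimal sketch of that bisection approach using the standard library's bisect module; values_in_interval is a hypothetical helper, not from the book's code:

import bisect

def values_in_interval(sorted_values, low, high):
    # bisect_left/bisect_right locate the slice boundaries in O(log n);
    # extracting the slice then costs only as much as the number of matches.
    start = bisect.bisect_left(sorted_values, low)
    end = bisect.bisect_right(sorted_values, high)
    return sorted_values[start:end]

sorted_temperatures = sorted(first_all_temperatures)
in_interval = values_in_interval(sorted_temperatures, -5.0, 5.0)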

Given that lists are so pervasive and easy to use in Python, there are many cases where more appropriate data structures exist, but it is worth stressing that lists are a fundamental data structure with many good use cases. The point is to be mindful of choices, not to banish lists.

Tip

Be careful when using in to search inside large lists. If you browse through Python code, the pattern of using in to find elements in a list (the index method of list objects is in practice the same thing) is quite common. This is not a problem for small lists, as the time penalty is quite small and it's perfectly reasonable, but it can be serious with large lists.

From a very down-to-earth software engineering perspective, the use of in with lists can go from an unnoticed issue in development to a massive problem in production. The common pattern is a developer testing with small data examples, because feeding big data is normally not practical in most unit testing. The real data might be very large, however, and once it's introduced, it could bring a production system to a halt.

A more systematic solution would be to test the code—maybe not always, but at least from time to time—with very large datasets. This can occur in different stages of testing, from unit to end-to-end testing.

This should not be construed as an argument against using in with lists. Just be mindful about the discrepancies between performance during development and production due to data size.

For most searching operations, there is a substantially better family of data structures than lists, sets, or dictionaries: trees. But in this chapter we are evaluating Python's built-in data structures.

With regard to Python's fundamental data structures, we will delay the discussion of tuples until we discuss concurrency in the next chapter.

The whole topic of choosing appropriate algorithms and data structures is the subject of many books, and typically of some of the most difficult courses in Computer Science degrees. The point here is not to have an exhaustive discussion of the topic, but to make you aware of the most common alternatives in Python. If you believe existing Python data structures are not enough for your needs, you may want to consider other types of data structures. This book's focus is on Python, but other resources cover data structures beyond Python; for example, Data Structures and Algorithms in Python by Michael T. Goodrich, Roberto Tamassia, and Michael H. Goldwasser (Wiley, 2013) provides a good introduction.

You can find the time complexity of many operations over many existing Python data structures at https://wiki.python.org/moin/TimeComplexity.

Now that we have mostly discussed time performance, let's discuss another important performance issue with big datasets: conserving memory.

2.4 Finding excessive memory allocation

Memory consumption can be important for performance: it's not only that you might run out of memory, but effective memory allocation can allow more processes to run in parallel on the same machine. Most importantly, judicious memory use might allow for in-memory algorithms.

Going back to our scenario, we decided that we want to reduce the disk
consumption of our data. For that we are going to start with a study of the
content of the data files. Our objective is to load a few of them and do some
statistics on character distributions.

def download_all_data(stations, start_year, end_year):
    for station in stations:
        for year in range(start_year, end_year + 1):
            if not os.path.exists(
                    TEMPLATE_FILE.format(station=station, year=year)):
                download_data(station, year)

def get_all_files(stations, start_year, end_year):
    all_files = collections.defaultdict(list)
    for station in stations:
        for year in range(start_year, end_year + 1):
            f = open(TEMPLATE_FILE.format(station=station, year=year), 'rb')
            content = list(f.read())
            all_files[station].append(content)
            f.close()
    return all_files

stations = ['01044099999']
start_year = 2005
end_year = 2021
download_all_data(stations, start_year, end_year)
all_files = get_all_files(stations, start_year, end_year)

all_files is now a dictionary where each item contains the contents of all the files related to a station. Let's study the memory usage of this data structure.

2.4.1 Navigating the minefield of Python memory estimation

Python provides a function in the sys module, getsizeof, that supposedly returns the memory occupied by an object. We can get a sense of the memory occupied by our dictionary by doing:

print(sys.getsizeof(all_files))
print(sys.getsizeof(all_files.values()))
print(sys.getsizeof(list(all_files.values())))

The result being:

240
40
64

getsizeof might not return what you expect. The files on the disk are in the MB range, so estimates below 1KB sound quite suspicious. getsizeof is actually returning the size of the containers (the first is a dictionary, the second is an iterator, and the third is a list) without accounting for the content. So, we have to account for two things occupying memory: the content and the container itself.

Note

Note that there is no problem with the getsizeof implementation in the language; it is just that the expectation of an unsuspecting user is typically of something different—namely, that it would return the memory footprint of everything referenced by the object. If you read the official documentation you will even find a pointer to a recursive implementation that solves most problems.

For us, the intricacies of getsizeof are mostly a starting point to discuss CPython memory allocation in depth.
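
To make the note above concrete, here is a rough sketch of what such a recursive estimate could look like. This is a simplified illustration (deep_sizeof is a hypothetical helper), not the exact implementation the documentation links to:

import sys

def deep_sizeof(obj, seen=None):
    # Track already-visited object ids so shared objects are counted once.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

print(deep_sizeof(all_files))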

Let's get some basic information about our station data:

station_content = all_files[stations[0]]
print(len(station_content))
print(sys.getsizeof(station_content))

The output is:

17
248

Our dictionary has only one entry, corresponding to a single station. It contains a list with 17 entries. The list itself takes 248 bytes, but remember that this doesn't include the content.

Now let's inspect the size of the first entry:

print(len(station_content[0]))
print(sys.getsizeof(station_content[0]))
print(type(station_content[0]))

The length is 1,303,981, corresponding to the size of the file. We get a size of 10,431,904. This is around 8 times the size of the underlying file. Why 8 times? Because each entry is a pointer to a character, and a pointer is 8 bytes in size. At this stage this looks quite bad, as we have a large data structure and we haven't yet accounted for the data proper. Let's have a look at a single character:

print(sys.getsizeof(station_content[0][0]))
print(type(station_content[0][0]))

This is colossal. The output is 28, with a type of int. So every character, which should take only 1 byte, is represented by 28 bytes. We hence have 10,431,904 for the size of the list, plus 28 * 1,303,981 (36,511,468), for a grand total of 46,943,372 bytes. This is 36 times bigger than the original file!

Fortunately the situation is not as bad as it seems, and we can also do much better. We will start by seeing that Python—or rather CPython—is quite smart with memory allocation.

CPython is able to allocate objects in a more sophisticated way, and it turns out that our approach to computing memory allocation is quite naive. Let's compute the size of only the inner content, but instead of naively going through all the integers in our list, we will make sure that we are not double counting: in Python, if the same object is used many times it has the same id. So if we see the same id many times, we should only count a single memory allocation.
single_file_data = station_content[0]
all_ids = set()
for entry in single_file_data:
    all_ids.add(id(entry))  #1
print(len(all_ids))

The code above gets the unique identifier of each of our numbers. In CPython that happens to be the memory location. CPython is smart enough to see that the same values are being used over and over again—remember that each ASCII character is represented by an integer between 0 and 127—and as such the output of the code above is 46.

So, dumb allocation of memory would be dreadful, but Python, or to be more precise CPython, is much smarter. The memory cost of this solution is "just" the list infrastructure (10,431,904 bytes). Note that in our case we only have 46 distinct characters, and with such a small subset Python is quite good at doing smart memory allocation. Do not expect this best-case scenario to occur every time, as it will depend on your data pattern.

Object caching and reuse in Python

Python tries to be as smart as possible with object re-use, but we need to be careful with expectations. The first reason is that this is implementation dependent: CPython is different from other Python implementations in terms of this behavior.

Another reason is that even CPython makes no promises about most of its allocation policies from version to version. What works for your specific version might change in a different version.

Finally, even if you have a fixed version, how things work might not be completely obvious. Consider this code in Python 3.7.3 (this might vary on other versions):

s1 = 'a' * 2
s2 = 'a' * 2 #1
s = 2
s3 = 'a' * s #2
s4 = 'a' * s
print(id(s1))
print(id(s2))
print(id(s3))
print(id(s4))
print(s1 == s4) #3

The result will be:

140002256425568
140002256425568
140002256425904
Random documents with unrelated
content Scribd suggests to you:
ground was incrustated with salt; all the nullahs were white with it,
and to all appearance we were leaving bad for worse.

At sunset, after which time it would have been impossible to


proceed and when most had given up hope, we came to a nullah
running down from the north, and to the surprise and delight of all
we found good water a few feet below the surface, and a small
quantity of boortsa on the adjacent hills. A strong north wind blew
hard during the night, which made us wonder how our tent ever
withstood the tension. Two or three miles further on from this place,
we came to the bed of a salt lake partially dried up. Here again
misfortune overtook us, for some of the animals got bogged, and
nothing but an absolute desolation of salt land still loomed ahead of
us. The going became so heavy that poor Sulloo on his pony, being
unable to keep up, was left miles behind. It is impossible to picture
such a barren land as we were in, and it seemed as though there
would never be an end to it as long as we pursued our eastern
course. We therefore struck a more northerly one, and after
eventually getting beyond the salt belt marched east again. In some
places we noticed a large amount of yellow soil and in others of
bright red. In spite of our manœuvre we came to another dried salt
lake, a disheartening obstruction, and when our doubled exertions
seemed to be hopeless and our trials at their worst, we saw through
our glasses, some considerable way off, a small patch of grass on a
bit of rising ground. We were at once inspired with new life and
marched straight for this harbour. The grass we reached grew at the
foot of a nullah that led over an easy pass, so we allowed the mules
to enjoy a few minutes' grazing before commencing the ascent. This
grass was quite green, and the joy of the animals at meeting with
such food was clearly manifest by the avaricious way in which they
tore it off, and the marvellously quick way in which they ate it. Later
on we came to a broad, sandy nullah, with abundance of good water
just below the surface. The nullah, too, was itself well sheltered from
the cold winds by the higher ground on all sides, where splendid
grass was sprouting. Antelope had made no mistake in choosing this
as one of their haunts to wander into. In order to counteract the
results due to such depressing and demoralizing country as we had
just passed through, and as Sulloo and Tokhta were still in the rear,
we determined to make the most of our opportunity and halt for half
of the following day.

MALCOLM AT BREAKFAST WITH ESAU.

This was now the 10th of July and we had reached Camp 51. We
were well repaid for our decision, for the following morning was
perfectly glorious—not a cloud, not a breath of wind was there to
mar the quietude that man and beast at this time so much needed.
To commemorate the occasion, I photographed Malcolm enjoying his
breakfast just outside the tent, with Esau standing by the other side
of the table, holding in his hand a dish of luxuries!
About midday, Tokhta, Sulloo, and the pony walked slowly into
camp. They persisted that nothing on earth would induce them to
travel onwards another step; poor fellows, they had reached what
seemed to them a perfect haven of rest; they must have felt
thoroughly worn out, for all they wanted to do was to remain where
they were and quietly die. It was quite certain that it would have
been madness for us to remain with them, for only a few more days'
rations remained, and our only chance of getting through the
country at all lay in our coming across nomads from whom by hook
or crook we could get supplies. We did think of leaving some men
behind, while a small party marched on as fast as possible with light
loads in search of people, but these men did not relish being left,
and supposing there were no people to find, our situation would
have been still more critical. We ended our problem by leaving the
two sick men with a pony and a supply of food and drinking utensils,
etc., so that if they felt inclined they might follow after, for they
would have found no difficulty in tracking us. We buoyed them up,
too, with the hopes we entertained of shortly finding people, when
we would at once send back assistance to them. We also
endeavoured to persuade them to make an effort in reaching a fresh
camp each day, by marching and halting according to their
inclination, for we told them we should only make short marches,
and at each camp we would leave a supply of food for them and
some grain for the pony. It was a sad thing having to leave these
men and the pony as we did, and when we halted for the night and
the sun began to set calmly over these vast solitudes, there was no
sign of their coming, look back as we might to the far-off hills for
some tiny, distant, yet moving, speck. The darkness of night soon
gathered around, and we could only wonder how close they might
be to us. The next day we saw new life, for Malcolm had a shot at a
wild dog, while I saw two eagles; such sights as these at once set
our imagination at work, for we argued as to how could these
creatures exist unless people were living somewhere close. At the
same time it brought encouragement to all.

Towards evening, after making two short marches during the


day, we camped south of the snow range we had been steering for,
but there were no signs to tell us that the three abandoned ones
were following. More food and grain was left here, and we moved off
soon after 4 a.m. It is, as some will know, chilly work sallying forth
before sunrise when the minimum registers over twenty degrees of
frost; and as one tramps along, marching only two miles an hour
with the animals, one eagerly watches for the first tip of the sun to
appear, meanwhile warming the hands alternately inside the coat,
for we always made a point of carrying a rifle each.

It was my turn to go on ahead to-day, and after a brisk walk of


five miles I came upon a most inviting spot. There were two tiny
fresh-water lakes, surrounded by grassy hills, with the snow peaks
on the northern side feeding the hills below with a daily supply of
water. Fearing disappointment in that the water might be salt, I
hastened on to the two pools, and, as I expected, they were fresh,
so I hurried back to climb some rising ground, from whence the
caravan would be in sight and earshot. There the firing off of my
gun announced to them, according to previous arrangement, that
water and grass had been found.
WE CAMP BY TWO FRESH-WATER POOLS.

Whilst enjoying our midday halt a couple of antelopes and sand-


grouse came to drink, and fell victims to our guns for their
greediness. We all revelled in the abundance of such good things,
and would have much liked to lengthen our stay; but on inspecting
our supplies we found the men had only fifteen days' rations left. We
tried hard to persuade them to subsist on half rations, as we
ourselves had been doing, and although at the time they finally
expressed their willingness to do so, and saw the expediency of the
plan, still they made their promise when filled with immense meals
of hallaled antelope and heavy chupatties, and we doubted their
power of abstaining. If they could have managed to show more self-
control over their food, we reckoned in thirty more days on half
rations we should cover another three hundred miles, and we
considered the fact of travelling that distance further on, without
meeting anybody, was an absolute impossibility; besides, the general
appearance of the country was improving, and on that very evening
we actually encamped on the grassy banks of a small running
stream. It was an enticement to us to follow up this rivulet, but the
extreme southern course it took outweighed our wishes. The men
already began to grumble that they could not work on half rations,
this too when they had vast supplies of meat from the antelope. The
only advice we could give them, was to eat up their food as fast as
they could, and then, when it was all gone, they would have to exist
on still less than they were now doing, if they wanted to live at all.
Whatever argument we brought forward had no weight with such
men, who would only think of appeasing their wants for the time
being. Although we spoke to them harshly, still it was our fixed
intention to strive our utmost to shoot game for the men, so that
they might save a little of their rations, and sometimes at our
midday halt we would sally forth with rifles to try and bag something
before the afternoon's march. Even then it was only in a grumbling
frame of mind that a man would accompany us to hallal the animal.

The muleteers, too, began on some occasions to quarrel


amongst themselves, and to threaten all kinds of punishments to
one another; but the mere threatening, and actual carrying out, are
very different things. Still, there was a kind of feeling in the air that
unless we got assistance some calamity would befall us. Whilst on
the march a big black mule died, and as the pony with Sulloo had
not come in, this reduced us to only twelve animals, a very small
number indeed for the men to load and look after, and a very small
cause for them to complain of overwork.

At midday halt the spirits of the muleteers became more


discontented than ever, for no water could be found for a long time,
nor would any of them bestir themselves in the matter; so
unreasonable had they become that we doubted our ever being able
to forget and forgive their failings when we came to the end of our
journey. Towards evening we saw a fresh-water lake, and camped a
mile or so east of it, choosing a spot by some good grazing. As the
animals had had no water that day, we drove them down to the
lake, but the banks all round were treacherous to such a degree that
one mule only just escaped drowning, so difficult was it to drag him
out of the heavy mud; and when we eventually did, he had been so
long in the water and was so benumbed, for the sun had set and a
bitter blast was blowing, that when we got him back to camp he was
too far gone to think of eating. He was a fine, powerful mule, and
his loss would have been severely felt by us. All our warm putties,
etc., were given up on this occasion, and the frozen mule was
bandaged up almost from head to foot. The following morning great
was our relief at finding he was none the worse whatever for his
lengthened drink at the lake.
CHAPTER XI.
SHOOTING AN ANTELOPE—SNOW—A MYSTERIOUS TRACK—THE BED OF AN
ANCIENT LAKE—EMOTION OF MAHOMED RAHIM—VARIABLE WEATHER—MORE
ANTELOPES SHOT—THEODOLITE BROKEN—EXTRAORDINARILY SUDDEN WIND
—HUNGER v. CEREMONY—NEW FINDS.

Before starting forth again we upbraided the head muleteer,


Ghulam Russul, for his chicken-heartedness and bad example to the
rest of the men. He denied a grumbling spirit, and said he was brave
and ready to undergo any hardship, and follow us anywhere, but as
to the other men, he said they were a discontented lot. Knowing as
we did how he influenced them, his statement bore no weight with
us. We made a double march over undulating grassy country,
intersected by some broad gravel nullahs, running almost at right
angles to our course. South of us lay a range of hills running east
and west, making it appear as though a river were running along at
their base, and for this reason we intended to steer gradually for
them.

At night we found a well-sheltered nook, with water close at


hand, and such splendid grass that we were induced to remain half
the next day and feed up; besides, there had been no signs at all of
Tokhta and Sulloo, and we considered this the last hope of ever
seeing them again. Henceforth we would make no more provision
for their coming, in the shape of leaving food and grain behind. It
was fortunate we had fixed on a half day's halt, for at daybreak
there was a strong wind blowing from the north with driving sleet.
The grass at this camp, No. 56, had far more nourishment in it than
any other grass we had come across up to the present, and Ghulam
Russul remarked that if we saw more grass like it we were sure to
come across nomads, so all were, for the time being, in a more
hopeful frame of mind. The melting snow had made the marching
somewhat heavy, and there was no lack of water.

Soon after starting, Malcolm, who had gone on ahead, came


hurrying back to me, with the request that I would come with him
and shoot an antelope, for he had seen a great number of them.
Shortly afterwards we saw a large herd grazing or playing about as
antelopes do, but one of them, without the slightest provocation,
came trotting towards us; perhaps he was wondering whatever
could have brought such queer-looking creatures there as we were.
We sat down to make ourselves still more mysterious and to receive
him, when suddenly his instinct seemed to tell him that there was
just a suspicion of danger attached to us, for he started off at a
gallop, crossing our front at about fifty yards distance. We both fired
simultaneously at so inviting a mark, and both hit. It was a sight
worth seeing, an antelope retreating at top speed in a second
bowled over quite dead, so much so that when Mahomed Rahim,
who was at hand, rushed up to hallal him, no blood would flow from
the operation, and the men declared it was not fit for them to eat. In
one way it was satisfactory to hear them say this, for we were
convinced that up to date they had not been suffering from hunger.
If they had, no hesitation, through religious scruples, would have
arisen about eating the antelope's flesh.

We halted by a lake whose water tasted very nearly fresh, but


the banks were so treacherous that it was a hazardous undertaking
to get close to it, and after our previous experience we preferred
digging instead. This very likely accounted for the absence of game
in the neighbourhood, that they could not get to the water; but it is
difficult to account for the absence of birds on the lake itself, for
there was not even a Brahmini duck. A bright night, and we made
preparations for a very early start the next morning, but the ill luck
that sometimes accompanied us had brought another storm, so that
the ground bore a very white appearance, and all idea of marching,
for the time being, had to be abandoned. By noon, although the
ground was still heavy, we ventured forth again, and hit off another
large lake containing water very nearly fresh.

Throughout the day storms continued to rage around us amidst


the adjacent hills, but, fortunately, none fell actually over us; we
could not help reflecting how all this snow must have entirely baffled
Sulloo and Tokhta in their tracking us, that is, if they had attempted
to do so. Everywhere the country was beautifully grassy, and
occasionally we picked a new species of flower. Another large lake
was situated to the north of us, and during our march down the
valley the hills that lay both north and south were gradually closing
in. As we proceeded, the going became more difficult. In addition to
another snowstorm which had fallen during the night, the valley
became split up by irregular nullahs and hills running in every
direction, with no defined features.

We continued making our double marches, and as the loads


were becoming lighter we hoped to cover fifteen miles a day. One
great continuous anxiety was the task of finding enough game to
shoot, that we might all live. At one time it would be plentiful
enough, at others for days we could find nothing. On some nights
we registered over twenty degrees of frost, and still remained over
16,000 feet above the sea level, and at this great height we actually
saw a brown butterfly.
On the 20th of July we began to notice the days were growing
shorter, as the sun would rise just a few minutes before 5 o'clock;
but the whole country appeared to be changing for the better, which
in no small degree alleviated our fears of being able to get across
this high plateau before the cold weather should set in. Generally
speaking, everywhere there was more grass growing, and, instead of
the coarse tufts we had been accustomed to see, their place was
taken by short crisp grass, the kind of growth that is so much sought
after by the nomads. We were, too, making a very gradual descent,
and felt convinced that, with such natural signs, we must before very
long hit off streams which would lead us to some sort of civilization.

At our midday halt the men's spirits were more cheerful. We had
stopped in a fine broad nullah, running nearly due east, with
pleasant-looking grassy hills sloping down on either side, and, with a
cloudless sky and no wind, we were glad to sit in our shirt-sleeves,
whilst our twelve veteran mules, with their saddles off, rolled in the
sand before enjoying the rich grass and water. We began to pick
fresh additions to our flower collection, the specimens being chiefly
of a mauve or white colour, and up to the present time we had only
found one yellow flower. At 7.30 p.m., in Camp 61, at a height of over
16,000 feet, the temperature was forty degrees Fahrenheit, and
during the night there were nineteen degrees of frost. Fine grass
and fine weather still favoured us, while the presence of a number of
sand-grouse indicated that water was at no great distance off.

Just after leaving Camp 62, we were all struck with wonderment
at finding a track running almost at right angles to our own route. It
was so well defined, and bore such unmistakable signs of a
considerable amount of traffic having gone along it, that we
concluded it could be no other than a high road from Turkistan to
the mysterious Lhassa, yet the track was not more than a foot
broad. Our surmises, too, were considerably strengthened when one
of the men picked up the entire leg bone of some baggage animal,
probably a mule, for still adhering to the leg was a shoe. This was a
sure proof that the road had been made use of by some merchant or
explorer, and that it could not have been merely a kyang or yak
track, or one made use of only by nomads, for they never shoe their
animals in this part of the world.

Such a startling discovery as this bore weight with the men, and
nothing would have suited their spirits better than to have stuck to
the track and march northwards, and they evidently thought us
strange mortals for not following this course; therefore, instead of
being elated with joy, they became more despondent than ever
when they found we were still bent upon blundering along in our
eastern route. But it was our strong belief that we should for a
certainty find people in a very few days' time, and this being the
case, we did not see the force of travelling in a wrong direction, and
put aside the objects for which we had set out, just to suit the
passing whim of a few craven-hearted men, especially when we
knew that the cause of their running short of food and consequent
trouble was entirely due to their own dishonest behaviour. We did,
however, send one man, Mahomed Rahim, supplied with food, with
instructions to follow the road north as far as he had courage to go,
thinking that when he had crossed a certain range of hills he would
discover the whereabouts of people. Furthermore we explained to
him the way we intended going, so that there could be no chance of
his losing himself.

A mile or so further on we came to the dried-up salt bed of a


very ancient lake. The salt was in every shape and form of
crustation, and the whole lake for several miles across was divided
up into small squares with walls one to three feet high, rugged and
irregular. The going across this was troublesome and arduous, first
stumbling over one wall, then crossing a few yards of crumbling,
crystallised salt before another had to be scrambled over. Thus it
went on for mile after mile, and the length of the lake being most
deceptive it seemed as though we should never, never cross it. As
the sun rose higher some of the salt composite melted, and then we
found ourselves first in slush, then on a bit of hard, rugged going,
most liable to cause a sprain to any of the mules. It became evident
that unless we were pretty smart in getting off the lake altogether,
we should find ourselves bogged there for the rest of the day; thus
our first idea of going straight ahead across the lake had to fall
through, and we steered for the nearest shore, which was on the
southern side, all the time the ground getting worse and more
treacherous. When, only in the nick of time, we did stand on a
sound footing again, we congratulated ourselves that for once only
had we deviated a short way from our course. Although this salt bed
had proved such an unforeseen obstacle in our line, still it was useful
to us in another way. The salt was of an excellent quality, and we
were able to replenish our store of this most necessary article, of
which there was so little left that we were carefully economising it.

During the morning's march next day we shot an antelope and a


kyang, and not wishing to delay the mules or to overload them, we
left two men behind, Ghulam Russul and Shukr Ali, to cut up the
meat and bring it in, whilst we continued in search of a suitable spot
for a midday halt. This was a plan we frequently adopted, and there
were always volunteers to stop behind, for by doing so they took
good care to light a fire and feast on the tit-bits to their hearts'
content, and well fortify themselves before carrying the load of meat
to their fellow muleteers.

We had halted, and were expecting the arrival of these two men,
when Mahomed Rahim, who had been sent to follow up the track,
rejoined us, and as he approached we could see he was weeping
bitterly. On asking the man what ailed him, he sobbed out that he
had lost his way. He was a ludicrous sight, for he was a great, big,
strong fellow, and we asked him, if he wept like this at finding us
again after only being absent a day and a night, how would he weep
had he not found us at all? We fed up the great baby with some
unleavened bread, which he ate voraciously amidst his sobs. Some
kyang came trotting up to camp with a look of wonderment at our
being present there, and as we were about to move off some
antelopes also came to inspect us.

The men carried quantities of cooked meat about their persons,


wrapped up in their clothes, and as they tramped along they
munched almost incessantly at the tough food tending to make them
very thirsty, so that when we halted for the night they suffered
considerably, for the water we dug out was too salt for drinking.

The following morning we came to a most dreary-looking region,


ornamented only with a big salt lake, without any vegetation or kind
of life, making us eager to get across such a solitude. At the east
end of the lake we marched over rising ground up a nullah about a
couple of miles before we came to some fairly good grass, where we
called a halt, never dreaming that we were doomed to an
unpleasant disappointment. On getting up some water from below
the surface, we found it to be the worst we had tasted, quite
impossible for man or beast to drink. Two of the men, however, did
gulp some of it down, and suffered in consequence for their
indulgence. Their thirst became far more acute than was that of the
rest of us. We were afraid that should we find no water by the
evening, it would go badly with all. Some of the animals were too
thirsty even to eat the grass. We, therefore, made an earlier start
than usual, sending on ahead a couple of men to search for water in
some likely-looking ground that lay some distance on in front on our
right flank.

As we were marching along in silence, we suddenly saw the two


men were coming towards us, and as soon as they drew near
enough for the other muleteers to see by their animated appearance
that they had found water, they made a general rush towards them,
forgetful of what became of the mules, or whether Malcolm and
myself had any water at all. Their one and only thought, as usual,
was themselves. A few miles further on we found two pools of good
water, and resolved to remain there half a day to give the animals a
chance of regaining their lost strength.

During the night our tent had great difficulty in withstanding the
wind, that blew with much violence, while the temperature fell to
twenty-one degrees of frost. As we had run short of iron pegs, we
found a most efficient substitute in fastening the ropes to our tin
boxes of ammunition. On other occasions, too, the ground was so
sandy that pegs were entirely useless, and each rope had to be
fastened to a yakdan, or to one of our bags of grain.

During the afternoon we marched along a broad, grassy, and


somewhat monotonous valley, steering for some snow peaks we had
seen the previous day. We found no game, excepting sand-grouse,
which, by their unmistakable notes, made their presence known in
the mornings up to 8 or 9 o'clock, and after sunset.

On the 26th July we left Camp 66, moving off by moonlight, for
the going was easy. On halting for breakfast, two antelopes ventured
to come and have a look at us, and, of course, paid the penalty of
death. Such an opportunity as this was not to be thrown away, and
laying them together, I photographed them, and afterwards cut
them up, carrying as much meat as we possibly could manage—
enough for three or four days' consumption. The afternoon was hot,
like a summer's day in England. Some yak, resembling big black
dots, could be seen in several of the grassy nullahs: a trying
temptation to have a stalk after them, for the ground was of such a
nature that with care one might have come up to within a hundred
yards of some of them without being seen. But then it would have
been useless to slaughter them, so we contented ourselves with
watching their movements, and with making out what we could have
done had we been merely on an ordinary shooting trip, or had we
been hard up for meat.

TWO ANTELOPES ARE SHOT CLOSE TO CAMP.

We met with a great misfortune that afternoon, for one of the


mules had been loaded so carelessly that its baggage, consisting of
two yakdans, fell off with a crash on to some ground as hard as
rock. One of these yakdans contained my theodolite, and on opening
up for the evening's observations, I found the top spirit level was
broken, and from that time I had to be dependent only upon the
sextant.

As usual we were off by 4.30 a.m., and going on ahead, I


climbed up some hills to spy out the land. It was pleasant walking,
for grass grew everywhere, and in the lower-lying ground were
flowers and water. On crossing a certain ridge I saw two yak grazing
quietly, as they probably had done without any interruption ever
since they had been dependent upon themselves for picking up a
living. I sat down silently, without, however, attempting
concealment, to enjoy the sight of watching carefully, at so short a
distance, the habits of these massive, dark-haired cattle at home in
their wild state. At length the caravan, which had been marching
along on much lower ground, over grassy valleys, came in sight, a
signal that I must push ahead again and reconnoitre. I rose,
therefore, and walked up towards the two yak, and one of them was
so tame and eager to make out whatever on earth I was, that he
allowed me to walk up to within forty yards of him, so that, had I
chosen, I might have given him a very telling shot. As it was, he
merely trotted off a short way and started grubbing again.

Ahead of us was a range of mountains, an imposing sight, with


grand snow peaks, the very ones we had been steering for. From the
high ground it seemed as though there was a pass leading over
them between two of the peaks, but entirely without vegetation. It
was impossible to make out how far the pass went, and what would
be in store for us after we had reached the point as far as we could
see. We calculated that the climb in our present condition could not
have been done in one march, and wondered how we could
strengthen our animals sufficiently for the second march, if there
were no grass at the end of the first. We knew from experience that
an ascent of this description would have taken more out of our
mules than several days of ordinary marching, and therefore
determined to abandon the idea of surmounting the pass, or rather
what appeared to be a pass, but to strike north, finding a way
somehow or other round the entire range.

As we steered for some extra good-looking grass and water by


which to make our midday halt and give the mules their midday
graze, a couple of inquisitive yak actually came trotting after us,
keeping at a distance of two or three hundred yards. Such boldness
augured well for a plentiful supply of good meat in the future. We
were glad to pitch our tent in this pleasant spot for a few hours, and
even under that shade the maximum thermometer registered
seventy-five degrees.

Having breakfasted off our antelope meat and some good tea,
we were busy with our maps, and drying flowers, etc. Everything
was spread out—for such frail specimens it was a splendid
opportunity; the men were sleeping, too. The mules, having eaten
their fill, were standing still enjoying the rest and perfect peace; all
was absolute silence, with the exception of our own chatting to each
other, as we amused ourselves with our hobbies, when without a
moment's notice a powerful blast of wind caught us with such
violence that the tent was blown down and many things were
carried completely away, and our camp, which only a second ago
had been the most peaceful scene imaginable, became a turbulent
one of utter confusion, as every one jumped up in an instant,
anxious to save anything he could lay hold of, or to run frantically
after whatever had escaped—for some things were being carried
along at a terrific rate. Fortunately the loss, compared to the
excitement, was trifling; but we made up our minds not to be caught
napping in this way again.

That same afternoon, after marching north, we crossed a river


that took its rise from the snow peaks; the bed was sandy, about
half a mile across, with several small, swiftly-flowing streams about a
foot deep, which had to be crossed barefooted. This was the largest
body of water we had as yet come across, and there was much
speculation amongst us as to where it would lead, and we thought
we should at any rate not lose sight of its course. Splendid green
grass and flowers were flourishing everywhere. Vegetables, too,
were a valuable addition to our table, for besides the "kumbuk" and
"hann," we here first found the wild onion, which afterwards formed
the chief staple of our food. Onions cut up into pieces and fried in
yak's fat, made an appetising dish at these great heights in the absence
of other food, besides being very sustaining and an excellent
medicine for all internal complaints. On some nights the mules and
ponies were wont to stray, but with such good grass close at hand,
and the presence of water in more than one place, as a rule they did
not go very far; still, as we could not risk a long delay the first thing in the morning, the men nearly always watched them in turn throughout the night. We found a nullah with a
small stream in it running eastwards, rising all the time, and
marched up it, leaving the river to wind its way north; we had no
real fear of losing it, for we could see it turned east again later on.
At the top of the pass we found another nullah running northwards,
and followed this down to a prairie-like looking valley, thence on to a
beautiful lake. At the western extremity we could see it was fed by
the river we had crossed the day before. All around the valleys and
hills were green, and on many of them the grazing yak were dotted
about in great numbers.

As we were now running short of meat I instituted a stalk against one of them, and took a vast amount of trouble and exertion
in order to come to a close range before firing, little knowing that it
was a waste of labour, as one could have approached them by taking only ordinary precautions. Close to the yak were several
kyang, who were the more watchful of the two, for they were the
first to notice my crawling along and at once stood up in
bewilderment, but beyond that they did nothing more, so that I was
enabled, without in the remotest degree disturbing the yak, to get
within sixty yards of them. There I took my shot, and with a single bullet bowled over the one I considered the juiciest-looking in the herd. The rest of them merely raised their heads
for a moment at the unwonted noise, and then began to graze
again, making no attempt to escape. I, too, then rose, and it was
only after a deal of shouting that they grasped that it really was
rather dangerous to remain where they were, thereupon off they
trotted across the valley, far, far away. Not so the herd of kyang, who
appeared the most disturbed at first; they continued to manœuvre
around the whole time we were there, as though inviting us to try
our skill on them, but one dead yak is oceans of meat for a much
larger caravan than ours, for many a day.

As soon as one of the men had come up, I told him to look
sharp and cut its throat, for it was not quite dead, although in reality it had breathed its last some ten minutes before. He at once set to
work, but so tough was the hide, and so blunt his knife, that he
could not cut through it, and at first merely pricked it with the point;
and although no blood exuded, he nevertheless told the other men
that he had properly hallaled the brute; and they, having by this time become less scrupulous with regard to their religious custom, did not trouble to argue that the meat was unfit for them to eat.
As a matter of fact they were beginning to learn what real hunger
was. Some of them came to help cut off the meat in a business-like
sort of way, pretending not to examine the throat at all.

As we made our midday halt only a hundred yards from the carcass, all fed right royally, and carried off large lumps of the flesh
as well. The men, too, were in high spirits, for they had found a very
old chula, or fireplace, consisting of three stones, and what was still
more joyful tidings, close to the dead yak ran a narrow track actually
in the direction we intended going.

About here we also saw some new creatures—large marmots, butterflies, and hoopoes. I skinned one of the latter. Such fresh sights, with the discovery of the track, the improvement in the climate, the grass, and the abundance of water, made all eager to be off again, in expectation as to where the track would lead us.
CHAPTER XII.
A FOOTPRINT—SHAHZAD MIR INDISPOSED—DESERTION OF MULETEERS—A
RAINY NIGHT.

It was now the 28th of July, and we had reached a spot between
our night encampments 69 and 70, the day camps not being
recorded on the map. Since leaving Lanak La on the 31st of May, we
had been daily finding our own way across country, over mountains
and valleys, along nullahs and beds of rivers, etc., and at last we had
found a track we could follow. Such a sensation was novel to us. We
could scarcely grasp that there was no need to go ahead to find a
way. We had simply to follow our nose. We thought that our troubles
were nearly finished, and for the rest of our journey that there
would be easy marching, and every moment we quite expected that
the dwellings of mankind would heave into sight; especially, too,
when one of the men picked up a stout stick, three or four feet long,
which must have been carried there by somebody or other, for since
leaving Niagzu the highest species of vegetation we had seen was
the wild onion. Some of the men also declared that they had found a
man's footprint. Personally we did not see this sign of civilization, but
the men maintained there could be no mistake about it, for they said
it was the footprint of a cripple!
Besides all this comforting news, there was no need to be
tramping over the hills in search of game for food. The antelope,
yak, and kyang were plentiful and easily shot in all the valleys, and,
had we been so disposed, we might have shot a dozen yak during
the afternoon's march. When we halted for the night one of the wild
yak actually came and grazed amongst our mules!

We camped at the entrance of a winding nullah, along which grew rich grass, and being tempted by the shelter, some of the
mules wandered up it during the night and thus forced upon us a
late start the next morning; but as there was a strong wind blowing,
this somewhat counterbalanced the otherwise overpowering heat of the sun.

Our track led up a fine grass valley, where we could actually smell the wild flowers, but as we continued the track became less
defined, till eventually there was no track at all. We spread out to
the right and left hand, but without success. Whether it had turned
off to north or south it was impossible to say. For the moment we were
disappointed at the overthrow of all our hopes, and instinctively felt
that our journey had not quite come to an end.

We had been marching uphill, and at the top of the valley found
a fast-running rivulet taking its rise from the snow mountains that
lay south of us, the same range that had blocked our way and
compelled us to make the detour. Added to the work of once more
having to find our own way, the country took a change for the
worse. Although there was no difficulty about the water, still there
was less grass, the soil became slatey, and in places barren. Storms
began to brew around us, but we were lucky in being favoured with
only some of the outlying drops. We had a perfectly still night with
one degree of frost.
It had been our custom, especially on dark nights, to make the
men take their turn of guard over the mules, to watch and see that
they did not stray. They were far too precious to lose, and by
marching in the early morning they felt less fatigue.

On this particular morning, 30th July, Camp 70, no mules were forthcoming at the time when we wanted to load up, and it turned
out that the man, Usman, who should have been on watch, was fast
asleep in some secluded corner. It was only the previous night that
this very man, after unloading the mules, had been sent to fetch
some water for the other men and ourselves, but as we waited and
there was none forthcoming, another man was sent to see what was
the matter. He found Usman, having had a good drink himself,
contentedly sleeping with the empty water-skin by his side. We
therefore had no inclination to go in search of him on this particular
morning, but after collecting the mules from all quarters, loaded up
without him.

Our twelve mules with fairly light loads seemed to be stronger and fitter than they had been for a long time, no doubt due to the excellent green grass and sufficiency of water they had of late enjoyed.
We had besides become better acquainted with the carelessness and
laziness of our men, whom we used to watch very closely, never
trusting them entirely.

After marching about a mile, we crossed a narrow track running north and south. Here again we were much tempted to take the
northern route, but as our mules were so fit, we still stuck to our
eastern one, daily expecting more than ever to find people.

Another inducement for doing so was that of late there had been
little difficulty in keeping all well supplied with meat. It thus
happened that when everything was in our favour, we were sanguine
of accomplishing our journey without any further mishap. We
crossed over several cols and saw fresh-water lakes, while yak,
kyang, antelope, and sand-grouse were plentiful.

Storms had been threatening a great part of the day to break over us, but were held in check by some extra high peaks. In the
evening, however, we had crossed a broad sandy bed of a river,
wherein a shallow stream was flowing, and had just pitched our
tents in a small sandy nullah, well sheltered from the wind, when
down came the rain in real earnest.

We were sorry to find that Shahzad Mir had not come in, though
very shortly the man who was carrying the plane-table walked up,
saying that Shahzad Mir had stopped the other side of the stream
with a pain in his stomach. We knew quite well what was the cause
of this. He had been taking some chlorodyne and afterwards had
eaten enormous quantities of meat. As there was nothing to be
gained by getting anybody else soaked, we sent back the same man
to fetch him in. The night was very dark and the rain turned to
snow, still neither of them came. Fearing that on account of the
darkness they had gone astray, we popped outside and fired off our
gun at intervals; still the ammunition was wasted. Nothing but
daybreak brought them back, when it turned out that they had been
so ridiculous as to sleep in a nullah only a few yards from our camp.
They had even heard the shots, but still could not find us. Neither of
them was any the worse for the outing, in fact the result had been
beneficial, for the stomach-ache from which Shahzad Mir had been
suffering was completely cured. They caused a good deal of
merriment amongst us all, and we all thought they might have
selected a more suitable night for sleeping out of camp.

The ground was covered with snow, so it was out of the question to think of marching early. We were rather anxious to cover
a few more miles that morning. It was the last day of the month,
and since leaving Leh we had marched very nearly a thousand miles,
and we thought we would like to start a fresh thousand on the 1st of
August. To our delight the sun made an appearance quicker than we
had anticipated and the snow was very soon thawed, allowing us to
move off again at 11 o'clock.

[Illustration: SHAHZAD MIR AT WORK.]

The day was fine and warm, and as I went on ahead to explore,
I saw below me some grassy hillocks, and, grazing in their midst, a
fine yak. I thought it would be interesting to make a stalk just to see
how close it was possible to get without disturbing him. I walked
down the hill I was on and dodged in and out between the hillocks,
always keeping out of sight, still getting closer and closer, till at last
there was only one small hillock that separated us, not more than
half a dozen yards. But when I stood up before him and he raised his head from his grazing and saw me, his look of utter bewilderment was most amusing to see. He was so filled with
astonishment, as the chances are he had never seen a human form
before, that it was some moments before he could collect his
thoughts sufficiently to make up his mind and be off.

Further on were many streams, forming their own course over a very broad, sandy river bed, all swollen on account of the recent
rain. Although we were at an altitude of 16,000 feet we felt no
discomfort in taking off our boots and stockings and paddling across
and about the streams, collecting bits of stick wherewith to make a
fire for our breakfast of venison and fried onions.

To-day we were only making a single march, and in the afternoon halted by a pool of rain-water on some high ground, well
sheltered on all sides from the wind by a number of sandy mounds.
From here we had magnificent views of the massive snow mountains
that surrounded us, looking grander than ever from the fresh supply
of snow.

In the direction we intended marching there seemed to be abundance of water, but whether rivers or lakes were in store for us
it was impossible to make out. We rested the following morning,
enjoying the warm sunshine and the glorious scenery, and would
fain have remained there when the time came to load up and
continue our journey.

The water, about which we had been unable to make up our minds, proved to be a large shallow salt-water lake. We found it best
to march round by the southern shore. In some places there were
tiny rivulets flowing into it, which caused us some trouble in
crossing, for the bottom and ground around was muddy. Otherwise
the going was good, and we marched on till it was almost dark.

Our men that morning had behaved in a peculiar way, for each
of them had come to make his salaam to us; not that we attached
much importance at the time to it, still it flitted across our minds that
they were becoming very faithful muleteers all at once, and perhaps
intended doing better work for the future. That evening we
impressed upon them the necessity of making double marches
again, as the last two days we had only made single, and told them
how impossible it was to march much further without meeting
somebody, and gave orders for them to commence loading at 3.30
a.m.

On waking up the following morning we found no attempt was being made to collect the mules, and it was 5 o'clock before they
could be induced to bring them all in from grazing. Then we noticed
that muttering was going on, but no attempt at loading up. Failing to
elicit from them any explanation of their conduct, I upbraided them
severely for their laziness, and told them that if the only thing they
wanted was to remain where they were and not come along with us
any more, to do so by all means, but that Malcolm, Shahzad Mir, and
myself, whatever they might choose to do, intended marching.
Thereupon they replied sullenly that they would go no further, and
hurriedly taking up their belongings from amongst a heap of
baggage, they moved off in a body in a southerly direction, and
were soon hidden from sight by the rising ground.

All this happened in a very short space of time, and fortunately, at the moment of the dispute, Esau and Lassoo were a little way off,
busy with our things, or they too would have joined the deserters, as
they one day afterwards told us. As it was, when we began to collect
the mules again to try and load them, Lassoo was very uncertain in
his mind as to which party he should throw his lot in with. Had he
gone off with the muleteers, our difficulties would have been
doubled, for none of us had had much experience in loading mules,
and, even with it, loading a mule properly is no easy matter, whereas
Lassoo had been a muleteer, and was far handier and quicker at the
work than any of our other men. This we had already noticed, as he
often used to give them a helping hand.

It was some time before we could collect all the mules again.
Some of them seemed to know there was something up, and there
was every chance of their being deserters. One little black chap in
fact was so clever at evading our united efforts to catch him, that we
had to give him up as a bad job, and load eleven animals instead of
twelve.

We were reduced to so small a party that Shahzad Mir had to carry the plane-table. Either Malcolm or myself, taking a mule by the
head rope, would lead the way, leaving only three to drive the mules
along and keep them together, and readjust the loads, which was
frequently necessary owing to our inexperience.

On looking over our baggage we found we had made one great mistake; we had allowed the muleteers to go off with the twenty
remaining pounds of flour. But we had no inclination to run after
them; they might have led us a chase for days, by which time the
flour would have been eaten. What we were most anxious to do was
to let these men see that we were in a position to be independent of
their help, for we surmised they would very likely be watching us
from a distance.

We learnt afterwards that these muleteers had deserted in accordance with a preconcerted plan, formed even before leaving
Shushal, on the Pangong Lake, when every man had sworn to follow
Ghulam Russul, whatever he might choose to do, and they had
agreed amongst themselves to leave us as soon as the rations ran
short. Furthermore, Ghulam Russul, who they well knew had been
with Littledale on his last famous journey, had deluded them into the
idea that he could show them the way into Lhassa. They had
imagined that if they all left us, it would be impossible for us to load
up and march without them, and that we should be compelled to
remain where we were. At night-time they had planned to come and
steal our mules and ride on them to the capital.

As we moved off, we felt somewhat anxious in our minds as to whether we should find water, grass, and droppings for our fire, for
if we met with ill luck we thought it quite probable that Esau and
Lassoo too would join the muleteers. This of course would have
been suicidal to them, as we were some 300 miles from Lhassa,
which, as far as we knew, was the nearest inhabited place, and the
exact direction of it they could not possibly have known.

We made a long march, longer than we had made for many a day, till we came to a large salt lake, round which we had to skirt.
Everywhere grassy nullahs sloped down to it, and during the
afternoon we came to a secluded nook with a pool of fresh water,
and all around were the dried droppings of yak; evidently the place
was a favourite haunt of these animals. This was a perfect camping
ground for us, and, to prove to our two men how favourable our
kismet was, we decided to halt. We all set to work with the
unloading and watering of the mules, pitching the two little tents,
making fires, and the numerous other little jobs always connected
with making a camp. It seemed peaceful and quiet after all the
grumbling and bickering we had been accustomed to. We were close
by the edge of the lake, completely concealed in a hollow by rising
ground on all sides, and we were rather anxious that the mules
should not stray too high up and disclose our whereabouts. We
concluded that these muleteers would not have sufficient courage
and determination to march straight away, and were prepared to see
an attempt being made any moment at capturing some of the
mules.

Towards sunset we made preparations to guard against a surprise by night. We fastened a rope to the ground, very securely,
between the two tents, to which we could picket the mules. In one
tent was Malcolm and myself, and in the other Shahzad Mir, with
Esau and Lassoo. At dusk we fastened all the mules to this rope, and
arranged for each of us to take turns in watching throughout the
night. Another advantage gained by this plan was that the animals
were ready for us to load the first thing in the morning. Had we
allowed them to stray during the night our work would have been
doubled; as it was, it took us an hour and a half to load up. We also
decided to make one long march instead of two, for being so short-handed, all our time would have been spent in catching the mules
and loading them. Besides, as we could not let them graze by night,
we should have to give them more time by day.

Soon after dark, when everything was in readiness for the night,
rain began to fall; it rained, as the saying goes, cats and dogs, such
as we had never seen it rain before. All five of us were snug, dry,
and warm in our little tents, from which we could watch the mules,
whilst the deserters must have spent a most miserable night without
any shelter or food, or the hot tea which they all loved so much.
We felt that they were being deservedly punished for their sins. Esau
and Lassoo soon realized how much they had already gained by
following us, and they swore to stick to us through thick and thin,
and this for evermore they undoubtedly did.

It rained during the greater part of the night, so that the sodden
condition of the ground put all idea of early marching out of our
heads.
In order to lessen our work, and to make the marching easier for
the mules, we decided to load only ten of them, and let two always
go spare. We made a pile of the things we should not require, such
as the muleteers' big cooking pot, and their tent, etc., and left them
at the camp. By thus lightening our loads we reckoned we should be
able to march sixteen or eighteen miles a day, an astounding fact for
the muleteers, who had imagined we could not move without their
aid. We drew comparisons between the welfare of the men with us
and that of the deserters. The latter were possessors of all the flour
and most of the cooked meat and the tobacco, but had no cooking pots,
while the former had three days' rice and plenty of tea, cooking
utensils, and shelter at night, an advantage they were already fully
aware of.
CHAPTER XIII.
RETURN OF THE DESERTERS—SHUKR ALI—LONG MARCHES—DEATH OF EIGHT
MULES AND A PONY—A CHEERING REPAST.

On leaving Camp 74 on August 3rd, we had to cross an arm of the lake, or rather to make our way round it, for the rain had made
the sand too soft to admit of our venturing on it. After marching for
some considerable time, we therefore found ourselves just opposite
our camp of the previous night, separated only by a narrow strip of
treacherous ground. When we had gone thus far we noticed
something or other moving on the crest of the high ground above
our old camp, and on closer examination, by means of our field
glasses, discovered that these moving objects were none other than the deserters reappearing. Soon afterwards another one
came into sight, and then another. It struck us as highly probable
that there had been some disagreement in the party, and that they
were already beginning to taste the fruits of their crime.

We pretended to take no notice of them whatever, but rather increased the rate of our marching, keeping the animals close and
compact, so that they might see for themselves how easy it was for
us to manage without them. We could see them steering for our last