Hands-On Machine Learning with Scikit-Learn and TensorFlow
Concepts, Tools, and Techniques to Build Intelligent Systems

Aurélien Géron

Beijing  Boston  Farnham  Sebastopol  Tokyo


Hands-On Machine Learning with Scikit-Learn and TensorFlow
by Aurélien Géron
Copyright © 2017 Aurélien Géron. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com/safari). For more information, contact our
corporate/institutional sales department: 800-998-9938 or [email protected].

Editor: Nicole Tache
Production Editor: Nicholas Adams
Copyeditor: Rachel Monaghan
Proofreader: Charles Roumeliotis
Indexer: Wendy Catalano
Interior Designer: David Futato
Cover Designer: Randy Comer
Illustrator: Rebecca Demarest

March 2017: First Edition

Revision History for the First Edition


2017-03-10: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491962299 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Hands-On Machine Learning with
Scikit-Learn and TensorFlow, the cover image, and related trade dress are trademarks of O’Reilly Media,
Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.

978-1-491-96229-9
[LSI]
Table of Contents

Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

Part I. The Fundamentals of Machine Learning


1. The Machine Learning Landscape. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
What Is Machine Learning? 4
Why Use Machine Learning? 4
Types of Machine Learning Systems 7
Supervised/Unsupervised Learning 8
Batch and Online Learning 14
Instance-Based Versus Model-Based Learning 17
Main Challenges of Machine Learning 22
Insufficient Quantity of Training Data 22
Nonrepresentative Training Data 24
Poor-Quality Data 25
Irrelevant Features 25
Overfitting the Training Data 26
Underfitting the Training Data 28
Stepping Back 28
Testing and Validating 29
Exercises 31

2. End-to-End Machine Learning Project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


Working with Real Data 33
Look at the Big Picture 35
Frame the Problem 35
Select a Performance Measure 37

Check the Assumptions 40
Get the Data 40
Create the Workspace 40
Download the Data 43
Take a Quick Look at the Data Structure 45
Create a Test Set 49
Discover and Visualize the Data to Gain Insights 53
Visualizing Geographical Data 53
Looking for Correlations 55
Experimenting with Attribute Combinations 58
Prepare the Data for Machine Learning Algorithms 59
Data Cleaning 60
Handling Text and Categorical Attributes 62
Custom Transformers 64
Feature Scaling 65
Transformation Pipelines 66
Select and Train a Model 68
Training and Evaluating on the Training Set 68
Better Evaluation Using Cross-Validation 69
Fine-Tune Your Model 71
Grid Search 72
Randomized Search 74
Ensemble Methods 74
Analyze the Best Models and Their Errors 74
Evaluate Your System on the Test Set 75
Launch, Monitor, and Maintain Your System 76
Try It Out! 77
Exercises 77

3. Classification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
MNIST 79
Training a Binary Classifier 82
Performance Measures 82
Measuring Accuracy Using Cross-Validation 83
Confusion Matrix 84
Precision and Recall 86
Precision/Recall Tradeoff 87
The ROC Curve 91
Multiclass Classification 93
Error Analysis 96
Multilabel Classification 100
Multioutput Classification 101

Exercises 102

4. Training Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105


Linear Regression 106
The Normal Equation 108
Computational Complexity 110
Gradient Descent 111
Batch Gradient Descent 114
Stochastic Gradient Descent 117
Mini-batch Gradient Descent 119
Polynomial Regression 121
Learning Curves 123
Regularized Linear Models 127
Ridge Regression 127
Lasso Regression 130
Elastic Net 132
Early Stopping 133
Logistic Regression 134
Estimating Probabilities 134
Training and Cost Function 135
Decision Boundaries 136
Softmax Regression 139
Exercises 142

5. Support Vector Machines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


Linear SVM Classification 145
Soft Margin Classification 146
Nonlinear SVM Classification 149
Polynomial Kernel 150
Adding Similarity Features 151
Gaussian RBF Kernel 152
Computational Complexity 153
SVM Regression 154
Under the Hood 156
Decision Function and Predictions 156
Training Objective 157
Quadratic Programming 159
The Dual Problem 160
Kernelized SVM 161
Online SVMs 164
Exercises 165

6. Decision Trees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Training and Visualizing a Decision Tree 167
Making Predictions 169
Estimating Class Probabilities 171
The CART Training Algorithm 171
Computational Complexity 172
Gini Impurity or Entropy? 172
Regularization Hyperparameters 173
Regression 175
Instability 177
Exercises 178

7. Ensemble Learning and Random Forests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181


Voting Classifiers 181
Bagging and Pasting 185
Bagging and Pasting in Scikit-Learn 186
Out-of-Bag Evaluation 187
Random Patches and Random Subspaces 188
Random Forests 189
Extra-Trees 190
Feature Importance 190
Boosting 191
AdaBoost 192
Gradient Boosting 195
Stacking 200
Exercises 202

8. Dimensionality Reduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205


The Curse of Dimensionality 206
Main Approaches for Dimensionality Reduction 207
Projection 207
Manifold Learning 210
PCA 211
Preserving the Variance 211
Principal Components 212
Projecting Down to d Dimensions 213
Using Scikit-Learn 214
Explained Variance Ratio 214
Choosing the Right Number of Dimensions 215
PCA for Compression 216
Incremental PCA 217
Randomized PCA 218

Kernel PCA 218
Selecting a Kernel and Tuning Hyperparameters 219
LLE 221
Other Dimensionality Reduction Techniques 223
Exercises 224

Part II. Neural Networks and Deep Learning


9. Up and Running with TensorFlow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Installation 232
Creating Your First Graph and Running It in a Session 232
Managing Graphs 234
Lifecycle of a Node Value 235
Linear Regression with TensorFlow 235
Implementing Gradient Descent 237
Manually Computing the Gradients 237
Using autodiff 238
Using an Optimizer 239
Feeding Data to the Training Algorithm 239
Saving and Restoring Models 241
Visualizing the Graph and Training Curves Using TensorBoard 242
Name Scopes 245
Modularity 246
Sharing Variables 248
Exercises 251

10. Introduction to Artificial Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253


From Biological to Artificial Neurons 254
Biological Neurons 255
Logical Computations with Neurons 256
The Perceptron 257
Multi-Layer Perceptron and Backpropagation 261
Training an MLP with TensorFlow’s High-Level API 264
Training a DNN Using Plain TensorFlow 265
Construction Phase 265
Execution Phase 269
Using the Neural Network 270
Fine-Tuning Neural Network Hyperparameters 270
Number of Hidden Layers 270
Number of Neurons per Hidden Layer 272
Activation Functions 272



Exercises 273

11. Training Deep Neural Nets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275


Vanishing/Exploding Gradients Problems 275
Xavier and He Initialization 277
Nonsaturating Activation Functions 279
Batch Normalization 282
Gradient Clipping 286
Reusing Pretrained Layers 286
Reusing a TensorFlow Model 287
Reusing Models from Other Frameworks 288
Freezing the Lower Layers 289
Caching the Frozen Layers 290
Tweaking, Dropping, or Replacing the Upper Layers 290
Model Zoos 291
Unsupervised Pretraining 291
Pretraining on an Auxiliary Task 292
Faster Optimizers 293
Momentum optimization 294
Nesterov Accelerated Gradient 295
AdaGrad 296
RMSProp 298
Adam Optimization 298
Learning Rate Scheduling 300
Avoiding Overfitting Through Regularization 302
Early Stopping 303
ℓ1 and ℓ2 Regularization 303
Dropout 304
Max-Norm Regularization 307
Data Augmentation 309
Practical Guidelines 310
Exercises 311

12. Distributing TensorFlow Across Devices and Servers. . . . . . . . . . . . . . . . . . . . . . . . . . . 313


Multiple Devices on a Single Machine 314
Installation 314
Managing the GPU RAM 317
Placing Operations on Devices 318
Parallel Execution 321
Control Dependencies 323
Multiple Devices Across Multiple Servers 323
Opening a Session 325



The Master and Worker Services 325
Pinning Operations Across Tasks 326
Sharding Variables Across Multiple Parameter Servers 327
Sharing State Across Sessions Using Resource Containers 328
Asynchronous Communication Using TensorFlow Queues 329
Loading Data Directly from the Graph 335
Parallelizing Neural Networks on a TensorFlow Cluster 342
One Neural Network per Device 342
In-Graph Versus Between-Graph Replication 343
Model Parallelism 345
Data Parallelism 347
Exercises 352

13. Convolutional Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353


The Architecture of the Visual Cortex 354
Convolutional Layer 355
Filters 357
Stacking Multiple Feature Maps 358
TensorFlow Implementation 360
Memory Requirements 362
Pooling Layer 363
CNN Architectures 365
LeNet-5 366
AlexNet 367
GoogLeNet 368
ResNet 372
Exercises 376

14. Recurrent Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379


Recurrent Neurons 380
Memory Cells 382
Input and Output Sequences 382
Basic RNNs in TensorFlow 384
Static Unrolling Through Time 385
Dynamic Unrolling Through Time 387
Handling Variable Length Input Sequences 387
Handling Variable-Length Output Sequences 388
Training RNNs 389
Training a Sequence Classifier 389
Training to Predict Time Series 392
Creative RNN 396
Deep RNNs 396

Distributing a Deep RNN Across Multiple GPUs 397
Applying Dropout 399
The Difficulty of Training over Many Time Steps 400
LSTM Cell 401
Peephole Connections 403
GRU Cell 404
Natural Language Processing 405
Word Embeddings 405
An Encoder–Decoder Network for Machine Translation 407
Exercises 410

15. Autoencoders. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411


Efficient Data Representations 412
Performing PCA with an Undercomplete Linear Autoencoder 413
Stacked Autoencoders 415
TensorFlow Implementation 416
Tying Weights 417
Training One Autoencoder at a Time 418
Visualizing the Reconstructions 420
Visualizing Features 421
Unsupervised Pretraining Using Stacked Autoencoders 422
Denoising Autoencoders 424
TensorFlow Implementation 425
Sparse Autoencoders 426
TensorFlow Implementation 427
Variational Autoencoders 428
Generating Digits 431
Other Autoencoders 432
Exercises 433

16. Reinforcement Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437


Learning to Optimize Rewards 438
Policy Search 440
Introduction to OpenAI Gym 441
Neural Network Policies 444
Evaluating Actions: The Credit Assignment Problem 447
Policy Gradients 448
Markov Decision Processes 453
Temporal Difference Learning and Q-Learning 457
Exploration Policies 459
Approximate Q-Learning 460
Learning to Play Ms. Pac-Man Using Deep Q-Learning 460

Exercises 469
Thank You! 470

A. Exercise Solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471

B. Machine Learning Project Checklist. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497

C. SVM Dual Problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503

D. Autodiff. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507

E. Other Popular ANN Architectures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515

Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525

Preface

The Machine Learning Tsunami


In 2006, Geoffrey Hinton et al. published a paper1 showing how to train a deep neural
network capable of recognizing handwritten digits with state-of-the-art precision
(>98%). They branded this technique “Deep Learning.” Training a deep neural net
was widely considered impossible at the time,2 and most researchers had abandoned
the idea since the 1990s. This paper revived the interest of the scientific community
and before long many new papers demonstrated that Deep Learning was not only
possible, but capable of mind-blowing achievements that no other Machine Learning
(ML) technique could hope to match (with the help of tremendous computing power
and great amounts of data). This enthusiasm soon extended to many other areas of
Machine Learning.
Fast-forward 10 years and Machine Learning has conquered the industry: it is now at
the heart of much of the magic in today’s high-tech products, ranking your web
search results, powering your smartphone’s speech recognition, and recommending
videos, beating the world champion at the game of Go. Before you know it, it will be
driving your car.

Machine Learning in Your Projects


So naturally you are excited about Machine Learning and you would love to join the
party!
Perhaps you would like to give your homemade robot a brain of its own? Make it rec‐
ognize faces? Or learn to walk around?

1 Available on Hinton’s home page at http://www.cs.toronto.edu/~hinton/.


2 Despite the fact that Yann LeCun’s deep convolutional neural networks had worked well for image recognition
since the 1990s, they were not as general purpose.

Or maybe your company has tons of data (user logs, financial data, production data,
machine sensor data, hotline stats, HR reports, etc.), and more than likely you could
unearth some hidden gems if you just knew where to look; for example:

• Segment customers and find the best marketing strategy for each group
• Recommend products for each client based on what similar clients bought
• Detect which transactions are likely to be fraudulent
• Predict next year’s revenue
• And more

Whatever the reason, you have decided to learn Machine Learning and implement it
in your projects. Great idea!

Objective and Approach


This book assumes that you know close to nothing about Machine Learning. Its goal
is to give you the concepts, the intuitions, and the tools you need to actually imple‐
ment programs capable of learning from data.
We will cover a large number of techniques, from the simplest and most commonly
used (such as linear regression) to some of the Deep Learning techniques that regu‐
larly win competitions.
Rather than implementing our own toy versions of each algorithm, we will be using
actual production-ready Python frameworks:

• Scikit-Learn is very easy to use, yet it implements many Machine Learning algo‐
rithms efficiently, so it makes for a great entry point to learn Machine Learning.
• TensorFlow is a more complex library for distributed numerical computation
using data flow graphs. It makes it possible to train and run very large neural net‐
works efficiently by distributing the computations across potentially thousands
of multi-GPU servers. TensorFlow was created at Google and supports many of
their large-scale Machine Learning applications. It was open-sourced in Novem‐
ber 2015.

The book favors a hands-on approach, growing an intuitive understanding of Machine Learning through
concrete working examples and just a little bit of theory. While you can read this book without picking up
your laptop, we highly recommend you experiment with the code examples available online as Jupyter
notebooks at https://github.com/ageron/handson-ml.

Prerequisites
This book assumes that you have some Python programming experience and that you
are familiar with Python’s main scientific libraries, in particular NumPy, Pandas, and
Matplotlib.
Also, if you care about what’s under the hood you should have a reasonable under‐
standing of college-level math as well (calculus, linear algebra, probabilities, and sta‐
tistics).
If you don’t know Python yet, http://learnpython.org/ is a great place to start. The official tutorial on
python.org is also quite good.
If you have never used Jupyter, Chapter 2 will guide you through installation and the
basics: it is a great tool to have in your toolbox.
If you are not familiar with Python’s scientific libraries, the provided Jupyter note‐
books include a few tutorials. There is also a quick math tutorial for linear algebra.

Roadmap
This book is organized in two parts. Part I, The Fundamentals of Machine Learning,
covers the following topics:

• What is Machine Learning? What problems does it try to solve? What are the
main categories and fundamental concepts of Machine Learning systems?
• The main steps in a typical Machine Learning project.
• Learning by fitting a model to data.
• Optimizing a cost function.
• Handling, cleaning, and preparing data.
• Selecting and engineering features.
• Selecting a model and tuning hyperparameters using cross-validation.
• The main challenges of Machine Learning, in particular underfitting and overfit‐
ting (the bias/variance tradeoff).
• Reducing the dimensionality of the training data to fight the curse of dimension‐
ality.
• The most common learning algorithms: Linear and Polynomial Regression,
Logistic Regression, k-Nearest Neighbors, Support Vector Machines, Decision
Trees, Random Forests, and Ensemble methods.

Part II, Neural Networks and Deep Learning, covers the following topics:

• What are neural nets? What are they good for?


• Building and training neural nets using TensorFlow.
• The most important neural net architectures: feedforward neural nets, convolu‐
tional nets, recurrent nets, long short-term memory (LSTM) nets, and autoen‐
coders.
• Techniques for training deep neural nets.
• Scaling neural networks for huge datasets.
• Reinforcement learning.

The first part is based mostly on Scikit-Learn while the second part uses TensorFlow.

Don’t jump into deep waters too hastily: while Deep Learning is no
doubt one of the most exciting areas in Machine Learning, you
should master the fundamentals first. Moreover, most problems
can be solved quite well using simpler techniques such as Random
Forests and Ensemble methods (discussed in Part I). Deep Learn‐
ing is best suited for complex problems such as image recognition,
speech recognition, or natural language processing, provided you
have enough data, computing power, and patience.

Other Resources
Many resources are available to learn about Machine Learning. Andrew Ng’s ML
course on Coursera and Geoffrey Hinton’s course on neural networks and Deep
Learning are amazing, although they both require a significant time investment
(think months).
There are also many interesting websites about Machine Learning, including of
course Scikit-Learn’s exceptional User Guide. You may also enjoy Dataquest, which
provides very nice interactive tutorials, and ML blogs such as those listed on Quora.
Finally, the Deep Learning website has a good list of resources to learn more.
Of course there are also many other introductory books about Machine Learning, in
particular:

• Joel Grus, Data Science from Scratch (O’Reilly). This book presents the funda‐
mentals of Machine Learning, and implements some of the main algorithms in
pure Python (from scratch, as the name suggests).
• Stephen Marsland, Machine Learning: An Algorithmic Perspective (Chapman and
Hall). This book is a great introduction to Machine Learning, covering a wide

range of topics in depth, with code examples in Python (also from scratch, but
using NumPy).
• Sebastian Raschka, Python Machine Learning (Packt Publishing). Also a great
introduction to Machine Learning, this book leverages Python open source libra‐
ries (Pylearn 2 and Theano).
• Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, Learning from
Data (AMLBook). A rather theoretical approach to ML, this book provides deep
insights, in particular on the bias/variance tradeoff (see Chapter 4).
• Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd
Edition (Pearson). This is a great (and huge) book covering an incredible amount
of topics, including Machine Learning. It helps put ML into perspective.

Finally, a great way to learn is to join ML competition websites such as Kaggle.com: this will allow you to
practice your skills on real-world problems, with help and insights from some of the best ML professionals
out there.

Conventions Used in This Book


The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program ele‐
ments such as variable or function names, databases, data types, environment
variables, statements and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values deter‐
mined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples


Supplemental material (code examples, exercises, etc.) is available for download at
https://github.com/ageron/handson-ml.
This book is here to help you get your job done. In general, if example code is offered
with this book, you may use it in your programs and documentation. You do not
need to contact us for permission unless you’re reproducing a significant portion of
the code. For example, writing a program that uses several chunks of code from this
book does not require permission. Selling or distributing a CD-ROM of examples
from O’Reilly books does require permission. Answering a question by citing this
book and quoting example code does not require permission. Incorporating a signifi‐
cant amount of example code from this book into your product’s documentation does
require permission.
We appreciate, but do not require, attribution. An attribution usually includes the
title, author, publisher, and ISBN. For example: “Hands-On Machine Learning with
Scikit-Learn and TensorFlow by Aurélien Géron (O’Reilly). Copyright 2017 Aurélien
Géron, 978-1-491-96229-9.”
If you feel your use of code examples falls outside fair use or the permission given
above, feel free to contact us at [email protected].

O’Reilly Safari
Safari (formerly Safari Books Online) is a membership-based
training and reference platform for enterprise, government,
educators, and individuals.

Members have access to thousands of books, training videos, Learning Paths, interac‐
tive tutorials, and curated playlists from over 250 publishers, including O’Reilly
Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Profes‐
sional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press,

John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe
Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and
Course Technology, among others.
For more information, please visit http://oreilly.com/safari.

How to Contact Us
Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.


1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can
access this page at http://bit.ly/hands-on-machine-learning-with-scikit-learn-and-tensorflow.
To comment or ask technical questions about this book, send email to [email protected].
For more information about our books, courses, conferences, and news, see our website at
http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments
I would like to thank my Google colleagues, in particular the YouTube video classifi‐
cation team, for teaching me so much about Machine Learning. I could never have
started this project without them. Special thanks to my personal ML gurus: Clément
Courbet, Julien Dubois, Mathias Kende, Daniel Kitachewsky, James Pack, Alexander
Pak, Anosh Raj, Vitor Sessak, Wiktor Tomczak, Ingrid von Glehn, Rich Washington,
and everyone at YouTube Paris.
I am incredibly grateful to all the amazing people who took time out of their busy
lives to review my book in so much detail. Thanks to Pete Warden for answering all
my TensorFlow questions, reviewing Part II, providing many interesting insights, and
of course for being part of the core TensorFlow team. You should definitely check out

his blog! Many thanks to Lukas Biewald for his very thorough review of Part II: he left
no stone unturned, tested all the code (and caught a few errors), made many great
suggestions, and his enthusiasm was contagious. You should check out his blog and
his cool robots! Thanks to Justin Francis, who also reviewed Part II very thoroughly,
catching errors and providing great insights, in particular in Chapter 16. Check out
his posts on TensorFlow!
Huge thanks as well to David Andrzejewski, who reviewed Part I and provided
incredibly useful feedback, identifying unclear sections and suggesting how to
improve them. Check out his website! Thanks to Grégoire Mesnil, who reviewed
Part II and contributed very interesting practical advice on training neural networks.
Thanks as well to Eddy Hung, Salim Sémaoune, Karim Matrah, Ingrid von Glehn,
Iain Smears, and Vincent Guilbeau for reviewing Part I and making many useful sug‐
gestions. And I also wish to thank my father-in-law, Michel Tessier, former mathe‐
matics teacher and now a great translator of Anton Chekhov, for helping me iron out
some of the mathematics and notations in this book and reviewing the linear algebra
Jupyter notebook.
And of course, a gigantic “thank you” to my dear brother Sylvain, who reviewed every
single chapter, tested every line of code, provided feedback on virtually every section,
and encouraged me from the first line to the last. Love you, bro!
Many thanks as well to O’Reilly’s fantastic staff, in particular Nicole Tache, who gave
me insightful feedback, always cheerful, encouraging, and helpful. Thanks as well to
Marie Beaugureau, Ben Lorica, Mike Loukides, and Laurel Ruma for believing in this
project and helping me define its scope. Thanks to Matt Hacker and all of the Atlas
team for answering all my technical questions regarding formatting, asciidoc, and
LaTeX, and thanks to Rachel Monaghan, Nick Adams, and all of the production team
for their final review and their hundreds of corrections.
Last but not least, I am infinitely grateful to my beloved wife, Emmanuelle, and to our
three wonderful kids, Alexandre, Rémi, and Gabrielle, for encouraging me to work
hard on this book, asking many questions (who said you can’t teach neural networks
to a seven-year-old?), and even bringing me cookies and coffee. What more can one
dream of?

PART I. The Fundamentals of Machine Learning
CHAPTER 1
The Machine Learning Landscape

When most people hear “Machine Learning,” they picture a robot: a dependable but‐
ler or a deadly Terminator depending on who you ask. But Machine Learning is not
just a futuristic fantasy, it’s already here. In fact, it has been around for decades in
some specialized applications, such as Optical Character Recognition (OCR). But the
first ML application that really became mainstream, improving the lives of hundreds
of millions of people, took over the world back in the 1990s: it was the spam filter.
Not exactly a self-aware Skynet, but it does technically qualify as Machine Learning
(it has actually learned so well that you seldom need to flag an email as spam any‐
more). It was followed by hundreds of ML applications that now quietly power hun‐
dreds of products and features that you use regularly, from better recommendations
to voice search.
Where does Machine Learning start and where does it end? What exactly does it
mean for a machine to learn something? If I download a copy of Wikipedia, has my
computer really “learned” something? Is it suddenly smarter? In this chapter we will
start by clarifying what Machine Learning is and why you may want to use it.
Then, before we set out to explore the Machine Learning continent, we will take a
look at the map and learn about the main regions and the most notable landmarks:
supervised versus unsupervised learning, online versus batch learning, instance-
based versus model-based learning. Then we will look at the workflow of a typical ML
project, discuss the main challenges you may face, and cover how to evaluate and
fine-tune a Machine Learning system.
This chapter introduces a lot of fundamental concepts (and jargon) that every data
scientist should know by heart. It will be a high-level overview (the only chapter
without much code), all rather simple, but you should make sure everything is
crystal-clear to you before continuing to the rest of the book. So grab a coffee and let’s
get started!

If you already know all the Machine Learning basics, you may want
to skip directly to Chapter 2. If you are not sure, try to answer all
the questions listed at the end of the chapter before moving on.

What Is Machine Learning?


Machine Learning is the science (and art) of programming computers so they can
learn from data.
Here is a slightly more general definition:
[Machine Learning is the] field of study that gives computers the ability to learn
without being explicitly programmed.
—Arthur Samuel, 1959

And a more engineering-oriented one:


A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves
with experience E.
—Tom Mitchell, 1997

For example, your spam filter is a Machine Learning program that can learn to flag
spam given examples of spam emails (e.g., flagged by users) and examples of regular
(nonspam, also called “ham”) emails. The examples that the system uses to learn are
called the training set. Each training example is called a training instance (or sample).
In this case, the task T is to flag spam for new emails, the experience E is the training
data, and the performance measure P needs to be defined; for example, you can use
the ratio of correctly classified emails. This particular performance measure is called
accuracy and it is often used in classification tasks.
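
To make the performance measure concrete, here is a minimal sketch (with made-up labels, not code from
the book) showing that accuracy is just the ratio of correctly classified emails:

    y_true = ["spam", "ham", "spam", "ham", "ham"]   # hypothetical ground-truth labels
    y_pred = ["spam", "ham", "ham", "ham", "ham"]    # hypothetical filter predictions

    # accuracy = number of correct predictions / total number of predictions
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy)  # 0.8, i.e., 4 out of 5 emails were classified correctly
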
If you just download a copy of Wikipedia, your computer has a lot more data, but it is
not suddenly better at any task. Thus, it is not Machine Learning.

Why Use Machine Learning?


Consider how you would write a spam filter using traditional programming techniques (Figure 1-1):

1. First you would look at what spam typically looks like. You might notice that
some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to
come up a lot in the subject. Perhaps you would also notice a few other patterns
in the sender’s name, the email’s body, and so on.



2. You would write a detection algorithm for each of the patterns that you noticed,
and your program would flag emails as spam if a number of these patterns are
detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.

Figure 1-1. The traditional approach
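
To make the steps above concrete, here is a toy sketch of the traditional, rule-based approach (the patterns
and the threshold are hypothetical, not from the book):

    SPAM_PATTERNS = ["4u", "credit card", "free", "amazing"]  # step 1: patterns you noticed

    def looks_like_spam(email_text, threshold=2):
        """Step 2: flag the email if enough hard-coded patterns are detected."""
        text = email_text.lower()
        hits = sum(pattern in text for pattern in SPAM_PATTERNS)
        return hits >= threshold

    print(looks_like_spam("Amazing offer 4U: get a free credit card!"))  # True
    print(looks_like_spam("Lunch tomorrow?"))                            # False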

Since the problem is not trivial, your program will likely become a long list of complex rules—pretty hard
to maintain.
In contrast, a spam filter based on Machine Learning techniques automatically learns which words and
phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples
compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most
likely more accurate.

Figure 1-2. Machine Learning approach
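
By contrast, here is a minimal sketch of the Machine Learning approach using Scikit-Learn (the emails and
labels are made up; this is not the book's code): the model learns which words predict spam instead of
relying on hand-written rules.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    emails = ["amazing free credit card offer 4U", "free pills, amazing deal",
              "meeting rescheduled to tomorrow", "here are the quarterly figures"]
    labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

    vectorizer = CountVectorizer()          # turn each email into word counts
    X = vectorizer.fit_transform(emails)
    model = MultinomialNB().fit(X, labels)  # learn which word frequencies predict spam

    X_new = vectorizer.transform(["free offer just 4U"])
    print(model.predict(X_new))             # [1] -> flagged as spam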



Moreover, if spammers notice that all their emails containing “4U” are blocked, they might start writing
“For U” instead. A spam filter using traditional programming techniques would need to be updated to flag
“For U” emails. If spammers keep working around your spam filter, you will need to keep writing new rules
forever.
In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has
become unusually frequent in spam flagged by users, and it starts flagging them without your intervention
(Figure 1-3).

Figure 1-3. Automatically adapting to change

Another area where Machine Learning shines is for problems that either are too com‐
plex for traditional approaches or have no known algorithm. For example, consider
speech recognition: say you want to start simple and write a program capable of dis‐
tinguishing the words “one” and “two.” You might notice that the word “two” starts
with a high-pitch sound (“T”), so you could hardcode an algorithm that measures
high-pitch sound intensity and use that to distinguish ones and twos. Obviously this
technique will not scale to thousands of words spoken by millions of very different
people in noisy environments and in dozens of languages. The best solution (at least
today) is to write an algorithm that learns by itself, given many example recordings
for each word.
Finally, Machine Learning can help humans learn (Figure 1-4): ML algorithms can be
inspected to see what they have learned (although for some algorithms this can be
tricky). For instance, once the spam filter has been trained on enough spam, it can
easily be inspected to reveal the list of words and combinations of words that it
believes are the best predictors of spam. Sometimes this will reveal unsuspected cor‐
relations or new trends, and thereby lead to a better understanding of the problem.
Applying ML techniques to dig into large amounts of data can help discover patterns
that were not immediately apparent. This is called data mining.



Figure 1-4. Machine Learning can help humans learn

To summarize, Machine Learning is great for:

• Problems for which existing solutions require a lot of hand-tuning or long lists of
rules: one Machine Learning algorithm can often simplify code and perform bet‐
ter.
• Complex problems for which there is no good solution at all using a traditional
approach: the best Machine Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.

Types of Machine Learning Systems


There are so many different types of Machine Learning systems that it is useful to
classify them in broad categories based on:

• Whether or not they are trained with human supervision (supervised, unsuper‐
vised, semisupervised, and Reinforcement Learning)
• Whether or not they can learn incrementally on the fly (online versus batch
learning)
• Whether they work by simply comparing new data points to known data points,
or instead detect patterns in the training data and build a predictive model, much
like scientists do (instance-based versus model-based learning)

These criteria are not exclusive; you can combine them in any way you like. For example, a state-of-the-art
spam filter may learn on the fly using a deep neural network model trained using examples of spam and
ham; this makes it an online, model-based, supervised learning system.
Let’s look at each of these criteria a bit more closely.

Supervised/Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of
supervision they get during training. There are four major categories: supervised
learning, unsupervised learning, semisupervised learning, and Reinforcement Learn‐
ing.

Supervised learning
In supervised learning, the training data you feed to the algorithm includes the desired
solutions, called labels (Figure 1-5).

Figure 1-5. A labeled training set for supervised learning (e.g., spam classification)

A typical supervised learning task is classification. The spam filter is a good example
of this: it is trained with many example emails along with their class (spam or ham),
and it must learn how to classify new emails.
Another typical task is to predict a target numeric value, such as the price of a car,
given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is
called regression (Figure 1-6).1 To train the system, you need to give it many examples
of cars, including both their predictors and their labels (i.e., their prices).

1 Fun fact: this odd-sounding name is a statistics term introduced by Francis Galton while he was studying the
fact that the children of tall people tend to be shorter than their parents. Since children were shorter, he called
this regression to the mean. This name was then applied to the methods he used to analyze correlations
between variables.



In Machine Learning an attribute is a data type (e.g., “Mileage”),
while a feature has several meanings depending on the context, but
generally means an attribute plus its value (e.g., “Mileage =
15,000”). Many people use the words attribute and feature inter‐
changeably, though.

Figure 1-6. Regression

Note that some regression algorithms can be used for classification as well, and vice
versa. For example, Logistic Regression is commonly used for classification, as it can
output a value that corresponds to the probability of belonging to a given class (e.g.,
20% chance of being spam).
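
As a quick illustration of a regression task, here is a hedged sketch with made-up car data (not the book's
code): the model is trained on predictors and price labels, then predicts the price of a new car.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[20000, 1], [60000, 3], [90000, 5], [120000, 7]])  # predictors: [mileage, age]
    y = np.array([18000, 13500, 9000, 6000])                         # labels: price in dollars

    model = LinearRegression().fit(X, y)
    print(model.predict([[45000, 2]]))  # predicted price for an unseen car
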
Here are some of the most important supervised learning algorithms (covered in this
book):

• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks2

2 Some neural network architectures can be unsupervised, such as autoencoders and restricted Boltzmann
machines. They can also be semisupervised, such as in deep belief networks and unsupervised pretraining.



Unsupervised learning
In unsupervised learning, as you might guess, the training data is unlabeled
(Figure 1-7). The system tries to learn without a teacher.

Figure 1-7. An unlabeled training set for unsupervised learning

Here are some of the most important unsupervised learning algorithms (we will
cover dimensionality reduction in Chapter 8):

• Clustering
— k-Means
— Hierarchical Cluster Analysis (HCA)
— Expectation Maximization
• Visualization and dimensionality reduction
— Principal Component Analysis (PCA)
— Kernel PCA
— Locally-Linear Embedding (LLE)
— t-distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
— Apriori
— Eclat

For example, say you have a lot of data about your blog’s visitors. You may want to
run a clustering algorithm to try to detect groups of similar visitors (Figure 1-8). At
no point do you tell the algorithm which group a visitor belongs to: it finds those
connections without your help. For example, it might notice that 40% of your visitors
are males who love comic books and generally read your blog in the evening, while
20% are young sci-fi lovers who visit during the weekends, and so on. If you use a
hierarchical clustering algorithm, it may also subdivide each group into smaller
groups. This may help you target your posts for each group.



Figure 1-8. Clustering
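
A minimal sketch of the clustering idea, assuming hypothetical visitor features (this is not the book's code):
k-Means groups similar visitors without ever being told which group anyone belongs to.

    import numpy as np
    from sklearn.cluster import KMeans

    # columns: [visits per week, typical reading hour (0-23)]
    X = np.array([[2, 21], [3, 22], [1, 20], [5, 10], [6, 11], [4, 9]])

    kmeans = KMeans(n_clusters=2, random_state=42).fit(X)
    print(kmeans.labels_)           # cluster assigned to each visitor (no labels were given)
    print(kmeans.cluster_centers_)  # a rough profile of each group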

Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot
of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be
plotted (Figure 1-9). These algorithms try to preserve as much structure as they can (e.g., trying to keep
separate clusters in the input space from overlapping in the visualization), so you can understand how the
data is organized and perhaps identify unsuspected patterns.

Figure 1-9. Example of a t-SNE visualization highlighting semantic clusters3

3 Notice how animals are rather well separated from vehicles, how horses are close to deer but far from birds,
and so on. Figure reproduced with permission from Socher, Ganjoo, Manning, and Ng (2013), “T-SNE
visualization of the semantic word space.”
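
As a hedged sketch (random toy data, nothing to do with the figure above), this is roughly how you would
run t-SNE with Scikit-Learn to project high-dimensional data down to 2D for plotting:

    import numpy as np
    from sklearn.manifold import TSNE

    X = np.random.RandomState(42).rand(100, 50)        # 100 instances with 50 features each
    X_2d = TSNE(n_components=2, random_state=42).fit_transform(X)
    print(X_2d.shape)  # (100, 2) -> two coordinates per instance, ready to scatter-plot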



A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much
information. One way to do this is to merge several correlated features into one. For example, a car’s mileage
may be very correlated with its age, so the dimensionality reduction algorithm will merge them into one
feature that represents the car’s wear and tear. This is called feature extraction.

It is often a good idea to try to reduce the dimension of your training data using a dimensionality reduction
algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning
algorithm). It will run much faster, the data will take up less disk and memory space, and in some cases it
may also perform better.
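
Here is a minimal sketch of feature extraction with PCA (synthetic mileage/age data, not the book's code):
the two correlated features are merged into a single dimension with little loss of information.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(42)
    age = rng.uniform(1, 10, size=100)                     # car age in years
    mileage = age * 12000 + rng.normal(0, 2000, size=100)  # strongly correlated with age
    X = np.column_stack([mileage, age])

    pca = PCA(n_components=1)
    X_reduced = pca.fit_transform(X)        # a single "wear and tear" feature
    print(pca.explained_variance_ratio_)    # close to 1.0: almost no information lost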

Yet another important unsupervised task is anomaly detection—for example, detecting unusual credit card
transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a
dataset before feeding it to another learning algorithm. The system is trained with normal instances, and
when it sees a new instance it can tell whether it looks like a normal one or whether it is likely an anomaly
(see Figure 1-10).

Figure 1-10. Anomaly detection
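
A hedged sketch of anomaly detection (made-up transaction amounts, and IsolationForest is just one
possible algorithm, not one the chapter singles out): train on mostly normal instances, then ask whether new
instances look normal.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(42)
    normal_transactions = rng.normal(loc=50, scale=10, size=(500, 1))  # typical amounts

    detector = IsolationForest(random_state=42).fit(normal_transactions)
    print(detector.predict([[48.0], [5000.0]]))  # [ 1 -1]: the huge transaction is flagged as an anomaly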

Finally, another common unsupervised task is association rule learning, in which the
goal is to dig into large amounts of data and discover interesting relations between
attributes. For example, suppose you own a supermarket. Running an association rule
on your sales logs may reveal that people who purchase barbecue sauce and potato
chips also tend to buy steak. Thus, you may want to place these items close to each
other.
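
A toy sketch of the idea behind association rule learning (hypothetical sales logs; real algorithms such as
Apriori are more sophisticated): count how often pairs of items show up in the same basket.

    from collections import Counter
    from itertools import combinations

    sales_logs = [
        {"barbecue sauce", "potato chips", "steak"},
        {"barbecue sauce", "potato chips", "steak", "beer"},
        {"milk", "bread"},
        {"barbecue sauce", "steak"},
    ]

    pair_counts = Counter()
    for basket in sales_logs:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    print(pair_counts.most_common(3))  # frequent pairs hint at items worth placing close together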



Semisupervised learning
Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit
of labeled data. This is called semisupervised learning (Figure 1-11).
Some photo-hosting services, such as Google Photos, are good examples of this. Once
you upload all your family photos to the service, it automatically recognizes that the
same person A shows up in photos 1, 5, and 11, while another person B shows up in
photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all
the system needs is for you to tell it who these people are. Just one label per person,4
and it is able to name everyone in every photo, which is useful for searching photos.

Figure 1-11. Semisupervised learning

Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms. For
example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann
machines (RBMs) stacked on top of one another. RBMs are trained sequentially in an unsupervised manner,
and then the whole system is fine-tuned using supervised learning techniques.

Reinforcement Learning
Reinforcement Learning is a very different beast. The learning system, called an agent
in this context, can observe the environment, select and perform actions, and get
rewards in return (or penalties in the form of negative rewards, as in Figure 1-12). It
must then learn by itself what is the best strategy, called a policy, to get the most
reward over time. A policy defines what action the agent should choose when it is in a
given situation.

4 That’s when the system works perfectly. In practice it often creates a few clusters per person, and sometimes
mixes up two people who look alike, so you need to provide a few labels per person and manually clean up
some clusters.



Figure 1-12. Reinforcement Learning

For example, many robots implement Reinforcement Learning algorithms to learn how to walk. DeepMind’s
AlphaGo program is also a good example of Reinforcement Learning: it made the headlines in March 2016
when it beat the world champion Lee Sedol at the game of Go. It learned its winning policy by analyzing
millions of games, and then playing many games against itself. Note that learning was turned off during the
games against the champion; AlphaGo was just applying the policy it had learned.

Batch and Online Learning


Another criterion used to classify Machine Learning systems is whether or not the
system can learn incrementally from a stream of incoming data.

Batch learning
In batch learning, the system is incapable of learning incrementally: it must be trained
using all the available data. This will generally take a lot of time and computing
resources, so it is typically done offline. First the system is trained, and then it is
launched into production and runs without learning anymore; it just applies what it
has learned. This is called offline learning.
If you want a batch learning system to know about new data (such as a new type of
spam), you need to train a new version of the system from scratch on the full dataset
(not just the new data, but also the old data), then stop the old system and replace it
with the new one.
Fortunately, the whole process of training, evaluating, and launching a Machine Learning system can be
automated fairly easily (as shown in Figure 1-3), so even a batch learning system can adapt to change. Simply
update the data and train a new version of the system from scratch as often as needed.
This solution is simple and often works fine, but training using the full set of data can
take many hours, so you would typically train a new system only every 24 hours or
even just weekly. If your system needs to adapt to rapidly changing data (e.g., to pre‐
dict stock prices), then you need a more reactive solution.
Also, training on the full set of data requires a lot of computing resources (CPU,
memory space, disk space, disk I/O, network I/O, etc.). If you have a lot of data and
you automate your system to train from scratch every day, it will end up costing you a
lot of money. If the amount of data is huge, it may even be impossible to use a batch
learning algorithm.
Finally, if your system needs to be able to learn autonomously and it has limited
resources (e.g., a smartphone application or a rover on Mars), then carrying around
large amounts of training data and taking up a lot of resources to train for hours
every day is a showstopper.
Fortunately, a better option in all these cases is to use algorithms that are capable of
learning incrementally.

Online learning
In online learning, you train the system incrementally by feeding it data instances
sequentially, either individually or by small groups called mini-batches. Each learning
step is fast and cheap, so the system can learn about new data on the fly, as it arrives
(see Figure 1-13).

Figure 1-13. Online learning

Online learning is great for systems that receive data as a continuous flow (e.g., stock prices) and need to
adapt to change rapidly or autonomously. It is also a good option if you have limited computing resources:
once an online learning system has learned about new data instances, it does not need them anymore, so
you can discard them (unless you want to be able to roll back to a previous state and “replay” the data). This
can save a huge amount of space.
Online learning algorithms can also be used to train systems on huge datasets that
cannot fit in one machine’s main memory (this is called out-of-core learning). The
algorithm loads part of the data, runs a training step on that data, and repeats the
process until it has run on all of the data (see Figure 1-14).

This whole process is usually done offline (i.e., not on the live system), so online learning can be a confusing
name. Think of it as incremental learning.

Figure 1-14. Using online learning to handle huge datasets

One important parameter of online learning systems is how fast they should adapt to
changing data: this is called the learning rate. If you set a high learning rate, then your
system will rapidly adapt to new data, but it will also tend to quickly forget the old
data (you don’t want a spam filter to flag only the latest kinds of spam it was shown).
Conversely, if you set a low learning rate, the system will have more inertia; that is, it
will learn more slowly, but it will also be less sensitive to noise in the new data or to
sequences of nonrepresentative data points.
A big challenge with online learning is that if bad data is fed to the system, the sys‐
tem’s performance will gradually decline. If we are talking about a live system, your
clients will notice. For example, bad data could come from a malfunctioning sensor
on a robot, or from someone spamming a search engine to try to rank high in search
results. To reduce this risk, you need to monitor your system closely and promptly
switch learning off (and possibly revert to a previously working state) if you detect a
drop in performance. You may also want to monitor the input data and react to
abnormal data (e.g., using an anomaly detection algorithm).
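
One possible (purely illustrative) way to monitor incoming data is to run an anomaly detector over each batch before it reaches the learning system; the sketch below uses Scikit-Learn's IsolationForest on made-up data.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
recent_clean_data = rng.rand(1000, 3)              # data known to be healthy
incoming_batch = rng.rand(32, 3)                   # newly arriving instances
detector = IsolationForest(random_state=42).fit(recent_clean_data)
flags = detector.predict(incoming_batch)           # +1 = looks normal, -1 = looks abnormal
normal_only = incoming_batch[flags == 1]           # only these would be fed to the learning system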

Instance-Based Versus Model-Based Learning


One more way to categorize Machine Learning systems is by how they generalize.
Most Machine Learning tasks are about making predictions. This means that given a
number of training examples, the system needs to be able to generalize to examples it
has never seen before. Having a good performance measure on the training data is
good, but insufficient; the true goal is to perform well on new instances.
There are two main approaches to generalization: instance-based learning and
model-based learning.

Instance-based learning
Possibly the most trivial form of learning is simply to learn by heart. If you were to
create a spam filter this way, it would just flag all emails that are identical to emails
that have already been flagged by users—not the worst solution, but certainly not the
best.
Instead of just flagging emails that are identical to known spam emails, your spam
filter could be programmed to also flag emails that are very similar to known spam
emails. This requires a measure of similarity between two emails. A (very basic) simi‐
larity measure between two emails could be to count the number of words they have
in common. The system would flag an email as spam if it has many words in com‐
mon with a known spam email.
This is called instance-based learning: the system learns the examples by heart, then
generalizes to new cases using a similarity measure (Figure 1-15).

Figure 1-15. Instance-based learning
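
A tiny sketch of such a similarity measure might look like the following; the example emails and the threshold are invented for illustration.

def similarity(email_a, email_b):
    words_a = set(email_a.lower().split())
    words_b = set(email_b.lower().split())
    return len(words_a & words_b)                # number of words the two emails share

known_spam = "win a free prize now click here"
new_email = "click here now to win your free prize"
if similarity(new_email, known_spam) >= 5:       # arbitrary threshold, just for illustration
    print("Flag as spam")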



Model-based learning
Another way to generalize from a set of examples is to build a model of these exam‐
ples, then use that model to make predictions. This is called model-based learning
(Figure 1-16).

Figure 1-16. Model-based learning

For example, suppose you want to know if money makes people happy, so you down‐
load the Better Life Index data from the OECD’s website as well as stats about GDP
per capita from the IMF’s website. Then you join the tables and sort by GDP per cap‐
ita. Table 1-1 shows an excerpt of what you get.

Table 1-1. Does money make people happier?


Country GDP per capita (USD) Life satisfaction
Hungary 12,240 4.9
Korea 27,195 5.8
France 37,675 6.5
Australia 50,962 7.3
United States 55,805 7.2

Let’s plot the data for a few random countries (Figure 1-17).



Figure 1-17. Do you see a trend here?

There does seem to be a trend here! Although the data is noisy (i.e., partly random), it
looks like life satisfaction goes up more or less linearly as the country’s GDP per cap‐
ita increases. So you decide to model life satisfaction as a linear function of GDP per
capita. This step is called model selection: you selected a linear model of life satisfac‐
tion with just one attribute, GDP per capita (Equation 1-1).

Equation 1-1. A simple linear model


life_satisfaction = θ0 + θ1 × GDP_per_capita

This model has two model parameters, θ0 and θ1.5 By tweaking these parameters, you
can make your model represent any linear function, as shown in Figure 1-18.

Figure 1-18. A few possible linear models

5 By convention, the Greek letter θ (theta) is frequently used to represent model parameters.



Before you can use your model, you need to define the parameter values θ0 and θ1.
How can you know which values will make your model perform best? To answer this
question, you need to specify a performance measure. You can either define a utility
function (or fitness function) that measures how good your model is, or you can define
a cost function that measures how bad it is. For linear regression problems, people
typically use a cost function that measures the distance between the linear model’s
predictions and the training examples; the objective is to minimize this distance.
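
As a minimal illustration, the mean squared error below measures how badly a candidate pair of parameters fits the Table 1-1 excerpt; the specific parameter values tried are arbitrary.

import numpy as np

X = np.array([12240, 27195, 37675, 50962, 55805])    # GDP per capita (Table 1-1)
y = np.array([4.9, 5.8, 6.5, 7.3, 7.2])              # life satisfaction (Table 1-1)

def mse_cost(theta0, theta1):
    predictions = theta0 + theta1 * X
    return np.mean((predictions - y) ** 2)           # smaller means a better fit

print(mse_cost(4.85, 4.91e-5))                       # a good choice of parameters
print(mse_cost(0.0, 0.0))                            # a much worse one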
This is where the Linear Regression algorithm comes in: you feed it your training
examples and it finds the parameters that make the linear model fit best to your data.
This is called training the model. In our case the algorithm finds that the optimal
parameter values are θ0 = 4.85 and θ1 = 4.91 × 10–5.
Now the model fits the training data as closely as possible (for a linear model), as you
can see in Figure 1-19.

Figure 1-19. The linear model that fits the training data best

You are finally ready to run the model to make predictions. For example, say you
want to know how happy Cypriots are, and the OECD data does not have the answer.
Fortunately, you can use your model to make a good prediction: you look up Cyprus’s
GDP per capita, find $22,587, and then apply your model and find that life satisfac‐
tion is likely to be somewhere around 4.85 + 22,587 × 4.91 × 10–5 = 5.96.
To whet your appetite, Example 1-1 shows the Python code that loads the data, pre‐
pares it,6 creates a scatterplot for visualization, and then trains a linear model and
makes a prediction.7

6 The code assumes that prepare_country_stats() is already defined: it merges the GDP and life satisfaction
data into a single Pandas dataframe.
7 It’s okay if you don’t understand all the code yet; we will present Scikit-Learn in the following chapters.



Example 1-1. Training and running a linear model using Scikit-Learn
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")

# Prepare the data
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()

# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

# Train the model
lin_reg_model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus' GDP per capita
print(lin_reg_model.predict(X_new))  # outputs [[ 5.96242338]]

If you had used an instance-based learning algorithm instead, you would have found
that Slovenia has the closest GDP per capita to that of Cyprus ($20,732), and since
the OECD data tells us that Slovenians' life satisfaction is 5.7, you would have
predicted a life satisfaction of 5.7 for Cyprus. If you zoom out a bit and look at the
two next closest countries, you will find Portugal and Spain with life satisfactions of
5.1 and 6.5, respectively. Averaging these three values, you get 5.77, which is pretty
close to your model-based prediction. This simple algorithm is called k-Nearest
Neighbors regression (in this example, k = 3).
Replacing the Linear Regression model with k-Nearest Neighbors regression in the
previous code is as simple as replacing this line:
lin_reg_model = sklearn.linear_model.LinearRegression()
with this one:
lin_reg_model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=3)
(you would also need to import sklearn.neighbors instead of sklearn.linear_model).



If all went well, your model will make good predictions. If not, you may need to use
more attributes (employment rate, health, air pollution, etc.), get more or better qual‐
ity training data, or perhaps select a more powerful model (e.g., a Polynomial Regres‐
sion model).
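
For instance, a Polynomial Regression model can be sketched as follows with Scikit-Learn (this is not the book's code; it assumes the X and y arrays from Example 1-1 are still defined, and the degree is just one possible choice):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly_reg_model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),   # adds x², x³, ... as extra features
    LinearRegression(),
)
poly_reg_model.fit(X, y)
print(poly_reg_model.predict([[22587]]))                 # prediction for Cyprus with the richer model

As discussed shortly, such a model can also easily overfit if the degree is too high.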
In summary:

• You studied the data.


• You selected a model.
• You trained it on the training data (i.e., the learning algorithm searched for the
model parameter values that minimize a cost function).
• Finally, you applied the model to make predictions on new cases (this is called
inference), hoping that this model will generalize well.

This is what a typical Machine Learning project looks like. In Chapter 2 you will
experience this first-hand by going through an end-to-end project.
We have covered a lot of ground so far: you now know what Machine Learning is
really about, why it is useful, what some of the most common categories of ML sys‐
tems are, and what a typical project workflow looks like. Now let’s look at what can go
wrong in learning and prevent you from making accurate predictions.

Main Challenges of Machine Learning


In short, since your main task is to select a learning algorithm and train it on some
data, the two things that can go wrong are “bad algorithm” and “bad data.” Let’s start
with examples of bad data.

Insufficient Quantity of Training Data


For a toddler to learn what an apple is, all it takes is for you to point to an apple and
say “apple” (possibly repeating this procedure a few times). Now the child is able to
recognize apples in all sorts of colors and shapes. Genius.
Machine Learning is not quite there yet; it takes a lot of data for most Machine Learn‐
ing algorithms to work properly. Even for very simple problems you typically need
thousands of examples, and for complex problems such as image or speech recogni‐
tion you may need millions of examples (unless you can reuse parts of an existing
model).



The Unreasonable Effectiveness of Data
In a famous paper published in 2001, Microsoft researchers Michele Banko and Eric
Brill showed that very different Machine Learning algorithms, including fairly simple
ones, performed almost identically well on a complex problem of natural language
disambiguation8 once they were given enough data (as you can see in Figure 1-20).

Figure 1-20. The importance of data versus algorithms9

As the authors put it: “these results suggest that we may want to reconsider the trade-
off between spending time and money on algorithm development versus spending it
on corpus development.”
The idea that data matters more than algorithms for complex problems was further
popularized by Peter Norvig et al. in a paper titled “The Unreasonable Effectiveness
of Data” published in 2009.10 It should be noted, however, that small- and medium-
sized datasets are still very common, and it is not always easy or cheap to get extra
training data, so don’t abandon algorithms just yet.

8 For example, knowing whether to write “to,” “two,” or “too” depending on the context.
9 Figure reproduced with permission from Banko and Brill (2001), “Learning Curves for Confusion Set Disam‐
biguation.”
10 “The Unreasonable Effectiveness of Data,” Peter Norvig et al. (2009).



Nonrepresentative Training Data
In order to generalize well, it is crucial that your training data be representative of the
new cases you want to generalize to. This is true whether you use instance-based
learning or model-based learning.
For example, the set of countries we used earlier for training the linear model was not
perfectly representative; a few countries were missing. Figure 1-21 shows what the
data looks like when you add the missing countries.

Figure 1-21. A more representative training sample

If you train a linear model on this data, you get the solid line, while the old model is
represented by the dotted line. As you can see, not only does adding a few missing
countries significantly alter the model, but it makes it clear that such a simple linear
model is probably never going to work well. It seems that very rich countries are not
happier than moderately rich countries (in fact they seem unhappier), and conversely
some poor countries seem happier than many rich countries.
By using a nonrepresentative training set, we trained a model that is unlikely to make
accurate predictions, especially for very poor and very rich countries.
It is crucial to use a training set that is representative of the cases you want to general‐
ize to. This is often harder than it sounds: if the sample is too small, you will have
sampling noise (i.e., nonrepresentative data as a result of chance), but even very large
samples can be nonrepresentative if the sampling method is flawed. This is called
sampling bias.

A Famous Example of Sampling Bias


Perhaps the most famous example of sampling bias happened during the US presi‐
dential election in 1936, which pitted Landon against Roosevelt: the Literary Digest
conducted a very large poll, sending mail to about 10 million people. It got 2.4 million
answers, and predicted with high confidence that Landon would get 57% of the votes.
Instead, Roosevelt won with 62% of the votes. The flaw was in the Literary Digest’s
sampling method:

• First, to obtain the addresses to send the polls to, the Literary Digest used tele‐
phone directories, lists of magazine subscribers, club membership lists, and the
like. All of these lists tend to favor wealthier people, who are more likely to vote
Republican (hence Landon).
• Second, less than 25% of the people who received the poll answered. Again, this
introduces a sampling bias, by ruling out people who don’t care much about poli‐
tics, people who don’t like the Literary Digest, and other key groups. This is a spe‐
cial type of sampling bias called nonresponse bias.

Here is another example: say you want to build a system to recognize funk music vid‐
eos. One way to build your training set is to search “funk music” on YouTube and use
the resulting videos. But this assumes that YouTube’s search engine returns a set of
videos that are representative of all the funk music videos on YouTube. In reality, the
search results are likely to be biased toward popular artists (and if you live in Brazil
you will get a lot of “funk carioca” videos, which sound nothing like James Brown).
On the other hand, how else can you get a large training set?

Poor-Quality Data
Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-
quality measurements), it will make it harder for the system to detect the underlying
patterns, so your system is less likely to perform well. It is often well worth the effort
to spend time cleaning up your training data. The truth is, most data scientists spend
a significant part of their time doing just that. For example:

• If some instances are clearly outliers, it may help to simply discard them or try to
fix the errors manually.
• If some instances are missing a few features (e.g., 5% of your customers did not
specify their age), you must decide whether you want to ignore this attribute
altogether, ignore these instances, fill in the missing values (e.g., with the median
age), or train one model with the feature and one model without it, and so on (a
minimal sketch of two of these options follows this list).
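
As a minimal sketch of two of these options (the tiny customers dataframe below is made up):

import numpy as np
import pandas as pd

customers = pd.DataFrame({"age": [25, 40, np.nan, 31, np.nan],
                          "spend": [10, 80, 35, 50, 20]})

without_missing = customers.dropna(subset=["age"])       # option: ignore the incomplete instances
median_age = customers["age"].median()                   # option: fill in the missing values
customers["age"] = customers["age"].fillna(median_age)   # e.g., with the median age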

Irrelevant Features
As the saying goes: garbage in, garbage out. Your system will only be capable of learn‐
ing if the training data contains enough relevant features and not too many irrelevant
ones. A critical part of the success of a Machine Learning project is coming up with a
good set of features to train on. This process, called feature engineering, involves:



• Feature selection: selecting the most useful features to train on among existing
features.
• Feature extraction: combining existing features to produce a more useful one (as
we saw earlier, dimensionality reduction algorithms can help); a brief sketch of
selection and extraction follows this list.
• Creating new features by gathering new data.
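
As a brief, purely illustrative sketch of the first two items, the snippet below selects the two most useful features with a univariate score and, separately, combines the raw features into two new ones with PCA; the data is synthetic.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
X = rng.rand(200, 10)                                             # ten raw features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.randn(200) * 0.1

X_selected = SelectKBest(f_regression, k=2).fit_transform(X, y)   # feature selection
X_extracted = PCA(n_components=2).fit_transform(X)                # feature extraction
print(X_selected.shape, X_extracted.shape)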

Now that we have looked at many examples of bad data, let’s look at a couple of exam‐
ples of bad algorithms.

Overfitting the Training Data


Say you are visiting a foreign country and the taxi driver rips you off. You might be
tempted to say that all taxi drivers in that country are thieves. Overgeneralizing is
something that we humans do all too often, and unfortunately machines can fall into
the same trap if we are not careful. In Machine Learning this is called overfitting: it
means that the model performs well on the training data, but it does not generalize
well.
Figure 1-22 shows an example of a high-degree polynomial life satisfaction model
that strongly overfits the training data. Even though it performs much better on the
training data than the simple linear model, would you really trust its predictions?

Figure 1-22. Overfitting the training data

Complex models such as deep neural networks can detect subtle patterns in the data,
but if the training set is noisy, or if it is too small (which introduces sampling noise),
then the model is likely to detect patterns in the noise itself. Obviously these patterns
will not generalize to new instances. For example, say you feed your life satisfaction
model many more attributes, including uninformative ones such as the country’s
name. In that case, a complex model may detect patterns like the fact that all coun‐
tries in the training data with a w in their name have a life satisfaction greater than 7:
New Zealand (7.3), Norway (7.4), Sweden (7.2), and Switzerland (7.5). How confident
are you that the W-satisfaction rule generalizes to Rwanda or Zimbabwe? Obviously
this pattern occurred in the training data by pure chance, but the model has no way
to tell whether a pattern is real or simply the result of noise in the data.

Overfitting happens when the model is too complex relative to the amount and
noisiness of the training data. The possible solutions are:

• To simplify the model by selecting one with fewer parameters (e.g., a linear
model rather than a high-degree polynomial model), by reducing the number of
attributes in the training data, or by constraining the model
• To gather more training data
• To reduce the noise in the training data (e.g., fix data errors
and remove outliers)

Constraining a model to make it simpler and reduce the risk of overfitting is called
regularization. For example, the linear model we defined earlier has two parameters,
θ0 and θ1. This gives the learning algorithm two degrees of freedom to adapt the model
to the training data: it can tweak both the height (θ0) and the slope (θ1) of the line. If
we forced θ1 = 0, the algorithm would have only one degree of freedom and would
have a much harder time fitting the data properly: all it could do is move the line up
or down to get as close as possible to the training instances, so it would end up
around the mean. A very simple model indeed! If we allow the algorithm to modify θ1
but we force it to keep it small, then the learning algorithm will effectively have some‐
where in between one and two degrees of freedom. It will produce a simpler model
than with two degrees of freedom, but more complex than with just one. You want to
find the right balance between fitting the data perfectly and keeping the model simple
enough to ensure that it will generalize well.
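
In Scikit-Learn terms, one common way to apply such a constraint to a linear model is Ridge regression, sketched below (not the book's code; it assumes the X and y arrays from Example 1-1 are still defined, and the feature is scaled first so the penalty has a visible effect):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

regularized_model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))  # alpha sets the strength of the constraint
regularized_model.fit(X, y)
print(regularized_model.predict([[22587]]))                             # typically a more conservative prediction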
Figure 1-23 shows three models: the dotted line represents the original model that
was trained with a few countries missing, the dashed line is our second model trained
with all countries, and the solid line is a linear model trained with the same data as
the first model but with a regularization constraint. You can see that regularization
forced the model to have a smaller slope, which fits a bit less the training data that the
model was trained on, but actually allows it to generalize better to new examples.



Figure 1-23. Regularization reduces the risk of overfitting

The amount of regularization to apply during learning can be controlled by a
hyperparameter. A hyperparameter is a parameter of a learning algorithm (not of the
model). As such, it is not affected by the learning algorithm itself; it must be set prior
to training and remains constant during training. If you set the regularization
hyperparameter to a very large value, you will get an almost flat model (a slope close
to zero); the learning algorithm will almost certainly not overfit the training data, but
it will be less likely to find a good solution. Tuning hyperparameters is an important
part of building a Machine Learning system (you will see a detailed example in the
next chapter).
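
For example, one standard way to tune such a hyperparameter is a cross-validated grid search, sketched below (the candidate alpha values are arbitrary, and the X and y arrays from Example 1-1 are assumed to be defined):

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)    # the regularization strength that generalized best in cross-validation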

Underfitting the Training Data


As you might guess, underfitting is the opposite of overfitting: it occurs when your
model is too simple to learn the underlying structure of the data. For example, a lin‐
ear model of life satisfaction is prone to underfit; reality is just more complex than
the model, so its predictions are bound to be inaccurate, even on the training exam‐
ples.
The main options to fix this problem are:

• Selecting a more powerful model, with more parameters


• Feeding better features to the learning algorithm (feature engineering)
• Reducing the constraints on the model (e.g., reducing the regularization hyper‐
parameter)

Stepping Back
By now you already know a lot about Machine Learning. However, we went through
so many concepts that you may be feeling a little lost, so let’s step back and look at the
big picture:

28 | Chapter 1: The Machine Learning Landscape


Another Random Document on
Scribd Without Any Related Topics
natives emerged from the pass they were disarmed. When they
reached the terraced ridge, where the army was halted, they drew
back in fear, but they were soon reassured. Men, women and
children were eager to greet the soldiers, for the chiefs had assured
them that these were their best friends.

While this strange scene was being witnessed, Colonel Loch and
Captain Speedy were manœuvring at the extremity of Selasse, on
the road which encircled the fortress and thence led to Magdala.
Looking up to the heights the British officers saw a number of men
careering about on the plateau which connected Selasse with
Magdala. It was ascertained that they belonged to the enemy, and
their dress indicated that they were chiefs. When these men saw the
cavalry advancing round the corner at Selasse they retired slowly
and in good order to Magdala, firing as they went.

As the British proceeded, the officers soon discovered the meaning


of the presence of the Abyssinians. They had been attempting to
secure a number of cannon and mortars lying at the Selasse end of
the plateau. The cannon were at once seized by our men, and were
found to be mostly of French and British manufacture.

After retiring as far as the foot of Magdala, a few of the Abyssinians


made a pretence of preparing to charge, but apparently hesitated.
Along the brow of the famous fortress many dark heads could be
seen, and now and then shots awoke the echoes. Suddenly the
Abyssinians who were first noticed made a dash towards Captain
Speedy and the artillery, which accompanied him. After coming
within three hundred yards the natives halted, and judge of the
surprise of the British officers when they discovered that the
foremost among the company of horsemen was no other than
Theodore, king of Abyssinia!

Such a discovery was of course highly satisfactory to the British, who


had been somewhat downcast at the report of the king’s escape.
As showing the reckless courage of the king, it is said that his words
of greeting to the British were, “Come on! Are ye women, that ye
hesitate to attack a few warriors?”

As Theodore and his followers showed a disposition to advance,


some soldiers of the 33rd were ordered to take up a position
commanding all paths leading to the valleys on all sides of the
plateau. A company of the 33rd, who had eagerly ascended Selasse
for the purpose of planting their colours on its rampart, were also
invited to aid in the defence of the captured artillery.

A few shells were now sent whizzing amongst the Abyssinians, who
had by this time commenced a desultory firing. Very soon, growing
alarmed at the work of our artillery, the Abyssinians retired for
shelter behind some wooden booths. A few more shells, however,
soon dislodged Theodore and his men from their hiding places, and
they beat a rapid retreat towards Magdala. Still they had not
finished, and continued to fire at all who came within reach of their
mountain stronghold. Their persistent firing ultimately lured a
detachment of the 33rd Foot into action, but without marked effect,
and shortly after this orders came from Sir Charles Staveley to cease
firing. At the same time the British flag was hoisted above Selasse
and Fahla. Only Magdala now remained.

Describing the stronghold, one of the correspondents present says:


—“Suppose a platform of rock, oval in shape, and a mile and a half
in length, and from a half to three-quarters of a mile in width, rising
five hundred feet perpendicularly about a narrow plateau, which
connected its northern end with Selasse. The rock was Magdala, the
plateau Islamgee. On the western and southern sides Magdala
towered above the valley of the Melkaschillo some two thousand
feet. The eastern side rose in three terraces of about 600 feet in
height, one above another. Its whole summit was covered with
houses, straw-thatched, and of a conical shape. The extreme brow
of the fortress was defended by a stone wall, on the top of which a
hurdle revetment was planted. But the side fronting Islamgee was
defended by a lower wall and revetment constructed nearly half way
up the slope. In the centre of the revetment was a barbican, up to
which led the only available road to the fortress.”

Fahla and Selasse having been left in the hands of sufficient


garrisons, the remainder of the British troops were withdrawn to
Islamgee, where they were halted behind the captured artillery. Sir
Robert Napier had been at great pains to ascertain the strength of
the fortress. One thing he had made sure of, that at only one point
was it assailable, and that was the side which fronted the troops as
they stood upon Islamgee.

Then Napier distributed his force in preparation for the attack. Soon
twenty guns were thundering at the gates. Theodore could not
misunderstand the meaning of the British now. It was surrender or
death for him and his followers.

The bombardment lasted two hours. At the end of this period Napier
had made up his mind that the defenders were weak, and that the
British troops would suffer very little loss in the assault. He therefore
ordered the Royal Engineers, the 33rd, the 45th, and the King’s Own
to be prepared to carry on the attack. Already the fire from the
fortress had ceased Soon signals for rapid firing were given to the
British artillery, and under the furious cannonade which proceeded,
the British troops began their march along the plateau.

Upon their arrival within fifty yards of the foot of Magdala, the order
was given to the artillery to cease fire. Then the Engineers at once
brought their sniders into play, and for ten minutes they and the
33rd and 45th rained a storm of leaden pellets upon the defenders.

Theodore and his brave followers had been concealed while the
artillery was at work. Now, however, the king showed himself. Up he
sprang, singing out his war-cry, and with his bodyguard he hastened
to the gates, prepared to give the invaders a fitting welcome. He
posted his men at the loopholes and along the wall, topped with
wattled hurdles. Soon his signal was given, and heavy firing was
directed upon the advancing soldiers, several of whom were
wounded. Next the British fire was concentrated on the barbican,
and the revetment, through the loopholes of which rays of smoke
issuing forth betrayed the presence of the enemy. Slowly the soldiers
advanced through the rain which accompanied the thunderstorm
which now raged. For a minute there was a pause, and then again a
dozen bullets hurtled through the advance guard of the troops,
wounding Major Pritchard and several of the Engineers. Then Major
Pritchard and Lieutenant Morgan made a dash upon the barbican.
They found the gate closed, and the inside of the square completely
blocked up with huge stones.

A drummer of the 33rd climbed up the cliff wall. Reaching a ledge,


he ascended another, and shouted to his companions to “Come on!”
as he had found a way. In a short time the intrepid soldiers had
passed all the lower defences, and scattering themselves over the
ground they made a rush for the other defence, 75 feet above them,
passing over not a few ghastly reminders of the battle. There were
obstacles in the way, but they could not stop the excited Irishmen.
They leaped forward and fired volley after volley into the faces of
the Abyssinians.

Nor must we forget the charge of Drummer Maguire and Private


Bergin upon Magdala. It is related that the two men were advancing
a few paces from each other to the upper revetment when they saw
about a dozen of the enemy aiming at them. The doughty pair
immediately opened fire, and so quick and well-directed was it that
but few of their assailants escaped. Seeing a host of red-coats
advancing upward, the others retreated precipitately. Over the upper
revetment both men made their way, and at the same time they
observed a man standing near a grass stack with, a revolver in his
hand. When he saw them prepare to fire, he ran behind it, and both
men plainly heard the shot fired which followed. Advancing, they
found him prostrate on the ground, in a dying state, the revolver
clutched convulsively in his right hand. To their minds the revolver
was but their proper loot, and, without any ceremony, they
wrenched it from the grasp of the dying man. The silver plate on the
stock, however, arrested their attention, and, on examining it, they
deciphered the following inscription—“Presented by Victoria, Queen
of Great Britain and Ireland, to Theodore, Emperor of Abyssinia, as a
slight token of her gratitude for his kindness to her servant Plowden,
1854.”

The soldiers were in the presence of the Emperor, and he was dying.
Soon the rest of the troops followed their leaders, and the British
flag was straining from the post which crowned the summit of the
Abyssinian stronghold. Then, while the sound of “God Save the
Queen” rent the once more peaceful air, and the soldiers of the
Queen joined lustily in the triumphant cheers, the once proud
Emperor of Abyssinia, in all the gorgeous trappings of his state, and
surrounded by a crowd of interested spectators, breathed his last in
the stronghold where he had thought to give pause to those he
regarded as the enemies of his kingdom.

Soon after “the Advance” was once more sounded, and the soldiers
filed in column through the narrow streets, the commander-in-chief
and staff following.

When the cost of the assault came to be reckoned, it was found that
17 British had been wounded, though none of them mortally. The
Abyssinian dead were estimated at 60, with double that number of
wounded.

On the fourth morning after the fall of Magdala, the Abyssinians, to


the number of 30,000, commenced their march for Dalanta. Every
living soul having left, the gates were blown up, and the houses set
on fire. The flames soon did their work, and nothing escaped.

On the 18th April, 1868, the troops turned their faces northward for
their homeward march, their object fully attained.
CHAPTER LVII.
THE BATTLES OF AMOAFUL AND
ORDASHU.
1874.

For years the Ashantees had been a source of trouble and


annoyance to the British settlers on the Gold Coast, and the
campaign of 1873-74 was by no means entered upon without
considerable provocation from this barbarous and fanatical people.

With the march of time, Britain extended and strengthened her hold
upon the settlement, and ultimately, pursuing this policy, brought
out the Danes, and made exchanges with the Dutch there. These
proceedings culminated in Britain becoming possessors of the whole
of the territory formerly under Dutch protection. The taking over of
the Dutch forts caused heart-burning among the Ashantees.
Particularly was this the case with regard to Elimina, where, at the
time the negotiations for the transfer were being considered, a
number of Ashantee troops were lying.

King Koffee Kalkali, the ruler of the Ashantees, protested against the
transfer, maintaining that the Dutch had no right to hand over the
territory to Britain, as it belonged to him. Notwithstanding, the
Dutch contrived to get rid of the truculent Koffee and his followers
then stationed at Elimina.

Not only did the Ashantees resent the Anglo-Dutch agreement, but
other tribes in several instances also took objection. This especially
was the case as regarded the Fanties and Eliminas, who hated each
other, and interchanged hostile acts, although by this time both were
under one common protection.

The old hatred of Britain had been awakened. King Koffee assumed
a dominant and aggressive spirit, and became bent on invasion. To
some extent he was abetted by the Eliminas, who, in part at any
rate, were disloyal to the whites. From these causes arose the
campaign of ’73-’74 and the battles of Amoaful and Ordashu.

At the outbreak of hostilities the British force available to resist


attack was ridiculously meagre, numbering, it is computed, not more
than 600 men, scattered over several stations.

At home, the Government was slow to act, and not until repeated
application had been made for white troops was the appeal given
heed to.

That renowned soldier Sir Garnet Wolseley was commissioned to


operate against the Ashantees. The announcement gave great
satisfaction. If the spirit of the wild tribe was to be crushed, it was
felt that Sir Garnet was the man to do it. But his task was no light
one, and without white troops the issue was doubly doubtful.

His instructions, briefly, were to drive the Ashantees back over the
Prah, then to follow and punish them until they should consent to be
peaceful, should release their prisoners, and comply with terms
necessary to our own interests and those of humanity.

The deadly nature of the coast, “the white man’s grave,” was
doubtless a potent factor with the Government in that they did not
immediately acquiesce with Sir Garnet’s request for white troops.
But, as we know, the Government at last acceded, and the
regiments selected for service in that disease-pregnated country
have added lustre to their fame and also another page of glorious
history to the story of the pluck and endurance of Britain’s soldiers.
The total number of troops under the command of Sir Garnet
Wolseley being made up of Colonel Wood’s native regiment of 400
men, Major Russell’s native regiment of 400, the 42nd Highlanders
(Black Watch) 575 strong, the Rifle Brigade 650, 75 men of the 23rd
Fusiliers, Royal Naval Brigade 225, 2nd West India Regiment 350,
Royal Engineers 40, and Rait’s artillery 50.

About the end of October, 1873, Sir Garnet Wolseley began his
forward march into the interior. There was fighting to be done ere
long, for the enemy made an attempt to arrest the progress of the
troops by besieging Abrakrampa, the chief town of the province of
Abra, of which the native king was Britain’s staunch ally. A three
days’ ineffectual leaguer ensued, during which the Ashantees lost
heavily, while not so much as one white man was injured. With Sir
Garnet close behind, the Ashantees thought it best to recross the
Prah and retreat towards Coomassie.

Through the dense bush the troops marched in the garish and
dazzling sunlight, and at the end of their daily tramp through the
hostile country they were glad to lie down and rest in the huts
provided for them. In the way of rations the men were well looked
after by the commissariat department, the fare being as follows:—
One and a half pounds of meat, salt or fresh, one pound of pressed
meat, one and a quarter pounds of biscuits, four ounces of pressed
vegetables, two ounces of rice or preserved peas, three ounces of
sugar, three-quarters of an ounce of tea, half an ounce of salt, one-
thirteenth of an ounce of pepper. With such substantial and varied
feeding the hardships of the march were minimised and weakness
was rare—another striking illustration of the truth of the maxim of
the great Napoleon that “an army goes upon its belly.”

The further the British force progressed, denser and loftier grew the
forest, although the Engineers with unflagging energy had cleared a
pathway as far as the Prah. On the 15th December, 1873, Sir Garnet
Wolseley was able to report “the first phase of the war had been
brought to a satisfactory conclusion by a few companies of the 2nd
West India regiment, Rait’s artillery, Gordon’s Houssas, and Wood’s
and Russell’s regiments, admirably conducted by the British officers
belonging to them, without the assistance of any other troops except
the marines and blue-jackets who were upon the station on his
arrival.”

Sir Garnet arrived at Prashu on the 2nd January, 1874, and was
joyfully received by the assembled soldiers. Early in the same
morning an Ashantee embassy was espied on the other side of the
Prah. These ambassadors brought a letter from the truculent King
Koffee, in which the wily savage had the audacity to point out that
the attack upon him was unjustifiable.

The “Times” correspondent wrote that “many stories were afloat


about the King of Ashantee’s proceedings. The following is a fair
specimen, and illustrates well the extreme superstition of the
Ashantees, showing by what influences Koffee is popularly supposed
to be guided, and upon what councillors he is supposed to rely in
the present crisis. Koffee, the story goes, recently summoned a
great meeting of his fetish men, and sought their advice as to how
he should act towards Britain, and whether he ought to seek for
peace or stake his fortunes on the result of a war. The fetish men at
first declined to give an answer, until they had been guaranteed that,
no matter what their reply was, their lives should not be forfeited.
Having been assured upon this point, they then replied that ‘they
saw everything dark, except the streets of Coomassie, which ran
with blood.’ King Koffee was dissatisfied with the vagueness of this
reply, and determined to appeal still further to the oracle. He
resorted to what he considered a final and conclusive test. Two he-
goats were selected, one entirely black, the other of a spotless white
colour, and, after due fetish ceremonies had been performed over
the two goats, they were set at each other. The white goat easily
overcame and killed his opponent. King Koffee, after this test, was
satisfied that he was doomed to defeat at the hands of the white
men.”

He immediately sent the embassies before referred to, to seek for


peace, but the object which was of greatest importance to him was
to avoid the humiliation of seeing his territory invaded by the whites.
When, however, he found that all his conciliatory overtures were
powerless to hinder the advance of the British, the national pride of
the chiefs and the ardour of the fighting population was too strong
to admit of any restraint. These causes, combined with the
threatened humiliation of seeing his capital invaded by the British
and his fetish supremacy destroyed, nerved him for one desperate
effort.

For this final move Sir Garnet was prepared. In his notes for the use
of his army the commander says:—

“Each soldier must remember that with his breechloader he is equal


to at least twenty Ashantees, wretchedly armed, as they are, with
old flint muskets, firing slugs or pieces of stone that do not hurt
badly at more than forty or fifty yards range. Our enemies have
neither guns nor rockets, and have a superstitious dread of those
used by us.”

With these and similar heartening instructions, the coming fight was
anticipated eagerly by our troops, the Fanties alone, who were
employed as transport bearers, proving unreliable. These latter
deserted in thousands, thus throwing extra work upon the white
troops, many of the regiments having to carry their own baggage.

Information was received at the British headquarters on the 30th


January, 1874, that a big battle was pending on the morrow. The
natives were assembled in enormous strength, and were prepared to
offer a stout resistance. On the eve of the fray the advance guard of
the British force was at Quarman, a distance of not more than a
couple of miles from Amoaful, one of the principal villages of the
country. Between these two places lay the hamlet of Egginassie, and
to this point Major Home’s Engineers were busily engaged preparing
a way for the advancing force.
In front of Amoaful 20,000 of the natives had taken up a position. Of
this fanatical horde there was not a man but would be ready to
perpetrate the most wanton cruelty, and to whom butchery was but
second nature. As usual, the Ashantees were armed with muskets
that fired slugs. They held a position of considerable strength upon
the slopes of the hill that led to Amoaful. The dense nature of the
bush, high walls of foliage, through which our troops had to pass,
made it difficult for the soldiers to fire with precision, or make rapid
progress. The protection of not only our flanks, but also our rear,
was a matter of special importance and anxiety, for in the enclosing
screen of underwood it would be no difficult task for a stealthy and
numerous foe to surround and decimate small detachments of the
not over strong British force. But every precaution was taken to
guard against surprise, and the British general had every confidence
in each member of his force, officers and men alike.

The troops were early on the move, and with precision they filed into
their allotted places. Led by Brigadier Sir Archibald Alison, the front
column was comprised of the famous Black Watch, eighty men of
the 23rd Fusiliers, Rait’s artillery, two small rifled guns manned by
Houssas, and two rocket troughs, with a detachment of the Royal
Engineers. The left column was under the command of Brigadier
McLeod, of the Black Watch, and contained half of the blue-jackets,
Russell’s native troops, two rocket troughs, and Royal Engineers.
Lieutenant-Colonel Wood, V.C., of the Perthshire Light Infantry, had
charge of the right column, which consisted of the remaining half of
the naval brigade, seamen and marines, detachments of the Royal
Engineers, and artillery, with rockets and a regiment of African
levies. The rear column was made up of the second battalion of the
Rifle Brigade, 580 strong, and the entire force was under the skilful
command of Sir Garnet Wolseley.

The forces were disposed so as to form a large square. By this


means Sir Garnet hoped to nullify the favourite flank tactics of the
enemy, but to some extent the formation had to be broken on
account of the entangling brushwood.
The battle of Amoaful was fought on the 31st January. Lord Gifford
and his scouts were the first to get in touch with the enemy, and the
desultory firing heard warned the leading column that the conflict
was opening. The British forces met opposition about eight in the
morning, and soon after the spirting of red musketry and the curl of
white smoke were conspicuous in the dark, thick bush. So fierce was
the onslaught that it is calculated that had the Ashantees used
bullets instead of slugs scarcely a man of the Black Watch would
have lived to tell the tale. Nine officers and about a hundred men of
the regiment were rendered useless by the blinding fire of the
Ashantees. The marshy nature of the ground impeded progress, and
in the underwood the skulking natives fired incessantly at the
advancing troops.

Under a heavy fire, the left column were struggling to oust the
enemy. There, while urging on his men, the gallant Captain Buckle,
R.E., was mortally wounded, having been hit by two slugs in the
region of the heart.

The troops succeeded in occupying the crest of the hill, where a


clearing had been made, and the enemy was driven away from this
position by an advance of the naval brigade and Russell’s regiment.

“Colonel McLeod,” says Sir Garnet Wolseley, “having cleared his


front, and having lost touch of the left column, now cut his way in a
north-easterly direction, and came into the rear of the Highlanders
about the same hour that the advance occupied Amoaful. I
protected his left rear by a detachment of the Rifle Brigade. Our left
flank was now apparently clear of the enemy.”

The right column were also soon hotly engaged, and so dense was
the jungle between it and the main road that the men, in firing, had
the greatest difficulty to avoid hitting their comrades of the Black
Watch.
Mr. Henty, regarding this, says:—“Anxious to see the nature of the
difficulties with which the troops were contending, I went out to the
right column, and found the naval brigade lying down and firing into
a dense bush, from which, in spite of their heavy firing, answering
discharges came incessantly, at a distance of some twenty yards or
so. The air above was literally alive with slugs, and a perfect shower
of leaves continued to fall upon the earth. The sailors complained
that either the 23rd or 42nd were firing at them, and the same
complaint was made against the naval brigade by the 42nd and
23rd. No doubt there was, at times, justice in these complaints, for
the bush was so bewilderingly dense that men soon lost all idea of
the points of the compass, and fired in any direction from which
shots came.”

Casualties in the right columns were also numerous, and Colonel


Wood, the commander, was brought in with an iron slug in his chest.
The command of the wing now devolved on Captain Luxmore. But
though the village was entered, the fighting was by no means at an
end, and a final great effort was made by the Ashantees to turn the
rear and drive the British from Amoaful. Sir Garnet immediately
ordered the Rifle Brigade, hitherto unemployed in the battle, to take
the back track and defend the line of communication towards
Querman.

This was about one o’clock in the afternoon, and the Rifles
succeeded in repulsing the natives. It will thus be seen that on all
sides of the square the Ashantees had tried to break through. For
more than an hour they maintained the attack, but the resistance
offered completely set their attempts at nought. The climax came
when Sir Garnet, observing that the Ashantee fire was slackening,
gave orders for the line to advance, and to wheel round, so as to
drive the enemy northwards before it.

The movement was splendidly carried out. The wild Kosses and
Bonnymen of Wood’s regiment, cannibals, who had fought steadily
and silently so long as they had been on the defensive, now raised
their shrill war-cry, slung their rifles, drew their cutlasses, and like so
many wild beasts, dashed into the bush to close with the enemy,
while the Rifles, quietly and in an orderly manner as if upon parade,
went on in extended order, scouring every bush with their bullets,
and in five minutes from the time the “Advance” sounded, the
Ashantees were in full and final retreat. Even then the enemy were
not inclined to take their beating without protest, and for several
hours continued to harass the troops by sudden but abortive rushes.

Terrible carnage had been wrought on the Ashantees. The losses


they suffered have been estimated at between 800 and 1200 killed
and wounded. The king of Mampon, who commanded the Ashantee
right, was mortally wounded. Amanquatia, who commanded the left,
was killed; and Appia, one of the great chiefs engaged in the centre,
was also slain.

The British loss was over 200 officers and men killed and wounded,
the Black Watch suffering most heavily, having one officer killed, and
7 officers and 104 men wounded. In his despatch Sir Garnet said:—

“Nothing could have exceeded the admirable conduct of the 42nd


Highlanders, on whom fell the hardest share of the work”—the
highest praise for which any regiment could wish.

Having thus delivered a crushing blow to native power, the troops


marched forward to complete the work which they had so well
begun. It was evident that before the spirit of the Ashantee savage
could be thoroughly broken Coomassie must be entered. Towards
this end, Sir Garnet and his troops immediately set their faces.

Hard fighting, however, was not yet at an end, and on the day
following the rout at Amoaful, February 1st, the Ashantees made a
stand at Becquah, an important town standing a short distance from
the line of communication, and which would undoubtedly have been
the cause of considerable trouble and loss of life had the General
moved directly north without causing the place to be destroyed.
Only about a mile separated the camp from Becquah, and the force
creeping silently upon the village, soon engaged with the enemy.
Sharp firing took place, and the natives, unable to withstand the
assault, turned tail and fled. The men of the naval brigade were the
first to enter the place, and soon the huts were a mass of flames.
Some native accoutrements and much corn fell into our hands.
Following this, several villages which lay between Amoaful and
Coomassie were taken with comparatively little fighting, the
Ashantees having evidently taken much to heart the severe loss
inflicted on them on 31st January. Each village passed through had
its human sacrifice lying in the middle of the path, for the purpose of
affrighting the conquerors.

“The sacrifice,” says Mr. Stanley, “was of either sex, sometimes a


young man, sometimes a woman. The head, severed from the body,
was turned to meet the advancing army, the body was evenly laid
out, with the feet towards Coomassie. This laying out meant no
doubt, ‘regard this face, white men; ye whose feet are hurrying on
to our capital, and learn the fate awaiting you.’”

The spectacle was sickening, and the wanton cruelty made the
victorious troops even more determined and anxious to put an end
to these frightful barbarities.

From behind a series of ambuscades, the advance was again


resisted at the river Ordah. After clearing out the enemy, it was
learned that a large force had assembled at Ordashu, a village
situated about a mile and a half beyond the northern bank of the
river. Things had become serious for the Ashantees, and King Koffee
now sent another letter to Sir Garnet, imploring him to halt in order
that he might gather the indemnity, at the same time promising to
give up his hostages, the heir-apparent and the queen mother. Sir
Garnet’s reply was firm. He would march to Coomassie unless King
Koffee fulfilled his promise by the next morning. The hostages failed
to arrive, and the British troops were on the forward move at half-
past seven in the morning.
The advance guard, consisting of Gifford’s scouts, the Rifle Brigade,
Russell’s regiment, and Rait’s artillery, were early in touch with the
enemy, who had sought to impede progress at Ordashu. King Koffee
himself directed the battle from a village nearly a couple of miles
from the scene of conflict. As the successive companies marched up
they became engaged, and the firing was fast and furious. The
enemy must now drive back the invaders or submit, and the throes
of this final struggle for supremacy between barbarity and
civilisation, the Ashantees fought with great bravery. But the Rifle
Brigade proved as steady as a rock. When they moved it was
forward, the rapid fire of the sniders and the well-placed shots of
Rait’s artillery gradually demoralising the defenders.

In this fashion the Rifle Brigade were gradually drawing close up to


the village, and at the critical moment, with a ringing cheer and a
rush, they carried the day. Although the village had been occupied
the natives continued to rush to their doom, and the terrible loss
inflicted on them by the Rifles was greatly added to by the naval
brigade’s fire and that of the troops of the main column, as they
attempted to carry out their favourite flank movement.

The corpses lay thick on the roadside, while the bush was littered
with dead and dying. Sir Garnet rushed the whole of the army
through Ordashu, and then, without loss of time, “the Forty-Twa”
were again in the van, heading towards Coomassie, a sufficient force
having been left to guard Ordashu.

At Coomassie the troops had little difficulty in effecting occupation.


The king and his household had fled, and further fight in the
Ashantees there was none. Lord Gifford’s scouts were the first to
enter the town, and were followed by the Black Watch.

Coomassie, a veritable Golgotha, was razed to the ground, the


palace destroyed, and the fierce spirit of the Ashantees quelled.
CHAPTER LVIII.
THE BATTLES WITH THE ZULUS.
1879.

Says a writer in “Blackwood’s Magazine,” in March, 1879:—“To break


the military power of the Zulu nation, to save our colonies from
apprehensions which have been paralysing all efforts at
advancement, and to transform the Zulus from the slaves of a
despot who has shown himself both tyrannical and cruel, and as
reckless of the lives as of the rights of his subjects ... is the task
which has devolved upon us in South Africa, and to perform which
our troops have crossed the Tugela.”

Such causes enumerated above would appear to the unprejudiced


observer to be more than sufficient raison d’être for the British
invasion of Zululand, but when one takes into account the
unimpeachable statements of those long resident in the adjacent
colony of Natal, one cannot help believing them to be a direct, if not
wilful, misrepresentation of the facts.

The kingdom of Zululand in 1873 lay, as all are aware, between the
British colony of Natal on the south and the Transvaal Republic on
the north. Now, while the Natal border had always been in a state of
quiet and peacefulness, and the nearer settlers were on friendly
terms with their Zulu neighbours, the northern border of the
kingdom was in a constant state of unrest. For one thing, the
Transvaal Boers were, upon one pretext and another, constantly
encroaching in a southerly direction on the confines of Zululand; for
another, they were in the habit of treating the Zulus and other tribes
with an unpardonable severity.

The accusations brought above against Cetewayo, King of Zululand,


appear also to have been largely unfounded. He was crowned, at his
own request, by the British Commissioner, on the 8th August, 1873,
and had ruled his people well and in a fairly enlightened manner,
though it is true he observed many barbarous native customs in the
punishment of Zulu offenders. He may, however, be declared to be a
competent and capable native ruler.

Zululand being at this time under British protection, though ruled by


Cetewayo, the Zulus were not permitted to resent the intrusions of
the Boers upon their borders by a recourse to arms. When, however,
on April 17, 1877, Great Britain, in the person of Sir Theophilus
Shepstone, annexed the Transvaal Republic, on the ground of its
mismanagement, incapability, and gross ill-treatment of the native
races by slavery and other means, it was felt by Cetewayo that the
time had at last come when the question of his disturbed border
would be satisfactorily adjusted.

The Transvaal Boers were “paralysed” when the edict of annexation


was read to them, and strong protests were issued to the British
Government against this high-handed proceeding. Accordingly every
effort was made to conciliate the Boers until such time as they
should have settled down under the new regime, almost the first of
these concessions taking the form of an anti-Zulu view of the border
question. Upon this question of the Transvaal-Zulu border, the whole
matter of the war now turned.

As late as 1876 the Zulu people begged that the Governor of Natal
“will take a strip of the country, the length and breadth of which is to
be agreed upon between the Zulus and the Commissioners (for
whom they ask) sent from Natal, the strip to abut on the colony of
Natal and to run to the northward and eastward in such a manner as
to interpose all its length between the Boers and the Zulus, and to
be governed by the colony of Natal.”

Such a Commission was appointed, and, on December 11th, 1878,


the boundary award was delivered to the Zulus at the Lower Tugela
Drift. It was, on the whole, favourable to the Zulus, but so fenced
about with warnings and restrictions as to be virtually negative in
tone, and, in fact, many have asserted that by this time the British
Government had made up its mind to the annexation of Zululand. In
any event, the award was followed up with an ultimatum from Sir
Bartle Frere, containing thirteen specific demands. One of these
entailed the “disbanding of the Zulu army, and the discontinuance of
the Zulu military system.”

By this time a considerable British force was present in Natal to
protect the interests of the colony, and as a “means of defending
whatever the British Government finds to be its unquestionable
rights.”

The reasons given for the issue of the ultimatum were three in
particular. The first had reference to the affair of Sihayo. On July 28,
1878, a wife of the chief Sihayo, an under-chief of Cetewayo’s, had
left her husband and escaped into Natal. Hither she was followed by
Sihayo’s two chief sons and brother, conveyed back to Zululand, and
there put to death in accordance with the native custom for such an
offence. These culprits the Natal Government now demanded should
be given up to be tried in the Natal courts. Cetewayo, however, did
not regard the offence as a serious one, and offered money
compensation in place of the surrender of the young men, “looking
upon the whole affair as the act of rash boys, who, in their zeal for
their father’s honour, did not think what they were doing.”

The demand for the person of the Swazi chief, Umbilini, formed the
second point. This chief was not under the jurisdiction of Cetewayo,
and though he had been charged with, and frequently convicted of,
raiding, Cetewayo was in no way responsible for his
acts, otherwise than as an over-lord.

The temporary detention of two Englishmen, Messrs. Smith and
Deighton, formed the third especial grievance, and for these several
offences large fines in the way of cattle were demanded in the
ultimatum. Says Miss Colenso, daughter of the then Bishop of Natal,
and historian of the war:—

“The High Commissioner (Sir Bartle Frere) was plainly determined
not to allow the Zulus the slightest ‘law,’ which, indeed, was wise in
the interests of war, as there was considerable fear that, in spite of
all grievances and vexations, Cetewayo, knowing full well, as he
certainly did, that collision with the British must eventually result in
his destruction, might prefer half a loaf to no bread, and submit to
our exactions with what grace he could. And so probably he would;
for from all accounts every effort was made by the king to collect the
fines of cattle and propitiate the Government.”

Such efforts were, however, unavailing, owing to the shortness of
time allowed for collecting the cattle, and no extension of the period
was granted. Moreover, in the natural agitation caused among the
Zulus by the grave turn events were taking, any concentration of
troops on the other side of the border was construed into an
intention on the part of the Zulu king to attack Natal, and urged as
an additional reason for our beginning hostilities.

On the 11th January, 1879, the allotted period having expired, war
was declared.

“The British forces,” ran the document, “are crossing into Zululand to
exact from Cetewayo reparation for violations of British territory
committed by the sons of Sihayo and others,” and to enforce better
government of his people. “All who lay down their arms will be
provided for, ... and when the war is finished the British Government
will make the best arrangements in its power for the future good
government of the Zulus.”

On the 4th inst., Lieutenant-General Lord Chelmsford, who had been
resident in the colony since August, ’78, was appointed commander-
in-chief of Her Majesty’s forces in South Africa.

Ulundi was to be the objective of the campaign, the British force to
be divided into four columns, which should enter Zululand at four
different points and concentrate upon Ulundi.

No. 1 Column, under Colonel Pearson, was to assemble on the
Lower Tugela at Fort Pearson. It consisted of a company of the Royal
Engineers, 2nd Battalion of the Buffs, 99th regiment, naval brigade
with two guns and one gatling, one squadron of mounted infantry,
about 200 Natal volunteers, two battalions of the 2nd regiment Natal
native contingent, one company of Natal native pioneers, and a
detachment of Royal Artillery.

No. 2 Column was to co-operate with No. 1. Colonel Durnford was in
command, and the corps was composed almost entirely of natives: the
Natal native horse, 315 in number, the Natal native contingent and
pioneers, three battalions of the 1st regiment, and a rocket battery.

Colonel Glyn commanded the 3rd Column, and Rorke’s Drift was the
point selected for the crossing of this body of troops. It consisted of
six guns of the Royal Artillery, one squadron of mounted infantry, the
24th regiment, 200 Natal volunteers, 150 mounted police, the
second battalion of the 3rd regiment, with pioneers, native
contingent, and a company of Royal Engineers.

No. 4 Column, under Colonel (afterwards Sir Evelyn) Wood, V.C., was
to advance on the Blood River. Its strength was made up of Royal
Artillery, the 13th regiment, 90th regiment, frontier light horse, and
200 of the native contingent.
In addition to the four columns, a fifth, under Colonel Rowlands,
composed of the 80th regiment and mounted irregulars, was
available. The total fighting force numbered some 7000 British and
9000 native troops—16,000 in all, with drivers. The Zulu army was
estimated at not less than 40,000 strong.

Probably no campaign has ever opened so disastrously for British
arms as that which was undertaken against Cetewayo in January,
1879. At first sight, all appeared easy enough. Preparations were
made upon a complete scale. Both transport and means of
communication were regarded as highly satisfactory. The first
movements were conducted with success: the two centre columns,
Nos. 2 and 3, crossed the Tugela in safety and effected their
proposed junction in front of Rorke’s Drift. Many cattle and
sheep were captured in these first skirmishes of the campaign, and
some few Zulus were killed with but slight loss on the British side.

On the morning of the 22nd January information came to hand of
the presence of a large Zulu army in front of the two centre
columns, and Lord Chelmsford himself, with the greater portion of
his force, advanced to clear the way. A force consisting of five
companies of the 1st battalion 24th regiment, a company of the 2nd
battalion, with two guns, 104 mounted colonials, and 800 natives
was left to guard the camp at Isandhlwana, which contained a
valuable convoy of supplies. It was 1.30 a.m. or thereabouts when
the advance columns with Lord Chelmsford left camp, coming first
into contact with the enemy about five miles distant. Till about 8
a.m. nothing happened in camp worthy of notice. About this time,
however, detachments of Zulus were noticed coming in from the
north-east, and immediately the force got under arms.

Slowly the Zulus began to work round to the rear of the British
camp, and very shortly the 24th regiment found themselves
surrounded. At this point the camp followers and native troops fled
as best they could, the Zulus killing with the assegai all they could
lay hands on. In a little while the British were entirely overwhelmed.
Says Miss Colenso:—“After this period (1.30 p.m.) no one living
escaped from Isandhlwana, and it is supposed the troops had
broken, and falling into confusion, all had perished after a brief
struggle.”

One bright incident alone stands out distinctly on this fatal 22nd
January. On the storming of the camp by the Zulus, Lieutenants
Melville and Coghill rode from the camp with the colours of their
regiment. On they spurred in their frantic flight to the Tugela, and
Coghill safely stemmed the torrent and landed on the farther shore.
Melville, however, while in mid stream, lost his horse, but clinging to
the beloved colours, battled with the furious torrent with all the
energy of despair. The Zulus pressed upon them. Quick as thought,
Coghill put his charger once more into the current, and struggled to
the assistance of his brother officer, and, despite the fact that a Zulu
bullet made short work of his horse, the two devoted men
succeeded in making their escape with the colours still in their
hands. The respite was not for long, however. Soon the yelling
hordes were upon them, and, fighting fiercely to the last,
Lieutenants Melville and Coghill died bravely upholding the honour of
their country.

Meantime the advance party had pushed forward, and came in touch
now and again with the enemy, who ever fell back before them, till
about midday, when it was determined to return to camp. About this
time word came to hand of heavy firing near the camp, and the force
returned gradually till about six o’clock, when, at a distance of only
two miles from the waggons, “four men were observed slowly
advancing towards the returning force. Thinking them to be enemy,
fire was opened, and one of the men fell. The others ran into the
open, holding up their hands, to show themselves unarmed.” They
proved to be the only survivors of the native contingent. “The camp
was found tenanted by those who were taking their last long sleep.”

Nearly 4000 Zulus were found dead in the neighbourhood of
Isandhlwana, showing the stout resistance made by our men. But, at
the best, the disaster was a fearful one, the total Imperial losses
being put at over 800 officers and men.

The night of the 22nd January saw another historic incident of the
war—the heroic defence of Rorke’s Drift. At this important ford of
the Tugela, vital to the British lines of communication, were
stationed Lieutenants Chard and Bromhead, and B company, 2nd
battalion 24th regiment. One hundred and thirty-nine men in all
constituted the numbers of this devoted band. A mission station, one
building of which was used as a hospital, and one as a commissariat
store, made up Rorke’s Drift.

At 3.15 p.m. (the time has been noted with great accuracy),
Lieutenant Chard, who was down by the river, heard the sound of
furious galloping. Louder and louder grew the hoof-beats, and ere
long two spent and almost beaten horsemen drew sudden rein upon
the Zulu bank of the Tugela. Wildly they demanded to be ferried
across, and in a few frenzied words told the terrible tale of
Isandhlwana. The Zulus were coming, they cried, and not a moment
was to be lost!

One of them, Lieutenant Adendorf, remained behind to aid in the
defence; the other was despatched post haste to Helpmakaar, the
next point in the communications, to warn the troops and bring up
reinforcements. Rorke’s Drift must be held at whatever cost and
against any odds! With feverish, but well-directed haste, all hands
set to work to put the mission buildings into a state of defence. Mr.
Dalton, of the Commissariat Department, assisted ably in the work
that every man now tackled with a will. Loopholes were made in the
buildings, and by means of two waggons and walls of mealie bags,
they were connected and provisioned with the stores.

At this time, between 4 and 4.30 p.m., an officer of Durnford’s
Horse, with about 100 men, arrived, but these being totally spent,
were sent on to Helpmakaar, and the Rorke’s Drift garrison prepared
cheerfully to face the foe. They were not long in coming. Whilst
Lieutenant Chard was in the midst of constructing “an inner work of
biscuit boxes, already two boxes high,” about 4.30 p.m., the first of
the enemy, some 600, appeared in sight. Rushing up to within fifty
yards of the now extended position, they yelled defiance at the
defenders, but a heavy fire from the loopholed masonry gave them
pause at once.

From now on, the defence of Rorke’s Drift became one prolonged
and watchful struggle. Again and again the frenzied Zulus threw
themselves against the slender defences of the gallant band, and
again and again were they hurled back, now with rifle fire, now with
bayonet, but ever backward. Darkness set in, and still the rushes
continued, till at length it was found necessary to retreat into the
inner line of defence composed of the biscuit-boxes aforementioned.
At length the enemy succeeded in setting the hospital on fire, and
the awful task of removing the sick, under the fearful odds, was
taken in hand. Alas! not all could be removed, and many perished.
No effort, however, was spared to get them all out, and at the last,
with ammunition all expended, Privates Williams, Hook, R. Jones,
and W. Jones held the door with the bayonet against the Zulu horde.

Now and again the battered entrenchments were repaired with
mealie bags, and still the unequal fight went on. By midnight the
little band was completely surrounded, and the lurid light of the
burning hospital, falling on garrison and assailants alike, revealed
the awful struggle that was going on. “Never say die!” was the
principle of the garrison, and it was carried out to the letter.

At 4 a.m. on the 23rd January the Zulu fire slackened, and by
daybreak the enemy was out of sight. Hand grasped hand, as it was
slowly realised that the foe were beaten back and the flag was still
fluttering over the gallant garrison. Even now Lieutenant Chard,
nearly dead beat as he and all his men were, relaxed no effort,
and the work of repairing the defences went forward. Not without
cause, for about 7 a.m. more Zulus appeared upon the hills to the
south-west, but about an hour later No. 3 Column arrived upon the
scene.