Machine Learning Refined

With its intuitive yet rigorous approach to machine learning, this text provides students
with the fundamental knowledge and practical tools needed to conduct research and
build data-driven products. The authors prioritize geometric intuition and algorithmic
thinking, and include detail on all the essential mathematical prerequisites, to offer a
fresh and accessible way to learn. Practical applications are emphasized, with examples
from disciplines including computer vision, natural language processing, economics,
neuroscience, recommender systems, physics, and biology. Over 300 color illustra-
tions are included and have been meticulously designed to enable an intuitive grasp
of technical concepts, and over 100 in-depth coding exercises (in Python) provide a
real understanding of crucial machine learning algorithms. A suite of online resources
including sample code, data sets, interactive lecture slides, and a solutions manual are
provided online, making this an ideal text both for graduate courses on machine learning
and for individual reference and self-study.

Jeremy Watt received his PhD in Electrical Engineering from Northwestern University,
and is now a machine learning consultant and educator. He teaches machine learning,
deep learning, mathematical optimization, and reinforcement learning at Northwestern
University.

Reza Borhani received his PhD in Electrical Engineering from Northwestern University,

and is now a machine learning consultant and educator. He teaches a variety of courses
in machine learning and deep learning at Northwestern University.

Aggelos K. Katsaggelos is the Joseph Cummings Professor at Northwestern University,

where he heads the Image and Video Processing Laboratory. He is a Fellow of IEEE,
SPIE, EURASIP, and OSA and the recipient of the IEEE Third Millennium Medal
(2000).
Machine Learning Refined

Foundations, Algorithms, and Applications

J E R E M Y W AT T
Northwestern University, Illinois

REZA BORHANI
Northwestern University, Illinois

A G G E L O S K . K AT S A G G E L O S
Northwestern University, Illinois
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.


It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title:
www.cambridge.org/9781108480727
DOI: 10.1017/9781108690935
© Cambridge University Press 2020
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2020
Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A.
A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-48072-7 Hardback
Additional resources for this publication at www.cambridge.org/watt2
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To our families:

Deb, Robert, and Terri

Soheila, Ali, and Maryam

Ειρήνη, Ζωή, Σοφία, and Ειρήνη
Contents

Preface page xii
Acknowledgements xxii
1 Introduction to Machine Learning 1
1.1 Introduction 1
1.2 Distinguishing Cats from Dogs: a Machine Learning Approach 1
1.3 The Basic Taxonomy of Machine Learning Problems 6
1.4 Mathematical Optimization 16
1.5 Conclusion 18
Part I Mathematical Optimization 19
2 Zero-Order Optimization Techniques 21
2.1 Introduction 21
2.2 The Zero-Order Optimality Condition 23
2.3 Global Optimization Methods 24
2.4 Local Optimization Methods 27
2.5 Random Search 31
2.6 Coordinate Search and Descent 39
2.7 Conclusion 40
2.8 Exercises 42
3 First-Order Optimization Techniques 45
3.1 Introduction 45
3.2 The First-Order Optimality Condition 45
3.3 The Geometry of First-Order Taylor Series 52
3.4 Computing Gradients Efficiently 55
3.5 Gradient Descent 56
3.6 Two Natural Weaknesses of Gradient Descent 65
3.7 Conclusion 71
3.8 Exercises 71
4 Second-Order Optimization Techniques 75
4.1 The Second-Order Optimality Condition 75

4.2 The Geometry of Second-Order Taylor Series 78


4.3 Newton’s Method 81
4.4 Two Natural Weaknesses of Newton’s Method 90
4.5 Conclusion 91
4.6 Exercises 92
Part II Linear Learning 97
5 Linear Regression 99
5.1 Introduction 99
5.2 Least Squares Linear Regression 99
5.3 Least Absolute Deviations 108
5.4 Regression Quality Metrics 111
5.5 Weighted Regression 113
5.6 Multi-Output Regression 116
5.7 Conclusion 120
5.8 Exercises 121
5.9 Endnotes 124
6 Linear Two-Class Classification 125
6.1 Introduction 125
6.2 Logistic Regression and the Cross Entropy Cost 125
6.3 Logistic Regression and the Softmax Cost 135
6.4 The Perceptron 140
6.5 Support Vector Machines 150
6.6 Which Approach Produces the Best Results? 157
6.7 The Categorical Cross Entropy Cost 158
6.8 Classification Quality Metrics 160
6.9 Weighted Two-Class Classification 167
6.10 Conclusion 170
6.11 Exercises 171
7 Linear Multi-Class Classification 174
7.1 Introduction 174
7.2 One-versus-All Multi-Class Classification 174
7.3 Multi-Class Classification and the Perceptron 184
7.4 Which Approach Produces the Best Results? 192
7.5 The Categorical Cross Entropy Cost Function 193
7.6 Classification Quality Metrics 198
7.7 Weighted Multi-Class Classification 202
7.8 Stochastic and Mini-Batch Learning 203
7.9 Conclusion 205
7.10 Exercises 205

8 Linear Unsupervised Learning 208

8.1 Introduction 208

8.2 Fixed Spanning Sets, Orthonormality, and Projections 208

8.3 The Linear Autoencoder and Principal Component Analysis 213

8.4 Recommender Systems 219

8.5 K-Means Clustering 221

8.6 General Matrix Factorization Techniques 227

8.7 Conclusion 230

8.8 Exercises 231

8.9 Endnotes 233

9 Feature Engineering and Selection 237

9.1 Introduction 237

9.2 Histogram Features 238

9.3 Feature Scaling via Standard Normalization 249

9.4 Imputing Missing Values in a Dataset 254

9.5 Feature Scaling via PCA-Sphering 255

9.6 Feature Selection via Boosting 258

9.7 Feature Selection via Regularization 264

9.8 Conclusion 269

9.9 Exercises 269

Part III Nonlinear Learning 273

10 Principles of Nonlinear Feature Engineering 275

10.1 Introduction 275

10.2 Nonlinear Regression 275

10.3 Nonlinear Multi-Output Regression 282

10.4 Nonlinear Two-Class Classification 286

10.5 Nonlinear Multi-Class Classification 290

10.6 Nonlinear Unsupervised Learning 294

10.7 Conclusion 298

10.8 Exercises 298

11 Principles of Feature Learning 304

11.1 Introduction 304

11.2 Universal Approximators 307

11.3 Universal Approximation of Real Data 323


11.4 Naive Cross-Validation 335

11.5 Efficient Cross-Validation via Boosting 340

11.6 Efficient Cross-Validation via Regularization 350

11.7 Testing Data 361

11.8 Which Universal Approximator Works Best in Practice? 365

11.9 Bagging Cross-Validated Models 366



11.10 K-Fold Cross-Validation 373


11.11 When Feature Learning Fails 378
11.12 Conclusion 379
11.13 Exercises 380
12 Kernel Methods 383
12.1 Introduction 383
12.2 Fixed-Shape Universal Approximators 383
12.3 The Kernel Trick 386
12.4 Kernels as Measures of Similarity 396
12.5 Optimization of Kernelized Models 397
12.6 Cross-Validating Kernelized Learners 398
12.7 Conclusion 399
12.8 Exercises 399
13 Fully Connected Neural Networks 403
13.1 Introduction 403
13.2 Fully Connected Neural Networks 403
13.3 Activation Functions 424
13.4 The Backpropagation Algorithm 427
13.5 Optimization of Neural Network Models 428
13.6 Batch Normalization 430
13.7 Cross-Validation via Early Stopping 438
13.8 Conclusion 440
13.9 Exercises 441
14 Tree-Based Learners 443
14.1 Introduction 443
14.2 From Stumps to Deep Trees 443
14.3 Regression Trees 446
14.4 Classification Trees 452
14.5 Gradient Boosting 458
14.6 Random Forests 462
14.7 Cross-Validation Techniques for Recursively Defined Trees 464
14.8 Conclusion 467
14.9 Exercises 467
Part IV Appendices 471
Appendix A Advanced First- and Second-Order Optimization Methods 473
A.1 Introduction 473
A.2 Momentum-Accelerated Gradient Descent 473
A.3 Normalized Gradient Descent 478
A.4 Advanced Gradient-Based Methods 485

A.5 Mini-Batch Optimization 487

A.6 Conservative Steplength Rules 490

A.7 Newton’s Method, Regularization, and Nonconvex Functions 499

A.8 Hessian-Free Methods 502

Appendix B Derivatives and Automatic Differentiation 511

B.1 Introduction 511

B.2 The Derivative 511

B.3 Derivative Rules for Elementary Functions and Operations 514

B.4 The Gradient 516

B.5 The Computation Graph 517

B.6 The Forward Mode of Automatic Differentiation 520

B.7 The Reverse Mode of Automatic Differentiation 526

B.8 Higher-Order Derivatives 529

B.9 Taylor Series 531

B.10 Using the autograd Library 536

Appendix C Linear Algebra 546

C.1 Introduction 546

C.2 Vectors and Vector Operations 546

C.3 Matrices and Matrix Operations 553

C.4 Eigenvalues and Eigenvectors 556

C.5 Vector and Matrix Norms 559

References 564

Index 569
Preface

For eons we humans have sought out rules or patterns that accurately describe

how important systems in the world around us work, whether these systems

be agricultural, biological, physical, financial, etc. We do this because such rules

allow us to understand a system better, accurately predict its future behavior

and ultimately, control it. However, the process of finding the "right" rule that

seems to govern a given system has historically been no easy task. For most of

our history data (glimpses of a given system at work) has been an extremely

scarce commodity. Moreover, our ability to compute, to try out various rules

to see which most accurately represents a phenomenon, has been limited to

what we could accomplish by hand. Both of these factors naturally limited

the range of phenomena scientific pioneers of the past could investigate and

inevitably forced them to use philosophical and/or visual approaches to rule-

finding. Today, however, we live in a world awash in data, and have colossal

computing power at our fingertips. Because of this, we lucky descendants of the

great pioneers can tackle a much wider array of problems and take a much more

empirical approach to rule-finding than our forbears could. Machine learning,

the topic of this textbook, is a term used to describe a broad (and growing)

collection of pattern-finding algorithms designed to properly identify system

rules empirically and by leveraging our access to potentially enormous amounts

of data and computing power.

In the past decade the user base of machine learning has grown dramatically.

From a relatively small circle in computer science, engineering, and mathe-

matics departments the users of machine learning now include students and

researchers from every corner of the academic universe, as well as members of

industry, data scientists, entrepreneurs, and machine learning enthusiasts. This

textbook is the result of a complete tearing down of the standard curriculum

of machine learning into its most fundamental components, and a curated re-

assembly of those pieces (painstakingly polished and organized) that we feel

will most benefit this broadening audience of learners. It contains fresh and

intuitive yet rigorous descriptions of the most fundamental concepts necessary

to conduct research, build products, and tinker.



Book Overview
The second edition of this text is a complete revision of our first endeavor, with

virtually every chapter of the original rewritten from the ground up and eight

new chapters of material added, doubling the size of the first edition. Topics from

the first edition, from expositions on gradient descent to those on One-versus-

All classification and Principal Component Analysis have been reworked and

polished. A swath of new topics have been added throughout the text, from

derivative-free optimization to weighted supervised learning, feature selection,

nonlinear feature engineering, boosting-based cross-validation, and more.

While heftier in size, the intent of our original attempt has remained un-

changed: to explain machine learning, from first principles to practical imple-

mentation, in the simplest possible terms. A big-picture breakdown of the second

edition text follows below.

Part I: Mathematical Optimization (Chapters 2–4)


Mathematical optimization is the workhorse of machine learning, powering not

only the tuning of individual machine learning models (introduced in Part II)

but also the framework by which we determine appropriate models themselves

via cross-validation (discussed in Part III of the text).

In this first part of the text we provide a complete introduction to mathemat-

ical optimization, from basic zero-order (derivative-free) methods detailed in

Chapter 2 to fundamental and advanced first-order and second-order methods

in Chapters 3 and 4, respectively. More specifically this part of the text con-

tains complete descriptions of local optimization, random search methodologies,

gradient descent, and Newton’s method.

Part II: Linear Learning (Chapters 5–9)


In this part of the text we describe the fundamental components of cost function

based machine learning, with an emphasis on linear models.

This includes a complete description of supervised learning in Chapters 5–7

including linear regression, two-class, and multi-class classification. In each of

these chapters we describe a range of perspectives and popular design choices

made when building supervised learners.

In Chapter 8 we similarly describe unsupervised learning, and Chapter 9 con-

tains an introduction to fundamental feature engineering practices including pop-

ular histogram features as well as various input normalization schemes, and

feature selection paradigms.



Part III: Nonlinear Learning (Chapters 10–14)


In the final part of the text we extend the fundamental paradigms introduced in

Part II to the general nonlinear setting.

We do this carefully beginning with a basic introduction to nonlinear super-

vised and unsupervised learning in Chapter 10, where we introduce the motiva-

tion, common terminology, and notation of nonlinear learning used throughout

the remainder of the text.

In Chapter 11 we discuss how to automate the selection of appropriate non-

linear models, beginning with an introduction to universal approximation. This

naturally leads to detailed descriptions of cross-validation, as well as boosting,

regularization, ensembling, and K-folds cross-validation.

With these fundamental ideas in-hand, in Chapters 12–14 we then dedicate an

individual chapter to each of the three popular universal approximators used in

machine learning: fixed-shape kernels, neural networks, and trees, where we discuss

the strengths, weaknesses, technical eccentricities, and usages of each popular

universal approximator.

To get the most out of this part of the book we strongly recommend that

Chapter 11 and the fundamental ideas therein are studied and understood before

moving on to Chapters 12–14.

Part IV: Appendices


This shorter set of appendix chapters provides a complete treatment on ad-

vanced optimization techniques, as well as a thorough introduction to a range

of subjects that the readers will need to understand in order to make full use of

the text.

Appendix A continues our discussion from Chapters 3 and 4, and describes

advanced first- and second-order optimization techniques. This includes a discussion

of popular extensions of gradient descent, including mini-batch optimization,

momentum acceleration, gradient normalization, and the result of combining these

enhancements in various ways (producing e.g., the RMSProp and Adam first

order algorithms) – and Newton’s method – including regularization schemes

and Hessian-free methods.

Appendix B contains a tour of computational calculus including an introduction to the derivative/gradient, higher-order derivatives, the Hessian matrix, numerical differentiation, forward and backward (backpropagation) automatic differentiation, and Taylor series approximations.
Appendix C provides a suitable background in linear and matrix algebra, including vector/matrix arithmetic, the notions of spanning sets and orthogonality,

as well as eigenvalues and eigenvectors.



Readers: How To Use This Book


This textbook was written with first-time learners of the subject in mind, as

well as for more knowledgeable readers who yearn for a more intuitive and

serviceable treatment than what is currently available today. To make full use of

the text one needs only a basic understanding of vector algebra (mathematical

functions, vector arithmetic, etc.) and computer programming (for example,

basic proficiency with a dynamically typed language like Python). We provide


complete introductory treatments of other prerequisite topics including linear

algebra, vector calculus, and automatic differentiation in the appendices of the


text. Example "roadmaps," shown in Figures 0.1–0.4, provide suggested paths

for navigating the text based on a variety of learning outcomes and university

courses (ranging from a course on the essentials of machine learning to special

topics – as described further under "Instructors: How to use this Book" below).

We believe that intuitive leaps precede intellectual ones, and to this end defer

the use of probabilistic and statistical views of machine learning in favor of a

fresh and consistent geometric perspective throughout the text. We believe that

this perspective not only permits a more intuitive understanding of individ-

ual concepts in the text, but also that it helps establish revealing connections

between ideas often regarded as fundamentally distinct (e.g., the logistic re-

gression and Support Vector Machine classifiers, kernels and fully connected

neural networks, etc.). We also highly emphasize the importance of mathemati-

cal optimization in our treatment of machine learning. As detailed in the "Book

Overview" section above, optimization is the workhorse of machine learning

and is fundamental at many levels – from the tuning of individual models to

the general selection of appropriate nonlinearities via cross-validation. Because

of this a strong understanding of mathematical optimization is requisite if one

wishes to deeply understand machine learning, and if one wishes to be able to

implement fundamental algorithms.

To this end, we place significant emphasis on the design and implementa-

tion of algorithms throughout the text with implementations of fundamental

algorithms given in Python. These fundamental examples can then be used as


building blocks for the reader to help complete the text’s programming exer-

cises, allowing them to "get their hands dirty" and "learn by doing," practicing

the concepts introduced in the body of the text. While in principle any program-

ming language can be used to complete the text’s coding exercises, we highly

recommend using Python for its ease of use and large support community. We
also recommend using the open-source Python libraries NumPy, autograd, and
matplotlib, as well as the Jupyter notebook editor to make implementing and
testing code easier. A complete set of installation instructions, datasets, as well

as starter notebooks for many exercises can be found at

https://github.com/jermwatt/machine_learning_refined
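
As a quick illustration of the recommended tooling, the following minimal sketch (not taken from the text or its exercises) uses NumPy together with autograd to compute the gradient of a simple function; it assumes the libraries have been installed per the instructions at the link above.

    import autograd.numpy as np   # autograd's thinly wrapped NumPy
    from autograd import grad     # builds a function that evaluates the gradient

    # a simple quadratic function g(w) = w_0^2 + 2 w_1^2
    def g(w):
        return w[0]**2 + 2.0*w[1]**2

    gradient = grad(g)             # automatic differentiation of g
    w = np.array([1.0, -1.0])
    print(gradient(w))             # prints [ 2. -4.], matching the gradient computed by hand

Running a snippet like this inside a Jupyter notebook, as recommended above, makes it easy to modify g and immediately inspect the resulting gradients.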

Instructors: How To Use This Book


Chapter slides associated with this textbook, datasets, along with a large array of

instructional interactive Python widgets illustrating various concepts through-


out the text, can be found on the github repository accompanying this textbook

at

https://github.com/jermwatt/machine_learning_refined
This site also contains instructions for installing Python as well as a number

of other free packages that students will find useful in completing the text’s

exercises.

This book has been used as a basis for a number of machine learning courses

at Northwestern University, ranging from introductory courses suitable for un-

dergraduate students to more advanced courses on special topics focusing on

optimization and deep learning for graduate students. With its treatment of

foundations, applications, and algorithms this text can be used as a primary

resource or as a fundamental component for courses such as the following.

Machine learning essentials treatment: an introduction to the essentials

of machine learning is ideal for undergraduate students, especially those in

quarter-based programs and universities where a deep dive into the entirety

of the book is not feasible due to time constraints. Topics for such a course

can include: gradient descent, logistic regression, Support Vector Machines,

One-versus-All and multi-class logistic regression, Principal Component Anal-

ysis, K-means clustering, the essentials of feature engineering and selection,

cross-validation, regularization, ensembling, bagging, kernel methods, fully

connected neural networks, and trees. A recommended roadmap for such a

course – including recommended chapters, sections, and corresponding topics

– is shown in Figure 0.1.

Machine learning full treatment: a standard machine learning course based

on this text expands on the essentials course outlined above both in terms

of breadth and depth. In addition to the topics mentioned in the essentials

course, instructors may choose to cover Newton’s method, Least Absolute

Deviations, multi-output regression, weighted regression, the Perceptron, the

Categorical Cross Entropy cost, weighted two-class and multi-class classifica-

tion, online learning, recommender systems, matrix factorization techniques,

boosting-based feature selection, universal approximation, gradient boosting,

random forests, as well as a more in-depth treatment of fully connected neu-

ral networks involving topics such as batch normalization and early-stopping-

based regularization. A recommended roadmap for such a course – including

recommended chapters, sections, and corresponding topics – is illustrated in

Figure 0.2.

Mathematical optimization for machine learning and deep learning: such

a course entails a comprehensive description of zero-, first-, and second-order

optimization techniques from Part I of the text (as well as Appendix A) in-

cluding: coordinate descent, gradient descent, Newton’s method, quasi-Newton

methods, stochastic optimization, momentum acceleration, fixed and adaptive

steplength rules, as well as advanced normalized gradient descent schemes

(e.g., Adam and RMSProp). These can be followed by an in-depth description

of the feature engineering processes (especially standard normalization and

PCA-sphering) that speed up (particularly first-order) optimization algorithms.

All students in general, and those taking an optimization for machine learning

course in particular, should appreciate the fundamental role optimization plays

in identifying the "right" nonlinearity via the processes of boosting and regularization based cross-validation, the principles of which are covered in Chapter

11. Select topics from Chapter 13 and Appendix B – including backpropagation,

batch normalization, and forward/backward mode of automatic differentiation
– can also be covered. A recommended roadmap for such a course – including

recommended chapters, sections, and corresponding topics – is given in Figure

0.3.

Introductory portion of a course on deep learning : such a course is best suit-

able for students who have had prior exposure to fundamental machine learning

concepts, and can begin with a discussion of appropriate first order optimiza-

tion techniques, with an emphasis on stochastic and mini-batch optimization,

momentum acceleration, and normalized gradient schemes such as Adam and

RMSProp. Depending on the audience, a brief review of fundamental elements

of machine learning may be needed using selected portions of Part II of the text.

A complete discussion of fully connected networks, including a discussion of

backpropagation and forward/backward mode of automatic differentiation, as
well as special topics like batch normalization and early-stopping-based cross-

validation, can then be made using Chapters 11, 13, and Appendices A and B of

the text. A recommended roadmap for such a course – including recommended

chapters, sections, and corresponding topics – is shown in Figure 0.4. Additional

recommended resources on topics to complete a standard course on deep learn-

ing – like convolutional and recurrent networks – can be found by visiting the

text’s github repository.


Figure 0.1 Recommended study roadmap for a course on the essentials of machine learning, including requisite chapters (left column), sections (middle column), and corresponding topics (right column). This essentials plan is suitable for time-constrained courses (in quarter-based programs and universities) or self-study, or where machine learning is not the sole focus but a key component of some broader course of study. Note that chapters are grouped together visually based on text layout detailed under "Book Overview" in the Preface. See the section titled "Instructors: How To Use This Book" in the Preface for further details.


Figure 0.2 Recommended study roadmap for a full treatment of standard machine learning subjects, including chapters, sections, as well as corresponding topics to cover. This plan entails a more in-depth coverage of machine learning topics compared to the essentials roadmap given in Figure 0.1, and is best suited for senior undergraduate/early graduate students in semester-based programs and passionate independent readers. See the section titled "Instructors: How To Use This Book" in the Preface for further details.
Figure 0.3 Recommended study roadmap for a course on mathematical optimization for machine learning and deep learning, including chapters, sections, as well as topics to cover. See the section titled "Instructors: How To Use This Book" in the Preface for further details.
Figure 0.4 Recommended study roadmap for an introductory portion of a course on deep learning, including chapters, sections, as well as topics to cover. See the section titled "Instructors: How To Use This Book" in the Preface for further details.
Acknowledgements

This text could not have been written in anything close to its current form

without the enormous work of countless genius-angels in the Python open-


source community, particularly authors and contributors of NumPy, Jupyter,
and matplotlib. We are especially grateful to the authors and contributors of

autograd including Dougal Maclaurin, David Duvenaud, Matt Johnson, and


Jamie Townsend, as autograd allowed us to experiment and iterate on a host of

new ideas included in the second edition of this text that greatly improved it as

well as, we hope, the learning experience for its readers.

We are also very grateful for the many students over the years that provided

insightful feedback on the content of this text, with special thanks to Bowen

Tian who provided copious amounts of insightful feedback on early drafts of

the work.

Finally, a big thanks to Mark McNess Rosengren and the entire Standing

Passengers crew for helping us stay caffeinated during the writing of this text.
1 Introduction to Machine
Learning

1.1 Introduction
Machine learning is a unified algorithmic framework designed to identify com-

putational models that accurately describe empirical data and the phenomena

underlying it, with little or no human involvement. While still a young dis-

cipline with much more awaiting discovery than is currently known, today

machine learning can be used to teach computers to perform a wide array

of useful tasks including automatic detection of objects in images (a crucial

component of driver-assisted and self-driving cars), speech recognition (which

powers voice command technology), knowledge discovery in the medical sci-

ences (used to improve our understanding of complex diseases), and predictive

analytics (leveraged for sales and economic forecasting), to just name a few.

In this chapter we give a high-level introduction to the field of machine

learning as well as the contents of this textbook.

1.2 Distinguishing Cats from Dogs: a Machine Learning


Approach
To get a big-picture sense of how machine learning works, we begin by dis-

cussing a toy problem: teaching a computer how to distinguish between pic-

tures of cats from those with dogs. This will allow us to informally describe the

terminology and procedures involved in solving the typical machine learning

problem.

Do you recall how you first learned about the difference between cats and
dogs, and how they are different animals? The answer is probably no, as most
humans learn to perform simple cognitive tasks like this very early on in the

course of their lives. One thing is certain, however: young children do not need

some kind of formal scientific training, or a zoological lecture on felis catus and

canis familiaris species, in order to be able to tell cats and dogs apart. Instead,

they learn by example. They are naturally presented with many images of

what they are told by a supervisor (a parent, a caregiver, etc.) are either cats

or dogs, until they fully grasp the two concepts. How do we know when a

child can successfully distinguish between cats and dogs? Intuitively, when

they encounter new (images of) cats and dogs, and can correctly identify each

new example or, in other words, when they can generalize what they have learned

to new, previously unseen, examples.

Like human beings, computers can be taught how to perform this sort of task

in a similar manner. This kind of task where we aim to teach a computer to

distinguish between different types or classes of things (here cats and dogs) is

referred to as a classification problem in the jargon of machine learning, and is

done through a series of steps which we detail below.

1. Data collection. Like human beings, a computer must be trained to recognize

the difference between these two types of animals by learning from a batch of
examples, typically referred to as a training set of data. Figure 1.1 shows such a

training set consisting of a few images of different cats and dogs. Intuitively, the
larger and more diverse the training set the better a computer (or human) can

perform a learning task, since exposure to a wider breadth of examples gives

the learner more experience.

Figure 1.1 A training set consisting of six images of cats (highlighted in blue) and six

images of dogs (highlighted in red). This set is used to train a machine learning model

that can distinguish between future images of cats and dogs. The images in this figure

were taken from [1].

2. Feature design. Think for a moment about how we (humans) tell the difference
between images containing cats from those containing dogs. We use color, size,

the shape of the ears or nose, and/or some combination of these features in order

to distinguish between the two. In other words, we do not just look at an image

as simply a collection of many small square pixels. We pick out grosser details,

or features, from images like these in order to identify what it is that we are

looking at. This is true for computers as well. In order to successfully train a

computer to perform this task (and any machine learning task more generally)

we need to provide it with properly designed features or, ideally, have it find or

learn such features itself.

Designing quality features is typically not a trivial task as it can be very ap-

plication dependent. For instance, a feature like color would be less helpful in

discriminating between cats and dogs (since many cats and dogs share similar

hair colors) than it would be in telling grizzly bears and polar bears apart! More-

over, extracting the features from a training dataset can also be challenging. For

example, if some of our training images were blurry or taken from a perspective

where we could not see the animal properly, the features we designed might

not be properly extracted.

However, for the sake of simplicity with our toy problem here, suppose we

can easily extract the following two features from each image in the training set:

size of nose relative to the size of the head, ranging from small to large, and shape

of ears, ranging from round to pointy.



Figure 1.2 Feature space representation of the training set shown in Figure 1.1 where

the horizontal and vertical axes represent the features nose size and ear shape,

respectively. The fact that the cats and dogs from our training set lie in distinct regions

of the feature space reflects a good choice of features.

Examining the training images shown in Figure 1.1, we can see that all cats

have small noses and pointy ears, while dogs generally have large noses and

round ears. Notice that with the current choice of features each image can now

be represented by just two numbers: a number expressing the relative nose size,

and another number capturing the pointiness or roundness of the ears. In other

words, we can represent each image in our training set in a two-dimensional



feature space where the features nose size and ear shape are the horizontal and

vertical coordinate axes, respectively, as illustrated in Figure 1.2.

3. Model training. With our feature representation of the training data the

machine learning problem of distinguishing between cats and dogs is now a

simple geometric one: have the machine find a line or a curve that separates

the cats from the dogs in our carefully designed feature space. Supposing for

simplicity that we use a line, we must find the right values for its two parameters

– a slope and vertical intercept – that define the line’s orientation in the feature

space. The process of determining proper parameters relies on a set of tools

known as mathematical optimization detailed in Chapters 2 through 4 of this text,

and the tuning of such a set of parameters to a training set is referred to as the

training of a model.

Figure 1.3 shows a trained linear model (in black) which divides the feature

space into cat and dog regions. This linear model provides a simple compu-

tational rule for distinguishing between cats and dogs: when the feature rep-

resentation of a future image lies above the line (in the blue region) it will be

considered a cat by the machine, and likewise any representation that falls below

the line (in the red region) will be considered a dog.



Figure 1.3 A trained linear model (shown in black) provides a computational rule for

distinguishing between cats and dogs. Any new image received in the future will be

classified as a cat if its feature representation lies above this line (in the blue region), and

a dog if the feature representation lies below this line (in the red region).
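
To make this geometric picture concrete, the following is a minimal sketch, not the book's own code, that fits such a separating line to made-up nose size and ear shape values using a simple perceptron-style update (a scheme discussed in Chapter 6), and then applies the resulting decision rule to a new point.

    import numpy as np

    # toy training data: each row is [nose size, ear shape], both scaled to [0, 1];
    # label +1 for cats (small nose, pointy ears), -1 for dogs (large nose, round ears)
    X = np.array([[0.10, 0.90], [0.20, 0.80], [0.15, 0.95],   # cats
                  [0.80, 0.20], [0.90, 0.10], [0.85, 0.30]])  # dogs
    y = np.array([1, 1, 1, -1, -1, -1])

    w = np.zeros(2)   # one slope-like parameter per feature
    b = 0.0           # the line's intercept

    # perceptron-style updates: nudge the line whenever a point sits on the wrong side
    for _ in range(100):
        for x_i, y_i in zip(X, y):
            if y_i * (x_i @ w + b) <= 0:   # misclassified (or exactly on the line)
                w += y_i * x_i
                b += y_i

    # decision rule given by the trained line: positive side is a cat, negative side a dog
    new_image = np.array([0.30, 0.70])                  # smallish nose, fairly pointy ears
    print("cat" if new_image @ w + b > 0 else "dog")    # prints "cat"

The same sign check is all that is needed in the validation step described next: extract the two features of each new image and see which side of the line they land on.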

Figure 1.4 A validation set of cat and dog images (also taken from [1]). Notice that the

images in this set are not highlighted in red or blue (as was the case with the training set

shown in Figure 1.1) indicating that the true identity of each image is not revealed to the

learner. Notice that one of the dogs, the Boston terrier in the bottom right corner, has

both a small nose and pointy ears. Because of our chosen feature representation the

computer will think this is a cat!

4. Model validation. To validate the efficacy of our trained learner we now show
the computer a batch of previously unseen images of cats and dogs, referred to

generally as a validation set of data, and see how well it can identify the animal

in each image. In Figure 1.4 we show a sample validation set for the problem at

hand, consisting of three new cat and dog images. To do this, we take each new

image, extract our designed features (i.e., nose size and ear shape), and simply

check which side of our line (or classifier) the feature representation falls on. In

this instance, as can be seen in Figure 1.5, all of the new cats and all but one dog

from the validation set have been identified correctly by our trained model.

The misidentification of the single dog (a Boston terrier) is largely the result

of our choice of features, which we designed based on the training set in Figure

1.1, and to some extent our decision to use a linear model (instead of a nonlinear

one). This dog has been misidentified simply because its features, a small nose

and pointy ears, match those of the cats from our training set. Therefore, while

it first appeared that a combination of nose size and ear shape could indeed

distinguish cats from dogs, we now see through validation that our training set

was perhaps too small and not diverse enough for this choice of features to be

completely effective in general.


We can take a number of steps to improve our learner. First and foremost we

should collect more data, forming a larger and more diverse training set. Second,

we can consider designing/including more discriminating features (perhaps eye

color, tail shape, etc.) that further help distinguish cats from dogs using a linear

model. Finally, we can also try out (i.e., train and validate) an array of nonlinear

models with the hopes that a more complex rule might better distinguish be-

tween cats and dogs. Figure 1.6 compactly summarizes the four steps involved

in solving our toy cat-versus-dog classification problem.




Figure 1.5 Identification of (the feature representation of) validation images using our

trained linear model. The Boston terrier (pointed to by an arrow) is misclassified as a cat

since it has pointy ears and a small nose, just like the cats in our training set.

[Figure 1.6: pipeline panels – data collection, feature design, model training, and model validation – applied to the training set and the validation set.]

Figure 1.6 The schematic pipeline of our toy cat-versus-dog classification problem. The

same general pipeline is used for essentially all machine learning problems.

1.3 The Basic Taxonomy of Machine Learning Problems


The sort of computational rules we can learn using machine learning generally

fall into two main categories called supervised and unsupervised learning, which
we discuss next.

1.3.1 Supervised learning


Supervised learning problems (like the prototypical problem outlined in Section

1.2) refer to the automatic learning of computational rules involving input/out-

put relationships. Applicable to a wide array of situations and data types, this

type of problem comes in two forms, called regression and classification, depend-

ing on the general numerical form of the output.

Regression
Suppose we wanted to predict the share price of a company that is about to

go public. Following the pipeline discussed in Section 1.2, we first gather a

training set of data consisting of a number of corporations (preferably active in

the same domain) with known share prices. Next, we need to design feature(s)

that are thought to be relevant to the task at hand. The company’s revenue is one

such potential feature, as we can expect that the higher the revenue the more

expensive a share of stock should be. To connect the share price (output) to the

revenue (input) we can train a simple linear model or regression line using our

training data.

Figure 1.7 (top-left panel) A toy training dataset consisting of ten corporations’ share

price and revenue values. (top-right panel) A linear model is fit to the data. This trend

line models the overall trajectory of the points and can be used for prediction in the

future as shown in the bottom-left and bottom-right panels.

The top panels of Figure 1.7 show a toy dataset comprising share price versus

revenue information for ten companies, as well as a linear model fit to this data.

Once the model is trained, the share price of a new company can be predicted

based on its revenue, as depicted in the bottom panels of this figure. Finally,

comparing the predicted price to the actual price for a validation set of data

we can test the performance of our linear regression model and apply changes

as needed, for example, designing new features (e.g., total assets, total equity,

number of employees, years active, etc.) and/or trying more complex nonlinear

models.
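
As an illustrative sketch only (the revenue and share price numbers below are made up, and are not the data behind Figure 1.7), the following fits a regression line by ordinary least squares using NumPy and then predicts the share price of a new company from its revenue.

    import numpy as np

    # made-up training data: revenue (in billions of dollars) and known share price (in dollars)
    revenue = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
    price   = np.array([12.0, 15.0, 21.0, 24.0, 31.0, 33.0, 40.0, 44.0, 47.0, 52.0])

    # least squares fit of the linear model: price = slope * revenue + intercept
    A = np.stack([revenue, np.ones_like(revenue)], axis=1)   # one column per parameter
    (slope, intercept), *_ = np.linalg.lstsq(A, price, rcond=None)

    # predict the share price of a new company given its revenue
    new_revenue = 6.5
    print(slope * new_revenue + intercept)

Comparing such predictions against the actual prices of a held-out validation set is exactly the performance test described above.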

This sort of task, i.e., fitting a model to a set of training data so that predictions

about a continuous-valued output (here, share price) can be made, is referred to as


regression. We begin our detailed discussion of regression in Chapter 5 with the

linear case, and move to nonlinear models starting in Chapter 10 and throughout

Chapters 11–14. Below we describe several additional examples of regression to

help solidify this concept.

Example 1.1 The rise of student loan debt in the United States

Figure 1.8 (data taken from [2]) shows the total student loan debt (that is money

borrowed by students to pay for college tuition, room and board, etc.) held

by citizens of the United States from 2006 to 2014, measured quarterly. Over

the eight-year period reflected in this plot the student debt has nearly tripled,

totaling over one trillion dollars by the end of 2014. The regression line (in

black) fits this dataset quite well and, with its sharp positive slope, emphasizes

the point that student debt is rising dangerously fast. Moreover, if this trend

continues, we can use the regression line to predict that total student debt will

surpass two trillion dollars by the year 2026 (we revisit this problem later in

Exercise 5.1).

Figure 1.8 Figure associated with Example 1.1, illustrating total student loan debt in the

United States measured quarterly from 2006 to 2014. The rapid increase rate of the debt,

measured by the slope of the trend line fit to the data, confirms that student debt is

growing very fast. See text for further details.


Another Random Scribd Document
with Unrelated Content
NORWAY PINE

Norway Pine
NORWAY PINE
(Pinus Resinosa)

arly explorers who were not botanists mistook this tree for Norway

E spruce, and gave it the name which has since remained in nearly all
parts of its range. It is called red pine also, and this name is strictly
descriptive. The brown or red color of the bark is instantly noticed by
one who sees the tree for the first time. In the Lake States it has
been called hard pine for the purpose of distinguishing it from the softer
white pine with which it is associated. In England they call it Canadian red
pine, because the principal supply in England is imported from the
Canadian provinces.
Its chief range lies in the drainage basin of the St. Lawrence river, which
includes the Great Lakes and the rivers which flow into them.
Newfoundland forms the eastern and Manitoba the western outposts of
this species. It is found as far south as Massachusetts, Pennsylvania,
northern Ohio, central Michigan, Wisconsin, and Minnesota. It conforms
pretty generally to the range of white pine but does not accompany that
species southward along the Appalachian mountain ranges across West
Virginia, Virginia, Kentucky, and Tennessee. Where it was left to compete
in nature’s way with white pine, the contest was friendly, but white pine
got the best of it. The two species grew in intermixture, but in most
instances white pine had from five to twenty trees to Norway’s one. As a
survivor under adversity, however, the Norway pine appears to surpass its
great friendly rival, at least in the Lake States where the great pineries
once flourished and have largely passed away. Solitary or small clumps of
Norway pines are occasionally found where not a white pine, large or
small, is in sight.
The forest appearance of Norway pine resembles the southern yellow
pines. The stand is open, the trunks are clean and tall, the branches are at
the top. The Norway’s leaves are in clusters of two, and are five or six
inches long. They fall during the fourth or fifth year. Cones are two inches
long, and when mature, closely resemble the color of the tree’s bark, that
is, light chestnut brown. Exceptionally tall Norway pines may reach a
height of 150 feet, but the average is seventy or eighty, with diameters of
from two to four. Young trees are limby, but early in life the lower branches
die and fall, leaving few protruding stubs or knots. It appears to be a
characteristic that trunks are seldom quite straight. They do not have the
plumb appearance of forest grown white pine and spruce.
The wood of Norway pine is medium light, its strength and stiffness about
twenty-five per cent greater than white pine, and it is moderately soft. The
annual rings are rather wide, indicating rapid growth. The bands of
summerwood are narrow compared with the springwood, which gives a
generally light color to the wood, though not as light as the wood of white
pine. The resin passages are small and fairly numerous. The sapwood is
thick, and the wood is not durable in contact with the soil.
Norway pine has always had a place of its own in the lumber trade, but
large quantities have been marketed as white pine. If such had not been
the case, Norway pine would have been much oftener heard of during the
years when the Lake State pineries were sending their billions of feet of
lumber to the markets of the world.
Because of the deposit of resinous materials in the wood, Norway pine
stumps resist decay much better than white pine. In some of the early
cuttings in Michigan, where only stumps remain to show how large the
trees were and how thick they stood, the Norway stumps are much better
preserved than the white pine. Using that fact as a basis of estimate, it
may be shown that in many places the Norway pine constituted one-fifth
or one-fourth of the original stand. The lumbermen cut clean, and
statistics of that period do not show that the two pines were generally
marketed separately. In recent years many of the Norway stumps have
been pulled, and have been sold to wood-distillation plants where the rosin
and turpentine are extracted.
At an early date Norway pine from Canada and northern New York was
popular ship timber in this country and England. Slender, straight trunks
were selected as masts, or were sawed for decking planks thirty or forty
feet long. Shipbuilders insisted that planks be all heartwood, because
when sapwood was exposed to rain and sun, it changed to a green color,
due to the presence of fungus. The wood wears well as ship decking. The
British navy was still using some Norway pine masts as late as 1875.
The scarcity of this timber has retired it from some of the places which it
once filled, and the southern yellow pines have been substituted. It is still
employed for many important purposes, the chief of which is car building,
if statistics for the state of Illinois are a criterion for the whole country. In
1909 in that state 24,794,000 feet of it were used for all purposes, and
14,783,000 feet in car construction.
For many years Chicago has been the center of the Norway pine trade. It
is landed there by lake steamers and by rail, and is distributed to ultimate
consumers. The uses for the wood, as reported by Illinois manufacturers,
follow: Baskets, boxes, boats, brackets, casing and frames for doors and
windows, crating, derricks for well-boring machines, doors, elevators,
fixtures for stores and offices, foot or running boards for tank cars, foundry
flasks, freight cars, hand rails, insulation for refrigerator cars, ladders,
picture moldings, roofing, sash, siding for cattle cars, sign boards and
advertising signs, tanks, and windmill towers.
As with white pine, Norway pine has passed the period of greatest
production, though much still goes to market every year and will long
continue to do so. The land which lumbermen denuded in the Lake States,
particularly Michigan and Wisconsin, years ago, did not reclothe itself with
Norway seedlings. That would have taken place in most instances but for
fires which ran periodically through the slashings until all seedlings were
destroyed. In many places there are now few seedlings and few large trees
to bear seeds, and consequently the pine forest in such places is a thing of
the past. The outlook is better in other localities.
The Norway pine is much planted for ornament, and is rated one of the
handsomest of northern park trees.
Pitch Pine (Pinus rigida). The name pitch pine is locally applied to almost every species of
hard, resinous pine in this country. The Pinus rigida has other names than pitch pine. In
Delaware it is called longleaved pine, since its needles are longer than the scrub pine’s
with which it is associated. For the same reason it is known in some localities as longschat
pine. In Massachusetts it is called hard pine, in Pennsylvania yellow pine, in North
Carolina and eastern Tennessee black pine, and black Norway pine in New York. The
botanical name is translated “rigid pine,” but the rigid refers to the leaves, not the wood.
Its range covers New England, New York, Pennsylvania, southern Canada, eastern Ohio,
and southward along the mountains to northern Georgia. It has three leaves in a cluster,
from three to five inches long, and they fall the second year. Cones range in length from
one to three inches, and they hang on the branches ten or twelve years. The wood is
medium light, moderately strong, but low in stiffness. It is soft and brittle. The annual
rings are wide, the summerwood broad, distinct, and very resinous. Medullary rays are
few but prominent; color, light brown or red, the thick sapwood yellow or often nearly
white. The difference in the hardness between springwood and summerwood renders it
difficult to work, and causes uneven wear when used as flooring. It is fairly durable in
contact with the soil.
The tree attains a height of from forty to eighty feet and a diameter of three. This pine is
not found in extensive forests, but in scattered patches, nearly always on poor soil where
other trees will not crowd it. Light and air are necessary to its existence. If it receives
these, it will fight successfully against adversities which would be fatal to many other
species. In resistance to forest fires, it is a salamander among trees. That is primarily due
to its thick bark, but it is favored also by the situations in which it is generally found—
open woods, and on soil so poor that ground litter is thin. It is a useful wood for many
purposes, and wherever it is found in sufficient quantity, it goes to market, but under its
own name only in restricted localities. Its resinous knots were once used in place of
candles in frontier homes. Tar made locally from its rich wood was the pioneer wagoner’s
axle grease, and the ever-present tar bucket and tar paddle swung from the rear axle.
Torches made by tying splinters in bundles answered for lanterns in night travel. It was
the best pine for floors in some localities. It is probably used more for boxes than for
anything else at present. In 1909 Massachusetts box makers bought 600,000 feet, and a
little more went to Maryland box factories. Its poor holding power on spikes limits its
employment for railroad ties and in shipbuilding. Carpenters and furniture makers object
to the numerous knots. Country blacksmiths who repair and make wagons as a side line,
find it suitable for wagon beds. It is much used as fuel where it is convenient.
Torrey Pine (Pinus torreyana), called del mar pine and Soledad pine, is an interesting tree
from the fact that its range is so restricted that the actual number of trees could be easily
known to one who would take the trouble to count them. A rather large quantity formerly
occupied a small area in San Diego county, California, but woodchoppers who did not
appreciate the fact that they were exterminating a species of pine from the face of the
earth, cut nearly all of the trees for fuel. Its range covered only a few square miles, and
fortunately part of that was included in the city limits of San Diego. An ordinance was
passed prohibiting the cutting of a Torrey pine under heavy penalty, and the tree was thus
saved. A hundred and fifty miles off the San Diego coast a few Torrey pines grow on the
islands of Santa Cruz and Santa Rosa, and owing to their isolated situation they bid fair to
escape the cordwood cutter for years to come. Those who have seen this tree on its
native hills have admired the gameness of its battle for existence against the elements.
Standing in the full sweep of the ocean winds, its strong, short branches scarcely move,
and all the agitation is in the thick tufts of needles which cling to the ends of the
branches. Trees exposed to the seawinds are stunted, and are generally less than a foot
in diameter and thirty feet high; but those which are so fortunate as to occupy sheltered
valleys are three or four times that size. The needles are five in a cluster. The cones
persist on the branches three or four years. The wood is light, soft, moderately strong,
very brittle; the rings of yearly growth are broad, and the yellow bands of summerwood
occupy nearly half. The sapwood is very thick and is nearly white.
WESTERN YELLOW PINE
(Pinus Ponderosa)

The range of western yellow pine covers a million square miles. Its
eastern boundary is a line drawn from South Dakota to western
Texas. The species covers much of the country between that line
and the Pacific ocean. It is natural that it should have more names
than one in a region so extensive. It is best known as western
yellow pine, but lumbermen often call it California white pine. The standing
timber is frequently designated bull pine, but that name is not often given
to the lumber. Where there is no likelihood of confusing it with southern
pines, it is called simply yellow pine. The name heavy-wooded pine,
sometimes applied to the lumber in England, is misleading. When well
seasoned it weighs about thirty pounds per cubic foot, and ordinarily it
would not be classed heavy. In California it is called heavy pine, but that is
to distinguish it from sugar pine which is considerably lighter. The color of
its bark has given it the name Sierra brownbark pine. The same tree in
Montana is called black pine.
The tree has developed two forms. Some botanists have held there are
two species, but that is not the general opinion. In the warm, damp
climate of the Pacific slope the tree is larger, and somewhat different in
appearance from the form in the Rocky Mountain region. The same
observation holds true of Douglas fir.
The wood of western yellow pine is medium light, not strong, and low in
elasticity; medullary rays prominent but not numerous; resinous; color light
to reddish, the thick sapwood almost white. The annual rings are variable
in width, and the proportionate amounts of springwood and summerwood
also vary. It is not durable in contact with the ground.
The wood is easy to work and some of the best of it resembles white pine,
but as a whole it is inferior to that wood, though it is extensively employed
as a substitute for it in the manufacture of doors, sash, and frames. It is
darker than white pine, harder, heavier, stronger, almost exactly equal in
stiffness, but the annual rings of the two woods do not bear close
resemblance.
The tree reaches a height of from 100 to 200 feet, a diameter from three
to seven. It is occasionally much larger. Its size depends much on its
habitat. The best development occurs on the Sierra Nevada mountains in
California and the best wood comes from that region, though certain other
localities produce high-grade lumber.
Western yellow pine holds and will long hold an important place in the
country’s timber resources. The total stand has been estimated at
275,000,000,000 feet, and is second only to that of Douglas fir, though the
combined stand of the four southern yellow pines is about
100,000,000,000 feet larger. It is a vigorous species, able to hold its
ground under ordinary circumstances. Next to incense cedar and the giant
sequoias which are associated with it in the Sierra Nevada mountains, it is
the most prolific seed bearer of the western conifers, and its seeds are
sufficiently light to insure wide distribution. It is gaining ground within its
range by taking possession of vacant areas which have been bared by
lumbering or fire. In some cases it crowds to death the more stately sugar
pine by cutting off its light and moisture. It resists fire better than most of
the forest trees with which it is associated. On the other hand, it suffers
from enemies more than its associates do. A beetle (Dendroctonus
ponderosæ) destroys large stands. In the Black Hills in 1903 its ravages
killed 600,000,000 feet.
This splendid pine has run the gamut of uses from the corral pole of the
first settler to the paneled door turned out by the modern factory. It has
almost an unlimited capacity for usefulness. It grows in dry regions of the
Rocky Mountains where it is practically the only source of wood supply;
and it is equally secure in its position where forests are abundant and fine.
It has supplied props, stulls, and lagging for mines in nearly every state
touched by its range. Without its ties and other timbers some of the early
railroads through the western mountains could scarcely have been built. It
has been one of the leading flume timbers in western lumber and irrigation
development. It fenced many ranches in early times and is still doing so. It
is used in general construction, and in finish; from the shingle to the
foundation sill of houses. It finds its way to eastern lumber markets.
Almost 20,000,000 feet a year are used in Illinois alone. Competition with
eastern white pine is met in the Lake States because, grade for grade, the
western wood is cheaper, until lower grades are reached. The western
yellow pine, in the eastern market, is confused with the western white pine
of Idaho and Montana (Pinus monticola) and separate statistics of use are
impossible.
The makers of fruit boxes in California often employ the yellow pine in lieu
of sugar pine which once supplied the whole trade. It is also used by
coopers for various containers, but not for alcoholic liquors.
The leaves are in clusters of twos and threes, and are from five to eleven
inches long. Most of them fall during the third year. The cones are from
three to six inches long, and generally fall soon after they reach maturity.
Coulter Pine (Pinus coulteri) is also known as nut pine, big cone pine, and long cone pine.
It is a California species, scarce, but of much interest because of its cones. They are
larger than those of any other American pine and are armed with formidable curved
spines from half an inch to an inch and a half in length. The cones are from ten to
fourteen inches long. The tree is found on the Coast Range mountains from the latitude
of San Francisco to the boundary between California and Mexico. It thrives at altitudes of
from 3,000 to 6,000 feet. It never occurs in pure stands and the total amount is small. It
looks like the western yellow pine, but is much inferior in size. Trunks seldom attain a
length of fifteen feet or a diameter of two. There is no evidence that Coulter pine is
increasing its stand on the ground which it already occupies, or spreading to new ground.
The wood is light, soft, moderately strong, and very tough. The annual rings are narrow
and consist largely of summerwood. The heartwood is light red, the thick sapwood nearly
white. It is a poor tree for lumber, and it has been little used in that way, but has been
burned for charcoal for blacksmith shops, and much is sold as cordwood. The leaves of
Coulter pine are in clusters of three, and they fall during the third and fourth years.
California Swamp Pine (Pinus muricata) clearly belongs among minor species listed as
timber trees. It meets a small demand for skids, corduroy log roads, bridge floors, and
scaffolds in the redwood logging operations in California. It is scattered along the Pacific
coast 500 miles, beginning in Lower California and ending a hundred miles north of San
Francisco. It is known as dwarf marine pine, pricklecone pine, bishop pine, and obispo
pine. The last name is the Spanish translation of the English word bishop. The largest
trees seldom exceed two feet in diameter, and a height of ninety feet. The average size is
little more than half as much. The wood is very strong, hard, and compact, and the
annual growth ring is largely dense summerwood. Resin passages are few, but the wood
is resinous, light brown in color, and the thick sapwood is nearly white. The needles are in
clusters of two, and are from four to six inches long. They begin to fall the second year.
Some of the trees retain their cones until death, but the seeds are scattered from year to
year. Under the stimulus of artificial conditions in the redwood districts this pine seems to
be spreading. Its seeds blow into vacant ground from which redwood has been removed,
and growth is prompt. The seedlings are not at all choice as to soil, but take root in cold
clay, in peat bogs, on barren sand and gravel, and on wind-swept ridges exposed to
ocean fogs. Its ability to grow where few other trees can maintain themselves holds out
some hope that its usefulness will increase.
Monterey Pine (Pinus radiata). This scarce and local species is restricted to the California
coast south of San Francisco, and to adjacent islands. Under favorable circumstances it
grows rapidly and promises to be of more importance as a lumber source in the future
than it has been in the past. It is, however, somewhat particular as to soil. It must have
ground not too wet or too dry. If these requirements are observed, it is a good tree for
planting. Its average height is seventy or ninety feet, diameter from eighteen to thirty
inches. Trunks six feet in diameter are occasionally heard of. The wood is light, soft,
moderately strong, tough, annual rings very wide and largely of springwood; color, light
brown, the very thick sapwood nearly white. The leaves are from four to six inches long,
in clusters of two and three, and fall the third year. Cones are from three to five inches
long. The lumber is too scarce at present to have much importance, but its quality is
good. In appearance it resembles wide-ringed loblolly pine, and appears to be suitable for
doors and sash, and frames for windows and doors. Its present uses are confined chiefly
to ranch timbers and fuel. If it ever amounts to much as a lumber resource, it will be as a
planted pine, and not in its natural state.
Jack Pine (Pinus divaricata) is a far northern species which extends its range southward in
the United States, from Maine to Minnesota, and reaches northern Indiana and Illinois. It
grows almost far enough north in the valley of Mackenzie river to catch the rays of the
midnight sun. It must necessarily adapt itself to circumstances. When these are favorable,
it develops a trunk up to two feet in diameter and seventy feet tall; but in adversity, it
degenerates into a many-branched shrub a few feet high. The average tree in the United
States is thirty or forty feet tall, and a foot or more in diameter. Its name is intended as a
term of contempt, which it does not deserve. Others call it scrub pine, which is little better.
Its other names are more respectful, Prince’s pine in Ontario, black pine in Wisconsin and
Minnesota, cypress in Quebec and the Hudson Bay country, Sir Joseph Banks’ pine in
England, and juniper in some parts of Canada. “Chek pine” is frequently given in its list of
names, but the name is said to have originated in an attempt of a German botanist to
pronounce “Jack pine” in dictating to a stenographer. The tree straggles over landscapes
which otherwise would be treeless. It is often a ragged and uncouth specimen of the
vegetable kingdom, but that is when it is at its worst. At its best, as it may be seen where
cared for in some of the Michigan cemeteries, it is as handsome a tree as anyone could
desire. The characteristic thinness and delicacy of its foliage distinguish it at once from its
associates. The peculiar green of its soft, short needles wins admiration. The wood is
light, soft, not strong; annual rings are moderately wide, and are largely composed of
springwood. The thin bands of summerwood are resinous, and the small resin ducts are
few. The thick sapwood is nearly white, the heartwood brown or orange. It is not durable.
Jack pine can never be an important timber tree, because too small; but a considerable
amount is used for bed slats, nail kegs, plastering lath, barrel headings, boxes, mine
props, pulpwood, and fuel. Aside from its use as lumber and small manufactured
products, it has a value for other purposes. It can maintain its existence in waste sands;
and its usefulness is apparent in fixing drifting dunes along some of the exposed shores
of Lake Michigan and Lake Superior. It lives on dry sand and sends its roots several feet
to water; or, under circumstances entirely different, it thrives in swamps where the
water table is little below the surface of the ground. It fights a brave battle against
adversities while it lasts, but it does not live long. Sixty years is old age for this tree. It
grows fast while young, but later it devotes all its energies to the mere process of living,
and its increase in size is slow, until at a period when most trees are still in early youth, it
dies of old age, and the northern winds quickly whip away its limbs, leaving the barkless
trunk to stand a few years longer.
LODGEPOLE PINE
(Pinus Contorta)

The common name of this tree was given it because its tall, slender,
very light poles were used by Indians of the region in the
construction of their lodges. They selected poles fifteen feet long
and two inches in diameter, set them in a circle, bent the tops
together, tied them, and covered the frame with skins or bark. The
poles were peeled in early summer, when the Indians set out upon their
summer hunt, and were left to season until fall, when they were carried to
the winter’s camping place, probably fifty miles distant. Tamarack is a
common name for this pine in much of its range; it is likewise known as
black pine, spruce pine, and prickly pine. Its leaves are from one to two
inches long, in clusters of two. The small cones adhere to the branches
many years—sometimes as long as twenty—without releasing the seeds,
which are sealed within the cone by accumulated resin. The vitality of the
seeds is remarkable. They do not lose their power of germination during
their long imprisonment.
The lodgepole pine has been called a fire tree, and the name is not
inappropriate. It profits by severe burning, as some other trees of the
United States do, such as paper birch and bird cherry. The sealed cones
are opened by fire, which softens the resin, and the seeds are liberated
after the fire has passed, and wing their flight wherever the wind carries
them. The passing fire may be severe enough to kill the parent tree
without destroying or bringing down the cones. The seeds soon fall on the
bared mineral soil, where they germinate by thousands. More than one
hundred thousand small seedling trees may occupy a single acre. Most of
them are ultimately crowded to death, but a thick stand results. Most
lodgepole pine forests occupy old burns. The tree is one of the slowest of
growers. It never reaches large size—possibly three feet is the limit. It is
very tall and slender. A hundred years will scarcely produce a sawlog of the
smallest size.
The range of this tree covers a million square miles from Alaska to New
Mexico, and to the Pacific coast. Its characters vary in different parts of its
range. A scrub form was once thought to be a different species, and was
called shore pine.
The wood is of about the same weight as eastern white pine. It is light in
color, rather weak, and brittle, annual rings very narrow, summerwood
small in amount, resin passages few and small; medullary rays numerous,
broad, and prominent. The wood is characterized by numerous small
knots. It is not durable in contact with the ground, but it readily receives
preservative treatment. In height it ranges from fifty to one hundred feet.
The government’s estimate of the stand of lodgepole pine in the United
States in 1909 placed it at 90,000,000,000 feet. That makes it seventh in
quantity among the timber trees of this country, those above it being
Douglas fir, the southern yellow pines (considered as one), western yellow
pine, redwood, western hemlock, and the red cedar of Washington,
Oregon, and Idaho.
Lodgepole pine has been long and widely used as a ranch timber in the Far
West, serving for poles and rails in fences, for sheds, barns, corrals, pens,
and small bridges. Where it could be had at all, it was generally plentiful.
Stock ranges high among the mountains frequently depend almost solely
upon lodgepole pine for necessary timber.
Mine operators find it a valuable resource. As props it is cheap, substantial,
and convenient in many parts of Colorado, New Mexico, Wyoming, and
Montana. A large proportion of this timber which is cut for mining purposes
has been standing dead from fire injury many years, and is thoroughly
seasoned and very light. It is in excellent condition for receiving
preservative treatment.
Sawmills do not list lodgepole pine separately in reports of lumber cut, and
it is impossible to determine what the annual supply from the species is. It
is well known that the quantity made into lumber in Colorado, Wyoming,
Montana, and Idaho is large. Its chief market is among the newly
established agricultural communities in those states. They use it for fruit
and vegetable shipping boxes, fencing plank, pickets, and plastering lath.
Railroads buy half a million lodgepole pine crossties yearly. When
creosoted, they resist decay many years. Lodgepole pine has been a tie
material since the first railroads entered the region, and while by no means
the best, it promises to fill a much more important place in the future than
in the past. It is an ideal fence post material as far as size and form are
concerned, and with preservative treatment it is bound to attain a high
place. It is claimed that treated posts will last twenty years, and that puts
them on a par with the cedars.
In Colorado and Wyoming much lodgepole was formerly burned for
charcoal to supply the furnaces which smelted ore and the blacksmith
shops of the region. This is done now less than formerly, since railroad
building has made coal and coke accessible.
In one respect, lodgepole pine is to the western mountains what loblolly
pine is to the flat country of the south Atlantic and other southern states.
It is aggressive, and takes possession of vacant ground. Although the
wood is not as valuable as loblolly, it is useful, and has an important place
to fill in the western country’s development. Its greatest drawback is its
exceedingly slow growth. A hundred years is a long time to wait for trees
of pole size. Two crops of loblolly sawlogs can be harvested in that time.
However, the land on which the lodgepole grows is fit only for timber, and
the acreage is so vast that there is enough to grow supplies, even with the
wait of a century or two for harvest. The stand has increased enormously
within historic time, the same as loblolly, and for a similar reason. Men
cleared land in the East, and loblolly took possession; fires destroyed
western forests of other species and lodgepole seized and held the burned
tracts.
If fires cease among the western mountains, as will probably be the case
under more efficient methods of patrol, and with stricter enforcement of
laws against starting fires, the spread of lodgepole pine will come to a
standstill, and existing forests will grow old without much extension of
their borders.
Jeffrey Pine (Pinus jeffreyi) is often classed as western yellow pine, both in
the forest and at the mill. Its range extends from southern Oregon to
Lower California, a distance of 1,000 miles, and its width east and west
varies from twenty to one hundred and fifty miles. It is a mountain tree
and generally occupies elevations above the western yellow pine. In the
North its range reaches 3,600 feet above sea level; in the extreme South it
is 10,000 feet. The darker and more deeply-furrowed bark of the Jeffrey
pine is the usual character by which lumbermen distinguish it from the
western yellow pine. It is known under several names, most of them
relating to the tree’s appearance, such as black pine, redbark pine,
blackbark pine, sapwood pine, and bull pine. It reaches the same size as
the western yellow pine, though the average is a little smaller. The leaves
are from four to nine inches long, and fall in eight or nine years. The cones
are large, and armed with slender, curved spines. The seeds are too heavy
to fly far, their wing area being small. It is a vigorous tree, and in some
regions it forms good forests. Some botanists have considered the Jeffrey
pine a variety of the western yellow pine.
Gray Pine (Pinus sabiniana), called also Digger pine because the Digger Indians formerly
collected the seeds, which are as large as peanuts, to help eke out a living, is confined to
California, and grows in a belt on the foothills surrounding the San Joaquin and
Sacramento valleys. Its cones are large and armed with hooked spines. When green, the
largest cones weigh three or four pounds. Leaves are from eight to twelve inches long, in
clusters of two and three, and fall the third and fourth years. The wood is remarkable for
the quickness of its decay in damp situations. It lasts only one or two years when used for fence posts. A
mature gray pine is from fifty to seventy feet high, and eighteen to thirty inches in
diameter. Some trees are much larger. It is of considerable importance, but is not in the
same class as western yellow and sugar pine. The wood is light, soft, rather strong,
brittle. The annual rings are generally wide, indicating rapid growth. Very old gray pines
are not known. An age of 185 years seems to be the highest on record. The wood is
resinous, and it has helped in a small way to supply the Pacific coast markets with high-
grade turpentine, distilled from roots. It yields resin when boxed like the southern
longleaf pine. There are two flowing seasons. One is very early, and closes when the
weather becomes hot; the other is in full current by the middle of August. It maintains life
among the California foothills during the long rainless seasons, on ground so dry that
semi-desert chaparral sometimes succumbs; but it is able to make the most of favorable
conditions, and it grows rapidly under the slightest encouragement. The seedlings are
more numerous now than formerly, which is attributed to decrease of forest fires. The
tree has enemies which generally attack it in youth. Two fungi, Peridermium harknessi,
and Dædalia vorax, destroy the young tree’s leader or topmost shoot, causing the
development of a short trunk. The latter fungus is the same as, or closely related to, that
which tunnels the trunk of incense cedar and produces pecky cypress.
Gray pine has been cut to some extent for lumber, but its principal uses have been as fuel
and mine timbers. Many quartz mines have been located in the region where the tree
grows; and the engines which pumped the shafts and raised and crushed the ore were
often heated with this pine. Thousands of acres of hillsides in the vicinity of mines were
stripped of it, and it went to the engine house ricks in wagons, on sleds, and on the backs
of burros. In two respects it is an economical fuel for remote mines: it is light in weight,
and gives more heat than an equal quantity of the oak that is associated with it.
Chihuahua Pine (Pinus chihuahuana) is not abundant, but it exists in small commercial
quantities in southwestern New Mexico and southern Arizona. Trees are from fifty to
eighty feet high, and from fifteen to twenty inches in diameter. The wood is medium light,
soft, rather strong, brittle, narrow ringed and compact. The resin passages are few, large,
and conspicuous; color, clear light orange, the thick sapwood lighter. The tree reaches
best development at altitudes of from 5,000 to 7,000 feet. When the wood is used, it
serves the same purposes as western yellow pine; but the small size of the tree makes
lumber of large size impossible. The leaves are in clusters of three, and fall the fourth
year. The cones have long stalks and are from one and a half to two inches long.