Machine Learning Refined
With its intuitive yet rigorous approach to machine learning, this text provides students
with the fundamental knowledge and practical tools needed to conduct research and
build data-driven products. The authors prioritize geometric intuition and algorithmic
thinking, and include detail on all the essential mathematical prerequisites, to offer a
fresh and accessible way to learn. Practical applications are emphasized, with examples
from disciplines including computer vision, natural language processing, economics,
neuroscience, recommender systems, physics, and biology. Over 300 color illustrations are included and have been meticulously designed to enable an intuitive grasp of technical concepts, and over 100 in-depth coding exercises (in Python) provide a
real understanding of crucial machine learning algorithms. A suite of online resources including sample code, data sets, interactive lecture slides, and a solutions manual is provided, making this an ideal text both for graduate courses on machine learning and for individual reference and self-study.
Jeremy Watt received his PhD in Electrical Engineering from Northwestern University,
and is now a machine learning consultant and educator. He teaches machine learning,
deep learning, mathematical optimization, and reinforcement learning at Northwestern
University.
Reza Borhani received his PhD in Electrical Engineering from Northwestern University,
and is now a machine learning consultant and educator. He teaches a variety of courses
in machine learning and deep learning at Northwestern University.
Aggelos K. Katsaggelos is a professor at Northwestern University, where he heads the Image and Video Processing Laboratory. He is a Fellow of IEEE, SPIE, EURASIP, and OSA and the recipient of the IEEE Third Millennium Medal (2000).
Machine Learning Refined
JEREMY WATT
Northwestern University, Illinois
REZA BORHANI
Northwestern University, Illinois
AGGELOS K. KATSAGGELOS
Northwestern University, Illinois
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906
www.cambridge.org
Information on this title:
www.cambridge.org/9781108480727
DOI: 10.1017/9781108690935
© Cambridge University Press 2020
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2020
Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A.
A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-48072-7 Hardback
Additional resources for this publication at www.cambridge.org/watt2
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To our families:
Preface page xii
Acknowledgements xxii
1 Introduction to Machine Learning 1
1.1 Introduction 1
1.2 Distinguishing Cats from Dogs: a Machine Learning Approach 1
1.3 The Basic Taxonomy of Machine Learning Problems 6
1.4 Mathematical Optimization 16
1.5 Conclusion 18
Part I Mathematical Optimization 19
2 Zero-Order Optimization Techniques 21
2.1 Introduction 21
2.2 The Zero-Order Optimality Condition 23
2.3 Global Optimization Methods 24
2.4 Local Optimization Methods 27
2.5 Random Search 31
2.6 Coordinate Search and Descent 39
2.7 Conclusion 40
2.8 Exercises 42
3 First-Order Optimization Techniques 45
3.1 Introduction 45
3.2 The First-Order Optimality Condition 45
3.3 The Geometry of First-Order Taylor Series 52
3.4 Computing Gradients Efficiently 55
3.5 Gradient Descent 56
3.6 Two Natural Weaknesses of Gradient Descent 65
3.7 Conclusion 71
3.8 Exercises 71
4 Second-Order Optimization Techniques 75
4.1 The Second-Order Optimality Condition 75
11.4 Naive Cross-Validation 335
References 564
Index 569
Preface
For eons we humans have sought out rules or patterns that accurately describe how important systems in the world around us work, whether these systems be natural or man-made, in order to better predict the behavior of a given system, understand it, and ultimately, control it. However, the process of finding the "right" rule that seems to govern a given system has historically been no easy task. For most of our history data (glimpses of a given system at work) has been an extremely scarce commodity. Moreover, our ability to compute, to try out various rules against the data we did have, was limited as well. Both of these factors naturally limited the range of phenomena scientific pioneers of the past could investigate and inevitably forced them to use philosophical and/or visual approaches to rule-finding. Today, however, we live in a world awash in data, and have colossal computing power at our fingertips, so that the intellectual descendants of those great pioneers can tackle a much wider array of problems and take a much more empirical approach to rule-finding. Machine learning, the topic of this textbook, is a term used to describe a broad (and growing) collection of pattern-finding algorithms designed to identify such rules from data.
In the past decade the user base of machine learning has grown dramatically. From a relatively small circle in computer science, engineering, and mathematics departments, the users of machine learning now include students and researchers from nearly every discipline, as well as practitioners in industry. This text is the result of a complete breakdown of machine learning into its most fundamental components, and a curated reassembly of those pieces that we believe will most benefit this broadening audience of learners. It contains fresh and intuitive yet rigorous descriptions of the most fundamental concepts of the subject.
Book Overview
The second edition of this text is a complete revision of our first endeavor, with virtually every chapter of the original rewritten from the ground up and eight new chapters of material added, doubling the size of the first edition. Topics from the first edition, from expositions on gradient descent to One-versus-All classification and Principal Component Analysis, have been reworked and polished. A swath of new topics have been added throughout the text, from zero-order optimization techniques to feature selection, boosting, and tree-based learners.
While heftier in size, the intent of our original attempt has remained unchanged: to provide an intuitive yet rigorous treatment of machine learning, from first principles through to implementation. Part I of the text covers the fundamental tools of mathematical optimization, which drive not only the tuning of individual machine learning models (introduced in Part II) but virtually every machine learning algorithm, with first- and second-order optimization techniques detailed in Chapters 3 and 4, respectively. Part II introduces the fundamental models of machine learning; more specifically, this part of the text covers linear regression, linear classification, and linear unsupervised learning, culminating with the nonlinear extension of both supervised and unsupervised learning in Chapter 10, where we introduce the motivation for nonlinear feature engineering. Part III then details the three main families of nonlinear models used in machine learning: fixed-shape kernels, neural networks, and trees, where we discuss what it means for each to be a universal approximator.
To get the most out of this part of the book we strongly recommend that Chapter 11 and the fundamental ideas therein are studied and understood before moving on to the chapters that follow it. Finally, the appendices provide complete treatments of a range of subjects that the readers will need to understand in order to make full use of the text.
Appendix A describes momentum and normalized-gradient methods that extend standard gradient descent and enhance it in various ways (producing, e.g., the RMSProp and Adam first-order optimizers). Appendix B covers the forward and backward modes of automatic differentiation, detailing, in addition to the derivative/gradient, higher-order derivatives and the Hessian matrix. Appendix C reviews the basics of linear algebra, including vector/matrix arithmetic and the notions of spanning sets and orthogonality.
This textbook was written with first-time learners of the subject in mind, as well as for more knowledgeable readers who yearn for a more intuitive and serviceable treatment than what is currently available today. To make full use of the text one needs only a basic understanding of vector algebra (mathematical functions, vector arithmetic, and the like) and elementary computer programming. We also provide several suggested pathways for navigating the text based on a variety of learning outcomes and university courses (ranging from a course on the essentials of machine learning to special topics – as described further under "Instructors: How To Use This Book" below).
We believe that intuitive leaps precede intellectual ones, and to this end defer more formal treatments in favor of a fresh and consistent geometric perspective throughout the text. We believe that this perspective not only permits a more intuitive understanding of individual concepts in the text, but also that it helps establish revealing connections between ideas often regarded as fundamentally distinct (e.g., the logistic regression and Support Vector Machine classifiers, kernels and fully connected neural networks). We also place significant emphasis on the readers' ability to implement what they learn through the text's many coding exercises, allowing them to "get their hands dirty" and "learn by doing," practicing the concepts introduced in the body of the text. While in principle any programming language can be used to complete the text's coding exercises, we highly recommend using Python for its ease of use and large support community. We also recommend using the open-source Python libraries NumPy, autograd, and matplotlib, as well as the Jupyter notebook editor to make implementing and testing code easier. A complete set of installation instructions, datasets, and other resources for the text can be found at
https://github.com/jermwatt/machine_learning_refined
This site also contains instructions for installing Python as well as a number
of other free packages that students will find useful in completing the text’s
exercises.
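For readers setting up their environment for the exercises, the short check below may be useful. It is our own illustrative sketch rather than part of the book's official materials; it assumes the packages have been installed under their standard PyPI names (numpy, autograd, matplotlib) and simply verifies that they import and that autograd differentiates a simple function correctly.

import numpy as np                  # numerical arrays and linear algebra
from autograd import grad           # automatic differentiation
import matplotlib.pyplot as plt     # plotting, used throughout the exercises

# A tiny end-to-end test: differentiate g(w) = w^2 + 2w with autograd and
# confirm the result matches the hand-computed derivative g'(w) = 2w + 2.
def g(w):
    return w ** 2 + 2.0 * w

dg = grad(g)                         # dg is a function returning g'(w)
assert abs(dg(3.0) - 8.0) < 1e-12    # g'(3) = 8

print('NumPy version:', np.__version__)
print('autograd and matplotlib imported successfully.')

If the assertion passes and the final message prints, the recommended stack is ready for the coding exercises.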
Instructors: How To Use This Book
This book has been used as a basis for a number of machine learning courses, ranging from introductory courses on the essentials of the subject to courses on mathematical optimization and deep learning for graduate students. With its treatment of the essentials of the subject, the text is well suited to quarter-based programs and universities where a deep dive into the entirety of the book is not feasible due to time constraints. Topics for such an essentials course are outlined in Figure 0.1. A fuller, semester-length course based on this text expands on the essentials course outlined above both in terms of breadth and depth; a recommended roadmap for such a course is shown in Figure 0.2.
A course on mathematical optimization for machine learning and deep learning can be built around the optimization techniques from Part I of the text (as well as Appendix A), including zero-, first-, and second-order methods. All students in general, and those taking an optimization for machine learning course in particular, should understand how nonlinear learning is performed, including the role played in identifying the "right" nonlinearity via the processes of boosting and regularization-based cross-validation. Special topics – such as batch normalization and the forward/backward mode of automatic differentiation – can also be covered. A recommended roadmap for such a course – including appropriate chapters, sections, and topics – is shown in Figure 0.3.
Finally, a course on deep learning is suitable for students who have had prior exposure to fundamental machine learning concepts, and can begin with a discussion of appropriate first-order optimization techniques; if needed, a brief review of the fundamentals of machine learning may be given using selected portions of Part II of the text. A deeper dive into fully connected networks, backpropagation and the forward/backward mode of automatic differentiation, as well as special topics like batch normalization and early-stopping-based cross-validation, can then be made using Chapters 11, 13, and Appendices A and B of the text; a recommended roadmap is shown in Figure 0.4. Additional material on deep learning – like convolutional and recurrent networks – can be found by visiting the GitHub repository associated with the text.
Figure 0.1 Recommended study roadmap for a course on the essentials of machine learning, including requisite chapters (left column), sections (middle column), and topics to cover (right column), suited to courses where machine learning is not the sole focus but a key component of some broader course of study. Note that chapters are grouped together visually based on the text layout detailed under "Book Overview" in the Preface. See the section titled "Instructors: How To Use This Book" in the Preface for further details.
Figure 0.2 Recommended study roadmap for a full treatment of standard machine learning subjects, including chapters, sections, as well as topics to cover. This plan entails a more in-depth coverage of machine learning topics compared to the essentials roadmap given in Figure 0.1, and is best suited for senior undergraduate/early graduate students. See the section titled "Instructors: How To Use This Book" in the Preface for further details.
Figure 0.3 Recommended study roadmap for a course on mathematical optimization for machine learning and deep learning, including chapters, sections, as well as topics to cover. See the section titled "Instructors: How To Use This Book" in the Preface for further details.
Figure 0.4 Recommended study roadmap for a course on deep learning, including chapters, sections, as well as topics to cover. See the section titled "Instructors: How To Use This Book" in the Preface for further details.
Acknowledgements
This text could not have been written in anything close to its current form without the help of our colleagues, reviewers, and editors, whose suggestions and new ideas included in the second edition of this text greatly improved it as a whole. We are also very grateful for the many students over the years that provided insightful feedback on the content of this text, with special thanks to Bowen for his careful reading of the work. Finally, a big thanks to Mark McNess Rosengren and the entire Standing Passengers crew for helping us stay caffeinated during the writing of this text.
1 Introduction to Machine Learning
1.1 Introduction
Machine learning is a unified algorithmic framework designed to identify computational models that accurately describe empirical data and the phenomena underlying it, with little or no human involvement. While still a young discipline with much more awaiting discovery than is currently known, today machine learning is used successfully across an enormous range of fields, from computer vision and natural language processing to business analytics (leveraged for sales and economic forecasting), to just name a few.
1.2 Distinguishing Cats from Dogs: a Machine Learning Approach
To get a sense of how machine learning works in practice, we begin with a toy problem: teaching a computer to distinguish between pictures of cats and those of dogs. This will allow us to informally describe the terminology and procedures involved in solving the typical machine learning problem.
Do you recall how you first learned about the difference between cats and dogs, and how they are different animals? The answer is probably no, as most
humans learn to perform simple cognitive tasks like this very early on in the
course of their lives. One thing is certain, however: young children do not need
some kind of formal scientific training, or a zoological lecture on felis catus and
canis familiaris species, in order to be able to tell cats and dogs apart. Instead,
they learn by example. They are naturally presented with many images of
what they are told by a supervisor (a parent, a caregiver, etc.) are either cats
or dogs, until they fully grasp the two concepts. How do we know when a
child can successfully distinguish between cats and dogs? Intuitively, when
they encounter new (images of) cats and dogs, and can correctly identify each
new example or, in other words, when they can generalize what they have learned to new, previously unseen examples of each animal.
Like human beings, computers can be taught how to perform this sort of task as well. The task of teaching a machine to distinguish between different types or classes of things (here cats and dogs) is referred to as classification. A computer can be taught the difference between these two types of animals by learning from a batch of examples, typically referred to as a training set of data. Figure 1.1 shows such a training set consisting of a few images of different cats and dogs. Intuitively, the larger and more diverse the training set, the better a computer (or human) can perform a learning task like this one.
Figure 1.1 A training set consisting of six images of cats (highlighted in blue) and six
images of dogs (highlighted in red). This set is used to train a machine learning model
that can distinguish between future images of cats and dogs.
2. Feature design. Think for a moment about how we (humans) tell the difference between images containing cats from those containing dogs. We use color, size, the shape of the ears or nose, and/or some combination of these features in order
to distinguish between the two. In other words, we do not just look at an image
as simply a collection of many small square pixels. We pick out grosser details,
or features, from images like these in order to identify what it is that we are
looking at. This is true for computers as well. In order to successfully train a
computer to perform this task (and any machine learning task more generally)
we need to provide it with properly designed features or, ideally, have it find or learn such features on its own.
Designing quality features is typically not a trivial task as it can be very ap-
plication dependent. For instance, a feature like color would be less helpful in
discriminating between cats and dogs (since many cats and dogs share similar
hair colors) than it would be in telling grizzly bears and polar bears apart! More-
over, extracting the features from a training dataset can also be challenging. For
example, if some of our training images were blurry or taken from a perspective
where we could not see the animal properly, the features we designed might not be extracted properly. However, for the sake of simplicity with our toy problem here, suppose we can easily extract the following two features from each image in the training set: size of nose relative to the size of the head, ranging from small to large, and shape of ears, ranging from round to pointy.
Figure 1.2 Feature space representation of the training set shown in Figure 1.1 where
the horizontal and vertical axes represent the features nose size and ear shape,
respectively. The fact that the cats and dogs from our training set lie in distinct regions of this feature space reflects a good choice of features.
Examining the training images shown in Figure 1.1, we can see that all cats have small noses and pointy ears, while dogs generally have large noses and round ears. Notice that with the current choice of features each image can now be represented by just two numbers: a number expressing the relative nose size, and another number capturing the pointiness or roundness of the ears. In other words, we can represent each image in our training set as a point in a two-dimensional feature space where the features nose size and ear shape are the horizontal and vertical axes, respectively.
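To make this feature-space representation concrete, the brief sketch below plots a training set of this kind. It is our own illustration, not code from the text: the twelve feature pairs are hypothetical values chosen only to mimic the pattern described above (cats with small noses and pointy ears, dogs with large noses and round ears), with both features scaled between 0 and 1 and larger ear-shape values meaning pointier ears.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical feature pairs [nose size, ear shape] for the training images.
cats = np.array([[0.10, 0.95], [0.20, 0.90], [0.15, 0.85],
                 [0.25, 0.92], [0.30, 0.88], [0.20, 0.80]])
dogs = np.array([[0.70, 0.20], [0.80, 0.15], [0.75, 0.10],
                 [0.85, 0.25], [0.90, 0.18], [0.65, 0.12]])

# Scatter plot of the training set in feature space (compare Figure 1.2).
plt.scatter(cats[:, 0], cats[:, 1], c='blue', label='cats')
plt.scatter(dogs[:, 0], dogs[:, 1], c='red', label='dogs')
plt.xlabel('nose size')
plt.ylabel('ear shape')
plt.legend()
plt.show()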
3. Model training. With our feature representation of the training data in hand, the machine learning problem of distinguishing between cats and dogs becomes a simple geometric one: have the machine find a line or a curve that separates the cats from the dogs in our carefully designed feature space. Supposing for simplicity that we use a line, we must find the right values for its two parameters – a slope and vertical intercept – that define the line's orientation in the feature space. The line itself is referred to as a model, and the tuning of such a set of parameters to a training set is referred to as the training of a model.
Figure 1.3 shows a trained linear model (in black) which divides the feature space into cat and dog regions. This linear model provides a simple computational rule for distinguishing between cats and dogs: when the feature representation of a future image lies above the line (in the blue region) it will be considered a cat by the machine, and likewise any representation that falls below the line (in the red region) will be considered a dog.
Figure 1.3 A trained linear model (shown in black) provides a computational rule for
distinguishing between cats and dogs. Any new image received in the future will be
classified as a cat if its feature representation lies above this line (in the blue region), and
a dog if the feature representation lies below this line (in the red region).
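As a concrete and purely illustrative companion to this description, the sketch below shows how a trained line translates into a computational classification rule. This is our own toy example, not the authors' implementation: the slope and intercept values are made up rather than learned, and the feature convention matches the hypothetical one used in the earlier snippet (larger ear-shape values meaning pointier ears).

# A linear model in the two-dimensional feature space is specified by a
# slope and a vertical intercept (illustrative values, not learned from data).
slope = -1.0
intercept = 1.0

def classify(nose_size, ear_shape):
    # Feature pairs above the line are labeled 'cat', those below 'dog',
    # mirroring the rule depicted in Figure 1.3.
    line_height = slope * nose_size + intercept
    return 'cat' if ear_shape > line_height else 'dog'

# Classify the feature representations of two hypothetical new images.
print(classify(0.20, 0.90))   # lies above the line -> 'cat'
print(classify(0.80, 0.10))   # lies below the line -> 'dog'

In practice the slope and intercept would of course be tuned automatically to the training data; the optimization machinery for doing so is the subject of Part I of the text.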