Machine Learning Refined

With its intuitive yet rigorous approach to machine learning, this text provides students
with the fundamental knowledge and practical tools needed to conduct research and
build data-driven products. The authors prioritize geometric intuition and algorithmic
thinking, and include detail on all the essential mathematical prerequisites, to offer a
fresh and accessible way to learn. Practical applications are emphasized, with examples
from disciplines including computer vision, natural language processing, economics,
neuroscience, recommender systems, physics, and biology. Over 300 color illustra-
tions are included and have been meticulously designed to enable an intuitive grasp
of technical concepts, and over 100 in-depth coding exercises (in Python) provide a
real understanding of crucial machine learning algorithms. A suite of online resources
including sample code, data sets, interactive lecture slides, and a solutions manual is
provided online, making this an ideal text both for graduate courses on machine learning
and for individual reference and self-study.

Jeremy Watt received his PhD in Electrical Engineering from Northwestern University,
and is now a machine learning consultant and educator. He teaches machine learning,
deep learning, mathematical optimization, and reinforcement learning at Northwestern
University.

Reza Borhani received his PhD in Electrical Engineering from Northwestern University,

and is now a machine learning consultant and educator. He teaches a variety of courses
in machine learning and deep learning at Northwestern University.

Aggelos K. Katsaggelos is the Joseph Cummings Professor at Northwestern University,

where he heads the Image and Video Processing Laboratory. He is a Fellow of IEEE,
SPIE, EURASIP, and OSA and the recipient of the IEEE Third Millennium Medal
(2000).
Machine Learning Refined

Foundations, Algorithms, and Applications

J E R E M Y W AT T
Northwestern University, Illinois

REZA BORHANI
Northwestern University, Illinois

A G G E L O S K . K AT S A G G E L O S
Northwestern University, Illinois
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.


It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title:
www.cambridge.org/9781108480727
DOI: 10.1017/9781108690935
© Cambridge University Press 2020
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2020
Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A.
A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-48072-7 Hardback
Additional resources for this publication at www.cambridge.org/watt2
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To our families:

Deb, Robert, and Terri

Soheila, Ali, and Maryam

Ειρήνη, Ζωή, Σοφία, and Ειρήνη
Contents

Preface page xii
Acknowledgements xxii
1 Introduction to Machine Learning 1
1.1 Introduction 1
1.2 Distinguishing Cats from Dogs: a Machine Learning Approach 1
1.3 The Basic Taxonomy of Machine Learning Problems 6
1.4 Mathematical Optimization 16
1.5 Conclusion 18
Part I Mathematical Optimization 19
2 Zero-Order Optimization Techniques 21
2.1 Introduction 21
2.2 The Zero-Order Optimality Condition 23
2.3 Global Optimization Methods 24
2.4 Local Optimization Methods 27
2.5 Random Search 31
2.6 Coordinate Search and Descent 39
2.7 Conclusion 40
2.8 Exercises 42
3 First-Order Optimization Techniques 45
3.1 Introduction 45
3.2 The First-Order Optimality Condition 45
3.3 The Geometry of First-Order Taylor Series 52
3.4 Computing Gradients Efficiently 55
3.5 Gradient Descent 56
3.6 Two Natural Weaknesses of Gradient Descent 65
3.7 Conclusion 71
3.8 Exercises 71
4 Second-Order Optimization Techniques 75
4.1 The Second-Order Optimality Condition 75

4.2 The Geometry of Second-Order Taylor Series 78


4.3 Newton’s Method 81
4.4 Two Natural Weaknesses of Newton’s Method 90
4.5 Conclusion 91
4.6 Exercises 92
Part II Linear Learning 97
5 Linear Regression 99
5.1 Introduction 99
5.2 Least Squares Linear Regression 99
5.3 Least Absolute Deviations 108
5.4 Regression Quality Metrics 111
5.5 Weighted Regression 113
5.6 Multi-Output Regression 116
5.7 Conclusion 120
5.8 Exercises 121
5.9 Endnotes 124
6 Linear Two-Class Classification 125
6.1 Introduction 125
6.2 Logistic Regression and the Cross Entropy Cost 125
6.3 Logistic Regression and the Softmax Cost 135
6.4 The Perceptron 140
6.5 Support Vector Machines 150
6.6 Which Approach Produces the Best Results? 157
6.7 The Categorical Cross Entropy Cost 158
6.8 Classification Quality Metrics 160
6.9 Weighted Two-Class Classification 167
6.10 Conclusion 170
6.11 Exercises 171
7 Linear Multi-Class Classification 174
7.1 Introduction 174
7.2 One-versus-All Multi-Class Classification 174
7.3 Multi-Class Classification and the Perceptron 184
7.4 Which Approach Produces the Best Results? 192
7.5 The Categorical Cross Entropy Cost Function 193
7.6 Classification Quality Metrics 198
7.7 Weighted Multi-Class Classification 202
7.8 Stochastic and Mini-Batch Learning 203
7.9 Conclusion 205
7.10 Exercises 205

8 Linear Unsupervised Learning 208

8.1 Introduction 208

8.2 Fixed Spanning Sets, Orthonormality, and Projections 208

8.3 The Linear Autoencoder and Principal Component Analysis 213

8.4 Recommender Systems 219

8.5 K-Means Clustering 221

8.6 General Matrix Factorization Techniques 227

8.7 Conclusion 230

8.8 Exercises 231

8.9 Endnotes 233

9 Feature Engineering and Selection 237

9.1 Introduction 237

9.2 Histogram Features 238

9.3 Feature Scaling via Standard Normalization 249

9.4 Imputing Missing Values in a Dataset 254

9.5 Feature Scaling via PCA-Sphering 255

9.6 Feature Selection via Boosting 258

9.7 Feature Selection via Regularization 264

9.8 Conclusion 269

9.9 Exercises 269

Part III Nonlinear Learning 273

10 Principles of Nonlinear Feature Engineering 275

10.1 Introduction 275

10.2 Nonlinear Regression 275

10.3 Nonlinear Multi-Output Regression 282

10.4 Nonlinear Two-Class Classification 286

10.5 Nonlinear Multi-Class Classification 290

10.6 Nonlinear Unsupervised Learning 294

10.7 Conclusion 298

10.8 Exercises 298

11 Principles of Feature Learning 304

11.1 Introduction 304

11.2 Universal Approximators 307

11.3 Universal Approximation of Real Data 323


11.4 Naive Cross-Validation 335

11.5 Efficient Cross-Validation via Boosting 340

11.6 Efficient Cross-Validation via Regularization 350

11.7 Testing Data 361

11.8 Which Universal Approximator Works Best in Practice? 365

11.9 Bagging Cross-Validated Models 366



11.10 K-Fold Cross-Validation 373


11.11 When Feature Learning Fails 378
11.12 Conclusion 379
11.13 Exercises 380
12 Kernel Methods 383
12.1 Introduction 383
12.2 Fixed-Shape Universal Approximators 383
12.3 The Kernel Trick 386
12.4 Kernels as Measures of Similarity 396
12.5 Optimization of Kernelized Models 397
12.6 Cross-Validating Kernelized Learners 398
12.7 Conclusion 399
12.8 Exercises 399
13 Fully Connected Neural Networks 403
13.1 Introduction 403
13.2 Fully Connected Neural Networks 403
13.3 Activation Functions 424
13.4 The Backpropagation Algorithm 427
13.5 Optimization of Neural Network Models 428
13.6 Batch Normalization 430
13.7 Cross-Validation via Early Stopping 438
13.8 Conclusion 440
13.9 Exercises 441
14 Tree-Based Learners 443
14.1 Introduction 443
14.2 From Stumps to Deep Trees 443
14.3 Regression Trees 446
14.4 Classification Trees 452
14.5 Gradient Boosting 458
14.6 Random Forests 462
14.7 Cross-Validation Techniques for Recursively Defined Trees 464
14.8 Conclusion 467
14.9 Exercises 467
Part IV Appendices 471
Appendix A Advanced First- and Second-Order Optimization Methods 473
A.1 Introduction 473
A.2 Momentum-Accelerated Gradient Descent 473
A.3 Normalized Gradient Descent 478
A.4 Advanced Gradient-Based Methods 485

A.5 Mini-Batch Optimization 487

A.6 Conservative Steplength Rules 490

A.7 Newton’s Method, Regularization, and Nonconvex Functions 499

A.8 Hessian-Free Methods 502

Appendix B Derivatives and Automatic Differentiation 511

B.1 Introduction 511

B.2 The Derivative 511

B.3 Derivative Rules for Elementary Functions and Operations 514

B.4 The Gradient 516

B.5 The Computation Graph 517

B.6 The Forward Mode of Automatic Differentiation 520

B.7 The Reverse Mode of Automatic Differentiation 526

B.8 Higher-Order Derivatives 529

B.9 Taylor Series 531

B.10 Using the autograd Library 536

Appendix C Linear Algebra 546

C.1 Introduction 546

C.2 Vectors and Vector Operations 546

C.3 Matrices and Matrix Operations 553

C.4 Eigenvalues and Eigenvectors 556

C.5 Vector and Matrix Norms 559

References 564

Index 569
Preface

For eons we humans have sought out rules or patterns that accurately describe

how important systems in the world around us work, whether these systems

be agricultural, biological, physical, financial, etc. We do this because such rules

allow us to understand a system better, accurately predict its future behavior

and ultimately, control it. However, the process of finding the ”right” rule that

seems to govern a given system has historically been no easy task. For most of

our history data (glimpses of a given system at work) has been an extremely

scarce commodity. Moreover, our ability to compute, to try out various rules

to see which most accurately represents a phenomenon, has been limited to

what we could accomplish by hand. Both of these factors naturally limited

the range of phenomena scientific pioneers of the past could investigate and

inevitably forced them to use philosophical and/or visual approaches to rule-

finding. Today, however, we live in a world awash in data, and have colossal

computing power at our fingertips. Because of this, we lucky descendants of the

great pioneers can tackle a much wider array of problems and take a much more

empirical approach to rule-finding than our forbears could. Machine learning,

the topic of this textbook, is a term used to describe a broad (and growing)

collection of pattern-finding algorithms designed to properly identify system

rules empirically and by leveraging our access to potentially enormous amounts

of data and computing power.

In the past decade the user base of machine learning has grown dramatically.

From a relatively small circle in computer science, engineering, and mathe-

matics departments the users of machine learning now include students and

researchers from every corner of the academic universe, as well as members of

industry, data scientists, entrepreneurs, and machine learning enthusiasts. This

textbook is the result of a complete tearing down of the standard curriculum

of machine learning into its most fundamental components, and a curated re-

assembly of those pieces (painstakingly polished and organized) that we feel

will most benefit this broadening audience of learners. It contains fresh and

intuitive yet rigorous descriptions of the most fundamental concepts necessary

to conduct research, build products, and tinker.



Book Overview
The second edition of this text is a complete revision of our first endeavor, with

virtually every chapter of the original rewritten from the ground up and eight

new chapters of material added, doubling the size of the first edition. Topics from

the first edition, from expositions on gradient descent to those on One-versus-

All classification and Principal Component Analysis have been reworked and

polished. A swath of new topics have been added throughout the text, from

derivative-free optimization to weighted supervised learning, feature selection,

nonlinear feature engineering, boosting-based cross-validation, and more.

While heftier in size, the intent of our original attempt has remained un-

changed: to explain machine learning, from first principles to practical imple-

mentation, in the simplest possible terms. A big-picture breakdown of the second

edition text follows below.

Part I: Mathematical Optimization (Chapters 2–4)


Mathematical optimization is the workhorse of machine learning, powering not

only the tuning of individual machine learning models (introduced in Part II)

but also the framework by which we determine appropriate models themselves

via cross-validation (discussed in Part III of the text).

In this first part of the text we provide a complete introduction to mathemat-

ical optimization, from basic zero-order (derivative-free) methods detailed in

Chapter 2 to fundamental and advanced first-order and second-order methods

in Chapters 3 and 4, respectively. More specifically this part of the text con-

tains complete descriptions of local optimization, random search methodologies,

gradient descent, and Newton’s method.

Part II: Linear Learning (Chapters 5–9)


In this part of the text we describe the fundamental components of cost function

based machine learning, with an emphasis on linear models.

This includes a complete description of supervised learning in Chapters 5–7

including linear regression, two-class, and multi-class classification. In each of

these chapters we describe a range of perspectives and popular design choices

made when building supervised learners.

In Chapter 8 we similarly describe unsupervised learning, and Chapter 9 con-

tains an introduction to fundamental feature engineering practices including pop-

ular histogram features as well as various input normalization schemes, and

feature selection paradigms.



Part III: Nonlinear Learning (Chapters 10–14)


In the final part of the text we extend the fundamental paradigms introduced in

Part II to the general nonlinear setting.

We do this carefully beginning with a basic introduction to nonlinear super-

vised and unsupervised learning in Chapter 10, where we introduce the motiva-

tion, common terminology, and notation of nonlinear learning used throughout

the remainder of the text.

In Chapter 11 we discuss how to automate the selection of appropriate non-

linear models, beginning with an introduction to universal approximation. This

naturally leads to detailed descriptions of cross-validation, as well as boosting,

regularization, ensembling, and K-folds cross-validation.

With these fundamental ideas in-hand, in Chapters 12–14 we then dedicate an

individual chapter to each of the three popular universal approximators used in

machine learning: fixed-shape kernels, neural networks, and trees, where we discuss

the strengths, weaknesses, technical eccentricities, and usages of each popular

universal approximator.

To get the most out of this part of the book we strongly recommend that

Chapter 11 and the fundamental ideas therein are studied and understood before

moving on to Chapters 12–14.

Part IV: Appendices


This shorter set of appendix chapters provides a complete treatment on ad-

vanced optimization techniques, as well as a thorough introduction to a range

of subjects that the readers will need to understand in order to make full use of

the text.

Appendix A continues our discussion from Chapters 3 and 4, and describes

advanced first- and second-order optimization techniques. This includes a discussion

of popular extensions of gradient descent, including mini-batch optimization,

momentum acceleration, gradient normalization, and the result of combining these

enhancements in various ways (producing e.g., the RMSProp and Adam first

order algorithms) – and Newton’s method – including regularization schemes

and Hessian-free methods.

Appendix B contains a tour of computational calculus including an introduction
to the derivative/gradient, higher-order derivatives, the Hessian matrix, numerical
differentiation, forward and backward (backpropagation) automatic differentiation,
and Taylor series approximations.

Appendix C provides a suitable background in linear and matrix algebra, including
vector/matrix arithmetic, the notions of spanning sets and orthogonality, as well as
eigenvalues and eigenvectors.



Readers: How To Use This Book


This textbook was written with first-time learners of the subject in mind, as

well as for more knowledgeable readers who yearn for a more intuitive and

serviceable treatment than what is currently available today. To make full use of

the text one needs only a basic understanding of vector algebra (mathematical

functions, vector arithmetic, etc.) and computer programming (for example,

basic proficiency with a dynamically typed language like Python). We provide


complete introductory treatments of other prerequisite topics including linear

algebra, vector calculus, and automatic differentiation in the appendices of the


text. Example ”roadmaps,” shown in Figures 0.1–0.4, provide suggested paths

for navigating the text based on a variety of learning outcomes and university

courses (ranging from a course on the essentials of machine learning to special

topics – as described further under ”Instructors: How to use this Book” below).

We believe that intuitive leaps precede intellectual ones, and to this end defer

the use of probabilistic and statistical views of machine learning in favor of a

fresh and consistent geometric perspective throughout the text. We believe that

this perspective not only permits a more intuitive understanding of individ-

ual concepts in the text, but also that it helps establish revealing connections

between ideas often regarded as fundamentally distinct (e.g., the logistic re-

gression and Support Vector Machine classifiers, kernels and fully connected

neural networks, etc.). We also highly emphasize the importance of mathemati-

cal optimization in our treatment of machine learning. As detailed in the ”Book

Overview” section above, optimization is the workhorse of machine learning

and is fundamental at many levels – from the tuning of individual models to

the general selection of appropriate nonlinearities via cross-validation. Because

of this a strong understanding of mathematical optimization is requisite if one

wishes to deeply understand machine learning, and if one wishes to be able to

implement fundamental algorithms.

To this end, we place significant emphasis on the design and implementa-

tion of algorithms throughout the text with implementations of fundamental

algorithms given in Python. These fundamental examples can then be used as


building blocks for the reader to help complete the text’s programming exer-

cises, allowing them to ”get their hands dirty” and ”learn by doing,” practicing

the concepts introduced in the body of the text. While in principle any program-

ming language can be used to complete the text’s coding exercises, we highly

recommend using Python for its ease of use and large support community. We
also recommend using the open-source Python libraries NumPy, autograd, and
matplotlib, as well as the Jupyter notebook editor to make implementing and
testing code easier. A complete set of installation instructions, datasets, as well

as starter notebooks for many exercises can be found at

https://github.com/jermwatt/machine_learning_refined

Instructors: How To Use This Book


Chapter slides associated with this textbook, datasets, along with a large array of

instructional interactive Python widgets illustrating various concepts through-


out the text, can be found on the github repository accompanying this textbook

at

https://github.com/jermwatt/machine_learning_refined
This site also contains instructions for installing Python as well as a number

of other free packages that students will find useful in completing the text’s

exercises.

This book has been used as a basis for a number of machine learning courses

at Northwestern University, ranging from introductory courses suitable for un-

dergraduate students to more advanced courses on special topics focusing on

optimization and deep learning for graduate students. With its treatment of

foundations, applications, and algorithms this text can be used as a primary

resource or as a fundamental component for courses such as the following.

Machine learning essentials treatment: an introduction to the essentials

of machine learning is ideal for undergraduate students, especially those in

quarter-based programs and universities where a deep dive into the entirety

of the book is not feasible due to time constraints. Topics for such a course

can include: gradient descent, logistic regression, Support Vector Machines,

One-versus-All and multi-class logistic regression, Principal Component Anal-

ysis, K-means clustering, the essentials of feature engineering and selection,

cross-validation, regularization, ensembling, bagging, kernel methods, fully

connected neural networks, and trees. A recommended roadmap for such a

course – including recommended chapters, sections, and corresponding topics

– is shown in Figure 0.1.

Machine learning full treatment: a standard machine learning course based

on this text expands on the essentials course outlined above both in terms

of breadth and depth. In addition to the topics mentioned in the essentials

course, instructors may choose to cover Newton’s method, Least Absolute

Deviations, multi-output regression, weighted regression, the Perceptron, the

Categorical Cross Entropy cost, weighted two-class and multi-class classifica-

tion, online learning, recommender systems, matrix factorization techniques,

boosting-based feature selection, universal approximation, gradient boosting,

random forests, as well as a more in-depth treatment of fully connected neu-

ral networks involving topics such as batch normalization and early-stopping-

based regularization. A recommended roadmap for such a course – including

recommended chapters, sections, and corresponding topics – is illustrated in

Figure 0.2.

Mathematical optimization for machine learning and deep learning: such

a course entails a comprehensive description of zero-, first-, and second-order

optimization techniques from Part I of the text (as well as Appendix A) in-

cluding: coordinate descent, gradient descent, Newton’s method, quasi-Newton

methods, stochastic optimization, momentum acceleration, fixed and adaptive

steplength rules, as well as advanced normalized gradient descent schemes

(e.g., Adam and RMSProp). These can be followed by an in-depth description

of the feature engineering processes (especially standard normalization and

PCA-sphering) that speed up (particularly first-order) optimization algorithms.

All students in general, and those taking an optimization for machine learning

course in particular, should appreciate the fundamental role optimization plays

in identifying the “right” nonlinearity via the processes of boosting- and regularization-based cross-validation, the principles of which are covered in Chapter

11. Select topics from Chapter 13 and Appendix B – including backpropagation,

batch normalization, and forward/backward mode of automatic differentiation
– can also be covered. A recommended roadmap for such a course – including

recommended chapters, sections, and corresponding topics – is given in Figure

0.3.

Introductory portion of a course on deep learning: such a course is best suited

for students who have had prior exposure to fundamental machine learning

concepts, and can begin with a discussion of appropriate first order optimiza-

tion techniques, with an emphasis on stochastic and mini-batch optimization,

momentum acceleration, and normalized gradient schemes such as Adam and

RMSProp. Depending on the audience, a brief review of fundamental elements

of machine learning may be needed using selected portions of Part II of the text.

A complete discussion of fully connected networks, including a discussion of

backpropagation and forward/backward mode of automatic differentiation, as
well as special topics like batch normalization and early-stopping-based cross-

validation, can then be made using Chapters 11, 13, and Appendices A and B of

the text. A recommended roadmap for such a course – including recommended

chapters, sections, and corresponding topics – is shown in Figure 0.4. Additional

recommended resources on topics to complete a standard course on deep learn-

ing – like convolutional and recurrent networks – can be found by visiting the

text’s github repository.



CHAPTER   SECTIONS        TOPICS
1         1–5             Machine Learning Taxonomy
2         1–5             Global/Local Optimization; Curse of Dimensionality
3         1–5             Gradient Descent
5         1, 2            Least Squares Linear Regression
6         1–3, 5, 6, 8    Logistic Regression; Cross Entropy/Softmax Cost; SVMs
7         1–4, 6          One-versus-All; Multi-Class Logistic Regression
8         1–3, 5          Principal Component Analysis; K-means
9         2, 7            Feature Engineering; Feature Selection
10        1, 2, 4         Nonlinear Regression; Nonlinear Classification
11        1–4, 6, 7, 9    Universal Approximation; Cross-Validation; Regularization; Ensembling; Bagging
12        1–3             Kernel Methods; The Kernel Trick
13        1, 2, 4         Fully Connected Networks; Backpropagation
14        1–4             Regression Trees; Classification Trees

Figure 0.1 Recommended study roadmap for a course on the essentials of machine

learning, including requisite chapters (left column), sections (middle column), and

corresponding topics (right column). This essentials plan is suitable for

time-constrained courses (in quarter-based programs and universities) or self-study, or

where machine learning is not the sole focus but a key component of some broader

course of study. Note that chapters are grouped together visually based on text layout

detailed under ”Book Overview” in the Preface. See the section titled ”Instructors: How

To Use This Book” in the Preface for further details.



CHAPTER   SECTIONS        TOPICS
1         1–5             Machine Learning Taxonomy
2         1–5             Global/Local Optimization; Curse of Dimensionality
3         1–5             Gradient Descent
4         1–3             Newton’s Method
5         1–6             Least Squares Linear Regression; Least Absolute Deviations; Multi-Output Regression; Weighted Regression
6         1–10            Logistic Regression; Cross Entropy/Softmax Cost; The Perceptron; SVMs; Categorical Cross Entropy; Weighted Two-Class Classification
7         1–9             One-versus-All; Multi-Class Logistic Regression; Weighted Multi-Class Classification; Online Learning
8         1–7             PCA; K-means; Recommender Systems; Matrix Factorization
9         1–3, 6, 7       Feature Engineering; Feature Selection; Boosting; Regularization
10        1–7             Nonlinear Supervised Learning; Nonlinear Unsupervised Learning
11        1–12            Universal Approximation; Cross-Validation; Regularization; Ensembling; Bagging; K-Fold Cross-Validation
12        1–7             Kernel Methods; The Kernel Trick
13        1–8             Fully Connected Networks; Backpropagation; Activation Functions; Batch Normalization; Early Stopping
14        1–8             Regression/Classification Trees; Gradient Boosting; Random Forests

Figure 0.2 Recommended study roadmap for a full treatment of standard machine

learning subjects, including chapters, sections, as well as corresponding topics to cover.

This plan entails a more in-depth coverage of machine learning topics compared to the

essentials roadmap given in Figure 0.1, and is best suited for senior undergraduate/early

graduate students in semester-based programs and passionate independent readers. See

the section titled ”Instructors: How To Use This Book” in the Preface for further details.

CHAPTER   SECTIONS        TOPICS
1         1–5             Machine Learning Taxonomy
2         1–7             Global/Local Optimization; Curse of Dimensionality; Random Search; Coordinate Descent
3         1–7             Gradient Descent
4         1–5             Newton’s Method
7         8               Online Learning
9         3–5             Feature Scaling; PCA-Sphering; Missing Data Imputation
11        5, 6            Boosting; Regularization
13        6               Batch Normalization
A         1–8             Momentum Acceleration; Normalized Schemes: Adam, RMSProp; Fixed Lipschitz Steplength Rules; Backtracking Line Search; Stochastic/Mini-Batch Optimization; Hessian-Free Optimization
B         1–10            Forward/Backward Mode of Automatic Differentiation

Figure 0.3 Recommended study roadmap for a course on mathematical optimization

for machine learning and deep learning, including chapters, sections, as well as topics

to cover. See the section titled ”Instructors: How To Use This Book” in the Preface for

further details.

CHAPTER   SECTIONS        TOPICS
3         1–7             Gradient Descent
10        1–5             Nonlinear Regression; Nonlinear Classification; Nonlinear Autoencoder
11        1–4, 6          Universal Approximation; Cross-Validation; Regularization
13        1–8             Fully Connected Networks; Backpropagation; Activation Functions; Batch Normalization; Early Stopping
A         1–6             Momentum Acceleration; Normalized Schemes: Adam, RMSProp; Fixed Lipschitz Steplength Rules; Backtracking Line Search; Stochastic/Mini-Batch Optimization
B         1–10            Forward/Backward Mode of Automatic Differentiation

Figure 0.4 Recommended study roadmap for an introductory portion of a course on

deep learning, including chapters, sections, as well as topics to cover. See the section

titled ”Instructors: How To Use This Book” in the Preface for further details.
Acknowledgements

This text could not have been written in anything close to its current form

without the enormous work of countless genius-angels in the Python open-


source community, particularly authors and contributors of NumPy, Jupyter,
and matplotlib. We are especially grateful to the authors and contributors of

autograd including Dougal Maclaurin, David Duvenaud, Matt Johnson, and


Jamie Townsend, as autograd allowed us to experiment and iterate on a host of

new ideas included in the second edition of this text that greatly improved it as

well as, we hope, the learning experience for its readers.

We are also very grateful for the many students over the years that provided

insightful feedback on the content of this text, with special thanks to Bowen

Tian who provided copious amounts of insightful feedback on early drafts of

the work.

Finally, a big thanks to Mark McNess Rosengren and the entire Standing

Passengers crew for helping us stay caffeinated during the writing of this text.
1 Introduction to Machine Learning

1.1 Introduction
Machine learning is a unified algorithmic framework designed to identify com-

putational models that accurately describe empirical data and the phenomena

underlying it, with little or no human involvement. While still a young dis-

cipline with much more awaiting discovery than is currently known, today

machine learning can be used to teach computers to perform a wide array

of useful tasks including automatic detection of objects in images (a crucial

component of driver-assisted and self-driving cars), speech recognition (which

powers voice command technology), knowledge discovery in the medical sci-

ences (used to improve our understanding of complex diseases), and predictive

analytics (leveraged for sales and economic forecasting), to just name a few.

In this chapter we give a high-level introduction to the field of machine

learning as well as the contents of this textbook.

1.2 Distinguishing Cats from Dogs: a Machine Learning Approach
To get a big-picture sense of how machine learning works, we begin by dis-

cussing a toy problem: teaching a computer how to distinguish between pic-

tures of cats from those with dogs. This will allow us to informally describe the

terminology and procedures involved in solving the typical machine learning

problem.

Do you recall how you first learned about the difference between cats and
dogs, and how they are different animals? The answer is probably no, as most
humans learn to perform simple cognitive tasks like this very early on in the

course of their lives. One thing is certain, however: young children do not need

some kind of formal scientific training, or a zoological lecture on felis catus and

canis familiaris species, in order to be able to tell cats and dogs apart. Instead,

they learn by example. They are naturally presented with many images of

what they are told by a supervisor (a parent, a caregiver, etc.) are either cats

or dogs, until they fully grasp the two concepts. How do we know when a

child can successfully distinguish between cats and dogs? Intuitively, when

they encounter new (images of) cats and dogs, and can correctly identify each

new example or, in other words, when they can generalize what they have learned

to new, previously unseen, examples.

Like human beings, computers can be taught how to perform this sort of task

in a similar manner. This kind of task where we aim to teach a computer to

distinguish between different types or classes of things (here cats and dogs) is

referred to as a classification problem in the jargon of machine learning, and is

done through a series of steps which we detail below.

1. Data collection. Like human beings, a computer must be trained to recognize

the difference between these two types of animals by learning from a batch of
examples, typically referred to as a training set of data. Figure 1.1 shows such a

training set consisting of a few images of different cats and dogs. Intuitively, the
larger and more diverse the training set the better a computer (or human) can

perform a learning task, since exposure to a wider breadth of examples gives

the learner more experience.

Figure 1.1 A training set consisting of six images of cats (highlighted in blue) and six

images of dogs (highlighted in red). This set is used to train a machine learning model

that can distinguish between future images of cats and dogs. The images in this figure

were taken from [1].

2. Feature design. Think for a moment about how we (humans) tell the difference
between images containing cats from those containing dogs. We use color, size,

the shape of the ears or nose, and/or some combination of these features in order

to distinguish between the two. In other words, we do not just look at an image

as simply a collection of many small square pixels. We pick out grosser details,

or features, from images like these in order to identify what it is that we are

looking at. This is true for computers as well. In order to successfully train a

computer to perform this task (and any machine learning task more generally)

we need to provide it with properly designed features or, ideally, have it find or

learn such features itself.

Designing quality features is typically not a trivial task as it can be very ap-

plication dependent. For instance, a feature like color would be less helpful in

discriminating between cats and dogs (since many cats and dogs share similar

hair colors) than it would be in telling grizzly bears and polar bears apart! More-

over, extracting the features from a training dataset can also be challenging. For

example, if some of our training images were blurry or taken from a perspective

where we could not see the animal properly, the features we designed might

not be properly extracted.

However, for the sake of simplicity with our toy problem here, suppose we

can easily extract the following two features from each image in the training set:

size of nose relative to the size of the head, ranging from small to large, and shape

of ears, ranging from round to pointy.



Figure 1.2 Feature space representation of the training set shown in Figure 1.1 where

the horizontal and vertical axes represent the features nose size and ear shape,

respectively. The fact that the cats and dogs from our training set lie in distinct regions

of the feature space reflects a good choice of features.

Examining the training images shown in Figure 1.1, we can see that all cats

have small noses and pointy ears, while dogs generally have large noses and

round ears. Notice that with the current choice of features each image can now

be represented by just two numbers: a number expressing the relative nose size,

and another number capturing the pointiness or roundness of the ears. In other

words, we can represent each image in our training set in a two-dimensional



feature space where the features nose size and ear shape are the horizontal and

vertical coordinate axes, respectively, as illustrated in Figure 1.2.

3. Model training. With our feature representation of the training data the

machine learning problem of distinguishing between cats and dogs is now a

simple geometric one: have the machine find a line or a curve that separates

the cats from the dogs in our carefully designed feature space. Supposing for

simplicity that we use a line, we must find the right values for its two parameters

– a slope and vertical intercept – that define the line’s orientation in the feature

space. The process of determining proper parameters relies on a set of tools

known as mathematical optimization detailed in Chapters 2 through 4 of this text,

and the tuning of such a set of parameters to a training set is referred to as the

training of a model.

Figure 1.3 shows a trained linear model (in black) which divides the feature

space into cat and dog regions. This linear model provides a simple compu-

tational rule for distinguishing between cats and dogs: when the feature rep-

resentation of a future image lies above the line (in the blue region) it will be

considered a cat by the machine, and likewise any representation that falls below

the line (in the red region) will be considered a dog.



Figure 1.3 A trained linear model (shown in black) provides a computational rule for

distinguishing between cats and dogs. Any new image received in the future will be

classified as a cat if its feature representation lies above this line (in the blue region), and

a dog if the feature representation lies below this line (in the red region).