Convolutional Neural Networks in Visual Computing
DATA-ENABLED ENGINEERING
SERIES EDITOR
Nong Ye
Arizona State University, Phoenix, USA

PUBLISHED TITLES

Convolutional Neural Networks in Visual Computing: A Concise Guide


Ragav Venkatesan and Baoxin Li
Convolutional Neural Networks in Visual Computing
A Concise Guide

By
Ragav Venkatesan and Baoxin Li
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-4987-7039-2 (Hardback); 978-1-138-74795-1 (Paperback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize
to copyright holders if permission to publish in this form has not been obtained. If any copyright material
has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data


Names: Venkatesan, Ragav, author. | Li, Baoxin, author.
Title: Convolutional neural networks in visual computing : a concise guide /
Ragav Venkatesan, Baoxin Li.
Description: Boca Raton ; London : Taylor & Francis, CRC Press, 2017. |
Includes bibliographical references and index.
Identifiers: LCCN 2017029154| ISBN 9781498770392 (hardback : alk. paper) |
ISBN 9781315154282 (ebook)
Subjects: LCSH: Computer vision. | Neural networks (Computer science)
Classification: LCC TA1634 .V37 2017 | DDC 006.3/2--dc23
LC record available at https://lccn.loc.gov/2017029154

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To Jaikrishna Mohan, for growing up with me;
you are a fierce friend, and my brother.
and to Prof. Ravi Naganathan for helping me grow up;
my better angels have always been your philosophy and principles.
—Ragav Venkatesan
To my wife, Julie,
for all your unwavering support over the years.
—Baoxin Li
Contents

Preface xi
Acknowledgments xv
Authors xvii

Chapter 1 Introduction to Visual Computing 1
Image Representation Basics 3
Transform-Domain Representations 6
Image Histograms 7
Image Gradients and Edges 10
Going beyond Image Gradients 15
Line Detection Using the Hough Transform 15
Harris Corners 16
Scale-Invariant Feature Transform 17
Histogram of Oriented Gradients 17
Decision-Making in a Hand-Crafted Feature Space 19
Bayesian Decision-Making 21
Decision-Making with Linear Decision Boundaries 23
A Case Study with Deformable Part Models 25
Migration toward Neural Computer Vision 27
Summary 29
References 30

Chapter 2 Learning as a Regression Problem 33
Supervised Learning 33
Linear Models 36
Least Squares 39


Maximum-Likelihood Interpretation 41
Extension to Nonlinear Models 43
Regularization 45
Cross-Validation 48
Gradient Descent 49
Geometry of Regularization 55
Nonconvex Error Surfaces 57
Stochastic, Batch, and Online Gradient Descent 58
Alternative Update Rules Using Adaptive Learning Rates 59
Momentum 60
Summary 62
References 63

Chapter 3 Artificial Neural Networks 65
The Perceptron 66
Multilayer Neural Networks 74
The Back-Propagation Algorithm 79
Improving BP-Based Learning 82
Activation Functions 82
Weight Pruning 85
Batch Normalization 85
Summary 86
References 87

Chapter 4 Convolutional Neural Networks 89
Convolution and Pooling Layer 90
Convolutional Neural Networks 97
Summary 114
References 115

Chapter 5 Modern and Novel Usages of CNNs 117


Pretrained Networks 118
Generality and Transferability 121
Using Pretrained Networks for Model Compression 126
Mentee Networks and FitNets 130
Application Using Pretrained Networks: Image
Aesthetics Using CNNs 132
Generative Networks 134
Autoencoders 134
Generative Adversarial Networks 137
Summary 142
References 143

Appendix A Yann 147
Structure of Yann 148
Quick Start with Yann: Logistic Regression 149
Multilayer Neural Networks 152
Convolutional Neural Network 154
Autoencoder 155
Summary 157
References 157

Postscript 159
References 162
Index 163
Preface

Deep learning architectures have attained incredible popularity in
recent years due to their phenomenal success in, among other appli-
cations, computer vision tasks. Particularly, convolutional neural
networks (CNNs) have been a significant force contributing to state-
of-the-art results. The jargon surrounding deep learning and CNNs
can often lead to the opinion that it is too labyrinthine for a beginner
to study and master. Having this in mind, this book covers the funda-
mentals of deep learning for computer vision, designing and deploying
CNNs, and deep computer vision architecture. This concise book was
intended to serve as a beginner’s guide for engineers, undergraduate
seniors, and graduate students who seek a quick start on learning and/
or building deep learning systems of their own. Written in an easy-
to-read, mathematically nonabstruse tone, this book aims to provide
a gentle introduction to deep learning for computer vision, while still
covering the basics in ample depth.
The core of this book is divided into five chapters. Chapter 1 pro-
vides a succinct introduction to image representations and some com-
puter vision models that are contemporarily referred to as hand-carved.
The chapter provides the reader with a fundamental understanding of
image representations and an introduction to some linear and non-
linear feature extractors or representations and to properties of these
representations. Onwards, this chapter also demonstrates detection
of some basic image entities such as edges. It also covers some basic
machine learning tasks that can be performed using these representa-
tions. The chapter concludes with a study of two popular non-neural
computer vision modeling techniques.
Chapter 2 introduces the concepts of regression, learning machines,
and optimization. This chapter begins with an introduction to super-
vised learning. The first learning machine introduced is the linear
regressor. The first solution covered is the analytical solution for least
squares. This analytical solution is studied alongside its maximum-
likelihood interpretation. The chapter moves on to nonlinear models
through basis function expansion. The problem of overfitting and gen-
eralization through cross-validation and regularization is further intro-
duced. The latter part of the chapter introduces optimization through
gradient descent for both convex and nonconvex error surfaces. Further
expanding our study with various types of gradient descent methods
and the study of geometries of various regularizers, some modifications
to the basic gradient descent method, including second-order loss mini-
mization techniques and learning with momentum, are also presented.
Chapters 3 and 4 are the crux of this book. Chapter 3 builds on
Chapter 2 by providing an introduction to the Rosenblatt perceptron
and the perceptron learning algorithm. The chapter then introduces a
logistic neuron and its activation. The single neuron model is studied
in both a two-class and a multiclass setting. The advantages and draw-
backs of this neuron are studied, and the XOR problem is introduced.
The idea of a multilayer neural network is proposed as a solution to
the XOR problem, and the backpropagation algorithm is introduced
along with several improvements and pragmatic tips that
help in engineering a better, more stable implementation. Chapter 4
introduces the convpool layer and the CNN. It studies various proper-
ties of this layer and analyzes the features that are extracted for a typi-
cal digit recognition dataset. This chapter also introduces four of the
most popular contemporary CNNs, AlexNet, VGG, GoogLeNet, and
ResNet, and compares their architecture and philosophy.
Chapter 5 further expands and enriches the discussion of deep
architectures by studying some modern, novel, and pragmatic uses of
CNNs. The chapter is broadly divided into two contiguous sections.
The first part deals with the nifty philosophy of using download-
able, pretrained, and off-the-shelf networks. Pretrained networks are
essentially trained on a wholesome dataset and made available for the
public-at-large to fine-tune for a novel task. These are studied under
the scope of generality and transferability. Chapter 5 also studies the
compression of these networks and alternative methods of learning a
new task given a pretrained network in the form of mentee networks.
The second part of the chapter deals with the idea of CNNs that are
not used in supervised learning but as generative networks. The sec-
tion briefly studies autoencoders and the newest novelty in deep com-
puter vision: generative adversarial networks (GANs).
The book comes with a website (convolution.network), which is a
supplement containing code and implementations, color illustra-
tions of some figures, errata, and additional materials. This book also
led to a graduate-level course that was taught in the spring of 2017
at Arizona State University; lectures and materials for that course are also
available at the book website.
Figure 1 in Chapter 1 of the book is an original image (original.jpg)
that I shot and for which I hold the rights. It is a picture of Monument
Valley, which, as far as imagery goes, is representative of the South-
west, where ASU is. The art in memory.png was painted in the style of
Salvador Dalí, particularly of his painting “The Persistence of Memory,”
which deals in the abstract with the mind hallucinating and
picturing and processing objects in shapeless forms, much like
some of the representations of the neural networks we study in this book.
The art in memory.png was not painted by a human but by a neural
network similar to the ones we discuss in the book; hence the connec-
tion to the book. The citation reference is given below.

@article{DBLP:journals/corr/GatysEB15a,
  author    = {Leon A. Gatys and
               Alexander S. Ecker and
               Matthias Bethge},
  title     = {A Neural Algorithm of Artistic Style},
  journal   = {CoRR},
  volume    = {abs/1508.06576},
  year      = {2015},
  url       = {http://arxiv.org/abs/1508.06576},
  timestamp = {Wed, 07 Jun 2017 14:41:58 +0200},
  biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/GatysEB15a},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}

This book is also accompanied by a CNN toolbox based on Python
and Theano, which was developed by the authors, and a webpage con-
taining color figures, errata, and other accompaniments. The toolbox,
named yann for “Yet Another Neural Network” toolbox, is available
under MIT License at the URL http://www.yann.network. Having
in mind the intention of making the material in this book easily acces-
sible for a beginner to build upon, the authors have developed a set
of tutorials using yann. The tutorial and the toolbox cover the differ-
ent architectures and machines discussed in this book with examples
and sample code and application programming interface (API) docu-
mentation. The yann toolbox is under active development at the time
of writing this book, and its customer support is provided through
GitHub. The book’s webpage is hosted at http://guide2cnn.com.
While most figures in this book were created as grayscale illustra-
tions, there are some figures that were originally created in color and
converted to grayscale during production. The color versions of these
figures as well as additional notes, information on related courses, and
FAQs are also found on the website.
This toolbox and this book are also intended to be reading mate-
rial for a semester-long graduate-level course on Deep Learning for
Visual Computing offered by the authors at Arizona State University.
The course, including recorded lectures, course materials, and home-
work assignments, is available to the public at large at http://www
.course.convolution.network. The authors are available via e-mail for
both queries regarding the material and supporting code, and for
humbly accepting any criticisms or comments on the content of the
book. The authors also gladly encourage requests for reproduction of
figures, results, and materials described in this book, as long as they
conform to the copyright policies of the publisher. The authors hope
that readers enjoy this concise guide to convolutional neural networks
for computer vision and that a beginner will be able to quickly build
his/her own learning machines with the help of this book and its tool-
box. We encourage readers to use the knowledge they may gain from
this material for the good of humanity while sincerely discouraging
them from building “Skynet” or any other apocalyptic artificial intel-
ligence machines.
Acknowledgments

It is a pleasure to acknowledge many colleagues who have made this
time-consuming book project possible and enjoyable. Many current
and past members of the Visual Representation and Processing Group
and the Center for Cognitive and Ubiquitous Computing at Arizona
State University have worked on various aspects of deep learning and
its applications in visual computing. Their efforts have supplied ingre-
dients for insightful discussion related to the writing of this book,
and thus are greatly appreciated. Particularly, we would like to thank
Parag Sridhar Chandakkar for providing comments on Chapters 4
and 5, as well as Yuzhen Ding, Yikang Li, Vijetha Gattupalli, and
Hemanth Venkateswara for always being available for discussions.
This work stemmed from efforts in several projects sponsored by
the National Science Foundation, the Office of Naval Research, the
Army Research Office, and Nokia, whose support is greatly appreci-
ated, although any views/conclusions in this book are solely of the
authors and do not necessarily reflect those of the sponsors. We also
gratefully acknowledge the support of NVIDIA Corporation with the
donation of the Tesla K40 GPU, which has been used in our research.
We are grateful to CRC Press, Taylor and Francis Publications, and
in particular, to Cindy Carelli, Executive Editor, and Renee Nakash,
for their patience and incredible support throughout the writing of
this book. We would also like to thank Dr. Alex Krizhevsky for
gracefully giving us permission to use figures from the AlexNet paper.


We would further like to acknowledge the developers of Theano and
other Python libraries that are used by the yann toolbox and are used
in the production of some of the figures in this book. In particular,
we would like to thank Frédéric Bastien and Pascal Lamblin from
the Theano users group and the Montreal Institute for Learning
Algorithms of the Université de Montréal for the incredible customer
support. We would also like to thank GitHub and Read the Docs for
free online hosting of data, code, documentation, and tutorials.
Last, but foremost, we thank our friends and families for their
unwavering support during this fun project and for their understand-
ing and tolerance of many weekends and long nights spent on this
book by the authors. We dedicate this book to them, with love.
Ragav Venkatesan and Baoxin Li
Authors

Ragav Venkatesan is currently completing his PhD study in computer
science in the School of Computing, Informatics and Decision Systems
Engineering at Arizona State University (ASU), Tempe, Arizona.
He has been a research associate with the Visual Representation and
Processing Group at ASU and has worked as a teaching assistant for
several graduate-level courses in machine learning, pattern recogni-
tion, video processing, and computer vision. Prior to this, he was a
research assistant with the Image Processing and Applications Lab in
the School of Electrical & Computer Engineering at ASU, where he
obtained an MS degree in 2012. From 2013 to 2014, Venkatesan was
with the Intel Corporation as a computer vision research intern work-
ing on technologies for autonomous vehicles. Venkatesan regularly
serves as a reviewer for several peer-reviewed journals and conferences
in machine learning and computer vision.

Baoxin Li received his PhD in electrical engineering from the
University of Maryland, College Park, in 2000. He is currently a pro-
fessor and chair of the Computer Science and Engineering program
and a graduate faculty in the Electrical Engineering and Computer
Engineering programs at Arizona State University, Tempe, Arizona.
From 2000 to 2004, he was a senior researcher with SHARP
Laboratories of America, Camas, Washington, where he was a

technical lead in developing SHARP’s trademarked HiMPACT
Sports technologies. From 2003 to 2004, he was also an adjunct pro-
fessor with Portland State University, Oregon. He holds 18 issued US
patents and his current research interests include computer vision and
pattern recognition, multimedia, social computing, machine learning,
and assistive technologies. He won SHARP Laboratories’ President’s
Award in 2001 and 2004. He also received the SHARP Laboratories’
Inventor of the Year Award in 2002. He is a recipient of the National
Science Foundation’s CAREER Award.
1
Introduction to Visual Computing

The goal of human scientific exploration is to advance human
capabilities. We invented fire to cook food, thereby outgrowing our
dependence on the basic food processing capability of our own stom-
ach. This led to increased caloric consumption and perhaps sped up
the growth of civilization—something that no other known species
has accomplished. We invented the wheel and vehicles so that our
speed of travel does not have to be limited to the ambulatory speed of
our legs. Indeed, we built airplanes, if for no other reason than to real-
ize our dream of being able to take to the skies. The story of human
invention and technological growth is a narrative of the human spe-
cies endlessly outgrowing its own capabilities and therefore endlessly
expanding its horizons and marching further into the future.
Much of these advances are credited to the wiring in the human
brain. The human neural system and its capabilities are far-reaching
and complicated. Humans enjoy a very intricate neural system capable
of thought, emotion, reasoning, imagination, and philosophy. As sci-
entists working on computer vision, perhaps we are a little tenden-
tious when it comes to the significance of human vision, but for us,
the most fascinating part of human capabilities, intelligence included,
is the cognitive-visual system. Although the human visual system and its
associated cognitive decision-making processes are among the fastest
we know of, humans may not have the most powerful visual system
among all the species, if, for example, acuity or night vision capa-
bilities are concerned (Thorpe et al., 1996; Watamaniuk and Duchon,
1992). Also, humans peer through a very narrow range of the electro-
magnetic spectrum. There are many other species that have a wider
visual sensory range than we do. Humans have also become prone to
many corneal visual deficiencies such as near-sightedness. Given all
this, it is only natural that we as humans want to work on improving
our visual capabilities, like we did with other deficiencies in human
capabilities.
We have been developing tools for many centuries trying to see
further and beyond the eye that nature has bestowed upon us.
Telescopes, binoculars, microscopes, and magnifiers were invented to
see much farther and much smaller objects. Radio, infrared, and x-ray
devices make us see in parts of the electromagnetic spectrum, beyond
the visible band that we can naturally perceive. Recently, interfer-
ometers were perfected and built, extending human vision to include
gravitational waves, making way for yet another way to look at the world
through gravitational astronomy. While all these devices extend the
human visual capability, scholars and philosophers have long since
realized that we do not see just with our eyes. Eyes are but mere imag-
ing instruments; it is the brain that truly sees.
While many scholars from Plato, Aristotle, Charaka, and Euclid to
Leonardo da Vinci studied how the eye sees the world, it was Hermann
von Helmholtz in 1867 in his Treatise on Physiological Optics who
first postulated in scientific terms that the eye only captures images
and it is the brain that truly sees and recognizes the objects in the
image (Von Helmholtz, 1867). In his book, he presented novel theo-
ries on depth and color perception, motion perception, and also built
upon da Vinci’s earlier work. While it had been studied in some form
or the other since ancient times in many civilizations, Helmholtz first
described the idea of unconscious inference where he postulated that
not all ideas, thoughts, and decisions that the brain makes are done so
consciously. Helmholtz noted how susceptible humans are to optical
illusions, famously citing the misperception of the sun revolv-
ing around the earth when in reality it is the horizon that is moving,
and that humans are drawn to the emotions of a staged actor even though
they are only staged. Using such analogies, Helmholtz proposed that
the brain understands the images that the eye sees and it is the brain
that makes inferences and understanding on what objects are being
seen, without the person consciously noticing them. This was prob-
ably the first insight into neurological vision. Some early-modern sci-
entists such as Campbell and Blakemore started arguing what is now
an established fact: that there are neurons in the brain responsible for
estimating object sizes and sensitivity to orientation (Blakemore and
Campbell, 1969). Later studies during the same era discovered more
complex intricacies of the human visual system and how we perceive
and detect color, shapes, orientation, depth, and even objects (Field
et al., 1993; McCollough, 1965; Campbell and Kulikowski, 1966;
Burton, 1973).
The above brief historical accounts serve only to illustrate that the
field of computer vision has its own place in the rich collection of
stories of human technological development. This book focuses on a
concise presentation of modern computer vision techniques, which
might be stamped as neural computer vision since many of them stem
from artificial neural networks. To ensure the book is self-contained,
we start with a few foundational chapters that introduce a reader to
the general field of visual computing by defining basic concepts, for-
mulations, and methodologies, starting with a brief presentation of
image representation in the subsequent section.

Image Representation Basics

Any computer vision pipeline begins with an imaging system that
captures light rays reflected from the scene and converts the optical
light signals into an image in a format that a computer can read and
process. During the early years of computational imaging, an image
was obtained by digitizing a film or a printed picture; contemporarily,
images are typically acquired directly by digital cameras that capture
and store an image of a scene in terms of a set of ordered numbers
called pixels. There are many textbooks covering image acquisition
and a camera’s inner workings (like its optics, mechanical controls and
color filtering, etc.) (Jain, 1989; Gonzalez and Woods, 2002), and thus
we will present only a brief account here. We use the simple illustration
of Figure 1.1 to highlight the key process of sampling (i.e., discretiza-
tion via the image grid) and quantization (i.e., representing each pixel’s
color values with only a finite set of integers) of the light ray coming
from a scene into the camera to form an image of the world.
Practically any image can be viewed as a matrix (or three matrices if
one prefers to explicitly consider the color planes separately) of quan-
tized numbers of a certain bit length encoding the intensity and color
information of the optical projection of a scene onto the imaging plane
of the camera.

Figure 1.1 Image sampling and quantization.

Consider Figure 1.1. The picture shown was captured by
a camera as follows: The camera has a sensor array that determines the
size and resolution of the image. Let us suppose that the sensor array
had n × m sensors, implying that the image it produced was n × m in its
size. Each sensor grabbed a sample of light that was incident on that
area of the sensor after it passed through a lens. The sensor assigned
that sample a value between 0 and (2^b − 1) for a b-bit image. Assuming
that the image was 8 bit, the sample will be between 0 and 255, as
shown in Figure 1.1. This process is called sampling and quantiza-
tion, sampling because we only picked certain points in the continuous
field of view and quantization, because we limited the values of light
intensities within a finite number of choices. Sampling, quantization,
and image formation in camera design and camera models are them-
selves a much broader topic and we recommend that interested readers
follow up on the relevant literature for a deeper discussion (Gonzalez
and Woods, 2002). Cameras for color images typically produce three
images corresponding to the red (R), green (G), and blue (B) spectra,
respectively. How these R, G, and B images are produced depends on
the camera, although most consumer-grade cameras employ a color
filter in front of a single sensor plane to capture a mosaicked image of
all three color channels and then rely on a “de-mosaicking” process to
create full-resolution, separate R, G, and B images.
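
As a small illustrative sketch of the sampling-and-quantization idea (this is our own example, not code from the book or the yann toolbox; it assumes the NumPy library, and the function name quantize, the variable scene, and the 4 × 4 size are invented for illustration), the following Python snippet maps continuous light intensities in [0, 1] to an 8-bit pixel matrix:

import numpy as np

def quantize(scene, bits=8):
    # Map continuous light intensities in [0, 1] to b-bit integers in 0 ... (2^b - 1).
    levels = 2 ** bits - 1
    return np.round(scene * levels).astype(np.uint8)

rng = np.random.default_rng(0)
scene = rng.random((4, 4))         # a toy n x m "sensor array" reading, values in [0, 1]

image = quantize(scene, bits=8)
print(image.shape)                 # (4, 4): an n x m matrix, the pixel representation
print(image.min(), image.max())    # all values lie between 0 and 255 for an 8-bit image

A color image would simply carry three such matrices, one each for the R, G, and B channels.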
With this apparatus, we are able to represent an image in the com-
puter as stored digital data. This representation of the image is called
the pixel representation of the image. Each image is a matrix or tensor
of one (grayscale) or three (colored) or more (depth and other fields)
channels. The ordering of the pixels is the same as that of the order-
ing of the samples that were collected, which is in turn the order
of the sensor locations from which they were collected. The higher
the value of the pixel, the greater the intensity of color present. This
is the most explicit representation of an image that is possible. The
larger the image, the more pixels we have. The closer the sensors are,
the higher resolution the produced image will have when capturing
details of a scene. If we consider two images of different sizes that
sample the same area and field of view of the real world, the larger
image has a higher resolution than the smaller one as the larger image
can resolve more detail. For a grayscale image, we often use a two-
dimensional discrete array I(n_1, n_2) to represent the underlying matrix
of pixel values, with n_1 and n_2 indexing the pixel at the n_1th row and
the n_2th column of the matrix, and the value of I(n_1, n_2) corresponding
to the pixel’s intensity, respectively.
While each pixel is sampled independently of the others, the
pixel intensities are in general not independent of each other. This is
because a typical scene does not change drastically everywhere and
thus adjacent samples will in general be quite similar, except for pixels
lying on the border between two visually different entities in the
world. Therefore, edges in images, which are defined by discontinuities
(or large changes) in pixel values, are a good indicator of entities in the
image. In general, images capturing a natural scene would be smooth
(i.e., with no changes or only small changes) everywhere except for
pixels corresponding to the edges.
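
To see this point numerically (again a toy sketch of our own, with invented values, rather than anything prescribed by the book), first differences between neighboring pixels of a synthetic grayscale image are zero inside the smooth regions and large only at the boundary:

import numpy as np

# A toy 5 x 6 grayscale image: a dark region (value 20) beside a bright region (value 200).
image = np.array([[20, 20, 20, 200, 200, 200]] * 5, dtype=np.int32)

# Horizontal first differences between neighboring pixels: I(n_1, n_2 + 1) - I(n_1, n_2).
horizontal_diff = np.diff(image, axis=1)

print(horizontal_diff[0])                            # [  0   0 180   0   0]
print(np.where(np.abs(horizontal_diff[0]) > 50)[0])  # [2]: the column where the edge sits

The large jump of 180 is exactly the kind of discontinuity that edge detectors look for.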
The basic way of representing images as matrices of pixels as
discussed above is often called spatial domain representation since
the pixels are viewed as measurements, sampling the light intensi-
ties in space or, more precisely, on the imaging plane. There
are other ways of looking at or even acquiring the images using
the so-called frequency-domain approaches, which decompose an
image into its frequency components, much like a prism breaking
down incident sunlight into different color bands. There are also
approaches, like wavelet transform, that analyze/decompose an
image using time–frequency transformations, where time actually
refers to space in the case of images (Meyer, 1995). All of these may
be called transform-domain representations for images. In general,
a transform-domain representation of an image is invertible, mean-
ing that it is possible to go back to the original image from its
transform-domain representation. Practically, which representation
to use is really an issue of convenience for a particular processing
task. In addition to representations in the spatial and transform
domains, many computer vision tasks actually first compute various
types of features from an image (either the original image or some
transform-domain representation), and then perform some analy-
sis/inference tasks based on the computed features. In a sense, such
computed features serve as a new representation of the underly-
ing image, and hence we will call them feature representations. In
the following section, we briefly introduce several commonly used
transform-domain representations and feature representations for
images.

Transform-Domain Representations

Perhaps the most-studied transform-domain representation for
images (or in general for any sequential data) is through Fourier anal-
ysis (see Stein and Shakarchi, 2003). Fourier representations use lin-
ear combinations of sinusoids to represent signals. For a given image
I(n_1, n_2), we may decompose it using the following expression (which is
the inverse Fourier transform):
I(n_1, n_2) = \frac{1}{nm} \sum_{u=0}^{n-1} \sum_{v=0}^{m-1} I_F(u, v) \, e^{j 2\pi \left( \frac{u n_1}{n} + \frac{v n_2}{m} \right)}     (1.1)

where I_F(u, v) are the Fourier coefficients, which can be found by the
following expression (which is the Fourier transform):
I_F(u, v) = \sum_{n_1=0}^{n-1} \sum_{n_2=0}^{m-1} I(n_1, n_2) \, e^{-j 2\pi \left( \frac{u n_1}{n} + \frac{v n_2}{m} \right)}     (1.2)

In this representation, the pixel representation of the image I(n_1, n_2)
is broken down into frequency components. Each frequency compo-
nent has an associated coefficient that describes how much that fre-
quency component is present. Each frequency component becomes the
basis with which we may now represent the image. One popular use of
this approach is the variant discrete cosine transform (DCT) for Joint
Photographic Experts Group (JPEG) image compression. The JPEG
codec uses only the cosine components of the sinusoid in Equation 1.2
and is therefore called the discrete cosine basis. The DCT basis func-
tions are picturized in Figure 1.2.
Random documents with unrelated
content Scribd suggests to you:
to the French on) as in the following phrases; “one is apt to think;” “one sees;” “one
supposes.” Who, which, that, are called Relatives, because they more directly refer to
some Substantive going before; which therefore is called the Antecedent. They also
connect the following part of the Sentence with the foregoing. These belong to all the
three Persons; whereas the rest belong only to the Third. One of them only is varied to
express the three Cases; Who, whose[13], (that is, who’s[14]) whom: none of them have
different endings for the Numbers. Who, which, what, are called Interrogatives, when they
are used in asking questions. The two latter of them have no variation of Number or
Case.
Own, and self, in the Plural selves, are joined to the Possessives my, our, thy, your,
his, her, their; as, my own hand; myself, yourselves; both of them expressing emphasis,
or opposition; as, “I did it my own self,” that is, and no one else: the latter also forming the
Reciprocal Pronoun; as, “he hurt himself.” Himself, themselves, seem to be used in the
Nominative Case by corruption instead of his self, their selves: as, “he came himself;”
“they did it themselves;” where himself, themselves, cannot be in the Objective Case. If
this be so, self must be in these instances, not a Pronoun, but a Noun. Thus Dryden uses
it:

“What I show,
Thy self may freely on thy self bestow.”

Ourself, the Plural Pronominal Adjective with the Singular Substantive, is peculiar to the
Regal Style.
Own is an Adjective; or perhaps the Participle (owen) of the obsolete verb owe; to
possess; to be the right owner of a thing.
All Nouns whatever in Grammatical Construction are of the Third Person: except when
an address is made to a Person; then the Noun, answering to the Vocative Case in Latin,
is of the Second Person.

ADJECTIVE.

An Adjective is a word joined to a Substantive to express its Quality[15].


In English the Adjective is not varied on account of Gender, Number, or Case. The only
variation it admits of is that of the Degrees of Comparison.
Qualities admit of more and less, or of different degrees: and the words that express
Qualities have accordingly proper forms to express different degrees. When a Quality is
simply expressed, without any relation to the same in a different degree, it is called the
Positive; as, wise, great. When it is expressed with augmentation, or with reference to a
less degree of the same, it is called the Comparative; as, wiser, greater. When it is
expressed as being in the highest degree of all, it is called the Superlative; as, wisest,
greatest.
So that the simple word, or Positive, becomes Comparative by adding r or er; and
Superlative by adding st, or est, to the end of it. And the Adverbs more and most placed
before the Adjective have the same effect; as, wise, more wise, most wise[16].
Monosyllables, for the most part, are compared by er and est; and Dissyllables by more
and most: as, mild, milder, mildest; frugal, more frugal, most frugal. Dissyllables ending in
y easily admit of er and est; as happy, lovely. Words of more than two syllables hardly
ever admit of er and est.
In some few words the Superlative is formed by adding the Adverb most to the end of
them: as, nethermost, uttermost, or utmost, undermost, uppermost, foremost.
In English, as in most languages, there are some words of very common use that are
irregular in this respect: as, good, better, best; bad, worse, worst; little, less[17], least;
much, or many, more, most; and a few others.

VERB.
A Verb is a word which signifies to be, to do, or to suffer.
There are three kinds of Verbs; Active, Passive, and Neuter Verbs.
A Verb Active expresses an Action, and necessarily implies an agent, and an object
acted upon: as, to love; “I love Thomas.”
A Verb Passive expresses a Passion, or a Suffering, or the receiving of an Action; and
necessarily implies an Object acted upon, and an Agent by which it is acted upon: as, to
be loved; “Thomas is loved by me.”
So when the Agent takes the lead in the Sentence, the Verb is Active, and the Object
follows: when the Object takes the lead, the Verb is Passive, and the Agent follows.
A Verb Neuter expresses Being, or a state or condition of being; when the Agent and
the Object acted upon coincide, and the event is properly neither Action nor Passion, but
rather something between both: as, I am; I walk; I sleep.
The Verb Active is called also Transitive, because the Action passeth over to the
Object, or hath an effect upon some other thing: and the Verb Neuter is called Intransitive,
because the effect is confined within the Agent, and doth not pass over to any object.
In English many Verbs are used both in an Active and a Neuter signification, the
construction only determining of which kind they are.
In a Verb are to be considered the Person, the Number, the Time, and the Mode.
The Verb varies its endings to express, or agree with, the different Persons: as, “I love,
Thou lovest, He loveth, or loves.”
So also to express the different Numbers of the same Person: as, “Thou lovest, ye
love; He loveth, they love[18].”
So likewise to express different Times: as, “I love, I loved; I bear, I bore, I have born.”
The Mode is the Manner of representing the Action or Passion. When it is simply
declared, or a question is asked concerning it, it is called the Indicative Mode; when it is
bidden, it is called the Imperative; when it is subjoined as the end or design, or mentioned
under a condition, a supposition, or the like, for the most part depending on some other
Verb, and having a Conjunction before it, it is called the Subjunctive; when it is barely
expressed without any limitation of person or number, it is called the Infinitive; and when it
is expressed in a form in which it may be joined to a Noun as its quality or accident,
partaking thereby of the nature of an Adjective, it is called the Participle.
But to express the Time of the Verb the English uses also the assistance of other
Verbs, called therefore Auxiliaries, or Helpers; do, be, have, shall, will: as, “I do love, I did
love; I am loved, I was loved; I have loved, I have been loved; I shall, or will, love, or be
loved.”
The two principal auxiliaries, to have, and to be, are thus varied according to Person,
Number, Time, and Mode.
Time is Present, Past, or Future.

To HAVE.

Indicative Mode.
Present Time.
Person. Sing. Plur.
1. I have, We }
2. Thou hast[19], Ye } have.
3. He hath, or has; They }
Past Time.
1. I had, We }
2. Thou hadst, Ye } had.
3. He had; They }
Future Time.
1. I shall, or will, } We }
2. Thou shalt, or wilt, } have; Ye } shall, or will, have.
3. He shall, or will, } They }
Imperative Mode.
1. Let us have,
2. Have thou, or, Have ye, or,
Do thou have, Do ye have,
3. Let him have; Let them have.
Subjunctive Mode.
Present Time.
1. I } We }
2. Thou } have; Ye } have.
3. He } They }
Infinitive Mode.
Present, To have: Past, To have had.
Participle.
Present, Having: Perfect[20], Had: Past, Having had.

To BE.
Indicative Mode.
Present Time.
1. I am, We }
2. Thou art, Ye } are.
3. He is; They }
Or,
1. I be, We }
2. Thou beest, Ye } be.
3. He is; They }
Past Time.
1. I was, We }
2. Thou wast, Ye } were.
3. He was; They }
Future Time.
1. I shall, or will, } We }
2. Thou shalt, or wilt, } be; Ye } shall, or will, be.
3. He shall, or will, } They }
Imperative Mode.
1. Let us be,
2. Be thou, or, Be ye, or,
Do thou be, Do ye be,
3. Let him be; Let them be.
Subjunctive Mode.
Present Time.
1. I } We }
2. Thou } be; Ye } be.
3. He } They }
Past Time.
1. I were, We }
2. Thou wert[21], Ye } were.
3. He were; They }
Infinitive Mode.
Present, To be: Past, To have been.
Participle.
Present, Being: Perfect, Been: Past, Having been.

The Verb Active is thus varied according to Person, Number, Time and Mode.

Indicative Mode.
Present Time.
Person. Sing. Plur.
1. I love, We } love.
2. Thou lovest, Ye }
3. He loveth, or loves; They }
Past Time.
1. I loved, We }
2. Thou lovedst, Ye } loved.
3. He loved; They }
Future Time.
1. I shall, or will, } We }
2. Thou shalt, or wilt, } love; Ye } shall, or will, love.
3. He shall, or will, } They }
Imperative Mode.
1. Let us love,
2. Love thou, or, Love ye, or,
Do thou love, Do ye love,
3. Let him love; Let them love.
Subjunctive Mode.
Present Time.
1. I } We }
2. Thou } love; Ye } love.
3. He } They }
And,
1. I may } We }
may love; and
2. Thou mayst } love; Ye }
have loved[22].
3. He may } They }
Past Time.
1. I might } We }
might love; and
2. Thou mightest } love; Ye }
have loved[22].
3. He might } They }
And,

I could, should, would; Thou couldst, &c. love; and have loved.

Infinitive Mode.

Present, To love: Past, To have loved.

Participle.

Present, Loving: Perfect, Loved: Past, Having loved.

But in discourse we have often occasion to speak of Time not only as Present, Past,
and Future, at large and indeterminately, but also as such with some particular distinction
and limitation; that is, as passing, or finished; as imperfect, or perfect. This will best be
seen in an example of a Verb laid out and distributed according to these distinctions of
Time.

Indefinite, or Undetermined, Time:


Present, Past, Future,
I love; I loved; I shall love.

Definite, or Determined, Time:

Present Imperfect: I am (now) loving.


Present Perfect: I have (now) loved.
Past Imperfect: I was (then) loving.
Past Perfect: I had (then) loved.
Future Imperf. I shall (then) be loving.
Future Perf. I shall (then) have loved.

To express the Present and Past Imperfect of the Active and Neuter Verb the Auxiliary
do is sometimes used: I do (now) love; I did (then) love.
Thus with very little variation of the Principal Verb the several circumstances of Mode
and Time are clearly expressed by the help of the Auxiliaries, be, have, do, let, may, can,
shall, will.
The peculiar force of the several Auxiliaries is to be observed. Do and did mark the
Action itself, or the Time of it[23], with greater force and distinction. They are also of
frequent and almost necessary use in Interrogative and Negative Sentences. Let does
not only express permission; but praying, exhorting, commanding. May and might
express the possibility or liberty of doing a thing; can and could, the power. Must is
sometimes called in for a helper, and denotes necessity. Would expresses the intention of
the doer; should simply the event. Will in the first Person singular and plural promises or
threatens; in the second and third Persons only foretells: shall on the contrary, in the first
Person simply foretells; in the second and third Persons commands or threatens[24].
Do and have make the Present Time; did, had, the Past; shall, will, the Future: let the
Imperative Mode; may, might, could, would, should, the Subjunctive. The Preposition to
placed before the Verb makes the Infinitive Mode. Have, through its several Modes and
Times, is placed only before the Perfect Participle; and be, in like manner, before the
Present and Passive Participles: the rest only before the Verb itself in its Primary
Form[25].
The Passive Verb is only the Participle Passive, (which for the most part is the same
with the Indefinite Past Time Active, and always the same with the Perfect Participle)
joined to the Auxiliary Verb to be through all its Variations: as, I am loved; I was loved; I
have been loved; I shall be loved: and so on through all the Persons, Numbers, Times,
and Modes.
The Neuter Verb is varied like the Active; but, having somewhat of the Nature of the
Passive, admits in many instances of the Passive form, retaining still the Neuter
signification; chiefly in such Verbs as signify some sort of motion, or change of place or
condition: as, I am come; I was gone; I am grown; I was fallen[26]. The Verb am in this
case precisely defines the Time of the action or event, but does not change the nature of
it; the Passive form still expressing, not properly a Passion, but only a state or condition
of Being.
IRREGULAR VERBS.

In English both the Past Time Active and the Participle Perfect, or Passive, are formed
by adding to the Verb ed; or d only when the Verb ends in e: as, turn, turned; love, loved.
The Verbs that vary from this rule, in either or in both cases, are esteemed Irregular.
The nature of our language, the Accent and Pronunciation of it, inclines us to contract
even all our Regular Verbs: thus loved, turned, are commonly pronounced in one syllable,
lov’d, turn’d; and the second Person which was originally in three syllables, lovedest,
turnedest, is become a dissyllable, lovedst, turnedst: for as we generally throw the accent
as far back as possible towards the first part of the word, (in some even to the fourth
syllable from the end,) the stress being laid on the first syllables, the rest are pronounced
in a lower tone, more rapidly and indistinctly; and so are often either wholly dropt, or
blended into one another.
It sometimes happens also, that the word which arises from a regular change does not
sound easily or agreeably; sometimes by the rapidity of our pronunciation the vowels are
shortened or lost; and the consonants which are thrown together do not easily coalesce
with one another, and are therefore changed into others of the same organ, or of a
kindred species: this occasions a further deviation from the regular form: thus, loveth,
turneth, are contracted into lov’th, turn’th, and these for easier pronunciation immediately
become loves, turns.
Verbs ending in ch, ck, p, x, ll, ss, in the Past Time Active and the Participle Perfect or
Passive admit the change of ed into t; as, snatcht, checkt, snapt, mixt, dropping also one
of the double letters, dwelt, past; for snatched, checked, snapped, mixed, dwelled,
passed: those that end in l, m, n, p, after a diphthong, moreover shorten the diphthong, or
change it into a single short vowel; as, dealt, dreamt, meant, felt, slept, &c: all for the
same reason; from the quickness of the pronunciation, and because the d after a short
vowel will not easily coalesce with the preceding consonant. Those that end in ve change
also v into f; as, bereave, bereft; leave, left; because likewise v after a short vowel will not
easily coalesce with t.
All these, of which we have hitherto given examples, are considered not as Irregular,
but as Contracted only; and in all of them the Intire as well as the Contracted form is
used.
The formation of Verbs in English, both Regular and Irregular, is derived from the
Saxon.
The Irregular Verbs in English are all Monosyllables, unless Compounded; and they are
for the most part the same words which are Irregular Verbs in the Saxon.
As all our Regular Verbs are subject to some kind of Contraction, so the first Class of
Irregulars is of those that become so from the same cause.
I.
Irregulars by Contraction.
Some Verbs ending in d or t have the Present, the Past Time, and the Participle Perfect
and Passive, all alike, without any variation: as, Beat, burst[27], cast, cost, cut, hit, hurt,
knit, let, lift[28], put, read[29], rent, rid, set, shed, shred, shut, slit, spread, thrust, wet[28].
These are Contractions from beated, bursted, casted, &c; because of the disagreeable
sound of the syllable ed after d or t[30].
Others in the Past Time, and Participle Perfect and Passive, vary a little from the
Present by shortening the diphthong, or changing the d into t: as, Lead, led; sweat, swet;
meet, met; bleed, bled; breed, bred; feed, fed; speed, sped; bend, bent[28]; lend, lent;
rend, rent; send, sent; spend, spent; build, built[28]; geld, gelt[28]; gild, gilt[28]; gird, girt[28].
Others not ending in d or t are formed by Contraction; have, had, for haved; make,
made, for maked; flee, fled, for flee-ed.
The following beside the Contraction change also the Vowel; Sell, sold; tell, told; clothe,
clad[28].
Stand, stood; and dare, durst, (which in the Participle hath regularly dared;) are directly
from the Saxon, standan, stod; dyrran, dorste.
II.
Irregulars in ght.
The Irregulars of the Second Class end in ght, both in the Past Time and Participle; and
change the vowel or diphthong into au or ou: they are taken from the Saxon, in which the
termination is hte.

Saxon.
Bring, brought: Bringan, brohte.
Buy, bought: Bycgean, bohte.
Catch, caught:
Fight, fought: Feotan, fuht.
Teach, taught: Tæchan, tæhte.
Think, thought: Thencan, thohte.
Seek, sought: Secan, sohte.
Work, wrought: Weorcan, worhte.

Fraught seems rather to be an Adjective than the Participle of the Verb to freight, which
has regularly freighted. Raught from reach is obsolete.
III.
Irregulars in en.
The Irregulars of the Third Class form the Past Time by changing the vowel or
diphthong of the Present; and the Participle Perfect and Passive by adding the
termination en, beside, for the most part, the change of the vowel or diphthong. These
also derive their formation in both parts from the Saxon.

Present. Past. Participle.

a changed into e.
Fall, fell, fallen.
a into o.
Awake, awoke, [awaked.]
a into oo.
Forsake, forsook, forsaken.
Shake, shook, shaken.
Take, took, taken.
aw into ew.
Draw, drew, drawn[31].
ay into ew.
Slay, slew, slayn[31].
e into a or o, o.
Get, gat, or got, gotten.
Help, [helped,] holpen.
Melt, [melted,] molten[28].
Swell, [swelled,] swollen[28].
ea into a or o.
Eat, ate, eaten.
Bear, bare, or bore, born.
Break, brake, or broke, broken.
Cleave, clave, or clove[28], cloven[28].
Speak, spake, or spoke, spoken.
Swear, sware, or swore, sworn.
Tear, tare, or tore, torn.
Wear, ware, or wore, worn.
Heave, hove[28], hoven.
Shear, shore, shorn.
Steal, stole, stolen, or stoln.
Tread, trode, trodden.
Weave, wove, woven.
ee into o, o.
Creep, crope, [creeped, or crept.]
Freeze, froze, frozen.
Seethe, sod, sodden.
ee into aw.
See, saw, seen.
i long into i short, i short.
Bite, bit, bitten.
Chide, chid, chidden.
Hide, hid, hidden.
Slide, slid, slidden.
i long into o, i short.
Abide, abode.
Drive, drove, driven.
Ride, rode, ridden.
Rise, rose, risen.
Shine, shone, [shined.]
Shrive, shrove, shriven.
Smite, smote, smitten.
Stride, strode, stridden.
Strive, strove[28], striven[28].
Thrive, throve, thriven.
Write[32], wrote, written.

i long into u, i short.


Strike, struck, stricken, or strucken.
i short into a.
Bid, bade, bidden.
Give, gave, given.
Sit[33], sat, sitten.
Spit, spat, spitten.
i short into u.
Dig, dug[28], [digged.]

ie into ay.
Lie[34], lay, lien, or lain.

o into e.
Hold, held, holden.
o into i.
Do, did, done, i. e. doen.
oo into o, o.
Choose, chose, chosen[35].
ow into ew.
Blow, blew, blown.
Crow, crew, [crowed.]
Grow, grew, grown.
Know, knew, known.
Throw, threw, thrown.
y into ew, ow.
Fly[36], flew, flown.

The following are Irregular only in the Participle; and that without changing the vowel.

Bake, [baked,] baken[28].


Grave, [graved,] graven[28].
Hew, [hewed,] hewen, or hewn[28].
Lade, [laded,] laden.
Load, [loaded,] loaden[28].
Mow, [mowed,] mown[28].
Rive, [rived,] riven.
Saw, [sawed,] sawn[28].
Shave, [shaved,] shaven[28].
Shew, [shewed,] shewn[28].
Sow, [sowed,] sown[28].
Straw, -ew, or -ow, [strawed,
&c.] strown[28].
Wax, [waxed,] waxen[28].

Some Verbs which change i short into a or u, and i long into ou, have dropt the
termination en in the Participle.

i short into a or u, u.
Begin, began, begun.
Cling, clang, or clung, clung.
Drink, drank, drunk, or drunken.
Fling, flung, flung.
Ring, rang, or rung, rung.
Shrink, shrank, or shrunk, shrunk.
Sing, sang, or sung, sung.
Sink, sank, or sunk, sunk.
Sling, slang, or slung, slung.
Slink, slunk, slunk.
Spin, span, or spun, spun.
Spring, sprang, or sprung, sprung.
Sting, stung, stung.
Stink, stank, or stunk, stunk.
String, strung, strung.
Swim, swam, or swum, swum.
Swing, swung, swung.
Wring, wrung, wrung.

In many of the foregoing the original and analogical form of the Past Time in a, which
distinguished it from the Participle, is grown quite obsolete.

i long into ou, ou.


Bind, bound, bound, or bounden.
Find, found, found.
Grind, ground, ground.
Wind, wound, wound.

That all these had originally the termination en in the Participle, is plain from the
following considerations. Drink and bind still retain it; drunken, bounden; from the Saxon,
druncen, bunden: and the rest are manifestly of the same analogy with these. Begonnen,
sonken, and founden, are used by Chaucer; and some others of them appear in their
proper shape in the Saxon; scruncen, spunnen, sprungen, stungen, wunden. As likewise
in the German, which is only another off-spring of the Saxon: begunnen, geklungen,
getruncken, gesungen, gesuncken, gespunnen, gesprungen, gestuncken,
geschwummen, geschwungen.
The following seem to have lost the en of the Participle in the same manner:

Hang, hung, hung.
Shoot, shot, shot.
Stick, stuck, stuck.
Come, came, come.
Run, ran, run.
Win, won, won.

Hangen, and scoten, are the Saxon originals of the two first Participles; the latter of
which is likewise still in use in its first form in one phrase; a shotten herring. Stuck seems
to be a contraction from stucken, as struck now in use for strucken. Chaucer hath comen
and wonnen: becommen is even used by Lord Bacon[37]. And most of them still subsist
intire in the German; gehangen, kommen, gerunnen, gewonnen.
To this third Class belong the Defective Verbs, Be, been; and Go, gone; i. e. goen.
From this Distribution and account of the Irregular Verbs, if it be just, it appears, that
originally there was no exception whatever from the Rule, That the Participle Præterit, or
Passive, in English ends in d, t, or n. The first form included all the Regular Verbs, and
those which are become Irregular by Contraction ending in t. To the second properly
belonged only those which end in ght, from the Saxon Irregulars in hte. To the third, those
from the Saxon Irregulars in en, which have still, or had originally, the same termination.
The same Rule affords a proper foundation for a division of the English Verbs into
Three Conjugations, of which the three different Terminations of the Participle might
respectively be the Characteristics. The Contracted Verbs, whose Participles now end in
t, might perhaps be best reduced to the first Conjugation, to which they naturally and
originally belonged; and they seem to be of a very different analogy from those in ght. But
as the Verbs of the first Conjugation would so greatly exceed in number those of both the
others, which together make but about 110[38]; and as those of the third Conjugation are
so various in their form, and so incapable of being reduced to one plain Rule; it seems
better in practice to consider the first in ed as the only Regular form, and the others as
deviations from it; after the example of the Saxon and German Grammarians.
To the Irregular Verbs are to be added the Defective; which are not only for the most
part Irregular, but are also wanting in some of their parts. They are in general words of
most frequent and vulgar use; in which Custom is apt to get the better of Analogy. Such
are the Auxiliary Verbs, most of which are of this number. They are in use only in some of
their Times, and Modes; and some of them are a Composition of Times of several
Defective Verbs having the same signification.

Present. Past. Participle.

Am, or Be, was, been.
Can, could.
Go, went, gone.
May, might.
Must.
Ought, ought.
Quoth, quoth.
Shall, should.
Weet, wit, or wot; wot.
Will, would.
Wist, wist.

There are not in English so many as a Hundred Verbs, (being only the chief part, but
not all, of the Irregulars of the Third Class,) which have a distinct and different form for the
Past Time Active and the Participle Perfect or Passive. The General bent and turn of the
language is towards the other form, which makes the Past Time and the Participle the
same. This general inclination and tendency of the language, seems to have given
occasion to the introducing of a very great Corruption; by which the Form of the Past
Time is confounded with that of the Participle in these Verbs, few in proportion, which
have them quite different from one another. This confusion prevails greatly in common
discourse, and is too much authorised by the example of some of our best Writers[39].
Thus it is said, He begun, for he began; he run, for he ran; he drunk, for he drank: the
Participle being used instead of the Past Time. And much more frequently the Past Time
instead of the Participle: as, I had wrote, it was wrote, for I had written, it was written; I
have drank, for I have drunk; bore, for born; chose, for chosen; bid, for bidden; got, for
gotten; &c. This abuse has been long growing upon us, and is continually making further
incroachments: as it may be observed in the example of those Irregular Verbs of the Third
Class, which change i short into a and u; as, Cling, clang, clung; in which the original and
analogical form of the Past Time in a is almost grown obsolete; and, the u prevailing
instead of it, the Past Time is now in most of them confounded with the Participle. The
Vulgar Translation of the Bible, which is the best standard of our language, is free from
this corruption, except in a few instances; as, hid is used for hidden; held, for holden,
frequently: bid, for bidden; begot, for begotten, once or twice: in which, and a few other
like words, it may perhaps be allowed as a Contraction. And in some of these Custom
has established it beyond recovery. In the rest it seems wholly inexcusable. The absurdity
of it will be plainly perceived in the example of some of these Verbs, which Custom has
not yet so perverted. We should be immediately shocked at I have knew, I have saw, I
have gave, &c: but our ears are grown familiar with I have wrote, I have drank, I have
bore, &c. which are altogether as barbarous.

ADVERB.
Adverbs are added to Verbs and Adjectives to denote some modification or
circumstance of an action or quality: as, the manner, order, time, place, distance, motion,
relation, quantity, quality, comparison, doubt, affirmation, negation, demonstration,
interrogation.
In English they admit of no Variation; except some few of them, which have the
degrees of Comparison: as,[40] “often, oftener, oftenest;” “soon, sooner, soonest.”
An Adverb is sometimes joined to another Adverb to modify or qualify its meaning; as,
“very much; much too little; not very prudently.”

PREPOSITION.
Prepositions, so called because they are commonly put before the words to which
they are applied, serve to connect words with one another, and to shew the relation
between them.
One great use of Prepositions in English, is to express those relations which in some
languages are chiefly marked by Cases, or the different endings of the Noun.
Most Prepositions originally denote the relation of Place, and have been thence
transferred to denote by similitude other relations. Thus, out, in, through, under, by, to,
from, of, &c. Of is much the same with from; “ask of me,” that is, from me: “made of
wood;” “Son of Philip;” that is, sprung from him. For, in its primary sense, is pro, loco
alterius, in the stead, or place, of another. The notion of Place is very obvious in all the
rest.

CONJUNCTION.
The Conjunction connects or joins together Sentences; so as out of two to make one
Sentence.
Thus, “You, and I, and Peter, rode to London,” is one Sentence made up of these three
by the Conjunction and twice employed; “You rode to London; I rode to London; Peter
rode to London.” Again, “You and I rode to London, but Peter staid at home,” is one
Sentence made up of three by the Conjunctions and and but: both of which equally
connect the Sentences, but the latter expresses an Opposition in the Sense. The first is
therefore called a Conjunction Copulative; the other a Conjunction Disjunctive.
The use of Copulative Conjunctions is to connect, or to continue, the Sentence, by
expressing an addition, and; a supposition, or condition, if, as; a cause, because[41], then;
a motive, that; an inference, therefore; &c.
The use of Disjunctives is to connect and to continue the Sentence; but to express
Opposition of meaning in different degrees: as, or, but, than, altho’, unless, &c.

INTERJECTION.
Interjections, so called because they are thrown in between the parts of a sentence
without making any other alteration in it, are a kind of Natural Sounds to express the
affection of the Speaker.
The different Passions have for the most part different Interjections to express them.
The Interjection O placed before a Substantive expresses more strongly an address
made to that person or thing; as it marks in Latin what is called the Vocative Case.
SENTENCES.
A Sentence is an assemblage of words, expressed in proper
form, and ranged in proper order, and concurring to make a
complete sense.
Concord, or agreement of words, is when one word is required to
be in like case, number, gender, or person, with another.
Regimen, or government, is when a word causeth a following word
to be in some case, or mode.
Sentences are Simple, or Compounded.
A Simple Sentence hath in it but one Subject, and one Finite Verb;
that is, a Verb in the Indicative, Imperative, or Subjunctive Mode.
A Phrase is two or more words rightly put together in order to
make a part of a Sentence; and sometimes making a whole
Sentence.

The most common Phrases used in simple Sentences are as follows:
1st Phrase: The Substantive before a Verb Active, Passive, or
Neuter; when it is said what thing is, does, or is done: as, “I am;”
“Thou writest;” “Thomas is loved:” where I, Thou, Thomas, are the
Nominative[42] Cases; and answer to the question who, or what? as,
“Who is loved? Thomas.” And the Verb agrees with the Nominative
Case in number and person[43]; as, Thou being the Second Person
Singular, the Verb writest is so too.
2d Phrase: The Substantive after a Verb Neuter or Passive; when
it is said, that such a thing is, or is made, or thought, or called, such
another thing; or, when the Substantive after the Verb is spoken of
the same thing or person with the Substantive before the Verb: as, “a
calf becomes an ox;” “Plautus is accounted a Poet;” “I am He.” Here
the latter Substantive is in the Nominative Case as well as the
former; and the Verb is said to govern the Nominative Case: or, the
latter Substantive may be said to agree in Case with the former.
3d Phrase: The Adjective after a Verb Neuter or Passive, in like
manner: as, “Life is short, and Art is long.” “Exercise is esteemed
wholesome.”
4th Phrase: The Substantive after a Verb Active, or Transitive: as
when one thing is said to act upon, or do something to another: as,
“to open a door;” “to build a house;” “Alexander conquered the
Persians.” Here the thing acted upon is in the Objective[44] Case; as
it appears plainly when it is expressed by the Pronoun, which has a
proper termination for that Case; “Alexander conquered them;” and
the Verb is said to govern the Objective Case.
5th Phrase: A Verb following another Verb; as, “boys love to play:”
where the latter Verb is in the Infinitive Mode.
6th Phrase: When one thing is said to belong to another; as,
“Milton’s poems:” where the thing to which the other belongs is
placed first, and is in the Possessive Case; or else last with the
Preposition of before it; as, “the poems of Milton.”
7th Phrase: When another Substantive is added to express and
explain the former more fully; as, “Paul the Apostle;” “King George:”
where they are both in the same case; and the latter is said to be put
in Apposition to the former.
8th Phrase: When the quality of the Substantive is expressed by
adding an Adjective to it: as, “a wise man;” “a black horse.”
Participles have the nature of Adjectives; as, “a learned man;” “a
loving father.”
9th Phrase: An Adjective with a Verb in the Infinitive Mode
following it: as, “worthy to die;” “fit to be trusted.”
10th Phrase: When a circumstance is added to a Verb, or to an
Adjective, by an Adverb: as, “you read well;” “he is very prudent.”
11th Phrase: When a circumstance is added to a Verb or an
Adjective by a Substantive with a Preposition before it: as, “I write for
you;” “he reads with care;” “studious of praise;” “ready for mischief.”
12th Phrase: When the same Quality in different Subjects is
compared; the Adjective in the Positive having after it the
Conjunction as, in the Comparative the Conjunction than, and in the
Superlative the Preposition of: as, “white as snow;” “wiser than I;”
“greatest of all.”

The Principal parts of a Simple Sentence are the Agent, the
Attribute, and the Object. The Agent is the thing chiefly spoken of;
the Attribute is the thing or action affirmed or denied of it; and the
Object is the thing affected by such action.
In English the Nominative Case denoting the Agent, usually goes
before the Verb, or Attribution, and the Objective Case, denoting the
Object, follows the Verb; and it is the order that determines the cases
in Nouns: as, “Alexander conquered the Persians.” But the Pronoun,
having a proper form for each of those cases, sometimes when it is
in the Objective Case is placed before the Verb, and when it is in the
Nominative Case follows the Object and Verb: as, “Whom ye
ignorantly worship, him declare I unto you.” And the Nominative
Case is sometimes placed after a Verb Neuter: as, “Upon thy right
hand did stand the Queen:” “On a sudden appeared the King.” And
frequently with the Adverbs there and then: as, “There was a man:”
“Then came unto him the Pharisees.” The reason of it is plain: the
Neuter Verb not admitting of an Objective Case after it, no ambiguity
of case can arise from such a position of the Noun.
Who, which, what, and the Relative that, though in the Objective
Case, are always placed before the Verb; as are also their
Compounds, whoever, whosoever, &c: as, “He whom you seek.”
“This is what, or the thing which, or that, you want.” “Whomsoever
you please to appoint.”
When the Verb is a Passive, the Agent and Object change places
in the Sentence; and the thing acted upon is in the Nominative Case,
and the Agent is accompanied with a Preposition: as, “The Persians
were conquered by Alexander.”
A Noun of Multitude[45], or signifying many; and two Nouns in the
Singular Number, joined together by a Conjunction Copulative; have
Verbs, Nouns, and Pronouns, agreeing with them in the Plural
Number: as, “When the King’s trump, the mob are for the King.”
Dryden. “Socrates and Plato were wise; they were the most eminent
Philosophers of Greece.”
If the Singulars so joined together are of several Persons, in
making the Plural Pronoun agree with them in Person, the second
Person takes place of the third, and the first of both: “He and You
and I won it at the hazard of our lives: You and He shared it between
you.”
The Verb to Be has always a Nominative Case after it; as, “it was
I, and not He, that did it:” unless it be in the Infinitive Mode; “though
you took it to be Him[46].”
The Adverbs when, while, after, &c. being left out, the Phrase is
formed with the Participle independently of the rest of the Sentence:
as, “The doors being shut, Jesus stood in the midst.” This is called
the Case Absolute. And the Case is in English always the
Nominative: as,

“God from the mount of Sinai, whose gray top
Shall tremble, He descending[47], will himself,
In thunder, lightning, and loud trumpet’s sound,
Ordain them laws.”

Milton, P. L. xii. 227.

To before a Verb is the sign of the Infinitive Mode: but there are
some few Verbs, which have other Verbs following them in the
Infinitive Mode without the sign to: as, bid, dare, need, make, see,
hear; and, let, have, not used as Auxiliaries: as, “I bade him do it;
you dare not do it; I saw him[48] do it; I heard him say it.”
The Infinitive Mode has much of the nature of a Substantive,
expressing the Action itself which the Verb signifies; as the Participle
has the nature of an Adjective. Thus the Infinitive Mode does the
office of a Substantive in different cases; in the Nominative; as, “to
play is pleasant:” in the Objective; as, “boys love to play.” In Greek it
admits of the Article through all its cases, with the Preposition in the
Oblique cases: in English the Article is not wanted, but the
Preposition may be used: “For to will is present with me; but to
perform that which is good I find not[49].” “All their works they do for
to be seen of men[50].”

“For not to have been dip’d in Lethe’s lake
Could save the Son of Thetis from to die.”

Spenser.

Perhaps therefore the Infinitive and the Participle might be more
properly called the Substantive Mode and the Adjective Mode[51].
The Participle with a Preposition before it, and still retaining its
Government, answers to what is called in Latin the Gerund: as,
“Happiness is to be attained, by avoiding evil, and by doing good; by
seeking peace, and by pursuing it.”
The Participle, with an Article before it, and the Preposition of after
it, becomes a Substantive, expressing the action itself which the
Verb signifies[52]: as, “These are the Rules of Grammar, by the
observing of which you may avoid mistakes.” Or it may be expressed
by the Participle, or Gerund; “by observing which:” not, “by observing
of which;” nor, “by the observing which:” for either of those two
Phrases would be a confounding of two distinct forms.
I will add another example, and that of the best authority: “The
middle station of life seems to be the most advantageously situated
for the gaining of wisdom. Poverty turns our thoughts too much upon
