Object Detection with Deep
Learning Models
Principles and Applications

Edited by
S. Poonkuntran
Rajesh Kumar Dhanraj
Balamurugan Balusamy
First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press


4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2023 selection and editorial matter, [S Poonkuntran, Rajesh Kumar Dhanraj, Balamurugan Balusamy]; individual chapters, the contributors

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has not
been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact
the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works
that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for
identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data


Names: Poonkuntran, S., editor. | Dhanraj, Rajesh Kumar, editor. |
Balusamy, Balamurugan, editor.
Title: Object detection with deep learning models : principles and
applications / edited by S Poonkuntran, Rajesh Kumar Dhanraj,
Balamurugan Balusamy.
Description: First edition. | Boca Raton : Chapman & Hall/CRC Press, 2023.
| Includes bibliographical references and index.
Identifiers: LCCN 2022015567 (print) | LCCN 2022015568 (ebook) | ISBN
9781032074009 (hardback) | ISBN 9781032349244 (paperback) | ISBN
9781003206736 (ebook)
Subjects: LCSH: Computer vision. | Pattern recognition systems. | Deep
learning (Machine learning)
Classification: LCC TA1634 .O255 2023 (print) | LCC TA1634 (ebook) | DDC
006.3/7--dc23/eng/20220725
LC record available at https://lccn.loc.gov/2022015567
LC ebook record available at https://lccn.loc.gov/2022015568

ISBN: 978-1-032-07400-9 (hbk)


ISBN: 978-1-032-34924-4 (pbk)
ISBN: 978-1-003-20673-6 (ebk)

DOI: 10.1201/9781003206736

Typeset in Palatino
by SPi Technologies India Pvt Ltd (Straive)
Contents

Editors............................................................................................................................................. vii
List of Contributors.........................................................................................................................ix

1. Introduction: Deep Learning and Computer Vision...................................................... 1


A.S. Renugadevi

2. Object Detection Frameworks and Services in Computer Vision............................. 23


Sachi Choudhary, Rashmi Sharma, and Gargeya Sharma

3. Real-Time Tracing and Alerting System for Vehicles and Children to Ensure
Safety and Security, Using LabVIEW.............................................................................. 49
R. Deepalakshmi and R. Vijayalakshmi

4. Mobile Application-based Assistive System for Visually Impaired People:


A Hassle-Free Shopping Support System...................................................................... 65
E. Ramanujam and M. Manikandakumar

5. Traffic Density and On-road Moving Object Detection Management, Using


Video Processing................................................................................................................. 81
Ankit Shrivastava and S. Poonkuntran

6. Automated Vehicle Number Plate Recognition System, Using Convolution


Long Short-Term Memory Technique........................................................................... 101
S. Srinivasan, D. Prabha, N. Mohammed Raffic, K. Ganesh Babu,
S. Thirumurugaveerakumar, and K. Sangeetha

7. Deep Learning-based Indian Vehicle Number Plate Detection and


Recognition......................................................................................................................... 117
M. Arun Anoop, S. Poonkuntran, and P. Karthikeyan

8. Smart Diabetes System Using CNN in Health Data Analytics................................ 137


P. Ravikumaran, K. Vimala Devi, and K. Valarmathi

9. Independent Automobile Intelligent Motion Controller and Redirection,


Using a Deep Learning System......................................................................................165
S. Aanjanadevi, V. Palanisamy, S. Aanjankumar, S. Poonkuntran, and P. Karthikeyan

10. Deep Learning Solutions for Pest Detection...............................................................179


C. Nandhini and M. Brindha

11. Deep Learning Solutions for Pest Identification in Agriculture.............................. 199


Monika Vyas, Amit Kumar, and Vivek Sharma


12. A Complete Framework for LULC Classification of Madurai Remote


Sensing Images with Deep Learning-based Fusion Technique............................... 215
T. Gladima Nisia and S. Rajesh

13. Human Behavioral Identifiers: A Detailed Discussion............................................. 237


T. Suba Nachiar, T. Shanmuga Priya, P.R. Hemalatha, and J.V. Anchitaalagammai

Index.............................................................................................................................................. 253
Editors

Poonkuntran Shanmugam earned a BE degree in Information Technology from Bharathidasan University, Tiruchirapalli, India, and MTech and PhD degrees in Computer and Information Technology from Manonmaniam Sundaranar University, Tirunelveli, India. He is presently with VIT Bhopal University, Madhya Pradesh, India, as Professor and Dean of the School of Computing Science and Engineering. He has more than a decade of experience in teaching and research and has successfully executed three funded research grant projects from the Indian Space Research Organization, the Defense Research Development Organization, and the Ministry of New and Renewable Energy, Government of India, to the tune of 1.10 Crores. He received two seminar grants, from Anna University, Chennai, and from the All India Council for Technical Education-Indian Society for Technical Education, to the tune of 4 Lacs. He has published more than 80 technical publications and authored 6 books and 2 chapters. He is the recipient of the Cognizant Best Faculty Award 2017–18 and served as a State Level Student Coordinator for Region VII, CSI, India in 2016–17. He is a lifetime member of IACSIT, Singapore, CSI, India, and ISTE, India. His research interests include information security, computer vision, artificial intelligence, and machine learning.

Dr Rajesh Kumar Dhanraj is a Professor in the School of Computing Science and Engineering at Galgotias University, Greater Noida, India. He earned a BE degree in Computer Science and Engineering from Anna University, Chennai, India in 2007, an MTech from Anna University, Coimbatore, India in 2010, and a PhD in Computer Science from Anna University, Chennai, India in 2017. He has contributed to 30+ authored and edited books on various technologies, 21 patents, and 53 articles and papers in refereed journals and international conferences, and has contributed chapters to books. His research interests include Machine Learning, Cyber-Physical Systems and Wireless Sensor Networks. He is a senior member of the Institute of Electrical and Electronics Engineers (IEEE), a member of the Computer Science Teacher Association (CSTA) and the International Association of Engineers (IAENG), and an associate editor and guest editor for reputed journals. He is an Expert Advisory Panel Member of Texas Instruments Inc., USA.


Balamurugan Balusamy is currently an Associate Dean (Students) at Shiv Nadar University, Delhi-NCR. Prior to this assignment he was Professor, School of Computing Sciences & Engineering, and Director of International Relations at Galgotias University, Greater Noida, India. His contributions focus on Engineering Education, Blockchain and Data Sciences. His academic degrees and twelve years of experience working as a faculty member in a global university such as VIT University, Vellore, have made him receptive and prominent in his domain. He has published 200-plus high-impact-factor papers with Springer, Elsevier and IEEE, has produced more than 80 edited and authored books, and has collaborated with eminent professors from top QS-ranked universities across the world.
Prof. Balamurugan Balusamy served up to the position of associate professor in his 12-year stint with VIT University, Vellore. He completed his Bachelors, Masters and PhD degrees at top premier institutions in India. His passion is teaching, and he adapts different design-thinking principles while delivering his lectures. He has published 80+ books on various technologies and visited over 15 countries for his technical courses. He has several top-notch conferences in his resume and has published over 200 quality journal articles, conference papers and book chapters combined. He serves on the advisory committees of several start-ups and forums and does consultancy work for industry on Industrial IoT. He has given over 195 talks at various events and symposiums.
List of Contributors

S. Aanjanadevi, Alagappa University, Tamil Nadu, India
S. Aanjankumar, School of Computing Science and Engineering, VIT Bhopal University, India
J.V. Anchitaalagammai, Velammal College of Engineering & Technology, Madurai, India
M. Arun Anoop, Royal College of Engineering and Technology, Akkikkavu, Thrissur, Kerala
M. Brindha, NIT Tiruchirappalli, Tamil Nadu, India
Sachi Choudhary, University of Petroleum & Energy Studies, Dehradun, India
R. Deepalakshmi, Velammal College of Engineering and Technology, Viraganoor, Tamil Nadu, India
K. Ganesh Babu, Chendhuran College of Engineering & Technology, Pudukottai, India
T. Gladima Nisia, AAA College of Engineering and Technology, Sivakasi, India
P.R. Hemalatha, Velammal College of Engineering & Technology, Madurai, India
P. Karthikeyan, Velammal College of Engineering and Technology, Madurai, Tamil Nadu, India
Amit Kumar, IIIT Kota, Rajasthan, India
M. Manikandakumar, Thiagarajar College of Engineering, Tamil Nadu, India
N. Mohammed Raffic, Nehru Institute of Technology, Coimbatore, India
C. Nandhini, NIT Tiruchirappalli, Tamil Nadu, India
V. Palanisamy, Alagappa University, Tamil Nadu, India
S. Poonkuntran, School of Computing Science and Engineering, VIT Bhopal University, Madhya Pradesh, India
D. Prabha, Sri Krishna College of Engineering and Technology, Coimbatore, India
S. Rajesh, Mepco Schlenk Engineering College, Sivakasi, India
E. Ramanujam, National Institute of Technology Silchar, Assam, India
P. Ravikumaran, Fatima Michael College of Engineering & Technology, Madurai, Tamil Nadu, India
A.S. Renugadevi, Kongu Engineering College, Tamil Nadu, India
K. Sangeetha, Panimalar Engineering College, Chennai, India
T. Shanmuga Priya, Vuram Technologies, India
Gargeya Sharma, University of Petroleum & Energy Studies, Dehradun, India
Rashmi Sharma, University of Petroleum & Energy Studies, Dehradun, India
Vivek Sharma, MNIT Jaipur, Rajasthan, India
Ankit Shrivastava, School of Computing Science and Engineering, VIT Bhopal University, India
S. Srinivasan, Nehru Institute of Technology, Coimbatore, India
T. Suba Nachiar, Velammal College of Engineering & Technology, Madurai, India
S. Thirumurugaveerakumar, Panimalar Engineering College, Chennai, India
K. Valarmathi, P.S.R Engineering College, Sivakasi, India
R. Vijayalakshmi, Velammal College of Engineering and Technology, Tamil Nadu, India
K. Vimala Devi, Vellore Institute of Technology, Vellore, India
Monika Vyas, IIIT Kota, Rajasthan, India
1
Introduction: Deep Learning and Computer Vision

A.S. Renugadevi
Kongu Engineering College, Tamil Nadu, India

CONTENTS
1.1 Introduction to Deep Learning .................................................................................... 2
1.1.1 Deep Learning ............................................................................................................. 2
1.1.2 Machine Learning and Deep Learning .................................................................... 3
1.1.3 Types of Networks in Deep Learning ...................................................................... 3
1.1.3.1 Connection Type of Networks ................................................................................ 4
1.1.3.2 Topology-based Neural Networks ......................................................................... 6
1.1.3.3 Learning Methods .................................................................................................... 8
1.2 Convolutional Neural Networks .................................................................................. 9
1.2.1 Description of Five Layers of General CNN Architecture .................................... 9
1.2.1.1 Input Layer ............................................................................................................. 10
1.2.1.2 Convolutional Layer .............................................................................................. 10
1.2.1.3 Pooling Layer ......................................................................................................... 11
1.2.1.4 Fully Connected Layers ......................................................................................... 12
1.2.1.5 Output Layer .......................................................................................................... 13
1.2.2 Types of Architecture in CNN ................................................................................ 13
1.2.2.1 LeNet-5 ................................................................................................................... 13
1.2.2.2 AlexNet ................................................................................................................... 14
1.2.2.3 ZFNet ...................................................................................................................... 14
1.2.2.4 GoogLeNet/Inception ........................................................................................... 14
1.2.2.5 VGGNet .................................................................................................................. 15
1.2.2.6 ResNet ..................................................................................................................... 15
1.2.3 Applications of Deep Learning ............................................................................... 16
1.3 Image Classification, Object Detection and Face Recognition ................................ 17
1.3.1 Dataset Creation ........................................................................................................ 17
1.3.2 Data Preprocessing ................................................................................................... 18
1.3.3 Image Classification ................................................................................................. 18
1.3.4 Object Detection ........................................................................................................ 19
1.3.5 Face Recognition ....................................................................................................... 20
References ............................................................................................................................ 21

DOI: 10.1201/9781003206736-1

1.1 Introduction to Deep Learning


1.1.1 Deep Learning
Artificial intelligence enables computers to mimic human behavior, and both machine learning and deep learning fall under artificial intelligence. In machine learning, the algorithm for feature extraction must be designed by humans, whereas in deep learning, feature extraction is performed automatically by the network of neurons. The emergence of deep learning is illustrated in Figure 1.1.
Deep learning uses neural networks to classify, cluster and predict results; neural networks are the central part of deep learning. A neural network can be viewed as a set of algorithms that identifies patterns in data and predicts results. Deep learning is not conceptually new; rather, the exponential increase in processing capacity has made both machine learning and deep learning practical.
The neurons in the human brain carry data, and billions of neurons are connected with one another. Artificial neurons are designed in software based on the logic of the neurons in the human brain, and an artificial neural network is built from these neurons. Some neurons act as input collectors, some act as output displayers, and some are used for processing the input.

FIGURE 1.1
Emergence of deep learning.

FIGURE 1.2
Illustration of an artificial neuron.

In an artificial neural network, the neuron plays the major role. The structure of an artificial neuron consists of inputs x1 through xn with corresponding weights w1 through wn. Each input is multiplied by its weight, and the weighted values are passed to a summation function. The summed value is then passed to an activation function, and the output y is generated. The structure of the neuron is given in Figure 1.2.
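As a concrete illustration of this structure, the minimal NumPy sketch below computes the weighted sum of the inputs and passes it through a sigmoid activation. The input values, weights and bias are made-up example numbers, not values from the text.

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes the summed value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Weighted sum of the inputs plus a bias term, followed by the activation.
    z = np.dot(w, x) + b
    return sigmoid(z)

# Example inputs and weights (arbitrary values for illustration).
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
b = 0.2

y = neuron(x, w, b)
print(y)  # a single output value between 0 and 1
```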

1.1.2 Machine Learning and Deep Learning


Machine learning can be described as an approach for achieving artificial intelligence: the use of algorithms to parse data, learn from it, and make predictions about things in the real world. Deep learning can be taken as a technique for implementing machine learning. The differences between machine learning and deep learning are summarized in Table 1.1.

1.1.3 Types of Networks in Deep Learning


The neural networks that come under the category of machine learning help to predict the
patterns by learning from the data and utilizing the neurons effectively. The types of neural
networks are classified according to various features:

TABLE 1.1
Differences between Machine Learning and Deep Learning

• Data: Machine learning needs only a small amount of data to provide accuracy, whereas deep learning needs a large amount of data for training.
• System requirements: Machine learning requires low system specifications; deep learning requires high system specifications.
• Problem solving: In machine learning, the given problem is divided into multiple tasks, each task is solved independently, and the results are then combined; in deep learning, the problem is solved fully, end to end, by a single network.
• Time: Machine learning models need little time for training but more time to test data with the model; deep learning models need a long time for training but less time for testing.

• Type of connection: static feedforward networks; dynamic feedback networks
• Topology of the network: single-layer neural networks; multilayer neural networks; recurrent neural networks
• Learning method: supervised learning; unsupervised learning; reinforcement learning

1.1.3.1 Connection Type of Networks


1.1.3.1.1 Static Feedforward Networks
Feedforward neural networks are also called deep feedforward networks or multilayer networks, and they are the most fundamental deep learning models. A feedforward network aims to approximate some function s*. For a classifier of data or images, t = s*(u) maps an input u to an output t. The network defines a mapping t = f(u; θ) and learns the values of the parameters θ that give the best approximation of s*. In feedforward networks there are no feedback connections; only the inputs are considered when evaluating the output. The data from the input u flows through the computations defined by the function f, and the output t is obtained. Figure 1.3 shows a static feedforward network.

FIGURE 1.3
Feedforward networks.

Applications:

• Classification
• Speech recognition
• Face recognition
• Computer vision

1.1.3.1.2 Dynamic Feedback Neural Networks


In dynamic neural networks, data can flow in two directions, forward and backward. Dynamic networks are the most powerful neural networks, but if any error occurs in the flow of data, it is tedious to correct. The states of the network keep changing until an equilibrium point is reached, and the network remains at that point until a change in the input moves the equilibrium. This type of network is also called a recurrent or interactive network. The feedback mechanism used in these networks is helpful for building content-addressable memories. A dynamic neural network is shown in Figure 1.4.

FIGURE 1.4
Feedback neural networks.

Applications of feedback neural networks:

• Word Processing
• Speech recognition
• Tagging an image
• Process of detecting sentiments
• Translation

1.1.3.2 Topology-based Neural Networks


1.1.3.2.1 Single-layer Neural Networks
The perceptron is one of the first samples of a single-layer neural network. The perceptron
would return a function supported inputs, again, supported single neurons in the physi-
ology of the human brain. The logic gates satisfying the individual functionality can be
called a model of a perceptron in some cases. Based on the weighted inputs, the perceptron
may send data or not. The type in which the single-layer network works out is the single-
layer binary linear classifier, which helps to separate the input data as one of the two types.
Feedforward networks include single-layer neural networks because data flows only in
one direction. That is, in single-layer networks, data comes from the input layer to the out-
put layer; it does not consider the feedback from the output layer. Also, a single-layer net-
work is different from the network that uses the backpropagation and the gradient descent
along with the functions. The structure of the single-layer neural network is depicted in
Figure 1.5.
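A single-layer perceptron acting as a binary linear classifier can be sketched in a few lines of NumPy. This is only an illustrative example on a made-up, linearly separable dataset; it uses the classic perceptron update rule rather than backpropagation or gradient descent.

```python
import numpy as np

# Toy linearly separable data: 2 features per sample, labels 0 or 1 (made up).
X = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 3.5],
              [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, 0, 0, 0])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

def predict(x):
    # Step activation: fire (1) if the weighted sum exceeds 0, otherwise 0.
    return 1 if np.dot(w, x) + b > 0 else 0

# Perceptron learning rule: adjust weights only when a sample is misclassified.
for epoch in range(10):
    for xi, yi in zip(X, y):
        error = yi - predict(xi)
        w += lr * error * xi
        b += lr * error

print([predict(xi) for xi in X])  # should match the labels for separable data
```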

1.1.3.2.2 Multilayer Neural Networks


The multilayer neural network is one in which information enters the network and is passed through several layers of neurons. Every node in layer 1 is connected to the neurons in the next layer, and layer 2 is in turn connected to the neurons in the following layer, so a fully connected network is formed. One or more hidden layers lie between the input and output layers. Unlike single-layer neural networks, data flows in both directions: forward, as in feedforward networks, and backward, as in feedback networks.
The inputs given to the network are multiplied by weights and passed to an activation function. The loss is reduced by adjusting the weights during backpropagation; these weights are the values learned by the machine. The network adjusts itself based on the difference between the outputs it predicts and the targets in the training data. The activation functions used in the hidden layers are nonlinear, and the final outputs are sent to a softmax function. Figure 1.6 depicts the multilayer neural network.

FIGURE 1.6
Multilayer neural network.

Applications of multilayer perceptron

• Machine translation
• Recognition of speech
• Classification of complex images
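As a sketch of how such a multilayer network might be defined in practice, the following example uses the Keras Sequential API (assuming TensorFlow is installed); the layer sizes, the 784-dimensional input, and the ten output classes are arbitrary choices for illustration.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A multilayer perceptron: input layer, two hidden layers with nonlinear
# activations, and a softmax output layer for 10 classes.
model = Sequential([
    Dense(128, activation="relu", input_shape=(784,)),  # hidden layer 1
    Dense(64, activation="relu"),                       # hidden layer 2
    Dense(10, activation="softmax"),                    # output layer
])

# Backpropagation with gradient-based optimization (here Adam) adjusts the
# weights to minimize the loss between predicted and true labels.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```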

1.1.3.2.3 Recurrent Neural Networks


The design motive of the recurrent network is to predict the output based on input that is fed back into the network. The first layer is a simple forward layer; it is followed by the recurrent layers, in which information stored in the network's memory is used. Forward propagation is applied in layer 1, while in layer 2 the information is stored in memory for future use. An incorrect prediction can be corrected by adjustments controlled by the learning rate, which helps the network make correct predictions on data and images during backpropagation. The recurrent neural network is shown in Figure 1.7.

Applications of recurrent neural networks

• Word Processing
• Speech recognition
• Tagging an image
• Process of detecting sentiments
• Translation

FIGURE 1.7
Recurrent neural network.
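A minimal recurrent model along these lines can be sketched with Keras' SimpleRNN layer (assuming TensorFlow is available); the sequence length, feature size, and number of output classes below are arbitrary illustration values.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Input: sequences of 20 time steps, each with 8 features (illustrative sizes).
model = Sequential([
    SimpleRNN(32, input_shape=(20, 8)),   # recurrent layer keeps an internal state
    Dense(5, activation="softmax"),       # classify each whole sequence into 5 classes
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```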

1.1.3.3 Learning Methods
1.1.3.3.1 Supervised Learning
The most common form of deep learning is supervised learning. A set of images or data is taken as the training set and given as input to the network in order to train it. Every input has a labeled corresponding output, so that the input can be processed and the desired output reached. As an example, suppose the images are to be classified into X different classes; this requires a training set of images and a validation set of images. The training set can be written as {(r1, s1), (r2, s2), …, (rx, sx)}, where ri is the input and si is the output [1]. The network is then trained by minimizing a cost function that measures the difference between the predicted output and the correct label for each input. The trained model is then used to predict the output for new images. Figure 1.8 shows the method of supervised learning.

1.1.3.3.2 Unsupervised Learning
Unlike supervised learning, in unsupervised learning the training data or image set is not labeled with classes. The network model therefore finds common characteristics among the data or images and groups the data based on the knowledge it builds. The method of unsupervised learning is illustrated in Figure 1.9.

FIGURE 1.8
Supervised learning.

FIGURE 1.9
Unsupervised learning.

1.1.3.3.3 Reinforcement Learning
In reinforcement learning there is no training dataset; a suitable decision is taken by the agent on its own, with the help of its experience, and that decision leads to a reward in certain situations. Reinforcement learning can be implemented with different kinds of machines or software, but the goal is always to find the best path or behavior. It differs from supervised and unsupervised learning in that those two types of learning have training data available (with the correct solution, in the supervised case), whereas in reinforcement learning no training data is available, so the reinforcement agent must decide for itself how to perform the allocated task [2]. The diagram in Figure 1.10 gives the idea of reinforcement learning.

FIGURE 1.10
Reinforcement learning.

1.2 Convolutional Neural Networks


1.2.1 Description of Five Layers of General CNN Architecture
The convolutional neural network, also called ConvNet [3, 4, 5], is a type of feedforward artificial neural network made up of multiple layers of artificial neurons. Convolutional neural networks are used to analyze visual images. They are also known as shift-invariant or space-invariant artificial neural networks (SIANN), because their convolutional layers share weights as they scan the input, giving them translation-invariance characteristics; the resulting activations are called feature maps. A convolutional neural network consists of one or more convolutional layers along with pooling layers and one or more fully connected layers. The architecture of a convolutional neural network is shown in Figure 1.11.
CNN is a specific version of the neural network designed to operate with one-dimensional, two-dimensional, and three-dimensional data and images [6].

FIGURE 1.11
Convolutional neural networks.

1.2.1.1 Input Layer
The input layer holds the raw input to the whole CNN. In the neural network, images are represented as a matrix of pixel values.

1.2.1.2 Convolutional Layer
Convolutional neural networks take their name from the convolutional layers in the network. The convolution operation is performed in the convolutional layer.
In a convolutional neural network, the convolution operation multiplies the input with a set of weights, much as in a traditional neural network. When the input is two-dimensional, a two-dimensional array of weights, called a kernel or filter, is multiplied with the two-dimensional input [6].
The kernel is smaller than the input, so it is multiplied with one kernel-sized patch of the input at a time. The element-wise products of the patch and the kernel are summed to give a single value; because a single value is produced, the operation is called a scalar (dot) product.
The filter must be smaller than the original input, because only then can the same filter be applied repeatedly to the input array at multiple positions, moving from top to bottom and from left to right.
Applying the same filter repeatedly to small patches of the image is a very useful technique for identifying a particular feature in the input images: if the filter is applied in the same manner across the entire image, the feature can be identified anywhere in the image. This concept is called translation invariance.
Multiplying the filter with one small patch of the input gives a single value, but when the filter is applied over the whole input array, a two-dimensional array of values is obtained; these values can be seen as a filtered version of the input. The output obtained by applying the filter across the input array is known as a feature map. After the feature map is obtained, it is passed through the nonlinear ReLU function. The feature map extraction is shown in Figure 1.12.

FIGURE 1.12
Extraction of feature map.

In technical terms, the convolution operation described here is actually a cross-correlation operation; in a true convolution, the kernel is rotated before being applied to the input. In deep learning, this cross-correlation is conventionally called convolution.
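The feature-map computation described above can be written directly in NumPy. The sketch below slides a small filter over a 2D input from top-left to bottom-right (cross-correlation, i.e. the "convolution" of deep learning) and applies ReLU; the input and kernel values are made up for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid cross-correlation: slide the kernel over every position where it fits.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    feature_map = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return feature_map

image = np.array([[1, 2, 0, 1],
                  [0, 1, 3, 1],
                  [2, 2, 0, 0],
                  [1, 0, 1, 2]], dtype=float)

# A simple 2x2 kernel (arbitrary values chosen for illustration).
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)

fmap = conv2d(image, kernel)
relu_fmap = np.maximum(fmap, 0)  # nonlinearity applied to the feature map
print(relu_fmap)
```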

1.2.1.3 Pooling Layer
In convolutional neural networks, a pooling layer is added after the convolutional layer. The output of the convolutional layer (the feature maps) is first passed through the ReLU function, which applies the nonlinearity, so the ReLU function sits between the convolutional layer and the pooling layer [6, 7].
A pooling layer may be repeated after each convolutional layer in the network; whether to use one can be decided based on the application. The pooling layer is applied to the feature maps produced by the convolutional layer, so it creates the same number of pooled feature maps.
The pooling layer performs a pooling operation, much as a filter is applied to the feature maps. Just as the convolution filter is smaller than the input used to create the feature maps, the pooling window is small compared to the feature maps. Typically, the pooling operation uses a 2*2 pixel window applied with a stride of 2 pixels.
The pooling layer therefore downsamples each feature map by a factor of 2: each dimension is reduced to half of its original size, so the number of values is reduced to one quarter. For instance, if the feature map has 36 pixels (a 6*6 matrix), the pooled output is reduced to 9 pixels (a 3*3 matrix).
The pooling operation can be performed in two ways: average pooling and maximum pooling.

Average pooling:
Each patch's average value of the feature map is calculated [6]. The average pooling function is shown in Figure 1.13.

Maximum pooling (or max pooling):


Each patch's maximum value of the feature map is calculated [6]. The max pooling function is shown in Figure 1.14.
The pooled feature maps contain a summarized version of the features identified in the input; downsampled feature maps are the result of using the pooling layer. The translation invariance obtained from the convolutional layer is converted into invariance to local translation by the pooling layer. Because the shifts handled by the pooling layer are very small, the output of the pooled layer mostly does not change; the resulting representation is approximately invariant to small translations of the input.
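The 2*2, stride-2 pooling described above can be sketched as follows; the 6*6 feature map is filled with arbitrary values and is reduced to a 3*3 map, as in the example in the text.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    # Downsample the feature map with a (size x size) window moved by `stride`.
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = feature_map[i * stride:i * stride + size,
                                j * stride:j * stride + size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

fmap = np.arange(36, dtype=float).reshape(6, 6)  # a 6x6 feature map (36 values)

print(pool2d(fmap, mode="max"))   # 3x3 result: maximum pooling
print(pool2d(fmap, mode="avg"))   # 3x3 result: average pooling
```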

1.2.1.4 Fully Connected Layers


The fully connected layers take their input from the final pooling layer. The output of the final pooling layer is flattened: the three-dimensional output volume is unrolled into a single vector [6, 8]. The flattening concept is shown in Figure 1.15.

FIGURE 1.13
Average pooling function.

FIGURE 1.14
Max pooling function.

FIGURE 1.15
Flattening.

The flattened vector obtained above is given as input to the fully connected layer. The result of the fully connected layer is sent to the final layer, which uses the softmax activation function to classify the results into the various classes.

1.2.1.5 Output Layer
The output layer generates the final output, and error checking is also performed there: the loss function is computed and the gradient of the error is calculated.
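Putting the five layers together, a minimal Keras sketch of a general CNN might look as follows (assuming TensorFlow is installed); the 32*32 RGB input and the ten output classes are arbitrary choices for illustration, not values taken from the text.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Input + convolutional layer: 16 filters of size 3x3 with ReLU nonlinearity.
    Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    MaxPooling2D(pool_size=(2, 2)),              # pooling layer (2x2 window, stride 2)
    Conv2D(32, (3, 3), activation="relu"),       # a second convolution + ReLU
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),                                   # unroll the 3D volume into a vector
    Dense(64, activation="relu"),                # fully connected layer
    Dense(10, activation="softmax"),             # output layer: class probabilities
])

# The loss function and its gradients drive the weight updates during training.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```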

1.2.2 Types of Architecture in CNN [9]


1.2.2.1 LeNet-5
The LeNet-5 architecture was designed by LeCun et al. in 1998 and is the earliest model used for classifying handwritten numbers and digits. It is named LeNet-5 because it uses five layers: three convolutional layers, each followed by a pooling layer, and two fully connected layers. Finally, a softmax classifier assigns the images to their respective classes. This architecture remains popular because it is a very straightforward approach [10, 11]. Figure 1.16 shows the architecture of LeNet-5.

FIGURE 1.16
LeNet-5.

1.2.2.2 AlexNet
The AlexNet architecture was designed by Alex Krizhevsky et al. in 2012. AlexNet has an architecture similar to LeNet, but the depth of the network is increased. AlexNet consists of eight layers: five convolutional layers with max pooling, and three fully connected layers. The ReLU activation function is used in every layer except the output layer, and dropout layers are added to avoid overfitting [10]. The AlexNet architecture diagram is shown in Figure 1.17.

FIGURE 1.17
AlexNet.

1.2.2.3 ZFNet
ZFNet was designed in 2013 in order to optimize the performance of AlexNet. It keeps the same overall structure as AlexNet but increases the depth of the network by adding extra filters: rather than increasing the filter size, the number of filters (kernels) is increased to improve performance [10]. The architecture diagram is given in Figure 1.18.

FIGURE 1.18
ZFNet.

1.2.2.4 GoogLeNet/Inception
The architecture of GoogLeNet differs from the other architectures in that it uses 1*1 convolutions and global average pooling to create deeper networks. The 1*1 convolutions decrease the number of parameters, which allows the depth of the network to be increased, and global average pooling increases the classification accuracy. A fully connected layer with the ReLU activation function is used, and a dropout layer is added for regularization [10]. A softmax classifier is used for the classification of images or data. Figure 1.19 shows the block diagram of GoogLeNet.

FIGURE 1.19
GoogLeNet.

1.2.2.5 VGGNet
VGGNet was designed by Simonyan and Zisserman in 2014. The VGG16 variant has a total of 16 weight layers. The number of filters grows with depth, as in AlexNet, and small 3*3 filters are stacked to increase the depth of the network. Three fully connected layers are added at the end, after the pooling layers [10, 12]. The VGGNet architecture is depicted in Figure 1.20.

1.2.2.6 ResNet
ResNet was designed by Kaiming He et al. in 2015 and was introduced to deal with the vanishing gradient problem. ResNet uses skip (shortcut) connections: the input of a block of layers is added directly to the block's output, so that training can skip a few layers and gradients can flow more easily to the earlier layers [10]. The architecture of ResNet is shown in Figure 1.21.

FIGURE 1.20
VGGNet.

FIGURE 1.21
ResNet.
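In practice these architectures rarely need to be built by hand; pretrained versions can be loaded from libraries such as tensorflow.keras.applications. The sketch below, assuming TensorFlow is installed and the ImageNet weights can be downloaded, loads VGG16 and ResNet50 for inference.

```python
from tensorflow.keras.applications import VGG16, ResNet50

# Load the architectures with weights pretrained on ImageNet.
vgg = VGG16(weights="imagenet")        # VGG16: 16 weight layers, stacked 3x3 filters
resnet = ResNet50(weights="imagenet")  # ResNet50: residual (skip) connections

vgg.summary()
resnet.summary()
```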

1.2.3 Applications of Deep Learning

1. Automatic text generation – The model learns from a body of text and generates new text; it learns how to punctuate, spell and form new sentences, and can sometimes capture the writing style.
2. Healthcare – Various diseases can be diagnosed and treated earlier.
3. Automatic machine translation – Text (words or sentences) in one language is automatically translated into another language.
4. Image recognition – Objects and people are recognized and identified with the help of deep learning.
5. Predicting earthquakes – Deep learning models are trained to predict earthquakes earlier.
6. Industrial applications – Object detection and localization, sorting, robotics, quality control and inspection, packaging.
7. Retail applications – Analytics, warehouse management, theft prevention, intelligent barcode scanners, monitoring and distribution control.
8. Entertainment/gaming – Gesture recognition, user identification, emotional feedback, experience monitoring, advanced analytics.
9. Smart homes – Vacuum cleaners, automatic lawn mowers, intrusion and hazard detection, smart lights, ovens, refrigerators.
10. Agriculture – Weed control, fruit harvesting, autonomous tractors and combines.
11. Smart cities and infrastructure – Parking, traffic monitoring, security monitoring, road inspection.
12. Food industry – Sorting, quality control.

1.3 Image Classification, Object Detection and Face Recognition


Computer vision is the field of computer science that deals with creating digital systems that process, analyze and visualize data in the way humans do. It involves training computers to understand and process images; once trained, the computers can take in visual data and produce final results with the help of software algorithms. Computer vision aims to classify images, detect objects, and recognize faces. The tasks relevant to computer vision using deep learning are:

• Dataset creation
• Preprocessing
• Image classification
• Object detection
• Face recognition

1.3.1 Dataset Creation
A dataset is a collection of data and its related values, organized by parameters such as time and subject. Dataset creation is a challenging task in deep learning. Data collection itself is a static process carried out over a period of time; the data is then labeled, the model is trained, and the results are obtained. There are different types of datasets, such as text data, image data, signal data, sound data, physical data, anomaly data, biological data, multivariate data, question-answering data and other data repositories.
The performance of deep learning is improved by improving the data: adding more data to train the model helps in classifying the data.
Data acquisition is the process by which datasets are found for training the models. The
two methods of data acquisition are:

1. Data generation
2. Data augmentation

Data generation is accomplished by crowdsourcing (connecting with people to collect the


data) and synthetic data generation (computer-generated data).
Dataset creation also involves searching for open-source datasets, available on the inter-
net. Many open source datasets, such as Kaggle, PlantVillage dataset, etc., are available.
Also, the dataset can be collected directly from the companies, hospitals, etc. using the data
gathering mechanism.
Data augmentation techniques increase the number of images in the training dataset by
applying various operations such as flipping, rotation, etc.
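One common way to apply such augmentation in a Keras-based pipeline is the ImageDataGenerator class; a minimal sketch (with arbitrary parameter values, assuming TensorFlow is installed) is shown below. The random images stand in for a real training set.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings: random rotations, shifts, and horizontal flips.
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)

# A dummy batch of 8 RGB images of size 64x64 stands in for a real dataset.
images = np.random.rand(8, 64, 64, 3)
labels = np.arange(8)

# Each call yields a batch of randomly transformed copies of the images.
augmented_batch, batch_labels = next(datagen.flow(images, labels, batch_size=8))
print(augmented_batch.shape)  # (8, 64, 64, 3)
```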
Some of the image repositories are:

• Scikit-Image
• OpenCV
• Python Image Library (Pillow/PIL)

• Scipy
• SimpleITK
• Matplotlib
• Numpy
• Mahotas

1.3.2 Data Preprocessing
Preprocessing of the dataset involves both text dataset and image dataset preprocessing. Text dataset preprocessing consists of steps such as:

• Removal of punctuation
• Lower casing
• Spelling correction
• Removal of frequent words
• Chat words conversion
• Removal of URLs
• Lemmatization
• Removal of rare words
• Stemming
• Removal of emoticons
• Conversion of emoji to words
• Removal of stopwords
• Removal of emoji
• Conversion of emoticons to words
• Removal of HTML tags

The preprocessing of image datasets consists of image resizing, noise removal, segmentation, and edge smoothing. Image resizing changes the size of the image. Unwanted noise can be removed from the images using noise removal techniques. A particular part of an image can be isolated using segmentation, and the edges of the images can be smoothed using edge smoothing techniques.
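These image preprocessing steps can be sketched with OpenCV (the cv2 package); the file name and parameter values below are placeholders chosen for illustration.

```python
import cv2

# Load an image from disk (placeholder file name).
image = cv2.imread("sample.jpg")

# Image resizing: change the image to a fixed size expected by the model.
resized = cv2.resize(image, (224, 224))

# Noise removal: a Gaussian blur suppresses high-frequency noise.
denoised = cv2.GaussianBlur(resized, (5, 5), 0)

# Edge smoothing: a bilateral filter smooths regions while keeping edges sharp.
smoothed = cv2.bilateralFilter(denoised, 9, 75, 75)

# Simple threshold-based segmentation of the grayscale image.
gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("preprocessed.jpg", smoothed)
```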

1.3.3 Image Classification
The features extracted from images to reveal patterns in the dataset are what make image classification possible [13]. If a plain artificial neural network is used for image classification, the process is computationally very costly [14], so a CNN is used for classification instead. There are different types of classification problems, such as single-label and multilabel classification in supervised learning, unsupervised classification, video classification, and 3D classification.

The steps carried out in the classification process are as follows:

Step 1: Choose a specific dataset, either one that is already available or one you create yourself.
Step 2: Import the necessary libraries needed for the classification.
Step 3: Prepare the training dataset by assigning the path, creating the categories, and resizing the images.
Step 4: Create the data in the training dataset and shuffle it. Assign labels and features to every image.
Step 5: Normalize the X values and convert the labels into categorical data. Split the data into X and Y values for use in the CNN.
Step 6: Define the model, compile it and train the CNN model.
Step 7: Find the accuracy of the model in classifying the objects.
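A condensed sketch of these steps in Keras might look as follows (assuming TensorFlow and scikit-learn are installed). The random arrays, image size and three categories are placeholders; a real pipeline would load and resize actual images from the chosen dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Steps 1-4: stand-in data (X: 120 "images", y: integer labels for 3 categories).
X = np.random.rand(120, 64, 64, 3).astype("float32")
y = np.random.randint(0, 3, size=120)

# Step 5: normalize features, one-hot encode labels, and split the data.
X = X / X.max()
y = to_categorical(y, num_classes=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)

# Step 6: define, compile, and train the CNN model.
model = Sequential([
    Conv2D(16, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(32, activation="relu"),
    Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=3, batch_size=16, verbose=0)

# Step 7: evaluate the accuracy of the trained model.
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print("test accuracy:", accuracy)
```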

Examples of classification problems:

• CNN model to perform classification of dogs and cats photographs


• CNN model to perform labeling of photographs of the Amazon rainforest

Binary classification

• Identification of cancer in X-ray images

Multiclass classification

• Handwritten text can be classified using CNN


• The photograph of a face can be assigned a name by using CNN

1.3.4 Object Detection
Object detection may also be referred to as object recognition, since it combines two functions: drawing a bounding box around every object that needs to be identified in the image, and then assigning a label to each identified object [13]. Image classification is a straightforward technique, whereas object detection also involves the localization of the objects.
To address object localization, region-based convolutional neural networks (R-CNNs) are used; R-CNNs are designed specifically for recognizing objects.
The YOLO model (You Only Look Once) is also designed specifically for detecting objects in images, with an emphasis on speed and real-time use. The variation between the three tasks can be explained as follows:

Image classification: The type or class of the object can be identified in an image [15].

• Input: Single image or photograph is given as input.


• Output: A label of the class (corresponding to images)

Object localization: The presence of objects is located in an image, and a bounding box indicates the exact location of each object.

• Input: An image or photograph containing one or more objects.

• Output: One or more bounding boxes corresponding to the objects, each defined by a point, width and height.

Object detection: The presence of objects is located in an image, a bounding box indicates the exact location of each object, and a class label is assigned to each detected object [16, 17, 18].

• Input: An image or photograph containing one or more objects.

• Output: One or more bounding boxes corresponding to the objects, each defined by a point, width and height, together with the class type or label of the identified object.

The steps carried out in the object detection process are as follows:

Step 1: Specific images should be taken as input.


Step 2: The images should be divided into various regions.
Step 3: Each region should be considered as a separate image to work with.
Step 4: Send all the regions considered as individual images to the model, and classify
the images into different types of classes.
Step 5: After the classes are identified for each region, all the regions are combined again to identify the objects in the original image.
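The region-based procedure above can be sketched as follows. The sketch assumes a trained image classifier `classify_region` is already available (a hypothetical helper, not something defined in this chapter) and simply divides the image into a grid of regions, classifies each one, and collects the non-background regions as detections.

```python
import numpy as np

def classify_region(region):
    # Hypothetical stand-in for a trained CNN classifier: returns a class label
    # and a confidence score for one image region. Replace with a real model.
    return "background", 0.0

def detect_objects(image, grid=4, score_threshold=0.5):
    # Step 2: divide the image into grid x grid regions.
    h, w = image.shape[:2]
    rh, rw = h // grid, w // grid
    detections = []
    for i in range(grid):
        for j in range(grid):
            # Step 3: treat each region as a separate image.
            region = image[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            # Step 4: classify the region.
            label, score = classify_region(region)
            # Step 5: keep confident, non-background regions as labeled boxes.
            if label != "background" and score >= score_threshold:
                detections.append((label, score, (j * rw, i * rh, rw, rh)))
    return detections

image = np.zeros((256, 256, 3), dtype=np.uint8)  # placeholder input image
print(detect_objects(image))
```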

Examples for object detection:

• Each object in a street scene should be identified by a bounding box and labeled.
• Each object in an indoor photograph should be identified by a bounding box and labeled.
• Each object in a landscape should be identified by a bounding box and labeled.
• Object detection models for locating and detecting kangaroos in photographs [19, 20].

1.3.5 Face Recognition
Face recognition is the task in computer vision in which human faces are identified in
photographs. Humans easily perform face detection, but it is a challenging problem for
computers to recognize human faces. Face recognition becomes a nontrivial problem for
computers to solve [21].
In face detection, the faces of different humans in the photograph should be located. The
coordinates of the faces in the images should be represented by using the bounding box.
The dynamic nature of the human face should be considered irrespective of the angle or
orientation. Also, other parameters such as hair color, clothing, light levels, accessories, age
and makeup should be considered.

There are two methods used for the recognition of faces. They are:

• Methods based on features – Detecting the faces with the help of handcrafted
filters
• Methods based on images – Extracting the faces using the holistic learning from
the entire image

The steps involved in the process of face recognition are:

Step 1: Images containing multiple faces should be given as input.


Step 2: One or more faces in the images should be located and marked with the bounding box.
Step 3: The face should be normalized so that it is consistent with the database's photometrics and geometry.
Step 4: Features for recognizing the face should be extracted from it.
Step 5: The exact matching of the face with one or more faces stored in the database
should be performed.

The three models frequently used for face recognition are the multi-task cascaded convolutional neural network (MTCNN), the VGGFace2 model, and the FaceNet model.
The MTCNN model, developed in 2016, is the most widely used model for detecting faces. As the name implies, three neural networks are connected in a cascade, which helps detect faces and facial landmarks in the images.
Face identification and verification can be performed using the VGGFace2 model (VGG stands for Visual Geometry Group); face embeddings can also be obtained with this model.
The FaceNet model is mainly used for feature extraction from the human face. It is also used for face identification and verification purposes.
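For example, the open-source `mtcnn` Python package (assuming it and OpenCV are installed, and that its API matches this sketch) wraps the MTCNN cascade and returns bounding boxes, confidence scores and facial landmarks for the faces it finds; the file name is a placeholder.

```python
import cv2
from mtcnn import MTCNN

# Load the image and convert from OpenCV's BGR ordering to RGB.
image = cv2.cvtColor(cv2.imread("group_photo.jpg"), cv2.COLOR_BGR2RGB)

detector = MTCNN()
faces = detector.detect_faces(image)

for face in faces:
    # Each result contains a bounding box, a confidence score, and landmarks
    # (eyes, nose, mouth corners).
    x, y, width, height = face["box"]
    print("face at", (x, y, width, height), "confidence", face["confidence"])
```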

References
1. https://towardsdatascience.com/derivative-of-the-sigmoid-function536880cf918e
2. https://www.medcalc.org/manual/tanh_function.php
3. Jie Wang and Zihao Li, “Research on Face Recognition Based on CNN,” IOP Conf. Series: Earth and Environmental Science 170 (2018), 032110. DOI:10.1088/1755-1315/170/3/032110.
4. Keiron O’Shea and Ryan Nash, “An Introduction to Convolutional Neural Networks,” arXiv:1511.08458v2 (2015).
5. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
6. Athanasios Voulodimos, Nikolaos Doulamis, Anastasios Doulamis, and Eftychios Protopapadakis, “Deep Learning for Computer Vision: A Brief Review,” Recent Developments in Deep Learning for Engineering Applications (2018). DOI:10.1155/2018/7068349.
7. https://learnopencv.com/image-classification-using-convolutional-neural-networks-in-keras/
8. https://www.tinymind.com/learn/terms/relu
9. https://medium.com/geekculture/a-2021-guide-to-improving-cnns-network-architectures-historical-network-architectures-d23f32afb1bd
10. A. Ghosh, A. Sufian, and F. Sultana, “Fundamental Concepts of Convolutional Neural Network,” Recent Trends and Advances in Artificial Intelligence and Internet of Things (2020). DOI:10.1007/978-3-030-32644-9_36.
11. Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, J. Santamaría, Mohammed A. Fadhel, Muthana Al-Amidie and Laith Farhan, “Review of deep learning: concepts, CNN architectures, challenges, applications, future directions,” Journal of Big Data (2021). DOI:10.1186/s40537-021-00444-8.
12. https://towardsdatascience.com/step-by-step-vgg16-implementation-in-keras-for-beginners-a833c686ae6c
13. Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A.W.M. van der Laak, Bram van Ginneken, and Clara I. Sanchez, “A Survey on Deep Learning in Medical Image Analysis,” arXiv:1702.05747v2 (2017).
14. https://en.wikibooks.org/wiki/Artificial_Neural_Networks/Activation_Functions
15. https://www.analyticsvidhya.com/blog/2021/01/image-classification-using-convolutional-neural-networks-a-step-by-step-guide/
16. Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu, “Object Detection with Deep Learning: A Review,” arXiv:1807.05511v2 (2019).
17. Zhixue Wang, Jianping Peng, Wenwei Song, Xiaorong Gao, Yu Zhang, Xiang Zhang, Longfei Xiao, and Li Ma, “A Convolutional Neural Network-Based Classification and Decision-Making Model for Visible Defect Identification of High Speed Train Images,” Journal of Sensors (2021), 5554920. DOI:10.1155/2021/5554920.
18. https://books.google.co.in/books?hl=en&lr=&id=10jpDwAAQBAJ&oi=fnd&pg=PP1&dq=deep+learning+and+computer+vision&ots=wHn2HtMBT2&sig=lNP7CXdDIy2Tk1BrcsTv6QJwXmM#v=onepage&q=deep%20learning%20and%20computer%20vision&f=false
19. Ajeet Ram Pathak, Manjusha Pandey, and Siddharth Rautaray, “Application of Deep Learning for Object Detection,” Procedia Computer Science 132 (2018), 1706–1717. DOI:10.1016/j.procs.2018.05.144.
20. https://www.upgrad.com/blog/ultimate-guide-to-object-detection-using-deep-learning/
21. K.H. Teoh, R.C. Ismail, S.Z.M. Naziri, R. Hussin, M.N.M. Isa and M.S.S.M. Basir, “Face Recognition and Identification using Deep Learning Approach,” Journal of Physics: Conference Series 1755 (2021), 012006. DOI:10.1088/1742-6596/1755/1/012006.
2 Object Detection Frameworks and Services in Computer Vision

Sachi Choudhary, Rashmi Sharma, and Gargeya Sharma


University of Petroleum & Energy Studies, Dehradun, India

CONTENTS
2.1 Neural Networks (NNs) and Deep Neural Networks (DNNs) 24
2.1.1 Neural Networks 24
2.1.2 Single-Layer Perceptron (SLP) 25
2.1.3 Multilayer Perceptron (MLP) 25
2.2 Activation Functions 26
2.2.1 Identity Function 27
2.2.2 Sigmoid Function 27
2.2.3 Softmax Function 27
2.2.4 Tanh Function 28
2.2.5 ReLU (Rectified Linear Unit) Function 28
2.3 Loss Functions 29
2.4 Convolutional Neural Networks 30
2.4.1 CNN Architecture and its Components 30
2.5 Image Classification Using CNN 32
2.5.1 LeNet-5 32
2.5.2 AlexNet 33
2.5.3 VGGNet 34
2.5.4 Inception and GoogLeNet 35
2.5.4.1 Inception Module 35
2.5.5 ResNet 36
2.5.5.1 Residual Block 36
2.6 Transfer Learning 37
2.6.1 Need for Transfer Learning 37
2.6.2 Transfer Learning Approaches 37
2.6.2.1 Pre-trained Network as a Classifier 38
2.6.2.2 Pre-trained Network as a Feature Extractor 38
2.6.2.3 Fine Tuning 38
2.7 Object Detection 39
2.7.1 Object Localization 39
2.7.1.1 Sliding Window Detection 39
2.7.1.2 Bounding Box Prediction 40
2.7.2 Components of Object Detection Frameworks 40
2.8 Region-Based Convolutional Neural Networks (R-CNNs) 41
2.8.1 R-CNN 41
2.8.2 Fast R-CNN 42
2.8.2.1 Components of Fast R-CNN 42
2.8.3 Faster R-CNN 43
2.8.4 YOLO Algorithm 44
2.8.5 YOLOv1 Object Detection Model 45
2.8.6 YOLO9000 Object Detection Model 45
2.8.7 YOLOv3 Object Detection Model 45
2.9 Computer Vision Application Areas 46
References 47

DOI: 10.1201/9781003206736-2

2.1 Neural Networks (NNs) and Deep Neural Networks (DNNs)


Deep learning is a well-known machine learning methodology in which neural-network computational graphs learn the correlation between input and output data. The term 'deep learning' derives from the fact that NNs are highly flexible and can easily be stacked on top of each other to form deep computational graphs, as the application demands.

2.1.1 Neural Networks
The stackable computational graphs used in deep learning are called neural networks, a term borrowed from neurobiology. They hardly resemble the workings of our brains, however, and should not be confused with them; they are a mathematical framework for learning representations from data. Any such graph can be broken into progressively smaller pieces until it reaches its independent atomic components. In the case of neural networks, the smallest independent unit is called a neuron [1,2].

Wx + B = Z  (2.1)

f(Z) = Y  (2.2)

A single neuron is a combination of two mathematical equations, as shown in Figure 2.1. Equation (2.1) is the most basic linear equation, where x is the input and W and B are the coefficients of the equation. This equation is responsible for learning the linear representations in the data. Learning only linear relationships is not sufficient in most cases, as real-world data contains many irregularities, noise, and nonlinearities. For learning nonlinear representations, equation (2.2) can be used in each neuron to wrap a
FIGURE 2.1
An artificial neuron.

function around the output from equation (2.1); these functions are called activation functions and are discussed in more detail later in this chapter.

2.1.2 Single-Layer Perceptron (SLP)


The SLP was the first neural network model proposed by Frank Rosenblatt in 1958 [3]. It is
among the earliest models to propose learning from data. The goal was to discover a linear
decision function that classifies the output into binary classes (categories) with the use of
the values in the weight vector (w) and the bias parameter (b) [4]. Figure 2.2 shows the
single-layer perceptron. The inputs to the neural network are multiplied by their respec-
tive weight vector values, and their sum is further combined with a bias term to produce a
single value output, equation (2.3). Such a single value output is passed as the input to the
activation function assigned to the neuron, equation (2.4).

y = Σi xi·wi + b  (2.3)

ŷ = 1 if y ≥ 0;  ŷ = 0 if y < 0  (2.4)
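A minimal NumPy sketch (not part of the original text; the inputs, weights, and bias below are arbitrary illustrative values) of equations (2.3) and (2.4):

import numpy as np

x = np.array([0.5, -1.2, 3.0])    # inputs
w = np.array([0.4, 0.7, -0.2])    # weight vector
b = 0.1                           # bias

y = np.dot(x, w) + b              # weighted sum plus bias, equation (2.3)
y_hat = 1 if y >= 0 else 0        # unit-step activation, equation (2.4)
print(y, y_hat)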

2.1.3 Multilayer Perceptron (MLP)


It can be seen that a perceptron is a linear function, so a trained single neuron will produce a straight-line boundary to classify the data. Will this work for complex nonlinear datasets? No; many neurons are needed to fit the training data well. A multilayer
perceptron has the same structure as a single-layer perceptron but contains two or more
hidden layers. Hidden layers are collections of neurons that are not directly accessible
by the input data; they act as intermediate processing units between the raw input and
the final output. Typically, each neuron in the hidden layer is linked to every other neu-
ron in adjacent layers, forming a denser connection between them and providing more

FIGURE 2.2
Single-layer perceptron (SLP).

FIGURE 2.3
Multilayer perceptron (MLP).

computation to the expected output [4]. Figure 2.3 shows a multilayer perceptron with two
hidden layers.
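To make the stacking concrete, here is a minimal Keras sketch of an MLP with two hidden layers as in Figure 2.3; the input size, layer widths, and number of classes are arbitrary choices for illustration, not values taken from the text.

from tensorflow import keras
from tensorflow.keras import layers

mlp = keras.Sequential([
    layers.Input(shape=(20,)),             # 20 input features (illustrative)
    layers.Dense(16, activation='relu'),   # hidden layer 1
    layers.Dense(8, activation='relu'),    # hidden layer 2
    layers.Dense(3, activation='softmax')  # output layer over 3 classes
])
mlp.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
mlp.summary()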

2.2 Activation Functions
Recalling the formation of a neuron, an activation function is applied to each neuron in a layer during prediction. It converts the neuron's linear output into a nonlinear form: it is embedded after every perceptron and decides whether that neuron activates. There are several constraints a function must satisfy to work as an activation. Some of the primary constraints that turn a normal function into an activation function are [5]:

1. Continuity of function: They must be continuous and defined over an infinite domain, producing an output for any input; there should be no restriction in the domain that prevents the function from giving an output value.
2. Monotonic in nature: They should never change direction; in other words, they are either always increasing or always decreasing. This constraint is not technically a requirement, and non-monotonic functions can still be optimized. Nevertheless, consider the implications of having multiple input values map to the same output value: such a result is not advisable for learning.
3. Nonlinear in nature: One of the two equations in the neuron is sufficient to identify the linear representations in the data and build a linear prediction model. However, a linear equation alone cannot capture nonlinear behavior in the data. Therefore, activation functions are kept nonlinear to promote learning of nonlinear correlations and their respective representations in the data.

Activation functions can be broadly divided into linear and nonlinear functions. Some of the most popular activation functions used in deep learning are described below.

2.2.1 Identity Function
In an identity function, also known as linear transfer function, the output is the same as the
input, equation (2.5) (Figure 2.4).

f(x) = x  (2.5)

The most used nonlinear activation functions are:

2.2.2 Sigmoid Function
Also known as the logistic activation function, equation (2.6), the sigmoid is extremely popular for classification tasks because it smoothly squashes an infinite range of input values into an output between 0 and 1 (Figure 2.5).

σ(z) = 1 / (1 + e^(−z))  (2.6)

2.2.3 Softmax Function
The softmax function, equation (2.7), converts input values into probability values. It is often used at the output layer of a classification model where a prediction over more than two classes is required.

FIGURE 2.4
Linear activation function.

FIGURE 2.5
Sigmoid activation function.

σ(xj) = e^(xj) / Σi e^(xi)  (2.7)

2.2.4 Tanh Function
The tanh function, equation (2.8), is similar to the sigmoid, except that it squashes the infinite range of input values into an output between −1 and 1, as opposed to 0 to 1 for the sigmoid function (Figure 2.6).

tanh(x) = sinh(x) / cosh(x) = (e^x − e^(−x)) / (e^x + e^(−x))  (2.8)

2.2.5 ReLU (Rectified Linear Unit) Function


The ReLU function is by far the most popular activation function used inside most neural networks because it is simple and computationally cheap. It activates the neuron only if the value passed to it is greater than zero, equation (2.9). The ReLU function is quite simple, in that it converts all negative values to 0 and leaves positive values unchanged, equation (2.10) (Figure 2.7).

z = max(0, x)  (2.9)

ReLU(x) = 0 if x < 0;  x if x ≥ 0  (2.10)

FIGURE 2.6
Tanh activation function.

FIGURE 2.7
ReLU activation function.
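The activation functions above translate directly into code. The following NumPy sketch (illustrative, not from the text) implements equations (2.5) through (2.10); the softmax subtracts the maximum input for numerical stability, a common practical refinement.

import numpy as np

def identity(x):                        # equation (2.5)
    return x

def sigmoid(z):                         # equation (2.6)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(x):                         # equation (2.7)
    e = np.exp(x - np.max(x))           # max-subtraction for numerical stability
    return e / e.sum()

def tanh(x):                            # equation (2.8)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):                            # equations (2.9)-(2.10)
    return np.maximum(0, x)

z = np.array([-2.0, 0.0, 3.0])          # illustrative inputs
print(sigmoid(z), softmax(z), tanh(z), relu(z))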

2.3 Loss Functions
The loss function, also known as the error function, encapsulates the idea of how well a model is performing relative to how it should perform. It is used to measure how incorrect the predictions made by the neural network are with respect to the true targets. Training is therefore an optimization problem: minimizing the loss by optimizing the parameters yields better model accuracy. The results of various
loss functions for the same prediction will be different, and have significant consequences
on the performance of the trained model. The scope of this chapter does not allow for a
full explanation of the various loss functions. However, some popular loss functions are
explained below [6].

1. Mean absolute error (MAE): This measures how far the actual value deviates from the value predicted by the model, averaged over all examples. Because no squaring is involved, the mean absolute error keeps the error on the same scale as the target values.
2. Mean squared error (MSE): This calculates the square of the difference between the target value and the predicted value. Squaring increases the scale of the error and makes the model more sensitive to large errors.
3. Cross-entropy: This is generally used in classification problems, as it calculates the difference between two probability distributions. When classifying a single training example over all available classes, the class with the highest predicted probability is taken as the model's prediction. Ideally, the aim is to assign 100 percent probability to the correct class and 0 percent to the rest during training, and the network learns to estimate this score.

There are many more loss functions than those discussed here, and the appropriate choice depends on the type of problem and the optimization approach; a short sketch of the three losses above follows.
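A minimal NumPy sketch of the three losses above (the target and predicted values are arbitrary illustrative numbers, and the small epsilon in the cross-entropy guards against log(0)):

import numpy as np

def mae(y_true, y_pred):                       # mean absolute error
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):                       # mean squared error
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(p_true, p_pred, eps=1e-12):  # categorical cross-entropy
    return -np.sum(p_true * np.log(p_pred + eps))

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])
print(mae(y_true, y_pred), mse(y_true, y_pred))

p_true = np.array([0.0, 1.0, 0.0])             # one-hot label
p_pred = np.array([0.1, 0.8, 0.1])             # predicted class probabilities
print(cross_entropy(p_true, p_pred))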

2.4 Convolutional Neural Networks


An artificial neural network (ANN) or multilayer perceptron (MLP) is a layered arrangement of neurons, each having weights (w) and a bias (b). Inputs passed to each neuron are multiplied by the weights, and activation functions are applied to make the result of that layer nonlinear. A convolutional neural network (CNN) is an advanced version of a regular neural network or MLP, designed to improve the processing of spatial data (also known as data with a grid-like topology) [6]. For example, time-series data can be thought of as 1D gridded data formed by sampling values at regular time intervals.
Similarly, image data can be thought of as a 2D grid structure formed from pixel values
and their 2-dimensional position on the grid. This section covers the development of con-
volutional neural networks (CNNs), which have produced better results for images and
computer vision applications than MLPs.

2.4.1 CNN Architecture and its Components


Like a regular NN, the input to a CNN model is an image or a feature vector, which is transformed through a set of hidden layers and nonlinear activation functions. Each layer consists of a set of neurons connected to every neuron of the previous layer. The output layer is a fully connected layer and performs the classification. However, a regular NN operates directly on the raw pixels and does not perform scaling on the image. In a CNN, layers are arranged in three dimensions: width (W), height (H), and depth (D).

FIGURE 2.8
Image classification using ML: input (images/feature vector) → pre-processing (transformation, standardization, etc.) → feature extraction → ML model.

The first layer in CNN starts with a convolutional layer that learns basic features (lines,
edges, etc.); the next convolutional layer is responsible for learning complex features (cir-
cles, squares, and so on). Similarly, further stacked convolutional layers (if any) learn even
more complex features (such as facial parts, complex contours, and so on) [6].
Figure 2.8 shows the steps of a classification model using machine learning techniques.
The image features must be manually extracted to be fed into a machine learning system
(e.g., SVM). The manual work of feature extraction and classification can be replaced by
MLP or CNN; see Figure 2.9.
A basic CNN architecture, expressed as a series of layers, works in this manner (a minimal code sketch of such a stack is given after the component list below):

INPUT → CONV → ACT → POOLING → CONV → ACT → POOL → FC → ACT (SOFTMAX MOSTLY)

The components of CNN are:

• Convolutional layer (CONV): This layer works like a feature-detector window, sliding over the image (pixel by pixel) with a fixed size and step to mine significant features for identifying objects in the respective image.

FIGURE 2.9
General architecture of CNN.

So, in general, convolutional layers are used for feature extraction and learning. While the process is intuitive and powerful, repeatedly stacking convolutional layers increases network dimensionality and space-time complexity. This is when pooling, or subsampling, comes to the rescue.
• Activation function (ACT): They convert the linear output into a nonlinear form. It
is embedded after every perceptron, and it decides the activation of that neuron.
• Pooling layer (POOL): Pooling reduces the number of parameters passed to the next layer, which results in a reduction in network size. The parameter-reduction process resizes its input using a summary statistic such as the maximum or the average.
• Fully connected layer (FC): FC layer is the normal dense layer that is a stack of
neurons. It flattens the 2D grid of multiple features into a single 1D grid (a long
tube) of values. These layers are responsible for learning and performing the clas-
sification task from the trained features.
• Batch normalization (BN): It is common practice to normalize the training data before feeding it to the input layer; doing so benefits model training and results. Normalization can also be applied to each, or a few selected, layers of the neural network for better feature extraction, which in turn increases training speed and network flexibility. The process is called batch normalization, where 'batch' refers to the mini-batch of training examples over which the normalization statistics are computed [6].
• Dropout layer (DO): This is an additional layer used to avoid overfitting. Overfitting occurs when the model fits the training data too closely, memorizing it rather than learning generalizable features.
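The sketch below (illustrative, not from the text) wires the components above into the CONV → ACT → POOL → ... → FC series shown earlier, using Keras; the input shape, filter counts, and class count are arbitrary assumptions.

from tensorflow import keras
from tensorflow.keras import layers

cnn = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),                 # illustrative input size
    layers.Conv2D(32, (3, 3), activation='relu'),    # CONV + ACT
    layers.BatchNormalization(),                     # BN
    layers.MaxPooling2D((2, 2)),                     # POOL
    layers.Conv2D(64, (3, 3), activation='relu'),    # CONV + ACT
    layers.MaxPooling2D((2, 2)),                     # POOL
    layers.Flatten(),
    layers.Dense(128, activation='relu'),            # FC + ACT
    layers.Dropout(0.5),                             # DO
    layers.Dense(10, activation='softmax')           # FC + ACT (softmax)
])
cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])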

2.5 Image Classification Using CNN


Image classification is a technique used to classify the object(s) in an image into the respective class(es). There are mainly two types of image classification: multi-class and multi-label. In a multi-class classifier, a single class is associated with an image. For example, with a multi-class classifier, the classification of animals present in an image will result in whether it is a dog, a cow, or a cat. A special case of the multi-class classifier is the binary classifier, where the CNN model differentiates only between two classes,
label multiple objects in the image. For example, if there is an image containing several
types of animals, the model will label each of them. In the field of image classification,
a lot of research has been done on improving the CNN model and introducing new
techniques like inception, residuals, etc. This section covers some popular CNN models.
It also includes a walk-through of the development of CNNs from LeNet to AlexNet,
VGGNet, and ResNet.

2.5.1 LeNet-5
LeNet is the first pioneering CNN, proposed by Y. LeCun et al. [7] in 1998. This architecture was developed for textual data, that is, optical character recognition (OCR). The LeNet-5

FIGURE 2.10
The LeNet-5 architecture.

architecture is straightforward, with the essential CNN components: convolutional, subsampling (pooling), and fully connected layers. Figure 2.10 depicts the model, consisting of five layers: three convolutional and two fully connected (hence the name "LeNet-5"). The model used the tanh activation function, as it was considered to give better convergence than the sigmoid function. LeNet-5 as a series of layers is as follows:

INPUT IMAGE → CONV 1 → TANH → POOL 2 → CONV 3 → TANH → POOL 4 → CONV 5 → TANH → FULLY CONNECTED 6 → SOFTMAX
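A rough LeNet-5-style stack in Keras, following the layer series above (a sketch under the common reading of the architecture: 32 × 32 grayscale inputs, tanh activations, and average pooling; not code from the original text):

from tensorflow import keras
from tensorflow.keras import layers

lenet = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, (5, 5), activation='tanh'),     # CONV 1 + TANH
    layers.AveragePooling2D((2, 2)),                 # POOL 2
    layers.Conv2D(16, (5, 5), activation='tanh'),    # CONV 3 + TANH
    layers.AveragePooling2D((2, 2)),                 # POOL 4
    layers.Conv2D(120, (5, 5), activation='tanh'),   # CONV 5 + TANH
    layers.Flatten(),
    layers.Dense(84, activation='tanh'),             # FULLY CONNECTED 6
    layers.Dense(10, activation='softmax')           # SOFTMAX over 10 classes
])
lenet.summary()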

2.5.2 AlexNet
LeNet performs well for simple datasets like MNIST, where images are grayscale and the number of classes is limited: ten in the case of the Modified National Institute of Standards and Technology (MNIST) dataset. To build deeper networks, the AlexNet model was proposed by A. Krizhevsky et al. [8], the winner of the ILSVRC image classification competition in 2012. The work was later published in 2017 under the title "ImageNet Classification with Deep Convolutional Neural Networks." 1.2 million high-resolution images from the ImageNet dataset, divided into 1,000 categories, were used to train the model.
This pioneering study on "deep" convolutional networks for computer vision sparked a storm of interest among researchers and practitioners alike. There are five convolutional layers and three fully connected layers in the architecture, as depicted in Figure 2.11. This is how it looks:

• Five convolutional layers, with kernel size 11 × 11 in Conv1, 5 × 5 in Conv2, and 3 × 3 in Conv3, Conv4, and Conv5.
• Max-pooling layers that take the maximum as the summary statistic.
• Dropout layers (DO) to avoid overfitting.
• ReLU is used as the activation function in the hidden layers, and softmax is used in the output layer. The series of layers of AlexNet is as follows:

FIGURE 2.11
AlexNet architecture.

INPUT IMAGE → CONV 1 → POOL 2 → CONV 3 → POOL 4 → CONV 5 → CONV 6 → CONV 7 → POOL 8 → FULLY CONNECTED 9 → FULLY CONNECTED 10 → SOFTMAX

2.5.3 VGGNet
VGGNet was developed by the Visual Geometry Group at Oxford University in 2014, which is why it is named VGG [9]. It is a deeper convolutional neural network with more convolutional, pooling, and dense layers. VGGNet is popular in two architectures, VGG16 and VGG19, summarized below; a minimal loading sketch follows the list.

• VGG16 consists of sixteen weight layers: thirteen convolutional layers and three fully connected layers. The model is very simple and easy to understand. All convolutional kernels are of size 3 × 3 and pooling windows are 2 × 2. The idea behind the small kernel size was to extract finer features from the image (Figure 2.12).
• VGG19 has sixteen convolutional layers, five max-pooling layers, three fully con-
nected layers, and a softmax layer.
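Because both variants ship with pre-trained ImageNet weights in Keras, they are easy to try directly. The sketch below (illustrative; 'example.jpg' is a hypothetical file name) loads VGG16 and prints its top-3 ImageNet predictions for one image.

import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG16(weights='imagenet')                            # 224 x 224 x 3 input, 1,000 classes

img = image.load_img('example.jpg', target_size=(224, 224))  # hypothetical input image
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])                   # top-3 ImageNet labels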

FIGURE 2.12
VGG16 architecture: five convolutional blocks (Conv1-Conv16, each block followed by a pooling layer), then fully connected layers and a softmax output.

2.5.4 Inception and GoogLeNet

The suggested deep CNN, called Inception, achieved state-of-the-art classification and detection performance in the ImageNet Large Scale Visual Recognition Challenge 2014 (ILSVRC14) [10]. By carefully crafting this design, the researchers increased the depth of the network while keeping the computational budget constant. GoogLeNet, a 22-layer deep network, was the model used in the ILSVRC14 submission.

2.5.4.1 Inception Module
These are the small building blocks that are stacked on top of each other to form the Inception network. A single Inception module is a combination of multiple convolutional layers aligned in parallel to each other; see Figure 2.13 for its complete architecture.
The input to each module is the output of the previous module. It is more computationally efficient to use Inception modules only at the higher layers, leaving the lower layers as standard convolutional layers. The Inception modules use 1 × 1 convolutions to compute dimensionality reductions before the expensive 3 × 3 and 5 × 5 convolutions. In addition to reducing feature dimensions to keep computation manageable, the 1 × 1 convolutions also include rectified linear activation, serving a dual purpose in the model. (A functional-style code sketch of a single module is given after Figure 2.13.)
Figure 2.14 shows that GoogLeNet contains nine Inception modules in total, with max-pooling layers interspersed to reduce dimensions. GoogLeNet can be divided into three subparts:

1. A stem similar to the LeNet and AlexNet models, containing multiple convolutional and pooling layers connected in series.
2. Inception body: nine Inception modules (2 Inception modules + 1 pooling layer + 5 Inception modules + 1 pooling layer + 2 Inception modules).
3. Classifier: a fully connected output layer with a softmax layer.

FIGURE 2.13
Inception module.
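A single Inception module can be sketched with the Keras functional API as below (illustrative; the filter counts are loosely based on GoogLeNet's first Inception block, and the 28 × 28 × 192 input shape is an assumption, not a value given in the text):

from tensorflow import keras
from tensorflow.keras import layers

def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    # Four parallel branches, concatenated along the channel axis.
    b1 = layers.Conv2D(f1, (1, 1), padding='same', activation='relu')(x)

    b2 = layers.Conv2D(f3_reduce, (1, 1), padding='same', activation='relu')(x)  # 1x1 reduction
    b2 = layers.Conv2D(f3, (3, 3), padding='same', activation='relu')(b2)

    b3 = layers.Conv2D(f5_reduce, (1, 1), padding='same', activation='relu')(x)  # 1x1 reduction
    b3 = layers.Conv2D(f5, (5, 5), padding='same', activation='relu')(b3)

    b4 = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    b4 = layers.Conv2D(pool_proj, (1, 1), padding='same', activation='relu')(b4)

    return layers.concatenate([b1, b2, b3, b4])

inputs = keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs, 64, 96, 128, 16, 32, 32)
model = keras.Model(inputs, outputs)    # output has 64 + 128 + 32 + 32 = 256 channels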