Object Detection

Convolutional neural networks (CNNs) are extensively applied in object detection, localization, video, and text processing, leveraging engineered features for various multidimensional applications. The document discusses techniques for content-based image retrieval, object localization, object detection, and their integration with recurrent neural networks for video classification. It highlights the hierarchical learning of features in CNNs and their competitive performance in text processing compared to traditional recurrent networks.

Uploaded by

swati.dbit

Figure 8.18: Example of image classification/localization in which the class “fish” is
identified together with its bounding box. The image is illustrative only.

be performed by deriving the weights from a trained deep-belief convolutional network [285].
This is analogous to the approach in traditional neural networks, where stacked Boltzmann
machines were among the earliest models used for pretraining.

8.6 Applications of Convolutional Networks

Convolutional neural networks have several applications in object detection, localization,
video, and text processing. Many of these applications work on the basic principle of using
convolutional neural networks to provide engineered features, on top of which multidimen-
sional applications can be constructed. The success of convolutional neural networks remains
unmatched by almost any class of neural networks. In recent years, competitive methods
have even been proposed for sequence-to-sequence learning, which has traditionally been
the domain of recurrent networks.

8.6.1 Content-Based Image Retrieval

In content-based image retrieval, each image is first engineered into a set of multidimensional
features by using a pretrained classifier like AlexNet. The pretraining is typically done
up front using a large data set like ImageNet. A large number of such pretrained
classifiers is available at [586]. The features from the fully connected layers of the classifier
can be used to create a multidimensional representation of the images, which can then be used
in conjunction with any multidimensional retrieval system to provide results of high quality.
The use of neural codes for image retrieval is discussed in [16]. This approach works because
the features extracted from AlexNet have semantic significance to the different types of shapes
present in the data. As a result, the quality of the retrieval is generally quite high when
working with these features.
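The retrieval step can be sketched in a few lines. This is only an illustration under stated assumptions: the feature vectors are random stand-ins for the 4096-dimensional outputs of an AlexNet fully connected layer, cosine similarity is one possible choice of multidimensional retrieval measure (the text does not prescribe one), and the names `l2_normalize` and `retrieve` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def retrieve(query_feat, db_feats, k=3):
    """Return indices and cosine similarities of the k most similar database images."""
    db = l2_normalize(db_feats)
    q = l2_normalize(query_feat[None, :])[0]
    sims = db @ q                      # cosine similarity to every database image
    top = np.argsort(-sims)[:k]        # indices sorted by decreasing similarity
    return top, sims[top]

rng = np.random.default_rng(0)
# Assumption: random vectors stand in for features taken from a fully connected layer.
db_feats = rng.normal(size=(100, 4096))
# The query is a slightly perturbed copy of database image 42.
query = db_feats[42] + 0.1 * rng.normal(size=4096)

top, sims = retrieve(query, db_feats, k=3)
print(top, sims)
```

In practice the stand-in vectors would be replaced by the activations of the pretrained classifier, and the brute-force similarity computation by whatever index structure the retrieval system provides.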

Figure 8.19: The broad framework of classification and localization. [Diagram: the convolution
layers (weights fixed for both classification and regression) feed two heads. The classification
head (fully connected layers and softmax, trained for classification) outputs class probabilities.
The regression head (fully connected layers and a linear layer, trained for regression) outputs
the bounding box (four numbers).]

8.6.2 Object Localization

In object localization, we have a fixed set of objects in an image, and we would like to
identify the rectangular regions in the image in which each object occurs. The basic idea is
to take an image with a fixed number of objects and encase each of them in a bounding
box. Image localization is usually integrated with the classification problem, in which
we first wish to classify the object in the image and then draw a bounding box around it. For
simplicity, we consider the case in which there is a single object in the image. We have shown
an example of image classification and localization in Figure 8.18, in which the class “fish”
is identified, and a bounding box is drawn around the portion of the image that delineates
that class.

The bounding box of an object can be uniquely identified with four numbers. A common
choice is to identify the top-left corner of the bounding box and the two dimensions of
the box. Predicting these four numbers is a regression problem with multiple targets. Here,
the key is to understand that one can train almost the same model for both classification
and regression, which vary only in terms of the final two fully connected layers. This is
because the semantic features extracted from the convolution network are often highly
generalizable across a wide variety of tasks. Therefore, one can use the following approach:

1. First, we train a neural network classifier like AlexNet or use a pretrained version of
this classifier. In the first phase, it suffices to train the classifier only with image-class
pairs. One can even use an off-the-shelf pretrained version of the classifier, which was
trained on ImageNet.

2. The last two fully connected layers and softmax layers are removed. This removed
set of layers is referred to as the classification head. A new set of two fully connected
layers and a linear regression layer is attached. Only these layers are then trained with
training data containing images and their bounding boxes. This new set of layers is
referred to as the regression head. Note that the weights of the convolution layers are
fixed, and are not changed. Both the classification and regression heads are shown
in Figure 8.19. Since the classification and regression heads are not connected to one
another in any way, these two heads can be trained independently. The convolution
layers play the role of creating visual features for both classification and regression.

Figure 8.20: Example of object detection. Here, four objects are identified together with
their bounding boxes. The four objects are “fish,” “girl,” “bucket,” and “seat.” The image
is illustrative only.

3. One can optionally fine-tune the convolution layers to be sensitive to both classification
and regression (since they were originally trained only for classification). In such a
case, both classification and regression heads are attached, and the training data for
images, their classes, and bounding boxes are shown to the network. Backpropagation
is used to fine-tune all layers. This full architecture is shown in Figure 8.19.

4. The entire network (with both classification and regression heads attached) is then
used on the test images. The outputs of the classification head provide the class
probabilities, whereas the outputs of the regression head provide the bounding boxes.
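The steps above can be illustrated with a small NumPy sketch. Several simplifications are assumptions made for brevity and are not part of the approach itself: a fixed random projection with a ReLU stands in for the pretrained convolution layers, each head is a single linear layer rather than two fully connected layers plus an output layer, the data are synthetic, and the classification head is left untrained. The point is only that the shared features stay frozen while the regression head is fitted to the four bounding-box numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_feat, n_classes = 200, 32, 16, 3

# Assumption: a fixed random projection plus ReLU stands in for the pretrained
# convolution layers. These weights are never updated below.
W_conv = rng.normal(size=(d_in, d_feat)) / np.sqrt(d_in)
W_conv_before = W_conv.copy()

def conv_features(x):
    return np.maximum(x @ W_conv, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic "images" and bounding-box targets (x, y, width, height).
X = rng.normal(size=(n, d_in))
F = conv_features(X)                      # computed once, since W_conv is fixed
true_map = rng.normal(size=(d_feat, 4))
boxes = F @ true_map + 0.01 * rng.normal(size=(n, 4))

# Classification head (untrained in this sketch) and regression head.
W_cls = 0.1 * rng.normal(size=(d_feat, n_classes))
W_reg = np.zeros((d_feat, 4))
b_reg = np.zeros(4)

def box_loss():
    return float(np.mean((F @ W_reg + b_reg - boxes) ** 2))

loss_start = box_loss()
lr = 0.05
for _ in range(500):
    err = F @ W_reg + b_reg - boxes       # residuals of the four box coordinates
    W_reg -= lr * (F.T @ err) / n         # squared-error gradient step (constants folded into lr)
    b_reg -= lr * err.mean(axis=0)
loss_end = box_loss()

# At test time both heads read the same frozen features.
class_probs = softmax(F @ W_cls)
pred_boxes = F @ W_reg + b_reg
print(loss_start, loss_end)
```

Optional fine-tuning, as in the third step, would additionally propagate gradients from both heads into `W_conv`; that is omitted here.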

One can obtain results of superior quality by using a sliding-window approach. The basic
idea in the sliding-window approach is to perform the localization at multiple locations in
the image with the use of a sliding window, and then integrate the results of the different
runs. An example of this approach is the Overfeat method [441]. Refer to the bibliographic
notes for pointers to other localization methods.

8.6.3 Object Detection

Object detection is very similar to object localization, except that there is a variable number
of objects of different classes in the image. In this case, one wishes to identify all the objects
in the image together with their classes. We have shown an example of object detection
in Figure 8.20, in which there are four objects corresponding to the classes “fish,” “girl,”
“bucket,” and “seat.” The bounding boxes of these classes are also shown in the figure.

Object detection is generally a more difficult problem than localization because of
the variable number of outputs. In fact, one does not even know a priori how many objects
there are in the image. For example, one cannot use the architecture of the previous section,
because it is not clear how many classification or regression heads one might attach to the
convolutional layers.

The simplest approach to this problem is to use a sliding-window approach. In the
sliding-window approach, one tries all possible bounding boxes in the image, and the object
localization approach is applied to each of them to detect a single object. As a result, one
might detect different objects in different bounding boxes, or the same object in overlapping
bounding boxes. The detections from the different bounding boxes can then be integrated in
order to provide the final result. Unfortunately, the approach can be rather expensive. For an
image of size L × L, the number of possible bounding boxes is of the order of L^4. Note that
one would have to perform the classification/regression for each of these L^4 possibilities
for each image at test time. This is a problem, because one generally expects the testing
times to be modest enough to provide real-time responses.
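The L^4 growth can be checked directly. The sketch below assumes that box sides lie on pixel boundaries, so a box is fixed by choosing a top and bottom row and a left and right column; under that assumption the exact count is (L(L + 1)/2)^2, which grows as L^4/4. The function names are illustrative.

```python
def count_boxes_brute_force(L):
    """Count axis-aligned boxes with pixel-aligned sides in an L x L image."""
    count = 0
    for top in range(L):
        for bottom in range(top, L):
            for left in range(L):
                for right in range(left, L):
                    count += 1
    return count

def count_boxes_closed_form(L):
    """(L(L+1)/2) choices of row range times the same number of column ranges."""
    return (L * (L + 1) // 2) ** 2

# The enumeration and the closed form agree on small images.
for L in (2, 4, 8):
    assert count_boxes_brute_force(L) == count_boxes_closed_form(L)

# For a 224 x 224 image the count is already in the hundreds of millions.
print(count_boxes_closed_form(224))  # 635040000
```

Running a classifier and a regressor on each of these boxes for every test image is what makes the exhaustive sliding-window approach impractical.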
In order to address this issue, region proposal methods were advanced. The basic idea
of a region proposal method is that it can serve as a general-purpose object detector that
merges regions with similar pixels together to create larger regions. Therefore, the region
proposal methods are used to first create a set of candidate bounding boxes, and then the
object classification/localization method is run in each of them. Note that some candidate
regions might not have valid objects, and others might have overlapping objects. The results
on the candidate regions are then integrated to identify all the objects in the image. This
broader approach has been used in various techniques like MCG [172], EdgeBoxes [568], and
SelectiveSearch [501].
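The text does not specify how detections from overlapping candidate boxes are integrated. One widely used possibility, shown here purely as an illustration and not as the method of any of the cited techniques, is greedy non-maximum suppression based on intersection over union (IoU). Boxes follow the convention used earlier of a top-left corner plus two dimensions, written as (x, y, width, height); the threshold of 0.5 and the example boxes and scores are arbitrary assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box and drop lower-scoring boxes that overlap a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping detections of one object, plus one separate detection.
boxes = [(10, 10, 50, 50), (12, 12, 50, 50), (100, 100, 40, 40)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

In a multi-class detector this suppression would typically be applied separately to the detections of each class.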

8.6.4 Natural Language and Sequence Learning

While the preferred way of machine learning with text sequences is that of recurrent neural
networks, the use of convolutional neural networks has become increasingly popular in
recent years. At first sight, convolutional neural networks do not seem like a natural fit for
text-mining tasks. First, image shapes are interpreted in the same way, irrespective of where
they are in the image. This is not quite the case for text, where the position of a word in a
sentence seems to matter quite a bit. Second, issues such as position translation and shift
cannot be treated in the same way in text data. Neighboring pixels in an image are usually
very similar, whereas neighboring words in text are almost never the same. In spite of these
differences, the systems based on convolutional networks have shown improved performance
in recent years.
Just as an image is represented as a 2-dimensional object with an additional depth
dimension defined by the number of color channels, a text sequence is represented as a
1-dimensional object with depth defined by its dimensionality of representation. The
dimensionality of representation of a text sentence is equal to the lexicon size for the case of
one-hot encoding. Therefore, instead of 3-dimensional boxes with a spatial extent and a
depth (color channels/feature maps), the filters for text data are 2-dimensional boxes with
a window (sequence) length for sliding along the sentence and a depth defined by the
lexicon. In later layers of the convolutional network, the depth is defined by the number of
feature maps rather than the lexicon size. Furthermore, the number of filters in a given layer
defines the number of feature maps in the next layer (as in image data). In image data,
one performs convolutions at all 2-dimensional locations, whereas in text data one performs
convolutions at all 1-dimensional points in the sentence with the same filter. One challenge
with this approach is that the use of one-hot encoding increases the number of channels,
and therefore blows up the number of parameters in the filters in the first layer. The
lexicon size of a typical corpus may often be of the order of 10^6. Therefore, various types of
pretrained embeddings of words, such as word2vec or GloVe [371], are used (cf. Chapter 2)
in lieu of the one-hot encodings of the individual words. Such word encodings are
semantically rich, and the dimensionality of the representation can be reduced to a few thousand
(from a hundred-thousand). This approach can provide an order of magnitude reduction
in the number of parameters in the first layer, in addition to providing a semantically rich
representation. All other operations (like max-pooling or convolutions) in the case of text
data are similar to those of image data.
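A minimal sketch of a 1-dimensional convolution over a sentence, followed by max-pooling over positions, is given below. The sentence length, depth, window length, and number of filters are small illustrative values, and the helper names are hypothetical. The parameter count at the end uses a lexicon of a hundred thousand words and an embedding of two thousand dimensions, chosen only to match the rough magnitudes mentioned above; biases are ignored.

```python
import numpy as np

def conv1d_text(x, filters):
    """Slide each filter along the sentence (stride 1, no padding).

    x:       (seq_len, depth) matrix, one row per word
    filters: (num_filters, window, depth)
    returns: (seq_len - window + 1, num_filters) feature maps
    """
    seq_len, depth = x.shape
    num_filters, window, f_depth = filters.shape
    assert depth == f_depth, "filter depth must match representation depth"
    out = np.empty((seq_len - window + 1, num_filters))
    for t in range(seq_len - window + 1):
        # Dot product of every filter with the window of words starting at t.
        out[t] = np.tensordot(filters, x[t:t + window], axes=([1, 2], [0, 1]))
    return out

def first_layer_params(window, depth, num_filters):
    """Number of filter weights in the first layer (biases ignored)."""
    return window * depth * num_filters

rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 5))        # 7 words, each a 5-dimensional vector
filters = rng.normal(size=(4, 3, 5))      # 4 filters, window of 3 words
feature_maps = conv1d_text(sentence, filters)
pooled = feature_maps.max(axis=0)         # max-pooling over sentence positions

params_one_hot = first_layer_params(window=3, depth=100_000, num_filters=100)
params_embedded = first_layer_params(window=3, depth=2_000, num_filters=100)
print(feature_maps.shape, pooled.shape, params_one_hot, params_embedded)
```

With these illustrative sizes, the first layer shrinks from 30 million to 600 thousand weights when embeddings replace one-hot encodings.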

8.6.5 Video Classification

Videos can be considered generalizations of image data in which a temporal component
is inherent to a sequence of images. This type of data can be considered spatio-temporal
data, which requires us to generalize the 2-dimensional spatial convolutions to 3-dimensional
spatio-temporal convolutions. Each frame in a video can be considered an image, and one
therefore receives a sequence of images in time. Consider a situation in which each image
is of size 224 × 224 × 3, and a total of 10 frames are received. Therefore, the size of the
video segment is 224 × 224 × 10 × 3. Instead of performing spatial convolutions with a
2-dimensional spatial filter (with an additional depth dimension capturing 3 color channels),
we perform spatio-temporal convolutions with a 3-dimensional spatio-temporal filter (and
a depth dimension capturing the color channels). Here, it is interesting to note that the
nature of the filter depends on the data set at hand. A purely sequential data set (e.g., text)
requires 1-dimensional convolutions with windows, an image data set requires 2-dimensional
convolutions, and a video data set requires 3-dimensional convolutions. We refer to the
bibliographic notes for pointers to several papers that use 3-dimensional convolutions for
video classification.
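The following sketch applies a single 3-dimensional spatio-temporal filter to a small video segment. The 3 × 3 × 3 filter size, the absence of padding, and the reduced 16 × 16 frame size are assumptions chosen to keep the example light; the helper `valid_output_shape` then reports the output size for the 224 × 224 × 10 segment discussed above under the same assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_valid(video, filt):
    """Apply one spatio-temporal filter with stride 1 and no padding.

    video: (height, width, frames, channels)
    filt:  (fh, fw, ft, channels)
    returns: (height - fh + 1, width - fw + 1, frames - ft + 1)
    """
    fh, fw, ft, channels = filt.shape
    assert video.shape[3] == channels, "filter depth must match the color channels"
    # windows has shape (H-fh+1, W-fw+1, T-ft+1, channels, fh, fw, ft).
    windows = sliding_window_view(video, (fh, fw, ft), axis=(0, 1, 2))
    return np.einsum("hwtcijk,ijkc->hwt", windows, filt)

def valid_output_shape(input_shape, filter_shape):
    """Output extent along each convolved dimension (stride 1, no padding)."""
    return tuple(i - f + 1 for i, f in zip(input_shape, filter_shape))

rng = np.random.default_rng(0)
video = rng.normal(size=(16, 16, 10, 3))   # small stand-in for a 10-frame color clip
filt = rng.normal(size=(3, 3, 3, 3))       # 3 x 3 x 3 filter over 3 color channels
feature_map = conv3d_valid(video, filt)

print(feature_map.shape)                                  # (14, 14, 8)
print(valid_output_shape((224, 224, 10), (3, 3, 3)))      # (222, 222, 8)
```

Dropping the temporal axis from the filter and the window recovers the 2-dimensional image case, and dropping both spatial axes recovers the 1-dimensional sequence case.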
An interesting observation is that 3-dimensional convolutions add only a limited amount
to what one can achieve by averaging the classifications of individual frames by image clas-
sifiers. A part of the problem is that motion adds only a limited amount to the information
that is available in the individual frames for classification purposes. Furthermore, suffi-
ciently large video data sets are hard to come by. For example, even a data set containing a
million videos is often not sufficient because the amount of data required for 3-dimensional
convolutions is much larger than that required for 2-dimensional convolutions. Finally, 3-
dimensional convolutional neural networks are good for relatively short segments of video
(e.g., half a second), but they might not be so good for longer videos.
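The frame-averaging baseline mentioned above can be sketched as follows. The per-frame scores are hand-picked placeholders for the outputs of an image classifier, and averaging softmax probabilities (rather than, say, raw scores) is an assumption; the function name is hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classify_video_by_averaging(frame_logits):
    """Average per-frame class probabilities and return the winning class."""
    probs = softmax(frame_logits)          # (num_frames, num_classes)
    mean_probs = probs.mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs

# Placeholder scores for 10 frames and 3 classes: most frames mildly favor
# class 2, while one outlier frame strongly favors class 0.
frame_logits = np.zeros((10, 3))
frame_logits[:, 2] = 1.0
frame_logits[3] = [3.0, 0.0, 1.0]

label, mean_probs = classify_video_by_averaging(frame_logits)
print(label, mean_probs)
```

The single outlier frame is outvoted by the other nine, which is the kind of robustness that makes this simple baseline hard to beat.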
For the case of longer videos, it makes sense to combine recurrent neural networks
(or LSTMs) with convolutional neural networks. For example, we can use 2-dimensional
convolutions over individual frames, but a recurrent network is used to carry over states
from one frame to the next. One can also use 3-dimensional convolutional neural networks
over short segments of video, and then hook them up with recurrent units. Such an approach
helps in identifying actions over longer time horizons. Refer to the bibliographic notes for
pointers to methods that combine convolutional and recurrent neural networks.
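A minimal sketch of the combination is shown below, under loud assumptions: random vectors stand in for the per-frame features produced by 2-dimensional convolutions, a plain tanh recurrence is used in place of an LSTM, and all weights are random and untrained. It only illustrates how a state is carried from one frame to the next and then used for classification.

```python
import numpy as np

rng = np.random.default_rng(0)
num_frames, d_feat, d_hidden, n_classes = 30, 64, 32, 5

# Assumption: random vectors stand in for per-frame convolutional features.
frame_feats = rng.normal(size=(num_frames, d_feat))

# Untrained recurrent and output weights (illustrative only).
W_x = rng.normal(size=(d_feat, d_hidden)) / np.sqrt(d_feat)
W_h = rng.normal(size=(d_hidden, d_hidden)) / np.sqrt(d_hidden)
b_h = np.zeros(d_hidden)
W_out = rng.normal(size=(d_hidden, n_classes)) / np.sqrt(d_hidden)

def run_recurrent(feats):
    """Carry a hidden state across frames and return the final state."""
    h = np.zeros(d_hidden)
    for f in feats:
        h = np.tanh(f @ W_x + h @ W_h + b_h)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h_final = run_recurrent(frame_feats)
video_probs = softmax(h_final @ W_out)

# Unlike frame averaging, the recurrence is sensitive to the order of the frames.
h_reversed = run_recurrent(frame_feats[::-1])
print(video_probs, np.allclose(h_final, h_reversed))
```

Replacing the per-frame features with the outputs of 3-dimensional convolutions over short segments gives the second variant described above.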

8.7 Summary
This chapter discusses the use of convolutional neural networks with a primary focus on
image processing. These networks are biologically inspired and are among the earliest suc-
cess stories of the power of neural networks. An important focus of this chapter is the
classification problem, although these methods can be used for additional applications such
as unsupervised feature learning, object detection, and localization. Convolutional neural
networks typically learn hierarchical features in different layers, where the earlier layers
learn primitive shapes, whereas the later layers learn more complex shapes. The backprop-
agation methods for convolutional neural networks are closely related to the problems of
deconvolution and visualization. Recently, convolutional neural networks have also been
used for text processing, where they have shown competitive performance with recurrent
neural networks.

8.8 Bibliographic Notes

The earliest inspiration for convolutional neural networks came from Hubel and Wiesel’s
experiments with the cat’s visual cortex [212]. Based on many of these principles, the notion
of the neocognitron was proposed in early work. These ideas were then generalized to the
first convolutional network, which was referred to as LeNet-5 [279]. An early discussion on
the best practices and principles of convolutional neural networks may be found in [452].
An excellent overview of convolutional neural networks may be found in [236]. A tutorial on
convolution arithmetic is available in [109]. A brief discussion of applications may be found
in [283].
The earliest data set that was used popularly for training convolutional neural net-
works was the MNIST database of handwritten digits [281]. Later, larger datasets like
ImageNet [581] became more popular. Competitions such as the ImageNet challenge
(ILSVRC) [582] have served as sources of some of the best algorithms over the last five
years. Examples of neural networks that have done well at various competitions include
AlexNet [255], ZFNet [556], VGG [454], GoogLeNet [485], and ResNet [184]. The ResNet
is closely related to highway networks [505], and it provides an iterative view of feature
engineering. A useful precursor to GoogLeNet was the Network-in-Network (NiN) architec-
ture [297], which illustrated some useful design principles of the inception module (such
as the use of bottleneck operations). Several explanations of why ResNet works well are
provided in [185, 505]. The use of inception modules between skip connections is proposed
in [537]. The use of stochastic depth in combination with residual networks is discussed
in [210]. Wide residual networks are proposed in [549]. A related architecture, referred to
as FractalNet [268], uses both short and long paths in the network, but does not use skip
connections. Training is done by dropping subpaths in the network, although prediction is
done on the full network.
Off-the-shelf feature extraction methods with pretrained models are discussed in [223,
390, 585]. In cases where the nature of the application is very different from ImageNet data,
it might make sense to extract features only from the lower layers of the pretrained model.
This is because lower layers often encode more generic/primitive features like edges and basic
shapes, which tend to work across an array of settings. The local-response normalization
approach is closely related to the contrast normalization discussed in [221].
The work in [466] proposes that it makes sense to replace the max-pooling layer with
a convolutional layer with increased stride. Not using a max-pooling layer is an advantage
