Deep Learning for Computer Vision: A comparison between Convolutional Neural Networks and Hierarchical Temporal Memories on object recognition tasks - Slides

Alma Mater Studiorum - University of Bologna
School of Science
Department of Computer Science and Engineering (DISI)
Deep Learning for Computer Vision
A comparison between Convolutional Neural
Networks and Hierarchical Temporal Memories on
object recognition tasks
Candidate: dott. Vincenzo Lomonaco
Supervisor: prof. Davide Maltoni
Co-examiner: prof. Mauro Gaspari

08.09.15 Vincenzo Lomonaco
Contents
● Background & Motivations
– Objectives
● Introduction
– CNN and HTM
– Key features
– Implementations
● NORB-sequences
– Original NORB dataset
– New benchmark design
● Experiments and Results
– Experiments design
– Results
● Conclusions

Deep Learning
In the last decade, Deep Learning techniques have been shown to
perform remarkably well on a large variety of problems in both
Computer Vision and Natural Language Processing, achieving
state-of-the-art results on many tasks.

Deep Learning advantages
Deep Learning is a branch of machine learning based on a set of
algorithms that attempt to model high-level abstractions in data by
using model architectures composed of multiple non-linear
transformations.

Deep Learning disadvantages
Possible limitations:
● Poorly understood surrounding theory
● Non-optimal method
● Very difficult to train
● Huge quantity of data needed
● High Performance Computing environment needed

Objectives
Show that taking inspiration from biological learning
systems can once again help advance the field of DL.
Show that good levels of accuracy can still be reached
with less data.

How
By comparing two very different deep learning algorithms on
object recognition tasks:
– CNN: classical approach, state of the art for object
recognition
– HTM: new biologically inspired approach
We would like to show that, with less training data available,
HTM can outperform CNN on these tasks while remaining
comparable in terms of training time.

CNN
CNNs are MLP variants in which individual neurons are tiled in
such a way that they respond to overlapping regions of the
visual field. They are architecturally inspired by Hubel and
Wiesel’s early work on the cat’s visual cortex.
Key features:
● Purely supervised method
● Sparse connectivity
● Shared weights
Implementation:
● Python
● Using Theano
● 11 source files, 2550+ lines
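The two structural key features of a CNN, sparse connectivity and shared weights, can be illustrated with a minimal pure-Python sketch. This is not the Theano implementation referred to in these slides; the function names `conv2d_valid` and `parameter_counts` are hypothetical and exist only for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sparse connectivity: each output unit reads only a kh x kw patch.
            # Shared weights: the same kernel values are reused at every (i, j).
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


def parameter_counts(in_h, in_w, kh, kw):
    """Weights of one conv feature map vs. a dense layer with the same number of outputs."""
    out_units = (in_h - kh + 1) * (in_w - kw + 1)
    conv_weights = kh * kw                   # one kernel, shared everywhere
    dense_weights = in_h * in_w * out_units  # every output sees every pixel
    return conv_weights, dense_weights


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(conv2d_valid(image, kernel))     # [[6.0, 8.0], [12.0, 14.0]]
    print(parameter_counts(96, 96, 5, 5))  # (25, 78004224)
```

For a 96x96 input and a 5x5 kernel (sizes chosen here only as an example), weight sharing reduces the weights of one feature map from tens of millions to 25, ignoring biases.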

HTM
HTM is an emerging paradigm that is more biologically
inspired. It tries to incorporate into the learning process
concepts typical of the human brain, such as time, context
and attention.
Key features:
● Mainly unsupervised method
● Top-down and bottom-up
information flow
● Bayesian probabilistic
formulation
Implementation:
● C#, OpenMP version
● Provided by Biometric
System Lab (DISI)

NORB-Sequences
Since the computer vision community is starting to investigate
object recognition algorithms on videos, we would like to move
our comparison in that direction.
To this end, a new benchmark consisting of a large collection of
image sequences has been created, starting from the well-known
small NORB dataset.
The original NORB dataset:
● Contains 48,600 96x96 images (5 categories, 10 instances, 6 lightings,
9 elevations, and 18 azimuths).
● Is well known and accepted by the research community in the
context of object recognition
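The image count quoted above follows directly from the listed factors. A short check, using only the numbers given on the slide (the dictionary keys are just descriptive labels):

```python
# Factors of variation listed for the small NORB dataset
factors = {
    "categories": 5,
    "instances per category": 10,
    "lightings": 6,
    "elevations": 9,
    "azimuths": 18,
}

total_images = 1
for count in factors.values():
    total_images *= count

print(total_images)  # 48600
```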

Original NORB dataset
[Figure: training instances and test instances]

Java sequencer
NORB-sequences was made possible by a Java program that
takes the small NORB dataset as input and, given a number of
tuning parameters, returns a number of training and test
image sequences.
[Figure: example image sequence along the time axis]
Key features:
● The sequences are created ad hoc to simulate a camera moving
around a specific object, including changes in the surrounding lighting.
● Integrated KNN baseline, GUI, 10 source files, 2600+ lines
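The idea of a camera moving around one object can be sketched as a random walk over the NORB pose grid. The following Python sketch is an assumption made for illustration: it is not the Java sequencer, and the step rule, the function name `make_sequence` and its parameters are hypothetical. Only the grid sizes (9 elevations, 18 azimuths, 6 lightings) come from the slides.

```python
import random

# Pose grid sizes of the small NORB dataset (from the slides)
N_ELEVATIONS, N_AZIMUTHS, N_LIGHTINGS = 9, 18, 6


def make_sequence(length, seed=0):
    """Return `length` (elevation, azimuth, lighting) index triples for one object.

    Hypothetical rule: at each step exactly one factor moves by one position,
    so consecutive frames differ by a small camera or lighting change.
    """
    rng = random.Random(seed)
    elev = rng.randrange(N_ELEVATIONS)
    azim = rng.randrange(N_AZIMUTHS)
    light = rng.randrange(N_LIGHTINGS)
    frames = [(elev, azim, light)]
    while len(frames) < length:
        factor = rng.choice(["elevation", "azimuth", "lighting"])
        step = rng.choice([-1, 1])
        if factor == "elevation":
            new_elev = elev + step
            if not 0 <= new_elev < N_ELEVATIONS:
                continue  # elevation is bounded: discard the move and redraw
            elev = new_elev
        elif factor == "azimuth":
            azim = (azim + step) % N_AZIMUTHS  # assumed to wrap around the object
        else:
            new_light = light + step
            if not 0 <= new_light < N_LIGHTINGS:
                continue  # lighting index is bounded: discard and redraw
            light = new_light
        frames.append((elev, azim, light))
    return frames


if __name__ == "__main__":
    for frame in make_sequence(5, seed=42):
        print(frame)
```

Each triple would index one image of a given object instance; treating the lighting conditions as an ordered index is a simplification of this sketch.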

NORB Sequences GUI

Experiments design
1) Validate the CNN implementation on the NORB dataset
2) Evaluate the performance of both algorithms on the plain
NORB dataset
3) Evaluate the performance of both algorithms on the NORB
sequences

CNN validation
In order to validate the new implementation, the goal was to
reproduce Y. LeCun's original results on the plain NORB
dataset.
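The conclusions name the validated network as LeNet-7. As a rough sketch of how a 96x96 NORB image shrinks through such a network, the snippet below traces feature-map sizes layer by layer. The kernel and pooling sizes are an assumption based on the commonly cited LeNet-7 description and are not stated in these slides; the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, window):
    """Output side length of non-overlapping subsampling."""
    if size % window != 0:
        raise ValueError("window must divide the feature-map size")
    return size // window


# Assumed layer sizes (not taken from the slides): (layer type, kernel or window side)
ASSUMED_LENET7_LAYERS = [
    ("conv", 5),
    ("pool", 4),
    ("conv", 6),
    ("pool", 3),
    ("conv", 6),
]


def trace_sizes(input_size, layers):
    """Return the feature-map side length after the input and after each layer."""
    sizes = [input_size]
    for kind, k in layers:
        step = conv_out if kind == "conv" else pool_out
        sizes.append(step(sizes[-1], k))
    return sizes


if __name__ == "__main__":
    print(trace_sizes(96, ASSUMED_LENET7_LAYERS))  # [96, 92, 23, 18, 6, 1]
```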

Plain NORB results
Comparison of accuracy between CNN and HTM on the
plain NORB dataset.

Training times
Comparison of training times between CNN and HTM on the
NORB sequences.
Training size      CNN time (min)   HTM time (min)
100 + 800jit       10.94            21.19
250 + 2000jit      31.15            23.13
500 + 4000jit      38.24            22.14
1000 + 4000jit     91.26            26.04
2500 + 4000jit     94.90            61.08
5000 + 4000jit     124.7            89.58
10000 + 4000jit    187.7            143.5
24300 + 4000jit    51.31            596.2
Architectures:
● CNN: GPU Tesla C2075 Fermi
(GPU speedup x3.2)
● HTM: CPU Xeon W3550, 4
cores.

NORB sequences results
Comparison of accuracy between CNN and HTM on the
NORB sequences.
5 classes:
Train   Test dist.   CNN      HTM
2x20    1            84.1%    86.36%
2x20    2            82.96%   86.16%
2x20    3            82.26%   86.16%
2x20    4            81.35%   84.20%
3x20    1            89.27%   91.0%
3x20    2            88.46%   89.52%
3x20    3            88.42%   88.68%
3x20    4            88.0%    85.88%
5x20    1            95.11%   92.69%
5x20    2            94.4%    91.62%
5x20    3            93.1%    92.86%
5x20    4            92.13%   91.23%
50 classes:
Train   Test dist.   CNN      HTM
2x20    1            37.56%   37.08%
2x20    2            34.38%   37.82%
2x20    3            30.71%   33.17%
2x20    4            25.47%   28.89%
3x20    1            49.89%   43.68%
3x20    2            48.01%   44.08%
3x20    3            40.56%   37.93%
3x20    4            33.77%   34.93%
5x20    1            55.17%   52.57%
5x20    2            52.55%   49.74%
5x20    3            45.86%   45.52%
5x20    4            40.08%   41.30%
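To make the trend in the 5-class results (the first table above) easier to read, the rows can be tallied per training configuration. The numbers below are copied from that table; the function name `htm_wins_by_train_size` is hypothetical.

```python
# (train, test_dist, cnn_accuracy, htm_accuracy) rows copied from the 5-class table
ROWS_5_CLASSES = [
    ("2x20", 1, 84.10, 86.36),
    ("2x20", 2, 82.96, 86.16),
    ("2x20", 3, 82.26, 86.16),
    ("2x20", 4, 81.35, 84.20),
    ("3x20", 1, 89.27, 91.00),
    ("3x20", 2, 88.46, 89.52),
    ("3x20", 3, 88.42, 88.68),
    ("3x20", 4, 88.00, 85.88),
    ("5x20", 1, 95.11, 92.69),
    ("5x20", 2, 94.40, 91.62),
    ("5x20", 3, 93.10, 92.86),
    ("5x20", 4, 92.13, 91.23),
]


def htm_wins_by_train_size(rows):
    """For each training configuration, return (rows where HTM beats CNN, total rows)."""
    wins = {}
    for train, _test_dist, cnn, htm in rows:
        won, total = wins.get(train, (0, 0))
        wins[train] = (won + int(htm > cnn), total + 1)
    return wins


if __name__ == "__main__":
    print(htm_wins_by_train_size(ROWS_5_CLASSES))
    # {'2x20': (4, 4), '3x20': (3, 4), '5x20': (0, 4)}
```

On these 5-class numbers, HTM is ahead in every row for the smallest training configuration and in none for the largest, which is consistent with the stated goal of showing an HTM advantage when less training data is available.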

Conclusions
In this dissertation, three milestones have been
achieved:
1) A LeNet-7 has been successfully implemented with Theano.
2) A new benchmark for object recognition in image
sequences has been created.
3) HTM and CNN have been compared on different object
recognition tasks.
The results show that the bio-inspired HTM approach can
be highly competitive and could be instrumental in
advancing the field of Deep Learning.

The End
http://vincenzolomonaco.com
vincenzo.lomonaco@studio.unibo.it
“If we want machines to think, we need to teach them to see”
Fei-Fei Li, Stanford Computer Vision Lab
Thank you for your attention
Vincenzo Lomonaco
