cv_2025_Spring_16

The document outlines the course CSCI-B 457: Introduction to Computer Vision, focusing on deep learning and traditional recognition approaches. It discusses the evolution from hand-designed features to learning feature hierarchies through deep architectures, emphasizing the importance of convolutional neural networks (CNNs) in image classification. Additionally, it covers concepts such as linear classifiers, backpropagation, and the success of CNNs in various recognition tasks, including the ImageNet Challenge.


CSCI-B 457

Introduction To Computer Vision


Luddy School of Informatics, IUB
Spring, 2025
Instructor: Xuhong Zhang
Deep Learning Advance II
Traditional Recognition Approach

Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

• Features are not learned


• Trainable classifier is often generic (e.g. SVM)

Slide credit: Rob Fergus


Traditional Recognition Approach
• Features are key to recent progress in recognition
• Multitude of hand-designed features currently in use
• SIFT, HOG, …
• Where next? Better classifiers? Or keep building more features?

Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2007; Yan & Huang (winner of the PASCAL 2010 classification competition)

Slide credit: Rob Fergus


What about learning the features?
• Learn a feature hierarchy all the way from pixels to classifier
• Each layer extracts features from the output of the previous layer
• Train all layers jointly

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

Slide credit: Rob Fergus


“Shallow” vs. “deep” architectures
Traditional recognition: “Shallow” architecture

Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

Deep learning: “Deep” architecture


Image/Video Pixels → Layer 1 → … → Layer N → Simple classifier → Object Class

Slide credit: Rob Fergus


Linear classifiers
• Linear classifiers model the boundary between
two classes:
[Figure: two classes of points in the (x1, x2) plane, separated by a straight line]

Adapted from K. Hauser’s slide


Plane Geometry
• In 3𝐷, a plane can be expressed as the set of
solutions (𝑥, 𝑦, 𝑧) to the equation 𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧 + 𝑑 = 0
• 𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧 + 𝑑 > 0 is one side of the plane
• 𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧 + 𝑑 < 0 is the other side
• 𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧 + 𝑑 = 0 is the plane itself
[Figure: a plane in 3D with coordinate axes x, y, z]

Adapted from K. Hauser’s slide


Linear Classifier

• In 𝑑 dimensions, 𝑐0 + 𝑐1 𝑥1 + ⋯ + 𝑐𝑑 𝑥𝑑 = 0 is a hyperplane.
• Idea:
  • Use 𝑐0 + 𝑐1 𝑥1 + ⋯ + 𝑐𝑑 𝑥𝑑 ≥ 0 to denote positive classifications
  • Use 𝑐0 + 𝑐1 𝑥1 + ⋯ + 𝑐𝑑 𝑥𝑑 < 0 to denote negative classifications

Adapted from K. Hauser’s slide
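As an illustration (not from the slides), here is a minimal sketch of this decision rule in NumPy; the coefficient vector and the test points are made-up values.

import numpy as np

# Hyperplane parameters: c[0] is the bias c0, c[1:] are c1..cd.
# The values are arbitrary, chosen only to illustrate the decision rule.
c = np.array([-1.0, 2.0, 0.5])   # c0, c1, c2  (d = 2 dimensions)

def classify(x):
    """Return +1 if c0 + c1*x1 + ... + cd*xd >= 0, else -1."""
    score = c[0] + np.dot(c[1:], x)
    return 1 if score >= 0 else -1

print(classify(np.array([1.0, 0.0])))   # -1 + 2*1 + 0.5*0 = 1.0 >= 0  -> +1
print(classify(np.array([0.0, 1.0])))   # -1 + 2*0 + 0.5*1 = -0.5 < 0  -> -1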


Perceptrons

“the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”
Frank Rosenblatt, 1958
Unit (Neuron)

[Figure: a unit with inputs x1, …, xn, weights wi, a summation Σ, and an activation function g producing output y]

y = g(Σᵢ wᵢ xᵢ),   g(u) = 1 / (1 + exp(−a·u))   (sigmoid)

Adapted from K. Hauser’s slide
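A minimal sketch of such a unit in NumPy (the example weights, inputs, and the steepness parameter a are illustrative assumptions, not from the slides):

import numpy as np

def sigmoid(u, a=1.0):
    """g(u) = 1 / (1 + exp(-a*u))"""
    return 1.0 / (1.0 + np.exp(-a * u))

def unit_output(x, w, a=1.0):
    """y = g(sum_i w_i * x_i) for one neuron."""
    return sigmoid(np.dot(w, x), a)

x = np.array([0.5, -1.0, 2.0])      # example inputs (made up)
w = np.array([0.8,  0.3, -0.5])     # example weights (made up)
print(unit_output(x, w))            # a value between 0 and 1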


Training with Neurons

• Treat the problem as one of minimizing errors between the example label and the network output, given the example and weights as input
  • Error(xᵢ, yᵢ, w) = (yᵢ − f(xᵢ, w))²

• Sum this error term over all examples
  • E(w) = Σᵢ Error(xᵢ, yᵢ, w) = Σᵢ (yᵢ − f(xᵢ, w))²

• Minimize errors using an optimization algorithm


• Gradient descent is typically used

Adapted from K. Hauser’s slide
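A minimal sketch of this error function for the single sigmoid unit sketched above; the tiny dataset is invented purely for illustration.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def f(x, w):
    """Network output for one example: a single sigmoid unit."""
    return sigmoid(np.dot(w, x))

def total_error(X, y, w):
    """E(w) = sum_i (y_i - f(x_i, w))^2"""
    return sum((yi - f(xi, w)) ** 2 for xi, yi in zip(X, y))

# Toy dataset: 3 examples with 2 features each, binary labels (made up).
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0])
w = np.zeros(2)
print(total_error(X, y, w))   # error before any training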


The gradient direction 𝛻𝐸 is orthogonal to the level sets (contours) of E and points in the direction of steepest increase.

Adapted from K. Hauser’s slide




Gradient descent: iteratively move in direction −𝛻E

Adapted from K. Hauser’s slide
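A minimal sketch of the update w ← w − η·𝛻E(w), continuing the sigmoid-unit example above. The step size, iteration count, and the finite-difference gradient estimate are illustrative choices, not the method prescribed by the slides.

import numpy as np

def numeric_gradient(E, w, eps=1e-5):
    """Finite-difference estimate of the gradient of E at w."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (E(w + step) - E(w - step)) / (2 * eps)
    return grad

def gradient_descent(E, w0, lr=0.5, iters=100):
    """Iteratively move in the direction -grad E."""
    w = w0.copy()
    for _ in range(iters):
        w -= lr * numeric_gradient(E, w)
    return w

# Example usage with the toy total_error / dataset defined earlier:
# w_star = gradient_descent(lambda w: total_error(X, y, w), np.zeros(2))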




Two-Layer Feed-Forward Neural Network

[Figure: inputs → hidden layer (weights w1j) → output layer (weights w2k)]

Adapted from K. Hauser’s slide
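A minimal sketch of the forward pass of such a two-layer network; the layer sizes and random weights are made up, since the slide only shows the structure.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W1, b1, W2, b2):
    """Inputs -> hidden layer -> output layer, sigmoid units throughout."""
    h = sigmoid(W1 @ x + b1)      # hidden activations
    y = sigmoid(W2 @ h + b2)      # output activations
    return h, y

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # 2 inputs -> 3 hidden units
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # 3 hidden -> 1 output
_, y = forward(np.array([0.5, -0.5]), W1, b1, W2, b2)
print(y)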


Multi-Layer Generalization

Slide credit: T. Martinez


Networks with hidden layers

• Can represent XORs, other nonlinear functions


• Many, many variants:
• Different network structures
• Different activation functions
• Etc…
• As the number of hidden units increases, the network’s
capacity to learn more complicated functions also
increases

• How to train hidden layers?


Adapted from K. Hauser’s slide
Backpropagation Algorithm

• Werbos (1974); Rumelhart, Hinton, and Williams (1986)


• Until convergence:
• Present a training pattern to network
• Calculate the error of the output nodes
• Calculate the error of the hidden nodes, based on the output
node error which is propagated back
• Continue back-propagating error until the input layer
• Update all weights in the network
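A minimal sketch of these steps for the two-layer sigmoid network sketched earlier, using a squared-error loss; the learning rate and shapes are illustrative assumptions.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def backprop_step(x, t, W1, b1, W2, b2, lr=0.1):
    """One weight update for a two-layer sigmoid network with squared error."""
    # Present the training pattern to the network (forward pass).
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)

    # Error of the output nodes (derivative of (y - t)^2 through the sigmoid).
    delta_out = 2 * (y - t) * y * (1 - y)

    # Error of the hidden nodes: output error propagated back through W2.
    delta_hid = (W2.T @ delta_out) * h * (1 - h)

    # Update all weights in the network (in place).
    W2 -= lr * np.outer(delta_out, h)
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x)
    b1 -= lr * delta_hid
    return np.sum((y - t) ** 2)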
Background: Multi-Layer Neural Networks

• Nonlinear classifier
• Training: find network weights w to minimize the error between true
training labels yi and estimated labels fw(xi):
E(w) = Σᵢ₌₁ᴺ (yᵢ − f_w(xᵢ))²

• Minimization can be done by gradient descent, provided f is differentiable
• This training method is called back-propagation

Slide credit: Rob Fergus
Concepts

• Fully Connected Layer


• Convolutional Layer
• CNN Pipeline
• A Little Bit of History
Classification

Pedestrian Car Motorcycle Truck

What we want:
h_w(x) ∈ ℝᴷ, with one entry per class:
• h_w(x) ≈ [1, 0, 0, 0]ᵀ when the image is a pedestrian
• h_w(x) ≈ [0, 1, 0, 0]ᵀ when it is a car
• h_w(x) ≈ [0, 0, 1, 0]ᵀ when it is a motorcycle
• h_w(x) ≈ [0, 0, 0, 1]ᵀ when it is a truck
Classification: Softmax Classifier
• The Softmax classifier (also known as multinomial logistic regression)
• Remember that we can get a score for each class
• Key: we want to interpret the raw scores as probabilities
  o Probabilities must be ≥ 0
  o Probabilities must sum to 1

P(Y = k | X = xᵢ) = e^(s_k) / Σⱼ e^(s_j)
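A minimal sketch of this conversion from raw class scores to probabilities; the scores are made up, and subtracting the max is a standard numerical-stability trick not shown on the slide.

import numpy as np

def softmax(scores):
    """P(Y = k | X = x_i) = exp(s_k) / sum_j exp(s_j)."""
    shifted = scores - np.max(scores)       # for numerical stability
    exp_s = np.exp(shifted)
    return exp_s / np.sum(exp_s)

scores = np.array([3.2, 5.1, -1.7])         # raw scores for 3 classes (made up)
probs = softmax(scores)
print(probs, probs.sum())                   # non-negative, sums to 1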
Fully Connected Layer

• 32 × 32 × 3 image -> stretch to 3072 × 1

input x: 3072 × 1  →  weights W: 10 × 3072  →  activation Wx: 10 × 1 (one score per class)
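A minimal sketch of this layer in NumPy; the slide specifies only the shapes, so the image and weight values here are random placeholders.

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))          # a 32x32x3 input image
x = image.reshape(3072)                  # stretch to a 3072-vector

W = rng.normal(size=(10, 3072))          # weights W: 10 x 3072
scores = W @ x                           # activation Wx: 10 class scores
print(scores.shape)                      # (10,)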
Convolutional Layer

• 32 × 32 × 3 image -> preserve spatial structure

32 × 32 × 3 image (32 height × 32 width × 3 depth); 5 × 5 × 3 filter.

Each output is 1 number: the result of taking a dot product between the filter and a small 5×5×3 chunk of the image (i.e. a 5·5·3 = 75-dimensional dot product plus a bias): wᵀx + b
Convolutional Layer

Convolving (sliding) the 5 × 5 × 3 filter over all spatial locations of the 32 × 32 × 3 image produces a 28 × 28 × 1 activation (feature) map.
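A minimal sketch of this operation with plain loops (stride 1, no padding); the image and filter values are random placeholders.

import numpy as np

def conv2d_single(image, filt, bias=0.0):
    """Slide one fh x fw x C filter over an H x W x C image, stride 1, no padding."""
    H, W, C = image.shape
    fh, fw, _ = filt.shape
    out = np.zeros((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            chunk = image[i:i + fh, j:j + fw, :]        # 5x5x3 chunk of the image
            out[i, j] = np.sum(chunk * filt) + bias     # 75-dim dot product + bias
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
filt = rng.normal(size=(5, 5, 3))
fmap = conv2d_single(image, filt)
print(fmap.shape)        # (28, 28): one activation map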
Convolutional Layer
For example, if we have six 5 × 5 filters, we will get 6 separate 28 × 28 activation maps. We stack these up to get a new “image” of size 28 × 28 × 6!


Convolutional Layer
A ConvNet is a sequence of convolutional layers, interspersed with activation functions.

32 × 32 × 3 input → CONV + ReLU (e.g. six 5×5×3 filters) → 28 × 28 × 6 → CONV + ReLU (e.g. ten 5×5×6 filters) → 24 × 24 × 10 → …
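The spatial sizes above follow from the rule output = (N − F + 2P) / S + 1 for input size N, filter size F, padding P, and stride S. A small sketch of this arithmetic, written as a hypothetical helper:

def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution: (N - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(32, 5))   # 28: first conv layer (six 5x5x3 filters)
print(conv_output_size(28, 5))   # 24: second conv layer (ten 5x5x6 filters)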
Convolutional Neural Networks (CNN,
Convnet)
• Neural network with specialized
connectivity structure
• Stack multiple stages of feature
extractors
• Higher stages compute more global,
more invariant features
• Classification layer at the end

Slide credit: Rob Fergus


Convolutional Neural Networks (CNN,
Convnet)
• Feed-forward feature extraction:
  1. Convolve input with learned filters
  2. Non-linearity
  3. Spatial pooling
  4. Normalization
• Supervised training of convolutional filters by back-propagating classification error

[Figure: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps]

Slide credit: Rob Fergus


1. Convolution
• Dependencies are local
• Translation invariance
• Few parameters (filter weights)
• Stride can be greater than 1
(faster, less memory)

[Figure: local connections from the input to a feature map]

Slide credit: Rob Fergus
2. Non-Linearity

• Per-element (independent)
• Options:
• Tanh
• Sigmoid: 1/(1+exp(-x))
• Rectified linear unit (ReLU)
• Simplifies backpropagation
• Makes learning faster
• Avoids saturation issues
→ Preferred option

Slide credit: Rob Fergus
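A minimal sketch of two of the per-element options listed above (ReLU and sigmoid) applied to a feature map; the input values are placeholders.

import numpy as np

def relu(x):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(0, x)

def sigmoid(x):
    """Sigmoid, applied element-wise: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

fmap = np.array([[-1.5, 0.3], [2.0, -0.2]])   # example feature-map values
print(relu(fmap))        # negatives clipped to 0
print(sigmoid(fmap))     # squashed into (0, 1)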


3. Spatial Pooling
• Sum or max
• Non-overlapping / overlapping regions
• Role of pooling:
• Invariance to small transformations
• Larger receptive fields (see more of input)

[Figure: max pooling and sum pooling over local regions]

Slide credit: Rob Fergus
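A minimal sketch of non-overlapping 2×2 max and sum pooling on a single feature map; the map values are made up.

import numpy as np

def pool2x2(fmap, op=np.max):
    """Non-overlapping 2x2 pooling with the given reduction (np.max or np.sum)."""
    H, W = fmap.shape
    out = np.zeros((H // 2, W // 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = op(fmap[2*i:2*i+2, 2*j:2*j+2])
    return out

fmap = np.array([[1., 2., 0., 1.],
                 [3., 4., 2., 2.],
                 [0., 1., 5., 6.],
                 [1., 0., 7., 8.]])
print(pool2x2(fmap, np.max))   # [[4, 2], [1, 8]]
print(pool2x2(fmap, np.sum))   # [[10, 5], [2, 26]]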
Pooling
• Create some translational invariance at each level
by averaging 4 neighboring replicated detectors to
give a single output to the next level.
• Reduces number of inputs to the next layer of feature
extraction, thus allowing us to have many more different
feature maps.
• Taking the maximum of the four works slightly better.
• Problem: After several levels of pooling, we lose
information about where objects are.
• This makes it impossible to use the precise spatial
relationships between high-level parts for recognition.
• So CNNs are good for classification, not (directly) useful for
object localization.

Slide credit: G. Hinton


4. Normalization
• Within or across feature maps
• Before or after spatial pooling

[Figure: feature maps before and after contrast normalization]

Slide credit: Rob Fergus


Compare: SIFT Descriptor
Lowe [IJCV 2004]

Image Pixels → Apply oriented filters → Spatial pool (sum) → Normalize to unit length → Feature Vector

Slide credit: Rob Fergus


Convnet Successes
• Handwritten text/digits
• MNIST (0.17% error [Ciresan et al. 2011])
• Arabic & Chinese [Ciresan et al. 2012]

• Simpler recognition benchmarks


• CIFAR-10 (9.3% error [Wan et al. 2013])
• Traffic sign recognition
• 0.56% error vs 1.16% for humans
[Ciresan et al. 2011]

• But until recently, less good at more complex datasets
  • Caltech-101/256 (few training examples)
Slide credit: Rob Fergus
ImageNet Challenge 2012
• ~14 million labeled images, 20K classes
• Images gathered from the Internet
• Human labels via Amazon Mechanical Turk
• Challenge: 1.2 million training images, 1000 classes

[Figure: validation classification examples]

[Deng et al. CVPR 2009]

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks,
NIPS 2012
Slide credit: Rob Fergus
ImageNet Challenge 2012
• Similar framework to LeCun’98 but:
• Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)
• More data (10⁶ vs. 10³ images)
• GPU implementation (50x speedup over CPU)
• Trained on two GPUs for a week
• Better regularization for training (DropOut)

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks,
NIPS 2012
Slide credit: Rob Fergus
ImageNet Challenge 2012
• Krizhevsky et al. -- 16.4% error (top-5)
• Next best (non-convnet) – 26.2% error
[Bar chart: top-5 error rate (%) for SuperVision, ISI, Oxford, INRIA, Amsterdam]

Slide credit: Rob Fergus


Tools for deep learning
• Torch
• NYU, framework in Lua, supported by Facebook
• Caffe
• Berkeley, C++ with Python and Matlab wrappers, very active open-source
community
• Theano/Pylearn2
• U. Montreal, framework in Python, symbolic computation and automatic
differentiation
• Cuda-Convnet2
• Alex Krizhevsky; very fast on state-of-the-art GPUs with multi-GPU parallelism, C++ / CUDA library
• MatConvNet
• CXXNet
• Mocha
• …
Slide credit: X. Chen
Training data
• For 60 million parameters, how much training data do we
need?

• How do we get that much data?

• How do we avoid over-fitting?


Tricks to improve generalization
• Train on random 224x224 patches from the 256x256
images to get more data. Also use left-right reflections of
the images.

• Use “dropout” to regularize the weights in the globally connected layers (which contain most of the parameters).
  – Dropout means that half of the hidden units in a layer are randomly removed for each training example.
  – This stops hidden units from relying too much on other hidden units.

Slide credit: G. Hinton
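A minimal sketch of these two tricks (random 224×224 crops with left-right reflections, and dropout with rate 0.5). The array shapes and random seed are illustrative, and the train/test rescaling that usually accompanies dropout is omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)

def random_crop_and_flip(image, size=224):
    """Take a random size x size patch and randomly mirror it left-right."""
    H, W, _ = image.shape
    top = rng.integers(0, H - size + 1)
    left = rng.integers(0, W - size + 1)
    patch = image[top:top + size, left:left + size, :]
    if rng.random() < 0.5:
        patch = patch[:, ::-1, :]            # left-right reflection
    return patch

def dropout(activations, p=0.5):
    """Randomly remove (zero out) a fraction p of the hidden units for this example."""
    mask = rng.random(activations.shape) >= p
    return activations * mask

image = rng.random((256, 256, 3))
print(random_crop_and_flip(image).shape)     # (224, 224, 3)
print(dropout(np.ones(8)))                   # roughly half the units zeroed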
