Chapter 1 - Vision AI

Uploaded by

Muhammad Ashraf

Vision AI

Introduction

Notes based on
EECS 498-007 / 598-005
Deep Learning for Computer Vision
At University of Michigan

9/10/2023 Zafri Baharuddin EEEB4023/ECEB463 AI 1 The Energy University


Deep Learning for Computer Vision

Building artificial systems that process, perceive, and reason about visual data



Computer Vision is everywhere
Image credits, left to right:
Row 1: Intel (copyrighted), Unsplash, Unsplash, NASA (public domain)
Row 2: NASA (public domain), Unsplash, Unsplash, Unsplash
Row 3: Unsplash, Unsplash, Unsplash, ExtremeTech (copyrighted)



Deep Learning for Computer Vision

Building artificial systems that learn from data and experience



Deep Learning for Computer Vision

Hierarchical learning algorithms with many “layers”, (very) loosely inspired by the brain



Artificial Intelligence

Machine Learning
Computer Vision
Deep Learning

Our focus area



A brief history of computer vision and deep learning



Hubel and Wiesel, 1959 [source]

[Figure labels: Display, Measure brain activity; Stimulus, Response, No response]

Simple cells: response to light orientation
Complex cells: response to light orientation and movement
Hypercomplex cells: response to movement with an end point



Larry Roberts, 1963

(a) Original picture (b) Differentiated picture (c) Feature points selected



Stages of Visual Representation, David Marr, 1970s

Example: Input image, Edge image, 2 ½-D Sketch, 3-D Model (images from Unsplash and pixabay)

Input Image: perceived intensities
Primal Sketch: zero crossings, blobs, edges, bars, ends, virtual lines, groups, curves, boundaries
2 ½-D Sketch: local surface orientation and discontinuities in depth and in surface orientation
3-D Model Representation: 3-D models hierarchically organized in terms of surface and volumetric primitives



Recognition via Parts (1970s)

Generalized Cylinders, Brooks and Binford, 1979
Pictorial Structures, Fischler and Elschlager, 1973



Recognition via Edge Detection (1980s)

John Canny, 1986
David Lowe, 1987

Demo: apply Canny edge detection in OpenCV (input image from Unsplash)



Recognition via Grouping (1990s)

Normalized Cuts, Shi and Malik, 1997

Images from Unsplash



Recognition via Matching (2000s)

SIFT, David Lowe, 1999

Demo: implemented code (images are open source)



Face Detection

Viola and Jones, 2001

One of the first successful applications of machine learning to vision



PASCAL Visual Object Challenge [source]

[Figure: example images with object labels: train, person, person, airplane. Images from Unsplash.]



Large Scale Visual Recognition Challenge

The Image Classification Challenge:


1,000 object classes
1,431,167 images

Deng et al, 2009


Russakovsky et al. IJCV 2015



Large Scale Visual Recognition Challenge

Deep learning begins

Interactive Graph



Deep learning becomes famous: AlexNet

Krizhevsky, Sutskever, and Hinton, NeurIPS 2012


[paper]



Perceptron
One of the earliest algorithms that could learn from data

Implemented in hardware! Weights stored in potentiometers, updated with electric motors during learning

Connected to a camera that used 20x20 cadmium sulfide photocells to make a 400-pixel image

Could learn to recognize letters of the alphabet

Today we would recognize it as a linear classifier [try]

Figure: Mark I Perceptron at the Cornell Aeronautical Laboratory.
Cornell University News Service records, #4-3-15. Division of Rare
and Manuscript Collections, Cornell University Library.
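
The learning behaviour of a perceptron can be sketched in a few lines of software. The block below is a minimal illustration of the classic perceptron update rule on a toy problem (logical AND), not a model of the Mark I hardware; the function names, the learning rate, and the dataset are assumptions made for this example.

```python
def predict(weights, bias, x):
    """Linear classifier: output 1 if the weighted sum exceeds zero."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron rule: adjust weights only on misclassified samples."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:  # every sample is classified correctly
            break
    return weights, bias


# Toy data (assumed for illustration): logical AND is linearly separable.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
```

Because the decision is a thresholded weighted sum, the learned boundary is a straight line in input space, which is why the perceptron counts as a linear classifier.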



Minsky and Papert, 1969
x  y  XOR(x,y)
0  0  0
0  1  1
1  0  1
1  1  0

[Figure: the four XOR points plotted on x and y axes; book cover, MIT Press]

Showed that Perceptrons could not learn the XOR function.
Caused a lot of disillusionment in the field.
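
Why a single linear unit cannot represent XOR: with output 1 when w1*x + w2*y + b > 0, the four rows of the truth table require b <= 0, w2 + b > 0, w1 + b > 0, and w1 + w2 + b <= 0. Adding the two middle inequalities gives w1 + w2 + 2b > 0, so w1 + w2 + b > -b >= 0, which contradicts the last one. The sketch below checks the same thing by brute force over a small grid of weights; the grid, the helper names, and the OR comparison are assumptions chosen for illustration.

```python
from itertools import product


def linear_unit(w1, w2, b, x, y):
    """Single thresholded linear unit, as in a one-layer perceptron."""
    return 1 if w1 * x + w2 * y + b > 0 else 0


def count_solutions(truth_table, grid):
    """Count (w1, w2, b) triples on the grid that reproduce the table."""
    count = 0
    for w1, w2, b in product(grid, repeat=3):
        if all(linear_unit(w1, w2, b, x, y) == t
               for (x, y), t in truth_table.items()):
            count += 1
    return count


XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

# Candidate weights from -4.0 to 4.0 in steps of 0.5 (an arbitrary grid).
grid = [i / 2 for i in range(-8, 9)]

xor_solutions = count_solutions(XOR, grid)  # no linear unit fits XOR
or_solutions = count_solutions(OR, grid)    # many linear units fit OR
```

The grid search is only a demonstration; the inequality argument is what rules out every possible choice of weights, not just those on the grid.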



Neocognitron: Fukushima, 1980 [paper]

Computational model of the visual system, directly inspired by Hubel and Wiesel’s hierarchy of complex and simple cells

Interleaved simple cells (convolution) and complex cells (pooling)

No practical training algorithm
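
The two operations named above, convolution and pooling, can be sketched in plain Python. This is an illustration of the operations as used in modern convolutional networks, not of Fukushima's exact cell models; the 2x2 vertical-edge kernel, the use of max pooling, and the toy image are assumptions made for this example.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in cols
        ]
        for i in rows
    ]


# Toy 5x5 image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Kernel that responds where intensity increases from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]

features = conv2d(image, vertical_edge)  # 4x4 map, strong response at the edge
pooled = max_pool2d(features)            # 2x2 map, edge response is kept
```

Pooling halves the resolution while keeping the strongest response, so the edge is still detected even if it shifts by a pixel within a pooling window.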



Backprop: Rumelhart, Hinton, and Williams, 1986 [paper]

Introduced backpropagation for computing gradients in neural networks

Successfully trained perceptrons with multiple layers

Illustration of Rumelhart et al., 1986 by Lane McIntosh, copyright CS231n 2017
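
Backpropagation is the chain rule applied layer by layer, from the loss back to each weight. The sketch below computes gradients for a tiny two-layer network and compares one of them against a finite-difference estimate. The network size, the tanh activation, the squared-error loss, and all parameter values are assumptions chosen for illustration, not details from the 1986 paper.

```python
import copy
import math


def forward(params, x):
    """Two-layer network: tanh hidden layer, linear output."""
    W1, b1, W2, b2 = params
    hidden = [
        math.tanh(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
        for j in range(len(b1))
    ]
    output = sum(W2[j] * hidden[j] for j in range(len(hidden))) + b2
    return hidden, output


def loss(params, x, target):
    """Squared error: 0.5 * (output - target)^2."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss w.r.t. every parameter, via the chain rule."""
    W1, b1, W2, b2 = params
    hidden, output = forward(params, x)
    d_out = output - target                            # dL/d(output)
    dW2 = [d_out * h for h in hidden]
    db2 = d_out
    d_hidden = [d_out * w for w in W2]                 # back through W2
    d_pre = [dh * (1 - h ** 2) for dh, h in zip(d_hidden, hidden)]  # tanh'
    dW1 = [[dz * xi for xi in x] for dz in d_pre]
    db1 = d_pre
    return dW1, db1, dW2, db2


def numeric_grad_w1(params, x, target, j, i, eps=1e-6):
    """Central finite difference for a single entry of W1."""
    plus, minus = copy.deepcopy(params), copy.deepcopy(params)
    plus[0][j][i] += eps
    minus[0][j][i] -= eps
    return (loss(plus, x, target) - loss(minus, x, target)) / (2 * eps)


# Arbitrary example parameters: [W1, b1, W2, b2].
params = [[[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1], [1.0, -1.5], 0.05]
x, target = [1.0, 0.5], 1.0

analytic = backprop(params, x, target)[0][0][0]
numeric = numeric_grad_w1(params, x, target, 0, 0)
```

A training loop would subtract a small multiple of each gradient from the matching parameter; the finite-difference comparison is a common sanity check that the chain-rule code is correct.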



Convolutional Networks: LeCun et al, 1998 [LeCun’s paper]

• Applied backprop algorithm to a Neocognitron-like architecture
• Learned to recognize handwritten digits
• Was deployed in a commercial system by NCR to process handwritten checks
• Very similar to our modern convolutional networks!



2000s: “Deep Learning”
• People tried to train neural networks that were deeper and deeper
• Not a mainstream research topic at this time

Hinton and Salakhutdinov, 2006


Bengio et al, 2007
Lee et al, 2009
Glorot and Bengio, 2010



2012 to Present: Deep Learning Explosion

Google Trends: “Deep Learning”


https://trends.google.com/trends/explore?date=all&q=Deep%20learning



2012 to Present: ConvNets are everywhere
Image Classification Image Retrieval

Figures copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012.



2012 to Present: ConvNets are everywhere
Object Detection Image Segmentation

Ren, He, Girshick, and Sun, 2015 Farabet et al, 2012



2012 to Present: ConvNets are everywhere

Video Classification Activity Recognition

Simonyan et al, 2014



2012 to Present: ConvNets are everywhere
Pose Recognition (Toshev and Szegedy, 2014)

Playing Atari games (Guo et al, 2014)



2012 to Present: ConvNets are everywhere
Medical Imaging, Levy et al, 2016
Galaxy Classification, Dieleman et al, 2014
Whale recognition, Kaggle Challenge



2012 to Present: ConvNets are everywhere

Image Captioning
Vinyals et al, 2015
Karpathy and Fei-Fei, 2015

All images are CC0 Public domain:
https://pixabay.com/en/luggage-antique-cat-1643010/
https://pixabay.com/en/teddy-plush-bears-cute-teddy-bear-1623436/
https://pixabay.com/en/surf-wave-summer-sport-litoral-1668716/
https://pixabay.com/en/woman-female-model-portrait-adult-983967/
https://pixabay.com/en/handstand-lake-meditation-496008/
https://pixabay.com/en/baseball-player-shortstop-infield-1045263/

Captions generated by Justin Johnson using Neuraltalk2



Style transfer https://deepart.io

DALL·E 2 (openai.com)



Algorithms

Data

Computation



Higher GigaFLOPs at a lower cost

http://en.wikipedia.org/wiki/FLOPS#Hardware_costs



2018 Turing Award

For conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.

Yoshua Bengio Geoffrey Hinton Yann LeCun



Widespread acceptance of Machine Learning

Technology: Stable Diffusion


Examples: Midjourney, Dall-E

“Optimus prime dancing in a flower field in the style of van gogh”
Sem 1 2023 Class × DALL·E

“an apple on top of a house in a stadium filled with chickens in the style of van gogh”
Sem 2 2023 Class × DALL·E



Widespread acceptance of Machine Learning
3D depth maps: Corridor Digital
Neural Radiance Fields (NeRF): Google Presents 2023

Timeline:

1959 Hubel & Wiesel
1963 Roberts
1970s David Marr
1979 Gen. Cylinders
1986 Canny
1997 Norm. Cuts
1999 SIFT
2001 V&J
2007 PASCAL
2009 ImageNet
2021 Dall-E
2023 This Class!

1958 Perceptron
1969 Minsky & Papert
1980 Neocognitron
1985 Backprop.
1998 LeNet
2006 Deep Learning
2012 AlexNet
2018 Turing Award
2022 ChatGPT



Artificial Intelligence Technology Can Better Our Lives
