This document outlines the prerequisites and structure for a graduate-level course on Computer Vision, emphasizing the need for a background in data structures, Python programming, linear algebra, and vector calculus. The course includes assignments focused on Python programming with OpenCV, culminating in a final project that involves real-time image understanding and recognition. It also discusses the relevance of computer vision in various fields, including robotics, autonomous driving, and medical imaging.


COMPUTER VISION

Course H02A5A/H02K5A
Course Holder: Dirk Vandermeulen
[email protected]
Teaching Assistant: Jeroen Bertels
[email protected]

PREREQUISITES

• This course is appropriate as a first course for graduate students with an ECS background, which should have prepared the students with these essential prerequisites:
• Data structures
• A good working knowledge of Python programming (or willingness and time to pick it up quickly!)
• Linear algebra
• Vector calculus
• The course does not assume prior experience with imaging, image processing, or graphics

PRIMARY TEXT


CONVOLUTIONAL NEURAL NETWORKS FOR VISUAL RECOGNITION


ASSIGNMENTS
• Problem sets and the final project will require Python programming with
OpenCV packages for Computer Vision problems.
• Python is a scripting language with a syntax relatively similar to MATLAB.
• Python runs on all platforms (Windows/Linux/Unix/MacOSX). Instructions and
toolkits will be provided via the Toledo portal.
• PROBLEM SETS
• Pset0 – Python installation and basic image manipulation in Python (this week)
• Pset1-N – to be defined, probably using CIFAR-10 dataset
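
As a rough idea of what "basic image manipulation in Python" can look like, the sketch below builds a small synthetic image and applies a grayscale conversion, a crop, and a horizontal flip using NumPy arrays. It is not the actual Pset0 content: the helper names (`make_test_image`, `to_gray`, `crop`, `flip_horizontal`) and the synthetic image are assumptions made for illustration. With OpenCV installed, `cv2.imread` and `cv2.cvtColor` would normally do the loading and color conversion; images loaded that way are NumPy arrays in B, G, R channel order, which is the layout assumed here.

```python
import numpy as np

def make_test_image(height=4, width=6):
    """Build a small synthetic image in B, G, R channel order (uint8)."""
    img = np.zeros((height, width, 3), dtype=np.uint8)
    img[:, : width // 2] = (255, 0, 0)   # left half: pure blue
    img[:, width // 2 :] = (0, 0, 255)   # right half: pure red
    return img

def to_gray(img_bgr):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    b, g, r = img_bgr[..., 0], img_bgr[..., 1], img_bgr[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.round(gray).astype(np.uint8)

def crop(img, top, left, height, width):
    """Return a rectangular region; rows are indexed first, then columns."""
    return img[top : top + height, left : left + width]

def flip_horizontal(img):
    """Mirror the image left to right by reversing the column axis."""
    return img[:, ::-1]

image = make_test_image()
gray = to_gray(image)
patch = crop(gray, top=0, left=2, height=2, width=2)
mirrored = flip_horizontal(gray)
```

The main point for newcomers from MATLAB is that an image is simply an array indexed as `[row, column, channel]`, so most basic manipulations reduce to slicing and arithmetic.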

SECONDARY TEXT


FINAL PROJECT

• Project work (definition by end of March) - topic to be decided
• Intermediate assignment to master subparts of the course
• Uses Python and OpenCV
• Max 2p/project
• To be handed in 2 days prior to exam
• Exam = ppt presentation + report
• Exam schedule: on individual/group basis

WHAT IS COMPUTER VISION?

• Automatic understanding of images and video
• Computing properties of the 3D world from visual data (measurement)
• Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation)

VISION FOR MEASUREMENT

[Figure panels: Real-time stereo (NASA Mars Rover); Structure from motion (Pollefeys et al.); Multi-view stereo for community photo collections (Goesele et al.)]

Slide credit: L. Lazebnik
Vision for perception, interpretation

RELATED DISCIPLINES

[Figure: computer vision shown among its related disciplines: artificial intelligence, machine learning, graphics, image processing, cognitive science, algorithms]

WHY VISION?

• As image sources multiply, so do applications
• Relieve humans of boring, easy tasks
• Enhance human abilities: human-computer interaction, visualization
• Perception for robotics / autonomous agents
• Organize and give access to visual content

WHY VISION?

• Images and video are everywhere!

[Figure panels: Personal photo albums; Movies, news, sports; Surveillance and security; Medical and scientific images]

Slide credit: L. Lazebnik

THE GOAL OF COMPUTER VISION

To bridge the gap between pixels and “meaning”

[Figure panels: What we see; What a computer sees]

Source: S. Narasimhan

HUMAN PERCEPTION HAS ITS SHORTCOMINGS…

Sinha and Poggio, Nature, 1996

A brief history of computer vision

Camera Obscura

[Figure panels: Gemma Frisius, 1545; Encyclopedie, 18th Century; Leonardo da Vinci, 16th Century AD. These works are in the public domain.]

Hubel & Wiesel, 1959

[Figure: electrical signal from brain recorded in response to a stimulus. Cat image by CNX OpenStax is licensed under CC BY 4.0; changes made.]

• Simple cells: response to light orientation
• Complex cells: response to light orientation and movement
• Hypercomplex cells: response to movement with an end point (no response without the end point, response with it)

Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 1, 4/3/2018
[Figure panels: Input image; Edge image; 2½-D sketch; 3-D model. David Marr, 1970s]

Stages of Visual Representation, David Marr, 1970s

• Input Image: perceived intensities
• Primal Sketch: zero crossings, blobs, edges, bars, ends, virtual lines, groups, curves, boundaries
• 2½-D Sketch: local surface orientation and discontinuities in depth and in surface orientation
• 3-D Model Representation: 3-D models hierarchically organized in terms of surface and volumetric primitives

• Generalized Cylinder: Brooks & Binford, 1979
• Pictorial Structure: Fischler and Elschlager, 1973

• David Lowe, 1987
• Normalized Cut (Shi & Malik, 1997)
• Face Detection, Viola & Jones, 2001

“SIFT” & Object Recognition, David Lowe, 1999

Histogram of Gradients (HoG), Dalal & Triggs, 2005

[Figure: histogram of gradient orientations; axes: orientation, frequency]

Deformable Part Model, Felzenszwalb, McAllester, Ramanan, 2009

PASCAL Visual Object Challenge (20 object categories) [Everingham et al. 2006-2012]

[Figure: example categories: Train, Person, Airplane]
www.image-net.org

22K categories and 14M images

• Animals
  • Bird
  • Fish
  • Mammal
  • Invertebrate
• Plants
  • Tree
  • Flower
• Food
• Materials
• Structures
• Artifact
  • Tools
  • Appliances
  • Structures
• Person
• Scenes
  • Indoor
  • Geological Formations
• Sport Activities

Deng, Dong, Socher, Li, Li, & Fei-Fei, 2009

The Image Classification Challenge: 1,000 object classes, 1,431,167 images

[Figure: image labeled “Steel drum” with two example outputs. Output 1: Scale, T-shirt, Steel drum, Drumstick, Mud turtle (✔). Output 2: Scale, T-shirt, Giant panda, Drumstick, Mud turtle.]

Russakovsky et al. IJCV 2015

CURRENT STATE OF THE ART

• The next slides show some examples of what current vision systems can do

3D RECONSTRUCTION

[Figure panels: Real-time stereo (NASA Mars Rover); Structure from motion (Pollefeys et al.); Reconstruction from Internet photo collections (Goesele et al.)]

PHOTO TOURISM OVERVIEW

[Figure: Input photographs; Scene reconstruction (relative camera positions and orientations, point cloud, sparse correspondence); Photo Explorer]

System for interactive browsing and exploring large collections of photos of a scene.
Computes the viewpoint of each photo as well as a sparse 3D model of the scene.

OPTICAL CHARACTER RECOGNITION (OCR)

Technology to convert scanned docs to text

[Figure panels: Digit recognition, AT&T labs; License plate readers, http://en.wikipedia.org/wiki/Automatic_number_plate_recognition]

FACE DETECTION

• Many new digital cameras now detect faces (autofocus)
• Canon, Sony, Fuji, …
• Photo-editing software recognizes faces!

© Computer Vision Course CSE 576 (S. Seitz, R. Szeliski)
SMILE DETECTION?

Sony Cyber-shot® T70 Digital Still Camera

OBJECT RECOGNITION (IN SUPERMARKETS)

LaneHawk by EvolutionRobotics
“A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…”

FACE RECOGNITION

Who is she?

VISION-BASED

“How the Afghan Girl was Identified by Her Iris Patterns” Read the story

LOGIN WITHOUT A PASSWORD…

• Fingerprint scanners on many new laptops, other devices
• Face recognition systems now beginning to appear more widely: http://www.sensiblevision.com/

OBJECT RECOGNITION (IN MOBILE PHONES)

• This is becoming real:
  » Lincoln, Microsoft Research

SPORTS: SOCCER WORLD CUP 1966

[Video] For more details read “Goal-directed Video Metrology”

Courtesy M. Brady, A. Zisserman, Oxford University, UK

SPORTS
SMART CARS

• Mobileye
• Vision systems currently in high-end BMW, GM, Volvo models
• By 2010: 70% of car manufacturers.
• Video demo

Slide content courtesy of Amnon Shashua

AUTONOMOUS-DRIVING CARS

Segmentation, detection, recognition

VISION-BASED INTERACTION (AND GAMES)

Digimask: put your face on a 3D avatar.

VISION IN SPACE

NASA'S Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.

Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.

ROBOTICS

[Figure panels: NASA’s Mars Spirit Rover, http://en.wikipedia.org/wiki/Spirit_rover; http://www.robocup.org/]

AUTOMATED MICROSCOPY

IMI’s MICRO21® Microscopy Workstation is the only completely automated microscopy workstation on the market today. It automates the entire microscopy process: slide making/staining, microscopic examination, result reporting, and data (including images) storage and retrieval.

MICRO21 uses NeuralVision®, a form of artificial intelligence, to locate, pre-classify, and display up to 200 white blood cells. It’s all automatic; you don’t even have to be present. All that’s left is for you to review the results and print the report.

BONE-AGE/SKELETAL MATURITY DETERMINATION USING HAND X-RAYS

Age-ranked radial bones (2 individuals) from a reference database

BONE-AGEING USING HAND X-RAYS

[Figure panels: study image; Active Shape Model; Result]

DENTAL AGE ESTIMATION

Journal of Forensic Odonto-Stomatology. 2017;35(2):49-60 / J. De Tobel et al.

Figure 1 – Representative exam… lower left third molars in ea… developmental stage. Third molars are depicted within … bounding box.

Figure 2 – Excluded panoramic … The lower left third molar was su… be in stage 2, but it was expecte… software would not be able to re… stage due to its severe (out of p…

Multiple Sclerosis Lesion Quantification

[Figure labels: T1, T2, PD; grey matter, white matter, MS lesions; automatic]

CHROMOSOME CLASSIFICATION

EXAMPLE: CHROMOSOME CLASSIFICATION

A typical solution to an image analysis or object recognition problem often consists of three parts:
1. Image preprocessing and segmentation: enhancement of characteristic features, removal of artifacts.
2. Feature extraction: calculation of image object properties used in the recognition step.
3. Model matching: matching of image objects to a model database.

Note that not every object recognition strategy necessarily involves all three steps consecutively.
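
The three-part structure described above (segmentation, feature extraction, model matching) can be sketched in a few lines. This is only a toy illustration under stated assumptions: the fixed threshold, the two features (area and bounding-box aspect ratio), and the tiny `MODELS` database are all hypothetical choices. They are not the method used in the chromosome classification example.

```python
import numpy as np

def segment(image, threshold):
    """Step 1: binary segmentation with a fixed intensity threshold."""
    return image > threshold

def extract_features(mask):
    """Step 2: object properties, here area and bounding-box aspect ratio."""
    rows, cols = np.nonzero(mask)
    area = rows.size
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return np.array([area, width / height], dtype=float)

def match_model(features, models):
    """Step 3: return the label whose stored feature vector is nearest."""
    return min(models, key=lambda label: np.linalg.norm(features - models[label]))

# Hypothetical model database: label -> (area, aspect ratio).
MODELS = {
    "square": np.array([9.0, 1.0]),
    "bar": np.array([12.0, 3.0]),
}

# Synthetic input: a bright 2x6 horizontal bar on a dark background.
image = np.zeros((8, 8), dtype=np.uint8)
image[2:4, 1:7] = 200

mask = segment(image, threshold=128)
features = extract_features(mask)
label = match_model(features, MODELS)
```

In a realistic system the features would be normalized before distances are compared, since here the area dominates the aspect ratio simply because of its larger numeric range.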

FEATURE EXTRACTION

Segmentation/Delineation

[Figure panels: Laplacian; Threshold; Opening/Closing]

COMPUTER VISION SYLLABUS OVERVIEW

• Introduction
• Image Filtering
• Feature Detection and Matching
• Segmentation and Grouping
• Recognition (including Deep Learning based approaches)
• Object Detection
• Face Detection
• Instance Recognition
• Object Recognition

IMAGE FILTERING

[Figure from Computer Vision: Algorithms and Applications (September 3, 2010 draft), p. 206]
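
The Laplacian, threshold, and opening operations named on the feature extraction slide can be written directly with NumPy, as a minimal sketch. The 4-neighbour kernel, the zero padding at the border, the 3x3 square structuring element, and the synthetic test image are all assumptions made for illustration. In practice OpenCV's `cv2.Laplacian`, `cv2.threshold`, and `cv2.morphologyEx` cover the same ground.

```python
import numpy as np

# 4-neighbour Laplacian kernel (symmetric, so correlation equals convolution).
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

def _shifted_windows(padded, shape):
    """Yield (dy, dx, window) for the nine 3x3 offsets of a padded array."""
    for dy in range(3):
        for dx in range(3):
            yield dy, dx, padded[dy:dy + shape[0], dx:dx + shape[1]]

def filter3x3(image, kernel):
    """Apply a 3x3 linear filter with zero padding at the image border."""
    padded = np.pad(image.astype(float), 1)
    out = np.zeros(image.shape, dtype=float)
    for dy, dx, window in _shifted_windows(padded, image.shape):
        out += kernel[dy, dx] * window
    return out

def erode(mask):
    """Binary erosion: keep a pixel only if its whole 3x3 neighbourhood is set."""
    out = np.ones(mask.shape, dtype=bool)
    for _, _, window in _shifted_windows(np.pad(mask, 1), mask.shape):
        out &= window
    return out

def dilate(mask):
    """Binary dilation: set a pixel if any pixel in its 3x3 neighbourhood is set."""
    out = np.zeros(mask.shape, dtype=bool)
    for _, _, window in _shifted_windows(np.pad(mask, 1), mask.shape):
        out |= window
    return out

def opening(mask):
    """Erosion followed by dilation; removes objects smaller than the 3x3 element."""
    return dilate(erode(mask))

# Synthetic image: a 3x3 bright block plus one isolated noise pixel.
image = np.zeros((7, 7), dtype=np.uint8)
image[2:5, 2:5] = 10
image[0, 6] = 10

laplacian = filter3x3(image, LAPLACIAN)   # zero inside flat regions, nonzero at edges
mask = image > 5                          # threshold
cleaned = opening(mask)                   # noise pixel removed, block preserved
```

Closing is the same pair of operations in the opposite order (dilation, then erosion) and fills small holes instead of removing small specks.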

FEATURE DETECTION (AND MATCHING)

Figure 4.1 A variety of feature detectors and descriptors can be used to analyze, describe and match images: (a) point-like interest operators (Brown, Szeliski, and Winder 2005) © 2005 IEEE; (b) region-like interest operators (Matas, Chum, Urban et al. 2004) © 2004 Elsevier; (c) edges (Elder and Goldberg 2001) © 2001 IEEE; (d) straight lines (Sinha, Steedly, Szeliski

RS Ch.4

CORNERS AND BLOBS

[Figure: 9300 Harris Corners Pkwy, Charlotte, NC]

Model matching/fitting

• We’ve learned how to detect edges, corners, blobs. Now what?
• We would like to form a higher-level, more compact representation of the features in the image by grouping multiple features according to a simple model
Segmentation and grouping

• Group video frames into shots [http://poseidon.csd.auth.gr/LAB_RESEARCH/Latest/imgs/SpeakDepVidIndex_img2.jpg]
• Determine image regions [Figure by J. Shi]
• Figure-ground (Fg / Bg) [Figure by Wang & Suter]
• Object-level grouping [Figure by Grauman & Darrell]

K. Grauman, B. Leibe

Object Recognition

[Timeline: 1980s; 1990s to early 2000s; 2000-2010…]

Perceptual and Sensory Augmented Computing, Visual Object Recognition Tutorial
K. Grauman, B. Leibe

Computer Vision is Machine Learning

[Figure: Supervised Learning; Unsupervised Learning; Semi-Supervised Learning; Reinforcement Learning; Computer Vision]

References: [81]
Course 6.S191: Intro to Deep Learning, Lex Fridman, [email protected], January 2017

Visual Cortex (Its Structure is Instructive and Inspiring)

Reference: https://www.youtube.com/watch?v=_33K1zTtoow

Computer Vision is Hard

References: [66, 69, 89]

Image Classification Pipeline

References: [81, 89]

Famous Computer Vision Datasets

• MNIST: handwritten digits
• ImageNet: WordNet hierarchy
• CIFAR-10(0): tiny images
• Places: natural scenes

References: [90, 91, 92, 93]

Let’s Build an Image Classifier for CIFAR-10

References: [89, 91]

Why Deep Learning? Scalable Machine Learning

For the full updated list of references visit: https://selfdrivingcars.mit.edu/references
[283, 284] https://deeplearning.mit.edu 2019
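
The slides do not say which method the CIFAR-10 classifier is built with. As one possible starting point, the sketch below implements a plain nearest-neighbour baseline with L1 pixel distance. This technique is swapped in here for illustration and is not taken from the slides. To stay self-contained it runs on random arrays with CIFAR-10's shape (32x32 RGB images, 10 labels) rather than on the real dataset; the function name `nearest_neighbor_predict` and all data are hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(train_x, train_y, test_x):
    """Give each test image the label of its closest training image (L1 distance)."""
    # Flatten images to vectors; widen the dtype so uint8 differences cannot wrap.
    train_flat = train_x.reshape(len(train_x), -1).astype(np.int64)
    test_flat = test_x.reshape(len(test_x), -1).astype(np.int64)
    predictions = np.empty(len(test_flat), dtype=train_y.dtype)
    for i, row in enumerate(test_flat):
        distances = np.abs(train_flat - row).sum(axis=1)
        predictions[i] = train_y[np.argmin(distances)]
    return predictions

# Synthetic stand-in for CIFAR-10: 32x32 RGB uint8 images, labels 0..9.
rng = np.random.default_rng(0)
train_x = rng.integers(0, 256, size=(50, 32, 32, 3), dtype=np.uint8)
train_y = np.arange(50) % 10

# Test images: lightly perturbed copies of the first five training images.
noise = rng.integers(0, 4, size=(5, 32, 32, 3), dtype=np.uint8)
test_x = np.clip(train_x[:5].astype(np.int64) + noise, 0, 255).astype(np.uint8)

predicted = nearest_neighbor_predict(train_x, train_y, test_x)
```

On this synthetic data the perturbed copies are trivially matched to their originals. On real CIFAR-10 images a pixel-distance baseline like this is far weaker, which is part of the motivation for learned features.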
