Computer vision series

Imaging and Vision
Pathfinding
Perry Lea
ACM Distinguished Lectures

Floating Point
Columns: 2D Graphics | 3D Graphics | Vision | Computational Photography | Physics | Kernel | Floating Point Requirement
X | Color Space Conversion | Fixed Point
X | Gaussian Blur | Fixed Point
X | Sobel Edge Detection | Fixed Point
X | Bilateral Filters | Fixed Point
X | Bilinear Interpolation | Fixed Point
X | Bicubic Interpolation | Half or Single Precision
X | Image Signal Processor | Fixed Point
X X | Exposure Compensation | Single Precision
X X | Image Blending | Fixed Point
X X | Scaling | Fixed Point (for binary scaling)
X | Texture Mapping | Fixed Point
X | Pixel Shading | Single / Double Precision
X | Z-Buffer Depth Test | Single Precision
X | Compositing | Fixed Point or Half Precision
X | Ray Tracing | Single Precision
X | 3D Vertex Shading | Double Precision
X | Fluid Dynamics | Single / Double Precision
X | JPEG Compression | Fixed Point
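To make the "Fixed Point" entries concrete, here is a minimal sketch of a color space conversion kernel (RGB to luma) done with integer arithmetic only. The BT.601 luma weights and the 8-bit fractional scaling (Q8) are assumptions chosen for illustration; the slides do not say which weights or word widths are used.

```python
# Fixed-point (integer-only) RGB -> luma conversion.
# Assumption: BT.601 luma weights (0.299, 0.587, 0.114) scaled by 2**8.
SHIFT = 8
WR, WG, WB = 77, 150, 29  # rounded weights; they sum to 256, i.e. 1.0 in Q8


def luma_fixed(r: int, g: int, b: int) -> int:
    """Return 8-bit luma using only integer multiply, add, and shift."""
    acc = WR * r + WG * g + WB * b
    return (acc + (1 << (SHIFT - 1))) >> SHIFT  # add half an LSB to round


def luma_float(r: int, g: int, b: int) -> float:
    """Floating-point reference, used only to check the integer version."""
    return 0.299 * r + 0.587 * g + 0.114 * b


if __name__ == "__main__":
    for rgb in [(0, 0, 0), (255, 255, 255), (200, 100, 50)]:
        print(rgb, luma_fixed(*rgb), round(luma_float(*rgb), 2))
```

For 8-bit inputs the integer result stays within one code value of the floating-point reference, which is why a kernel like this does not need a floating-point unit.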

Vision Segments
ADAS and Automotive
Medical Imaging
Consumer Electronics and Gaming
Industrial Automation & Robotics
Security, Surveillance, Intelligence, Defense
Facial Recognition

Vision Market
Tractica Research: 42% CAGR, $33B market by 2019
Market to Market: 22.6% CAGR, $22.2B market by 2020

Human Vision
February 11, 2018 | Micron Confidential

What do you see here?
Do you see lines between the
circles?
 Guess what: there are none.
Rule 1: Sensory input does not
contain enough information to
explain our perception
What did you just see?
 Did you see the people on the bridge?
 Did you see the church?
 Did you see the tunnel?
Rule 2: There is too much sensory
input to include in our coherent
perceptions at any single moment

Human Visual Dataflow
Human vision interprets
images bottom up and top
down:
Bottom Up: Based on raw
sensory data (pixels)
Top down: based on feature
extraction
Find the Target
Human Brain Visual System
from Ganglion to Cortex

How Human Vision Works
Humans are born with a nearly fully
developed vision system
Cortical pathways are reinforced and
restructured within the 1st year of
development.
Vision starts at ganglion cells and follows the optic nerve.
Some receptors will excite with light
intensity, some will inhibit activity.

Feature Extraction
When a collection of photoreceptors is organized into a center-surround field, the brain can easily perceive light and dark regions.
Edges force ganglion cells to deliver reinforced or diminished signals.
The visual system does an extraordinary job of throwing away information.
Ganglion Cell Signal Strength
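The center-surround behavior can be sketched in one dimension. This is only a toy model under stated assumptions: each "cell" has one excitatory center sample and two inhibitory neighbors with equal weight, and the function name `center_surround` is hypothetical. It is not a model of real ganglion cell physiology.

```python
def center_surround(signal):
    """Excitatory center minus the mean of the two inhibitory neighbours."""
    out = []
    for i in range(1, len(signal) - 1):
        surround = 0.5 * (signal[i - 1] + signal[i + 1])
        out.append(signal[i] - surround)
    return out


# A dark region (10) followed by a bright region (50): a single edge.
row = [10, 10, 10, 10, 50, 50, 50, 50]
print(center_surround(row))  # -> [0.0, 0.0, -20.0, 20.0, 0.0, 0.0]
```

Uniform regions produce no response at all; only the two cells next to the edge fire, one diminished and one reinforced. This is the sense in which the system throws away information while keeping the edges.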

Vision Principles
SIFT in 6 slides!
Just as the human brain perceives image data top-down and bottom-up, so do typical vision algorithms.
 Features are “interesting” parts of an image; we rely on the same edges, corners, and ridges. To be useful, feature points must:
Be numerous
Be repeatable
Represent orientation and scale
Be fast to extract and match

Typical Feature Extraction Algorithm
Detector
 Find Scale Space Extrema
 Keypoint Localization: improve keypoints and throw out bad ones.
Descriptor
 Orientation Assignment (remove effects of rotation and scale)
 Create Descriptor: use histograms of orientations
Image Signal Processor (Front End): Lens, Lens Correction, White Balance, Noise Reduction, Demosaic, Color Correction, Tone Mapping, Sharpening, Gamma Correction, 3A Stats, RGB2YUV, Scaler, DRAM
Feature Extraction (Back End): Preprocess, Scan Image, Filter Feature Locations, Generate Signature, Post Process Descriptors
12 MegaPixel Image (RAW10 = 15 MB to 37 MB; at 30 fps = 450 MB/s)
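The frame size and bandwidth figures on this slide can be checked with a few lines of arithmetic. The sketch assumes 12 MP means exactly 12,000,000 pixels, that RAW10 is tightly packed at 10 bits per pixel, and that 1 MB is 10^6 bytes. The slides do not explain the 37 MB upper figure; an unpacked 24-bit RGB frame, shown here as a guess, comes out at 36 MB.

```python
def frame_bytes(pixels: int, bits_per_pixel: int) -> int:
    """Size of one tightly packed frame in bytes."""
    return pixels * bits_per_pixel // 8


def bandwidth_bytes_per_s(frame_size: int, fps: int) -> int:
    """Sustained memory traffic for writing every frame once."""
    return frame_size * fps


PIXELS = 12_000_000  # assumption: "12 MegaPixel" taken as exactly 12e6

raw10 = frame_bytes(PIXELS, 10)   # packed RAW10 from the sensor
rgb888 = frame_bytes(PIXELS, 24)  # guess at the larger figure: 8-bit RGB

print(raw10 / 1e6, "MB per RAW10 frame")
print(rgb888 / 1e6, "MB per RGB888 frame")
print(bandwidth_bytes_per_s(raw10, 30) / 1e6, "MB/s at 30 fps")
```

Note that 450 MB/s only covers a single write of each RAW frame; every pipeline stage that round-trips through DRAM adds to that total.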

Finding Scale Space
Finds keypoints in the image.
The image is convolved at different scales (a variant of blob detection).
The best way to do this is a Laplacian of Gaussian (LoG).
 But a LoG is really computationally expensive (hmmm)
 So we’ll cheat and use a Difference of Gaussian (DoG) blurs: D(x, y, σ) = L(x, y, kσ) − L(x, y, σ), where L is the image blurred with a Gaussian of scale σ.
 Convolved images are grouped into “octaves”; each octave spans a doubling of the blur scale. A fixed number of images is convolved per octave, with adjacent scales separated by a constant factor k.
 Take the difference of adjacent convolved images within each octave.
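A single octave of the Difference of Gaussian stack can be sketched in pure Python as follows. The base scale `sigma0 = 1.6`, the number of intervals, the kernel radius of 3σ, and the clamped borders are all assumptions made for illustration; the function names are hypothetical, and a real implementation would use an optimized convolution and downsample between octaves.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian, truncated at 3 sigma (assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = min(max(i + j - radius, 0), n - 1)
            acc += w * values[k]
        out.append(acc)
    return out


def gaussian_blur(image, sigma):
    """Separable blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def dog_octave(image, sigma0=1.6, intervals=2):
    """Return the DoG layers of one octave (intervals + 2 of them)."""
    k = 2 ** (1.0 / intervals)  # constant factor between adjacent scales
    sigmas = [sigma0 * k ** i for i in range(intervals + 3)]
    blurred = [gaussian_blur(image, s) for s in sigmas]
    return [
        [[b - a for a, b in zip(row_lo, row_hi)]
         for row_lo, row_hi in zip(lo, hi)]
        for lo, hi in zip(blurred, blurred[1:])
    ]
```

Each DoG layer is just a subtraction of two already blurred images, which is why it is so much cheaper than evaluating a Laplacian of Gaussian at every scale.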

Finding Scale Space
Find Extrema
Choose all extrema within a 3x3x3 neighborhood.
 This is done by comparing each pixel in the DoG images to its eight neighbors at the same scale and the nine corresponding neighboring pixels in each of the two neighboring scales. If the pixel value is the maximum or minimum among all compared pixels, it is selected as a candidate keypoint.
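The 26-neighbor comparison can be sketched as below. The DoG stack is assumed to be a nested list indexed as `dog[scale][y][x]`, and the comparison is taken to be strict (ties are not extrema); both are assumptions for this example, and the function names are hypothetical.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or below all 26 neighbours."""
    value = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return value > max(neighbours) or value < min(neighbours)


def find_extrema(dog):
    """Scan every interior position of every interior scale."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found
```

The first and last DoG layers of an octave are never tested themselves because they lack a neighbor scale on one side, which is why an octave needs two more DoG layers than the number of scales it actually searches.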

Keypoint Localization
Scale space extrema produce too many candidates.
Minimize:
 Use a Taylor series expansion of the DoG function to locate the true (sub-pixel) extrema
Reject:
 Points with low contrast
 Points with a strong edge response in only one direction
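The three steps above can be sketched with finite differences. The contrast threshold of 0.03 and the curvature ratio `r = 10` are the values commonly quoted for SIFT and are assumptions here, as is the simplification of the Taylor refinement to one dimension (the full method fits a quadratic jointly in x, y, and scale). All function names are hypothetical.

```python
def refine_1d(f_minus, f_zero, f_plus):
    """Sub-sample offset of the extremum of a quadratic through 3 samples."""
    d1 = (f_plus - f_minus) / 2.0          # first derivative
    d2 = f_plus - 2.0 * f_zero + f_minus   # second derivative
    return -d1 / d2 if d2 != 0 else 0.0


def passes_contrast(value, threshold=0.03):
    """Reject weak responses; assumes DoG values scaled to [0, 1]."""
    return abs(value) >= threshold


def passes_edge_test(layer, y, x, r=10.0):
    """Reject points whose curvature is strong in only one direction."""
    dxx = layer[y][x + 1] - 2.0 * layer[y][x] + layer[y][x - 1]
    dyy = layer[y + 1][x] - 2.0 * layer[y][x] + layer[y - 1][x]
    dxy = (layer[y + 1][x + 1] - layer[y + 1][x - 1]
           - layer[y - 1][x + 1] + layer[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures of opposite sign, or flat in one direction
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

A blob has similar curvature in both directions and passes the ratio test; a straight edge or ridge has almost no curvature along its length, so the determinant collapses and the point is thrown out.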

Orientation Assignment
Remove effects of rotation
Create a histogram of gradient orientations (36 bins)
Each sample is weighted by its gradient magnitude and a Gaussian window
Any peak within 80% of the highest creates a new keypoint with that orientation
A parabola is fit to the 3 histogram values closest to each peak
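A minimal sketch of the histogram and peak selection follows. The input format (a list of `(magnitude, angle_degrees, gaussian_weight)` tuples already sampled around the keypoint) and the choice to report angles at bin centers are assumptions for this example; the function names are hypothetical.

```python
def orientation_histogram(gradients, bins=36):
    """Accumulate magnitude * Gaussian weight into orientation bins."""
    hist = [0.0] * bins
    width = 360.0 / bins
    for magnitude, angle, weight in gradients:
        hist[int((angle % 360.0) / width) % bins] += magnitude * weight
    return hist


def dominant_orientations(hist, ratio=0.8):
    """Return one refined angle per local peak within `ratio` of the top."""
    bins = len(hist)
    width = 360.0 / bins
    top = max(hist)
    angles = []
    for i, h in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % bins]  # circular neighbours
        if h >= ratio * top and h > left and h > right:
            # Vertex of the parabola through the three bins, in bin units.
            offset = 0.5 * (left - right) / (left - 2.0 * h + right)
            angles.append(((i + 0.5 + offset) * width) % 360.0)
    return angles
```

With 10 unit-weight samples at 45 degrees, 9 at 200 degrees, and 2 at 300 degrees, the first two bins both clear the 80% threshold, so the same location would yield two keypoints with different orientations, while the third bin is ignored.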

Keypoint Descriptor
We now want to compute a descriptor for each keypoint so that it remains distinctive under varying illumination, 3D viewpoint, etc.
Similar to human biological vision
Neurons respond to gradients at certain frequencies
A 4x4 grid of gradient windows, each with an 8-bin orientation histogram built from 4x4 samples: 4x4x8 = 128-element feature vector
Lighting gains will
not affect descriptors
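One way to see why a lighting gain drops out is to normalize the 128-element vector to unit length. The sketch below also clamps large entries at 0.2 and renormalizes, a step commonly described for SIFT; that clamp value and the function name are assumptions, since the slides only state the invariance.

```python
import math


def normalize_descriptor(vec, clamp=0.2):
    """Unit-normalize, clamp large entries, then unit-normalize again."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)

    v = unit(vec)                    # removes any multiplicative gain
    v = [min(x, clamp) for x in v]   # limits the effect of a few huge gradients
    return unit(v)


# Hypothetical raw descriptor: 4x4 windows x 8 bins = 128 values.
raw = [float((i % 8) + 1) for i in range(128)]
brighter = [3.0 * x for x in raw]  # same patch under a 3x lighting gain

a = normalize_descriptor(raw)
b = normalize_descriptor(brighter)
print(max(abs(x - y) for x, y in zip(a, b)))
```

A constant brightness offset is already removed earlier, because the descriptor is built from pixel differences (gradients) and never from absolute pixel values.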

Feature Detection Algorithms
Edge Detection:
– Canny, Sobel, Prewitt, Differential
Corner Detectors:
– Harris, FAST, SUSAN
Blob Detectors:
– Laplacian of Gaussian
– Difference of Gaussian
– Determinant of Hessian
Transforms:
– Ridge, Hough, Structure Tensor
Affine Invariants:
– Affine shape adaptation
– Harris Affine
– Hessian Affine
Feature Descriptors:
– SIFT, SURF, GLOH, HOG, BRIEF, ORB, BRISK, FREAK

Other Vision Challenges
Segmentation
Meaningful partitioning of
image/video into non-overlapping
regions and subvolumes. Ability to
handle multi-modal data of varying
complexity
Color Image Segmentation Output
Original Image courtesy of
University of California at Berkeley
Courtesy RIT

Super Resolution
Utilizing multiple images of a given scene to obtain a high
resolution image with improved image quality

Hierarchical Scale Space
Using information at various scales to determine the semantic structure of an image. Utilize probabilistic modeling of image content to build a dynamic hierarchical tree for high resolution remote sensing.
Courtesy RIT

Computational Photography
Computational photography combines
plentiful computing, digital sensors,
modern optics, actuators, and smart lights
to escape the limitations of traditional film
cameras and enables novel imaging
applications. Unbounded dynamic range,
variable focus, resolution, and depth of
field, hints about shape, reflectance, and
lighting, and new interactive forms of
photos that are partly snapshots and partly
videos are just some of the new
applications found in Computational
Photography.
• Light Field Arrays
• Massive Image Stitching/Warping
• Computational Optics
• Holographic Imaging
