3.1 - Image Fundamentals
Leandro L. Minku
Video Presentation
Assignment
• Presentation about a data science problem of your choice
• Max 5 min.
Module Topics
1. Time series
3. Computer vision
4. Fairness
Computer Vision
• Image fundamentals.
• Convolutional Neural Networks and variations.
• Images as arrays.
Pixels
Pixel Representation
• Grayscale
• Scalar between 0 (black) and 255 (white), i.e., in
{0,…,255}.
Pixel Representation
• Colour
• Various colour representations exist.
RGB: Three Channels
[Image source: Deep Learning for Computer Vision with Python book]
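As a minimal sketch of the pixel representations above, the NumPy snippet below stores a grayscale pixel as one 8-bit value and an RGB pixel as three. The variable names are illustrative and not taken from the lecture.

```python
import numpy as np

# A grayscale pixel is a single integer in {0, ..., 255}; uint8 holds exactly that range.
gray_black = np.uint8(0)
gray_white = np.uint8(255)

# An RGB pixel has one value per channel, in the order (red, green, blue).
red = np.array([255, 0, 0], dtype=np.uint8)
white = np.array([255, 255, 255], dtype=np.uint8)
black = np.array([0, 0, 0], dtype=np.uint8)

print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
print(red.tolist())  # [255, 0, 0]
```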
Images as Arrays:
1 Channel (2d Matrices)
Images as Arrays:
3 Channels (3d Tensors)
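[Figure: a 3-channel image drawn as a 3D tensor M with Height and Width axes; M[0,0,0], M[0,0,1] and M[0,0,2] label the three channel values of the pixel at row 0, column 0.]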
• Level of detail.
• Resolution of 1000 x 750 means
• 1000 pixels wide and
• 750 pixels tall.
• I.e., 750,000 pixels in total.
Note: a pixel may have one or more channels (depth of image). This is not considered for the resolution.
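The relationship between array shape and resolution can be sketched in NumPy as follows. The blank image `M` is a made-up example; the only assumption is NumPy's usual (rows, columns, channels) ordering.

```python
import numpy as np

# Hypothetical blank colour image with a resolution of 1000 x 750 (width x height).
# NumPy shapes are ordered (rows, columns, channels) = (height, width, depth).
M = np.zeros((750, 1000, 3), dtype=np.uint8)

height, width, channels = M.shape
total_pixels = height * width  # channels are not counted in the resolution

# Set the top-left pixel; M[0, 0, k] is channel k of the pixel in row 0, column 0.
M[0, 0] = [10, 20, 30]

print(width, height, total_pixels)  # 1000 750 750000
print(M[0, 0, 0], M[0, 0, 1], M[0, 0, 2])  # 10 20 30
```

Note that the shape lists height before width, while resolutions are usually quoted as width x height.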
Demo
• image-fundamentals.ipynb
• Load an image with OpenCV library
• Convert from BGR to RGB
• Convert image to Grayscale
• Plot each channel of the image
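The notebook itself is not reproduced here. As a rough sketch of the same steps, the snippet below uses NumPy only, on a tiny synthetic image, so that it runs without OpenCV or an image file. In OpenCV, `cv2.imread` returns channels in BGR order and `cv2.cvtColor` with `cv2.COLOR_BGR2RGB` or `cv2.COLOR_BGR2GRAY` performs the conversions. The grayscale weights below are the common luma weights; that the notebook uses exactly these is an assumption.

```python
import numpy as np

# Tiny synthetic 1 x 2 image in BGR order, as OpenCV's cv2.imread would return it:
# the first pixel is pure blue, the second is pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# BGR -> RGB: reverse the channel axis.
rgb = bgr[..., ::-1]

# RGB -> grayscale as a weighted sum of the channels (assumed luma weights).
weights = np.array([0.299, 0.587, 0.114])
gray = np.rint(rgb.astype(np.float64) @ weights).astype(np.uint8)

# Each channel on its own is a 2D matrix with the same height and width.
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

print(rgb.tolist())   # [[[0, 0, 255], [255, 0, 0]]]
print(gray.tolist())  # [[29, 76]]
print(r.shape)        # (1, 2)
```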
Scaling (Resizing)
• Increasing or decreasing the size of an image in terms of
width and height.
• We will often need to do this when using machine learning.
[Image source: Deep Learning for Computer Vision with Python book]
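One simple way to scale an image is nearest-neighbour sampling, sketched below in NumPy. The helpers `resize_nearest` and `resize_by_factor` are hypothetical names introduced for illustration; libraries such as OpenCV (`cv2.resize`) offer this and smoother interpolation methods.

```python
import numpy as np

def resize_nearest(image, new_height, new_width):
    """Resize by nearest-neighbour sampling: each output pixel copies one input pixel."""
    height, width = image.shape[:2]
    # For every output row/column, pick the index of the source row/column it maps to.
    rows = np.arange(new_height) * height // new_height
    cols = np.arange(new_width) * width // new_width
    return image[rows[:, None], cols]

def resize_by_factor(image, factor):
    """Scale width and height by the same factor, which keeps the aspect ratio."""
    height, width = image.shape[:2]
    return resize_nearest(image,
                          max(1, round(height * factor)),
                          max(1, round(width * factor)))

image = np.arange(24, dtype=np.uint8).reshape(4, 6)  # 4 rows (height) x 6 columns (width)
bigger = resize_by_factor(image, 2)
smaller = resize_by_factor(image, 0.5)

print(bigger.shape)      # (8, 12)
print(smaller.tolist())  # [[0, 2, 4], [12, 14, 16]]
```

Scaling both dimensions by the same factor avoids distorting the image; resizing to an arbitrary width and height may stretch it.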
Cropping
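Because an image is an array, cropping amounts to slicing out a rectangular region. A minimal NumPy sketch on a made-up 5 x 6 image:

```python
import numpy as np

image = np.arange(30, dtype=np.uint8).reshape(5, 6)  # 5 rows (height) x 6 columns (width)

# Crop rows 1..3 and columns 2..4. Slices are [start, stop) and rows come first,
# so the order is image[y_start:y_end, x_start:x_end].
crop = image[1:4, 2:5]

print(crop.shape)        # (3, 3)
print(crop[0].tolist())  # [8, 9, 10]
```

The slice is a view of the original array; call `crop.copy()` if the crop should be modified independently.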
Demo
• image-fundamentals.ipynb
• Rescaling an image
• Rescaling by factor
• Cropping an image
Image Convolution
• Application of a filter to an image, e.g., to:
• Blur.
• Sharpen.
• Detect edges.
• Etc.
Convolution
• Based on an element-wise multiplication of two matrices,
followed by a sum.
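$$
\begin{aligned}
\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{bmatrix}
\star
\begin{bmatrix} 93 & 139 & 101 \\ 26 & 252 & 196 \\ 135 & 230 & 18 \end{bmatrix}
&= (0 \times 93) + (-1 \times 139) + (0 \times 101) \\
&\quad + (-1 \times 26) + (5 \times 252) + (-1 \times 196) \\
&\quad + (0 \times 135) + (-1 \times 230) + (0 \times 18) \\
&= 669
\end{aligned}
$$
Note: mathematically, this operation is called cross-correlation, but computer vision uses the term convolution for it.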
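The single step above, an element-wise multiplication followed by a sum, can be checked with a few lines of NumPy. The names `kernel` and `patch` are illustrative.

```python
import numpy as np

kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]])
patch = np.array([[93, 139, 101],
                  [26, 252, 196],
                  [135, 230, 18]])

# Element-wise multiplication of the two matrices, followed by a sum.
value = int(np.sum(kernel * patch))
print(value)  # 669

# A true mathematical convolution would first rotate the kernel by 180 degrees.
flipped_value = int(np.sum(kernel[::-1, ::-1] * patch))
print(flipped_value)  # 669
```

This particular kernel is unchanged by a 180-degree rotation, so cross-correlation and true convolution give the same result here; for asymmetric kernels they differ.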
Convolution
• Operation between a “large” matrix (image) and a “small” matrix (filter,
a.k.a. kernel) that produces another matrix (image).
• The kernel slides over the image from left to right, top to bottom.
[Figure: the 3 x 3 kernel with rows (0, -1, 0), (-1, 5, -1), (0, -1, 0) placed over the top-left 3 x 3 region of the image; the output value at that position is 669.]
[Figure: the image padded with a border of zeros, so that the kernel can also be placed over the border pixels:]
  0   0   0   0   0   0   0   0   0   0   0   0
  0  93 139 101   4   3   5   4   3   5   4   0
  0  26 252 196   3   2   5   3   2   5   3   0
  0 135 230  18   2   4   3   2   4   3   2   0
  0   1   3   5   4   3   5   4   3   5   4   0
  0   7   2   5   3   2   5   3   2   5   3   0
  0   1   4   3   2   4   3   2   4   3   2   0
  0   1   3   5   4   3   5   4   3   5   4   0
  0   7   2   5   3   2   5   3   2   5   3   0
  0   1   4   3   2   4   3   2   4   3   2   0
  0   1   3   5   4   3   5   4   3   5   4   0
  0   0   0   0   0   0   0   0   0   0   0   0
Note: other forms of padding exist, like replicating the values at the borders.
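Sliding the kernel over a padded image can be sketched as a pair of loops. This is an illustrative NumPy implementation under stated assumptions, not the lecture's code: the function name `filter2d` and its `padding` parameter are hypothetical, the kernel is assumed to have odd height and width, and the image is assumed to have a single channel.

```python
import numpy as np

def filter2d(image, kernel, padding="zero"):
    """Slide `kernel` over `image` (cross-correlation); output has the same size as `image`."""
    k_h, k_w = kernel.shape
    pad_h, pad_w = k_h // 2, k_w // 2
    # "zero" surrounds the image with zeros; anything else replicates the border values.
    mode = "constant" if padding == "zero" else "edge"
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode=mode)

    out = np.zeros(image.shape, dtype=np.float64)
    for y in range(image.shape[0]):        # top to bottom
        for x in range(image.shape[1]):    # left to right
            window = padded[y:y + k_h, x:x + k_w]
            out[y, x] = np.sum(window * kernel)
    return out

kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
image = np.array([[93, 139, 101], [26, 252, 196], [135, 230, 18]])

result = filter2d(image, kernel)
print(result.shape)  # (3, 3)
print(result[1, 1])  # 669.0
print(result[0, 0])  # 300.0
```

Without padding, the kernel could only be centred on pixels that have a full neighbourhood, so the output would be smaller than the input. The explicit loops favour readability over speed.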
Intensities Outside Range [0,255]
• Clip to the nearest allowed value, e.g.:
• -1 becomes 0.
• 300 becomes 255.
• Rescale, e.g.:
• values between -1 and 300 are rescaled to the range
{0,…,255}.
$$
\text{new\_value} = 255 \times \frac{\text{old\_value} - \text{old\_min}}{\text{old\_max} - \text{old\_min}}
$$
• E.g.: rescale value of 150 considering that the values are
between -1 and 300.
$$
\text{new\_value} = 255 \times \frac{150 - (-1)}{300 - (-1)} \approx 128
$$
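Both options can be sketched in NumPy. The array `values` is a made-up example holding the minimum, the worked value and the maximum from above; rounding to the nearest integer is an assumption about how the final pixel value is obtained.

```python
import numpy as np

values = np.array([-1, 150, 300], dtype=np.float64)

# Option 1: clip to the nearest allowed value.
clipped = np.clip(values, 0, 255)

# Option 2: min-max rescale to [0, 255], then round to the nearest integer.
old_min, old_max = values.min(), values.max()
rescaled = np.rint(255 * (values - old_min) / (old_max - old_min))

print(clipped.tolist())   # [0.0, 150.0, 255.0]
print(rescaled.tolist())  # [0.0, 128.0, 255.0]
```

Clipping leaves in-range values untouched but discards differences between out-of-range values, while rescaling preserves the ordering of all values at the cost of changing every intensity. The rescaling formula is undefined when all values are equal (old_max equals old_min).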
Demo
• image-fundamentals.ipynb
• Applying convolutions
Examples of Kernels
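• Sharpen: $\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{bmatrix}$
• Blur: $\begin{bmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{bmatrix}$
• Edge Detection: $\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}$
• Laplacian: $\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$
• Sobel X: $\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$
• Sobel Y: $\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$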
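One way to build intuition for these kernels is to look at the sum of their entries. The sketch below writes the sharpen and blur kernels as the usual 3 x 3 versions (an assumption where the slide is incomplete) and applies every kernel to a flat patch in which all pixels have the same intensity.

```python
import numpy as np

kernels = {
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float64),
    "blur": np.full((3, 3), 1 / 9),
    "edge_detection": np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=np.float64),
    "laplacian": np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64),
    "sobel_x": np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64),
    "sobel_y": np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64),
}

# On a flat patch the response equals intensity * sum(kernel entries).
flat_patch = np.full((3, 3), 100.0)
responses = {name: float(np.sum(flat_patch * k)) for name, k in kernels.items()}

for name, response in responses.items():
    print(name, round(response, 6))
```

The sharpen and blur kernels sum to 1, so a flat region keeps its intensity (up to floating-point rounding for the blur). The edge-detection, Laplacian and Sobel kernels sum to 0, so they respond only where intensities change.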
Kernels and Computer
Vision Problems
• Kernels were typically designed to create features to be given
as inputs to machine learning algorithms.
Summary
• Images are represented as arrays storing their pixels.
• The width is represented by the columns.
• The height by the rows.
• The channels by the depth.
References
• Deep Learning for Computer Vision with Python, Chp 3 and
Chp 11 until 11.1.4 (inclusive).
• If you are keen on programming, you can also read Section
11.1.5.
• If you are not keen on programming, you can still read this
section, but you don’t need to pay attention to the coding
part.
• Whether or not you read this section, you need to
understand that we are sliding the kernel over the
image and may need padding, as explained in this lecture.