Image Processing Masterclass with Python: 50+ Solutions and Techniques Solving Complex Digital Image Processing Challenges Using Numpy, Scipy, Pytorch and Keras (English Edition)
By Sandipan Dey
About this ebook
Next, the book focuses on solving problems based on sampling, convolution, the discrete Fourier transform, frequency-domain filtering, and image restoration with deconvolution. It also aims at solving image enhancement problems using different algorithms such as spatial filters and at creating a super-resolution image using SRGAN.
Finally, it explores popular facial image processing problems and solves them with machine learning and deep learning models using popular Python ML/DL libraries.
Book preview
Image Processing Masterclass with Python - Sandipan Dey
CHAPTER 1
Basic Image and Video Processing
Introduction
Image processing refers to the automatic processing, manipulation, analysis, and interpretation of images using algorithms and code on a computer. Video processing is a special case of image processing that often employs video filters and where the input and output signals are video files or video streams. Image and video processing have applications in many disciplines and fields of science and technology, such as television, photography, robotics, remote sensing, medical diagnosis (CT scan/X-ray/MRI), and industrial inspection. Social networking sites such as Facebook and Instagram, which we have got used to in our daily lives and to which we upload tons of images/videos every day, are typical examples of industries that need to use and innovate on many image/video processing algorithms to process the images/videos we upload.
In this chapter, we shall solve a few initial image and video processing problems that will help us understand the basic concepts of image and video processing. Before we start processing/analysing an image/video, we need to be able to load the image into memory using a suitable data structure and to save the processed image/video back to the disk. It is also important to be able to visualize (plot) the image on the computer screen (to see the impact of an image processing algorithm on an image immediately). Often, an image/video needs to be pre-processed before it can be used in some complex image/video processing algorithm (such as classification or segmentation, which you will learn more about in later chapters); some transformation/manipulation techniques (such as resizing/cropping/changing brightness and contrast) are very useful here. Similarly, as a post-processing step, we may need to apply some image/video manipulation/transformation techniques to get the desired output. With image transformation and manipulation, we can also enhance the appearance of an image (for example, by applying a filter).
In this chapter, you are going to learn how to use different Python libraries (numpy, scipy, scikit-image, opencv-python, and matplotlib) for basic image/video processing, manipulation, and transformation. We shall start by displaying the three channels of an RGB image with 3D visualizations. Next, we shall demonstrate how to capture a video from a camera and extract frames. Then, we shall show how to implement an Instagram-like Gotham filter. Finally, we shall explore the following few problems on image manipulation and see how to solve them using Python libraries:
Plot image montage, crop/resize images, and draw contours
Convert PNG image with a palette to grayscale
Rotate an image and convert RGB to YUV color space (using scikit-image, PIL, python-opencv, and scipy.ndimage/misc)
Structure
This chapter is organized as follows:
Objectives
Problems
Display RGB image color channels in 3D
Video I/O
Read/write video files
Capture video from camera and extract frames with OpenCV-Python
Implement Instagram-like Gotham filter
Explore image manipulations (using scikit-image, PIL, python-opencv, and scipy ndimage/misc)
Plot image montage with scikit-image
Crop/resize images with SciPy ndimage module
Draw contours with OpenCV-Python
Counting objects in an image
Convert a PNG image with a palette to grayscale with PIL
Different ways to convert an RGB image to grayscale
Rotating an image with scipy.ndimage
Image differences with PIL
Converting RGB to HSV and YUV color spaces with scikit-image
Resizing an image with OpenCV-Python
Add a logo to an image with scikit-image
Change brightness/contrast of an image with linear transformation and gamma correction with OpenCV-Python
Detecting colors and changing colors of objects with OpenCV-Python
Object removal with seam carving
Creating fake miniature effect
Summary
Questions
Key terms
References
Objectives
After studying this chapter, you should be able to:
Understand the image/video storage and data structures in Python
Do image/video file I/O in Python using different libraries
Write Python code to do basic image/video manipulations
Problems
Display RGB image color channels in 3D
It is very useful to be able to conceptualize an image as a function and visualize it, to understand what it is, before doing further analysis/processing. A grayscale image can be thought of as a 2-D function f(x, y) of the pixel locations (x, y), a function that maps each pixel into its corresponding grey level (for example, an integer in [0,255] or, equivalently, a floating-point number in [0,1]), that is:
f : (x, y) → R
For an RGB image, there are three such functions that can be denoted as:
fR(x, y), fG(x, y), and fB(x, y)
corresponding to the channels R, G, and B, respectively. The library matplotlib’s 3-D plot functions can be used to plot each of these functions. The following Python code shows how to plot the RGB channels separately in 3D.
The following are the steps you need to follow:
First, start by importing all the required packages using the following code. For reading an image, we need the imread() function from the scikit-image library’s io module. For array operations, we need numpy (as an image is loaded as an ndarray). For displaying an image, we shall use matplotlib.pylab module functions. For 3D plotting, we need the mpl_toolkits library’s mplot3d module. The rest of the modules from the library matplotlib are also required for plotting. To display an image with matplotlib inside a notebook, we need to use %matplotlib inline; this is only for display purposes (the plots are not interactive/zoom-able).
# comment the next line only if you are not running this code from jupyter notebook
%matplotlib inline
from skimage.io import imread
import numpy as np
import matplotlib.pylab as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
Next, let us implement a function named plot_3d() that plots the pixel values for a channel. It uses the plot_surface() function, which is the key function for 3D plotting. From Matplotlib’s documentation, we can find the following about this function:
Axes3D.plot_surface(X, Y, Z, *args, **kwargs)
Create a surface plot.
As can be seen from the following code snippet, the Y- and Z-axes are used to show the horizontal and vertical image axes, respectively, and the X-axis is used to show the depth (pixel intensity) of the image. Note that X, Y, and Z must be of the same dimensions. The cmap is the color map used to show the different values of the pixels as follows:
def plot_3d(X, Y, Z, cmap='Reds', title=''):
    """
    This function plots 3D visualization of a channel
    It displays (x, y, f(x,y)) for all x,y values
    """
    fig = plt.figure(figsize=(15,15))
    ax = fig.gca(projection='3d')
    surf = ax.plot_surface(X, Y, Z, cmap=cmap, linewidth=0, antialiased=False, rstride=2, cstride=2, alpha=0.5)
    ax.xaxis.set_major_locator(LinearLocator(10))
    ax.xaxis.set_major_formatter(FormatStrFormatter('%.02f'))
    ax.view_init(elev=10., azim=5)
    ax.set_title(title, size=20)
    plt.show()
Let us first read the Lena RGB image from the disk and load it in memory using the scikit-image library’s io module’s imread() function; the description of the function is shown as follows:
skimage.io.imread(fname, as_gray=False, plugin=None, flatten=None, **plugin_args)
Load an image from file.
im = imread('images/Img_01_01.jpg')
Then, use Numpy's arange() and meshgrid() functions to create a 2D-grid of pixel coordinates (X,Y) as follows:
Y = np.arange(im.shape[0])
X = np.arange(im.shape[1])
X, Y = np.meshgrid(X, Y)
Finally, assign the red, green, and blue channels of the image to the variables Z1, Z2, and Z3, respectively. These channels are displayed in 3D using the plot_3d() function as follows:
Z1 = im[...,0]
Z2 = im[...,1]
Z3 = im[...,2]
Now, let us visualize the image in 3D. The following code block shows how to visualize the color channels of the Lena RGB image with the preceding function.
Note that the pixel intensities are plotted along the depth axis, and the Y (row) coordinate values are subtracted from the height of the image to flip the vertical axis, since the image coordinate origin is at the top-left while the plot origin is at the bottom-left (otherwise the image will appear upside-down).
Use the function plot_3d() to visualize the red color channel first as follows:
# plot 3D visualizations of the R, G, B channels of the image respectively
plot_3d(Z1, X, im.shape[0]-Y, cmap='Reds', title='3D plot for the Red Channel')
The following image shows the 3D plot for the red channel:
Use the function plot_3d() again, this time to visualize the green color channel of the input Lena image as follows:
plot_3d(Z2, X, im.shape[0]-Y, cmap='Greens', title='3D plot for the Green Channel')
The following image shows the 3D plot for the green channel:
Finally, visualize the blue color channel as follows:
plot_3d(Z3, X, im.shape[0]-Y, cmap='Blues', title='3D plot for the Blue Channel')
The following image shows the 3D plot for blue channel:
As you can see from the preceding figures, with the depth of colors in each channel (red, green, and blue), the 3D plots look like the original 2D image. Now, it is left as an exercise to you to search in the scikit-image documentation for the function to save an image to disk.
Video I/O
It is very useful to understand what a video is, how to do video I/O, and how to visualize specific frames before doing further analysis/processing. A video is a series of images (also called frames) played in sequence at a specified frame rate (measured in frames per second, or fps). Hence, if you add another dimension to an image (that is, a sequence of time instances at which the images are played), you get a video.
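To make the idea concrete, here is a minimal sketch (not taken from the book) that loads a short clip as a 4D NumPy array with skvideo.io.vread(); the first dimension is the frame (time) axis. Note that vread() reads every frame into memory at once, so it is only practical for short videos, and it assumes FFmpeg and scikit-video are installed (see the next section).
# a minimal sketch (assumes FFmpeg and scikit-video are installed, see the next section)
import skvideo.io
video = skvideo.io.vread('images/Vid_01_01.mp4') # loads the whole clip into memory
print(video.shape) # (number_of_frames, height, width, number_of_channels)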
In this section, we shall demonstrate how to do the video I/O using Python library functions. First, to read/write a video, we shall use scikit-video library’s io module’s functions FFmpegReader()/FFmpegWriter(), and also we shall display some frames extracted from the video. Next, to read an image from the camera, we are going to use opencv-python library’s VideoCapture() function.
Read/write video files with scikit-video
In this problem, we shall first learn how to load a video from the disk using scikit-video library functions. This library uses the FFmpeg software for video I/O under the hood, so the code block demonstrated in this section will only work if FFmpeg is installed first and scikit-video is installed afterwards, so that scikit-video can find the FFmpeg installation (refer to https://github.com/AlexEMG/DeepLabCut/issues/36). Follow these steps to perform video I/O.
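If scikit-video does not detect FFmpeg automatically, you can point it to the FFmpeg binaries explicitly. The following minimal sketch assumes the binaries live in /usr/local/bin; adjust the path for your system.
# a minimal sketch, assuming FFmpeg binaries are in /usr/local/bin (adjust for your system)
import skvideo
skvideo.setFFmpegPath('/usr/local/bin') # must be set before importing skvideo.io
import skvideo.io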
Let us start by importing all the required packages by using the following code snippet:
import skvideo.io
import numpy as np
import matplotlib.pylab as plt
The following code snippet shows how to read a video file (part of a trailer of the movie Spider-Man 3 (2007)) from the disk using the FFmpegReader() function and display a few frames (images) from the video randomly. The relevant part of the function FFmpegReader() from the documentation is shown as follows:
skvideo.io.FFmpegReader(*args, **kwargs)
Reads frames using FFmpeg
# set keys and values for parameters in ffmpeg
inputparameters = {}
outputparameters = {}
reader = skvideo.io.FFmpegReader('images/Vid_01_01.mp4',
inputdict=inputparameters,
outputdict=outputparameters)
Also, use the method getShape() (along with the object returned by the FFmpegReader() function) to get the number of frames, height, width, and number of channels of the video as follows:
## Read video file
num_frames, height, width, num_channels = reader.getShape()
print(num_frames, height, width, num_channels)
# 600 916 1920 3
Now, use the nextFrame() method (which yields frames using a python generator) to read the frames from the video by using the following code block.
Choose four frames randomly (with NumPy’s random.choice() function) and display those frames only as follows:
plt.figure(figsize=(20,10))
# iterate through the frames and display a few frames
frame_list = np.random.choice(num_frames, 4)
i, j = 0, 1
for frame in reader.nextFrame():
    if i in frame_list:
        plt.subplot(2,2,j)
        plt.imshow(frame)
        plt.title('Frame {}'.format(i), size=20)
        plt.axis('off')
        j += 1
    i += 1
plt.show()
The video clip was taken from a YouTube trailer for Spider-Man 3 (2007), similar to this one: https://www.youtube.com/watch?v=wPosLpgMtTY (the exact video used a couple of years ago can no longer be found on YouTube).
Binary image processing is often one of the major tasks of an image-processing system (for example, morphological image processing algorithms generally need a binary input image to start with).
To compute a binary image (that is, an image with only two distinct grey-level values, for example, black and white), the simplest way is to use a threshold (above which all pixels will be white, and below which all pixels will be black).
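For instance, a minimal sketch of fixed thresholding on a single grayscale image could look like the following; the threshold value 0.5 is an arbitrary illustrative choice, whereas the code block below uses Otsu's method, which picks the threshold automatically.
# a minimal sketch: binarize a grayscale image with a fixed, hand-picked threshold
import numpy as np
from skimage.io import imread
from skimage.color import rgb2gray
gray = rgb2gray(imread('images/Img_01_01.jpg')) # grayscale values in [0, 1]
binary = 255 * (gray > 0.5).astype(np.uint8) # pixels above the threshold become white (255)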
The following code block shows how the frames from the preceding video can be thresholded using the threshold_otsu() function from scikit-image’s filters module. We shall describe this function in detail in the segmentation chapter in the next part of the book; for the time being, let us assume it is a black-box function that turns a grayscale image into a binary image.
Apply thresholding on each color channel to obtain a binary frame from an image frame.
Use skvideo.io’s FFmpegWriter() function to save the binary video, accumulating the binary frames sequentially in the same order, as shown in the following code snippet.
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu
writer = skvideo.io.FFmpegWriter('images/spiderman_binary.mp4', outputdict={})
for frame in skvideo.io.vreader('images/Vid_01_01.mp4'):
    frame = rgb2gray(frame)
    thresh = threshold_otsu(frame)
    binary = np.zeros((frame.shape[0], frame.shape[1], 3), dtype=np.uint8)
    binary[...,0] = binary[...,1] = binary[...,2] = 255*(frame > thresh).astype(np.uint8)
    writer.writeFrame(binary)
writer.close()
Now, read the binary video you just saved using the following code snippet and then display a few random frames (as you did last time) as follows:
plt.figure(figsize=(20,10))
# iterate through the frames and display a few frames
reader = skvideo.io.FFmpegReader('images/spiderman_binary.mp4')
num_frames, height, width, num_channels = reader.getShape()
frame_list = np.random.choice(num_frames, 4)
i, j = 0, 1
for frame in reader.nextFrame():
    if i in frame_list:
        plt.subplot(2,2,j)
        plt.imshow(frame)
        plt.title('Frame {}'.format(i), size=20)
        plt.axis('off')
        j += 1
    i += 1
plt.show()
Capture video from camera and extract frames with OpenCV-Python
In this problem, you will learn how to capture video and extract frames using the library cv2 (opencv-python). This time we shall capture video (live stream) recorded with a camera (for example, the in-built webcam of a laptop).
Follow these steps:
First, import the required libraries.
If you are using a Jupyter notebook, use %matplotlib notebook this time to get a zoom-able and resize-able plot, which is the best option for working interactively, as follows:
# comment the next line only if you are not running this code from jupyter notebook
# %matplotlib notebook
import cv2
import matplotlib.pyplot as plt
As explained in OpenCV documentation, to capture a video, we need to create a VideoCapture object. Its argument can be either the device index or the name of a video file.
The device index is just a number that specifies which camera to use. Normally, one camera is connected to the computer, so simply passing 0 as the parameter works (we can select a second camera by passing 1, and so on).
You can check whether the VideoCapture object is initialized properly or not with the isOpened() method (check whether it returns true or not). If it returns true, then we can read the very first frame (and all subsequent frames) with the function read() as shown in the following code block.
The read() function is the most convenient method for capturing data from the device; it returns the frame that was just grabbed. If no frame could be grabbed (the camera has been disconnected, or there are no more frames in the video file), the method returns false along with an empty image.
In the following code snippet, the Boolean variable is_capturing holds whether or not a frame could be grabbed as follows:
vc = cv2.VideoCapture(0)
plt.ion()
if vc.isOpened(): # try to get the first frame
    is_capturing, frame = vc.read()
    webcam_preview = plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
else:
    is_capturing = False
Once the first frame is read properly, we can capture frames one by one within a while loop whose condition checks whether a frame can still be captured.
The following code block shows how to capture the first ten frames.
In the end, don’t forget to call the release() function on the VideoCapture object.
Also note that OpenCV uses BGR color format, and to display the frame with real RGB color, we must use the transformation function cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) as follows:
# capture 10 frames
frame_index = 1
while is_capturing:
    if frame_index > 10: break
    is_capturing, frame = vc.read()
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # converts BGR to RGB so the blue-tinted image looks correctly colored
    webcam_preview.set_data(image)
    plt.title('Frame {0:d} '.format(frame_index))
    plt.draw()
    frame_index += 1
    try: # Avoids a NotImplementedError caused by 'plt.pause'
        plt.pause(2)
    except Exception:
        pass
vc.release()
If the camera device attached to your computer is working, you should see the image of yourself captured in the frames when you run the preceding code snippet.
The cv2.VideoCapture() function can also be used to read a video file from the disk, and cv2.VideoWriter() can be used to save a video file to the disk. Explore these functions on your own.
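As a hedged starting point for that exploration, the following minimal sketch reads a video file from the disk frame by frame and writes the frames back out; the output file name and the mp4v codec are illustrative assumptions, not values from the book.
# a minimal sketch: read a video file and write it back out with OpenCV
# (the output file name and the codec are assumptions for illustration)
import cv2
cap = cv2.VideoCapture('images/Vid_01_01.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('images/Vid_01_01_copy.mp4', fourcc, fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret: # no more frames (or a read error)
        break
    out.write(frame)
cap.release()
out.release()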
Implement Instagram-like Gotham filter
In this section, you will learn to implement a filter like the ones Instagram uses to enhance the images uploaded to the site. The following figure shows the input image that we want to enhance by implementing an Instagram-like filter:
The Gotham filter
The Gotham filter is computed by applying the following operations on an image (the steps are taken from https://www.practicepython.org/blog/2016/12/20/instagram-filters-python.html). The corresponding Python code and the input and output images are shown along with the operations, starting with the following input image.
Let us start by importing the required libraries. In this problem, we shall use the PIL library for image processing functions as follows:
from PIL import Image
import numpy as np
import matplotlib.pylab as plt
im = Image.open('images/Img_01_03.jpg') # assumed pixel values in [0,255]
print(np.max(im))
# 255
The Gotham filter has the following steps to be implemented:
First, a mid-tone red contrast boost needs to be applied to the input image; this is done with Python code that uses NumPy's interp() function to implement channel interpolation. Let us first understand how NumPy interpolation works in the 1-D case. The following code snippet illustrates the concept.
Interpolation with NumPy interp() function
From the NumPy documentation, we get the following about the interp() function:
numpy.interp(x, xp, fp, left=None, right=None, period=None)
One-dimensional linear interpolation. Returns the one-dimensional piecewise linear interpolant to a function with given discrete data points, evaluated at x.
Let us say we want to (linearly) interpolate the values of the cosine function in the interval [0, 2π], given the actual values of the function at only ten reference points in the interval. We can use the interp() function to estimate the value of the function at the remaining points, starting from the values at the given points and applying linear interpolation. The following code shows how to do it. The orange piecewise-linear curve shows the estimate produced by the interp() function, and the green curve shows the original cosine curve. As can be seen, the interp() function computes decent estimates for the values of the function at the new points:
# reference points
x_p = np.linspace(0, 2*np.pi, 10) # generate sequence of 10 points (numbers) evenly spaced in the interval [0, 2π]
# true values at reference points
y_p = np.cos(x_p)
# test points
x = np.linspace(0, 2*np.pi, 50) # generate sequence of 50 test points (numbers) evenly spaced in the interval [0, 2π]
# true values at all points
y = np.cos(x)
# interpolated values at all test points
y_interp = np.interp(x, x_p, y_p)
# now plot
plt.figure(figsize=(20,10))
plt.plot(x_p, y_p, 'o', label='reference points')
plt.plot(x, y_interp, '-x', label='interpolated')
plt.plot(x, y, '--', label='true')
plt.legend(prop={'size': 16})
plt.show()
Consider the following graph:
The preceding concept can be used in a similar way to compute the channel interpolation values for the R (red) channel with the interp() function, as required in the first step of the implementation. The red channel values of an image are essentially a 2D array (matrix), so before you can apply the function to the channel, you need to do the following (a short sketch is given after the list):
First, flatten the 2D array into a 1D array (using NumPy’s ravel() function)
Then, apply the channel interpolation with the interp() function, and
Finally, reshape the 1D array back to the image matrix
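Here is a minimal sketch of these three steps applied to the red channel of the image loaded earlier; the interpolation control points are illustrative placeholders only, not the exact values used by the Gotham filter.
# a minimal sketch of the flatten -> interpolate -> reshape pattern on the red channel
# (the control points below are placeholders for illustration, not the filter's exact values)
r = np.array(im)[..., 0] # red channel of the PIL image as a 2D numpy array
r_interp = np.interp(r.ravel(), # 1. flatten the 2D array into a 1D array
                     [0, 64, 128, 192, 255], # 2. map these input intensities ...
                     [0, 80, 160, 210, 255]) # ... to these boosted output intensities
r_interp = r_interp.reshape(r.shape).astype(np.uint8) # 3. reshape the 1D result back into the image matrix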