Image Processing Masterclass with Python: 50+ Solutions and Techniques Solving Complex Digital Image Processing Challenges Using Numpy, Scipy, Pytorch and Keras (English Edition)
Ebook, 646 pages, 3 hours


About this ebook

This book starts with basic Image Processing and manipulation problems and demonstrates how to solve them with popular Python libraries and modules. It then concentrates on problems based on Geometric image transformations and problems to be solved with Image hashing.

Next, the book focuses on solving problems based on Sampling, Convolution, Discrete Fourier transform, Frequency domain filtering and image restoration with deconvolution. It also solves Image enhancement problems using different algorithms such as spatial filters, and creates a super-resolution image using SRGAN.
Finally, it explores popular facial image processing problems and solves them with Machine learning and Deep learning models using popular Python ML / DL libraries.
Language: English
Release date: Mar 10, 2021
ISBN: 9789389898651

    Book preview

    Image Processing Masterclass with Python - Sandipan Dey

    CHAPTER 1

    Basic Image and Video Processing

    Introduction

    Image processing refers to the automatic processing, manipulation, analysis, and interpretation of images using algorithms and code on a computer. Video processing is a special case of image processing that often employs video filters and in which the input and output signals are video files or video streams. Image and video processing have applications in many fields of science and technology, such as television, photography, robotics, remote sensing, medical diagnosis (CT scan/X-ray/MRI), and industrial inspection. Social networking sites such as Facebook and Instagram, which have become part of our daily lives and to which we upload tons of images/videos every day, are typical examples of industries that need to use and innovate many image/video processing algorithms to process the uploaded content.

    In this chapter, we shall solve a few initial image and video processing problems that will help us understand the basic concepts of image and video processing. Before we start processing/analysing an image/video, we need to be able to load it into memory using a suitable data structure and to save the processed image/video back to disk. It is also important to be able to visualize (plot) the image on the computer screen, so that the impact of an image processing algorithm can be seen immediately. Often an image/video needs to be pre-processed before it can be used in more complex image/video processing algorithms (such as classification or segmentation, which you will learn more about in later chapters); for this, transformation/manipulation techniques such as resizing, cropping, and changing brightness and contrast are very useful. Similarly, as a post-processing step, we may need to apply some image/video manipulation/transformation techniques to obtain the desired output. With image transformation and manipulation, we can also enhance the appearance of an image (for example, by applying a filter).

    In this chapter, you are going to learn how to use different Python libraries (numpy, scipy, scikit-image, opencv-python, and matplotlib) for basic image/video processing, manipulation, and transformation. We shall start by displaying the three channels of an RGB image with 3D visualizations. Next, we shall demonstrate how to capture a video from a camera and extract frames. Then, we shall show how to implement an Instagram-like Gotham filter. Finally, we shall explore the following problems on image manipulation and see how to solve them using Python libraries:

    Plot image montage, crop/resize images, and draw contours

    Convert PNG image with a palette to grayscale

    Rotate an image and convert RGB to YUV color space (using scikit-image, PIL, python-opencv, and scipy.ndimage/misc)

    Structure

    This chapter is organized as follows:

    Objectives

    Problems

    Display RGB image color channels in 3D

    Video I/O

    Read/write video files

    Capture video from camera and extract frames with OpenCV-Python

    Implement Instagram-like Gotham filter

    Explore image manipulations (using scikit-image, PIL, python-opencv, and scipy ndimage/misc)

    Plot image montage with scikit-image

    Crop/resize images with SciPy ndimage module

    Draw contours with OpenCV-Python

    Counting objects in an image

    Convert a PNG image with a palette to grayscale with PIL

    Different ways to convert an RGB image to grayscale

    Rotating an image with scipy.ndimage

    Image differences with PIL

    Converting RGB to HSV and YUV color spaces with scikit-image

    Resizing an image with OpenCV-Python

    Add a logo to an image with scikit-image

    Change brightness/contrast of an image with linear transformation and gamma correction with OpenCV-Python

    Detecting colors and changing colors of objects with OpenCV-Python

    Object removal with seam carving

    Creating fake miniature effect

    Summary

    Questions

    Key terms

    References

    Objectives

    After studying this chapter, you should be able to:

    Understand image/video storage and data structures in Python

    Do image/video file I/O in Python using different libraries

    Write Python code to do basic image/video manipulations

    Problems

    Display RGB image color channels in 3D

    It is very useful to be able to conceptualize an image as a function and visualize it, in order to understand what it is before doing further analysis/processing. A grayscale image can be thought of as a 2-D function f(x, y) of the pixel locations (x, y), that is, a function that maps each pixel to its corresponding grey level (for example, an integer in [0, 255] or, equivalently, a floating-point number in [0, 1]):

    $f : (x, y) \mapsto f(x, y) \in \mathbb{R}$

    For an RGB image, there are three such functions, which can be denoted as:

    $f_R(x, y)$, $f_G(x, y)$ and $f_B(x, y)$

    corresponding to the channels R, G, and B, respectively. Matplotlib’s 3-D plot functions can be used to plot each of these functions. The following Python code shows how to plot the RGB channels separately in 3D.

    The following are the steps you need to follow:

    First, start by importing all the required packages by using the following code. For reading an image, we need the imread() function from the scikit-image library’s io module. For array operations, we need numpy (as an image is loaded as an ndarray). For displaying an image, we shall use the matplotlib.pylab module functions. For 3D plotting, we need the mplot3d module from mpl_toolkits. The rest of the imported matplotlib modules are also required for plotting. To display an image with matplotlib inside a notebook, we need to use %matplotlib inline; this produces static output for display purposes only (not interactive or zoomable).

    # comment the next line only if you are not running this code from jupyter notebook
    %matplotlib inline
    from skimage.io import imread
    import numpy as np
    import matplotlib.pylab as plt
    from mpl_toolkits.mplot3d import Axes3D
    from matplotlib import cm
    from matplotlib.ticker import LinearLocator, FormatStrFormatter

    Next, let us implement a function named plot_3d() that plots the pixel values for a channel. It uses the plot_surface() function, which is the key function for 3D plotting. From Matplotlib’s documentation, we can find the following about this function:

    Axes3D.plot_surface(X, Y, Z, *args, **kwargs)

    Create a surface plot.

    As can be seen from the following code snippet, the function simply forwards its X, Y, and Z arguments to plot_surface(). When the function is called later, the Y- and Z-axes of the plot are used to show the horizontal and vertical axes of the image, respectively, and the X-axis is used to show the depth (the pixel values) of the image. Note that X, Y, and Z must have the same dimensions. The cmap argument is the color map used to show the different pixel values:

    def plot_3d(X, Y, Z, cmap='Reds', title=''):
        """
        This function plots 3D visualization of a channel
        It displays (x, y, f(x,y)) for all x,y values
        """
        fig = plt.figure(figsize=(15,15))
        # fig.gca(projection='3d') is not accepted by recent Matplotlib releases
        ax = fig.add_subplot(111, projection='3d')
        surf = ax.plot_surface(X, Y, Z, cmap=cmap, linewidth=0, antialiased=False, rstride=2, cstride=2, alpha=0.5)
        ax.xaxis.set_major_locator(LinearLocator(10))
        ax.xaxis.set_major_formatter(FormatStrFormatter('%.02f'))
        ax.view_init(elev=10., azim=5)
        ax.set_title(title, size=20)
        plt.show()

    Let us first read the Lena RGB image from the disk and load it in memory using the scikit-image library’s io module’s imread() function; the description of the function is shown as follows:

    skimage.io.imread(fname, as_gray=False, plugin=None, flatten=None, **plugin_args)

    Load an image from file.

    im = imread('images/Img_01_01.jpg')

    Then, use Numpy's arange() and meshgrid() functions to create a 2D-grid of pixel coordinates (X,Y) as follows:

    Y = np.arange(im.shape[0])
    X = np.arange(im.shape[1])
    X, Y = np.meshgrid(X, Y)
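    To see what meshgrid() produces, here is a minimal sketch on a tiny grid. The 2x3 shape and the names rows and cols are assumptions made for illustration only; they are not taken from the book's image.

```python
import numpy as np

# Hypothetical tiny "image" shape: 2 rows (height) by 3 columns (width)
height, width = 2, 3

rows = np.arange(height)   # pixel row indices: [0, 1]
cols = np.arange(width)    # pixel column indices: [0, 1, 2]

# meshgrid returns two arrays of shape (height, width):
# X holds the column index of every pixel, Y holds the row index
X, Y = np.meshgrid(cols, rows)

print(X.tolist())          # [[0, 1, 2], [0, 1, 2]]
print(Y.tolist())          # [[0, 0, 0], [1, 1, 1]]
print(X.shape, Y.shape)    # (2, 3) (2, 3)
```

    Both arrays have the same shape as a single image channel, which is why they can be passed to plot_surface() together with the pixel values.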

    Finally, assign the red, green, and blue channels of the image to the variables Z1, Z2, and Z3, respectively. These channels are displayed in 3D using the plot_3d() function as follows:

    Z1 = im[...,0]
    Z2 = im[...,1]
    Z3 = im[...,2]
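    The link between the channel slices and the functions f_R, f_G, and f_B can be sketched on a synthetic image. The 2x2 array and the helper name f_channel below are hypothetical and are used only to illustrate the indexing.

```python
import numpy as np

# Hypothetical 2x2 RGB image with uint8 values (not the book's image)
im = np.array([[[255, 0, 0], [0, 255, 0]],
               [[0, 0, 255], [10, 20, 30]]], dtype=np.uint8)

def f_channel(im, x, y, c):
    """Return the value of channel c at column x and row y."""
    return int(im[y, x, c])   # NumPy indexes rows (y) first, then columns (x)

red = im[..., 0]               # the whole red channel as a 2D array

print(red.shape)               # (2, 2)
print(f_channel(im, 0, 0, 0))  # 255
print(f_channel(im, 1, 1, 2))  # 30
```

    Note that the array is indexed as im[y, x], so the row index comes first even though the function is written as f(x, y).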

    Now, let us visualize the image in 3D. The following code block shows how to visualize the color channels of the Lena RGB image with the preceding function.

    The pixel values of each channel (Z1, Z2, and Z3) are passed as the first argument, so they are plotted along the depth axis. The Y values are subtracted from the height of the image to move the origin from the top-left to the bottom-left corner (otherwise the image would appear upside-down).

    Use the function plot_3d() to visualize the red color channel first as follows:

    # plot 3D visualizations of the R, G, B channels of the image respectively

    plot_3d(Z1, X, im.shape[0]-Y, cmap='Reds', title='3D plot for the Red Channel')

    The following image shows the 3D plot for the red channel:

    Use the function plot_3d() again, this time to visualize the green color channel of the input Lena image as follows:

    plot_3d(Z2, X, im.shape[0]-Y, cmap='Greens', title='3D plot for the Green Channel')

    The following image shows the 3D plot for the green channel:

    Finally, visualize the blue color channel as follows:

    plot_3d(Z3, X, im.shape[0]-Y, cmap='Blues', title='3D plot for the Blue Channel')

    The following image shows the 3D plot for blue channel:

    As you can see from the preceding figures, with the depth of colors in each channel (red, green, and blue), the 3D plots look like the original 2D image. Now, it is left as an exercise to you to search in the scikit-image documentation for the function to save an image to disk.

    Video I/O

    It is very useful to understand what a video is, how to do video I/O, and how to visualize specific frames for further analysis/processing. A video is a series of images (also called frames) played in sequence at a specified frame rate (measured, for example, in frames per second, or fps). Hence, if you add another dimension to an image (that is, a sequence of time instants at which the images are played), you get a video.
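    The idea of a video as a stack of images along a time axis can be sketched with a synthetic NumPy array. The frame count, frame size, and frame rate below are assumed values chosen for illustration; no real video file is involved.

```python
import numpy as np

# Hypothetical synthetic clip: 50 frames of 4x6 RGB pixels at 25 fps
num_frames, height, width, num_channels = 50, 4, 6, 3
fps = 25

# A video stored as a 4-D array: (time, rows, columns, channels)
video = np.zeros((num_frames, height, width, num_channels), dtype=np.uint8)

# Make each frame slightly brighter than the previous one
for t in range(num_frames):
    video[t] = t

frame_10 = video[10]          # a single frame is an ordinary RGB image
duration = num_frames / fps   # playing time in seconds

print(frame_10.shape)         # (4, 6, 3)
print(duration)               # 2.0
```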

    In this section, we shall demonstrate how to do the video I/O using Python library functions. First, to read/write a video, we shall use scikit-video library’s io module’s functions FFmpegReader()/FFmpegWriter(), and also we shall display some frames extracted from the video. Next, to read an image from the camera, we are going to use opencv-python library’s VideoCapture() function.

    Read/write video files with scikit-video

    In this problem, we shall first learn how to load a video from the disk using scikit-video library functions. This library uses the FFmpeg software for video I/O under the hood, so the code in this section will only work if FFmpeg is installed first and scikit-video is installed afterwards, so that scikit-video can find the FFmpeg installation (refer to https://github.com/AlexEMG/DeepLabCut/issues/36). Follow these steps to perform video I/O.

    Let us start by importing all the required packages by using the following code snippet:

    import skvideo.io
    import numpy as np
    import matplotlib.pylab as plt

    The following code snippet shows how to read a video file (part of a trailer of the movie Spider-Man 3 (2007)) from the disk using the FFmpegReader() function and display a few frames (images) from the video randomly. The relevant part of the function FFmpegReader() from the documentation is shown as follows:

    skvideo.io.FFmpegReader(*args, **kwargs)

    Reads frames using FFmpeg

    # set keys and values for parameters in ffmpeg
    inputparameters = {}
    outputparameters = {}
    reader = skvideo.io.FFmpegReader('images/Vid_01_01.mp4',
                                     inputdict=inputparameters,
                                     outputdict=outputparameters)

    Also, use the method getShape() (along with the object returned by the FFmpegReader() function) to get the number of frames, height, width, and number of channels of the video as follows:

    ## Read video file
    num_frames, height, width, num_channels = reader.getShape()
    print(num_frames, height, width, num_channels)
    # 600 916 1920 3

    Now, use the nextFrame() method (which yields frames using a python generator) to read the frames from the video by using the following code block.

    Choose four frames randomly (with NumPy’s random.choice() function) and display those frames only as follows:

    plt.figure(figsize=(20,10))
    # iterate through the frames and display a few frames
    # replace=False avoids picking the same frame index twice
    frame_list = np.random.choice(num_frames, 4, replace=False)
    i, j = 0, 1
    for frame in reader.nextFrame():
        if i in frame_list:
            plt.subplot(2,2,j)
            plt.imshow(frame)
            plt.title('Frame {}'.format(i), size=20)
            plt.axis('off')
            j += 1
        i += 1
    plt.show()

    The video was taken from a YouTube trailer for Spider-Man 3 (2007), similar to this one: https://www.youtube.com/watch?v=wPosLpgMtTY (the exact video that I used a couple of years back can no longer be found on YouTube).

    Binary image processing is often one of the major tasks of an image-processing system (for example, morphological image processing algorithms generally need a binary input image to start with).

    To compute a binary image (that is, an image with only two distinct grey-level values, for example, black and white), the simplest way is to use a threshold (above which all pixels will be white, and below which all pixels will be black).
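    The thresholding rule can be sketched with plain NumPy. The 3x3 array and the threshold of 127 are hand-picked assumptions for illustration; the book's code derives the threshold with Otsu's method instead.

```python
import numpy as np

# Hypothetical 3x3 grayscale image with values in [0, 255]
gray = np.array([[ 10, 200,  90],
                 [130,  40, 255],
                 [128, 127,   0]], dtype=np.uint8)

threshold = 127  # hand-picked for illustration only

# Pixels strictly above the threshold become white (255), the rest black (0)
binary = np.where(gray > threshold, 255, 0).astype(np.uint8)

print(binary.tolist())
# [[0, 255, 0], [255, 0, 255], [255, 0, 0]]
```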

    The following code block shows how the frames from the preceding video can be thresholded using the threshold_otsu() function from scikit-image’s filters module. We shall describe this function in detail in the segmentation chapter in the next part of the book; for the time being, let us treat it as a black-box function that computes a threshold for turning a grayscale image into a binary image.

    Convert each frame to grayscale, apply the threshold to obtain a binary frame, and copy the result into all three color channels.

    Use the FFmpegWriter() function from scikit-video’s io module to save the binary video by writing the binary frames sequentially in the same order, as shown in the following code snippet.

    from skimage.color import rgb2gray
    from skimage.filters import threshold_otsu
    writer = skvideo.io.FFmpegWriter('images/spiderman_binary.mp4', outputdict={})
    for frame in skvideo.io.vreader('images/Vid_01_01.mp4'):
        frame = rgb2gray(frame)
        thresh = threshold_otsu(frame)
        binary = np.zeros((frame.shape[0], frame.shape[1], 3), dtype=np.uint8)
        binary[...,0] = binary[...,1] = binary[...,2] = 255*(frame > thresh).astype(np.uint8)
        writer.writeFrame(binary)
    writer.close()
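    The channel-by-channel assignment in the loop above copies one binary mask into three channels. As an alternative sketch (not the book's code), the same three-channel frame can be built in one step with np.dstack(); the 2x2 mask here is a made-up example.

```python
import numpy as np

# Hypothetical 2x2 thresholding result (True = pixel above the threshold)
mask = np.array([[True, False],
                 [False, True]])

binary_gray = 255 * mask.astype(np.uint8)    # shape (2, 2), values 0 or 255
binary_rgb = np.dstack([binary_gray] * 3)    # shape (2, 2, 3), same mask in R, G, B

print(binary_rgb.shape)            # (2, 2, 3)
print(binary_rgb[0, 0].tolist())   # [255, 255, 255]
print(binary_rgb[0, 1].tolist())   # [0, 0, 0]
```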

    Now, read the binary video you just saved using the following code snippet and then display a few random frames (as you did last time) as follows:

    plt.figure(figsize=(20,10))
    # iterate through the frames and display a few frames
    reader = skvideo.io.FFmpegReader('images/spiderman_binary.mp4')
    num_frames, height, width, num_channels = reader.getShape()
    # replace=False avoids picking the same frame index twice
    frame_list = np.random.choice(num_frames, 4, replace=False)
    i, j = 0, 1
    for frame in reader.nextFrame():
        if i in frame_list:
            plt.subplot(2,2,j)
            plt.imshow(frame)
            plt.title('Frame {}'.format(i), size=20)
            plt.axis('off')
            j += 1
        i += 1
    plt.show()

    Capture video from camera and extract frames with OpenCV-Python

    In this problem, you will learn how to capture video and extract frames using the library cv2 (opencv-python). This time we shall capture video (live stream) recorded with a camera (for example, the in-built webcam of a laptop).

    Follow these steps:

    First, import the required libraries.

    If you are using a Jupyter notebook, use %matplotlib notebook this time to get a zoomable and resizable plot, which is the best option for working interactively:

    # uncomment the next line only if you are running this code from a Jupyter notebook
    # %matplotlib notebook
    import cv2
    import matplotlib.pyplot as plt

    As explained in OpenCV documentation, to capture a video, we need to create a VideoCapture object. Its argument can be either the device index or the name of a video file.

    The device index is just a number that specifies which camera to use. Normally one camera is connected to the computer, so simply passing 0 as the parameter works (we can select the second camera by passing 1, and so on).

    You can check whether the VideoCapture object has been initialized properly with the isOpened() method (check whether it returns true). If it returns true, we can read the very first frame (and all subsequent frames) with the read() function, as shown in the following code block.

    The read() function is the most convenient method for capturing data from the device. It returns a pair: a Boolean flag and the frame that was just grabbed. If no frame has been grabbed (the camera has been disconnected, or there are no more frames in the video file), the flag is false and the returned image is empty.

    In the following code snippet, the Boolean variable is_capturing holds whether or not a frame could be grabbed as follows:

    vc = cv2.VideoCapture(0)
    plt.ion()
    if vc.isOpened(): # try to get the first frame
        is_capturing, frame = vc.read()
        webcam_preview = plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    else:
        is_capturing = False

    Once the first frame is read properly, we can capture frame-by-frame within a while loop, with the condition whether a frame can still be captured.

    The following code block shows how to capture the first ten frames.

    In the end, don’t forget to call the release() function on the VideoCapture object.

    Also note that OpenCV uses BGR color format, and to display the frame with real RGB color, we must use the transformation function cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) as follows:

    # capture 10 frames
    frame_index = 1
    while is_capturing:
        if frame_index > 10: break
        is_capturing, frame = vc.read()
        if not is_capturing: break # stop if no frame could be grabbed
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # convert BGR to RGB so that the colors look right
        webcam_preview.set_data(image)
        plt.title('Frame {0:d} '.format(frame_index))
        plt.draw()
        frame_index += 1
        try: # Avoids a NotImplementedError caused by 'plt.pause'
            plt.pause(2)
        except Exception:
            pass
    vc.release()

    If the camera device attached to your computer is working, you should see the image of yourself captured in the frames when you run the preceding code snippet.
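    The BGR-to-RGB conversion used in the capture loop amounts to reversing the order of the color channels. A minimal NumPy-only sketch of that reordering follows; the 1x2 frame is synthetic and stands in for a frame delivered by OpenCV.

```python
import numpy as np

# Hypothetical 1x2 frame in BGR order, as OpenCV would deliver it:
# the first pixel is pure blue, the second pixel is pure red
frame_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last axis to reorder the channels from BGR to RGB
frame_rgb = frame_bgr[..., ::-1]

print(frame_rgb[0, 0].tolist())  # [0, 0, 255] -> blue in RGB order
print(frame_rgb[0, 1].tolist())  # [255, 0, 0] -> red in RGB order
```

    Displaying frame_bgr directly with matplotlib would show the first pixel as red instead of blue, which is the color swap the conversion avoids.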

    The cv2.VideoCapture() function can also be used to read a video file from the disk, and cv2.VideoWriter() can be used to save a video file to the disk. Explore these functions on your own.

    Implement Instagram-like Gotham filter

    In this section, you will learn to implement a filter like the ones Instagram uses to enhance the images uploaded to the site. The following figure shows the input image that we want to enhance by implementing an Instagram-like filter:

    The Gotham filter

    The Gotham filter is computed by applying a sequence of operations to an image (the steps are taken from https://www.practicepython.org/blog/2016/12/20/instagram-filters-python.html). The corresponding Python code and the input and output images are shown along with each operation.

    Let us start by importing the required libraries. In this problem, we shall use the PIL library for image processing functions as follows:

    from PIL import Image
    import numpy as np
    import matplotlib.pylab as plt
    im = Image.open('images/Img_01_03.jpg') # assumed pixel values in [0,255]
    print(np.max(im))
    # 255

    The Gotham filter has the following steps to be implemented:

    First, a mid-tone red contrast boost needs to be applied to the input image. This is done with NumPy’s interp() function, which is used to implement channel interpolation. Let us first understand how NumPy interpolation works in the 1-D case. The following code snippet illustrates the concept.

    Interpolation with NumPy interp() function

    From the NumPy documentation, we get the following about the interp() function:

    numpy.interp(x, xp, fp, left=None, right=None, period=None)

    One-dimensional linear interpolation. Returns the one-dimensional piecewise linear interpolant to a function with given discrete data points, evaluated at x.
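    A minimal numeric sketch of interp() on three made-up data points may help before the cosine example; the values of xp and fp are arbitrary and chosen so the results are easy to check by hand.

```python
import numpy as np

# Hypothetical data points: f(1) = 3, f(2) = 2, f(3) = 0
xp = [1.0, 2.0, 3.0]
fp = [3.0, 2.0, 0.0]

inside = float(np.interp(2.5, xp, fp))  # halfway between f(2) = 2 and f(3) = 0
below = float(np.interp(0.0, xp, fp))   # left of the data: clamped to fp[0]
above = float(np.interp(4.0, xp, fp))   # right of the data: clamped to fp[-1]

print(inside, below, above)             # 1.0 3.0 0.0
```

    With the default left and right arguments, points outside the range of xp take the first or last value of fp rather than being extrapolated.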

    Let us say we want to (linearly) interpolate the values of the cosine function in the interval [0, 2π], starting with the actual values of the function provided to us at only ten reference points in the interval. We can use the interp() function to compute the value of the function at the remaining points by applying linear interpolation between the given points. The following code shows how to do it. The orange piecewise-linear curve is the one estimated by the interp() function, and the green curve is the original cosine curve. As can be seen, the interp() function computes decent estimates of the function values at the new points:

    # reference points
    # generate sequence of 10 points (numbers) evenly spaced in the interval [0, 2π]
    x_p = np.linspace(0, 2*np.pi, 10)
    # true values at reference points
    y_p = np.cos(x_p)
    # test points
    # generate sequence of 50 test points (numbers) evenly spaced in the interval [0, 2π]
    x = np.linspace(0, 2*np.pi, 50)
    # true values at all points
    y = np.cos(x)
    # interpolated values at all test points
    y_interp = np.interp(x, x_p, y_p)
    # now plot
    plt.figure(figsize=(20,10))
    plt.plot(x_p, y_p, 'o', label='reference points')
    plt.plot(x, y_interp, '-x', label='interpolated')
    plt.plot(x, y, '--', label='true')
    plt.legend(prop={'size': 16})
    plt.show()

    Consider the following graph:

    The preceding concept can be used in a similar way to compute channel interpolation values for the R (red) channel using the interp() function as required to be done in the first step of the implementation. The red channel values of an image are essentially a 2D array (matrix), so before you can apply the function on the channel, you need to do the following:

    First, flatten the 2D array into a 1D array (using NumPy’s ravel() function)

    Then, apply the channel interpolation with the interp() function, and

    Finally, reshape the 1D array back to the image matrix
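    The three steps above can be sketched as a small helper function. The name channel_adjust, the 2x2 channel, and the control values in curve are all hypothetical and are not taken from the book; the sketch also assumes the channel has been scaled to [0, 1] and that the control values are spread evenly over that range.

```python
import numpy as np

def channel_adjust(channel, values):
    """Remap a 2D channel with values in [0, 1] through a piecewise-linear curve."""
    orig_shape = channel.shape
    flat = channel.ravel()                    # step 1: flatten 2D -> 1D
    xp = np.linspace(0, 1, len(values))       # evenly spaced input levels
    adjusted = np.interp(flat, xp, values)    # step 2: interpolate each pixel
    return adjusted.reshape(orig_shape)       # step 3: back to the image matrix

# Hypothetical 2x2 red channel scaled to [0, 1]
red = np.array([[0.0, 0.25],
                [0.5, 1.0]])

# Hypothetical control values: outputs for the inputs 0, 0.5 and 1
curve = [0.0, 0.4, 1.0]

red_adjusted = channel_adjust(red, curve)
print(red_adjusted)
```

    In this sketch an input of 0.25 lies halfway between the first two control points, so it is mapped to 0.2, while the endpoints 0 and 1 are left unchanged.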
