Report_mini_project_2
The dataset consists of two sets of images: Set A contains sharp images, while Set B comprises the
same images with varying levels of blur introduced through Gaussian filters of different kernel sizes and
sigma values. Set A serves as the ground truth for evaluation, while Set B is used as input for deblurring.
Initially, images from Set A are downscaled to a resolution of (256, 448) for uniform processing. This was done in Python, and the resized images were saved locally. Set B is generated by applying Gaussian filters with kernel sizes of 3x3, 7x7, and 11x11, and corresponding sigma values of 0.3, 1, and 1.6, respectively. This diverse dataset provides a challenging training environment.
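A minimal sketch of how Set B could be generated (the report does not name the imaging library; OpenCV is assumed here, and the directory paths are hypothetical):

import os
import cv2

# Kernel-size / sigma pairs from the report.
BLUR_CONFIGS = [(3, 0.3), (7, 1.0), (11, 1.6)]

def generate_set_b(sharp_dir, out_dirs):
    '''Apply each Gaussian filter to every downscaled sharp image.'''
    for name in os.listdir(sharp_dir):
        img = cv2.imread(os.path.join(sharp_dir, name))
        for (ksize, sigma), out_dir in zip(BLUR_CONFIGS, out_dirs):
            blurred = cv2.GaussianBlur(img, (ksize, ksize), sigma)
            cv2.imwrite(os.path.join(out_dir, name), blurred)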
The objective is to design a neural network capable of deblurring images from Set B to resemble their
sharp counterparts in Set A. However, model complexity is restricted to 15 million parameters to ensure
efficiency.
2 Assumption
Resized images have a height of 256 and a width of 448 (rather than height 448 and width 256) to preserve the aspect ratio.
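If OpenCV is used for the resizing (an assumption; the report only says Python), note that cv2.resize expects the target size as (width, height):

import cv2

img = cv2.imread('frame.png')          # hypothetical input frame
resized = cv2.resize(img, (448, 256))  # (width, height) -> 256 rows x 448 columns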
3 Pre-Processing
3.1 Data Compression
Initially, the models could not be run on a personal laptop due to a slow GPU and limited RAM, so it was necessary to shift to Google Colab. The full dataset was 32 GB and could not be processed on Google Colab because of limited memory. Even compressed, the dataset was about 4.5 GB each for the sharp images and for each of the three sets of filtered images, so uploading all of them would require about 18 GB in total, which is not feasible.
To solve this, I explored the sub-folders. There were 240 sub-folders, each containing 100 image frames captured with a very small time gap, so consecutive frames are nearly identical. Of these 100 frames, the 1st, 40th, and 80th were chosen; thus only 3 out of every 100 images were actually used for training, as sketched below.
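A sketch of this sub-sampling step (the directory layout and 0-indexed frame ordering are assumptions):

import os
import shutil

KEEP = (0, 39, 79)  # the 1st, 40th and 80th frames, 0-indexed

def subsample(dataset_dir, out_dir):
    '''Copy only 3 of the 100 frames from each of the 240 sub-folders.'''
    for sub in sorted(os.listdir(dataset_dir)):
        frames = sorted(os.listdir(os.path.join(dataset_dir, sub)))
        os.makedirs(os.path.join(out_dir, sub), exist_ok=True)
        for i in KEEP:
            shutil.copy(os.path.join(dataset_dir, sub, frames[i]),
                        os.path.join(out_dir, sub, frames[i]))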
4 Models Tried
In this section, we discuss the various models explored for the image deblurring task. We experimented
with different architectures to find the most suitable one for our problem.
4.2 U-Net
Next, we experimented with the U-Net architecture, which is widely used for image segmentation tasks.
U-Net consists of an encoder-decoder structure with skip connections between corresponding encoder
and decoder layers. These skip connections allow the network to preserve spatial information during
upsampling. Despite its success in segmentation tasks, we found that U-Net struggled to deblur images effectively.
U-Net seemed to be a good approach, but it was not very successful. Investigating the structure of U-Net, I found that it contains several up-sampling layers, some close to the output layer. This could have been a cause of the blurred output: up-sampling an image introduces blur, and it is difficult to deblur an image with a single convolutional layer.
Thus, to tackle this problem, a few additional convolutional layers were added before the output. However, as the input had gone through multiple convolution, pooling, and up-sampling operations, it was no longer a simple representation of the input image but rather a representation of its features.
This meant the initial image had to be introduced once again so that the information in it is used directly; this way, the output is guaranteed to be at least as good as the input. This was done through concatenation: the input image and the output of the preceding convolutional block were concatenated just before the final output layers. Finally, this architecture worked!
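The key modification can be sketched in Keras as follows (layer widths follow Table 1; the activations and exact wiring are assumptions, so this is an illustrative reconstruction rather than the exact training code):

from tensorflow.keras import layers

def deblur_head(inp, features):
    # features: the 64-channel output of the last decoder Conv2D block
    # inp: the original (256, 448, 3) input image, re-injected here
    x = layers.Concatenate()([features, inp])        # 64 + 3 = 67 channels
    x = layers.Conv2D(128, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(128, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(3, 1, padding='same')(x)    # 1x1 conv produces the RGB output

Because the input image is carried straight into this head, the identity mapping (output = input) is easy for the network to represent, so the prediction should be no worse than the blurred input.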
[Architecture diagram: Input → Conv2D + MaxPool encoder blocks → Conv2DTranspose + Concatenate decoder blocks → several Conv2D layers → Concatenate with the input image → Conv2D layers just before the Output]
Table 1: Model summary

Layer (type)      Output Shape           Param #    Connected to
InputLayer        (None, 256, 448, 3)    0          []
Conv2D            (None, 256, 448, 64)   1792       ['input_1[0][0]']
Conv2D            (None, 256, 448, 64)   36928      ['conv2d[0][0]']
MaxPooling2D      (None, 128, 224, 64)   0          ['conv2d_1[0][0]']
Conv2D            (None, 128, 224, 128)  73856      ['max_pooling2d[0][0]']
Conv2D            (None, 128, 224, 128)  147584     ['conv2d_2[0][0]']
MaxPooling2D      (None, 64, 112, 128)   0          ['conv2d_3[0][0]']
Conv2D            (None, 64, 112, 256)   295168     ['max_pooling2d_1[0][0]']
Conv2D            (None, 64, 112, 256)   590080     ['conv2d_4[0][0]']
MaxPooling2D      (None, 32, 56, 256)    0          ['conv2d_5[0][0]']
Conv2D            (None, 32, 56, 512)    1180160    ['max_pooling2d_2[0][0]']
Conv2D            (None, 32, 56, 512)    2359808    ['conv2d_6[0][0]']
Conv2D            (None, 32, 56, 512)    2359808    ['conv2d_7[0][0]']
Conv2DTranspose   (None, 64, 112, 256)   524544     ['conv2d_8[0][0]']
Concatenate       (None, 64, 112, 512)   0          ['conv2d_transpose[0][0]', 'conv2d_5[0][0]']
Conv2D            (None, 64, 112, 256)   1179904    ['concatenate[0][0]']
Conv2D            (None, 64, 112, 256)   590080     ['conv2d_9[0][0]']
Conv2DTranspose   (None, 128, 224, 128)  131200     ['conv2d_10[0][0]']
Concatenate       (None, 128, 224, 256)  0          ['conv2d_transpose_1[0][0]', 'conv2d_3[0][0]']
Conv2D            (None, 128, 224, 128)  295040     ['concatenate_1[0][0]']
Conv2D            (None, 128, 224, 128)  147584     ['conv2d_11[0][0]']
Conv2D            (None, 128, 224, 128)  147584     ['conv2d_12[0][0]']
Conv2DTranspose   (None, 256, 448, 64)   32832      ['conv2d_13[0][0]']
Concatenate       (None, 256, 448, 128)  0          ['conv2d_transpose_2[0][0]', 'conv2d_1[0][0]']
Conv2D            (None, 256, 448, 64)   73792      ['concatenate_2[0][0]']
Conv2D            (None, 256, 448, 64)   36928      ['conv2d_14[0][0]']
Conv2D            (None, 256, 448, 64)   36928      ['conv2d_15[0][0]']
Conv2D            (None, 256, 448, 64)   36928      ['conv2d_16[0][0]']
Conv2D            (None, 256, 448, 64)   36928      ['conv2d_17[0][0]']
Conv2D            (None, 256, 448, 64)   36928      ['conv2d_18[0][0]']
Conv2D            (None, 256, 448, 64)   36928      ['conv2d_19[0][0]']
Concatenate       (None, 256, 448, 67)   0          ['conv2d_20[0][0]', 'input_1[0][0]']
Conv2D            (None, 256, 448, 128)  77312      ['concatenate_3[0][0]']
Conv2D            (None, 256, 448, 128)  147584     ['conv2d_21[0][0]']
Conv2D            (None, 256, 448, 3)    387        ['conv2d_22[0][0]']

Total params: 10,577,667 (40.35 MB)

At about 10.58 million parameters, the model is comfortably within the 15-million-parameter budget.
6 Training Details
Given a batch size of 10, the data loader returns 10 random blurred images and their corresponding sharp images. Since three blur kernels were provided, the kernel for each sample is chosen uniformly at random, and the sub-folder is likewise selected uniformly at random from the 240 available. From the provided code, the following training details are used:
• Batch Size: 10
• Epochs: 40
• Loss Function: Mean Squared Error (MSE)
• Optimizer: Adam
• Steps per Epoch: 64 (Number of gradient steps taken per epoch)
Callbacks:
• ModelCheckpoint: Used to save the best model during training.
Training Procedure:
from tensorflow.keras.callbacks import ModelCheckpoint

batch_size = 10
epochs = 40

# Monitor the training loss, since no validation set is used.
checkpoint = ModelCheckpoint('/content', monitor='loss', verbose=1,
                             save_best_only=True)

model = create_unet_model2()
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()

# The dataloader is a generator that already yields whole batches,
# so no batch_size argument is passed to fit().
model.fit(dataloader(blurred_image_dir_1, blurred_image_dir_2, blurred_image_dir_3,
                     downscaled_image_dir, batch_size),
          epochs=epochs,
          steps_per_epoch=64,
          callbacks=[checkpoint])
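The dataloader itself is not reproduced in the report; a minimal sketch consistent with the description above (the directory layout and normalization are assumptions) could look like:

import os
import random
import numpy as np
import cv2

def dataloader(dir_k1, dir_k2, dir_k3, sharp_dir, batch_size):
    '''Yield (blurred, sharp) batches indefinitely.

    Each sample draws a sub-folder and a blur kernel uniformly at random.
    '''
    blurred_dirs = [dir_k1, dir_k2, dir_k3]
    subfolders = sorted(os.listdir(sharp_dir))      # the 240 sub-folders
    while True:
        xs, ys = [], []
        for _ in range(batch_size):
            sub = random.choice(subfolders)
            frame = random.choice(os.listdir(os.path.join(sharp_dir, sub)))
            blur_dir = random.choice(blurred_dirs)  # uniform over the 3 kernels
            xs.append(cv2.imread(os.path.join(blur_dir, sub, frame)))
            ys.append(cv2.imread(os.path.join(sharp_dir, sub, frame)))
        yield (np.asarray(xs, np.float32) / 255.0,
               np.asarray(ys, np.float32) / 255.0)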
7 Training Curves
8 Results
8.1 Qualitative Results
8.1.1 Some Examples
8.2 Quantitative Results
8.2.1 Kernel 1
Although the average PSNR for this kernel is higher than for kernels 2 and 3, it is lower than the PSNR of the blurred input itself. This is because the kernel size and sigma of this Gaussian are very small, so the blurred image is already close to the ground truth: PSNR is extremely sensitive when comparing nearly identical images and is therefore unreliable in this range.
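For reference, the standard PSNR definition is assumed here (the report does not state the exact variant used):

\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right),
\qquad
\mathrm{MSE} = \frac{1}{HWC}\sum_{i}\left(I_i - \hat{I}_i\right)^{2}

As the MSE approaches zero for nearly identical images, the PSNR grows without bound, which is why small differences between a lightly blurred input and the ground truth produce large, unstable PSNR values.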
8.2.2 Kernel 2
• Average PSNR of Predicted Images: 35.88
The results are very good in this category: there is a significant improvement in the PSNR value.
8.2.3 Kernel 3
The results are decent in this category: there is a good improvement in the PSNR value.
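A sketch of how these average PSNR values over the test set could be computed (assuming images scaled to [0, 1]; the function and variable names are illustrative):

import tensorflow as tf

def average_psnr(model, blurred, sharp):
    '''Mean PSNR of predictions vs. sharp ground truth.

    blurred, sharp: float32 arrays of shape (N, 256, 448, 3) in [0, 1].
    '''
    pred = model.predict(blurred)
    pred = tf.clip_by_value(pred, 0.0, 1.0)  # keep predictions in the valid range
    return float(tf.reduce_mean(tf.image.psnr(pred, sharp, max_val=1.0)))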
Figure 7: Prediction and Blurred Image PSNR for all Test Data