Simulating Weather Conditions On Digital Images
Faculty of Informatics
Supervisor: Andras Hajdu, Professor, PhD
Candidate: Ghais Zaher, Computer Science MSc
Debrecen
2020
Dedication
Acknowledgements
The code related to this research for both the Foggy-CycleGAN and Image Number Annotator projects was implemented by me, Ghais Zaher.
I would like to pay my special thanks to Mohammad Pouldoust for his assistance and consultation regarding Deep Learning issues.
Abstract
Table of Contents
List of Figures
1 Introduction
1.1 Motivation
1.2 Related work
1.2.1 Generative Adversarial Networks (GANs)
1.2.2 Unpaired Image-to-Image Translation
1.2.3 Fog Synthesis Using Mathematical Models
1.2.4 Fog Synthesis Using Generative Models
1.2.5 State-of-the-Art Summary
1.3 Outline
2 Methodology
2.1 Introduction
2.2 Dataset Preparation
2.2.1 Dataset collection
2.2.2 Image Number Annotator
2.2.3 Train-Test Split
2.3 Foggy-CycleGAN Model Formulation
2.3.1 Model Structure
2.3.2 Losses
2.3.3 Implementation
2.4 Training
2.4.1 Environment Setup
2.4.2 Training Details and Results
2.5 Tests and Results
3 Discussion
3.1 Contribution
3.2 Limitations
3.2.1 Resolution and Artifacts
3.2.2 Different Kind of Images
3.3 Future Improvements
3.4 Conclusion
Bibliography
Chapter 1
Introduction
1.1 Motivation
Numerous weather conditions can impact the driver’s vision, making it difficult to see some objects on the road, or at least to recognize them. Fog is one important phenomenon that can disturb vision while driving [2], scattering the light and making distant objects fuzzier. Autonomous Driving systems face the same problem: experiments have shown that object detection recall decreases with higher fog density [3]. Several studies have addressed object detection in foggy conditions, either by defogging the image first [4–6] or by applying different object detection methods to the hazy image [7].
Several approaches have tried to simulate fog’s behaviour and augment it onto clear images in order to improve object detection and recognition systems in such weather conditions [8–10], to study the impact of fog on these systems [11, 12], or for other purposes [13, 14], as will be shown in the next section. Most of them focus on the mathematical model that describes, in physical terms, how fog is generated, and project it onto a fog-free photo. Such a process helps build a new dataset of annotated foggy images out of clear ones, which can be used as a training set for an object detection system to increase its accuracy using artificial hazy images. Such a model will later be used on real photo frames taken in the presence of fog, which makes it important for the synthesized images to look as much like real foggy ones as possible, as this implies better results when real foggy images are fed to the trained model.
In this work, we present an approach to simulate fog on clear images using a generative model based on GANs [15] and Convolutional Neural Networks. Our main motive is to improve the performance of existing Autonomous Driving systems in bad weather conditions, with a focus on fog. Hence, our study concentrates on images that are mainly taken from a moving vehicle, or that contain similar information serving our purpose, in addition to providing more realistic foggy image simulation.
1.2 Related work
1.2.1 Generative Adversarial Networks (GANs)
A Generative Adversarial Network (GAN) [15] consists of two networks: a generator G, responsible for data generation, and a discriminator D. The discriminator D tries to discriminate between real and fake samples. G is
trained to fool D by improving the generated samples to make them look more like real ones,
while D is thus getting better at distinguishing the fake samples generated by G from the real
ones.
[Figure: GAN structure. A random input z ∈ Z is fed to the Generator G to produce a generated sample; the Discriminator D receives either a real dataset sample or a generated one and outputs Real/Fake.]
min_G max_D V(D, G) = Ex[log D(x)] + Ez[log(1 − D(G(z)))]
Where:
• Ex is the expected value for real samples
• D(x) is the discriminator’s output of the probability that the real instance x is real
• D(G(z)) is the discriminator’s output of the probability that the generated instance G(z)
is real
Many GAN applications have focused on image generation using deep convolutional neural networks, namely the Deep Convolutional GAN or DCGAN [16, 17]. In such applications, the dataset will
consist of images that belong to one specific category (e.g. faces, landscapes, digits, etc.). The
Generator will learn to generate new images belonging to this category given a random input.
Many improvements have been made to the traditional GAN network. For example, Style-
GAN [18, 19] maps the input to an intermediate space, before being fed to the generator at each
convolutional layer.
Conditional GAN (CGAN) [20] is another variation of GAN where G and D are provided
with a condition y, and the objective function is modified to the following:
min_G max_D V(D, G) = Ex[log D(x|y)] + Ez[log(1 − D(G(z|y)))]
We can see from Figure 1.2 that a condition y is provided for every item in the dataset. The discriminator decides whether a sample is real or fake based on both the sample itself and the condition. Pix2pix [21] is a good demonstration of CGAN: it generates images based on provided information, for example generating a handbag image from an image of its edges, or generating a night photo from a photo taken during daytime.
[Figure 1.2: Conditional GAN structure. The condition y is provided, together with the (real or generated) sample, to the discriminator D, and to the generator G along with the random input z ∈ Z.]
GANPaint [22] is another application of CGAN. It provides a tool, available online, that allows the user to provide new labels for a part of a given image; the network then generates a new image, modifying the newly labeled parts in a convenient way.
Video generation is another interesting application of GAN, such as vid2vid [23], which provides a model that generates video from a given sequence of video frames, for example a sequence of semantic segmentation masks. DVD-GAN (Dual Video Discriminator GAN) [24], built upon BigGAN [25], introduced a model to produce videos on large-scale datasets. Face2Face [26, 27] transfers the facial expressions from one video to another containing a different person in real time.
While the main focus of GAN and its variations has been on image and video generation, other applications have been introduced, such as 3D model generation [28] and audio synthesis [29, 30].
1.2.2 Unpaired Image-to-Image Translation
A paired dataset of clear and foggy images of the same scenes is very hard to obtain: a clear photo might contain different objects than the foggy one, and thus we would be training the generator not just to add fog to the images, but also to change their content.
CycleGAN [1] introduced an unpaired image-to-image translation method, in which the dataset consists of two sets of images (X, Y), but no relation is defined between an image from the first set X and any of the images from the second set Y. CycleGAN tries to imagine what an image from one collection would look like if it were translated to the other. For example, to know what a photo would look like if it were painted by Monet, we need to train a CycleGAN by providing a set of real photos X and an unrelated set of Monet paintings Y. The model learns the process and its reverse, i.e. it also learns to convert a Monet painting to a real photo.
CycleGAN’s structure is illustrated in Figure 1.3; the training process consists of two stages.
Figure 1.3: CycleGAN Structure. The model contains two generators G : X → Y and F :
Y → X, and two discriminators DX and DY . DY will encourage G to generate outputs that
look more like real images from Y , and DX will encourage F to generate outputs that look more
like real images from X.
In the first stage, shown in Figure 1.3a, an image x from domain X is translated to an image from domain Y by the generator G, and the result ŷ is converted back to an image x̂ from domain X by the generator F. Both discriminators are trained in this stage: the fake image ŷ is passed to DY, and the real image x that we started with is passed to DX. The final result x̂ is supposed to look exactly like x, thus a new loss, called the “Cycle-Consistency Loss”, is introduced in addition to GAN’s adversarial loss previously introduced in subsection 1.2.1:
Ex[∥F(G(x)) − x∥1]
The second stage, shown in Figure 1.3b, is the exact reverse of the first one: starting with an image y from domain Y, converting it to domain X and back to domain Y, resulting in a “Backward Cycle-Consistency Loss”:
Ey[∥G(F(y)) − y∥1]
The original paper shows several applications of CycleGAN with intriguing results, for example converting horses to zebras, apples to oranges, and summer to winter and vice versa.
1.2.3 Fog Synthesis Using Mathematical Models
Most mathematical approaches to fog synthesis rely on the atmospheric scattering model:
I(x, y) = J(x, y) t(x, y) + A (1 − t(x, y)) (1.2)
The previous equation is defined on the three RGB channels of a 2D image. (x, y) is an image coordinate. I is the foggy image. J is the fog-free image. A is the atmospheric light; in the day-time it is mostly considered equal to 255 for each of the RGB colors [8, 13], making the day light equal to (255, 255, 255), i.e. white. In other cases, this value is estimated from the given clear image. t(x, y) is the transmission map; it expresses the amount of light that survived without being scattered at the coordinate (x, y) of the image. It is related to the depth map by Equation 1.3, where d(x, y) is the distance of the object at coordinate (x, y) and β is the attenuation coefficient that controls fog thickness:
t(x, y) = e^(−β d(x, y)) (1.3)
β is calculated using Equation 1.4. C = 3.912 in [10] and C = ln(20) in [8, 9, 11, 12]. Rm is
the maximum visibility distance.
β = C / Rm (1.4)
In order to simulate the foggy image I from the clear one J, we need to estimate the airlight
A and the transmission map t, which can be derived from the depth map d. Following this
methodology, [14] estimates a distance-altitude map from a given image in several steps using an interactive graph-cut algorithm, allowing the user to mark certain parts as “objects”. The method also allows changing the color of the produced fog, as shown in Figure 1.4.
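As an illustration of how these equations are typically applied, the following is a minimal sketch in Python/NumPy, assuming a per-pixel depth map is already available; the function and variable names are illustrative and not taken from any of the cited implementations.
import numpy as np

def synthesize_fog(clear_rgb, depth_m, visibility_m, C=np.log(20), A=255.0):
    # Equation 1.4: attenuation coefficient from the maximum visibility distance Rm
    beta = C / visibility_m
    # Equation 1.3: transmission map derived from the depth map d(x, y)
    t = np.exp(-beta * depth_m)[..., np.newaxis]
    # Equation 1.2: blend the clear image J with the (white) atmospheric light A
    foggy = clear_rgb.astype(np.float32) * t + A * (1.0 - t)
    return np.clip(foggy, 0, 255).astype(np.uint8)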
The same approach is followed in [13], with an additional pseudo-random factor generated as noise, following the idea that fog takes irregular shapes due to wind and air turbulence. In [11], the FROSI dataset is built, which contains a set of artificial fog-free images; for each image, 6 foggy images with different visibility distances are generated. That research aimed to study the impact of fog on traffic sign detection, similar to [12].
The Cityscapes dataset [31] is used in [8] to synthesize fog and generate a new dataset, “Foggy Cityscapes”; this data was used to train a CNN model for driving assistance. Based on the same model, [9] proposes methods for fog detection, removal and synthesis.
1.2.4 Fog Synthesis Using Generative Models
Cycle-Defog2Refog [34] also uses CycleGAN as its main structure. In the proposed model, the generator Defog-net, responsible for removing fog from a foggy image, is a straightforward CNN that accepts an image as input and produces a clear image, while the Refog-net generator that adds fog to the clear image works differently: it relies on the mathematical model (Equation 1.2). Refog-net generates the transmission map t from the clear image using a CNN. The
atmospheric light A is estimated using a sky prior; the sky region is segmented in the foggy im-
age, then A is equal to the average pixel values of this region, as shown in Equation 1.5. After
that, the atmospheric degradation model (Equation 1.2) is applied to generate the foggy image.
Asky = mean_{c ∈ {r,g,b}} I^c_sky(x) (1.5)
After fog removal using Defog-net or fog addition using Refog-net, the image is processed by E-D-Net or E-R-Net respectively; these CNNs are used to enhance the generated image, removing artifacts and improving its quality.
1.3 Outline
In this thesis, we present the methodology we followed in order to generate fog on clear images. First we introduce the datasets used in our work, then we describe our network model, including its structure and objective function. After that, we show the training process of our model. Next, we show and evaluate the results of our method. We end the thesis with a discussion of the contribution we made to fog generation research, the limitations of our proposed method, the future work that can be done to improve it, and a final conclusion.
Chapter 2
Methodology
2.1 Introduction
We have seen several methods used to synthesize fog on clear images. In our methodology, we build a generative model, namely “Foggy-CycleGAN”, with a structure similar to CycleGAN [1], to perform this task. The model aims to add a specific amount (rate) of fog to a clear image. Since the CycleGAN structure is followed, our model, which consists of two pairs of generators and discriminators, achieves both tasks: simulating fog on clear images and converting foggy images to clear ones.
In order to train our model, we need a dataset of both clear and foggy images. Furthermore,
the model’s ability to add a specific amount of fog to a clear image requires our foggy images
to have this information. This value will be referred to as the fog ‘Intensity’ in this document.
In other research [8] such information is obtained from the Visibility Distance, but for the sake
of simplicity, we only provide an estimated percentage of the fog in the image, represented by a
number between 0 and 1.
In this chapter, we will cover the dataset preparation process, introduce our Foggy-CycleGAN
model, explain its training process and see its results on test images.
2.2 Dataset Preparation
2.2.1 Dataset collection
Clear and foggy images were collected from several publicly available sources, namely the Cityscapes [31], SFSU [8] and RESIDE [35] datasets. The RESIDE dataset [35] contains thousands of outdoor images, the majority of them taken in foggy conditions; 1753 of these images were selected to be part of the dataset. Figure 2.1 shows some samples of the chosen images in each dataset.
[Figure 2.1: sample images from (a) Cityscapes, (b) SFSU and (c) RESIDE.]
[Figure: annotated sample images with (a) Intensity = 0, (b) Intensity = 0.4, (c) Intensity = 0.6 and (d) Intensity = 0.9.]
2.2.2 Image Number Annotator
To annotate the collected images with an estimated fog intensity, a tool called Image Number Annotator was implemented. When a folder contains nothing but clear images, the feature ‘Set All To’ can be used to set all the intensity values to zero.
Finally, the tool produces a Comma-Separated Values file that is saved, under the name Annotations.csv, in the same folder that contains the images. This file contains two columns: ‘Path’, which holds the image file name, and ‘Intensity’, which contains the estimated intensity. When the process of annotating a folder is done, the button ‘Delete All Empty’ is used to delete all the unannotated images.
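For illustration only (the file names below are hypothetical), an Annotations.csv produced by the tool has the following shape:
Path,Intensity
clear_street_001.jpg,0
foggy_road_017.jpg,0.4
foggy_bridge_102.jpg,0.9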
2.2.3 Train-Test Split
The clear and foggy images were each shuffled and split into train and test images, resulting in 2677 images for training (1253 clear and 1424 foggy) and 669 images for testing (313 clear and 356 foggy).
2.3 Foggy-CycleGAN Model Formulation
2.3.1 Model Structure
Following the CycleGAN structure, our model consists of two pairs of generators and discriminators:
• clear2fog (Clear to Fog Generator): This generator is provided with a clear image and an intensity value in the range [0, 1]. It is responsible for adding the given intensity amount of fog to the clear image. The output of clear2fog is a fake foggy image.
clear2fog : Clear → Fog
• fog2clear (Fog to Clear Generator): This generator is provided with a foggy image and its fog intensity. Its responsibility is to estimate what is behind the fog and replace it with that content, i.e. to remove the fog from the image. The output of fog2clear is a fake clear image.
fog2clear : Fog → Clear
• Dclear (Clear Discriminator) and Dfog (Fog Discriminator) distinguish between real and fake images. The discriminators expect an image as input: a clear image for Dclear and a foggy image for Dfog. The output of a discriminator is a 30×30 array of real numbers; since this output is only important during the training process, these values were not reduced to one Boolean output but left as is. When a real clear image is given, Dclear is expected to output a 30×30 array of ones; otherwise the output should be all zeros. The same applies to Dfog.
[Figure: Foggy-CycleGAN overview. clear2fog maps the Clear domain to the Fog domain and fog2clear maps Fog back to Clear, while the discriminators Dclear and Dfog judge images in their respective domains.]
Generators Structure
Both clear2fog and fog2clear share the same final structure. Nevertheless, other implementations of the generator are provided in the final code using ModelsBuilder, as will be seen in section 2.3.3. The generators use a modified U-Net [36] structure that has two inputs: a 256×256 colored image with three channels (R, G, B) and a real number that represents the Intensity.
The detailed structure can be seen in Figure 2.5 and Figure 2.6. First, the Intensity value is repeated 256×256 times and added as a fourth channel to the input image, resulting in a 4-channel image (R, G, B, I), where I represents the intensity. The remaining part of the network is similar to the U-Net structure: 8 down-sampling blocks that produce a 1×1 array with 512 filters, then 7 up-sampling blocks that produce a 128×128 array with 128 filters. Finally, a Deconvolutional layer is used to up-sample the previous result to a 256×256×3 image. Similar to U-Net, skip connections are used between the down-sampling outputs and the equivalent up-sampling outputs, which are concatenated as shown in Figure 2.6.
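The intensity-broadcasting step can be sketched in TensorFlow as follows; this is a simplified sketch, and the actual ModelsBuilder code may differ.
import tensorflow as tf

def concat_intensity(image, intensity):
    # image: (batch, 256, 256, 3); intensity: (batch,) values in [0, 1]
    intensity = tf.reshape(intensity, (-1, 1, 1, 1))
    # Broadcast the scalar over the spatial dimensions: (batch, 256, 256, 1)
    intensity_map = intensity * tf.ones_like(image[..., :1])
    # Concatenate as a fourth channel: (batch, 256, 256, 4)
    return tf.concat([image, intensity_map], axis=-1)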
Figure 2.5: Generators Structure: The input intensity is repeated 256 × 256 times and concate-
nated to the input image as a fourth channel. The Modified U-Net Network structure is shown
in Figure 2.6.
Each down-sampling block consists of the following layers:
1. A Convolutional layer with a 4×4 kernel size, a stride size of 2×2, same padding and a random normal kernel initializer ∼ N(0, 0.02). This results in an output that has half the size of the input. The numbers of output filters are, in order, 64, 128 and 256 for the first three down-sampling blocks, then 512 filters for all the rest.
2. Instance Normalization [37] is applied to the filters, except for the first down-sampling block, as suggested and used by the Tensorflow implementation of CycleGAN [38].
Figure 2.6: Modified U-Net Network Structure: 8 down-sampling blocks followed by 7 up-
sampling blocks, skip connections are used between down-sampling and up-sampling block
results. Finally, a Deconvolutional layer outputs the final image.
3. Leaky ReLU (Rectified Linear Units) activation with a negative slope of 0.3.
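A down-sampling block of this shape could be sketched as follows; the InstanceNormalization layer from TensorFlow Addons is used here as a stand-in, and the sketch is illustrative rather than the exact ModelsBuilder code.
import tensorflow as tf
import tensorflow_addons as tfa

def downsample(filters, kernel_size=4, apply_norm=True):
    # Conv (stride 2) -> optional Instance Normalization -> Leaky ReLU (slope 0.3)
    initializer = tf.random_normal_initializer(0.0, 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2D(filters, kernel_size, strides=2, padding='same',
                                     kernel_initializer=initializer))
    if apply_norm:  # skipped for the first down-sampling block
        block.add(tfa.layers.InstanceNormalization())
    block.add(tf.keras.layers.LeakyReLU(0.3))
    return block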
Each up-sampling block consists of the following layers:
1. A Deconvolutional layer with a 4×4 kernel size, a stride size of 2×2, same padding and a random normal kernel initializer ∼ N(0, 0.02). This results in an output that has double the size of the input. For each of the first 4 up-sampling blocks, 512 output filters are used. For the last 3, 256, 128 and then 64 filters are used (the same filter counts as in the down-sampling blocks but in reverse order).
2. Instance Normalization is applied to the filters.
3. Dropout is applied to the first three up-sampling blocks during the training process.
4. ReLU activation.
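An up-sampling block could be sketched in the same style (reusing the imports above; the 0.5 dropout rate is an assumption):
def upsample(filters, kernel_size=4, apply_dropout=False):
    # Transposed Conv (stride 2) -> Instance Normalization -> optional Dropout -> ReLU
    initializer = tf.random_normal_initializer(0.0, 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2DTranspose(filters, kernel_size, strides=2,
                                              padding='same',
                                              kernel_initializer=initializer))
    block.add(tfa.layers.InstanceNormalization())
    if apply_dropout:  # used for the first three up-sampling blocks during training
        block.add(tf.keras.layers.Dropout(0.5))
    block.add(tf.keras.layers.ReLU())
    return block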
Every up-sampling block output is concatenated with the output of the corresponding down-sampling block (the one that has the same size and number of filters). The concatenation is done on the last axis (filters), as illustrated in Figure 2.6. After the last skip connection, between the outputs of the first down-sampling and the last up-sampling blocks, the result is passed to a Deconvolutional layer with a 4×4 kernel size, a stride size of 2×2, same padding, a random normal kernel initializer ∼ N(0, 0.02), 3 filters and hyperbolic tangent activation (tanh). This last layer produces a 256×256 image with 3 channels whose color values range between −1 and 1 (because of the use of tanh).
Resize-Convolution Blocks
Another version of clear2fog (namely clear2fog-v2) is implemented, where the modified U-Net network uses a different structure for the last 4 up-sampling blocks, which we will refer to as Resize-Convolution Blocks. As suggested by [39], instead of using Transposed Convolution for up-sampling, Resize-Convolution resizes the input filters to twice their original size using Bilinear Interpolation; then a standard Convolutional layer is applied with same padding and 1×1 strides. A comparison between clear2fog and clear2fog-v2 and their results will be shown later in section 2.5.
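A Resize-Convolution block along these lines could be sketched as follows (reusing the imports above; keeping the normalization and activation after the convolution is an assumption made for consistency with the other up-sampling blocks):
def resize_conv(filters, kernel_size=4):
    # Bilinear 2x resize followed by a stride-1 convolution instead of a transposed convolution
    initializer = tf.random_normal_initializer(0.0, 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.UpSampling2D(size=2, interpolation='bilinear'))
    block.add(tf.keras.layers.Conv2D(filters, kernel_size, strides=1, padding='same',
                                     kernel_initializer=initializer))
    block.add(tfa.layers.InstanceNormalization())
    block.add(tf.keras.layers.ReLU())
    return block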
Discriminators Structure
Discriminators Dclear and Dfog share the same final structure as well; ModelsBuilder also allows some tweaks to them. The detailed structure of a discriminator, shown in Figure 2.7, consists of the following components:
1. The network’s input is the image without its intensity value; three down-sampling blocks then produce an output of 256 filters with a size of 32×32. These down-sampling blocks are similar to the ones explained in Generators Structure.
2. Zero Padding.
3. A Convolutional layer with a 4×4 kernel size, a stride size of 1×1, no padding, no bias, a random normal kernel initializer ∼ N(0, 0.02) and 512 filters.
4. Instance Normalization.
5. Leaky ReLU activation.
Figure 2.7: Discriminator Structure: The input image is passed to three down-sampling blocks, followed by zero padding and Convolutional layers, resulting in a 30×30 output.
6. Zero Padding
7. A Convolutional layer with a 4×4 kernel size, a stride size of 1×1, no padding, a random normal kernel initializer ∼ N(0, 0.02) and one filter. The final output is then a single 30×30 array.
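Putting the pieces together, a discriminator of this shape could be sketched as below, reusing the downsample helper above. This is the image-only variant; Dfog additionally receives the intensity value, which can be concatenated to the image as in the generators.
def build_discriminator_sketch():
    # PatchGAN-style discriminator producing a 30x30 map of real/fake scores
    initializer = tf.random_normal_initializer(0.0, 0.02)
    inp = tf.keras.layers.Input(shape=(256, 256, 3))
    x = downsample(64, apply_norm=False)(inp)   # 128x128x64
    x = downsample(128)(x)                      # 64x64x128
    x = downsample(256)(x)                      # 32x32x256
    x = tf.keras.layers.ZeroPadding2D()(x)      # 34x34x256
    x = tf.keras.layers.Conv2D(512, 4, strides=1, use_bias=False,
                               kernel_initializer=initializer)(x)  # 31x31x512
    x = tfa.layers.InstanceNormalization()(x)
    x = tf.keras.layers.LeakyReLU(0.3)(x)
    x = tf.keras.layers.ZeroPadding2D()(x)      # 33x33x512
    out = tf.keras.layers.Conv2D(1, 4, strides=1,
                                 kernel_initializer=initializer)(x)  # 30x30x1
    return tf.keras.Model(inputs=inp, outputs=out)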
2.3.2 Losses
Definitions
The following terms will be used to express the losses: C and F denote the sets of clear and foggy training images and I the set of intensity values; c is a clear image and ic is the fog intensity to be added to it; f is a foggy image and if is its annotated fog intensity; Ec and Ef denote the expected value over clear and foggy samples, respectively.
Adversarial Loss
Similar to the standard one introduced in [15]:
LGAN(clear2fog, Dfog, C, F, I) = Ef[log Dfog(f, if)] + Ec[log(1 − Dfog(clear2fog(c, ic), ic))]
LGAN(fog2clear, Dclear, F, C, I) = Ec[log Dclear(c)] + Ef[log(1 − Dclear(fog2clear(f, if)))]
Cycle-Consistency Loss
As introduced in [1], an image translated to the other domain and back should match the original:
Lcyc = Ec[∥fog2clear(clear2fog(c, ic), ic) − c∥1] + Ef[∥clear2fog(fog2clear(f, if), if) − f∥1]
Identity Loss
Passing a foggy image to clear2fog (with its own intensity) and passing a clear image to fog2clear should always generate the same image:
Lidentity = Ec[∥fog2clear(c, ic) − c∥1] + Ef[∥clear2fog(f, if) − f∥1]
Whitening Loss
The Whitening Loss, along with the RGB Ratio Loss, makes sure that any pixel in the generated foggy image is either whiter than or equal to the corresponding pixel of the original. This loss ensures that the image’s color is not changed towards a darker one; it is only whitened. This is done using the ReLU function (ReLU(x) = max(x, 0)):
Lwhite = Ec[∥ReLU(c − clear2fog(c, ic))∥1]
RGB Ratio Loss
This loss keeps the ratios between the color channels of the generated image close to those of the input image:
Lrgb = Ec[∥r · ĝ − g · r̂∥1 + ∥g · b̂ − b · ĝ∥1]
Where r, g, b are the channels of the input image c and r̂, ĝ, b̂ are the channels of the generated image clear2fog(c, ic).
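As a sketch (in the normalized pixel space used by the generators), the whitening loss can be computed in TensorFlow as follows; the function name is illustrative, and the RGB Ratio Loss appears as code in subsection 2.3.3.
import tensorflow as tf

def whitening_loss(real_clear, fake_fog):
    # Penalize only the pixels of the generated foggy image that became darker than the input
    return tf.reduce_mean(tf.nn.relu(real_clear - fake_fog))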
Full Objective
The full objective function to be minimized:
2.3.3 Implementation
The implementation was essentially done using Python 3, Tensorflow 2.1.0 and Numpy. Addi-
tionally, the following libraries were used:
• Jupyter is used to write a Notebook where the model is trained and tested.
• Pandas is used to read the annotation files produced by Image Number Annotator intro-
duced in subsection 2.2.2 and prepare the train and test datasets.
The code is divided into multiple parts to make it easy to create, change, train and test the models.
Dataset Initializer
The class DatasetInitializer in dataset.py is responsible for preparing the dataset. Es-
sentially, the function prepare_dataset does all the important work by calling other methods.
Generally, the following steps are done by prepare_dataset:
1. Reading the annotation files recursively from the dataset folder, by searching for all files
named Annotations.csv. The files are read using Pandas into one DataFrame, where
each row contains the full path of an image and its fog intensity.
2. The dataframe is split into two: clear images that have zero intensity and foggy images
that have positive intensity.
3. Each of the two dataframes is shuffled and split into train and test parts, as specified in
subsection 2.2.3.
4. An additional DataFrame is prepared similarly for the Sample images; those images consist of one clear image and 9 foggy ones with intensities between 0.1 and 0.9.
5. In the end, the following generators are returned:
• train clear generator: shuffles the training clear dataframe, then iterates through its
rows one by one
• train fog generator: shuffles the training foggy dataframe, then iterates through its
rows one by one
• test clear generator: shuffles the testing clear dataframe, then iterates through its
rows one by one
• test fog generator: shuffles the testing foggy dataframe, then iterates through its
rows one by one
• sample clear generator: iterates through 9 different intensities for the sample clear
images, ranging from 0.1 to 0.9.
• sample fog generator: iterates through the sample foggy images.
6. Each of the previous generators returns a row containing an image path and an intensity value. Images are then read from their paths and normalized as follows (a sketch is given after this list):
• The image is read from the disk.
• Since the generators use tanh activation, all images are normalized to the range [−1, 1].
• The image is resized to 256×256; this is done by another method that maintains the image’s aspect ratio.
• Jitter is applied to the training images by resizing them up to 286×286 and then randomly cropping back to 256×256. The image is also randomly flipped.
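A simplified sketch of these preprocessing steps follows; the aspect-ratio-preserving resize is omitted and the interpolation method is an assumption.
import tensorflow as tf

def random_jitter(image):
    # Resize up to 286x286, randomly crop back to 256x256, then randomly flip
    image = tf.image.resize(image, (286, 286))
    image = tf.image.random_crop(image, size=(256, 256, 3))
    image = tf.image.random_flip_left_right(image)
    return image

def preprocess_train(image):
    # Normalize to [-1, 1] (the tanh output range) and apply jitter for training images
    image = tf.cast(image, tf.float32) / 127.5 - 1.0
    return random_jitter(image)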
Models Builder
The class ModelsBuilder is used to build Generators and Discriminators in a generic way, all
the neural networks and layers are implemented using Tensorflow and Keras. The following
methods are the essential parts of the class explained in subsection 2.3.1:
• downsample: Returns a Down-sampling block with the passed number of filters and ker-
nel size.
• upsample: Returns an Up-sampling block with the passed number of filters, kernel size
and whether Dropout is applied or not.
• build_generator: This function takes the parameters that define a generator and builds it.
Trainer
The Trainer class in train.py is responsible for the whole training process of the Foggy-CycleGAN model. Loss functions, weight updates, checkpoint saving and result plotting are all handled and defined in this class. Trainer uses the Adam optimizer [40] to train all the models with a learning rate of 10−4, β1 = 0.5 and β2 = 0.999. The class provides the following methods:
• save_config: Saves the current configuration of the Trainer: the weights path, the total number of trained epochs and the log paths.
• train_step: This function is responsible for one training step, taking one batch of clear images and one batch of foggy ones. The function calculates all the losses explained in subsection 2.3.2, computes the gradients and updates the weights with the help of Tensorflow.
• epoch_callback: This method is called after every epoch; it plots the result of a random sample clear and sample fog pair, showing the generators’ and discriminators’ outputs on them. In addition, it stores such predictions for all the sample images in a specified folder for logging.
• train: This is the main method for training; it calls several other methods in order to perform a given number of epochs. In addition, it is responsible for storing Tensorboard logs.
In train_step, the parameter real_clear contains a tuple of clear images along with random intensities for each image, and the parameter real_fog contains a tuple of foggy images along with their fog intensities. The losses are calculated as follows:
• real_clear is converted to fog using clear2fog, and real_fog is converted to clear using fog2clear.
fake_fog = generator_clear2fog((real_clear, clear_intensity))
fake_clear = generator_fog2clear((real_fog, fog_intensity))
• real_fog and fake_fog are passed to Dfog. real_clear and fake_clear are passed to Dclear. The results are used to calculate the Adversarial Loss.
disc_real_clear = discriminator_clear(real_clear)
disc_real_fog = discriminator_fog((real_fog, fog_intensity))
disc_fake_clear = discriminator_clear(fake_clear)
disc_fake_fog = discriminator_fog((fake_fog, clear_intensity))
disc_clear_loss = discriminator_loss(disc_real_clear, disc_fake_clear)
disc_fog_loss = discriminator_loss(disc_real_fog, disc_fake_fog)
Here discriminator_loss calculates the binary cross-entropy between the discriminator’s output for the real image and an array of ones, and between its output for the fake image and an array of zeros (a possible implementation is sketched after this list).
• fake_fog image is converted back to clear and fake_clear is converted back to fog to
calculate Cycle-Consistency Loss.
cycled_clear = generator_fog2clear((fake_fog, clear_intensity))
cycled_fog = generator_clear2fog((fake_clear, fog_intensity))
total_cycle_loss = (LAMBDA * tf.reduce_mean(tf.abs(real_clear - cycled_clear))
                    + LAMBDA * tf.reduce_mean(tf.abs(real_fog - cycled_fog)))
• Identity Loss is calculated by passing the foggy image to clear2fog and the clear one to fog2clear and expecting the same result.
same_clear = generator_fog2clear((real_clear, clear_intensity))
same_fog = generator_clear2fog((real_fog, fog_intensity))
identity_loss = LAMBDA * tf.reduce_mean(tf.abs(real_image - same_image))
• Transmission Map Loss is calculated by generating a transmission image using the math-
ematical formula.
t = 1 - intensity
trans_image = clear_image * t + (1 - t)
trans_loss1 = LAMBDA * tf.abs(tf.reduce_mean(real_image) -
                              tf.reduce_mean(trans_image))
The second part is calculated by generating a foggy image from the clear one with both 0
and 1 intensities, the result should be the same image and full white respectively.
• For RGB Ratio Loss, red, green and blue channels are extracted from both clear and
generated images, then the loss is calculated.
r = clear_image[:, :, :, 0]
g = clear_image[:, :, :, 1]
b = clear_image[:, :, :, 2]
r_hat = fake_fog[:, :, :, 0]
g_hat = fake_fog[:, :, :, 1]
b_hat = fake_fog[:, :, :, 2]
rg_loss = tf.reduce_mean(tf.abs(r * g_hat - g * r_hat))
gb_loss = tf.reduce_mean(tf.abs(g * b_hat - b * g_hat))
rgb_loss = LAMBDA * ALPHA * (rg_loss + gb_loss)
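For reference, a discriminator_loss consistent with the description above could look like the following sketch (whether the sum is additionally halved or weighted in the actual project is not shown here):
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(disc_real_output, disc_fake_output):
    # Real outputs are compared against ones, generated outputs against zeros
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake_loss = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return real_loss + fake_loss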
Main Notebook
The Jupyter Notebook file Foggy_CycleGAN.ipynb holds the main part that uses the previous
modules and classes to prepare the dataset, build the generators and discriminators, load pre-
saved models if any, set up Tensorboard, start the training process and plot the results.
The dataset is prepared using DatasetInitializer, as shown in Listing 2.1. The parameters passed to DatasetInitializer are the image height and width; in our case the image size is 256×256. The BATCH_SIZE constant represents the batch size for the training process; in our case its value is 5.
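Listing 2.1 is not reproduced here; a call of roughly the following shape would match the description above (the module path, argument names and return values are assumptions for illustration):
from lib.dataset import DatasetInitializer  # module path assumed

BATCH_SIZE = 5
dataset_init = DatasetInitializer(256, 256)  # image height and width
train_ds, test_ds = dataset_init.prepare_dataset(batch_size=BATCH_SIZE)  # return values assumed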
Generators and discriminators are built using ModelsBuilder, as demonstrated in Listing 2.2. The parameter use_resize_conv specifies whether the generator is built using Resize-Convolution layers instead of regular up-sampling blocks. In clear2fog we use regular up-sampling blocks, passing this value as False, while in clear2fog-v2 it is passed as True.
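A sketch of such calls is given below; apart from use_resize_conv, the method and variable names are assumptions.
from lib.models import ModelsBuilder  # module path assumed

models_builder = ModelsBuilder()
generator_clear2fog = models_builder.build_generator(use_resize_conv=False)  # True for clear2fog-v2
generator_fog2clear = models_builder.build_generator(use_resize_conv=False)
discriminator_fog = models_builder.build_discriminator()    # method name assumed
discriminator_clear = models_builder.build_discriminator()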
The training process starts by initializing the Trainer and configuring its checkpoints, which loads the weights if they already exist, as shown in Listing 2.3. Calling load_config() loads the last saved configuration of the trainer, which contains the number of trained epochs and the paths for saving weights and Tensorboard logs.
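Listing 2.3 could then look roughly as follows; the constructor arguments and the checkpoint-configuration call are assumptions.
from lib.train import Trainer  # module path assumed

trainer = Trainer(generator_clear2fog, generator_fog2clear,
                  discriminator_clear, discriminator_fog)
trainer.configure_checkpoint('weights/')  # loads existing weights if present (name assumed)
trainer.load_config()                     # restores the trained-epoch count and log paths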
Code Availability
The code is publicly available in my GitHub repository github.com/ghaiszaher/Foggy-CycleGAN.
2.4 Training
2.4.1 Environment Setup
GitHub and Google Colaboratory (Colab) are mainly used to host the code and the training pro-
cess. The code is pushed to a GitHub repository, and the main Notebook Foggy_CycleGAN.ipynb
is run on Colab. Google Colaboratory allows us to create and run Jupyter Notebook files on a
Linux virtual machine, the option to run the code on CPU, GPU or TPU (Tensor Processing
Unit) with limited resources and the ability to mount a Google Drive account to the machine.
For this purpose, the dataset files are zipped and uploaded to a Google Drive folder.
The following process happens when the Notebook file is run on Colab:
1. Runtime is set to use GPU resources.
2. Code is pulled from GitHub.
3. Google Drive is mounted to Colab.
4. A shell script copy_dataset.sh is run to copy the compressed dataset files to the machine and decompress them (steps 2–4 are sketched after this list).
5. Dataset is initialized using DatasetInitializer with a batch size of 5 images.
6. Generators and Discriminators are built using ModelsBuilder.
7. Tensorboard is initialized locally and shown in the same Notebook.
8. The model is trained for 100 epochs at a time. Weights and logs are all stored in the
mounted Google Drive.
9. When the training is done, testing results are visualized and stored.
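Steps 2 to 4 can be sketched as follows inside the Colab notebook; the clone target and script arguments are illustrative.
# Mount Google Drive so the zipped dataset, weights and logs are accessible
from google.colab import drive
drive.mount('/content/drive')

# Pull the code from GitHub and copy/decompress the dataset onto the local machine
!git clone https://github.com/ghaiszaher/Foggy-CycleGAN.git
!bash Foggy-CycleGAN/copy_dataset.sh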
2.4.2 Training Details and Results
[Figure: during training, the real clear image and a fog intensity value are passed to clear2fog to produce a fake foggy image.]
As previously mentioned, two versions of the Clear to Fog generator are implemented and tested (clear2fog and clear2fog-v2), and thus two versions of Foggy-CycleGAN (Foggy-CycleGAN-v1 and Foggy-CycleGAN-v2) exist. In the following sections, we will visualize the training results for both versions.
Foggy-CycleGAN-v1
In this version, clear2fog is used to generate foggy images out of clear ones. The model was trained for 290 epochs and the whole process took around 4 days; each epoch lasted an average of 500 seconds, so the total effective time was about 40 hours. Figure 2.10 shows the loss values for all generators and discriminators after each epoch. What is most important for us is clear2fog’s performance; nevertheless, the other losses are also monitored as they all affect each other. We can see that the loss value is not consistently decreasing, but each generator is
playing a mini-max game with its discriminator. When clear2fog’s loss decreases, it is getting better at generating foggy images out of clear ones, and thus Dfog’s loss increases. When Dfog is trained, its loss decreases as it gets better at differentiating between real and generated foggy images, making the generator’s loss increase. The training was run for 290 epochs, but the final model is the one from the 130th epoch, as its results seemed better.
Figure 2.10: Loss values for the Generators and Discriminators of Foggy-CycleGAN-v1 for 290
epochs
As image logs were stored along the training process, we are able to see how the models improved over time. Figure 2.11 shows clear2fog’s output for a sample clear image when passing an intensity value of 0.5. We can clearly see that the model does not always give better results as it is trained more; artifacts start to appear at later epochs.
We can also take a look at fog2clear’s output in Figure 2.12. As expected, the model is learning to replace foggy parts of the image with an estimate of what could be there. The
Figure 2.11: Foggy-CycleGAN-v1 epoch outputs for the clear2fog generator with a required intensity of 0.5
result is clearly very far from the truth; the model is learning how clear images look and trying to place trees or buildings in place of the white (foggy) parts of the image. Even though the result is not reasonable, it is still valid for our purpose.
Figure 2.12: Foggy-CycleGAN-v1 epoch outputs for the fog2clear generator with an input intensity of 0.5
Foggy-CycleGAN-v2
In this version of the model, clear2fog-v2 was used, while the rest is similar to the previous version. This model was trained for 140 epochs and the training process was significantly slower than the previous one, with an average of 650 seconds per epoch; the model was trained for an effective time of about 25 hours over 2 days. Figure 2.13 shows the loss values for all the models; we can see that the loss values for the new generator and its corresponding discriminator (Dfog) are noisier than those in the previous version.
The outputs of the generators are also plotted during the training; Figure 2.14 and Figure 2.15 show the outputs of clear2fog-v2 and fog2clear in this version. We notice that the fog in the new model looks smoother, but the image quality decreased significantly, while no difference is noticed in the result of fog2clear.
Figure 2.13: Loss values for the Generators and Discriminators of Foggy-CycleGAN-v2 for 140
epochs
Figure 2.14: Foggy-CycleGAN-v2 epoch outputs for the clear2fog-v2 generator with a required intensity of 0.5
Figure 2.15: Foggy-CycleGAN-v2 epoch outputs for the fog2clear generator with an input intensity of 0.5
2.5 Tests and Results
[Figure: test outputs for clear inputs with intensities 0.15 and 0.45.]
The quality of the images generated by the second version is lower than that of the first version, and the result looks less believable. This can be seen in Figure 2.22, which compares the outputs of the two versions of Foggy-CycleGAN.
[Figures: further test outputs for clear inputs at intensities 0.25, 0.50, 0.65 and 0.75.]
[Figure: a clear input image with generated outputs at intensities 0, 0.2, 0.4, 0.6 and 0.8.]
Chapter 3
Discussion
3.1 Contribution
Our proposed method provides a way to synthesize fog on clear images using an unpaired dataset. While previous methods used mathematical approaches, mostly based on the depth map of the given image, our method relies only on the fog intensity information in an unpaired dataset of images. Additionally, we provide annotations for previously existing datasets, where each image is tied to a fog intensity estimation, along with the Image Number Annotator tool that was introduced in subsection 2.2.2.
3.2 Limitations
Even though the proposed models generate reasonable foggy images for lower fog intensities, the results do not look convincing all the time. In addition, some artifacts appear in the generated images. The usage of Resize-Convolution blocks helped to overcome some of these artifacts, but other types of noise appeared instead.
3.3 Future Improvements
Several improvements can be made to the proposed method in future work:
• If a dataset is present, we can train the same model on smoggy images and obtain a smog simulation model. Smog is another weather condition that can distort visibility and affect autonomous driving systems; being able to generate such images can help improve object detection in the presence of smog.
• Driving in foggy weather can be more dangerous at night. In our model, we focused on daylight images, but if more foggy night images were available, a model could be trained to synthesize fog on clear images taken at night.
• Even though CycleGAN is powerful, if a paired dataset of foggy and non-foggy images exists, we can produce much better images using the same structures above for clear2fog and Dfog and following Pix2pix’s idea [21]. It is difficult to obtain such pictures in nature, but if a fixed installed camera records 24/7 footage, we can go through the log history and grab some foggy and non-foggy frames from it. Also, if an equipped laboratory is able to generate real fog in a small environment (like a small city), we would be able to collect such a dataset.
• Using the depth information of the images can also be a useful addition that may produce better results if it is available. Such information would allow us to express the fog intensity as a visibility distance.
• If more computing resources are available, having a larger and more diverse dataset will help to train a model that is able to add fog to any kind of image, not just ones taken from a car or that contain one specific type of information as in our case.
3.4 Conclusion
Fog simulation on digital images is a challenging goal to achieve; the reasons range from the complexity of applying the mathematical model to real images, to the difficulty of obtaining both foggy and clear images of the same content in real life. On the other hand, the availability of a large dataset of both clear and foggy images allowed us to use a generative model, following CycleGAN’s approach, to synthesize fog. Instead of going through the trouble of formulating a mathematical model, calculating the depth data for the clear image’s pixels and generating fog using that information, we trained our Foggy-CycleGAN model to figure out how foggy images look, how clear images look, and how to convert between these two kinds of images while keeping the same content.
In our study, we annotated a large number of images, giving each an estimated real number that expresses the percentage of fog present in the image. This addition allowed us not only to
convert clear images to hazy ones, but also to specify the intensity of fog we wish to add to the image.
Our implemented generative models proved capable of learning this task with reasonable results. Nevertheless, they do not provide the desired outcome for all kinds of images. In addition, the generated fog does not look smooth for all intensity values, showing undesired artifacts in some cases. More improvements can be made in later research to overcome these problems: if more powerful resources are available, a more complex model can be trained with a larger dataset to perform better. Also, if it becomes possible to generate paired foggy and clear images using special laboratory equipment, a modified version of our model could be trained to produce a more realistic outcome with less noise.
In conclusion, we can say that generative models proved to be a good way to synthesize weather conditions on digital images, taking away the trouble of manually building and assessing a mathematical model and passing that task to convolutional generative models.
Bibliography
[1] Jun-Yan Zhu et al. “Unpaired Image-to-Image Translation using Cycle-Consistent Ad-
versarial Networks”. In: CoRR abs/1703.10593 (2017). arXiv: 1703.10593.
[2] Eric Dumont and Viola Cavallo. “Extended Photometric Model of Fog Effects on Road
Vision”. In: Transportation Research Record 1862 (Jan. 2004), pp. 77–81. DOI: 10 .
3141/1862-09.
[3] Zhaohui Liu et al. “Analysis of the Influence of Foggy Weather Environment on the De-
tection Effect of Machine Vision Obstacles”. In: Sensors 20 (Jan. 2020), p. 349. DOI:
10.3390/s20020349.
[4] Sahil Dhawan and Jagdish Raheja. “Obstacles Detection in Foggy Environment”. In: Aug.
2013.
[5] Sarthak Katyal et al. “Object Detection in Foggy Conditions by Fusion of Saliency Map
and YOLO”. In: Dec. 2018, pp. 154–159. DOI: 10.1109/ICSensT.2018.8603632.
[6] Gurveer Singh and Ashima Singh. “Object Detection in Fog Degraded Images”. In: In-
ternational Journal of Computer Science and Information Security (IJCSIS) 16.8 (2018).
[7] Nan Dong et al. “Adaptive Object Detection and Visibility Improvement in Foggy Im-
age”. In: Journal of Multimedia 6 (Feb. 2011), pp. 14–21. DOI: 10.4304/jmm.6.1.14-
21.
[8] Christos Sakaridis, Dengxin Dai, and Luc Van Gool. “Semantic Foggy Scene Understand-
ing with Synthetic Data”. In: CoRR abs/1708.07819 (2017). arXiv: 1708.07819.
[9] Kyeong Jeong and Byung Song. “Fog Detection and Fog Synthesis for Effective Quan-
titative Evaluation of Fog–detection-and-removal Algorithms”. In: IEIE Transactions on
Smart Processing & Computing 7 (Oct. 2018), pp. 350–360. DOI: 10.5573/IEIESPC.
2018.7.5.350.
[10] C. Sun et al. “An algorithm of imaging simulation of fog with different visibility”.
In: 2015 IEEE International Conference on Information and Automation. Aug. 2015,
pp. 1607–1611. DOI: 10.1109/ICInfA.2015.7279542.
[11] Rachid Belaroussi and Dominique Gruyer. “Impact of Reduced Visibility from Fog on
Traffic Sign Detection”. In: June 2014, pp. 1302–1306. ISBN: 978-1-4799-3638-0. DOI:
10.1109/IVS.2014.6856535.
[12] Thomas Wiesemann and Xiaoyi Jiang. “Fog Augmentation of Road Images for Perfor-
mance Analysis of Traffic Sign Detection Algorithms”. In: Advanced Concepts for In-
telligent Vision Systems. Ed. by Jacques Blanc-Talon et al. Cham: Springer International
Publishing, 2016, pp. 685–697. ISBN: 978-3-319-48680-2.
[13] Fan Guo, Jin Tang, and Xiaoming Xiao. “Foggy Scene Rendering Based on Transmission
Map Estimation”. In: International Journal of Computer Games Technology 2014 (Oct.
2014). DOI: 10.1155/2014/308629.
[14] Hochang Lee, J. R. Jang, and Kyunghyun Yoon. “Fog Rendering Using Distance-Altitude
Scattering Model on 2D Images”. In: 2012.
[15] Ian J. Goodfellow et al. Generative Adversarial Networks. 2014. arXiv: 1406.2661.
[16] Alec Radford, Luke Metz, and Soumith Chintala. “Unsupervised Representation Learn-
ing with Deep Convolutional Generative Adversarial Networks”. In: 4th International
Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4,
2016, Conference Track Proceedings. Ed. by Yoshua Bengio and Yann LeCun. 2016.
[17] Tero Karras et al. “Progressive Growing of GANs for Improved Quality, Stability, and
Variation”. In: CoRR abs/1710.10196 (2017). arXiv: 1710.10196.
[18] Tero Karras, Samuli Laine, and Timo Aila. “A style-based generator architecture for gen-
erative adversarial networks”. In: Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition. 2019, pp. 4401–4410.
[19] Tero Karras et al. “Analyzing and Improving the Image Quality of StyleGAN”. In: arXiv
preprint arXiv:1912.04958 (2019).
[20] Mehdi Mirza and Simon Osindero. “Conditional Generative Adversarial Nets”. In: CoRR
abs/1411.1784 (2014). arXiv: 1411.1784.
[21] Phillip Isola et al. “Image-To-Image Translation With Conditional Adversarial Net-
works”. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
July 2017.
[22] David Bau et al. “Semantic Photo Manipulation with a Generative Image Prior”. In: ACM
Transactions on Graphics (Proceedings of ACM SIGGRAPH) 38.4 (2019).
[23] Ting-Chun Wang et al. “Video-to-Video Synthesis”. In: Advances in Neural Information
Processing Systems (NeurIPS). 2018.
[24] Aidan Clark, Jeff Donahue, and Karen Simonyan. “Efficient Video Generation on Com-
plex Datasets”. In: CoRR abs/1907.06571 (2019). arXiv: 1907.06571.
[25] Andrew Brock, Jeff Donahue, and Karen Simonyan. “Large Scale GAN Training for
High Fidelity Natural Image Synthesis”. In: CoRR abs/1809.11096 (2018). arXiv: 1809.
11096.
[26] J. Thies et al. “Real-time Expression Transfer for Facial Reenactment”. In: ACM Trans-
actions on Graphics (TOG) 34.6 (2015).
[27] J. Thies et al. “Face2Face: Real-time Face Capture and Reenactment of RGB Videos”.
In: Proc. Computer Vision and Pattern Recognition (CVPR), IEEE. 2016.
[28] Jiajun Wu et al. “Learning a Probabilistic Latent Space of Object Shapes via 3D
Generative-Adversarial Modeling”. In: CoRR abs/1610.07584 (2016). arXiv: 1610 .
07584.
[29] Chris Donahue, Julian J. McAuley, and Miller S. Puckette. “Synthesizing Audio with
Generative Adversarial Networks”. In: CoRR abs/1802.04208 (2018). arXiv: 1802 .
04208.
[30] Aäron van den Oord et al. “WaveNet: A Generative Model for Raw Audio”. In: CoRR
abs/1609.03499 (2016). arXiv: 1609.03499.
[31] Marius Cordts et al. “The Cityscapes Dataset for Semantic Urban Scene Understanding”.
In: CoRR abs/1604.01685 (2016). arXiv: 1604.01685.
[32] Deniz Engin, Anil Genç, and Hazim Kemal Ekenel. “Cycle-Dehaze: Enhanced Cycle-
GAN for Single Image Dehazing”. In: CoRR abs/1805.05308 (2018). arXiv: 1805 .
05308.
[33] Karen Simonyan and Andrew Zisserman. “Very deep convolutional networks for large-
scale image recognition”. In: arXiv preprint arXiv:1409.1556 (2014).
[34] Wei Liu et al. “End-to-End Single Image Fog Removal using Enhanced Cycle Consistent
Adversarial Networks”. In: CoRR abs/1902.01374 (2019). arXiv: 1902.01374.
[35] Boyi Li et al. “Benchmarking Single-Image Dehazing and Beyond”. In: IEEE Transac-
tions on Image Processing 28.1 (2019), pp. 492–505.
[36] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-Net: Convolutional Networks
for Biomedical Image Segmentation”. In: CoRR abs/1505.04597 (2015). arXiv: 1505.
04597.
[37] Dmitry Ulyanov, Andrea Vedaldi, and Victor S. Lempitsky. “Instance Normalization: The
Missing Ingredient for Fast Stylization”. In: CoRR abs/1607.08022 (2016). arXiv: 1607.
08022.
[38] Tensorflow. Tensorflow implementation of CycleGAN. URL: https://www.tensorflow.
org/tutorials/generative/cyclegan.
[39] Augustus Odena, Vincent Dumoulin, and Chris Olah. “Deconvolution and Checkerboard
Artifacts”. In: Distill (2016). DOI: 10.23915/distill.00003.
[40] Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimization”. In:
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA,
USA, May 7-9, 2015, Conference Track Proceedings. Ed. by Yoshua Bengio and Yann
LeCun. 2015.