GAN Script
followed by the limitations of object detection models and how GANs can help overcome them.
These are some of the papers that provided insights into the advancements and applications of GANs.
A GAN is a deep learning generative model that has recently gained a great deal of popularity.
GANs are a relatively recent technique for learning in both supervised and semi-supervised modes.
The GAN architecture has two main neural networks that are trained simultaneously.
They play a minimax game in which one network tries to reduce the loss while the other tries to increase it.
With the obtained losses the models are trained again, and over time the generator learns how to convince the
discriminator that its output is real.
Discriminator loss: -[ln(D(real)) + ln(1 - D(fake))]
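That loss is the standard binary cross-entropy applied to the discriminator's predictions. A minimal pure-Python sketch (the function names are my own, not from the script):

```python
import math

def discriminator_loss(pred_real, pred_fake):
    """Binary cross-entropy for the discriminator:
    -[ln(D(real)) + ln(1 - D(fake))]."""
    return -(math.log(pred_real) + math.log(1.0 - pred_fake))

def generator_loss(pred_fake):
    """Non-saturating generator loss: -ln(D(fake)).
    The generator wants the discriminator to output 1 for fakes."""
    return -math.log(pred_fake)

# An undecided discriminator that outputs 0.5 for everything
# gives loss 2*ln(2) ~ 1.386:
print(discriminator_loss(0.5, 0.5))  # ~1.3863
```

As the discriminator gets better (real scores near 1, fake scores near 0), its loss shrinks; the generator's loss then pushes it to raise the fake scores back up, which is the minimax game described above.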
Generator Model Layers
Input layer
FC (fully connected) layer
upsampling
activation function
output layer
Discriminator Model Layers
input layer
conv layer
pooling layer
FC layer
output layer
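To make the two layer stacks concrete, here is a minimal sketch of the standard output-size arithmetic as the data flows through them; the specific kernel, stride, and image sizes are illustrative assumptions, not values from the script:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max-pooling layer."""
    return (size - kernel) // stride + 1

def upsample_out(size, factor=2):
    """Spatial output size after nearest-neighbour upsampling."""
    return size * factor

# Discriminator path: each conv -> pool block halves the resolution.
size = 64
for _ in range(3):
    size = pool_out(conv_out(size))
print(size)  # 64 -> 32 -> 16 -> 8

# Generator path: repeated upsampling grows a small seed to image size.
seed = 8
for _ in range(3):
    seed = upsample_out(seed)
print(seed)  # 8 -> 16 -> 32 -> 64
```

The two paths mirror each other: the generator expands a compact representation into an image, while the discriminator compresses an image back down to a single real/fake score.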
This is the basic general architecture of a GAN, and it has many variants depending on the application.
Conditional GAN: the architecture differs in that, along with the noise, additional data is sent to
both networks. It acts as a guide for creating images and can be any condition.
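A minimal sketch of that conditioning step, assuming the condition is a one-hot class label (one common choice; the script does not fix a specific kind of condition). The extra information is simply appended to the noise vector before it enters the generator:

```python
import random

def one_hot(label, num_classes):
    """Encode a class index as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def conditional_input(noise_dim, label, num_classes, rng=random):
    """Build the conditional generator input: noise ++ condition."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, num_classes)

z = conditional_input(noise_dim=100, label=3, num_classes=10)
print(len(z))  # 110: 100 noise values + a 10-way one-hot condition
```

The discriminator receives the same condition alongside its image input, so both networks are "guided" by the label.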
CycleGAN: this is usually used for domain shifting. The change in its architecture is that it has
two generators and two discriminators, one pair for each domain.
DCGAN: instead of FC layers it uses convolutional layers (transposed convolutions, or convolutions
with strides) to improve data quality and training stability.
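The reason transposed convolutions can replace the separate upsampling step is that they learn the upsampling itself. The standard output-size formula below (with illustrative kernel/stride values) shows how a stride-2 transposed convolution doubles the spatial resolution at each layer:

```python
def tconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of a transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel

# A typical DCGAN-style generator grows a 4x4 seed to a 32x32 image
# with three stride-2 transposed convolutions:
size = 4
for _ in range(3):
    size = tconv_out(size)
print(size)  # 4 -> 8 -> 16 -> 32
```

Because the upsampling weights are learned rather than fixed, the generator can recover sharper detail than fixed interpolation, which is part of the quality and stability gain mentioned above.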
That is the general overview of what a GAN is; now let's look into the limitations of object detection models.
It has been observed that no matter how accurate a DNN model is at object detection, it tends to
underperform under a few conditions.
Sensitivity to variations in image quality: when a model is trained only on high-resolution (HR)
images and then encounters a low-resolution (LR) image, which is common in real-time applications,
it can give wrong results.
Difficulty with small objects: most of the major datasets available for object detection focus on
large objects, and in deep CNN architectures the deeper the feature map, the lower its resolution,
which is counterproductive when the object is so small that it may be lost along the way.
Limited scalability: traditional data augmentation techniques offer only limited options, and it has
been shown that common re-scaling functions can distort the image so that it ends up looking
completely different from a real-world image.
Class imbalance: there is a high chance that a dataset contains underrepresented classes of
images, and the DNN might not train well on them.
We observed that in all these cases the common problem is the availability of proper data.
That is why we use GANs to generate data that helps our model train well.
They can be used in domain adaptation for bridging domain gaps, e.g. with the help of CycleGAN.
For example, this technique can be used in the medical sector to convert MRI scans to CT scans,
which is a form of domain adaptation.
Data augmentation: helps us create more realistic images than traditional techniques.
Synthetic data generation: completely new data can be generated from existing data, which makes
training enhanced, cost-effective, and customizable.
Rare/small object detection: a GAN can create LR objects from HR ones, which is useful in aerial
applications where objects appear very small in real time.
It uses the generator model along with an image translation network to achieve this task.
1 Reshape Layer
1 Output Layer
and a bounding-box mask sampled from a uniform distribution and used to resize the object patch.
The resized object patch is then combined with the clean image using the image translation network.
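A minimal sketch of that sampling step, assuming the uniform distribution is over a resize factor applied to the patch dimensions (the (low, high) range and function names are illustrative assumptions):

```python
import random

def sample_patch_size(width, height, low=0.5, high=1.5, rng=random):
    """Sample a resize factor uniformly and apply it to an object
    patch, clamping so the patch never shrinks below one pixel."""
    scale = rng.uniform(low, high)
    return max(1, round(width * scale)), max(1, round(height * scale))

rng = random.Random(0)
print(sample_patch_size(40, 24, rng=rng))
```

Randomising the patch size this way means the translation network sees the same object at many scales, which is exactly what a small-object detector needs more of.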
Upsampling layer: image reconstruction and detail recovery, achieved using deconvolutional layers.
The network can integrate features from the object patch and the background image at various
stages of the upsampling process. By concatenating or adding feature maps from the object and the
background, the network can learn to blend them effectively.
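The two blending options just mentioned, concatenation and addition, differ in how the channel counts combine. A minimal sketch, with feature maps reduced to flat lists of channel values purely for illustration:

```python
def blend_concat(obj_channels, bg_channels):
    """Concatenate feature maps along the channel axis:
    the channel count is the sum of both inputs."""
    return obj_channels + bg_channels

def blend_add(obj_channels, bg_channels):
    """Add feature maps element-wise: channel counts must match."""
    assert len(obj_channels) == len(bg_channels)
    return [o + b for o, b in zip(obj_channels, bg_channels)]

obj = [2, 5, 1]   # 3 channels (one value per channel, for brevity)
bg = [4, 0, 3]
print(len(blend_concat(obj, bg)))  # 6 channels
print(blend_add(obj, bg))          # [6, 5, 4]
```

Concatenation keeps both sources intact and lets later layers learn how to mix them; addition is cheaper but forces the object and background features into the same channel layout.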
The main goal of DS-GAN is to create smaller versions of high-resolution (HR) objects.
When we use traditional methods to reduce the size, important features may be lost, so DS-GAN
downsamples while preserving those important features.
Generator :
It takes a high-resolution (HR) object as input, along with some random noise.
The encoder extracts the important features from the HR object and compresses the
information.
The decoder then takes that compressed information and generates a smaller version while
keeping key features.
The middle part of the generator (the bottleneck) represents the most compressed form of the
image features. It captures the high-level, abstract information that is necessary to regenerate the
image with proper detail.
Discriminator :
The discriminator receives both real LR objects from the LR dataset and generated objects from the
generator.
The discriminator reduces the image size gradually while increasing the depth (number of channels).
This structure helps it detect high-level features that differentiate real images from generated ones.
Moving on to applications:
GANs can be used in autonomous vehicles, where they can create data with small or occluded objects,
making detection more robust for safer navigation.
In healthcare, they can create synthetic medical scans, which are hard to obtain.
In a nutshell, a GAN is a powerful data augmentation tool for object detection models, enabling
them to be robust to the variations that are common in real-time scenarios.
The standard GAN has a mode collapse problem, where the generator learns to produce only a
limited variety of outputs, effectively "collapsing" to a few modes.
This can be mitigated by changing the loss function; research is ongoing on the Wasserstein loss,
which can potentially overcome this.
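A minimal sketch of the Wasserstein objective, which replaces cross-entropy with a difference of mean scores (pure-Python, names illustrative):

```python
def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss: the critic pushes real scores up and
    fake scores down, so it minimises mean(fake) - mean(real)."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(fake_scores) - mean(real_scores)

def wgan_generator_loss(fake_scores):
    """The generator pushes fake scores up: minimise -mean(fake)."""
    return -sum(fake_scores) / len(fake_scores)

print(critic_loss([2.0, 3.0], [0.0, 1.0]))  # -2.0: critic separates well
```

Note that the critic's scores are unbounded (there is no sigmoid); in practice the critic must also be kept Lipschitz-constrained, e.g. via weight clipping or a gradient penalty, for this loss to approximate the Wasserstein distance.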
Another way to achieve this is to use multiple GANs, ensuring that the weaknesses of one model do
not severely affect the overall output.
Outputs may also lack fine details and appear less realistic due to the model's inability to prioritize
significant features; more realistic images can be produced with the help of attention mechanisms,
which help the model focus on the relevant data.