GAN Script


The aim is to understand how GANs can enhance object detection systems.

In this seminar we will first discuss the general architecture of a GAN,

followed by the limitations of object detection models and how GANs can help overcome them.

These are some of the papers that provided insights into the advancements and applications of GANs.

Moving on to the main part.

A GAN (generative adversarial network) is a deep learning generative model that has gained a lot of
popularity lately.

GANs are a recently developed technique for learning in both unsupervised and semi-supervised modes.

The GAN architecture has two main neural networks that are trained simultaneously:

Generator: learns to generate fake data.

Discriminator: classifies whether the data is real or fake.

The two networks play a minimax game in which one network tries to reduce the loss while the other
tries to increase it.

The resulting losses are used to update the model again, and over time the generator learns how to
convince the discriminator that its output is real.

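Loss function: log loss (binary cross-entropy)

-[y ln(p) + (1 - y) ln(1 - p)], where p is the predicted probability that the input is real and y is
the true label (1 for real, 0 for fake)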
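
As a rough sketch of the log loss above, the snippet below computes it for a single prediction and derives a discriminator loss and a generator loss from it. The function names and the non-saturating form of the generator loss are assumptions made for illustration; they are not taken from the slides.

```python
import math

def log_loss(label, predicted, eps=1e-12):
    """Binary cross-entropy for one sample; label is 1 (real) or 0 (fake)."""
    p = min(max(predicted, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """The discriminator wants D(real) close to 1 and D(fake) close to 0."""
    return log_loss(1, d_real) + log_loss(0, d_fake)

def generator_loss(d_fake):
    """Assumed non-saturating form: the generator wants D(fake) close to 1."""
    return log_loss(1, d_fake)

# Hypothetical discriminator outputs for one real and one generated sample.
print(discriminator_loss(d_real=0.9, d_fake=0.1))
print(generator_loss(d_fake=0.1))
```

A confident, correct discriminator gives a small discriminator loss and a large generator loss, which is the tension the minimax game describes.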

Generator Model Layers

Input layer

Fully connected (FC) layer

Upsampling

Activation function

Output layer

Discriminator Model Layers

Input layer

Convolutional layer

Pooling layer

Fully connected (FC) layer

Output layer
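
To make the two layer stacks more concrete, here is a small shape trace in plain Python. All sizes (a 100-dimensional noise vector, a 7x7x128 starting feature map, a 28x28 single-channel image, 2x upsampling and 2x pooling) are assumed values chosen only for illustration.

```python
def generator_shapes(noise_dim=100, base=7, channels=128, upsample_steps=2):
    """Trace shapes through: input -> FC -> upsampling (+ activation) -> output."""
    shapes = [("input noise", (noise_dim,))]
    shapes.append(("fully connected + reshape", (channels, base, base)))
    size, ch = base, channels
    for step in range(upsample_steps):
        size, ch = size * 2, ch // 2  # each step doubles height and width
        shapes.append((f"upsampling {step + 1} + activation", (ch, size, size)))
    shapes.append(("output image", (1, size, size)))
    return shapes

def discriminator_shapes(size=28, channels=32, blocks=2, pool=2):
    """Trace shapes through: input -> (conv -> pooling) x blocks -> FC -> output."""
    shapes = [("input image", (1, size, size))]
    ch = channels
    for block in range(blocks):
        shapes.append((f"conv {block + 1}", (ch, size, size)))  # 'same' padding assumed
        size //= pool                                           # pooling shrinks the map
        shapes.append((f"pooling {block + 1}", (ch, size, size)))
        last_ch, ch = ch, ch * 2
    shapes.append(("fully connected input", (last_ch * size * size,)))
    shapes.append(("output probability", (1,)))
    return shapes

for name, shape in generator_shapes() + discriminator_shapes():
    print(f"{name:28s} {shape}")
```

The generator grows a small feature map into an image, while the discriminator does the reverse and ends in a single real/fake probability.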

This is the basic general architecture of a GAN, and it has many variants depending on the application.

Conditional GAN: the architecture differs at the input, where additional data is sent along with the
noise to both neural networks. This extra data can be any condition and acts as a guide for creating images.
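
A minimal sketch of the conditioning step, assuming the condition is a class label encoded as a one-hot vector and joined to the noise by concatenation. This is one common choice; the slides do not specify the mechanism, and the function names are hypothetical.

```python
import random

def one_hot(class_index, num_classes):
    """Encode the condition (here a class label) as a one-hot vector."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]

def conditional_generator_input(noise_dim, class_index, num_classes, rng=random):
    """Concatenate random noise with the condition vector."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(class_index, num_classes)

z = conditional_generator_input(noise_dim=8, class_index=2, num_classes=4)
print(len(z), z[-4:])
```

The discriminator would receive the same condition vector next to the image it is judging, so both networks see the guide.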

CycleGAN: usually used for domain shifting. Its architecture has two generators and two
discriminators, one pair for each domain.

DCGAN: instead of fully connected layers it uses convolutional layers (transposed convolutions or
strided convolutions) to improve data quality and training stability.
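
The effect of strided and transposed convolutions on feature map size can be checked with simple arithmetic. The sketch below uses the standard output-size formulas; the kernel size 4, stride 2 and padding 1 are assumed example values, not settings taken from the slides.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution (used for downsampling)."""
    return (size + 2 * padding - kernel) // stride + 1

def transposed_conv_out(size, kernel, stride, padding, output_padding=0):
    """Output size of a transposed convolution (used for upsampling)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Upsample a 4x4 map four times: 4 -> 8 -> 16 -> 32 -> 64.
size = 4
for _ in range(4):
    size = transposed_conv_out(size, kernel=4, stride=2, padding=1)
print(size)

# A strided convolution with the same settings halves the size again.
print(conv_out(size, kernel=4, stride=2, padding=1))
```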

This is the general overview of what a GAN is. Now let's look into the limitations of object detection models.

It has been observed that no matter how accurate a DNN model is at object detection, it tends to
underperform in a few conditions.

Sensitivity to variations in image quality: when a model is trained only on HR images and then
encounters an LR image, which is common in real-time applications, it can give wrong results.

for example

Difficulty with small objects: most of the major datasets available for object detection focus on
large objects, and in deep CNN architectures the deeper the feature map, the lower its resolution.
This is counterproductive when the object is so small that it may be lost along the way.
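
A quick calculation shows how a small object shrinks on deeper feature maps. The 16-pixel object and the total strides of 4 to 32 are assumed example values.

```python
def size_on_feature_map(object_px, total_stride):
    """Approximate extent of an object, in feature-map cells, after downsampling."""
    return object_px / total_stride

for stride in (4, 8, 16, 32):
    cells = size_on_feature_map(16, stride)
    print(f"stride {stride:2d}: a 16 px object spans {cells:.2f} cells")
```

At a total stride of 32 the object covers only half a cell, so very little of its information is left for the detector.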

Limited scalability: traditional data augmentation techniques offer limited options, and common
re-scaling functions can distort the image so that it looks quite different from a real-world image.

Class imbalance: there is a high chance that a dataset contains underrepresented classes of images,
and the DNN might not train well on them.
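
One simple way to quantify class imbalance is to count how many extra samples each class would need to match the largest class. The class names and counts below are a hypothetical dataset used only for illustration.

```python
from collections import Counter

def extra_samples_needed(labels):
    """For each class, how many extra samples bring it up to the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}

# Hypothetical label list for a detection dataset.
labels = ["car"] * 900 + ["pedestrian"] * 250 + ["fire"] * 40
print(extra_samples_needed(labels))
```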

We observed that in all these cases the common problem is the availability of proper data.

That is the reason we use GANs: to generate data that helps our model train well.

Domain adaptation: GANs can bridge domain gaps. For example, CycleGAN can be used in the medical
sector to convert MRI scans to CT scans, which is a kind of domain adaptation.

Data augmentation: helps us create more realistic images than traditional techniques.

Synthetic data generation: completely new data can be generated from existing data, which makes
training more effective, cost-effective and customizable.

Rare/small object detection: a GAN can create LR objects from HR ones, which can be useful in aerial
applications where objects appear very small in real-world images.

The main aim of RDAGAN is robust fire detection.

It uses a generator model along with an image translation network to achieve this task.

The generator is used to create the fire object patch. It consists of:

1-2 Fully Connected Layers

1 Reshape Layer

5-7 Transposed Convolutional Layers

1 Output Layer

A bounding box mask is sampled from a uniform distribution and used to resize the object patch.

The resized object patch is then combined with the clean image using the image translation network.

The image translation network is built from three kinds of layers:

Downsampling layers: feature extraction and dimensionality reduction, achieved using strided
convolutions or pooling.

ResNet blocks: skip connections between layers to avoid vanishing gradients, built from convolutional
layers, batch normalization, skip connections and a ReLU output.

Upsampling layers: image reconstruction and detail recovery, achieved using deconvolutional
(transposed convolution) layers.

The network can integrate features from the object patch and the background image at various
stages of the upsampling process. By concatenating or adding feature maps from the object and the
background, the network can learn to blend them effectively.
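
The placement step described above (sample a bounding box, resize the patch, combine it with the clean image) can be sketched in plain Python. This is only an illustration: the naive `paste` function is a stand-in for the learned image translation network, and all sizes, names and the toy patch are hypothetical.

```python
import random

def sample_bbox(img_h, img_w, min_size, max_size, rng):
    """Sample a square bounding box (top, left, size) from uniform distributions."""
    size = rng.randint(min_size, max_size)
    top = rng.randint(0, img_h - size)
    left = rng.randint(0, img_w - size)
    return top, left, size

def resize_nearest(patch, size):
    """Resize a square 2D patch to size x size with nearest-neighbour sampling."""
    n = len(patch)
    return [[patch[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]

def paste(image, patch, top, left):
    """Naive overwrite inside the box; RDAGAN uses a translation network instead."""
    out = [row[:] for row in image]
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + len(patch_row)] = patch_row
    return out

rng = random.Random(0)
clean = [[0] * 16 for _ in range(16)]   # hypothetical clean image
fire_patch = [[1, 2], [3, 4]]           # hypothetical generated object patch
top, left, size = sample_bbox(16, 16, 4, 8, rng)
augmented = paste(clean, resize_nearest(fire_patch, size), top, left)
print((top, left, size))                # the box also serves as the detection label
```

Because the box is sampled before the patch is placed, each augmented image comes with its bounding box annotation for free.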

Another application of GANs is small object detection.

The main goal of DS-GAN is to create smaller versions of high-resolution (HR) objects.

Traditional methods for reducing the size may lose important features, so DS-GAN learns to
downsample while maintaining the important features.
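
To see how a traditional method can lose features, the sketch below downsamples a toy 8x8 object by a factor of 4 with average pooling. Average pooling is an assumed stand-in for "traditional methods", and the checkerboard texture is a made-up example.

```python
def average_pool(image, factor=4):
    """Downsample a 2D image by averaging non-overlapping factor x factor blocks."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [[sum(image[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(w)]
            for r in range(h)]

# Hypothetical 8x8 object with a fine checkerboard texture.
hr = [[(r + c) % 2 for c in range(8)] for r in range(8)]
lr = average_pool(hr, factor=4)
print(lr)  # the texture is averaged away: every value becomes 0.5
```

The fine texture disappears completely in the 2x2 result, which is the kind of loss a learned downsampler tries to avoid.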

In DS-GAN there are two sets of objects:

HR and LR real objects, which are used to train the generator to create synthetic LR (SLR) objects.

It has two networks, a generator and a discriminator.

Generator :

It takes a high-resolution (HR) object as input, along with some random noise.

The generator produces an SLR object that is 4 times smaller than the HR input.

The generator has an encoder-decoder structure.

The encoder extracts the important features from the HR object and compresses the
information.

The decoder then takes that compressed information and generates a smaller version while
keeping key features.

The middle part of the generator (the bottleneck) represents the most compressed form of the
image features. It captures the high-level, abstract information that is necessary to regenerate the
image with proper detail.

Discriminator :

The discriminator receives both real LR objects from the LR dataset and generated objects from the
generator.

The discriminator reduces the image size gradually while increasing the depth (number of channels).
This structure helps it detect high-level features that differentiate real images from generated ones.

These LR objects are then blended into an image by a process similar to the one used in RDAGAN.

Moving on to applications.

Autonomous vehicles: GANs can create data with small or occluded objects, which makes the detector
more robust for safer navigation.

CCTV: detecting objects in LR footage.

Drones: the same applies, since objects in aerial images are small.

Healthcare: creating synthetic medical scans, which are hard to obtain.

In a nutshell, a GAN is a powerful data augmentation tool for object detection models, making them
more robust to the variations that are common in real-world scenarios.

Having said that, GANs have some disadvantages, which also define the future scope.

The standard GAN has a mode collapse problem, where the generator learns to produce a
limited variety of outputs, effectively "collapsing" to a few modes.

This can be mitigated by changing the loss function, and research is ongoing into Wasserstein loss
functions, which can potentially overcome it.
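
For reference, a minimal sketch of the Wasserstein losses, assuming the usual WGAN formulation in which the discriminator is replaced by a critic that outputs unbounded scores. The score values are hypothetical, and the Lipschitz constraint that WGAN also needs (weight clipping or a gradient penalty) is left out.

```python
def mean(values):
    return sum(values) / len(values)

def critic_loss(scores_real, scores_fake):
    """The critic minimizes this: push real scores up and fake scores down."""
    return mean(scores_fake) - mean(scores_real)

def wasserstein_generator_loss(scores_fake):
    """The generator minimizes this: push the critic's scores for fakes up."""
    return -mean(scores_fake)

real = [1.5, 2.0, 1.0]     # hypothetical critic scores on real samples
fake = [-0.5, 0.0, -1.0]   # hypothetical critic scores on generated samples
print(critic_loss(real, fake), wasserstein_generator_loss(fake))
```

Unlike the log loss, these scores are not squashed into probabilities, so the generator keeps receiving a useful signal even when the critic separates real from fake easily.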

Another way to achieve this is to use multiple GANs, ensuring that the weaknesses of one model do
not severely affect the overall output.

Outputs may lack fine details and appear less realistic because the model cannot prioritize
significant features. More realistic images can be produced with the help of attention mechanisms,
which help the model focus on relevant data.
