
TEXT TO IMAGE GENERATOR

Submitted By

Gayatri Palai (2201298331) &
Bellana Rohit (2201298311)

Under the Guidance of

Asst. Prof. Smruti Smaraki Sarangi

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GIFT AUTONOMOUS BHUBANESWAR

2024-25
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GIFT AUTONOMOUS BHUBANESWAR

CERTIFICATE

This is to certify that the project entitled “Text To Image Generator” has
been carried out by Gayatri Palai (Regd. No. 2201298331) and Bellana
Rohit (Regd. No. 2201298311) under my guidance and supervision and is
accepted in partial fulfilment of the requirements for the degree of Bachelor
of Technology in Computer Science and Engineering. The report, which is
based on the candidates’ own work, has not been submitted elsewhere for a
degree. To the best of my knowledge, Mrs. Gayatri Palai and Mr. Bellana
Rohit have good moral character and decent behavior.

HOD: Dr. Sujit Kumar Panda
Project Guide: Asst. Prof. Smruti Smaraki Sarangi
Project Coordinator: Asst. Prof. Suchisnita Nayak

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GIFT AUTONOMOUS BHUBANESWAR

DECLARATION

We, Gayatri Palai and Bellana Rohit, hereby declare that this written
submission represents our ideas in our own words, and where others’ ideas or
words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of
academic honesty and integrity and have not misrepresented, fabricated, or
falsified any idea/data/fact/source in our submission. We understand that any
violation of the above will be cause for disciplinary action by the Institute and
can also evoke penal action from the sources which have not been properly
cited or from whom proper permission has not been taken when needed.

Gayatri Palai (Regd. No. 2201298331)

Bellana Rohit (Regd. No. 2201298311)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GIFT AUTONOMOUS BHUBANESWAR

BONAFIDE CERTIFICATE

This is to certify that the project work titled “Text To Image Generator”
is a bonafide record of the work done by Mrs. Gayatri Palai (2201298331) &
Mr. Bellana Rohit (2201298311) in partial fulfillment of the requirements for
the award of the degree B.Tech in CSE from GIFT Autonomous, Bhubaneswar,
Odisha.

PROJECT GUIDE: Asst. Prof. Smruti Smaraki Sarangi
HOD, CSE: Dr. Sujit Kumar Panda

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GIFT AUTONOMOUS BHUBANESWAR

ACKNOWLEDGEMENTS
We are very grateful and wish to record our indebtedness to Dr. Trilochan
Sahu, Principal, Gandhi Institute For Technology (GIFT), Bhubaneswar, for
his active guidance and interest in this project work.

We would also like to thank Dr. Sujit Ku. Panda, Head, Department of
Computer Science and Engineering, for his continued drive for better quality
in everything, which allowed us to carry out our project work.

We would also like to take the opportunity to thank Asst. Prof. Smruti
Smaraki Sarangi for their help and cooperation in this project work.

Lastly, words fail to express our gratitude to our parents and all the
professors, lecturers, technical and official staff, and friends for their
cooperation, constructive criticism, and valuable suggestions during the
preparation of the project report.

Gayatri Palai (Regd. No. 2201298331)

Bellana Rohit (Regd. No. 2201298311)

ABSTRACT
Synthetic content generation using machines is a trending topic in the field
of deep learning, and it is an extremely difficult task even for state-of-the-art
ML algorithms. The upside of using deep learning for this is that it can
generate content that does not yet exist. In the recent past, Generative
Adversarial Networks (GANs) have shown great promise when it comes to
generating images, but they are difficult to train and to condition on any
particular input, which acts as a downside. However, they have tremendous
applications in generating content in an unsupervised learning approach, such
as generating video, increasing the resolution of images, or generating images
from text. In this project we look at generating 64×64 images on the fly using
text as input. The images generated will be unique in the sense that they do
not already exist, and in doing so we will improve upon existing architecture
models and try to reduce the difficulties that come with training GAN models,
aiming for reduced training time and better convergence of the model.

The final project is a web app where you can input a text and a synthetic
image will be generated based on the description in the text.

INDEX

S.NO CONTENTS

   ABSTRACT

1. INTRODUCTION

2. SYSTEM DEVELOPMENT

3. PERFORMANCE ANALYSIS

4. ADVANTAGES AND DISADVANTAGES

5. FINAL WEB-APP PRODUCTION

6. CONCLUSION

7. REFERENCES
1. INTRODUCTION

1.1 INTRODUCTION
For a human mind, it is very easy to think of new content. What if someone
asks you to “draw a flower with blue petals”? It is very easy for us to do that,
but machines process information very differently. Just understanding the
structure of the above sentence is a difficult task for them, let alone
generating something based on that description. Automatic synthetic content
generation is a field that was explored in the past and was discredited
because, at that time, neither the algorithms nor the processing power existed
to solve the problem. However, the advent of deep learning started changing
those earlier beliefs. The tremendous power of neural networks to capture
features even in humongous datasets makes them a very viable candidate for
automatic content generation. Another milestone was achieved when Ian
Goodfellow proposed generative adversarial networks in 2014. GANs are a
kind of deep learning architecture that can produce content from random
noise. What is even more unique about GANs is that the content they create
represents the dataset on which they are trained but is totally unique in some
way or the other. Generating an image from a text-based description is the
aspect of generative adversarial networks that we focus on. Since GANs
follow an unsupervised learning approach, we have modified them to take an
input as a condition and generate based on that condition. This can form the
basis for a large number of applications, such as the synthetic audio
generation used in voice assistants like Siri, or video content generation from
just scripts. Imagine entire movies made out of just the script. These are some
uses that many companies are researching. Modifying GANs and applying
conditions to them is not limited to generating images; we can also use them
to create passwords that are very hard to crack, and numerous similar
applications.
Deep Learning and Content Generation
Deep learning is a field that relies completely on various flavours of neural
networks to extract insights from data and find patterns within it. While it has
been shown to be very successful in tasks like image classification (on some
datasets even beating human-level accuracy by a large margin) and
time-series analysis (where so many factors are involved that it becomes
difficult for a human to take them all into account), a completely different
aspect of it has started to be explored.

The big question being:
"Can we use deep learning to generate content?"
As we know, neural networks can extract the features of a dataset that they
have been trained on; the goal becomes using those features to create new
data points that do not belong to the dataset itself.

Generative Adversarial Networks


Generative Adversarial Networks (GANs) were created by Ian Goodfellow in
2014 in an attempt to generate content instead of just representing it in a
compact form, and they are the most successful kind of deep learning model
for this task. What does a GAN do? Basically, it can be trained to generate
data from scratch, i.e. from random noise.
It consists of two building blocks:
1) Generator:
The task of the generator is to take in some input and generate something out
of it. In the case of images, it might take in some noise as input and generate
an image which initially might not mean anything. It is simply the reverse of
a standard Convolutional Neural Network (CNN). A CNN takes an image as
input and downsamples it along the height and width dimensions while
increasing it along the channel dimension, which essentially acts as our
features. What a generator does is take a downsampled input and, through
various upsampling operations, generate an image. By comparing the real
images with the images produced by the generator, the GAN trains a
discriminator that learns the differences that make an image real, and then
provides feedback to the generator about the next image to be generated.

2) Discriminator:
The generator alone would just generate something random, so the
discriminator gives guidance to the generator on what images it should
create. The discriminator is nothing more than a simple convolutional neural
network that takes an image as input and determines whether the image came
from the original dataset or was generated by the generator. In short, given an
image, it determines whether it is real or fake (synthetically generated by the
generator). A minimal sketch of such a discriminator follows.
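The sketch below is a minimal, hypothetical PyTorch version of such a discriminator for 64×64 RGB images; the layer sizes and activations are assumptions, not the exact network used in the project.

import torch
import torch.nn as nn

# Hypothetical discriminator: a small CNN mapping a 64x64 RGB image
# to a single real-vs-fake logit.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), # 16x16 -> 8x8
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(256 * 8 * 8, 1),  # one logit: real (high) or fake (low)
)

logit = discriminator(torch.randn(1, 3, 64, 64))  # example forward pass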

1.2 PROBLEM STATEMENT
Generating images from text is a very difficult problem that can be
approached using Generative Adversarial Networks, and it would be
extremely useful for content creators, who could type a description and have
the corresponding content generated automatically, saving them a lot of
money and work. Imagine thinking about a description and having to draw
something that matches it in a meaningful way; it is a difficult task even for
humans. But deep learning can understand the underlying structure of the
content and might be able to generate it automatically, thereby eliminating
the need for domain expertise. GANs, despite all their upsides for content
generation, are very difficult to train, take a lot of time to converge, and are
unstable during the training process. In this project we also try to tackle these
problems by modifying the underlying structure of the GAN model.

1.3 OBJECTIVE
The main objective of this project is to develop a web app into which a text
can be input and which outputs an image matching the description in the text,
and in doing so to improve upon the generator architecture of Generative
Adversarial Networks. By modifying the input to the generator and applying
conditions on it, we can create a model that generates images not from noise
but from a controlled input. In our case the controlled input is the text,
embedded after being passed through another neural network.

1.4 METHODOLOGY

We first start by downloading the Oxford 102 dataset, which contains 102
different categories of flowers along with an annotation for each image in the
form of a text description.

After this we download one more dataset, the CUB dataset, which contains
200 bird species and almost 11,700 images.

Next, we begin by importing all the packages and sub-packages and splitting
the data into training, validation, and testing sets. The following packages and
libraries are used to process the dataset and build the architectures:

• NumPy

• PyTorch

• OpenCV

• Flask

We first download and pre-process the dataset. During the pre-processing
phase we convert the text into embeddings and normalize the images so they
are ready to be passed on to the respective models. We then build our
customised generator model and use a standard pre-trained model as the
discriminator. After the model creation, we write a training script and adopt
some deep learning best practices to train the model stably using our
customised PyTorch trainer. The final task is to wrap the trained model into a
Flask web app so that testing becomes easy. A sketch of the data split follows.
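As a minimal sketch of the split mentioned above, the following hypothetical helper shuffles (image, caption) pairs and divides them into training, validation, and testing sets; the 80/10/10 ratio and the helper name are assumptions, not taken from the report.

import random

def split_dataset(pairs, train_frac=0.8, val_frac=0.1, seed=42):
    # pairs: list of (image_path, caption) tuples
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    n_train = int(len(pairs) * train_frac)
    n_val = int(len(pairs) * val_frac)
    train = pairs[:n_train]
    val = pairs[n_train:n_train + n_val]
    test = pairs[n_train + n_val:]
    return train, val, test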

1.5 FUTURE SCOPE
As AI and machine learning technologies continue to develop, the capabilities
of AI image generators will undoubtedly improve and expand. Here are some
potential future developments and innovations that can take AI image
generation to new heights:
Improved algorithms:

As researchers and developers identify new techniques and approaches to
improve AI image generation, the algorithms employed by these tools will
become more advanced and efficient. It is anticipated that future versions of
AI image generators will generate more realistic and high-quality images,
with fewer artifacts and more precise fine details.
More diverse and coherent results:

Existing AI image generators still struggle with generating diverse and
coherent results consistently. In other words, they sometimes lack the ability
to represent a broader range of styles and may generate images with
inconsistencies or inaccuracies. In the future, AI image generators will likely
produce more diverse and consistent images while reducing these common
issues, leading to better alignment with users' expectations and requirements.
Better integration with existing tools:

Future AI image generators are likely to seamlessly integrate with various
existing design and development tools, enabling creatives to work more
efficiently and add AI-powered image generation functionalities to their
workflows. This will remove any significant effort required to implement AI
image generation in applications, for example by utilizing the capabilities of
tools like AppMaster, a platform for backend, web, and mobile applications.

2. SYSTEM DEVELOPMENT
2.1 TEXT DESCRIPTION TO EMBEDDING

The very first step involved in training our model is to convert the text to an
embedding. Neural networks work on vectors and numbers and cannot
essentially do anything if the input format is a text. So, the very first thing we
do is utilise a Long Short-Term Memory (LSTM) network which will take in
the input as a pre-processed text after removing unnecessary space and
improving semantics using standard text pre-processing libraries like spacy
and converting the text description into a vector of numbers which is then
given as an input to a pre-trained LSTM and the last layer is taken out which is
essentially the word embedding that we are looking for.
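The following is a minimal PyTorch sketch of this step; the vocabulary size, embedding and hidden dimensions, and tokenisation are assumptions, as the report does not specify them.

import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    # vocab_size, embed_dim and hidden_dim are assumptions
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded description
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)   # final hidden state of the LSTM
        return h_n[-1]               # (batch, hidden_dim) text embedding

encoder = TextEncoder()
tokens = torch.randint(0, 5000, (1, 12))  # a fake 12-token description
text_embedding = encoder(tokens)          # conditions the generator later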

Why Word Embedding

Why exactly do we need to convert our sentence into an embedding and not
just a one-hot encoded vector? To understand that, let us take a very simple
example in which we first represent the words as one-hot encoded vectors
and then use an embedding matrix.

The issues with representing words as one-hot vectors are:

1. Each word is a very high dimensional vector.

2. Those vectors do not have any kind of relation among them that a model
can learn, and it becomes very difficult for the model to learn when it cannot
even understand the relations between words.

Now let us represent them in an embedding.

When represented like this, the embedding for each word has a meaning.
When representing these in Euclidean space, we will see that the two fruits
are closer to each other, while the king and the queen are very similar to each
other in many respects except one, which could be gender. It is not
pre-decided which features the model should learn; during training the model
itself decides the values that reduce the loss, and in the process it learns the
embedding that makes the most sense to it. The toy example below
illustrates the point.
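The following toy NumPy example illustrates the contrast; the 3-dimensional vectors are made up purely for illustration and are not learned embeddings.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot: every pair of distinct words has similarity exactly 0,
# so there is no relation for a model to exploit.
king_oh, queen_oh = np.eye(4)[0], np.eye(4)[1]
print(cosine(king_oh, queen_oh))   # 0.0

# Hand-picked dense embeddings: "royalty" dimensions agree,
# one dimension (say, gender) differs.
king = np.array([0.9, 0.8, -0.7])
queen = np.array([0.9, 0.8, 0.7])
apple = np.array([-0.8, 0.1, 0.0])
print(cosine(king, queen))  # high: similar in most respects
print(cosine(king, apple))  # low: unrelated concepts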

A Long Short-Term Memory network, or LSTM, is a type of Recurrent
Neural Network that is very good at processing long sentences because of its
ability to learn long-term dependencies within the text by modifying the
weights of its gate cells. RNNs typically suffer from the problem that they
cannot remember the proper dependencies when processing long text. To
illustrate that problem, consider a very simple exercise: you are given a series
and you have to predict the next number.

Example 1) 2 -> 4 -> 6 -> 8

Example 2) 2 -> 4 -> 8

In both series several numbers are common, and we know the first series
consists of multiples of 2 while the second consists of powers of 2. But when
we pass the numbers to a model, the last input it gets in both cases is 8, so
how should the model distinguish between the two series? It essentially needs
the previous pattern information combined with the current input to output
the correct result. When the sequence gets long, an RNN fails to factor in the
previous information properly, as it has no proper mechanism to deal with
degrading gradients, and in the end it is unable to do any kind of learning.
This is the problem that LSTMs were built to solve. An LSTM has additional
gates that help it properly retain information throughout the input. However,
not all information is important all the time. As we go deeper into the
sequence, the chance that the next output depends on a very old input
becomes very small, and that is where the forget gate of the LSTM comes
into action. At every step of a sequence, the LSTM's gate weights are updated
using backpropagation. In a very simple way, this helps it determine what
kinds of inputs are important at the current step to predict the next word or
element in a sequence. A minimal sketch follows.
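The sketch below shows an LSTM used for next-element prediction, in the spirit of the series example above; the hidden size and the model name are assumptions.

import torch
import torch.nn as nn

class NextNumber(nn.Module):
    def __init__(self, hidden_dim=32):  # hidden size is an assumption
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_dim,
                            batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, seq):
        # seq: (batch, length, 1); at each step the gates decide how much
        # of the earlier pattern to carry forward or forget.
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])  # predict from the final state

model = NextNumber()
s1 = torch.tensor([[[2.0], [4.0], [6.0], [8.0]]])  # multiples of 2
s2 = torch.tensor([[[2.0], [4.0], [8.0]]])         # powers of 2
# Untrained outputs, but the hidden states differ with the history,
# which is what lets a trained LSTM tell the two series apart.
print(model(s1), model(s2))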

2.2 PRE-PROCESSING THE IMAGES

Mean and Standard Deviation for Proper Normalisation of Data:

We need to properly process the data before passing it to the model, as this
will determine the level of accuracy that we can reach. Instead of using a
mean of 0 and a standard deviation of 1, we can easily compute the mean and
standard deviation for each channel. For the current dataset the mean comes
out to be [0.484, 0.451, 0.406] and the standard deviation comes out to be
[0.231, 0.244, 0.412]. A sketch of this computation follows.
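The following sketch shows how such per-channel statistics could be computed and applied, assuming the images are a float tensor of shape (N, 3, H, W) scaled to [0, 1]; the function names are assumptions.

import torch

def channel_stats(images):
    # images: (N, 3, H, W) float tensor in [0, 1]
    mean = images.mean(dim=(0, 2, 3))  # one value per channel
    std = images.std(dim=(0, 2, 3))
    return mean, std

def normalise(img, mean, std):
    # img: (3, H, W); broadcast the per-channel stats over H and W
    return (img - mean[:, None, None]) / std[:, None, None]

# The values reported above for this dataset:
mean = torch.tensor([0.484, 0.451, 0.406])
std = torch.tensor([0.231, 0.244, 0.412])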
Data Augmentation

Data augmentation helps us create more data to feed into the model and helps
it generalise well by letting it see the data in various orientations. We create
our own transformations using NumPy. Here are some of the augmentations
that we implement:

• Random flip (horizontal and vertical)

• Random 90-degree rotations

• Lighting adjustments on the visual channels

Combining the random flip and random rotation, we arrive at the 8 dihedral
transformations, which can be applied to any number of channels and to any
kind of dataset. As shown in the sketch after the list below, we first create a
function which takes as input a tensor x (the matrix representation of our
image) and a mode. We do not want to apply these image augmentations in
validation mode; in training mode, we apply the transforms randomly. We
use Python's default random number generator to determine which
transformations are randomly applied to the image.

To flip the image horizontally, we first convert our tensor into a NumPy array
and then use NumPy's fliplr function to flip the array horizontally, and flipud
to flip the array vertically. To rotate the image, we generate a random number
k between 0 and 3, which determines how many 90-degree rotations of the
array we perform. The following dihedral transformations can be formed
after this step:

• Horizontal flip + any of three 90-degree rotations

• Horizontal flip with no rotation

• Vertical flip + any of three 90-degree rotations

• Vertical flip with no rotation
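Since the original snippet is not reproduced in the report, the following is a hedged reconstruction of the function described above; it assumes the image tensor is channels-last (H, W, C), so that NumPy's default axes give horizontal and vertical flips.

import random
import numpy as np
import torch

def dihedral_transform(x, mode="train"):
    # x: image tensor assumed channels-last (H, W, C)
    if mode != "train":          # no augmentation at validation/test time
        return x
    arr = x.numpy()
    if random.random() < 0.5:
        arr = np.fliplr(arr)     # horizontal flip
    else:
        arr = np.flipud(arr)     # vertical flip
    k = random.randint(0, 3)     # number of 90-degree rotations
    arr = np.rot90(arr, k)
    # copy() makes the strides positive again so torch can wrap the array
    return torch.from_numpy(arr.copy())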

2.3 CREATING CUSTOMISED GENERATOR MODEL

The way a standard generator model works is that it takes in some input and,
by a series of upsampling or deconvolution operations, creates the image.
The only issue is that, while generating the final output, it only takes into
account the information from the previous layer, which is ideal for tasks like
classification and bounding-box regression. But when dealing with image
generation, we should also take into account the original input, without much
processing, along with the information in the last layer, as this not only helps
the gradients flow better but also helps the model converge faster.

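The snippet discussed in the next paragraph is reconstructed below as a hedged sketch; the channel sizes, kernel parameters, and dropout rate are assumptions, since the report does not reproduce the original code.

import torch
import torch.nn as nn

class GeneratorBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Doubles height/width while reducing the channel dimension.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4,
                                     stride=2, padding=1)
        self.drop = nn.Dropout2d(0.3)  # regularisation, as described
        # Two conv blocks: double the channels, then reduce them back.
        self.conv1 = nn.Conv2d(out_ch, out_ch * 2, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch * 2, out_ch, 3, padding=1)
        self.act = nn.ReLU(inplace=True)
        # Bring the raw input to the output's shape for the residual add.
        self.skip = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, out_ch, 1),
        )

    def forward(self, x):
        h = self.act(self.up(x))
        h = self.drop(h)
        h = self.act(self.conv1(h))
        h = self.conv2(h)
        return self.act(h + self.skip(x))  # residual: h(x) = F(x) + x

block = GeneratorBlock(256, 128)
y = block(torch.randn(1, 256, 8, 8))  # -> (1, 128, 16, 16)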
In the code snippet above we create our customised generator model from
scratch using PyTorch. We start by declaring the class and then initializing
the architecture within it. To properly use PyTorch's built-in neural network
layers, we need to call super() to inherit the properties of the base class. We
start by declaring a ConvTranspose2d, which takes in the input embedding,
doubles it along the height and width, and reduces it along the channel
dimension. We add a dropout layer to increase regularization, which not only
deals with overfitting on the training set but also helps the model generalise
well on the input features. This is followed by two convolutional blocks, one
doubling along the channel dimension and the other taking that output and
reducing it back to the original channel dimension, without any change in the
other dimensions. This was done because, in our practical implementation,
this trick worked out well. Now comes the major step of producing the final
image. As we stated earlier, we also need to add in the original embedding
directly. The issue is that the embedding has different dimensions altogether;
to resolve that, we use a simple upsampling operation to bring the embedding
to the proper dimensions before adding it to the output of the last layer. In
terms of equations, let the input be x and the desired output be h(x), where
F(x) = conv blocks + non-linearities. Instead of hoping the network fits the
desired mapping directly, we specify a residual mapping h(x) = F(x) + x and
let the model optimise F(x) so as to bring the output closer to the desired
h(x).

2.4 TRAINING THE MODEL

The training process of a Generative Adversarial Network is a bit more
complicated than training a normal neural network, as it involves training the
discriminator and the generator in an alternating fashion.

Step 1: Train the discriminator on the original dataset and some random noise
to give the discriminator an edge in identifying real images versus random
noise. This step is very important in the beginning: if the discriminator does
not already know, to some extent, what the real dataset should look like, the
loss function will give the generator an essentially lower loss than it should,
which slows down the initial training. Training eventually stabilises even if
we do not train the discriminator first, but that takes a lot of time; by doing
this we decrease the training time of the model.

Here is the algorithm:
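The original listing is not reproduced in the report; the following is a hedged PyTorch sketch of the schedule described in Steps 1 and 2, with the optimiser settings, loss function, and number of warm-up steps as assumptions.

import torch
import torch.nn.functional as F
from itertools import cycle, islice

def train_gan(G, D, loader, embed_fn, opt_g, opt_d,
              warmup_steps=200, epochs=100, device="cpu"):
    # Step 1: warm up the discriminator on real images vs random noise
    # so it already has an edge before the generator is involved.
    for real, _ in islice(cycle(loader), warmup_steps):
        real = real.to(device)
        noise = torch.rand_like(real)
        logits = torch.cat([D(real), D(noise)])
        labels = torch.cat([torch.ones(len(real), 1),
                            torch.zeros(len(real), 1)]).to(device)
        loss_d = F.binary_cross_entropy_with_logits(logits, labels)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Step 2: alternate discriminator and generator updates.
    for _ in range(epochs):
        for real, captions in loader:
            real = real.to(device)
            cond = embed_fn(captions)       # text embedding as condition
            fake = G(cond)

            # Discriminator step: real vs generated (detached)
            logits = torch.cat([D(real), D(fake.detach())])
            labels = torch.cat([torch.ones(len(real), 1),
                                torch.zeros(len(fake), 1)]).to(device)
            loss_d = F.binary_cross_entropy_with_logits(logits, labels)
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()

            # Generator step: try to make D call the fakes real.
            loss_g = F.binary_cross_entropy_with_logits(
                D(fake), torch.ones(len(fake), 1, device=device))
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()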

Step 2: After the discriminator has been trained for a while, we make a
forward pass through the modified generator model; initially this yields a
random image and a high loss, which is then backpropagated through the
entire network in order to update and fine-tune its internal parameters. The
generated images are stored in a temporary variable and passed on to the
discriminator in its next phase. There is a chance that our GAN will not find
the equilibrium between the discriminator and the generator.
This graph shows the accuracy of the discriminator:

Here the accuracy of the discriminator is 100%, which means it can perfectly
identify whether an image is real or fake. This is the most common failure
mode, and it is called convergence failure.

3 PERFORMANCE ANALYSIS

EVALUATION:
It is not easy to evaluate the performance of a generative model with any
single metric; in the end, humans mostly have to decide whether the
generated content is good and whether it holds any particular meaning.
However, we can judge the discriminator model on its ability to distinguish
real images from fake ones. Compared to ordinary convolutional models,
where high accuracy means better results, the opposite holds for generative
models: if the discriminator has very high accuracy in distinguishing real
images from fake ones, that implies the generator has not done a very good
job of creating images that represent the dataset well. At perfect equilibrium
the discriminator should have an accuracy of 50%, i.e. it has to take a random
guess to determine whether a generated image is fake, implying the generator
has created images so good that they are indistinguishable from the original
images. The closer the accuracy is to 50%, the better the job the generator
has done in creating images. A sketch of this check follows.
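As a small illustration, the following hypothetical helper estimates the discriminator's accuracy on a batch of real and generated images; a value near 0.5 indicates the equilibrium described above.

import torch

@torch.no_grad()
def discriminator_accuracy(D, real, fake):
    # Label real images 1 and generated images 0, then measure how often
    # the discriminator's thresholded output agrees.
    logits = torch.cat([D(real), D(fake)])
    labels = torch.cat([torch.ones(len(real)), torch.zeros(len(fake))])
    preds = (torch.sigmoid(logits).squeeze(1) > 0.5).float()
    return (preds == labels).float().mean().item()  # ~0.5 at equilibrium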

The graph above shows the discriminator loss for the real images in blue, the
discriminator loss for the fake images in orange, and the generator loss for
the generated images in green. This is an expected loss curve during training,
and it stabilizes after around 100 to 300 epochs. The discriminator loss for
the real and fake images is approximately 50%, and the generator loss for the
generated images is between 50% and 70%.

Accuracy graph:

This graph shows the accuracy of the discriminator for the real images in
blue and for the fake images in orange. The GAN model stabilizes within 100
to 300 epochs, after which it gives an accuracy of approximately 70% to 80%
and remains stable.

4 ADVANTAGES AND DISADVANTAGES:

4.1 ADVANTAGES:
1. Enhanced Efficiency and Productivity:

AI-generated photos offer a significant advantage in terms of enhanced
efficiency and productivity in the creative process. By leveraging AI
algorithms, the photo generation process can be automated, resulting in time
and effort savings for photographers and content creators. Traditionally,
photographers spend hours manually editing and retouching photos to
achieve the desired quality and aesthetics. However, with AI-generated
photos, the need for extensive manual editing is reduced. AI algorithms can
quickly analyse and process images, applying enhancements and adjustments
to generate high-quality images with minimal human intervention.

The automation of the photo generation process enables photographers and
content creators to focus more on the artistic aspects of their work. With AI
handling the technical aspects, such as colour correction, lighting
adjustments, and image enhancement, photographers can dedicate their time
and energy to capturing unique perspectives, exploring creative concepts, and
telling compelling visual stories. This increased efficiency allows for a
streamlined workflow, enabling photographers to produce a larger volume of
high-quality images in a shorter amount of time.
2. Improved Image Quality and Aesthetics:

AI algorithms excel in analysing and enhancing photos, leading to a
remarkable improvement in image quality and aesthetics. With the ability to
adjust lighting, correct colour grading, and optimize composition,
AI-generated photos capture attention with their visually stunning and
captivating characteristics.

3. Accessibility and Affordability:

AI-generated photos offer a significant advantage in terms of accessibility,
making professional-quality imagery more attainable for individuals and
businesses with limited resources. Traditional photography equipment and
professional editing tools can be prohibitively expensive, requiring a
substantial financial investment. In contrast, AI-powered photo generation
provides a cost-effective alternative that opens doors for aspiring
photographers on a tighter budget. By leveraging AI technology, individuals
without extensive photography experience or expensive equipment can now
produce high-quality photos that rival the work of seasoned professionals.

Furthermore, AI-generated photos offer a simplified and streamlined
approach to image creation, removing the need for extensive technical
expertise. Traditional photography often requires a deep understanding of
complex camera settings, lighting techniques, and post-processing tools. AI,
on the other hand, simplifies the process by automating many of these tasks.

4.2 DISADVANTAGES
1. Lack of Authenticity and Originality:

One of the main concerns with AI-generated photos is the potential lack of
authenticity and originality. While AI algorithms excel at analysing patterns
and generating visually pleasing images, they may lack the human touch and
personal expression found in traditional photography. AI-generated photos can
sometimes appear generic or formulaic, lacking the unique perspective and
creativity that photographers bring to their work. This raises questions about
the uniqueness and artistic value of AI-generated visuals.
2. Ethical Considerations and Bias:

The use of AI algorithms in generating photos also raises ethical
considerations. Privacy and consent issues may arise when AI is used to
manipulate or generate images of individuals without their knowledge or
consent. Moreover, AI algorithms can inherit biases present in the data they
are trained on, potentially perpetuating societal biases or stereotypes in the
generated photos. Responsible and transparent AI practices are essential to
address these ethical concerns and ensure the fair and unbiased use of
AI-generated photos.
3. Dependency on Data Quality and Training:

The performance of AI algorithms relies heavily on the quality and diversity
of the training data they are provided. If the training data is biased or lacks
diversity, the generated photos may inherit these limitations. AI algorithms
need access to a wide range of high-quality training data to produce accurate
and representative results. Ensuring that the training datasets are
comprehensive and inclusive is crucial for overcoming biases and achieving
the desired level of diversity in the generated photos.

5 FINAL WEB-APP PRODUCTION
We develop a web app using Flask that presents the user with a web page and
an option to input text and choose a model from the various inference models
that we have trained. On clicking "Generate Image", the request is processed
in the backend in Python, a resultant image is created using the model, and
we hit another Flask endpoint where the generated image is displayed. The
following endpoints have been created in our Flask application (a sketch
follows the list):
1) Home Route:

At the home endpoint, we redirect to the page where the user can provide an
input, provided the model has been loaded successfully without any error. If
the chosen model cannot be loaded properly, we redirect to a route describing
the error.

2) Generate Route:

After the user enters a text, it is pre-processed into a vector and passed on to
our LSTM model, which generates the text embedding. The embedded vector
is then passed to the loaded generator model, and the resulting image is saved
to a location using the timestamp as the file name.

3) Result Route:

After the image has been successfully generated, we redirect the application
to a page which displays the generated image.

4) Error Route:

The default route in case any error occurs.
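The sketch below outlines these four routes in Flask; the template names and the model_loaded, text_to_embedding, generator, and save_image helpers are hypothetical stand-ins for the project's actual code.

import time
from flask import Flask, request, redirect, render_template, url_for

app = Flask(__name__)

@app.route("/")
def home():
    if not model_loaded():                 # hypothetical loading check
        return redirect(url_for("error"))
    return render_template("input.html")   # text box + model selector

@app.route("/generate", methods=["POST"])
def generate():
    text = request.form["description"]
    embedding = text_to_embedding(text)    # LSTM encoder (section 2.1)
    image = generator(embedding)           # loaded generator model
    filename = "static/%d.png" % int(time.time())  # timestamp as file name
    save_image(image, filename)            # hypothetical save helper
    return redirect(url_for("result", img=filename))

@app.route("/result")
def result():
    return render_template("result.html", img=request.args.get("img"))

@app.route("/error")
def error():
    return render_template("error.html")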

6 CONCLUSION
In this project we have created a web app that can take in a text description of
a flower or a bird and generate images based on it. While doing so, we have
modified the generator architecture in such a way that we have reduced the
training time of the GAN.

In this report, a brief review of image generation is also presented. The
existing image generation approaches have been categorized based on the
data used as input for generating new images, including images, hand
sketches, layouts, and text. In addition, we presented the existing work on
conditioned image generation, a type of image generation in which a
reference is exploited to generate the final image. An effective image
generation method depends on the dataset used, which must be a large-scale
one. For that reason, we summarized popular benchmark datasets used for
image generation techniques. The evaluation metrics for comparing various
methods were presented, and based on these metrics, as well as the datasets
used for training, a tabulated comparison was performed. Finally, a summary
of the current image generation challenges was presented.

7 REFERENCES
[1] M. Arjovsky and L. Bottou. Towards principled methods for training generative adversarial networks. In ICLR, 2017.

[2] A. Brock, T. Lim, J. M. Ritchie, and N. Weston. Neural photo editing with introspective adversarial networks. In ICLR, 2017.

[3] T. Che, Y. Li, A. P. Jacob, Y. Bengio, and W. Li. Mode regularized generative adversarial networks. In ICLR, 2017.

[4] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, 2016.

[5] E. L. Denton, S. Chintala, A. Szlam, and R. Fergus. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, 2015.

[6] C. Doersch. Tutorial on variational autoencoders. arXiv:1606.05908, 2016.

[7] J. Gauthier. Conditional generative adversarial networks for convolutional face generation. Technical report, 2015.

[8] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.

[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.

[10] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie. Stacked generative adversarial networks. In CVPR, 2017.
