A
Project Report
On

TEXT TO IMAGE GENERATOR


USING ARTIFICIAL
INTELLIGENCE
Submitted in partial fulfilment of the requirements
For the degree of

Bachelor of Engineering in
Computer Engineering
By
Name Roll no.
Yash Ahire 02
Prashant Bhalerao 04
Chetan Parse 42
Jishan Shaikh 47
Supervisor
Prof. (Deepa Athawale)

Technology Personified
Department of Computer Engineering
Innovative Engineers' and Teachers' Education Society's
Bharat College of Engineering
Badlapur - 421504
(Affiliated to University of Mumbai)
(2023-24)



Technology Personified
Bharat College of Engineering
(Affiliated to the University of Mumbai)
Badlapur - 421504

CERTIFICATE
This is to certify that, the Project titled

“IMAGE GENERATOR USING AI”


is a bona fide work done by

Yash Ahire
Prashant Bhalerao
Chetan Parse
Jishan Shaikh

and is submitted in partial fulfilment of the requirements for the degree of
Bachelor of Engineering in Computer Engineering
To the University of Mumbai

Supervisor External
Prof. (Deepa Athawale) Prof.( )

Subject Incharge Head of Department Principal


(Prof. Archana Bhaware) (Prof. Deepa Athawale) (Prof. Siddhartha Ladhake)



ABSTRACT
Synthetic content generation using machines is a trending topic in the field of deep learning, and it is an extremely difficult task even for state-of-the-art ML algorithms. The upside of using deep learning for this is that it can generate content that does not yet exist. In the recent past, Generative Adversarial Networks (GANs) have shown great promise when it comes to generating images, but they are difficult to train and to condition on any particular input, which acts as a downside. However, they have tremendous applications in generating content with an unsupervised learning approach, such as generating video, increasing the resolution of images, or generating images from text. In this project we look at generating 64×64 images on the fly using text as the input. The images generated will be unique in the sense that they do not already exist, and in doing so we improve upon existing architecture models and try to reduce the difficulties that come with training GAN models, namely long training time and poor convergence.
The final product is a web app where you can input a text description and a synthetic image will be generated based on that description.



INDEX

S.NO CONTENTS

ABSTRACT

1. INTRODUCTION

2. SYSTEM DEVELOPMENT

3. PERFORMANCE ANALYSIS

4. ADVANTAGES AND DISADVANTAGES

5. FINAL WEB-APP PRODUCTION

6. CONCLUSION



1. INTRODUCTION

1.1 INTRODUCTION
For a human mind it is very easy to think of new content. What if someone asks you to "draw a flower with blue petals"? It is very easy for us to do that, but machines process information very differently. Just understanding the structure of the above sentence is a difficult task for them, let alone generating something based on that description. Automatic synthetic content generation is a field that was explored in the past and discredited, because at that time neither the algorithms nor enough processing power existed to solve the problem. However, the advent of deep learning started changing those earlier beliefs. The tremendous power of neural networks to capture features even in humongous datasets makes them a very viable candidate for automatic content generation. Another milestone was achieved when Ian Goodfellow proposed generative adversarial networks in 2014. GANs are a kind of deep learning architecture that can produce content from random noise. What is even more unique about GANs is that the content they create represents the dataset on which they are trained, yet is totally unique in some way or other. Generating an image from a text-based description is the aspect of generative adversarial networks we focus upon. Since GANs follow an unsupervised learning approach, we have modified them to take an input as a condition and generate based on that condition. This can form the base for a large number of applications, such as synthetic audio generation like the voices used in Siri or Assistant, or video content generation from just scripts: imagine entire movies made out of just the script. These are uses that many companies are researching. Modifying GANs and applying conditions on them is not limited to generating images; the same idea can be used to create passwords that are very hard to crack, and numerous similar applications.

Deep Learning and Content Generation


Deep learning is a field that relies completely on various flavours of neural networks to extract insights from data and find patterns within it. While it has been shown to be very successful in tasks like image classification (on some datasets even beating human-level accuracy by a large margin) and time-series analysis (where so many factors are involved that it becomes difficult even for a human to take them all into account), a completely different aspect of it has started to be explored.



The big question being:
"Can we use deep learning to generate content?"
Since neural networks can extract the features of a dataset they have been trained on, the goal becomes using those features to create new data points that do not belong to the dataset itself.

Generative Adversarial Networks


Generative Adversarial Networks (GANs) were created by Ian Goodfellow in 2014 in an attempt to generate content instead of just representing it in a compact form, and they are the most successful kind of deep learning model for this task. What does a GAN do? Basically, it can be trained to generate data from scratch, i.e. from random noise.
It consists of two building blocks:
1) Generator:
The task of the generator is to take in some input and generate something out of it. In the case of images, it might take in some noise as input and generate an image which initially means nothing. It is essentially the reverse of a standard Convolutional Neural Network (CNN): a CNN takes an image as input and downsamples it along the height and width dimensions while increasing the channel dimension, which acts as the features. A generator instead takes a downsampled input and, through a series of upsampling operations, generates an image. By comparing real images with the images produced by the generator, the GAN builds a discriminator that learns the differences that make an image look real, and the discriminator then provides feedback to the generator about the images it should generate next.





2) Discriminator:
The generator alone would just generate something random, so the discriminator gives the generator guidance on what images it should create. The discriminator is nothing more than a simple convolutional neural network that takes an image as input and determines whether the image came from the original dataset or was produced by the generator. In short, it takes in an image and determines whether it is real or fake (synthetically generated by the generator).



1.2 PROBLEM STATEMENT
Generating images from text is a very difficult problem that can be approached using Generative Adversarial Networks, and solving it would be extremely useful for content creators, who could type a description and have the matching content generated automatically, saving them a lot of money and work. Imagine thinking of a description and having to draw something that matches it in a meaningful way; it is a difficult task even for humans. But deep learning can understand the underlying structure of the content and might be able to generate it automatically, thereby eliminating the need for domain expertise. GANs, despite all their upsides for content generation, are very difficult to train, take a lot of time to converge, and are unstable during the training process. In this project we also try to tackle these problems by modifying the underlying structure of the GAN model.



1.3 OBJECTIVE
The main objective of this project is to develop a web app into which a text can be input and which outputs an image matching the description of the text, and in doing so to improve upon the generator architecture of the Generative Adversarial Network. By modifying the input to the generator and applying conditions on it, we can create a model that generates images not from noise but from a controlled input. In our case the controlled input is the text, embedded after being passed through another neural network.



1.4 METHODOLOGY

We first start by downloading the Oxford 102 dataset, which contains 102 different categories of flowers along with an annotation for each image in the form of a text description.



After this we download one more dataset, the CUB dataset, which contains 200 bird species with almost 11,700 images.



Next, we import all the packages and sub-packages and split the data into training, validation, and testing sets. The following packages and libraries are used to process the dataset and build the architectures:

•NumPy

•PyTorch

•OpenCV

•Flask

We first download and pre-process the dataset. During the pre-processing phase we convert the text into embeddings and normalise the images so they are ready to be passed to the respective models. We then build our customised generator model and use a standard pre-trained model as the discriminator. After model creation we write a training script, adopting some best practices from the field of deep learning to train the model stably using our customised PyTorch trainer. The final task is to wrap the trained model into a Flask web app so that testing becomes easy. A sketch of the loading-and-splitting step follows.
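The snippet below is a minimal, hypothetical sketch of this loading-and-splitting step, not the project's actual code: the folder layout, the FlowerCaptions class, and the 80/10/10 split ratios are all assumptions.

```python
import os
import cv2                                   # OpenCV, as listed above
import torch
from torch.utils.data import Dataset, random_split

class FlowerCaptions(Dataset):
    """Pairs each image with its text description (assumed folder layout)."""
    def __init__(self, image_dir, caption_dir):
        self.image_dir, self.caption_dir = image_dir, caption_dir
        self.names = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        img = cv2.imread(os.path.join(self.image_dir, name))      # BGR uint8
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)                # convert to RGB
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
        with open(os.path.join(self.caption_dir, name.rsplit('.', 1)[0] + '.txt')) as f:
            caption = f.read().strip()
        return img, caption

ds = FlowerCaptions('data/oxford102/images', 'data/oxford102/captions')
n_train, n_val = int(0.8 * len(ds)), int(0.1 * len(ds))
train_ds, val_ds, test_ds = random_split(ds, [n_train, n_val, len(ds) - n_train - n_val])
```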



1.4 FUTURE SCOPE
As AI and machine learning technologies continue to develop, the capabilities of AI image
generators will undoubtedly improve and expand. Here are some potential future
developments and innovations that can take AI image generation to new heights:

Improved algorithms:

As researchers and developers identify new techniques and approaches to improve AI image
generation, the algorithms employed by these tools will become more advanced and
efficient. It is anticipated that future versions of AI image generators will generate more
realistic and high-quality images, with fewer artifacts and more precise fine details.

More diverse and coherent results:

Existing AI image generators still struggle with generating diverse and coherent results
consistently. In other words, they sometimes lack the ability to represent a broader range of
styles and may generate images with inconsistencies or inaccuracies. In the future, AI image
generators will likely produce more diverse and consistent images while reducing these
common issues, leading to better alignment with users' expectations and requirements.

Better integration with existing tools:

Future AI image generators are likely to seamlessly integrate with various existing design
and development tools, enabling creatives to work more efficiently and add AI-powered
image generation functionalities to their workflows. This will remove much of the effort required to implement AI image generation in applications, for example by utilizing the capabilities of tools like the App Master platform for backend, web, and mobile applications.

Real-time image generation:

As computational power continues to improve, AI image generators will eventually be able to create high-quality images in real time. This low-latency image generation will open doors for developers of real-time applications like video games, augmented reality (AR) and virtual reality (VR) experiences, enabling them to enrich their applications with unique AI-generated graphics and assets.



2. SYSTEM DEVELOPMENT

2.1 TEXT DESCRIPTION TO EMBEDDING

The very first step in training our model is to convert the text into an embedding. Neural networks work on vectors and numbers and cannot essentially do anything if the input is raw text. So the very first thing we do is pre-process the text, removing unnecessary spaces and improving semantics using standard text pre-processing libraries like spaCy, and convert the description into a vector of numbers. This vector is given as input to a pre-trained Long Short-Term Memory (LSTM) network, and the output of the last layer is taken out, which is essentially the text embedding we are looking for.
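Below is a hedged sketch of this step, assuming a spaCy tokenizer, a vocabulary built from the captions, and embedding/hidden sizes of 256; the project uses a pre-trained LSTM, whereas here the weights are random placeholders.

```python
import spacy
import torch
import torch.nn as nn

nlp = spacy.load('en_core_web_sm')           # standard spaCy English model
vocab = {'<unk>': 0}                          # assumed: built from the captions

def encode(text):
    """Tokenize with spaCy and map tokens to vocabulary indices."""
    tokens = [t.text.lower() for t in nlp(text) if not t.is_space]
    return torch.tensor([[vocab.get(t, 0) for t in tokens]])   # shape (1, seq_len)

embed = nn.Embedding(num_embeddings=5000, embedding_dim=256)
lstm = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)
# The project uses a pre-trained LSTM; the weights here are random stand-ins.

ids = encode("a flower with blue petals")
out, (h_n, c_n) = lstm(embed(ids))
text_embedding = h_n[-1]                      # final hidden state -> (1, 256)
```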



Why Word Embedding

Why exactly do we need to convert our sentence into an embedding rather than just a one-hot encoded vector? To understand that, take a very simple example in which we once represent the words as one-hot encoded vectors and once using an embedding matrix.

The issues with the one-hot representation are:

1. Each word is a very high-dimensional vector.

2. Those vectors do not have any relation among them that a model can learn, and it becomes very difficult for the model to learn anything when it cannot even understand the relations between words. Now let us represent them with an embedding instead.



When represented like this, the embedding for each word has a meaning. Representing these in Euclidean space, we would see that the two fruits are close to each other, while the king and queen are very similar to each other in all respects except one, which could be gender. It is not pre-decided which features the model should learn; during training the model itself finds the values that reduce the loss, and in the process it learns the embedding that makes the most sense to it.
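The toy values below (invented for illustration, not learned from data) show the contrast: one-hot vectors treat every pair of distinct words as equally unrelated, while dense embeddings let related words sit close together.

```python
import torch
import torch.nn.functional as F

# One-hot: any two distinct words have cosine similarity 0.
apple_oh = F.one_hot(torch.tensor(0), num_classes=1000).float()
mango_oh = F.one_hot(torch.tensor(1), num_classes=1000).float()
print(F.cosine_similarity(apple_oh, mango_oh, dim=0))   # tensor(0.)

# Toy dense embeddings with axes [fruitness, royalty, gender].
apple = torch.tensor([0.9, 0.1, 0.0])
mango = torch.tensor([0.8, 0.0, 0.1])
king  = torch.tensor([0.0, 0.9, 0.9])
queen = torch.tensor([0.0, 0.9, 0.1])
print(F.cosine_similarity(apple, mango, dim=0))  # high: both fruits
print(F.cosine_similarity(king, queen, dim=0))   # high: differ mainly in "gender"
```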



A Long Short-Term Memory network, or LSTM, is a type of recurrent neural network that is very good at processing long sentences because of its ability to learn long-term dependencies within the text by modifying the weights of its gate cells. RNNs typically suffer from the problem that they cannot remember the proper dependencies when processing long text. To illustrate the problem, consider a very simple exercise: suppose you are given a series and have to predict the next number.

Example 1) 2 -> 4 -> 6 -> 8

Example 2) 2 -> 4 -> 8

Both series share several numbers, and we know the first series consists of multiples of 2 while the second consists of powers of 2. But when we pass the numbers to a model, the last input it receives in both cases is 8, so how should the model distinguish between the two series? It essentially needs the previous pattern information combined with the current input to output the correct result. When the sequence gets long, an RNN fails to factor in the previous information properly, as it has no proper mechanism to deal with degrading gradients, and in the end it is unable to do any kind of learning. This is the problem LSTMs were built to solve.

An LSTM has additional gates that help it properly retain information throughout the input. However, not all information is important all the time: as we go deeper into the sequence, the chance that the next output depends on a very old input becomes small, and that is where the forget gate of the LSTM comes into action. At every step of the input sequence an LSTM re-modifies the weights of its gates using backpropagation. Put simply, this helps it determine what kinds of inputs are important at the current step for predicting the next word or element in the sequence. While the forget gate determines how much of every earlier input in the sequence is still important, the input gate helps decide what new information to keep; using a combination of these, the LSTM is able to retain information even over a long sentence and overcome the problems that arise with plain recurrent networks. The beauty of the LSTM is that even a very shallow LSTM model can understand the structure of a sentence very well, thanks to its large number of parameters and its very unique configuration of the three gates.
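For reference, the gates described above follow the standard LSTM formulation (the report does not specify a particular variant):

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```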



2.2 PRE-PROCESSING THE IMAGES

Mean and Standard Deviation for Proper Normalisation of Data:

We need to properly process the data before passing it to the model, as this determines the level of accuracy we can reach. Instead of assuming a mean of 0 and a standard deviation of 1, we can easily compute the mean and standard deviation for each channel. For the current dataset the mean comes out to be [0.484, 0.451, 0.406] and the standard deviation [0.231, 0.244, 0.412].
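A minimal sketch of this computation, assuming a DataLoader that yields (B, 3, H, W) image batches scaled to [0, 1]:

```python
import torch

def channel_stats(loader):
    """Per-channel mean/std over a dataset, accumulated batch by batch."""
    total, sq_total, n = torch.zeros(3), torch.zeros(3), 0
    for images, _ in loader:                      # images: (B, 3, H, W)
        total += images.mean(dim=(2, 3)).sum(dim=0)          # per-image channel means
        sq_total += (images ** 2).mean(dim=(2, 3)).sum(dim=0)
        n += images.size(0)
    mean = total / n
    std = (sq_total / n - mean ** 2).sqrt()       # Var[X] = E[X^2] - E[X]^2
    return mean, std
# Each image is then normalised per channel as (x - mean) / std.
```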

Data Augmentation

Data augmentation helps us create more data to feed into the model and helps it generalise well by letting it see the data in various orientations. We create our own transformations using NumPy. Here are some of the augmentations we implement:

•Random flip (horizontal and vertical)

•Random 90-degree rotations

•Adding lighting changes to the visual channels

Combining the random flips and random rotations gives the 8 dihedral transformations, which can be applied to any number of channels and any kind of dataset. As sketched in the code after the list below, we first create a function which takes an input x as a tensor (the matrix representation of our image) and a mode. We do not want to apply these augmentations in validation mode; in training mode we apply the transforms randomly, using Python's default random number generator to determine which transformations are applied to the image.



To flip the image horizontally, we first convert our tensor into a NumPy array and then use NumPy's fliplr function to flip the array horizontally, or flipud to flip it vertically. To rotate the image, we generate a random number k between 0 and 3, which determines how many 90-degree rotations of the array we perform. The following dihedral transformations can be formed after this step (see the sketch below):

•Horizontal flip + any of three 90-degree rotations

•Horizontal flip with no rotations

•Vertical flip + any of three 90-degree rotations

•Vertical flip with no rotations
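The following is a hedged reconstruction of the augmentation function described above (the report's original snippet is not reproduced here); the exact flip probability and mode handling are assumptions.

```python
import random
import numpy as np
import torch

def dihedral(x: torch.Tensor, mode: str = 'train') -> torch.Tensor:
    """Randomly apply one of the dihedral transforms to a (C, H, W) tensor."""
    if mode != 'train':                      # skip augmentation outside training
        return x
    arr = x.numpy()
    if random.random() < 0.5:                # np.flip generalises fliplr/flipud to any axis
        arr = np.flip(arr, axis=2)           # horizontal flip (width axis)
    else:
        arr = np.flip(arr, axis=1)           # vertical flip (height axis)
    k = random.randint(0, 3)                 # how many 90-degree rotations
    arr = np.rot90(arr, k=k, axes=(1, 2))    # rotate in the H-W plane
    return torch.from_numpy(arr.copy())      # .copy() removes negative strides
```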



2.3 CREATING THE CUSTOMISED GENERATOR MODEL

The way a standard generator model works is that it takes in some input and, through a series of upsampling or deconvolution operations, creates the image. The only issue is that while generating the final output it only takes into account the information from the previous layer, which is ideal for tasks like classification and bounding-box regression. When dealing with image generation, however, we should also take into account the original input, without much processing, alongside the information in the last layer: this not only helps the gradients flow better but also helps the model converge faster.

In the code sketch that follows this description, we create our customised generator model from scratch using PyTorch. We start by declaring the class and then initialising



the architecture within it. To properly use PyTorch's built-in neural network layers, we need to call super() to inherit the properties of the base class. We start by declaring a ConvTranspose2d, which takes in the input embedding and doubles it along the height and width while reducing it along the channel direction. We add a dropout layer to increase regularisation, which not only deals with overfitting on the training set but also helps the model generalise well on the input features. This is followed by two convolutional blocks, one doubling the channel dimension and the other taking that output and reducing it back to the original channel dimension without changing any other dimension; this trick worked out well in our practical experiments. Now comes the major step of producing the final image. As stated earlier, we also need to add in the original embedding directly. The issue is that the embedding has different dimensions altogether, so we use a simple upsampling operation to bring the embedding to the proper dimensions before adding it to the output of the last layer. In terms of equations, let the input be x and the desired output be h(x), where F(x) = conv blocks + non-linearities. Instead of hoping the stacked layers fit the desired mapping directly, we specify a residual mapping F(x) = h(x) - x and let the model optimise it, so the final output h(x) = F(x) + x is brought closer to the desired one.
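A hedged sketch of such a generator follows; it is not the report's exact architecture, and the channel sizes, kernel sizes, and 4×4 input resolution are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, embed_dim=256, ch=64):
        super().__init__()                     # inherit nn.Module machinery
        # Upsample the (embed_dim, 4, 4) input: double H and W, reduce channels.
        self.up = nn.ConvTranspose2d(embed_dim, ch, kernel_size=4, stride=2, padding=1)
        self.drop = nn.Dropout2d(0.3)          # regularisation, as described
        self.block = nn.Sequential(            # double then restore the channels
            nn.Conv2d(ch, 2 * ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(2 * ch, ch, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Bring the raw embedding to the same spatial/channel shape as the output.
        self.skip = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(embed_dim, ch, kernel_size=1),
        )

    def forward(self, x):                      # x: (B, embed_dim, 4, 4)
        h = self.block(self.drop(self.up(x)))
        return h + self.skip(x)                # residual: h(x) = F(x) + x
```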



2.4 TRAINING THE MODEL

The training process of a Generative Adversarial Network is a bit more complicated than training a normal neural network, as it involves training the discriminator and the generator in an alternating fashion.

Step 1: Train the discriminator on the original dataset and some random noise, to give the discriminator an edge in identifying real images over random noise. This step is very important in the beginning: if the discriminator does not already know, to some extent, what the real dataset should look like, the loss fed back to the generator will essentially be lower than it should be, which slows down the initial training. The training eventually stabilises even if we do not train the discriminator first, but that takes a lot of time; by training the discriminator first we decrease the training time of the model.

A sketch of the full alternating algorithm is given after Step 2.

Step 2: After the discriminator has been trained initially for a while, we make a forward pass through the modified generator model, which initially yields a random image and a high loss; the loss is then backpropagated through the entire network to update and fine-tune its internal parameters. The generated images are stored in a temporary variable and passed on to the discriminator in its next phase. There is a chance that the GAN does not find the equilibrium between the discriminator and the generator: in the corresponding loss graph, the discriminator loss (blue) and the generator loss (orange) both head towards zero in the initial phase of training, which can happen when the GAN is not stable.
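The loop below is a minimal sketch of this alternating procedure, not the project's actual training script; the generator, discriminator, train_loader, and Adam hyperparameters are assumed.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()                      # discriminator outputs raw logits
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)   # assumed models
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

for real_images, embeddings in train_loader:      # assumed (image, text-embedding) loader
    b = real_images.size(0)
    # Discriminator step: push real images towards 1 and generated ones towards 0.
    fake_images = generator(embeddings).detach()  # detach: keep the generator frozen here
    d_loss = bce(discriminator(real_images), torch.ones(b, 1)) + \
             bce(discriminator(fake_images), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: try to make the discriminator label generated images as real.
    g_loss = bce(discriminator(generator(embeddings)), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```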



This graph shows the accuracy of the discriminator: here the discriminator's accuracy is 100%, which means it is perfectly identifying whether an image is real or fake. This is the most common failure mode, and it is called convergence failure.



3 PERFORMANCE ANALYSIS

EVALUATION:

It is not easy to evaluate the performance of a generative model with any single metric; in the end, humans mostly have to decide whether the generated content is good and whether it holds any particular meaning. However, we can judge the discriminator model on its ability to distinguish real images from fake ones. Compared with ordinary convolutional models, where higher accuracy means better results, the opposite holds for generative models: if the discriminator has very high accuracy in distinguishing real images from fake ones, then the generator has not done a good job of creating images that represent the dataset well. At a perfect equilibrium the discriminator should have an accuracy of 50%, i.e. it has to take a random guess to determine whether a generated image is fake, implying the generator has created images so good that they are indistinguishable from the originals. The closer the accuracy is to 50%, the better the generator has done at creating images.
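A small sketch of this check, assuming a discriminator that outputs raw logits and a loader of (image, embedding) pairs:

```python
import torch

@torch.no_grad()
def discriminator_accuracy(discriminator, generator, loader):
    """Fraction of real/fake images the discriminator labels correctly."""
    correct, total = 0, 0
    for real_images, embeddings in loader:
        fake_images = generator(embeddings)
        real_pred = discriminator(real_images) > 0     # logit > 0 means "real"
        fake_pred = discriminator(fake_images) > 0
        correct += real_pred.sum().item() + (~fake_pred).sum().item()
        total += 2 * real_images.size(0)
    return correct / total                              # ~0.5 at equilibrium
```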



The loss graph shows the discriminator loss for real images (blue), the discriminator loss for fake images (orange), and the generator loss for generated images (green). This is the expected loss behaviour during training, and it stabilises after around 100 to 300 epochs. The discriminator loss for the real and fake images settles around 50%, and the generator loss for the generated images lies between 50% and 70%.

Accuracy graph:

The accuracy graph shows the discriminator's accuracy on real images (blue) and on fake images (orange). This GAN model stabilises within 100 to 300 epochs, after which it gives an accuracy of approximately 70% to 80% and remains stable.



4 ADVANTAGES AND DISADVANTAGES:

4.1 ADVANTAGES:

1. Enhanced Efficiency and Productivity:

AI-generated photos offer a significant advantage in terms of enhanced efficiency and productivity in the creative process. By leveraging AI algorithms, the photo generation
process can be automated, resulting in time and effort savings for photographers and content
creators. Traditionally, photographers spend hours manually editing and retouching photos
to achieve the desired quality and aesthetics. However, with AI-generated photos, the need
for extensive manual editing is reduced. AI algorithms can quickly analyse and process
images, applying enhancements and adjustments to generate high-quality images with
minimal human intervention.

The automation of the photo generation process enables photographers and content creators
to focus more on the artistic aspects of their work. With AI handling the technical aspects,
such as colour correction, lighting adjustments, and image enhancement, photographers can
dedicate their time and energy to capturing unique perspectives, exploring creative concepts,
and telling compelling visual stories. This increased efficiency allows for a streamlined
workflow, enabling photographers to produce a larger volume of high-quality images in a
shorter amount of time.

2. Improved Image Quality and Aesthetics:

AI algorithms excel in analysing and enhancing photos, leading to a remarkable improvement in image quality and aesthetics. With the ability to adjust lighting, correct
colour grading, and optimize composition, AI-generated photos capture attention with their
visually stunning and captivating characteristics. The precision and accuracy of AI
algorithms allow photographers to achieve a level of perfection that may be challenging to
attain manually.

One of the key advantages of AI-generated photos is their ability to enhance image quality.
AI algorithms can intelligently analyse each aspect of the photo, identifying areas that
require improvement and applying enhancements accordingly. This includes adjusting
lighting conditions to enhance details and contrast, correcting colour grading to achieve



accurate and vibrant tones, and optimizing composition to create visually pleasing images.
The result is a polished and visually appealing photo that captivates viewers and conveys the
desired message effectively. The AI-powered enhancement process not only saves time and
effort for photographers but also ensures consistent and high-quality results across a large
volume of images.

3. Accessibility and Affordability:

AI-generated photos offer a significant advantage in terms of accessibility, making professional-quality imagery more attainable for individuals and businesses with limited resources. Traditional photography equipment and professional editing tools can be prohibitively expensive, requiring a substantial financial investment. In contrast, AI-powered photo generation provides a cost-effective alternative that opens doors for aspiring photographers on a tighter budget. By leveraging AI technology, individuals without extensive photography experience or expensive equipment can now produce high-quality photos that rival the work of seasoned professionals.

Furthermore, AI-generated photos offer a simplified and streamlined approach to image creation, removing the need for extensive technical expertise. Traditional photography often requires a deep understanding of complex camera settings, lighting techniques, and post-processing tools. AI, on the other hand, simplifies the process by automating many of these tasks.



4.2 DISADVANTAGES

1. Lack of Authenticity and Originality:

One of the main concerns with AI-generated photos is the potential lack of authenticity and
originality. While AI algorithms excel at analysing patterns and generating visually pleasing
images, they may lack the human touch and personal expression found in traditional
photography. AI-generated photos can sometimes appear generic or formulaic, lacking the
unique perspective and creativity that photographers bring to their work. This raises
questions about the uniqueness and artistic value of AI-generated visuals.

2. Ethical Considerations and Bias:

The use of AI algorithms in generating photos also raises ethical considerations. Privacy and
consent issues may arise when AI is used to manipulate or generate images of individuals
without their knowledge or consent. Moreover, AI algorithms can inherit biases present in
the data they are trained on, potentially perpetuating societal biases or stereotypes in the
generated photos. Responsible and transparent AI practices are essential to address these ethical concerns and ensure the fair and unbiased use of AI-generated photos.

3. Dependency on Data Quality and Training

The performance of AI algorithms heavily relies on the quality and diversity of the training
data they are provided. If the training data is biased or lacks diversity, the generated photos
may inherit these limitations. AI algorithms need access to a wide range of high-quality
training data to produce accurate and representative results. Ensuring that the training
datasets are comprehensive and inclusive is crucial for overcoming biases and achieving the
desired level of diversity in the generated photos.



5 FINAL WEB-APP PRODUCTION
We develop a web app using Flask that presents the user with an option to input text and to choose from the various inference models we have trained. On clicking Generate Image, the request is processed in the Python backend, a resultant image is created using the chosen model, and we hit another Flask endpoint where the generated image is displayed. The following endpoints have been created in our Flask application (a sketch follows the list):

1) Home Route:

At the home endpoint or route, we redirect to the page where the user can provide an input, if the model has been loaded successfully without any error. If the chosen model cannot be loaded properly, we redirect to a route describing the error.

2) Generate Route:

After the user successfully enters a text, it is pre-processed into a vector and passed to our LSTM model, which generates the text embedding. The embedded vector is then passed to the loaded generator model, and the resulting image is saved to a location using the timestamp as the file name.

3) Result Route:

After the Image has been successfully generated, we redirect the application to a page
which displays the generated image.

4) Error Route:

The default route, shown in case any error occurs.
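A hedged Flask sketch of these four routes follows; the helper functions (model_loaded_ok, embed_text, generator_model, save_image) and the template names are assumptions standing in for the real application code.

```python
import time
from flask import Flask, render_template, request, redirect, url_for

app = Flask(__name__)

@app.route('/')
def home():
    if not model_loaded_ok():                      # assumed model-loading check
        return redirect(url_for('error'))
    return render_template('input.html')           # text box + model selector

@app.route('/generate', methods=['POST'])
def generate():
    text = request.form['description']
    embedding = embed_text(text)                   # spaCy + LSTM step (assumed helper)
    image = generator_model(embedding)             # loaded generator (assumed)
    path = f'static/generated/{int(time.time())}.png'   # timestamp as file name
    save_image(image, path)                        # assumed helper
    return redirect(url_for('result', image=path))

@app.route('/result')
def result():
    return render_template('result.html', image=request.args['image'])

@app.route('/error')
def error():
    return render_template('error.html')
```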



6 CONCLUSION
In this project we have created a web app that can take in a text description of a flower or bird and generate images based on it, and while doing so we have modified the generator architecture in such a way that the training time of the GAN is reduced.

In this report, a brief review of image generation is also presented. Existing image generation approaches have been categorised based on the data used as input for generating new images, including images, hand sketches, layouts, and text. In addition, we discussed existing work on conditioned image generation, a type of image generation in which a reference is exploited to generate the final image. An effective image generation method depends on the dataset used, which must be large-scale; for that reason we summarised popular benchmark datasets used for image generation techniques. The evaluation metrics for comparing the various methods were presented, and based on these metrics, as well as the datasets used for training, a tabulated comparison was performed. Finally, a summary of the current image generation challenges was presented.



