AI Image Generator
Bachelor of Engineering in
Computer Engineering
By
Name Roll no.
Yash Ahire 02
Prashant Bhalerao 04
Chetan Parse 42
Jishan Shaikh 47
Supervisor
Prof. (Deepa Athawale)
Technology Personified
Department of Computer Engineering
Innovative Engineers’ and Teachers’ Education Society’s
Bharat College of Engineering
Badlapur - 421504
(Affiliated to University of Mumbai)
(2023-24)
CERTIFICATE
This is to certify that the project titled
Yash Ahire
Prashant Bhalerao
Chetan Parse
Jishan Shaikh
and is submitted in partial fulfilment of the requirements for the degree of
Bachelor of Engineering in Computer
To the University of Mumbai
Supervisor External
Prof. (Deepa Athawale) Prof.( )
ABSTRACT
1. INTRODUCTION
2. SYSTEM DEVELOPMENT
3. PERFORMANCE ANALYSIS
4. ADVANTAGES AND DISADVANTAGES
6. CONCLUSION
7. REFERENCES
1.1 INTRODUCTION
For a human mind it is very easy to think of new content. If someone asks you to
“draw a flower with blue petals”, it is very easy for us to do that, but machines process
information very differently. Just understanding the structure of that sentence is a
difficult task for them, let alone generating something based on the description. Automatic
synthetic content generation is a field that was explored in the past and discredited,
because at that time neither the algorithms nor the processing power existed to
solve the problem. However, the advent of deep learning changed these earlier
beliefs. The tremendous power of neural networks to capture features even in
humongous datasets makes them a very viable candidate for automatic content
generation. Another milestone was reached when Ian Goodfellow proposed generative
adversarial networks in 2014. GANs are a deep learning architecture that can
produce content from random noise. What is even more unique about GANs is that the
content they create represents the dataset on which they are trained, yet is
totally unique in some way or the other. Generating an image from a text-based description
is the aspect of generative adversarial networks that we will focus upon. Since GANs
follow an unsupervised learning approach, we have modified them to take an input as a
condition and generate based on that condition. This can form the base for a large number
of applications, such as synthetic audio generation (like the voices used in Siri or
Assistant) and video content generation from just scripts; imagine entire movies made from
just the script. These are uses that many companies are actively researching. Modifying GANs
and applying conditions on them is not limited to generating images: we can use them to create
passwords that are very hard to crack, and numerous similar applications.
We first start by downloading the Oxford 102 dataset, which contains 102 different
categories of flowers along with an annotation for each image in the form of a text
description.
• NumPy
• PyTorch
• OpenCV
• Flask
We first start by downloading and pre-processing the dataset. During the pre-processing
phase we convert the text into embeddings and normalize the images so they are ready to be
passed to the respective models. We then build our customised generator model and use a
standard pre-trained model as the discriminator. After the model creation we write a
training script and adopt some best practices from the field of deep learning to train the
model stably using our customised PyTorch trainer. The final task is to wrap the trained
model in a Flask web app so that testing becomes easier.
Improved algorithms:
As researchers and developers identify new techniques and approaches to improve AI image
generation, the algorithms employed by these tools will become more advanced and
efficient. It is anticipated that future versions of AI image generators will generate more
realistic and high-quality images, with fewer artifacts and more precise fine details.
Existing AI image generators still struggle with generating diverse and coherent results
consistently. In other words, they sometimes lack the ability to represent a broader range of
styles and may generate images with inconsistencies or inaccuracies. In the future, AI image
generators will likely produce more diverse and consistent images while reducing these
common issues, leading to better alignment with users' expectations and requirements.
Future AI image generators are likely to seamlessly integrate with various existing design
and development tools, enabling creatives to work more efficiently and add AI-powered
image generation functionalities to their workflows. This will remove any significant effort
required to implement AI image generation in applications, such as utilizing the capabilities
of tools like App Master platform for backend, web, and mobile applications.
The very first step in training our model is to convert the text to an embedding.
Neural networks work on vectors and numbers and cannot essentially do anything if the
input format is text. So, the very first thing we do is pre-process the text, removing
unnecessary spaces and improving semantics using standard text pre-processing libraries
like spaCy. The text description is then converted into a vector of numbers, which is
given as input to a pre-trained Long Short-Term Memory (LSTM) network, and the output of
the last layer is taken out; this is essentially the embedding that we are looking for.
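A minimal sketch of this step in PyTorch, using a toy vocabulary (the vocabulary, dimensions, and untrained weights here are illustrative assumptions; the report's pipeline uses spaCy pre-processing and a pre-trained LSTM):

```python
import torch
import torch.nn as nn

# Toy vocabulary; a real pipeline builds this from the Oxford 102 captions.
vocab = {"<pad>": 0, "a": 1, "flower": 2, "with": 3, "blue": 4, "petals": 5}

def encode(text):
    """Lowercase, strip extra spaces, and map tokens to indices."""
    return torch.tensor([[vocab[w] for w in text.lower().split()]])

embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=32)
lstm = nn.LSTM(input_size=32, hidden_size=128, batch_first=True)

tokens = encode("a flower with blue petals")
out, (h_n, c_n) = lstm(embed(tokens))
sentence_embedding = h_n[-1]     # final hidden state: the text embedding
print(sentence_embedding.shape)  # torch.Size([1, 128])
```

The final hidden state summarises the whole sentence, which is what gets fed to the generator as the condition.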
Why exactly do we need to convert our sentence into an embedding rather than just a one-hot
encoded vector? To understand that, let us take a very simple example where we once
represent the words as one-hot encoded vectors and once using an embedding matrix.
1. One-hot vectors grow with the size of the vocabulary and are extremely sparse, with a
single 1 and zeros everywhere else.
2. Those vectors do not have any kind of relation among them that a model can learn, and
it becomes very difficult for the model to learn when it cannot even understand the
relations between words. Now let us represent them in an embedding.
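A tiny illustration of the difference (the embedding values below are hand-picked for the example, not learned):

```python
import numpy as np

words = ["rose", "tulip", "car"]

# One-hot: every pair of distinct words is orthogonal -> no notion of similarity.
one_hot = np.eye(len(words))
print(one_hot[0] @ one_hot[1])  # 0.0 for every distinct pair

# Toy embedding matrix: similar words get nearby vectors,
# so a model can exploit the geometry.
embedding = np.array([
    [0.9, 0.1],  # rose
    [0.8, 0.2],  # tulip
    [0.1, 0.9],  # car
])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(embedding[0], embedding[1]) > cosine(embedding[0], embedding[2]))  # True
```

With one-hot vectors, "rose" is exactly as far from "tulip" as from "car"; with an embedding, the model can see that flowers cluster together.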
Example 1) 2->4->6->8
Example 2) 2->4->8
Now in both series some numbers are common, and we know the first series consists of
multiples of 2 while the second consists of powers of 2. But when we pass the numbers to a
model, the last input it gets in both cases is 8, so how should the model distinguish
between the two series? It should essentially combine previous pattern information with the
current input to output the correct result. But when the sequence gets longer, an RNN fails
to factor in the previous information properly, as it has no proper mechanism to deal with
degrading gradients, and in the end it is unable to do any kind of learning. This is the
problem that LSTMs were built to solve. An LSTM has additional gates that help it properly
retain information throughout the input. However, not all information is important every
time. As we go deeper into the sequence, the chance that the next output depends on a very
old input becomes very small, and that is where the forget gate of the LSTM comes into
action. At every step of a sequence, the LSTM re-modifies the weights of its gates using
backpropagation. Put simply, this helps it determine what kinds of inputs are important at
the current step to predict the next word/element in the sequence. While the forget gate
determines how important every input seen earlier in the sequence is, the input gate helps
to decide and update what information to keep; using a combination of these, the LSTM is
able to retain information even in a long sentence and to overcome the problems that arise
with recurrent networks. The beauty of the LSTM is that even a very shallow LSTM model can
understand the structure of a sentence very well, due to the large number of parameters
that it has and its very unique configuration of the three gates.
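The gate mechanics described above can be written compactly. In standard LSTM notation (with \(\sigma\) the logistic sigmoid, \(\odot\) the element-wise product, and \([h_{t-1}, x_t]\) the concatenation of the previous hidden state with the current input):

```latex
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)          % forget gate
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)          % input gate
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)   % candidate cell state
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t % cell update
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)          % output gate
h_t = o_t \odot \tanh(c_t)                      % hidden state
```

The forget gate \(f_t\) scales down old cell content, the input gate \(i_t\) controls how much of the candidate enters the cell, and the output gate \(o_t\) decides what part of the cell state is exposed as the hidden state.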
We need to properly process the data before passing it to the model, as this will determine
the level of accuracy that we can reach. Instead of using a mean of 0 and a standard
deviation of 1, we can easily compute the mean and standard deviation for each channel. For
the current dataset the mean comes out to be [0.484, 0.451, 0.406] and the standard
deviation comes out to be [0.231, 0.244, 0.412].
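A sketch of the per-channel computation on a toy batch (random data stands in for the flower images; shapes are illustrative):

```python
import numpy as np

# Toy batch of images, shape (N, H, W, C); real code would load the flower dataset.
rng = np.random.default_rng(0)
images = rng.uniform(0, 1, size=(16, 64, 64, 3))

# Per-channel statistics over the whole batch (axes 0,1,2 keep the channel axis).
mean = images.mean(axis=(0, 1, 2))
std = images.std(axis=(0, 1, 2))

normalized = (images - mean) / std
print(normalized.mean(axis=(0, 1, 2)).round(6))  # ~[0, 0, 0]
print(normalized.std(axis=(0, 1, 2)).round(6))   # ~[1, 1, 1]
```

Subtracting the dataset's own mean and dividing by its own standard deviation gives each channel zero mean and unit variance, which is what stabilises training.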
Data Augmentation
Data augmentation will help us create more data to feed into the model and help it
generalise well by letting it see the data in various orientations. We create our own
transformations using NumPy. Combining random flips and random rotations, we have come up
with the 8 dihedral transformations, which can be applied to any number of channels and to
any kind of dataset. We start by creating a function which takes in an input x as a tensor
(the matrix representation of our image) and a mode. We do not want to apply these image
augmentations in validation mode; in training mode, we need to apply these transforms
randomly. We use Python's default random number generator to determine which transformation
is randomly applied to the image.
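A possible NumPy implementation of the scheme described above (function names are illustrative): the 8 dihedral transforms are the 4 rotations by 90°, each optionally followed by a horizontal flip, and the random choice is applied only in training mode:

```python
import random
import numpy as np

def dihedral(x, mode):
    """Apply one of the 8 dihedral transforms (4 rotations x optional flip).
    x: array of shape (H, W) or (H, W, C); mode: int in [0, 8)."""
    x = np.rot90(x, k=mode % 4)
    if mode >= 4:
        x = np.flip(x, axis=1)
    return x.copy()

def augment(x, train=True):
    """Randomly pick a transform in training mode; identity in validation."""
    return dihedral(x, random.randrange(8)) if train else x

# Sanity check: an asymmetric image yields 8 distinct results.
img = np.arange(9).reshape(3, 3)
variants = {dihedral(img, m).tobytes() for m in range(8)}
print(len(variants))  # 8
```

Because the transforms only rotate and flip, they work for any number of channels and never interpolate pixel values, so labels stay valid.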
The way a standard generator model works is that it takes in some input and, by a series of
up-sampling or deconvolution operations, creates the image. The only issue with that is
that while generating the final output it only takes into account the information from the
previous layer, which is ideal for tasks like classification and bounding-box regression.
But when dealing with image generation, we should also take into account the original input
constraints, without much processing, along with the information in the last layer, as this
will not only help the gradients flow better but also help the model converge faster.
We create our customised generator model from scratch using PyTorch: we start off by
declaring the class and then initialising its layers.
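A minimal sketch of such a generator in PyTorch, under assumed sizes (a 100-d noise vector, a 128-d text embedding, 16x16 output; the report's actual layer sizes may differ). The skip path upsamples the raw projected features and concatenates them with the deconvolved features, so the original input reaches the output layer with little processing:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, embed_dim=128):
        super().__init__()
        # Project noise + text embedding to a 4x4 feature map.
        self.project = nn.Linear(z_dim + embed_dim, 256 * 4 * 4)
        # Deconvolution path: 4x4 -> 8x8 -> 16x16.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(256, 128, 4, 2, 1),
                                 nn.BatchNorm2d(128), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1),
                                 nn.BatchNorm2d(64), nn.ReLU())
        # Skip path: carry the raw projected features forward unprocessed.
        self.skip = nn.Upsample(scale_factor=4)
        self.to_img = nn.Conv2d(64 + 256, 3, 3, padding=1)

    def forward(self, z, text_emb):
        h = self.project(torch.cat([z, text_emb], dim=1)).view(-1, 256, 4, 4)
        u = self.up2(self.up1(h))  # deconvolved features, 16x16
        s = self.skip(h)           # skip connection, 16x16
        return torch.tanh(self.to_img(torch.cat([u, s], dim=1)))

g = Generator()
img = g(torch.randn(2, 100), torch.randn(2, 128))
print(img.shape)  # torch.Size([2, 3, 16, 16])
```

The tanh keeps outputs in [-1, 1], matching images normalised to that range.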
where F(x) = conv blocks + non-linearities. Instead of hoping that a stack of layers fits
the desired mapping directly, we can specify a residual mapping and let the model reduce
and optimise it, so as to bring the output closer to our desired mapping h(x).
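Concretely, this is the residual-learning formulation: the shortcut connection adds the input back, so the stacked layers only need to learn the (usually easier) residual rather than the full mapping:

```latex
y = F(x, \{W_i\}) + x, \qquad F(x) = h(x) - x
```

When the desired mapping is close to the identity, F(x) is close to zero, which is much easier to optimise, and the shortcut gives gradients a direct path to earlier layers.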
The training process of a generative adversarial network is a bit more complicated than
training a normal neural network, as it involves training the discriminator and the
generator in an alternating fashion.
Step 1: Train the discriminator on the original dataset and some random noise to give the
discriminator an edge in identifying real images from random noise. This step is very
important in the beginning: if the discriminator doesn't already know to some extent what
the real dataset looks like, the loss function applied to the generator will essentially
give a lower loss than it should, which slows down the initial training. Training
eventually stabilises even if we do not train the discriminator first, but that takes a lot
of time; by doing this we are decreasing the training time of the model.
Step 2: After the discriminator has been trained for a while, we make a forward pass
through the modified generator model, initially getting a random image and a high loss,
which is then backpropagated through the entire network in order to update and fine-tune
its internal parameters. The generated images are stored in a temporary variable and passed
on to the discriminator in its next phase. There is a chance that our GAN does not find the
equilibrium between the discriminator and generator: a graph showing the discriminator loss
(blue) and the generator loss (orange) both heading towards zero in the initial phase of
training indicates that the GAN is not stable.
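The two alternating steps can be sketched on toy 1-D data (the networks, sizes, and learning rates here are illustrative stand-ins, not the report's actual configuration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Tiny stand-ins: G maps 8-d noise to a scalar "sample"; D scores realness.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(100):
    real = torch.randn(32, 1) + 3.0   # "real" data: samples from N(3, 1)
    noise = torch.randn(32, 8)

    # Step 1: train the discriminator on real data vs. generated samples.
    fake = G(noise).detach()          # detach: no generator update here
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Step 2: train the generator to fool the discriminator.
    g_loss = bce(D(G(noise)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(d_loss.item() > 0 and g_loss.item() > 0)  # True: both losses stay finite
```

The `detach()` in step 1 is what makes the scheme alternating: discriminator updates never flow back into the generator, and vice versa.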
Here the accuracy of the discriminator is 100%, which means it is perfectly identifying
whether an image is real or fake; the generator has completely failed to fool it. This is
the most common failure mode and it is called convergence failure.
EVALUATION:
It is not easy to evaluate the performance of a generative model with any single metric;
mostly, humans at the end have to decide whether the generated content is good and whether
it holds any particular meaning. However, we can judge the discriminator model on its
ability to distinguish real images from fake ones. Compared to ordinary convolutional
models, where high accuracy means better results, this is not true in the case of
generative models. If the discriminator has very high accuracy in distinguishing real
images from fake ones, that implies the generator has not done a very good job of creating
images that represent the dataset well. At a perfect equilibrium the discriminator should
have an accuracy of 50%, that is, it has to take a random guess to determine whether an
image is fake or not, implying the generator has created images so good that they are
indistinguishable from the originals. The closer the accuracy is to 50%, the better the
generator has done at creating images.
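A quick illustration of why 50% is the target: if the discriminator's scores carry no information (pure guessing), accuracy lands near one half. The helper below is a hypothetical metric, not code from the report:

```python
import numpy as np

def discriminator_accuracy(p_real, p_fake, threshold=0.5):
    """Fraction of correct real/fake calls at the given threshold."""
    correct = np.sum(p_real >= threshold) + np.sum(p_fake < threshold)
    return correct / (len(p_real) + len(p_fake))

# Perfect equilibrium: scores on real and fake images are indistinguishable,
# so the discriminator can only guess.
rng = np.random.default_rng(0)
guesses_real = rng.uniform(0, 1, 1000)
guesses_fake = rng.uniform(0, 1, 1000)
acc = discriminator_accuracy(guesses_real, guesses_fake)
print(round(acc, 2))  # close to 0.50
```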
Accuracy graph:
This graph shows the discriminator's accuracy on real images (blue) and on fake images
(orange). This GAN model stabilises within 100 to 300 epochs, after which it gives an
accuracy of approximately 70% to 80% and remains stable.
4.1 ADVANTAGES:
The automation of the photo generation process enables photographers and content creators
to focus more on the artistic aspects of their work. With AI handling the technical aspects,
such as colour correction, lighting adjustments, and image enhancement, photographers can
dedicate their time and energy to capturing unique perspectives, exploring creative concepts,
and telling compelling visual stories. This increased efficiency allows for a streamlined
workflow, enabling photographers to produce a larger volume of high-quality images in a
shorter amount of time.
One of the key advantages of AI-generated photos is their ability to enhance image quality.
AI algorithms can intelligently analyse each aspect of the photo, identifying areas that
require improvement and applying enhancements accordingly. This includes adjusting
lighting conditions to enhance details and contrast, correcting colour grading to achieve
One of the main concerns with AI-generated photos is the potential lack of authenticity and
originality. While AI algorithms excel at analysing patterns and generating visually pleasing
images, they may lack the human touch and personal expression found in traditional
photography. AI-generated photos can sometimes appear generic or formulaic, lacking the
unique perspective and creativity that photographers bring to their work. This raises
questions about the uniqueness and artistic value of AI-generated visuals.
The use of AI algorithms in generating photos also raises ethical considerations. Privacy and
consent issues may arise when AI is used to manipulate or generate images of individuals
without their knowledge or consent. Moreover, AI algorithms can inherit biases present in
the data they are trained on, potentially perpetuating societal biases or stereotypes in the
generated photos. Responsible and transparent AI practices are essential to address these
ethical concerns and to ensure the fair and unbiased use of AI-generated photos.
The performance of AI algorithms heavily relies on the quality and diversity of the training
data they are provided. If the training data is biased or lacks diversity, the generated photos
may inherit these limitations. AI algorithms need access to a wide range of high-quality
training data to produce accurate and representative results. Ensuring that the training
datasets are comprehensive and inclusive is crucial for overcoming biases and achieving the
desired level of diversity in the generated photos.
1) Home Route:
At the home endpoint or route, if the model has been loaded successfully without any error,
we redirect to the page where the user can provide an input. If the chosen model cannot be
loaded properly, we redirect to a route describing the error.
2) Generate Route:
After the user successfully enters a text, it is pre-processed into a vector and passed to
our LSTM model, which generates the embedding. The embedded vector is then passed to the
loaded generator model, and the output image is saved using the timestamp as the file name.
3) Result Route:
After the Image has been successfully generated, we redirect the application to a page
which displays the generated image.
4) Error Route:
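The routes above can be sketched as a minimal Flask app (a sketch only: it collapses the generate and result routes, omits the error route, and `generate_image` is a hypothetical stub for the real LSTM + generator pipeline):

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

def generate_image(text):
    # Hypothetical stand-in: the real app would embed the text with the LSTM,
    # run the generator, and save the image under a timestamped name.
    return f"static/{abs(hash(text))}.png"

@app.route("/")
def home():
    # Home route: show the input form (the real app would instead redirect
    # to an error page if the model failed to load).
    return render_template_string(
        '<form action="/generate"><input name="text"><input type="submit"></form>')

@app.route("/generate")
def generate():
    # Generate + result collapsed into one route for brevity.
    path = generate_image(request.args.get("text", ""))
    return render_template_string('<img src="{{ p }}" alt="generated flower">', p=path)

client = app.test_client()
print(client.get("/").status_code)  # 200
```

Flask's `test_client` lets the routes be exercised without starting a server, which is handy for checking the wiring before deployment.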
In this paper, a brief image generation review is presented. The existing image generation
approaches have been categorized based on the data used as input for generating new images,
including images, hand sketches, layouts and text. In addition, we presented existing work
on conditioned image generation, a type of image generation in which a reference is
exploited to generate the final image. An effective image generation method depends on the
dataset used, which must be a large-scale one. For that reason, we summarize popular
benchmark datasets used for image generation techniques. The evaluation metrics for
evaluating various methods are presented. Based on these metrics, as well as the datasets
used for training, a tabulated comparison is performed. Then, a summary of the current
image generation challenges is presented.
[2] A. Brock, T. Lim, J. M. Ritchie, and N. Weston. Neural photo editing with introspective
adversarial networks. In ICLR, 2017.
[3] T. Che, Y. Li, A. P. Jacob, Y. Bengio, and W. Li. Mode regularized generative
adversarial networks. In ICLR, 2017.
[5] E. L. Denton, S. Chintala, A. Szlam, and R. Fergus. Deep generative image models using
a Laplacian pyramid of adversarial networks. In NIPS, 2015.
[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In
CVPR, 2016.
[10] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie. Stacked generative
adversarial networks. In CVPR, 2017.