Seminar Report GRP No. 56


Building software features using pre-trained AI/ML models for a product design platform

SEMINAR REPORT

Building Software Features Using Pre-trained AI/ML Models For A Product Design Platform
delivered by

ANUSHKA CHIKHALE C22019111125


PRAJAKTA DESHPANDE C22019111133
PRAGATI DOUND C22019111138
SAVI GANDEWAR C22019111141

in partial fulfillment for the award of the degree of


Bachelor of Technology in
ELECTRONICS AND TELECOMMUNICATION of
SAVITRIBAI PHULE PUNE UNIVERSITY,
under the guidance of
Dr. Mrudul Dixit

Sponsored by : - Naya Studio, Boston, USA

in the Department of Electronics and Telecommunication of


CUMMINS COLLEGE OF ENGINEERING FOR WOMEN ,
KARVENAGAR,
PUNE - 411052 ( An Autonomous Institute affiliated to SAVITRIBAI
PHULE PUNE UNIVERSITY )

Academic year
2022-23

MKSSS’S CUMMINS COLLEGE OF ENGINEERING FOR WOMEN, PUNE



a) Project title : - Building Software Features Using Pre-trained AI/ML Models For A Product Design Platform

b) Subject area : - Software Development/AI

c) Nature of the Project : - Software


This is to certify that


ANUSHKA CHIKHALE
PRAJAKTA DESHPANDE
PRAGATI DOUND
SAVI GANDEWAR

have presented a SEMINAR on their PROJECT TOPIC


Building Software Features Using Pre-trained AI/ML Models For A Product
Design Platform
__________________________________________________________________________

in partial fulfillment for the award of the degree of

Bachelor of Technology in ELECTRONICS AND TELECOMMUNICATION of


SAVITRIBAI PHULE PUNE UNIVERSITY,
in

CUMMINS COLLEGE OF ENGINEERING FOR WOMEN , KARVENAGAR ,


PUNE-52 ( An Autonomous Institute affiliated to SAVITRIBAI PHULE PUNE
UNIVERSITY )

Academic year : 2022-23

________________ _______________ ________________


Internal Guide Head of the Department Principal
[ Dr. Mrudul Dixit ] [Dr. Prachi Mukherji] [Dr. M.B.Khambete]


Acknowledgement

We would like to thank Vivek H V who gave us a golden opportunity to work on this project for
NAYA Studio. We would also like to thank him for his constant motivation, valuable counsel
and advice in every possible way in spite of his busy schedule throughout our project activity.

We would like to express our sincere gratitude towards our project guide Dr. Mrudul Dixit for her
constant support and valuable guidance during the completion of this B.Tech Project.

We would also like to thank Dr. Prachi Mukherji (H.O.D., E&TC) for her constant encouragement,
valuable guidance, suggestions and her precious time in every possible way in spite of her busy
schedule throughout our project activity.

We take this opportunity to express our sincere thanks to all the teaching as well as non-teaching
staff of the E&TC department for their constant help whenever required. Finally, we express our
sincere thanks to all those who helped us directly or indirectly in many ways towards our
B.Tech project work.

Anushka Chikhale (C22019111125)
Prajakta Deshpande (C22019111133)
Pragati Dound (C22019111138)
Savi Gandewar (C22019111141)


Abstract

Technology has changed how products are designed. Today a designer can visualize and design a
space without being physically present in it, which shows how far constantly evolving
technologies have simplified the design process.
Text-to-image models use deep neural networks to translate a natural language description into
an image. A language model transforms the input text into a latent representation, and a
generative model produces an image based on that representation. The most effective models are
trained on large amounts of web-scraped image and text data.
Designers can use text-to-image models to realize their ideas and visualize them in real time.
Although the use of such technology to simplify the design process is fairly new, it is likely
to become mainstream in the future.
An interface is needed to act as a bridge between the users and the pre-trained model. The
interface will enable users to generate images from text prompts, store the images in their
workspace, and retrieve them when needed.


TABLE OF CONTENTS

1. INTRODUCTION

2. LITERATURE SURVEY
2.1 Product Table

3. SPECIFICATIONS

4. METHODOLOGY
4.1 DALL-E Mini
4.2 Vector Quantized Generative Adversarial Network (VQGAN)
4.3 Contrastive Language-Image Pre-Training (CLIP)

5. DETAIL DESIGN
5.1 User Flow
5.2 Database
5.2.1 Schema for Users Database
5.2.2 Schema for Images Database
5.3 API Documentation

6. RESULTS

7. EVALUATION

8. CONCLUSION

9. FUTURE SCOPE

10. REFERENCES

11. WORK PLAN


List Of Figures

Fig. no. | Name of figure | Description
1 | Methodology | Block diagram of DALL-E Model
2 | Block Diagram | Methodology of the Project
3 | User Flow | Flow Chart of the User Flow
4 | DeepDaze Result | Output of DeepDaze model
5 | Big Sleep Result | Output of BigSleep model
6 | GLIDE Result | Output of GLIDE model
7 | DALL-E mini | Output of DALL-E mini


1. Introduction
AI is a promising area of exploration that can greatly improve the user experience for designers
and gather relevant data during the development of specific applications. The result is a
growing appreciation for technology that simplifies complex systems and drives product
innovation.
Naya Studio is a platform that aims to create more inclusive and sustainable products through
an adaptive platform that embodies co-creation and trust while providing an incredible user
experience.
Building a tool that turns a product idea in the designer's mind into an image, using just
words or sentences, is a fascinating concept. AI/ML image generation models make exactly this
possible.
DALL-E is a deep learning generative model that produces images from natural language
descriptions called captions. It is a generative pre-trained transformer that can generate
images in multiple styles, including photorealistic imagery, paintings and emoji. The model is
trained on millions of images from the web together with their captions. Some concepts are
reproduced from what the model has memorized, but it can also create images of things that
cannot exist.
Several models are combined to achieve these results: an image encoder that converts a raw
image into a sequence of numbers, with an associated decoder; a model that converts text
prompts into encoded images; and a model that judges the quality of the generated images for
better filtering.
An interface is needed to serve as a link between users and the pre-trained model. Through this
UI, users will be able to create images from text prompts, store the images in their
workspaces, and retrieve them as needed.


2. Literature Survey

Sr. no. | Title of Research paper | Journal name and Year of Publication | Model | Hyperlink
1 | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | Org paper, 2022 | Imagen | https://arxiv.org/pdf/2205.11487.pdf
2 | Text to Image using Deep Learning | IJERT, 2021 | GAN | https://www.ijert.org/research/text-to-image-using-deep-learning-IJERTV10IS040132.pdf
3 | Learning Transferable Visual Models From Natural Language Supervision | Org paper, 2021 | CLIP | https://arxiv.org/pdf/2103.00020.pdf
4 | Hierarchical Text-Conditional Image Generation with CLIP Latents | Org paper, 2022 | DALL-E 2 | https://cdn.openai.com/papers/dall-e-2.pdf
5 | Text-Based Real Image Editing with Diffusion Models | Org paper, 2022 | Imagic | https://arxiv.org/pdf/2210.09276.pdf


2.1 Product Table

Sr. No. | Name of the existing Product | Place where it is in use | Cost | Link of its website
1 | Text to image generator | Developers | $5 per 100 API Calls | https://deepai.org/machine-learning-model/text2img
2 | Canva AI image generator | Designers, Artists | Premium Subscription required | https://www.canva.com/features/ai-image-generator/


3. Specifications

HARDWARE SPECIFICATIONS :

Memory: 8 GB RAM

Processor: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz 1.19 GHz.

GPU: Tesla K80 GPU, 12 GB memory

SOFTWARE SPECIFICATIONS :
Operating System: Windows 11 Home 64-bit

Software Platform: Jupyter Notebook and Google Colab, Figma

Language : Python

Libraries : PyTorch, ftfy, regex, tqdm, JAX

Models : DALL-E mini, CLIP, VQGAN

4. Methodology

After an initial literature survey and research, the following models were found to be the most
suitable for this project:

4.1 DALL-E Mini


During training, the VQGAN encoder first encodes each image into a sequence of discrete tokens.
The text description is likewise passed through a BART encoder.
The output of the BART encoder and the encoded image tokens are then fed to the BART decoder.
The decoder is an auto-regressive model that learns to predict the next token in the sequence.
Once the model has been trained to predict these tokens correctly, captions can be used to
generate images. Using the provided caption as a prompt, image tokens are sampled one at a time
from the decoder's predicted distribution over the next token. In this way the BART decoder
returns several candidate image encodings conditioned on the output of the BART encoder. The
VQGAN decoder then converts these token sequences into images. Finally, CLIP ranks the output
images so that the best one(s) can be displayed.
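
The token-by-token sampling described above can be illustrated with a minimal sketch. It is not the DALL-E mini implementation: `toy_next_token_logits` is a hypothetical stand-in for the BART decoder, and `VOCAB_SIZE`, `SEQ_LEN` and the caption ids are made-up values chosen only to keep the example small.

```python
import math
import random

VOCAB_SIZE = 16   # hypothetical codebook size; real models use thousands of codes
SEQ_LEN = 8       # hypothetical number of image tokens per image


def toy_next_token_logits(caption_ids, image_tokens):
    """Stand-in for the BART decoder: one score per codebook entry.

    A real decoder attends to the encoded caption and to the image tokens
    generated so far. This toy version only mixes both inputs into a seed,
    so the scores are deterministic and depend on the caption and the prefix.
    """
    seed = sum(caption_ids) * 31 + sum(image_tokens) * 17 + len(image_tokens)
    scorer = random.Random(seed)
    return [scorer.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_image_tokens(caption_ids, rng):
    """Sample image tokens one at a time, each conditioned on the prefix."""
    tokens = []
    for _ in range(SEQ_LEN):
        probs = softmax(toy_next_token_logits(caption_ids, tokens))
        next_token = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        tokens.append(next_token)
    return tokens


rng = random.Random(0)
caption_ids = [3, 7, 1]  # pretend token ids for a caption
candidates = [sample_image_tokens(caption_ids, rng) for _ in range(4)]
for sequence in candidates:
    print(sequence)
```

Each printed list plays the role of one candidate image encoding. In the real pipeline these sequences would be passed to the VQGAN decoder and the resulting images ranked by CLIP.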

4.2 Vector Quantized Generative Adversarial Network (VQGAN)


VQGAN is a generative image model that combines the benefits of convolutional neural networks
(CNNs) and transformers for image synthesis.
It uses a codebook, a table of vectors in which each vector represents a “perceptually rich
image constituent”. The encoder represents the input image as a sequence of codes from the
codebook, and the decoder later uses this sequence (and the codebook) to reconstruct the
original image.
The intermediate sequence representation is passed to a transformer, which is trained on a
language modeling task. If the sequential representation of the image is denoted as
$S = (s_0, s_1, s_2, \dots, s_n)$
then the transformer is trained to predict the next value in this sequence. In this way, the
transformer learns how distant parts of the image are related to each other.
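
As a rough illustration of the codebook idea, the sketch below maps feature vectors to the index of their nearest codebook entry and looks them up again. The codebook values and the `features` list are invented for the example; a real VQGAN learns its codebook and works on much higher-dimensional encoder outputs.

```python
# Toy codebook: each row is one code vector (values are made up).
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features):
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda i: squared_distance(vec, CODEBOOK[i]))
        for vec in features
    ]


def lookup(indices):
    """Recover the quantized vectors from a sequence of codebook indices."""
    return [CODEBOOK[i] for i in indices]


# Pretend encoder output for four image patches.
features = [[0.1, -0.2], [0.9, 0.2], [0.2, 0.8], [0.7, 1.1]]
codes = quantize(features)
print(codes)          # [0, 1, 2, 3], the index sequence S
print(lookup(codes))  # the code vectors a decoder would receive
```

The index list `codes` corresponds to the sequence S on which the transformer is trained.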

4.3 Contrastive Language-Image Pre-Training (CLIP)


At the core of CLIP lies natural language supervision rather than unsupervised or
self-supervised learning. CLIP uses a text encoder and an image encoder to relate an image to a
relevant label or caption. Both encoders embed their inputs in a shared vector space, where the
similarity between an image and a given text can be measured. During training, CLIP maximizes
the cosine similarity between the embeddings of matching image-text pairs and minimizes it for
mismatched pairs, so that the similarity score reflects how well an image matches a text.
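
The ranking step can be sketched with plain cosine similarity. The embeddings below are made-up 3-dimensional vectors and the candidate names are hypothetical; in practice the vectors would come from CLIP's text and image encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return image names ordered from most to least similar to the text."""
    scores = {
        name: cosine_similarity(text_embedding, emb)
        for name, emb in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up embeddings; real CLIP embeddings have hundreds of dimensions.
text_embedding = [1.0, 0.0, 1.0]
image_embeddings = {
    "candidate_a": [0.9, 0.1, 1.1],
    "candidate_b": [0.0, 1.0, 0.0],
    "candidate_c": [1.0, 1.0, 0.0],
}
print(rank_images(text_embedding, image_embeddings))
# ['candidate_a', 'candidate_c', 'candidate_b']
```

The first name in the list is the candidate whose embedding points most nearly in the same direction as the text embedding.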

Fig 1 : DALL-E mini Structure


Fig 2 : Methodology

After the literature survey and the finalization of models, the project proceeds as follows:
● Run the models and compare their outputs for the same input.
● Based on the comparison, select the best model for developing the software feature.
● Make a user-flow diagram describing the flow of data as the user navigates the website.
● Design the website wireframe and prototype in Figma.
● Develop a front-end for the website and establish a connection between the model and the
front-end.
● Accept text input from the user in order to generate images.
● Fetch the images generated by the model and display them to the user.
● Enable the user to select an appropriate image.
● Save the image selected by the user to the database.
● Display the image in the user's workspace.


5. Detail Design

5.1 User Flow:


A user flow diagram visually describes the logical path that users follow when interacting with
a website. It identifies everything from the entry point of the website to the pages the user
navigates to, including all the intermediate steps and the flow of data along the way.
Figure 3 shows the user flow diagram.


Fig 3 : User flow

5.2 Database:
A database is an organized collection of structured information or data, usually stored
electronically in a computer system. This project requires two databases: one to store the basic
details of the platform's users and another to store the images generated by the model. The user
database will store each user's email, full name, profession and password. After successful
login, every user will be assigned a unique user id for easy identification.
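
A minimal sketch of how such a user store could look, using SQLite from the Python standard library. The table name, column names and the `add_user` helper are assumptions made for illustration, as is the choice to store a salted hash instead of the raw password; the report's own schema may differ.

```python
import hashlib
import os
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """
    CREATE TABLE users (
        user_id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique user id
        email         TEXT NOT NULL UNIQUE,
        full_name     TEXT NOT NULL,
        profession    TEXT,
        password_salt BLOB NOT NULL,
        password_hash BLOB NOT NULL  -- assumption: store a hash, not plain text
    )
    """
)


def add_user(email, full_name, profession, password):
    """Insert a user and return the generated user id."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    cursor = conn.execute(
        "INSERT INTO users (email, full_name, profession, password_salt, password_hash) "
        "VALUES (?, ?, ?, ?, ?)",
        (email, full_name, profession, salt, digest),
    )
    conn.commit()
    return cursor.lastrowid


# Hypothetical example user.
user_id = add_user("designer@example.com", "Example Designer", "Product designer", "s3cret")
print(user_id)  # 1
```

The `UNIQUE` constraint on `email` means a second registration with the same address raises `sqlite3.IntegrityError`, which an application could turn into a friendly error message.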


5.2.1 Schema for users database :


The second database will contain the user's previously saved images and any new images that the
user wants to save, along with the image captions. It will also include a unique image id for
easy identification of each image.
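
The image store can be sketched in the same way. The table layout, the `save_image` and `load_workspace` helpers, and the decision to keep raw image bytes in a BLOB column are all assumptions for illustration; apart from the "Mushroom shaped chair" prompt, the sample captions are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """
    CREATE TABLE images (
        image_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique image id
        user_id    INTEGER NOT NULL,  -- owner; refers to the users database
        caption    TEXT NOT NULL,     -- prompt used to generate the image
        image_data BLOB NOT NULL      -- assumption: raw image bytes
    )
    """
)


def save_image(user_id, caption, image_data):
    """Store a selected image and return its generated image id."""
    cursor = conn.execute(
        "INSERT INTO images (user_id, caption, image_data) VALUES (?, ?, ?)",
        (user_id, caption, image_data),
    )
    conn.commit()
    return cursor.lastrowid


def load_workspace(user_id):
    """Return (image_id, caption) pairs for every image saved by a user."""
    rows = conn.execute(
        "SELECT image_id, caption FROM images WHERE user_id = ? ORDER BY image_id",
        (user_id,),
    )
    return rows.fetchall()


save_image(1, "Mushroom shaped chair", b"placeholder image bytes")
save_image(1, "Mushroom shaped lamp", b"placeholder image bytes")
save_image(2, "Wooden desk", b"placeholder image bytes")
print(load_workspace(1))
# [(1, 'Mushroom shaped chair'), (2, 'Mushroom shaped lamp')]
```

Filtering by `user_id` is what lets each user see only their own saved images when the workspace is loaded.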


5.2.2 Schema for images database :


5.3 API Documentation:


Application Programming Interface (API) documentation describes the API calls that will be
made in our project.


6. Results
For the text input “Mushroom shaped chair”, the images generated by the different models are as
follows:

Outputs :

Model : DeepDaze (CLIP+ siren)

Fig 4 : DeepDaze Result

As Fig. 4 shows, the image is blurry and does not even properly outline the object.

Model : BigSleep (CLIP + BigGAN)

Fig 5 : BigSleep Result


In Fig. 5, the object's shape is only partially discernible; the output would need significant
improvement before the object could be clearly recognized.


Model : GLIDE

Fig 6 : GLIDE Result


As seen in Fig. 6, the image is accurate, although it lacks creativity. However, the user's
options are limited with this model.

Model : DALL-E mini

Fig 7 : DALL-E Mini

As Fig. 7 shows, this model provides the user with a wide variety of choices while producing
accurate results.


FIGMA PROTOTYPE
A wireframe outlining what the final project will look like.

This is the login page where users sign in, create accounts, and access their workspaces.

This is the user's workspace. It is visible only to the user, who can enter the prompt for the
images here.


For example : The user enters the prompt “ Mushroom shaped chair ” and clicks on create.

The pre-trained model generates a set of candidate images and displays them to the user on the page.


To save any of the generated images, the user only needs to click on it.

Once saved, the image appears in the user's workspace, which may also be accessed by clicking
"My Workspace" in the top-left corner of the page.


7. Evaluation
● The automated performance metric used internally is the CLIP score.
● The CLIP score has limitations; for example, CLIP is ineffective at counting. Because of such
limitations, human evaluation is used to assess image quality and caption similarity.
● From visual inspection, DALL-E mini generates the image that best matches the caption.
● Fast image generation.
● High-quality scene images (resolution: 256 x 256 pixels).
● Speed: 40 sec per image.

8. Conclusion

● The model DALL-E mini is most suitable for building the feature as it produces more
realistic images.
● This model has lower computational time.
● DALL-E mini also gives a more accurate image according to the caption provided.

9. Future scope

● Creating a user interface (UI) for the chosen pre-trained model to make it easier to use
and more effective.
● Creating a backend that uses a database and API to store the user's images and keep
them accessible even after the user logs out of the session.
● Adding a functionality that modifies the provided picture based on the user's request.


10. References

[1] Chitwan Saharia, William Chan et al., “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding”, 2022

[2] Akanksha Singh, Ritika Shenoy, Sonam Anekar, Prof. Sainath Patil, “Text to Image using Deep Learning”, IJERT, Volume 10 Issue 4, 2278-0181, IJERTV10IS040132, 2021

[3] Alec Radford et al., “Learning Transferable Visual Models From Natural Language Supervision”, 2021

[4] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen, “Hierarchical Text-Conditional Image Generation with CLIP Latents”, 2022

[5] Bahjat Kawar, Huiwen Chang et al., “Text-Based Real Image Editing with Diffusion Models”, 2022

[6] https://colab.research.google.com/drive/1wwyTCWYNqTZbV0KFhqbaIRLesmHEZtB4?usp=sharing

[7] https://colab.research.google.com/drive/1_hPc8DDGIwLPGLiM7LgC_0AUkQ1MmEWb?authuser=1


11. WORK PLAN
