
Image Classification and Generation of Images, Digital Art Using Prompts with the Help of Generative AI
Capstone Project
Nihal Raj
20BIT0417
Guide: Dr. L. Agilandeeswari
Proposed Methodology & Architecture
• This project aims to utilize generative AI techniques, specifically the Stable Diffusion technique, to generate quality images. Stable Diffusion is a machine learning approach in which noise is gradually added to an image over multiple steps and a model learns to reverse that corruption, refining the image while maintaining stability. The generated images will be produced using the Hugging Face interface, a popular platform for natural language processing and machine learning tasks.
Stable Diffusion
• Stable diffusion is a machine learning method that pairs a forward process, which iteratively adds noise to an image, with a learned reverse process that removes that noise step by step while maintaining stability and coherence. This technique allows the model to gradually refine its understanding of the image without losing important features or details. By diffusing noise over multiple steps and learning to undo it, the model can generate high-quality images with realistic textures and structures.
Stable Diffusion
• The image of a cat below is a training image, given as input together with a text caption, usually called "alt text", that precisely describes the object in the image. The sequence shows noise being added bit by bit, alteration by alteration, until the image is completely distorted into 100% noise.
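To make this noising sequence concrete, here is a minimal sketch (in Python/PyTorch, not from the original slides) of the closed-form forward diffusion step that produces progressively noisier versions of an image. The linear beta schedule and image shape are illustrative assumptions.

import torch

# Illustrative linear noise schedule; real models tune these values.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # fraction of signal kept at step t

def add_noise(x0, t):
    # Closed-form forward diffusion:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    eps = torch.randn_like(x0)                  # fresh Gaussian noise
    ab = alpha_bars[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

x0 = torch.rand(3, 64, 64)                      # stand-in for the cat training image
slightly_noisy = add_noise(x0, 50)              # early step: mostly image
pure_noise = add_noise(x0, T - 1)               # final step: essentially 100% noise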
Stable Diffusion
The generative denoising process typically occurs through a combination of techniques, such as deep learning
models and optimization algorithms. A simplified overview of how it works is:
• 1. Input Image: The process starts with an input image that may be corrupted by noise or artifacts.
• 2. Generative Model: A generative model, often based on neural networks, is employed to denoise the input
image. This model learns to understand the underlying structure and features of the image data.
• 3. Training Phase: The generative model is trained on a dataset of clean images paired with their noisy
counterparts. During training, the model learns to map noisy images to their clean versions by minimizing a
loss function that measures the difference between the generated output and the ground truth clean image.
• 4. Inference Phase: Once the model is trained, it can be used to denoise new, unseen images. During
inference, the generative model takes a noisy image as input and produces a denoised version as output.
• 5. Post-processing: Sometimes, additional post-processing techniques may be applied to further enhance the
quality of the denoised image. These techniques could include filtering, smoothing, or sharpening operations.
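As an illustration of steps 1-3 above, the following sketch shows one DDPM-style training step in which a clean image is corrupted with known noise and the model learns to predict that noise. This is not the slides' actual code: model is a hypothetical network taking a noisy image and a timestep, and alpha_bars is the schedule from the sketch above.

import torch
import torch.nn.functional as F

def training_step(model, clean_images, alpha_bars, T=1000):
    # One denoising training step: corrupt clean images, predict the noise.
    b = clean_images.shape[0]
    t = torch.randint(0, T, (b,), device=clean_images.device)     # random timesteps
    noise = torch.randn_like(clean_images)                        # ground-truth noise
    ab = alpha_bars[t].view(b, 1, 1, 1)
    noisy = ab.sqrt() * clean_images + (1.0 - ab).sqrt() * noise  # noisy input (step 1)
    predicted_noise = model(noisy, t)                             # generative model (step 2)
    return F.mse_loss(predicted_noise, noise)                     # training loss (step 3)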
Architecture
1. Conditional Inputs:
- Conditional Inputs refer to the use of prompts or context to guide the generation process towards specific
image outcomes.
- These inputs provide additional guidance to the generator network, influencing the characteristics of the
generated images.
- Conditional Inputs play a crucial role in directing the generation process towards desired image
outcomes.

2. Sampling from Noise Distribution:
- During the initiation of the generation process, random noise vectors are sampled from a predefined noise
distribution.
- These noise vectors are utilized as the initial input to the generator network.
- The primary function of these noise vectors is to facilitate the generation of a wide range of diverse and
innovative images.
Architecture
3. Diffusion Steps:
- Diffusion Steps are an integral component of both the training and generation phases of
image generation.
- During training, noise is gradually added to input images; during generation, this process
runs in reverse.
- The generator iteratively works to attenuate noise and improve image quality until the desired level of
fidelity is achieved.

4. Output Image:
- The final result of the generation process is a high-fidelity image created by the generator network.
- This output is a combination of learned patterns and structures stored within the parameters of the
generator.
- It embodies the generative capabilities of the model and represents the culmination of the generation process.
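Steps 2-4 above correspond to the reverse (denoising) loop at generation time. The following is a simplified DDPM-style sampling sketch, not the slides' actual implementation; model is again a hypothetical noise-prediction network.

import torch

@torch.no_grad()
def sample(model, shape, betas):
    # Start from pure noise and iteratively denoise to an output image.
    T = len(betas)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                          # 2. sample from the noise distribution
    for t in reversed(range(T)):                    # 3. diffusion (denoising) steps
        eps = model(x, torch.tensor([t]))           # predicted noise at this step
        mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise          # inject fresh noise except at the end
    return x                                        # 4. output image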
Complete Design
• Training the model:
• a. Generator Network: The generator network is typically a deep neural network responsible for
generating high-quality images. It may consist of convolutional layers, residual blocks, attention
mechanisms, and normalization layers. During training, the generator learns to transform noise or low-
quality inputs into realistic images.
• b. Diffusion Process: The training process involves the diffusion process, where noise is gradually
added to the input images over multiple steps. At each step, the generator attempts to remove the added
noise and reconstruct the original image. This process helps the model learn to generate images while
maintaining stability and realism.
• c. Loss Function: The training is guided by a loss function, such as the mean squared error (MSE) or
perceptual loss, which measures the discrepancy between the generated images and the ground truth
images. The loss function drives the model to produce images that closely match the desired output.
• d. Optimization Algorithm: The optimization algorithm, such as stochastic gradient descent (SGD) or
Adam, is used to update the parameters of the generator network to minimize the loss function. Through
backpropagation, the model learns to adjust its parameters to improve image quality and stability over
time.
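A minimal sketch of how pieces (a)-(d) fit together in an outer loop, reusing training_step and alpha_bars from the earlier sketches; model, dataloader, and num_epochs are hypothetical placeholders.

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)      # (d) Adam optimizer

for epoch in range(num_epochs):
    for clean_images in dataloader:                            # noisy pairs made on the fly
        loss = training_step(model, clean_images, alpha_bars)  # (b) diffusion + (c) MSE loss
        optimizer.zero_grad()
        loss.backward()                                        # backpropagation
        optimizer.step()                                       # (a) update generator parameters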
Complete Design
Generation:
• a. Conditional Inputs: The generation process involves conditional inputs, such as prompts, to
guide the generation of specific types of images. These inputs provide additional context to
the generator network and influence the generated outputs accordingly.
• b. Sampling from Noise Distribution: The generation process starts by sampling random
noise vectors from a predefined noise distribution. These noise vectors serve as the initial
inputs to the generator network.
• c. Diffusion Steps: Unlike the training process, where noise is added, the generation process
runs the diffusion in reverse over multiple steps (see the sketch following this list). At each step,
the generator attempts to denoise the input and produce a more realistic image. This iterative
process continues until the desired level of image quality is achieved.
• d. Output Image: The final output of the generation process is a high-quality image
generated by the generator network. This image is typically the result of multiple diffusion
steps and reflects the learned patterns and structures encoded in the generator's parameters.
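One common mechanism by which conditional inputs (item a) steer each denoising step is classifier-free guidance; the slides do not name this technique, so the following sketch is an illustrative assumption, with a hypothetical text-conditioned model.

import torch

def guided_noise(model, x, t, prompt_emb, empty_emb, guidance_scale=7.5):
    # Blend unconditional and prompt-conditioned noise predictions so the
    # denoising trajectory is pulled toward images matching the prompt.
    eps_uncond = model(x, t, empty_emb)      # prediction without the prompt
    eps_cond = model(x, t, prompt_emb)       # prediction guided by the prompt
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)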
Module description
• Stable diffusion
• Stable diffusion is a machine learning method that pairs a forward process, which iteratively
adds noise to an image, with a learned reverse process that removes it, while maintaining
stability and coherence. This technique allows the model to gradually refine its understanding
of the image without losing important features or details.
Module description
• Hugging Face Interface:
• Hugging Face is a leading platform for natural language processing (NLP) and machine
learning tasks. It provides a user-friendly interface for accessing pre-trained models,
training custom models, and deploying machine learning applications. In this project,
the Hugging Face interface will be used to implement and deploy the generative AI
model for image generation. The interface offers various tools and resources for
working with machine learning models, making it an ideal choice for this project.
Hence, the model that we implemented for our project is:

stabilityai/stable-diffusion-2
• This model is trained on a huge dataset of images of various types, each paired with a
related alt text, which is used for supervised learning.
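As a usage sketch, the model can be loaded through Hugging Face's diffusers library roughly as follows; the prompt is taken from the results section, while GPU use and float16 precision are assumptions.

import torch
from diffusers import StableDiffusionPipeline

# Load the pretrained Stable Diffusion 2 weights from the Hugging Face Hub.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,   # assumption: half precision on a CUDA GPU
)
pipe = pipe.to("cuda")

# Generate an image from a text prompt and save it.
image = pipe("Beach view from a building").images[0]
image.save("beach_view.png")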
Implementation and Results
• Following are the images we got from the implementation of the image generation model:
• Prompt: Beach view from a building
Implementation and Results
• Providing ratings to each photo
Implementation and Results
• Final image after rating each photo
Implementation and Results
• Final image editing
Implementation and Results
• Final image after rating each photo
Testing and Performance Metrics
Research findings:
To assess the quality of images created by generative models, it is common to use
the Fréchet Inception Distance (FID) metric. In a nutshell, FID calculates the
distance between the feature vectors of real images and generated images. On the
COCO benchmark, Imagen achieved the best (lowest) zero-shot FID score of 7.27,
outperforming DALL·E 2 with a 10.39 FID score.

From the original Latent Diffusion paper, the Stable Diffusion model (LDM) reached
a 12.63 FID score.
Testing and Performance Metrics
Fréchet Inception Distance (FID): assessing image distribution similarity.
FID is a cornerstone metric that measures the distance between the distributions
of generated and real images. Lower FID scores signify a closer match between generated
and real-world images, indicating superior model performance in mimicking
real data distributions.
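Concretely, FID fits a Gaussian to Inception feature vectors of the real and generated image sets and measures the distance between the two Gaussians. A minimal NumPy/SciPy sketch of the formula follows; extracting the feature vectors with an Inception network is assumed to have happened already.

import numpy as np
from scipy import linalg

def fid(real_features, gen_features):
    # FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 * sqrt(C_r @ C_g))
    mu_r, mu_g = real_features.mean(axis=0), gen_features.mean(axis=0)
    cov_r = np.cov(real_features, rowvar=False)
    cov_g = np.cov(gen_features, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)       # matrix square root of the product
    if np.iscomplexobj(covmean):                # discard tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean)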
Thank you.
