
Image Classification and Generation of Images, Digital Art Using Prompts with the Help of Generative AI
Capstone Project
Nihal Raj
20BIT0417
Guide: Dr. L. Agilandeeswari
Proposed Methodology & Architecture
• This project aims to utilize generative AI techniques, specifically the Stable Diffusion technique, to generate quality images. Stable Diffusion is a machine learning approach in which noise is gradually added to an image over multiple steps and a model learns to reverse that corruption, refining the image while maintaining stability. The generated images will be produced using the Hugging Face interface, a popular platform for natural language processing and machine learning tasks.
Stable Diffusion
• Stable diffusion is a machine learning method that pairs a forward process, which iteratively adds noise to an image, with a learned reverse process that removes that noise step by step while maintaining stability and coherence. This technique allows the model to gradually refine its understanding of the image without losing important features or details. By diffusing noise over multiple steps and learning to undo it, the model can generate high-quality images with realistic textures and structures.
Stable Diffusion
• The image of a cat below is a training image, given as input together with a text caption, usually called "alt text", that precisely describes the object in the image. The sequence shows noise being added bit by bit, alteration by alteration, until the image is completely distorted into 100% noise.
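To make this noising sequence concrete, here is a minimal sketch (in Python/PyTorch, not from the original slides) of the closed-form forward diffusion step that produces progressively noisier versions of an image. The linear beta schedule and image shape are illustrative assumptions.

import torch

# Illustrative linear noise schedule; real models tune these values.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # fraction of signal kept at step t

def add_noise(x0, t):
    # Closed-form forward diffusion:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    eps = torch.randn_like(x0)                  # fresh Gaussian noise
    ab = alpha_bars[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

x0 = torch.rand(3, 64, 64)                      # stand-in for the cat training image
slightly_noisy = add_noise(x0, 50)              # early step: mostly image
pure_noise = add_noise(x0, T - 1)               # final step: essentially 100% noise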
Stable Diffusion
The generative denoising process typically occurs through a combination of techniques, such as deep learning
models and optimization algorithms. A simplified overview of how it works is:
• 1. Input Image: The process starts with an input image that may be corrupted by noise or artifacts.
• 2. Generative Model: A generative model, often based on neural networks, is employed to denoise the input
image. This model learns to understand the underlying structure and features of the image data.
• 3. Training Phase: The generative model is trained on a dataset of clean images paired with their noisy
counterparts. During training, the model learns to map noisy images to their clean versions by minimizing a
loss function that measures the difference between the generated output and the ground truth clean image.
• 4. Inference Phase: Once the model is trained, it can be used to denoise new, unseen images. During
inference, the generative model takes a noisy image as input and produces a denoised version as output.
• 5. Post-processing: Sometimes, additional post-processing techniques may be applied to further enhance the
quality of the denoised image. These techniques could include filtering, smoothing, or sharpening operations.
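As an illustration of steps 1-3 above, the following sketch shows one DDPM-style training step in which a clean image is corrupted with known noise and the model learns to predict that noise. This is not the slides' actual code: model is a hypothetical network taking a noisy image and a timestep, and alpha_bars is the schedule from the sketch above.

import torch
import torch.nn.functional as F

def training_step(model, clean_images, alpha_bars, T=1000):
    # One denoising training step: corrupt clean images, predict the noise.
    b = clean_images.shape[0]
    t = torch.randint(0, T, (b,), device=clean_images.device)     # random timesteps
    noise = torch.randn_like(clean_images)                        # ground-truth noise
    ab = alpha_bars[t].view(b, 1, 1, 1)
    noisy = ab.sqrt() * clean_images + (1.0 - ab).sqrt() * noise  # noisy input (step 1)
    predicted_noise = model(noisy, t)                             # generative model (step 2)
    return F.mse_loss(predicted_noise, noise)                     # training loss (step 3)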
Architecture
1. Conditional Inputs:
- Conditional Inputs refer to the use of prompts or context to guide the generation process towards specific
image outcomes.
- These inputs provide additional guidance to the generator network, influencing the characteristics of the
generated images.
- Conditional Inputs play a crucial role in directing the generation process towards desired image
outcomes.

2. Sampling from Noise Distribution:
- During the initiation of the generation process, random noise vectors are sampled from a predefined noise
distribution.
- These noise vectors are utilized as the initial input to the generator network.
- The primary function of these noise vectors is to facilitate the generation of a wide range of diverse and
innovative images.
Architecture
3. Diffusion Steps:
- Diffusion Steps are an integral component of both the training and generation phases of
image generation.
- During training, noise is gradually added to input images; during generation, this process
runs in reverse.
- The generator iteratively works to attenuate noise and improve image quality until the desired level of
fidelity is achieved.

4. Output Image:
- The final result of the generation process is a high-fidelity image created by the generator network.
- This output is a combination of learned patterns and structures stored within the parameters of the
generator.
- It embodies the generative capabilities of the model and represents the culmination of the generation process.
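Steps 2-4 above correspond to the reverse (denoising) loop at generation time. The following is a simplified DDPM-style sampling sketch, not the slides' actual implementation; model is again a hypothetical noise-prediction network.

import torch

@torch.no_grad()
def sample(model, shape, betas):
    # Start from pure noise and iteratively denoise to an output image.
    T = len(betas)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                          # 2. sample from the noise distribution
    for t in reversed(range(T)):                    # 3. diffusion (denoising) steps
        eps = model(x, torch.tensor([t]))           # predicted noise at this step
        mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise          # inject fresh noise except at the end
    return x                                        # 4. output image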
Complete Design
• Training the model:
• a. Generator Network: The generator network is typically a deep neural network responsible for
generating high-quality images. It may consist of convolutional layers, residual blocks, attention
mechanisms, and normalization layers. During training, the generator learns to transform noise or low-
quality inputs into realistic images.
• b. Diffusion Process: The training process involves the diffusion process, where noise is gradually
added to the input images over multiple steps. At each step, the generator attempts to remove the added
noise and reconstruct the original image. This process helps the model learn to generate images while
maintaining stability and realism.
• c. Loss Function: The training is guided by a loss function, such as the mean squared error (MSE) or
perceptual loss, which measures the discrepancy between the generated images and the ground truth
images. The loss function drives the model to produce images that closely match the desired output.
• d. Optimization Algorithm: The optimization algorithm, such as stochastic gradient descent (SGD) or
Adam, is used to update the parameters of the generator network to minimize the loss function. Through
backpropagation, the model learns to adjust its parameters to improve image quality and stability over
time.
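A minimal sketch of how pieces (a)-(d) fit together in an outer loop, reusing training_step and alpha_bars from the earlier sketches; model, dataloader, and num_epochs are hypothetical placeholders.

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)      # (d) Adam optimizer

for epoch in range(num_epochs):
    for clean_images in dataloader:                            # noisy pairs made on the fly
        loss = training_step(model, clean_images, alpha_bars)  # (b) diffusion + (c) MSE loss
        optimizer.zero_grad()
        loss.backward()                                        # backpropagation
        optimizer.step()                                       # (a) update generator parameters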
Complete Design
Generation:
• a. Conditional Inputs: The generation process involves conditional inputs, such as prompts, to
guide the generation of specific types of images. These inputs provide additional context to
the generator network and influence the generated outputs accordingly.
• b. Sampling from Noise Distribution: The generation process starts by sampling random
noise vectors from a predefined noise distribution. These noise vectors serve as the initial
inputs to the generator network.
• c. Diffusion Steps: Unlike the training process, where noise is added, the generation process
runs the diffusion in reverse over multiple steps (see the sketch following this list). At each step,
the generator attempts to denoise the input and produce a more realistic image. This iterative
process continues until the desired level of image quality is achieved.
• d. Output Image: The final output of the generation process is a high-quality image
generated by the generator network. This image is typically the result of multiple diffusion
steps and reflects the learned patterns and structures encoded in the generator's parameters.
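One common mechanism by which conditional inputs (item a) steer each denoising step is classifier-free guidance; the slides do not name this technique, so the following sketch is an illustrative assumption, with a hypothetical text-conditioned model.

import torch

def guided_noise(model, x, t, prompt_emb, empty_emb, guidance_scale=7.5):
    # Blend unconditional and prompt-conditioned noise predictions so the
    # denoising trajectory is pulled toward images matching the prompt.
    eps_uncond = model(x, t, empty_emb)      # prediction without the prompt
    eps_cond = model(x, t, prompt_emb)       # prediction guided by the prompt
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)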
Module description
• Stable diffusion
• Stable diffusion is a machine learning method that pairs a forward process, which iteratively
adds noise to an image, with a learned reverse process that removes it, while maintaining
stability and coherence. This technique allows the model to gradually refine its understanding
of the image without losing important features or details.
Module description
• Hugging Face Interface:
• Hugging Face is a leading platform for natural language processing (NLP) and machine
learning tasks. It provides a user-friendly interface for accessing pre-trained models,
training custom models, and deploying machine learning applications. In this project,
the Hugging Face interface will be used to implement and deploy the generative AI
model for image generation. The interface offers various tools and resources for
working with machine learning models, making it an ideal choice for this project.
Hence, the model that we implemented for our project is:

stabilityai/stable-diffusion-2
• This model is trained on a huge dataset of images of various types, each paired with a
related alt text, which is used for supervised learning.
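As a usage sketch, the model can be loaded through Hugging Face's diffusers library roughly as follows; the prompt is taken from the results section, while GPU use and float16 precision are assumptions.

import torch
from diffusers import StableDiffusionPipeline

# Load the pretrained Stable Diffusion 2 weights from the Hugging Face Hub.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,   # assumption: half precision on a CUDA GPU
)
pipe = pipe.to("cuda")

# Generate an image from a text prompt and save it.
image = pipe("Beach view from a building").images[0]
image.save("beach_view.png")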
Implementation and Results
• Following are the images we got from the implementation of the image generation model:
• Prompt: Beach view from a building
Implementation and Results
• Providing ratings to each photo
Implementation and Results
• Final image after rating each photo
Implementation and Results
• Final image editing
Implementation and Results
• Final image after rating each photo
Testing and Performance Metrics
Research findings:
To assess the quality of images created by generative models, it is common to use
the Fréchet Inception Distance (FID) metric. In a nutshell, FID calculates the
distance between the feature vectors of real images and generated images. On the
COCO benchmark, Imagen achieved the best (lowest) zero-shot FID score of 7.27,
outperforming DALL·E 2 with a 10.39 FID score.

From the original Latent Diffusion paper, the Stable Diffusion model (LDM) reached
a 12.63 FID score.
Testing and Performance Metrics
Fréchet Inception Distance (FID): assessing image distribution similarity.
FID is a cornerstone metric that measures the distance between the distributions
of generated and real images. Lower FID scores signify a closer match between generated
and real-world images, indicating superior model performance in mimicking
real data distributions.
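Concretely, FID fits a Gaussian to Inception feature vectors of the real and generated image sets and measures the distance between the two Gaussians. A minimal NumPy/SciPy sketch of the formula follows; extracting the feature vectors with an Inception network is assumed to have happened already.

import numpy as np
from scipy import linalg

def fid(real_features, gen_features):
    # FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 * sqrt(C_r @ C_g))
    mu_r, mu_g = real_features.mean(axis=0), gen_features.mean(axis=0)
    cov_r = np.cov(real_features, rowvar=False)
    cov_g = np.cov(gen_features, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)       # matrix square root of the product
    if np.iscomplexobj(covmean):                # discard tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean)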
Thank you.
