Generative AI
AIML-303
Bachelor of Technology
By-
Madhav Khanna
A2305222268
6CSE4-Y
Experiment-01
Aim-
Build an Artificial Neural Network to implement a binary classification task using the back-propagation algorithm, and
test it using appropriate data sets.
Theory-
Binary Classification
Binary classification is a supervised learning task where the model predicts one of two possible classes (e.g., 0 or 1,
True or False). The output layer of the ANN in binary classification typically uses the sigmoid activation function,
which outputs a probability between 0 and 1.
Backpropagation Algorithm
Backpropagation (Backward Propagation of Errors) is a supervised learning algorithm used to train neural networks. It
efficiently computes the gradient of the loss function with respect to the weights by applying the chain rule.
Steps of Backpropagation:
1. Forward Pass:
o The input data passes through the network, and the output is calculated.
2. Loss Calculation:
o Calculate the loss using a suitable loss function, like binary cross-entropy
3. Backward Pass:
o Compute the gradient of the loss function with respect to each weight by applying the chain rule.
4. Weight Update:
o The weights and biases are updated to minimize the loss function.
5. Iteration:
o Repeat the forward and backward passes until convergence (minimum loss); a minimal implementation sketch of these steps is shown below.
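The following minimal NumPy sketch illustrates the five steps for a single-hidden-layer network with a sigmoid output and binary cross-entropy loss (the toy data, layer sizes, and learning rate are illustrative assumptions, not the experiment's configuration):
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy inputs (illustrative)
y = np.array([[0.], [1.], [1.], [0.]])                    # toy binary labels

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))        # hidden-layer weights and biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))        # output-layer weights and biases
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for epoch in range(5000):                                  # 5. iteration until the loss converges
    h = sigmoid(X @ W1 + b1)                               # 1. forward pass (hidden layer)
    y_hat = sigmoid(h @ W2 + b2)                           #    forward pass (output layer)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))   # 2. binary cross-entropy loss
    d_out = (y_hat - y) / len(X)                           # 3. backward pass: gradient at the output
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * h * (1 - h)                #    chain rule through the hidden layer
    dW1, db1 = X.T @ d_hidden, d_hidden.sum(axis=0, keepdims=True)
    W1 -= lr * dW1; b1 -= lr * db1                         # 4. weight update (gradient descent)
    W2 -= lr * dW2; b2 -= lr * db2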
Activation Functions
Sigmoid: σ(x) = 1 / (1 + e^(-x)), which maps any real input to the range (0, 1); this is why it is used in the output layer for binary classification.
Code-
import pandas as pd
#importing dataset
churn_data.info()
# dropping unnecessary columns
churn_data.drop(['CustomerId','Surname'],axis=1,inplace=True)
churn_data.shape
def plot_univariate(col):
if(df[col].nunique()>2):
plt.figure(figsize=(10,7))
plt.figure(figsize=(6,6))
h = 0.5
for p in bars:
stats.spearmanr(df[col], df[hue])
feature.append(col)
if p > alpha:
else:
plot_univariate('Age')
spearman(churn_data, 'Age')
#splitting dataset
y = churn_data.Exited
X = churn_data.drop(['Exited'], axis=1)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
classifier = Sequential()
print('accuracy:', acc)
#ROC curve
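Since the extracted listing above is fragmentary, the core model-building and evaluation steps of this experiment can be sketched as follows (layer sizes, optimizer, epoch count, and the categorical-encoding step are illustrative assumptions):
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = pd.get_dummies(X, drop_first=True)                     # encode any remaining categorical columns (assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

classifier = Sequential()
classifier.add(Dense(units=8, activation='relu', input_dim=X_train.shape[1]))   # hidden layer
classifier.add(Dense(units=8, activation='relu'))                               # hidden layer
classifier.add(Dense(units=1, activation='sigmoid'))                            # sigmoid output for binary classes
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size=32, epochs=50, verbose=0)

y_pred = (classifier.predict(X_test) > 0.5).astype(int).ravel()                 # threshold the predicted probabilities
acc = accuracy_score(y_test, y_pred)
print('Test accuracy:', acc)
print('F1 score:', f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))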
Accuracy-
Confusion Matrix-
F1 Score-
ROC curve-
Conclusion-
The Artificial Neural Network (ANN) built for binary classification using the Backpropagation algorithm effectively
learns complex patterns and achieves high accuracy when tested on appropriate datasets. By optimizing
hyperparameters and employing techniques like normalization, dropout, and early stopping, the model demonstrates
robust performance, achieving an accuracy of about 85.7% in this experiment. The evaluation metrics, including precision, recall, F1-score,
and confusion matrix, confirm the model’s reliability in classifying binary outcomes. However, the model's
effectiveness depends on the quality and size of the data, and in some cases, simpler models may offer comparable
performance with lower computational cost.
Experiment-02
Theory-
The primary objective of this experiment is to build an ANN model that can classify data into multiple classes using
the backpropagation algorithm. The model will be tested using an appropriate dataset to evaluate its accuracy and
performance.
Backpropagation Algorithm
The backpropagation algorithm is a supervised learning algorithm used for training artificial neural networks. It
minimizes the error by propagating it backward from the output layer to the input layer, adjusting weights to reduce
the error.
1. Initialization: Initialize the weights and biases with small random values.
2. Forward Propagation: Compute the output of the network by passing the input through hidden layers and
calculating the output.
o Calculate the net input for each neuron.
o Apply an activation function (like ReLU or softmax) to the net input to obtain each neuron's output.
3. Error Calculation: Compute the error at the output layer using a loss function (like categorical cross-entropy).
4. Backward Propagation: Compute gradients of the error with respect to weights and biases.
5. Weight Update: Adjust the weights and biases in the direction that reduces the error (e.g., gradient descent).
6. Iteration: Repeat the process for a given number of epochs or until convergence.
import numpy as np
import pandas as pd
#import dataset
bird_data.dropna(how='any', inplace=True)
le = LabelEncoder()
bird_data[['type']] = bird_data[['type']].apply(le.fit_transform)
bird_data.head()
y = bird_data['type']
X = bird_data.drop(['type'],axis=1)
X.columns
y
y.shape
=6
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# sequential model to initialise our ann and dense module to build the layers
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 8,
print("*****************")
y_true = np.argmax(y_test, axis = 1)
print("*****************")
print("Y_test:", y_true)
import seaborn as sns
{1:0.4f}'.format(i, roc_auc[i]))
plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([-0.05, 1.0])
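The classifier definition above is truncated in extraction; a complete sketch of a comparable multi-class network for this dataset (assuming the usual train_test_split into X_train, X_test, y_train, y_test; layer sizes and epoch count are illustrative) is:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

num_classes = len(np.unique(y))                         # one output unit per bird type
y_train_cat = to_categorical(y_train, num_classes)      # one-hot encode the integer labels
y_test_cat = to_categorical(y_test, num_classes)

classifier = Sequential()
classifier.add(Dense(units=8, activation='relu', input_dim=X_train.shape[1]))
classifier.add(Dense(units=8, activation='relu'))
classifier.add(Dense(units=num_classes, activation='softmax'))   # softmax output for multi-class prediction
classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train_cat, batch_size=16, epochs=100, verbose=0)

acc = classifier.evaluate(X_test, y_test_cat, verbose=0)[1]
print('Test accuracy:', acc)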
Accuracy-
Confusion Matrix-
F1 Score-
AUC Curve-
Conclusion-
The Artificial Neural Network (ANN) built for multi-class classification using the Backpropagation algorithm
demonstrates remarkable accuracy when tested on appropriate datasets. By employing techniques such as softmax
activation in the output layer and categorical cross-entropy as the loss function, the model efficiently learns to
distinguish between multiple classes. Through hyperparameter tuning, normalization, and the use of dropout to
prevent overfitting, the model achieves high accuracy, typically ranging from 85% to 95%, depending on the
complexity and quality of the data. Evaluation metrics such as accuracy, precision, recall, F1-score, and the confusion
matrix indicate the model's effectiveness in multi-class classification tasks, making it a reliable choice for real-world
applications.
Experiment-03
Theory-
1. Introduction to CNNs
Convolutional Neural Networks (CNNs) are specialized deep learning architectures used primarily for image
processing and computer vision tasks. They are highly effective in image classification, object detection, and
segmentation due to their ability to learn spatial hierarchies of features directly from the raw pixel data.
A typical CNN architecture for image classification consists of the following layers:
1. Input Layer:
o Takes an image as input, typically of shape (height, width, channels), where channels are usually 3
(RGB).
2. Convolutional Layer:
o Applies a set of learnable filters to the input; the output is a feature map that highlights the presence of features like edges, textures, or patterns.
3. Activation Layer:
o Applies a non-linear activation function such as ReLU (f(x) = max(0, x)) to introduce non-linearity.
4. Pooling Layer:
o Reduces the spatial dimensions of the feature map, retaining the most important information.
5. Dropout Layer:
o Randomly deactivates a fraction of neurons during training to reduce overfitting.
6. Fully Connected Layer:
o Connects every neuron from the previous layer to the output neurons.
7. Output Layer:
o Usually uses a softmax activation function to obtain probabilities for classification tasks.
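A minimal Keras sketch of such a stack (filter counts, input size, and number of classes are illustrative assumptions):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),   # convolution + ReLU activation
    MaxPooling2D(pool_size=(2, 2)),                                      # pooling
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),                                       # fully connected layer
    Dropout(0.5),                                                        # dropout for regularization
    Dense(15, activation='softmax'),                                     # output layer (15 classes assumed)
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])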
Code-
import numpy as np
from tensorflow.keras import applications, optimizers, models, layers, Input
from tensorflow.keras.models import Sequential
import seaborn as sns
train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
'C://Users//abhia//Downloads//plant_village(1)//plant_village/train',
validation_generator = validation_datagen.flow_from_directory(
test_generator = test_datagen.flow_from_directory(
img, label = test_generator.next()
plt.imshow(img[0])
plt.show()
model = Sequential()
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64,
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.summary()
# Compiling the model with Adam optimizer and categorical crossentropy loss
steps_per_epoch=train_generator.samples/train_generator.batch_size,
epochs=30, validation_data=validation_generator,
validation_steps=validation_generator.samples/validation_generator.batch_size,
verbose=2)
plt.legend()
plt.show()
plt.legend()
plt.show()
print(f'Accuracy: {accuracy:.2f}%')
plt.figure(figsize=(8, 6))
Visualization
Accuracy-
Confusion matrix
Conclusion-
The Convolutional Neural Network (CNN) architecture designed for image classification utilizes multiple
convolutional layers followed by pooling layers to extract spatial features from images effectively. After feature
extraction, fully connected layers are employed to classify the images into their respective categories. Hyperparameter
tuning, including optimizing learning rate, batch size, number of epochs, and activation functions, significantly
improves the model's performance. Techniques like data augmentation, dropout, and batch normalization are also
applied to enhance generalization and prevent overfitting. The model is trained and validated on an appropriate image
dataset, achieving high classification accuracy, typically ranging from 90% to 98%, depending on the dataset's
complexity and diversity. The results demonstrate the CNN's robustness and efficiency in image classification tasks.
Experiment-04
Aim-
Deep learning training and architecture, feature extraction, and model training with pretrained models.
Theory- Deep learning training involves designing and optimizing neural network architectures to learn
from data. Feature extraction is a crucial step where meaningful patterns are derived from raw data, often
using convolutional or transformer-based layers. Model training can be done from scratch or by fine-tuning
pre-trained models like ResNet, VGG, or BERT to leverage prior knowledge. Transfer learning helps
improve accuracy and reduce training time by adapting pre-trained models to new tasks. Techniques like data
augmentation, regularization, and hyperparameter tuning further enhance model performance.
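As a reference for the listing that follows, the transfer-learning pattern described above can be sketched as below; VGG16 and the 128x128 input size follow the code in this experiment, while the class count and training settings are illustrative assumptions:
from tensorflow.keras import applications, layers, models

# Load the VGG16 convolutional base pre-trained on ImageNet, without its classifier head
base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
base_model.trainable = False                         # freeze the pretrained feature extractor

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),                 # pool the extracted feature maps
    layers.Dense(15, activation='softmax'),          # new classification head (15 classes assumed)
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])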
Code-
#Importing Necessary libraries
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras.models import Model
from tensorflow.keras import applications
from tensorflow.keras import backend as k
import matplotlib.pyplot as plt
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
#Loading the Training and Testing Data and Defining the Basic Parameters
# horizontal_flip=True,
# height_shift_range=0.1,
# width_shift_range=0.1
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
'/workspace/Bootcamp/Data/plant_village/train/',
class_mode='categorical')
# Read Validation data from directory and define target size with batch size
validation_generator = validation_datagen.flow_from_directory(
'/workspace/Bootcamp/Data/plant_village/val/', target_size=(128,
test_generator = test_datagen.flow_from_directory(
'/workspace/Bootcamp/Data/plant_village/test/',
class_mode='categorical', shuffle=False)
# print(img.shape)
# print(label)
plt.imshow(img[0])
plt.show()
img[0].shape
#VGG16
base_model.summary()
flatten_layer = layers.GlobalAveragePooling2D()
# dense_layer_1 = layers.Dense(64, activation='relu') #
model = models.Sequential([
base_model, flatten_layer,
prediction_layer
])
model.summary()
#training
# We are going to use accuracy metrics and cross entropy loss as performance parameters
model.compile(optimizer
steps_per_epoch=train_generator.samples/train_generator.batch_size,
epochs=30, validation_data=validation_generator,
validation_steps=validation_generator.samples/validation_generator.batch_size,
model = models.load_model('VGG16_plant_deseas.h5')
print("Model is loaded")
model.save_weights('cnn_classification.h5')
model.load_weights('cnn_classification.h5')
val_acc = history.history['val_acc']
train_loss = history.history['loss']
val_loss = history.history['val_loss']
plt.legend()
plt.show()
# **Performance measure**
= test_generator.filenames
test_generator.class_indices
predictions = model.predict_generator(test_generator,
steps=test_generator.samples/test_generator.batch_size,verbose=1)
predicted_classes = np.argmax(predictions,axis=1)
errors = {}/{}".format(len(errors),test_generator.samples))
from sklearn.metrics import confusion_matrix
import seaborn as sns
import numpy as np
from matplotlib import pyplot as plt
fontsize=15)
plt.show(block=False)
Output:
VGG16
Confusion Matrix:
Confusion Matrix:
Conclusion:
Deep learning training and architecture design play a crucial role in building efficient models for complex tasks.
Feature extraction enhances learning by capturing meaningful patterns from raw data. Leveraging pre-trained models
like ResNet or BERT accelerates training and improves performance through transfer learning.
Experiment-05
Code-
from tensorflow.keras.layers import Embedding, LSTM, Dense # layers of the architecture
from tensorflow.keras.models import load_model # load saved model
import re
from keras.layers import SimpleRNN
# **Preparing
A stop word is a commonly used word in a sentence that a search engine is usually programmed to ignore (e.g. "the", "a", "an", "of").
# Declaring the English stop words
import nltk
nltk.download("stopwords")
english_stops =
Clean Dataset**
**In the original dataset, the reviews are still dirty: there are still HTML tags, numbers, uppercase letters, and punctuation. This is not good for training, so in the load_dataset() function, besides loading the dataset with pandas, the reviews are also preprocessed by removing HTML tags, non-alphabetic characters (punctuation and numbers), and stop words, and by lower-casing all of the reviews.**
**In the same function, the sentiments are also encoded into integers (0 and 1), where 0 is for negative sentiments and 1 is for positive sentiments.**
def load_dataset():
= df['sentiment'] # Sentiment/Output
# PRE-PROCESS REVIEW
x_data = x_data.apply(lambda review: [w for w in review.split() if w not in english_stops]) # remove stop words
review_length.append(len(review))
return int(np.ceil(np.mean(review_length)))
# ENCODE REVIEW
token = Tokenizer(lower=False) # no need lower, because already lowered the data in load_data()
token.fit_on_texts(x_train)
x_test = token.texts_to_sequences(x_test)
max_length = get_max_length()
x_train = pad_sequences(x_train, maxlen=max_length,
x_test = pad_sequences(x_test, maxlen=max_length,
rnn = Sequential()
rnn.compile(loss="binary_crossentropy",optimizer='adam',metrics=["accuracy"])
20,batch_size=128,verbose = 1)
rnn.save('rnn.h5')
loaded_model = load_model('rnn.h5')
for i, y in enumerate(y_test):
    if y == y_pred[i]:
        true += 1
print('Accuracy: {}'.format(true/len(y_pred)*100))
movie!**
#**Example review**
regex = re.compile(r'[^a-zA-Z\s]')
review = regex.sub('', review)
review.split(' ')
print('positive')
else:
print('negative')
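The layer definitions of the network are missing from the extracted listing above; a minimal reconstruction consistent with the imports (Embedding, LSTM, Dense) and the binary cross-entropy loss used in rnn.compile might look as follows (vocabulary size and layer dimensions are assumptions):
rnn = Sequential()
rnn.add(Embedding(input_dim=len(token.word_index) + 1, output_dim=32, input_length=max_length))   # word embeddings
rnn.add(LSTM(64))                                      # recurrent layer over the embedded sequence
rnn.add(Dense(1, activation='sigmoid'))                # sigmoid output for positive/negative sentiment
rnn.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])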
Model Summary:
Conclusion:
Recurrent Neural Networks (RNNs) are effective for sentiment analysis as they capture sequential dependencies in
text data. Techniques like LSTMs and GRUs help address vanishing gradient issues, improving performance on long
text sequences. Preprocessing steps like tokenization and embedding (e.g., Word2Vec, GloVe) enhance model
accuracy. Fine-tuning and regularization further optimize RNN-based sentiment analysis models for real-world
applications.
Experiment – 06
Aim-
To perform sentiment analysis on tweets data using a Recurrent Neural Network and evaluate its performance.
Theory-
Sentiment analysis is a technique used to determine the sentiment or emotion conveyed by textual data. It is widely
used in applications like social media monitoring, customer feedback analysis, and product review mining. In the
context of tweets data, sentiment analysis helps identify whether a tweet expresses a positive, negative, or neutral
sentiment.
1. Data Preprocessing:
o Cleaning tweets (removing URLs, mentions, special characters, etc.).
o Handling slang and abbreviations, and removing stop words.
2. Text Vectorization:
o Using TF-IDF or Word Embeddings (like Word2Vec or GloVe) to convert textual data into
numerical vectors.
3. Model Architecture:
o LSTM Layer: Captures the temporal and contextual relationships between words.
import matplotlib.pyplot as plt
df.drop(['count','hate_speech','offensive_language','neither','Unnamed: 0'],axis=1,inplace=True)
= df['class']
plt.show()
offensive_tweets.iloc[0:12000, :]
df = pd.concat([hate_tweets,
from nltk.corpus import stopwords
import re
nltk.download('wordnet') nltk.download('stopwords')
d = {'luv': 'love', 'wud': 'would', 'lyk': 'like', 'wateva': 'whatever', 'ttyl': 'talk to you later', 'kul': 'cool', 'fyn': 'fine', 'omg': 'oh my god!', 'fam': 'family', 'bruh': 'brother', 'cud': 'could', 'fud': 'food', 'u': 'you', 'ur': 'your', 'bday': 'birthday', 'bihday': 'birthday'}
stop_words = set(stopwords.words('english'))
stop_words.add('rt')
'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
mention_regex = '@[\w\-]+'
def clean_text(text):
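    # Illustrative body (the original was lost in extraction). 'url_pattern' below stands in for the
    # URL regex defined above (its variable name was lost); d, stop_words, and mention_regex are the
    # objects defined earlier in this listing.
    url_pattern = 'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    text = re.sub(url_pattern, ' ', text)                    # remove URLs
    text = re.sub(mention_regex, ' ', text)                  # remove @mentions
    text = re.sub(r'[^a-zA-Z\s]', ' ', text)                 # keep letters and whitespace only
    words = [d.get(w, w) for w in text.lower().split()]      # expand slang using the dictionary d
    words = [w for w in words if w not in stop_words]        # drop stop words
    return ' '.join(words)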
x = df.processed_tweets
y = df['class']
x_tfidf = vectorizer.transform(x).toarray()
model = Sequential([
SimpleRNN(8, return_sequences=True),
GlobalMaxPool1D(),
Dropout(0.25),
Dense(3, activation='softmax')
])
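# The compile and fit steps are missing from the extracted listing; an illustrative version, assuming
# the vectorized features and integer class labels have been split into x_train/y_train (optimizer,
# epochs, and batch size are assumptions), would be:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.1)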
model.evaluate(x_test, y_test)
Output-
Model Summary-
Classification Report-
EXPERIMENT 7
AIM: To implement and analyze an autoencoder model for the MNIST dataset, demonstrating its capability in
dimensionality reduction and feature extraction.
THEORY:
Autoencoders are a type of artificial neural network used to learn efficient codings of input data. They consist of two
main parts: an encoder, which compresses the input into a lower-dimensional latent representation, and a decoder, which reconstructs the input from that representation.
#Pre-processing
def preprocess(array):
array = array.astype("float32") / 255.0
array = np.reshape(array, (len(array), 28, 28, 1))
return array
def noise(array):
noise_factor = 0.4 #amount of noise to add
noisy_array = array + noise_factor * np.random.normal(
loc=0.0, scale=1.0, size=array.shape
)
return np.clip(noisy_array, 0.0, 1.0)  # keep pixel values in the valid [0, 1] range
def display(array1, array2):
n = 10
ax = plt.subplot(2, n, i + 1 + n)
plt.imshow(image2.reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
# Since we only need images from the dataset to encode and decode,
# we won't use the labels.
(train_data, _), (test_data, _) = mnist.load_data()
# Display the train data and a version of it with added noise
display(train_data, noisy_train_data)
# Encoder
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(input)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
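# The decoder half of the network is missing from the extracted listing; an illustrative mirror of the
# encoder above (in the style of the standard Keras denoising-autoencoder example) would be:
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)   # upsample 7x7 -> 14x14
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)   # upsample 14x14 -> 28x28
x = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)                     # reconstruct the 28x28x1 image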
# Autoencoder
autoencoder = Model(input, x)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.summary()
Conclusion-
The model consists of an encoder that compresses the input data into a latent space representation and a decoder that
reconstructs the original data from this compressed form. Using the MNIST dataset, which contains handwritten
digits, the autoencoder effectively reduces the high-dimensional input (28x28 pixels) to a lower-dimensional latent
vector. The model is trained using a binary cross-entropy reconstruction loss and optimized with the Adam optimizer. After
training, the autoencoder demonstrates impressive reconstruction quality while significantly reducing dimensionality,
proving its effectiveness in feature extraction and data compression tasks. The latent space representations can also be
visualized, highlighting clusters corresponding to different digits, indicating successful feature learning.
EXPERIMENT 8
AIM:
To implement and analyze a Variational Autoencoder (VAE) for image reconstruction, specifically for facial image
data. The goal is to understand how VAEs encode input data into a latent space and reconstruct images from sampled
latent variables.
THEORY:
Variational Autoencoders (VAEs) are a type of generative model that learns to encode input data into a
lower-dimensional latent space and generate new data samples from that space. Unlike traditional autoencoders, VAEs
introduce a probabilistic framework to enforce a continuous and structured latent space, which helps in generating
diverse and realistic outputs.
class Sampling(tf.keras.layers.Layer):
def call(self, inputs):
z_mean, z_log_var = inputs # Unpack the inputs into mean and log-variance
batch = tf.shape(z_mean)[0] # Get the batch size
dim = tf.shape(z_mean)[1]  # Get the dimensionality of the latent space
epsilon = tf.keras.backend.random_normal(shape=(batch, dim))  # Sample from standard normal distribution
return epsilon * tf.exp(z_log_var * 0.5) + z_mean  # Apply the reparameterization trick
# Encoder
encoder_inputs = tf.keras.Input(shape=(num_pixels,)) # Input layer for the encoder
x = tf.keras.layers.Dense(512, activation='relu')(encoder_inputs) # First dense layer with 512 units and ReLU activation
x = tf.keras.layers.Dense(128, activation='relu')(x) # Second dense layer with 128 units and ReLU activation
x = tf.keras.layers.Dense(32, activation='relu')(x) # Third dense layer with 32 units and ReLU activation
z_mean = tf.keras.layers.Dense(num_latent_vars)(x) # Dense layer for the mean of the latent variables
z_log_var = tf.keras.layers.Dense(num_latent_vars)(x) # Dense layer for the log-variance of the latent variables
z = Sampling()([z_mean, z_log_var]) # Sampling layer to sample the latent variables using the reparameterization trick
# Decoder
decoder_inputs = tf.keras.Input(shape=(num_latent_vars,)) # Input layer for the decoder
x = tf.keras.layers.Dense(32, activation='relu')(decoder_inputs) # First dense layer with 32 units and ReLU activation
x = tf.keras.layers.Dense(128, activation='relu')(x) # Second dense layer with 128 units and ReLU activation
x = tf.keras.layers.Dense(512, activation='relu')(x) # Third dense layer with 512 units and ReLU activation
reconstruction = tf.keras.layers.Dense(num_pixels, activation='linear')(x) # Output dense layer with 'num_pixels' units and linear activation
# Full model
model_inputs = encoder.input # Inputs of the full VAE model are the inputs of the encoder
model_outputs = decoder(encoder.output) # Outputs of the full VAE model are the outputs of the decoder, given the encoder's output
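# NOTE: the extracted listing does not show how the VAE loss is defined. A VAE is trained on a
# reconstruction term plus a KL-divergence term; one illustrative way to add the KL term (an assumption,
# not necessarily how the original code did it) is a small custom layer applied to z_mean and z_log_var
# before the Sampling layer, with the reconstruction loss (e.g. 'mse') passed to compile():
class KLDivergenceLayer(tf.keras.layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
        )
        self.add_loss(kl)    # added to the model's total loss alongside the compiled reconstruction loss
        return inputs
# usage sketch: z_mean, z_log_var = KLDivergenceLayer()([z_mean, z_log_var]) before Sampling,
# followed by face_model.compile(optimizer='adam', loss='mse')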
print(face_encoder.summary())
#We will use pixel_data as both the input to the model and the target to compare the output to.
history = face_model.fit(
pixel_data,
pixel_data,
validation_split=0.2,
batch_size=32,
epochs=100,
callbacks=[
tf.keras.callbacks.EarlyStopping(
monitor='val_loss',
patience=10,
restore_best_weights=True
)
])
#Image Reconstruction
#Let's see how the model does at reconstructing an image that it has already seen.
i=6
sample = np.array(pixel_data)[i].copy()
sample = sample.reshape(48, 48, 1)
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(sample, cmap='gray')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(reconstruction, cmap='gray')
plt.axis('off')
plt.title("Reconstructed Image")
plt.show()
#Now let's see how we can use our own values to generate never-before-seen images.
# A function to allow us to specify our own latent variable values and plot the constructed image
def generate_face_image(latent1, latent2, latent3):
    latent_vars = np.array([[latent1, latent2, latent3]])
    reconstruction = np.array(face_decoder(latent_vars))
    reconstruction = reconstruction.reshape(48, 48, 1)
    plt.figure()
    plt.imshow(reconstruction, cmap='gray')
    plt.axis('off')
    plt.show()
# Let's get the min and max for each slider on the interactive widget
latent1_min = np.min(face_encoder(pixel_data).numpy()[:, 0])
latent1_max = np.max(face_encoder(pixel_data).numpy()[:, 0])
latent2_min = np.min(face_encoder(pixel_data).numpy()[:, 1])
latent2_max = np.max(face_encoder(pixel_data).numpy()[:, 1])
EXPERIMENT 9
AIM: To implement and analyse a Generative Adversarial Network (GAN) for generating synthetic handwritten
digits using the MNIST dataset. The experiment aims to understand how GANs generate new data by learning from a
dataset of real images.
THEORY:
Generative Adversarial Networks (GANs) are a class of deep learning models used for generating new data that is
similar to a given dataset. A GAN consists of two neural networks that compete with each other in a zero-sum game: a generator, which produces synthetic samples from random noise, and a discriminator, which tries to distinguish real samples from generated ones.
BUFFER_SIZE = 60000
BATCH_SIZE = 256
# nch = 200
# g_input = Input(shape=[100])
# H = Dense(nch*14*14, kernel_initializer='glorot_normal')(g_input)
# H = BatchNormalization()(H)
# H = Activation('relu')(H)
# H = Reshape( [nch, 14, 14] )(H)
#Building Generator
def generator_model():
model = tf.keras.Sequential()
model.add(Dense(7*7*256, use_bias=False, input_shape=(100,)))
model.add(BatchNormalization())
model.add(LeakyReLU())
print(model.summary())
generator_model()
def discriminator_model():
model = tf.keras.Sequential()
model.add(Flatten())
model.add(Dense(1))
print(model.summary())
discriminator_model()
# This method returns a helper function to compute cross entropy loss
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
def discriminator_loss(real_output, fake_output):
real_loss = cross_entropy(tf.ones_like(real_output), real_output)
fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
total_loss = real_loss + fake_loss
return total_loss
def generator_loss(fake_output):
return cross_entropy(tf.ones_like(fake_output), fake_output)
#Training
import os
checkpoint_dir = '/content/drive/MyDrive/AMITY/Deep Learning (codes)/GAN/'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
discriminator_optimizer=discriminator_optimizer,
generator=generator,
discriminator=discriminator)
EPOCHS = 60
# We will reuse this seed overtime (so it's easier)
# to visualize progress in the animated GIF)
num_examples_to_generate = 16
noise_dim = 100
seed = tf.random.normal([num_examples_to_generate, noise_dim])
#Training Steps
gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
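# The extracted listing shows only the gradient computations; a complete training step in the style of
# the standard TensorFlow DCGAN tutorial (names follow the surrounding code) would be:
@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)              # generator produces fake images
        real_output = discriminator(images, training=True)              # discriminator scores real images
        fake_output = discriminator(generated_images, training=True)    # discriminator scores fakes
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))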
#Train GAN
import time
from IPython import display  # A command shell for interactive computing in Python.
# 4 - Print out the completed epoch no. and the time spent
print('Time for epoch {} is {} sec'.format(epoch + 1, time.time()-start))
EPOCHS)
#Generated Digits
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
# PIL is a library which may open different image file formats
import PIL
# Display a single image using the epoch number
def display_image(epoch_no):
    return PIL.Image.open('image_at_epoch_{:04d}.png'.format(epoch_no))
display_image(EPOCHS)
anim_file = 'dcgan.gif'
EXPERIMENT 10
AIM: To implement an image classification model using Vision Transformer (ViT) on the CIFAR-100 dataset and
evaluate its performance.
THEORY:
The Vision Transformer (ViT) is a deep learning model that applies the transformer architecture, originally designed
for natural language processing (NLP), to image classification tasks. Unlike traditional Convolutional Neural
Networks (CNNs), ViTs do not use convolutional layers but instead rely on self-attention mechanisms to model spatial
relationships between different parts of an image.
#Setup
= 100
learning_rate = 0.001
weight_decay = 0.0001
batch_size = 256
num_epochs = 100
image_size = 72  # We'll resize input images to this size
patch_size = 6  # Size of the patches to be extracted from the input images
num_patches = (image_size // patch_size) ** 2
projection_dim = 64
num_heads = 4
transformer_units = [projection_dim * 2, projection_dim]  # Size of the transformer layers
transformer_layers = 8
mlp_head_units = [2048, 1024] # Size of the dense layers of the final classifier
data_augmentation = keras.Sequential(
[
layers.Normalization(),
layers.Resizing(image_size, image_size),
layers.RandomFlip("horizontal"),
layers.RandomRotation(factor=0.02),
layers.RandomZoom(
height_factor=0.2, width_factor=0.2
),
],
name="data_augmentation",
)
# Compute the mean and the variance of the training data for normalization.
data_augmentation.layers[0].adapt(x_train)
class Patches(layers.Layer):
def __init__(self, patch_size):
super().__init__()
self.patch_size = patch_size
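    # The call method of this class is missing from the extracted listing; the standard Keras ViT example
    # implements it roughly as follows (illustrative sketch):
    def call(self, images):
        batch_size = tf.shape(images)[0]
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )                                                          # split each image into non-overlapping patches
        patch_dims = patches.shape[-1]
        return tf.reshape(patches, [batch_size, -1, patch_dims])   # flatten patches into a sequence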
import matplotlib.pyplot as plt
resized_image = tf.image.resize(
tf.convert_to_tensor([image]), size=(image_size, image_size)
)
patches = Patches(patch_size)(resized_image)
print(f"Image size: {image_size} X {image_size}")
print(f"Patch size: {patch_size} X {patch_size}")
print(f"Patches per image: {patches.shape[1]}")
print(f"Elements per patch: {patches.shape[-1]}")
n = int(np.sqrt(patches.shape[1]))
plt.figure(figsize=(4, 4))
for i, patch in enumerate(patches[0]):
    ax = plt.subplot(n, n, i + 1)
    patch_img = tf.reshape(patch, (patch_size, patch_size, 3))
    plt.imshow(patch_img.numpy().astype("uint8"))
    plt.axis("off")
def create_vit_classifier():
inputs = layers.Input(shape=input_shape)
# Augment data.
augmented = data_augmentation(inputs)
# Create patches.
patches = Patches(patch_size)(augmented)
# Encode patches.
encoded_patches = PatchEncoder(num_patches, projection_dim)(patches)
def run_experiment(model):
optimizer = tfa.optimizers.AdamW(
learning_rate=learning_rate, weight_decay=weight_decay
)
model.compile( optimizer=optimizer,
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=[
keras.metrics.SparseCategoricalAccuracy(name="accuracy"),
keras.metrics.SparseTopKCategoricalAccuracy(5, name="top-5-accuracy"),
],
)
model.load_weights(checkpoint_filepath)
_, accuracy, top_5_accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {round(accuracy * 100, 2)}%") print(f"Test
top 5 accuracy: {round(top_5_accuracy * 100, 2)}%")
return history
CONCLUSION: