
Generative AI (1)

The document outlines a course on Generative AI, including objectives such as understanding generative models, implementing them, and developing applications. It lists various practical experiments involving word embeddings, sentiment analysis, summarization, and chatbot creation. The course includes programming tasks using libraries like gensim and Hugging Face to explore and visualize word relationships and enhance AI prompts.

Generative AI Semester 6

Course Code: BAIL657C                        CIE Marks: 50
Teaching Hours/Week (L:T:P:S): 0:0:1:0       SEE Marks: 50
Credits: 01                                  Exam Hours: 100
Examination type (SEE): Practical
Course objectives:
● Understand the principles and concepts behind generative AI models
● Explain the knowledge gained to implement generative models using Prompt design frameworks.
● Apply various Generative AI applications for increasing productivity.
● Develop Large Language Model-based Apps.

Sl.NO Experiments
1. Explore pre-trained word vectors. Explore word relationships using vector arithmetic. Perform arithmetic
operations and analyze results.

2. Use dimensionality reduction (e.g., PCA or t-SNE) to visualize word embeddings for Q 1. Select 10 words from a
specific domain (e.g., sports, technology) and visualize their embeddings. Analyze clusters and relationships. Generate
contextually rich outputs using embeddings. Write a program to generate 5 semantically similar words for a given
input.

3. Train a custom Word2Vec model on a small dataset. Train embeddings on a domain-specific corpus (e.g., legal,
medical) and analyze how embeddings capture domain-specific semantics.

4. Use word embeddings to improve prompts for a Generative AI model. Retrieve similar words using word embeddings.
Use the similar words to enrich a GenAI prompt. Use the AI model to generate responses for the original and enriched
prompts. Compare the outputs in terms of detail and relevance.

5. Use word embeddings to create meaningful sentences for creative tasks. Retrieve similar words for a seed word. Create
a sentence or story using these words as a starting point. Write a program that: Takes a seed word. Generates similar
words. Constructs a short paragraph using these words.

6. Use a pre-trained Hugging Face model to analyze sentiment in text. Assume a real-world application. Load the
sentiment analysis pipeline. Analyze the sentiment by giving sentences as input.

7. Summarize long texts using a pre-trained Hugging Face summarization model. Load the
summarization pipeline. Take a passage as input and obtain the summarized text.

8. Install langchain, cohere (for the key), and langchain-community. Get the API key (by logging into Cohere and obtaining the
Cohere key). Load a text document from your Google Drive. Create a prompt template to display the output in a
particular manner.

9. Take the institution name as input. Use Pydantic to define the schema for the desired output and create a custom output
parser. Invoke the chain and fetch the results. Extract the following institution-related details from Wikipedia: the founder
of the institution, when it was founded, the current branches of the institution, how many employees
work in it, and a brief 4-line summary of the institution.

10. Build a chatbot for the Indian Penal Code. Start by downloading the official Indian Penal Code document, and
then create a chatbot that can interact with it. Users will be able to ask questions about the Indian Penal Code
and have a conversation with it.
PROGRAM 1:

1. Explore pre-trained word vectors. Explore word relationships using vector arithmetic. Perform
arithmetic operations and analyze results.

import gensim.downloader as api

# Load pre-trained Word2Vec model (Google News)
print("Loading model... (This may take a while)")
model = api.load("word2vec-google-news-300")
print("Model loaded!")


# Function to find similar words
def find_similar(word):
    try:
        similar_words = model.most_similar(word)
        print(f"\nWords similar to '{word}':")
        for w, score in similar_words[:5]:  # Show top 5
            print(f"{w}: {score:.4f}")
    except KeyError:
        print(f"'{word}' not found in the vocabulary.")


# Function to perform word arithmetic
def word_arithmetic(word1, word2, word3):
    try:
        result = model.most_similar(positive=[word1, word2], negative=[word3])
        print(f"\n'{word1}' - '{word3}' + '{word2}' = '{result[0][0]}' (Most similar word)")
    except KeyError as e:
        print(f"Error: {e}")


# Function to check similarity between two words
def check_similarity(word1, word2):
    try:
        similarity = model.similarity(word1, word2)
        print(f"\nSimilarity between '{word1}' and '{word2}': {similarity:.4f}")
    except KeyError as e:
        print(f"Error: {e}")


# Function to find the odd one out
def odd_one_out(words):
    try:
        odd = model.doesnt_match(words)
        print(f"\nOdd one out from {words}: {odd}")
    except KeyError as e:
        print(f"Error: {e}")


# Run the functions
find_similar("king")
word_arithmetic("king", "woman", "man")  # Expected output: "queen"
check_similarity("king", "queen")
odd_one_out(["apple", "banana", "grape", "car"])  # "car" should be the odd one

OUTPUT

Loading model... (This may take a while)
Model loaded!

Words similar to 'king':
kings: 0.7138
queen: 0.6511
monarch: 0.6413
crown_prince: 0.6204
prince: 0.6160

'king' - 'man' + 'woman' = 'queen' (Most similar word)

Similarity between 'king' and 'queen': 0.6511

Odd one out from ['apple', 'banana', 'grape', 'car']: car
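
The word arithmetic above can be illustrated without downloading the model. The sketch below uses made-up 3-dimensional toy vectors (not values from the Google News model) and a simplified search: it adds and subtracts the raw vectors and ranks the remaining words by cosine similarity. gensim's own most_similar differs in detail (it works on length-normalized vectors), so treat this as an illustration of the idea only.

```python
import math

# Toy 3-dimensional vectors, invented for illustration; real models use 300 dimensions
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative), skipping the input words."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# king - man + woman
print(analogy(["king", "woman"], ["man"], toy_vectors))
```

The input words are excluded from the candidates for the same reason the real query never returns 'king' itself: the starting word is usually the nearest vector to the result.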


PROGRAM 2:

Use dimensionality reduction (e.g., PCA or t-SNE) to visualize word embeddings
for Q 1. Select 10 words from a specific domain (e.g., sports, technology) and
visualize their embeddings. Analyze clusters and relationships. Generate
contextually rich outputs using embeddings. Write a program to generate 5
semantically similar words for a given input.

import gensim.downloader as api
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load pre-trained Word2Vec model (Google News)
print("Loading model... (This may take a while)")
model = api.load("word2vec-google-news-300")
print("Model loaded!")

# Select 10 words from the Technology domain
tech_words = ["computer", "algorithm", "software", "hardware", "AI",
              "cloud", "database", "network", "cybersecurity", "encryption"]

# Get their word vectors
word_vectors = np.array([model[word] for word in tech_words])

# Perform PCA to reduce to 2D
pca = PCA(n_components=2)
reduced_vectors = pca.fit_transform(word_vectors)

# Plot the words in 2D
plt.figure(figsize=(8, 6))
for word, (x, y) in zip(tech_words, reduced_vectors):
    plt.scatter(x, y)
    plt.text(x + 0.02, y + 0.02, word, fontsize=12)

plt.title("2D Visualization of Technology Word Embeddings")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.grid()
plt.show()


# Function to find 5 similar words
def find_similar_words(word):
    try:
        similar_words = model.most_similar(word, topn=5)
        print(f"\n5 words similar to '{word}':")
        for w, score in similar_words:
            print(f"{w}: {score:.4f}")
    except KeyError:
        print(f"'{word}' not found in the vocabulary.")


# Test with an input word
find_similar_words("AI")
OUTPUT

Loading model... (This may take a while)
Model loaded!

5 words similar to 'AI':
Steven_Spielberg_Artificial_Intelligence: 0.5576
Index_MDE_##/###/####: 0.5415
Enemy_AI: 0.5256
Ace_Combat_Zero: 0.5227
DOA4: 0.5183
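
In the program above, model[word] raises a KeyError if any of the ten words is missing from the vocabulary, which would stop the script before the plot is drawn. A minimal sketch of a guard is shown below. The helper name split_by_vocab and the fake_model dictionary are assumptions made for illustration; a plain dict stands in for the loaded model because it supports the same `in` test and `[]` lookup.

```python
def split_by_vocab(words, model):
    """Split words into those the model knows and those it does not."""
    known = [w for w in words if w in model]
    missing = [w for w in words if w not in model]
    return known, missing


# Stand-in for the loaded embeddings (hypothetical 2-D values)
fake_model = {
    "computer": [0.1, 0.2],
    "software": [0.3, 0.1],
    "network": [0.2, 0.4],
}

known, missing = split_by_vocab(["computer", "software", "cybersecurity"], fake_model)
vectors = [fake_model[w] for w in known]  # only look up words that exist

print("Plotting:", known)
print("Skipped:", missing)
```

With the real model, the same call would be split_by_vocab(tech_words, model), and the PCA step would then use only the known words.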
PROGRAM 3:
Train a custom Word2Vec model on a small dataset. Train embeddings on a
domain-specific corpus (e.g., legal, medical) and analyze how embeddings
capture domain-specific semantics.

REQUIREMENTS FOR PROGRAM 3

STEP 1: USE NLTK FIRST
import nltk
nltk.download('punkt')  # Ensure the main tokenizer is installed
nltk.download('averaged_perceptron_tagger')

STEP 2: RUN THESE COMMANDS IN THE VS CODE TERMINAL
pip install gensim spacy matplotlib scikit-learn
python -m spacy download en_core_web_sm

STEP 3: NOW RUN THE PROGRAM

import gensim
from gensim.models import Word2Vec
import spacy
import re
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load spaCy English tokenizer model
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Model is not installed yet: download it, then load it again
    import os
    os.system("python -m spacy download en_core_web_sm")
    nlp = spacy.load("en_core_web_sm")

# Sample Legal Domain Corpus
legal_corpus = [
    "The defendant is liable for breach of contract.",
    "The plaintiff has the burden of proof.",
    "Intellectual property rights are legally protected.",
    "The court ruled in favor of the plaintiff.",
    "A contract must have mutual consideration.",
    "Negligence is a failure to exercise reasonable care."
]


# Preprocess text using spaCy
def preprocess_text(text):
    text = text.lower()  # Convert to lowercase
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    doc = nlp(text)  # Process text with spaCy
    tokens = [token.text for token in doc if not token.is_stop]  # Remove stopwords
    return tokens


# Tokenize entire corpus
tokenized_corpus = [preprocess_text(sentence) for sentence in legal_corpus]

# Train Word2Vec model (sg=1 selects skip-gram)
model = Word2Vec(sentences=tokenized_corpus, vector_size=50, window=3, min_count=1,
                 sg=1, epochs=100)

# Save and reload the model
model.save("legal_word2vec.model")
model = Word2Vec.load("legal_word2vec.model")


# Analyze embeddings (handling KeyError if word not found)
def print_similar_words(word):
    try:
        print(f"Similar words to '{word}':", model.wv.most_similar(word))
    except KeyError:
        print(f"'{word}' not in vocabulary!")


print_similar_words('plaintiff')
print_similar_words('contract')


# Visualizing Word Embeddings
def visualize_embeddings(model):
    words = list(model.wv.index_to_key)  # Get words in vocab
    vectors = [model.wv[word] for word in words]  # Get word vectors

    # Reduce dimensions using PCA
    pca = PCA(n_components=2)
    reduced_vectors = pca.fit_transform(vectors)

    # Plot word embeddings
    plt.figure(figsize=(8, 6))
    for word, coord in zip(words, reduced_vectors):
        plt.scatter(coord[0], coord[1])
        plt.annotate(word, (coord[0], coord[1]))

    plt.title("Word Embeddings Visualization")
    plt.show()


# Call visualization function
visualize_embeddings(model)
OUTPUT

Similar words to 'plaintiff': [('mutual', 0.23001788556575775), ('failure',
0.1589910238981247), ('negligence', 0.15074265003204346), ('liable',
0.12401115894317627), ('protected', 0.0824722871184349), ('burden',
0.07484304159879684), ('defendant', 0.054710932075977325), ('contract',
0.046396300196647644), ('breach', 0.020274512469768524), ('ruled',
0.01608944870531559)]
Similar words to 'contract': [('negligence', 0.27227097749710083), ('ruled',
0.21621155738830566), ('legally', 0.17338840663433075), ('court',
0.1557116061449051), ('proof', 0.13357841968536377), ('care',
0.13247060775756836), ('property', 0.10948310047388077), ('exercise',
0.07479500770568848), ('mutual', 0.061234381049871445), ('consideration',
0.052236158400774)]
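
The similarity scores above are all low, which is expected: six short sentences give the skip-gram model (sg=1) very few training examples. The sketch below shows, in simplified form, how (center, context) pairs are formed from one tokenized sentence for a given window size. The token list is an assumption about what the preprocessing step returns for the fourth sentence, and gensim's actual sampling differs in detail (for example, it can shrink the window randomly), so this only illustrates the idea.

```python
def skipgram_pairs(tokens, window):
    """List every (center, context) pair within `window` positions of each token."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Assumed result of preprocessing "The court ruled in favor of the plaintiff."
tokens = ["court", "ruled", "favor", "plaintiff"]

narrow = skipgram_pairs(tokens, window=1)
wide = skipgram_pairs(tokens, window=3)

print(len(narrow), "pairs with window=1")
print(len(wide), "pairs with window=3")
print(narrow)
```

With only a handful of pairs per sentence, each word is seen in very few contexts, so a larger domain corpus is needed before the neighbors become meaningful.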
PROGRAM 4:

Use word embeddings to improve prompts for Generative AI model. Retrieve
similar words using word embeddings. Use the similar words to enrich a
GenAI prompt. Use the AI model to generate responses for the original and
enriched prompts. Compare the outputs in terms of detail and relevance.

Follow these steps if an error occurs when running the program.

Step 1: Update optree

python -m pip install --upgrade "optree>=0.13.0"

Step 2: Install the tf-keras library

pip install tf-keras

Step 3: Suppress the TensorFlow oneDNN warnings by running this in the VS Code terminal (Windows Command Prompt)

set TF_ENABLE_ONEDNN_OPTS=0

Step 4: Check the required libraries

pip install --upgrade transformers gensim torch tensorflow optree

Now run the program

import gensim.downloader as api
from transformers import pipeline

# Load embedding model
embedding_model = api.load("glove-wiki-gigaword-100")

original_prompt = "Describe the beautiful landscapes during sunset."


def enrich_prompt(prompt, embedding_model, n=5):
    words = prompt.split()
    enriched_prompt = []

    for word in words:
        word_lower = word.lower()

        if word_lower in embedding_model:
            similar_words = embedding_model.most_similar(word_lower, topn=n)
            similar_word_list = [w[0] for w in similar_words]
            enriched_prompt.append(" ".join(similar_word_list))  # Join similar words as a phrase
        else:
            enriched_prompt.append(word)  # Keep the word as is if not found
    return " ".join(enriched_prompt)


enriched_prompt = enrich_prompt(original_prompt, embedding_model)

# Load text generation model
generator = pipeline("text-generation", model="gpt2")

# Generate responses
original_response = generator(original_prompt, max_length=50, num_return_sequences=1)
enriched_response = generator(enriched_prompt, max_length=50,
                              num_return_sequences=1)

# Print results
print("Original prompt response")
print(original_response[0]['generated_text'])

print("\nEnriched prompt response")
print(enriched_response[0]['generated_text'])

OUTPUT
Original prompt response
Describe the beautiful landscapes during sunset. View the entire project »

View more photos View slideshow

View gallery

Enriched prompt response
explain describing distinguish understand define this part one of same lovely
gorgeous wonderful charming magnificent landscape seascapes cityscapes scenery
paintings after early since following days sunset. cityscape paintings after all great of
beautiful seascapes and also a vast beautiful
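
In the output above, the enriched prompt is hard to read because every word, including "the" and "during", was replaced by its five nearest neighbors, and "sunset." was left untouched because of the trailing full stop. One possible alternative, sketched below, keeps the original sentence intact and appends a few related words for the content words only. The function name, the small stopword set, and the fake_lookup helper are assumptions for illustration; the neighbor words are taken from the output above.

```python
STOPWORDS = {"the", "a", "an", "during", "of", "in", "on", "and"}  # small illustrative set


def enrich_prompt_with_keywords(prompt, lookup, n=2):
    """Keep the prompt as written and append up to n related words per content word."""
    keywords = []
    for raw in prompt.split():
        word = raw.strip(".,!?").lower()  # drop punctuation so "sunset." matches "sunset"
        if word in STOPWORDS:
            continue
        for similar in lookup(word, n):
            if similar not in keywords:
                keywords.append(similar)
    if not keywords:
        return prompt
    return f"{prompt} Related ideas: {', '.join(keywords)}."


def fake_lookup(word, n):
    """Stand-in for an embedding lookup; returns up to n neighbors or an empty list."""
    table = {
        "beautiful": ["lovely", "gorgeous", "wonderful"],
        "landscapes": ["landscape", "seascapes", "cityscapes"],
    }
    return table.get(word, [])[:n]


enriched = enrich_prompt_with_keywords(
    "Describe the beautiful landscapes during sunset.", fake_lookup
)
print(enriched)
```

With the GloVe model loaded, the lookup could wrap embedding_model.most_similar(word, topn=n) and return an empty list for words that are not in the vocabulary.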
PROGRAM 5:

Use word embeddings to create meaningful sentences for creative tasks.
Retrieve similar words for a seed word. Create a sentence or story using these
words as a starting point. Write a program that: Takes a seed word. Generates
similar words. Constructs a short paragraph using these words.

import random
import gensim.downloader as api

# Load a pre-trained word embedding model
model = api.load("glove-wiki-gigaword-50")  # 50D GloVe embeddings


def get_similar_words(seed_word, top_n=5):
    """Retrieve similar words for the given seed word."""
    try:
        similar_words = [word for word, _ in model.most_similar(seed_word, topn=top_n)]
        return similar_words
    except KeyError:
        return []


def create_paragraph(seed_word):
    """Generate a short paragraph using the seed word and its similar words."""
    similar_words = get_similar_words(seed_word)

    if not similar_words:
        return f"Could not find similar words for '{seed_word}'. Try another word!"

    # Create a simple paragraph
    paragraph = (
        f"Once upon a time, a {seed_word} embarked on a journey. Along the way, it encountered "
        f"a {random.choice(similar_words)}, which led it to a hidden {random.choice(similar_words)}. "
        f"Despite the challenges, it found {random.choice(similar_words)} and embraced the "
        f"adventure with {random.choice(similar_words)}. In the end, the journey was a tale of "
        f"{random.choice(similar_words)} and discovery."
    )

    return paragraph


# Example usage
seed_word = input("Enter a seed word: ").strip().lower()
print("\nGenerated Story:\n")
print(create_paragraph(seed_word))

OUTPUT
Enter a seed word: adventure

Generated Story:

Once upon a time, a adventure embarked on a journey. Along the way, it
encountered a adventures, which led it to a hidden adventures. Despite the
challenges, it found romance and embraced the adventure with mystery. In the end,
the journey was a tale of mystery and discovery.
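
Two things stand out in the story above: "a adventure" uses the wrong article, and the same similar word appears more than once because random.choice picks with replacement. A minimal sketch of two helpers that address this follows. Both helper names are assumptions, and the article rule is a rough spelling-based guess (it will be wrong for words such as "hour").

```python
import random


def with_article(word):
    """Prefix a word with 'a' or 'an' using a simple first-letter rule."""
    article = "an" if word[0].lower() in "aeiou" else "a"
    return f"{article} {word}"


def pick_distinct(words, k, seed=None):
    """Pick up to k different words; random.sample never repeats an item."""
    rng = random.Random(seed)
    unique = list(dict.fromkeys(words))  # drop duplicates, keep order
    return rng.sample(unique, min(k, len(unique)))


similar = ["adventures", "romance", "mystery", "fantasy", "adventures"]
picks = pick_distinct(similar, 3, seed=0)

print(with_article("adventure"))
print(picks)
```

In create_paragraph, the five random.choice calls could then be replaced by five words from pick_distinct, as long as the similar-word list has at least five entries.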
PROGRAM 6:

Use a pre-trained Hugging Face model to analyze sentiment in text. Assume a
real-world application. Load the sentiment analysis pipeline. Analyze the
sentiment by giving sentences as input.

from transformers import pipeline

# Load the sentiment analysis pipeline
sentiment_analyzer = pipeline("sentiment-analysis")


def analyze_sentiment(text):
    """Analyze sentiment of the input text using Hugging Face pipeline."""
    result = sentiment_analyzer(text)
    label = result[0]['label']
    score = result[0]['score']

    return f"Sentiment: {label} (Confidence: {score:.2f})"


# Example usage
while True:
    user_input = input("Enter a sentence for sentiment analysis (or 'exit' to quit): ").strip()
    if user_input.lower() == 'exit':
        break
    print(analyze_sentiment(user_input))

OUTPUT
Device set to use CPU
Enter a sentence for sentiment analysis (or 'exit' to quit): I love this product! It's
amazing
Sentiment: POSITIVE (Confidence: 1.00)
Enter a sentence for sentiment analysis (or 'exit' to quit): This is the worst
experience ever
Sentiment: NEGATIVE (Confidence: 1.00)
Enter a sentence for sentiment analysis (or 'exit' to quit): The service was okay,
nothing special
Sentiment: NEGATIVE (Confidence: 0.99)
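
The last result above labels a fairly neutral sentence NEGATIVE with 0.99 confidence. This is consistent with a classifier that has only two classes, which the POSITIVE and NEGATIVE labels in the output suggest: the reported confidence is a softmax probability spread over those two labels, so there is no way to answer "neutral". The sketch below shows the calculation with hypothetical raw scores (logits); the numbers are invented and do not come from the model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]
logits = [2.5, -2.1]  # hypothetical raw scores for one sentence

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])

print(f"Sentiment: {labels[best]} (Confidence: {probs[best]:.2f})")
```

Even a modest gap between the two raw scores produces a confidence close to 1, so a high score here should not be read as certainty that the sentence is strongly negative.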
PROGRAM 7:

Summarize long texts using a pre-trained Hugging Face summarization
model. Load the summarization pipeline. Take a passage as input and
obtain the summarized text.

from transformers import pipeline

# Load the summarization pipeline (using a pre-trained model)
summarizer = pipeline("summarization")


def summarize_text(text):
    """Summarize the input text using Hugging Face's summarization model."""
    summary = summarizer(text, max_length=100, min_length=30, do_sample=False)
    return summary[0]['summary_text']


# Example usage
print("Enter a long passage for summarization (or 'exit' to quit):")
while True:
    long_text = input("\nPaste your text: ").strip()
    if long_text.lower() == 'exit':
        break
    print("\nSummarized Text:\n")
    print(summarize_text(long_text))

OUTPUT
Device set to use cpu
Enter a long passage for summarization (or 'exit' to quit):

Paste your text: "Artificial Intelligence (AI) is transforming industries worldwide.
From healthcare to finance, AI is automating processes, improving efficiency, and
enabling data-driven decision-making. Companies are investing heavily in AI
research to stay competitive. However, challenges such as ethical concerns, bias in
AI models, and the need for regulation remain critical issues."

Summarized Text:
Your max_length is set to 100, but your input_length is only 70. Since this is a
summarization task, where outputs shorter than the input are typically wanted, you
might consider decreasing max_length manually, e.g. summarizer('...',
max_length=35)
From healthcare to finance, AI is automating processes, improving efficiency, and
enabling data-driven decision-making . Companies are investing heavily in AI
research to stay competitive . But ethical concerns, bias in AI models and the need
for regulation remain critical issues .
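
The warning in the output above appears because max_length is fixed at 100 while the input is only 70 tokens long, so the "summary" is nearly as long as the passage. One way to avoid this is to derive both limits from the input length. The sketch below is an assumption-laden example: the helper name and the ratios are made up, and it counts whitespace-separated words as a rough stand-in for the model's token count.

```python
def summary_lengths(text, max_ratio=0.5, min_ratio=0.2, floor=5):
    """Pick (min_length, max_length) as fractions of the input's word count."""
    n_words = len(text.split())
    max_length = max(floor, int(n_words * max_ratio))
    # keep min_length at least 1 and strictly below max_length
    min_length = max(1, min(max_length - 1, int(n_words * min_ratio)))
    return min_length, max_length


passage = " ".join(["word"] * 60)  # placeholder 60-word passage
min_len, max_len = summary_lengths(passage)

print(min_len, max_len)
```

The two values can then be passed to the existing call as summarizer(text, max_length=max_len, min_length=min_len, do_sample=False).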
