
Exercise 10: Develop a Python program to fine-tune a BERT model for a text classification task.

pip install fsspec==2024.10.0
pip install --upgrade gcsfs fsspec
pip install datasets evaluate

import torch
import evaluate
import numpy as np
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load the IMDB dataset
dataset = load_dataset("imdb")

# Limit the dataset to a small subset for quick testing.
# Note: the IMDB train split is sorted by label, so shuffle before selecting
# to avoid a subset containing only negative reviews.
subset_size = 100  # adjust the number of rows as needed
small_train_data = dataset['train'].shuffle(seed=42).select(range(subset_size))
small_val_data = dataset['test'].shuffle(seed=42).select(range(subset_size))

# Load pretrained BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

train_encodings = small_train_data.map(tokenize_function, batched=True)
val_encodings = small_val_data.map(tokenize_function, batched=True)

# Format the datasets to return PyTorch tensors for the model inputs and labels
train_encodings.set_format("torch", columns=["input_ids", "attention_mask", "label"])
val_encodings.set_format("torch", columns=["input_ids", "attention_mask", "label"])

# Accuracy metric and the function the Trainer calls during evaluation
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Set up Trainer with training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_encodings,
    eval_dataset=val_encodings,
    compute_metrics=compute_metrics,
)

# Fine-tune the model
trainer.train()

# Validate the model on test data
results = trainer.evaluate()

# Show accuracy
print(f"Validation accuracy: {results['eval_accuracy']:.4f}")

# Function to make a prediction on new input text
def classify_text(text):
    tokens = tokenizer(
        text,
        max_length=128,          # maximum length of the input sequence
        padding='max_length',    # pad to max length
        truncation=True,         # truncate if the text is too long
        return_tensors="pt"      # return PyTorch tensors
    )

    device = model.device        # device where the model is located
    tokens = tokens.to(device)   # move input tokens to the same device as the model

    with torch.no_grad():        # no gradients needed for inference
        outputs = model(**tokens)

    prediction = torch.argmax(outputs.logits, dim=1).item()
    label = "Positive" if prediction == 1 else "Negative"
    return label

# Example usage
new_text = "The food was awful and the service was great!"
print(f"Text: '{new_text}'")
print("Classification:", classify_text(new_text))
Exercise 10: Develop a Python program to fine-tune a BERT model for a text classification task.

Step 1: Install Required Libraries

Install the Python libraries needed for loading datasets, computing evaluation metrics, and using the BERT model:

pip install fsspec==2024.10.0
pip install --upgrade gcsfs fsspec
pip install datasets evaluate

Step 2: Import Required Modules

import torch
import evaluate
import numpy as np
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

- `torch`: A machine learning library for building and training models.
- `evaluate`: Provides functions to evaluate model performance.
- `transformers`: Contains pretrained BERT models and tokenizers.
- `datasets`: Enables loading and processing datasets.
- `numpy`: Useful for numerical operations.
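
As a quick standalone illustration (not part of the exercise program), the `evaluate` library can be used on its own; accuracy is simply the fraction of predictions that match the references:

# Standalone accuracy check (illustrative)
acc = evaluate.load("accuracy")
print(acc.compute(predictions=[0, 1, 1], references=[0, 1, 0]))  # {'accuracy': 0.666...}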

Step 3: Load IMDB Dataset

dataset = load_dataset("imdb")

subset_size = 100
small_train_data = dataset['train'].shuffle(seed=42).select(range(subset_size))
small_val_data = dataset['test'].shuffle(seed=42).select(range(subset_size))

Explanation:
- `load_dataset("imdb")`: Downloads the IMDB movie-review dataset of 25,000 training and 25,000 test examples, each labelled positive or negative.
- `shuffle(...).select(...)`: Takes a small random subset so the exercise runs quickly. Shuffling first matters because the train split is sorted by label.
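
A quick peek at one example shows the dataset schema; the `text` and `label` fields are what the tokenizer and model consume (illustrative):

print(dataset['train'][0]['text'][:100])  # first 100 characters of a review
print(dataset['train'][0]['label'])       # 0 = negative, 1 = positive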

Step 4: Model Loading
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
Explanation:
- `BertTokenizer.from_pretrained`: Loads the tokenizer for the BERT model.
- `BertForSequenceClassification`: Loads the BERT model for classification tasks
with two labels (positive and negative).
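
To make the tokenizer's role concrete, here is what it produces for one sentence (illustrative; the exact IDs depend on the vocabulary):

sample = tokenizer("A surprisingly good movie!", max_length=16,
                   padding="max_length", truncation=True)
print(sample['input_ids'][:8])       # token IDs; 101 is the [CLS] token
print(sample['attention_mask'][:8])  # 1 = real token, 0 = padding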
Step 5: Training and Evaluation
def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

train_encodings = small_train_data.map(tokenize_function, batched=True)
val_encodings = small_val_data.map(tokenize_function, batched=True)
Explanation:
- `tokenize_function`: Converts text into token IDs with padding and truncation for
uniform input length.
- `map`: Applies the tokenization function to the dataset.
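
After `map`, each example keeps its original fields and gains the tokenizer outputs (illustrative):

print(train_encodings.column_names)
# e.g. ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask']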
Training arguments are set to control the training process:
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=1,
    weight_decay=0.01,
)
Explanation:
- `output_dir`: Directory to save the model outputs.
- `evaluation_strategy`: Evaluates the model at the end of each epoch.
- `learning_rate`: The learning rate for the optimizer.
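
These arguments are then handed to a `Trainer` together with the model, the tokenized datasets, and the metric function, exactly as in the full program above:

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_encodings,
    eval_dataset=val_encodings,
    compute_metrics=compute_metrics,
)
trainer.train()               # fine-tune BERT on the training subset
results = trainer.evaluate()  # compute accuracy on the validation subset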
Step 6: Making Predictions
def classify_text(text):
    tokens = tokenizer(
        text, max_length=128, padding='max_length', truncation=True,
        return_tensors="pt"
    )
    device = model.device
    tokens = tokens.to(device)
    with torch.no_grad():
        outputs = model(**tokens)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    label = "Positive" if prediction == 1 else "Negative"
    return label
This function takes new text as input and classifies it as Positive or Negative:
- Tokenizes the input text.
- Moves tokens to the same device as the model.
- Gets the logits from the model and converts them into predictions.
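
Example usage, as in the full program above:

new_text = "The food was awful and the service was great!"
print(f"Text: '{new_text}'")
print("Classification:", classify_text(new_text))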
