chapter4 (1)

The document provides an overview of loading and processing data for deep learning using PyTorch, focusing on an animals dataset. It covers defining input features, target values, creating a TensorDataset, and using DataLoader for batching. Additionally, it discusses model evaluation metrics, overfitting, and strategies to improve model performance, including regularization techniques and hyperparameter tuning.

Uploaded by

hunglaikcad1247
Copyright
© © All Rights Reserved

A deeper dive into loading data
INTRODUCTION TO DEEP LEARNING WITH PYTORCH

Maham Faisal Khan


Senior Data Scientist
Back to our animals dataset
import pandas as pd
animals = pd.read_csv('animals.csv')
print(animals.head())

animal_name  hair  feathers  eggs  milk  predator  fins  legs  tail  type
skimmer         0         1     1     0         1     0     2     1     2
gull            0         1     1     0         1     0     2     1     2
seahorse        0         0     1     0         0     1     0     1     4
tuatara         0         0     1     0         1     0     4     1     3
squirrel        1         0     0     1         0     0     2     1     1

Type key: mammal (1), bird (2), reptile (3), fish (4), amphibian (5), bug (6), invertebrate (7).

Back to our animals dataset: defining features
import numpy as np
# Define input features
features = animals.iloc[:, 1:-1]
X = features.to_numpy()
print(X)

[[0 1 1 0 1 0 2 1]
 [0 1 1 0 1 0 2 1]
 [0 0 1 0 0 1 0 1]
 [0 0 1 0 1 0 4 1]
 [1 0 0 1 0 0 2 1]]

Back to our animals dataset: defining target values
# Define target values (ground truth)
target = animals.iloc[:, -1]
y = target.to_numpy()
print(y)

[2 2 4 3 1]
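
The two slicing steps above can be tried without the CSV file. The sketch below rebuilds the five rows shown on the slide as an inline DataFrame (an assumption made only so the example is self-contained) and checks the resulting shapes.

```python
import pandas as pd

# The five rows shown on the slide, rebuilt inline so no CSV file is needed.
animals = pd.DataFrame(
    [
        ["skimmer", 0, 1, 1, 0, 1, 0, 2, 1, 2],
        ["gull", 0, 1, 1, 0, 1, 0, 2, 1, 2],
        ["seahorse", 0, 0, 1, 0, 0, 1, 0, 1, 4],
        ["tuatara", 0, 0, 1, 0, 1, 0, 4, 1, 3],
        ["squirrel", 1, 0, 0, 1, 0, 0, 2, 1, 1],
    ],
    columns=["animal_name", "hair", "feathers", "eggs", "milk",
             "predator", "fins", "legs", "tail", "type"],
)

# Columns 1..-2 are the inputs: the name (first) and type (last) are excluded.
X = animals.iloc[:, 1:-1].to_numpy()
# The last column (type) is the target.
y = animals.iloc[:, -1].to_numpy()

print(X.shape)  # (5, 8)
print(y.shape)  # (5,)
```

Each row of X holds the eight feature values of one animal, and y holds one class label per row.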

Recalling TensorDataset
import torch
from torch.utils.data import TensorDataset

# Instantiate dataset class
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())

# Access an individual sample
sample = dataset[0]
input_sample, label_sample = sample
print('input sample:', input_sample)
print('label_sample:', label_sample)

input sample: tensor([0., 1., 1., 0., 1., 0., 2., 1.])
label_sample: tensor(2.)

Recalling DataLoader
from torch.utils.data import DataLoader

batch_size = 2
shuffle = True

# Create a DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

Recalling DataLoader
# Iterate over the dataloader
for batch_inputs, batch_labels in dataloader:
    print('batch inputs:', batch_inputs)
    print('batch labels:', batch_labels)

batch inputs: tensor([[0., 0., 1., 0., 0., 1., 0., 1.],
        [0., 1., 1., 0., 1., 0., 2., 1.]])
batch labels: tensor([4., 2.])
batch inputs: tensor([[0., 1., 1., 0., 1., 0., 2., 1.],
        [1., 0., 0., 1., 0., 0., 2., 1.]])
batch labels: tensor([2., 1.])
batch inputs: tensor([[0., 0., 1., 0., 1., 0., 4., 1.]])
batch labels: tensor([3.])
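
The output above ends with a batch of one because five samples do not divide evenly by a batch size of two. A minimal sketch of that behaviour follows, with the five samples hard-coded for self-containment. The `drop_last=True` variant is not on the slide; it is shown only as one way to discard the incomplete final batch.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# The five samples from the slide, hard-coded so the example runs on its own.
X = torch.tensor(
    [
        [0, 1, 1, 0, 1, 0, 2, 1],
        [0, 1, 1, 0, 1, 0, 2, 1],
        [0, 0, 1, 0, 0, 1, 0, 1],
        [0, 0, 1, 0, 1, 0, 4, 1],
        [1, 0, 0, 1, 0, 0, 2, 1],
    ],
    dtype=torch.float32,
)
y = torch.tensor([2, 2, 4, 3, 1], dtype=torch.float32)
dataset = TensorDataset(X, y)

dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Shuffling changes which samples land in each batch, not the batch sizes.
batch_sizes = [len(batch_labels) for _, batch_labels in dataloader]
print(batch_sizes)      # [2, 2, 1]
print(len(dataloader))  # 3, i.e. ceil(5 / 2)

# drop_last=True skips the final, smaller batch.
full_batches_only = DataLoader(dataset, batch_size=2, shuffle=True, drop_last=True)
print(len(full_batches_only))  # 2
```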

Let's practice!
Evaluating model performance

Maham Faisal Khan


Senior Data Scientist
Training, validation and testing
A raw dataset is usually split into three subsets:

Subset      Percent of data  Role
Training    80-90%           Used to adjust the model's parameters
Validation  10-20%           Used for hyperparameter tuning
Testing     5-10%            Used only once, to calculate final metrics
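
One possible way to produce such a split in PyTorch is `torch.utils.data.random_split`. The slides do not prescribe a splitting method, so the sketch below is an assumption: it uses a synthetic 100-sample dataset and an 80/10/10 split, with a seeded generator so the split is reproducible.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in data: 100 samples, 8 features, 7 classes (hypothetical).
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 7, (100,)))

# A seeded generator makes the random assignment reproducible.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(
    dataset, [80, 10, 10], generator=generator
)

print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```

Each subset can then be wrapped in its own DataLoader; the test subset is set aside until the final evaluation.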

Model evaluation metrics
In this video, we'll focus on evaluating:

Loss
    Training
    Validation

Accuracy
    Training
    Validation

In classification, accuracy measures how well the model predicts the ground truth labels.

Calculating training loss
For each epoch:
    we sum up the loss for each iteration of the training set dataloader
    at the end of the epoch, we calculate the mean training loss

training_loss = 0.0
for i, data in enumerate(trainloader, 0):
    # Run the forward pass
    ...
    # Calculate the loss
    loss = criterion(outputs, labels)
    # Calculate the gradients
    ...
    # Calculate and sum the loss
    training_loss += loss.item()
epoch_loss = training_loss / len(trainloader)
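
The elided steps can be filled in as follows. This is a sketch under stated assumptions, not the course's exact code: the data is random, and the model, `CrossEntropyLoss` criterion and SGD settings are placeholders chosen only so the loop runs end to end.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder data and model (assumptions for illustration only).
features = torch.randn(20, 8)
labels = torch.randint(0, 3, (20,))
trainloader = DataLoader(TensorDataset(features, labels), batch_size=4, shuffle=True)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(5):
    training_loss = 0.0
    for batch_features, batch_labels in trainloader:
        optimizer.zero_grad()                    # clear old gradients
        outputs = model(batch_features)          # run the forward pass
        loss = criterion(outputs, batch_labels)  # calculate the loss
        loss.backward()                          # calculate the gradients
        optimizer.step()                         # update the parameters
        training_loss += loss.item()             # sum the batch losses
    # Mean of the per-batch losses for this epoch
    epoch_losses.append(training_loss / len(trainloader))

print(epoch_losses)
```

Dividing by `len(trainloader)` averages over batches. That equals the per-sample mean only when every batch has the same size, as is the case here (20 samples in batches of 4).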

Calculating validation loss
After the training epoch, we iterate over the validation set and calculate the average validation loss

validation_loss = 0.0
model.eval()  # Put model in evaluation mode
with torch.no_grad():  # Disable gradient tracking, which speeds up the forward pass
    for i, data in enumerate(validationloader, 0):
        # Run the forward pass
        ...
        # Calculate the loss
        loss = criterion(outputs, labels)
        validation_loss += loss.item()
epoch_loss = validation_loss / len(validationloader)
model.train()  # Switch back to training mode
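
The validation pass can be wrapped in a small helper. The function name `mean_validation_loss`, the random data and the model below are hypothetical; the sketch only illustrates the pattern of switching to evaluation mode, disabling gradients, and restoring the previous mode afterwards.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def mean_validation_loss(model, loader, criterion):
    """Return the mean per-batch loss over `loader` without tracking gradients."""
    was_training = model.training
    model.eval()  # dropout and similar layers switch to evaluation behaviour
    total = 0.0
    with torch.no_grad():  # no computation graph is built inside this block
        for features, labels in loader:
            total += criterion(model(features), labels).item()
    model.train(was_training)  # restore whichever mode the model was in
    return total / len(loader)


torch.manual_seed(0)
# Placeholder validation data and model (assumptions for illustration only).
validationloader = DataLoader(
    TensorDataset(torch.randn(12, 8), torch.randint(0, 3, (12,))), batch_size=4
)
model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 3)
)
criterion = nn.CrossEntropyLoss()

val_loss = mean_validation_loss(model, validationloader, criterion)
print(f"validation loss: {val_loss:.4f}")
```

Because dropout is inactive in evaluation mode, calling the helper twice on the same model and data gives the same loss.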

Overfitting

Calculating accuracy with torchmetrics
import torchmetrics

# Create accuracy metric using torchmetrics
metric = torchmetrics.Accuracy(task="multiclass", num_classes=3)
for i, data in enumerate(dataloader, 0):
    features, labels = data
    outputs = model(features)
    # Calculate accuracy over the batch
    # (labels.argmax assumes one-hot encoded labels and recovers the class index)
    acc = metric(outputs, labels.argmax(dim=-1))
# Calculate accuracy over the whole epoch
acc = metric.compute()
print(f"Accuracy on all data: {acc}")
# Reset the metric for the next epoch (training or validation)
metric.reset()
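
To see what the metric measures, the same quantity can be computed by hand with plain PyTorch. The outputs and labels below are made-up values for illustration; the sketch also shows why `argmax` turns one-hot labels back into class indices.

```python
import torch
import torch.nn.functional as F

# Made-up model outputs for four samples and three classes.
outputs = torch.tensor(
    [
        [2.0, 0.1, 0.3],
        [0.2, 1.5, 0.1],
        [0.3, 0.2, 0.9],
        [1.2, 0.4, 0.1],
    ]
)
labels = torch.tensor([0, 1, 1, 0])  # ground-truth class indices

# One-hot encoding and argmax are inverse operations on class indices.
one_hot_labels = F.one_hot(labels, num_classes=3)
recovered = one_hot_labels.argmax(dim=-1)

# The predicted class is the index of the largest output.
predictions = outputs.argmax(dim=-1)

# Accuracy is the fraction of predictions that match the labels.
accuracy = (predictions == recovered).float().mean().item()
print(predictions.tolist())  # [0, 1, 2, 0]
print(accuracy)              # 0.75
```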

Let's practice!
Fighting overfitting

Maham Faisal Khan


Senior Data Scientist
Reasons for overfitting
Overfitting: the model does not generalize to unseen data.
    the model memorizes the training data
    good performance on the training set, poor performance on the validation set

Possible causes:

Problem                      Solutions
Dataset is not large enough  Get more data / use data augmentation
Model has too much capacity  Reduce model size / add dropout
Weights are too large        Weight decay

Fighting overfitting
Strategies:

Reducing model size or adding a dropout layer

Using weight decay to force parameters to remain small

Obtaining new data or augmenting data

"Regularization" using a dropout layer
Randomly zeroes out elements of the input tensor during training

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4),
                      nn.ReLU(),
                      nn.Dropout(p=0.5))
features = torch.randn((1, 8))
model(features)

tensor([[1.4655, 0.0000, 0.0000, 0.8456]], grad_fn=<MulBackward0>)

Dropout is added after the activation function

Behaves differently during training and evaluation; we must remember to switch modes
using model.train() and model.eval()
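
The difference between the two modes can be checked directly on a standalone `nn.Dropout` layer. In this sketch the input is a tensor of ones, chosen so the effect is easy to read: in training mode each element is either zeroed or scaled by 1 / (1 - p), and in evaluation mode the input passes through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()
train_out = dropout(x)  # each element is 0 or 1 / (1 - 0.5) = 2

dropout.eval()
eval_out = dropout(x)   # dropout is a no-op in evaluation mode

print(train_out)
print(eval_out)
```

The scaling in training mode keeps the expected value of each element equal to its value in evaluation mode.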

Regularization with weight decay
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)

The optimizer's weight_decay parameter takes values between zero and one

Typically a small value, e.g. 1e-3

Weight decay adds a penalty to the loss function to discourage large weights and biases

The higher the parameter, the less likely the model is to overfit
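
For plain SGD, the penalty amounts to adding `weight_decay * w` to each parameter's gradient before the update, which is the gradient of an L2 penalty (weight_decay / 2) * ||w||^2. The sketch below checks one update step by hand on a two-element parameter; the toy loss and the values of `lr` and `weight_decay` are chosen only for illustration.

```python
import torch
import torch.optim as optim

lr, weight_decay = 0.1, 0.01

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=lr, weight_decay=weight_decay)

loss = (w ** 2).sum()  # toy loss; its gradient is 2 * w
loss.backward()

# Manual update: w <- w - lr * (grad + weight_decay * w)
expected = w.detach() - lr * (w.grad + weight_decay * w.detach())

optimizer.step()

print(w.detach())                           # approximately [0.799, -1.598]
print(torch.allclose(w.detach(), expected))  # True
```

The extra term always points toward zero, so larger `weight_decay` values shrink the parameters more strongly at every step.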

Data augmentation

Let's practice!
Improving model performance

Maham Faisal Khan


Senior Data Scientist
Steps to maximize performance
1. Overfit the training set
    can we solve the problem?
    sets a performance baseline

2. Reduce overfitting
    improve performance on the validation set

3. Fine-tune hyperparameters

Step 1: overfit the training set
Modify the training loop to overfit a single data point (batch size of 1)

features, labels = next(iter(trainloader))
for i in range(1000):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    outputs = model(features)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

should reach 1.0 accuracy and 0 loss

helps find bugs in the code

Goal: minimize the training loss


create large enough model

use a default learning rate
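
A self-contained version of this sanity check might look as follows. The single sample is the first row of the animals data; the model size, the `CrossEntropyLoss` criterion, the SGD learning rate and the zero-based class index are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# A single data point: one feature row and one class index in 0..6.
features = torch.tensor([[0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 2.0, 1.0]])
labels = torch.tensor([2])

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 7))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

initial_loss = criterion(model(features), labels).item()

for i in range(1000):
    optimizer.zero_grad()
    outputs = model(features)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
prediction = outputs.argmax(dim=-1)

print(f"initial loss: {initial_loss:.4f}")
print(f"final loss:   {final_loss:.6f}")
print(f"prediction:   {prediction.item()}")
```

If the loss does not approach zero on a single sample, the problem is more likely in the training code than in the data or the model capacity.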

Step 2: reduce overfitting
Goal: maximize the validation accuracy
Experiment with:
Dropout

Data augmentation

Weight decay

Reducing model capacity

Keep track of each hyperparameter and report the maximum validation accuracy

Step 2: reduce overfitting
Original model overfitting the training data

Model with too much regularization

Step 3: fine-tune hyperparameters
Grid search

for factor in range(2, 6):
    lr = 10 ** -factor

Random search

factor = np.random.uniform(2, 6)
lr = 10 ** -factor
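
The two strategies produce different sets of candidate learning rates. The sketch below collects the candidates into lists so they can be compared; the number of random draws (four) and the seeded NumPy generator are assumptions added for reproducibility.

```python
import numpy as np

# Grid search: one learning rate per integer exponent from 2 to 5.
grid_lrs = [10 ** -factor for factor in range(2, 6)]
print(grid_lrs)  # 0.01, 0.001, 0.0001, 1e-05 (up to float rounding)

# Random search: exponents drawn uniformly from [2, 6), so the learning
# rates are spread evenly on a log scale between 1e-6 and 1e-2.
rng = np.random.default_rng(0)
random_lrs = [10 ** -rng.uniform(2, 6) for _ in range(4)]
print(random_lrs)
```

Sampling the exponent rather than the learning rate itself keeps the candidates from clustering near the upper end of the range.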

Let's practice!
Wrap-up video

Maham Faisal Khan


Senior Data Scientist
Summary
Chapter 1
    Discovered deep learning
    Created small neural networks
    Discovered linear layers and activation functions

Chapter 2
    Created and used loss functions
    Calculated derivatives and used backpropagation
    Trained a neural network

Chapter 3
    Manipulated the architecture of a neural network
    Played with learning rate and momentum
    Learned about transfer learning

Chapter 4
    Learned about dataloaders
    Reduced overfitting
    Evaluated model performance

Next steps
Course
Intermediate Deep Learning with PyTorch

Learn
Probability and statistics

Linear algebra

Calculus

Practice
Pick a dataset on Kaggle
Check out DataCamp workspace
Train a neural network

Let's practice!