Language Translation with nn.Transformer and torchtext
# We need to modify the URLs for the dataset since the links to the original dataset are broken
# Refer to https://github.com/pytorch/text/issues/1756#issuecomment-1163664163 for more info
from torchtext.datasets import multi30k, Multi30k

multi30k.URL["train"] = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz"
multi30k.URL["valid"] = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz"
SRC_LANGUAGE = 'de'
TGT_LANGUAGE = 'en'
# Place-holders
token_transform = {}
vocab_transform = {}
Create the source and target language tokenizers. Make sure to install the dependencies.
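If spaCy and its language models are not installed yet, something like the following should work (torchdata is needed for the torchtext datasets API):

pip install -U torchdata
pip install -U spacy
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm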
from torchtext.data.utils import get_tokenizer

token_transform[SRC_LANGUAGE] = get_tokenizer('spacy', language='de_core_news_sm')
token_transform[TGT_LANGUAGE] = get_tokenizer('spacy', language='en_core_web_sm')
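The vocabulary-building code appears to have been lost when this page was extracted. The sketch below reconstructs it under standard assumptions: the special symbols sit at fixed indices, and a ``build_vocab_from_iterator`` pass runs over the tokenized Multi30k training split.

from typing import Iterable, List
from torchtext.vocab import build_vocab_from_iterator

# helper function that yields the token list for every sample in the given language
def yield_tokens(data_iter: Iterable, language: str) -> List[str]:
    language_index = {SRC_LANGUAGE: 0, TGT_LANGUAGE: 1}
    for data_sample in data_iter:
        yield token_transform[language](data_sample[language_index[language]])

# Define special symbols and indices
UNK_IDX, PAD_IDX, BOS_IDX, EOS_IDX = 0, 1, 2, 3
# Make sure the tokens are in order of their indices to properly insert them in vocab
special_symbols = ['<unk>', '<pad>', '<bos>', '<eos>']

for ln in [SRC_LANGUAGE, TGT_LANGUAGE]:
    # Training data iterator
    train_iter = Multi30k(split='train', language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))
    # Create torchtext's Vocab object from the tokenized training sentences
    vocab_transform[ln] = build_vocab_from_iterator(yield_tokens(train_iter, ln),
                                                    min_freq=1,
                                                    specials=special_symbols,
                                                    special_first=True)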
# Set ``UNK_IDX`` as the default index. This index is returned when the token is not found.
# If not set, it throws ``RuntimeError`` when the queried token is not found in the Vocabulary.
for ln in [SRC_LANGUAGE, TGT_LANGUAGE]:
    vocab_transform[ln].set_default_index(UNK_IDX)
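As a quick check (the token below is hypothetical and certainly out of vocabulary), unknown tokens now map to ``UNK_IDX`` instead of raising:

# hypothetical out-of-vocabulary token; maps to UNK_IDX (0) instead of raising
print(vocab_transform[SRC_LANGUAGE](['zzz-not-a-word']))  # [0]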
from torch import Tensor
import torch
import torch.nn as nn
from torch.nn import Transformer
import math
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# helper Module that adds positional encoding to the token embedding to introduce a notion of word order.
class PositionalEncoding(nn.Module):
    def __init__(self,
                 emb_size: int,
                 dropout: float,
                 maxlen: int = 5000):
        super(PositionalEncoding, self).__init__()
        den = torch.exp(- torch.arange(0, emb_size, 2) * math.log(10000) / emb_size)
        pos = torch.arange(0, maxlen).reshape(maxlen, 1)
        pos_embedding = torch.zeros((maxlen, emb_size))
        pos_embedding[:, 0::2] = torch.sin(pos * den)
        pos_embedding[:, 1::2] = torch.cos(pos * den)
        pos_embedding = pos_embedding.unsqueeze(-2)

        self.dropout = nn.Dropout(dropout)
        self.register_buffer('pos_embedding', pos_embedding)

    def forward(self, token_embedding: Tensor):
        # add the fixed positional encoding for the first ``seq_len`` positions
        return self.dropout(token_embedding + self.pos_embedding[:token_embedding.size(0), :])
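A quick sanity check of the shapes (the values here are arbitrary): the positional encoding broadcasts over the batch dimension, so the output shape matches the input.

pos_enc = PositionalEncoding(emb_size=512, dropout=0.1)
dummy = torch.zeros(10, 32, 512)  # (seq_len, batch_size, emb_size)
print(pos_enc(dummy).shape)       # torch.Size([10, 32, 512])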
# helper Module to convert tensor of input indices into corresponding tensor of token embeddings
class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, emb_size: int):
        super(TokenEmbedding, self).__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.emb_size = emb_size

    def forward(self, tokens: Tensor):
        # scale embeddings by sqrt(emb_size) before positional encodings are added
        return self.embedding(tokens.long()) * math.sqrt(self.emb_size)
# Seq2Seq Network
class Seq2SeqTransformer(nn.Module):
def __init__(self,
num_encoder_layers: int,
num_decoder_layers: int,
emb_size: int,
nhead: int,
src_vocab_size: int,
tgt_vocab_size: int,
dim_feedforward: int = 512,
dropout: float = 0.1):
super(Seq2SeqTransformer, self).__init__()
self.transformer = Transformer(d_model=emb_size,
nhead=nhead,
num_encoder_layers=num_encoder_layers,
num_decoder_layers=num_decoder_layers,
dim_feedforward=dim_feedforward,
dropout=dropout)
self.generator = nn.Linear(emb_size, tgt_vocab_size)
self.src_tok_emb = TokenEmbedding(src_vocab_size, emb_size)
self.tgt_tok_emb = TokenEmbedding(tgt_vocab_size, emb_size)
self.positional_encoding = PositionalEncoding(
emb_size, dropout=dropout)
def forward(self,
src: Tensor,
trg: Tensor,
src_mask: Tensor,
tgt_mask: Tensor,
src_padding_mask: Tensor,
tgt_padding_mask: Tensor,
memory_key_padding_mask: Tensor):
src_emb = self.positional_encoding(self.src_tok_emb(src))
tgt_emb = self.positional_encoding(self.tgt_tok_emb(trg))
outs = self.transformer(src_emb, tgt_emb, src_mask, tgt_mask, None,
src_padding_mask, tgt_padding_mask, memory_key_padding_mask)
        return self.generator(outs)

    def encode(self, src: Tensor, src_mask: Tensor):
        return self.transformer.encoder(self.positional_encoding(
            self.src_tok_emb(src)), src_mask)

    def decode(self, tgt: Tensor, memory: Tensor, tgt_mask: Tensor):
        return self.transformer.decoder(self.positional_encoding(
            self.tgt_tok_emb(tgt)), memory, tgt_mask)
During training, we need a subsequent-word mask that prevents the model from looking at future words when making predictions. We will also need masks to hide the source and target padding tokens. Below, let's define a function that takes care of both.
def generate_square_subsequent_mask(sz):
mask = (torch.triu(torch.ones((sz, sz), device=DEVICE)) == 1).transpose(0, 1)
mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
return mask
def create_mask(src, tgt):
    src_seq_len = src.shape[0]
    tgt_seq_len = tgt.shape[0]

    tgt_mask = generate_square_subsequent_mask(tgt_seq_len)
    src_mask = torch.zeros((src_seq_len, src_seq_len), device=DEVICE).type(torch.bool)

    # padding masks are True wherever the token is ``<pad>``, shaped (batch, seq_len)
    src_padding_mask = (src == PAD_IDX).transpose(0, 1)
    tgt_padding_mask = (tgt == PAD_IDX).transpose(0, 1)
    return src_mask, tgt_mask, src_padding_mask, tgt_padding_mask
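For intuition, here is the subsequent-word mask for a length-3 sequence: position i may attend to positions up to and including i (zeros), while -inf blocks attention to future positions (output shown for a CPU device):

print(generate_square_subsequent_mask(3))
# tensor([[0., -inf, -inf],
#         [0., 0., -inf],
#         [0., 0., 0.]])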
Let's now define the parameters of our model and instantiate it. Below, we also define our loss function, which is the cross-entropy loss, and the optimizer used for training.
torch.manual_seed(0)
SRC_VOCAB_SIZE = len(vocab_transform[SRC_LANGUAGE])
TGT_VOCAB_SIZE = len(vocab_transform[TGT_LANGUAGE])
EMB_SIZE = 512
NHEAD = 8
FFN_HID_DIM = 512
BATCH_SIZE = 128
NUM_ENCODER_LAYERS = 3
NUM_DECODER_LAYERS = 3
transformer = Seq2SeqTransformer(NUM_ENCODER_LAYERS, NUM_DECODER_LAYERS, EMB_SIZE,
                                 NHEAD, SRC_VOCAB_SIZE, TGT_VOCAB_SIZE, FFN_HID_DIM)

# Xavier-initialize all weight matrices
for p in transformer.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

transformer = transformer.to(DEVICE)

loss_fn = torch.nn.CrossEntropyLoss(ignore_index=PAD_IDX)

optimizer = torch.optim.Adam(transformer.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)
Collation
As seen in the Data Sourcing and Processing section, our data iterator yields a pair of raw strings. We need to convert these string pairs into the batched tensors that can be
processed by our Seq2Seq network defined previously. Below we define our collate function that converts a batch of raw strings into batch tensors that can be fed directly into our
model.
from torch.nn.utils.rnn import pad_sequence

# helper function to club together sequential operations
def sequential_transforms(*transforms):
    def func(txt_input):
        for transform in transforms:
            txt_input = transform(txt_input)
        return txt_input
    return func

# function to add BOS/EOS and create tensor for input sequence indices
def tensor_transform(token_ids: List[int]):
    return torch.cat((torch.tensor([BOS_IDX]),
                      torch.tensor(token_ids),
                      torch.tensor([EOS_IDX])))
# ``src`` and ``tgt`` language text transforms to convert raw strings into tensors indices
text_transform = {}
for ln in [SRC_LANGUAGE, TGT_LANGUAGE]:
text_transform[ln] = sequential_transforms(token_transform[ln], #Tokenization
vocab_transform[ln], #Numericalization
tensor_transform) # Add BOS/EOS and create tensor
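The collate function itself seems to have been dropped from this page; below is a minimal sketch that matches the pipeline above, padding every batch to its longest sequence with ``PAD_IDX``:

# function to collate data samples into batch tensors
def collate_fn(batch):
    src_batch, tgt_batch = [], []
    for src_sample, tgt_sample in batch:
        src_batch.append(text_transform[SRC_LANGUAGE](src_sample.rstrip("\n")))
        tgt_batch.append(text_transform[TGT_LANGUAGE](tgt_sample.rstrip("\n")))

    src_batch = pad_sequence(src_batch, padding_value=PAD_IDX)
    tgt_batch = pad_sequence(tgt_batch, padding_value=PAD_IDX)
    return src_batch, tgt_batch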
Let's define the training and evaluation loops that will be called for each epoch.
from torch.utils.data import DataLoader

def train_epoch(model, optimizer):
    model.train()
    losses = 0
    train_iter = Multi30k(split='train', language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))
    train_dataloader = DataLoader(train_iter, batch_size=BATCH_SIZE, collate_fn=collate_fn)

    for src, tgt in train_dataloader:
        src, tgt = src.to(DEVICE), tgt.to(DEVICE)
        # feed the target shifted right; predict the target shifted left
        tgt_input = tgt[:-1, :]
        src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input)
        logits = model(src, tgt_input, src_mask, tgt_mask,
                       src_padding_mask, tgt_padding_mask, src_padding_mask)

        optimizer.zero_grad()
        tgt_out = tgt[1:, :]
        loss = loss_fn(logits.reshape(-1, logits.shape[-1]), tgt_out.reshape(-1))
        loss.backward()
        optimizer.step()
        losses += loss.item()

    return losses / len(list(train_dataloader))


def evaluate(model):
    model.eval()
    losses = 0
    val_iter = Multi30k(split='valid', language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))
    val_dataloader = DataLoader(val_iter, batch_size=BATCH_SIZE, collate_fn=collate_fn)

    for src, tgt in val_dataloader:
        src, tgt = src.to(DEVICE), tgt.to(DEVICE)
        tgt_input = tgt[:-1, :]
        src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input)
        logits = model(src, tgt_input, src_mask, tgt_mask,
                       src_padding_mask, tgt_padding_mask, src_padding_mask)
        tgt_out = tgt[1:, :]
        loss = loss_fn(logits.reshape(-1, logits.shape[-1]), tgt_out.reshape(-1))
        losses += loss.item()

    return losses / len(list(val_dataloader))
Now we have all the ingredients to train our model. Let’s do it!
from timeit import default_timer as timer
NUM_EPOCHS = 18

for epoch in range(1, NUM_EPOCHS + 1):
    start_time = timer()
    train_loss = train_epoch(transformer, optimizer)
    end_time = timer()
    val_loss = evaluate(transformer)
    print(f"Epoch: {epoch}, Train loss: {train_loss:.3f}, Val loss: {val_loss:.3f}, "
          f"Epoch time = {(end_time - start_time):.3f}s")

# function to generate output sequence using greedy algorithm
def greedy_decode(model, src, src_mask, max_len, start_symbol):
    src, src_mask = src.to(DEVICE), src_mask.to(DEVICE)
    memory = model.encode(src, src_mask)
    # start decoding from the ``<bos>`` symbol
    ys = torch.ones(1, 1).fill_(start_symbol).type(torch.long).to(DEVICE)
    for i in range(max_len - 1):
        tgt_mask = (generate_square_subsequent_mask(ys.size(0))
                    .type(torch.bool)).to(DEVICE)
        out = model.decode(ys, memory.to(DEVICE), tgt_mask)
        prob = model.generator(out.transpose(0, 1)[:, -1])
        _, next_word = torch.max(prob, dim=1)
        next_word = next_word.item()
        # append the predicted token; stop once ``<eos>`` is produced
        ys = torch.cat([ys,
                        torch.ones(1, 1).type_as(src.data).fill_(next_word)], dim=0)
        if next_word == EOS_IDX:
            break
    return ys
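The ``translate`` helper used in the final line was also lost in extraction; below is a minimal reconstruction under the same assumptions: greedy decoding with a small length headroom, then stripping the special tokens from the output.

# actual function to translate input sentence into target language
def translate(model: torch.nn.Module, src_sentence: str):
    model.eval()
    src = text_transform[SRC_LANGUAGE](src_sentence).view(-1, 1)
    num_tokens = src.shape[0]
    src_mask = (torch.zeros(num_tokens, num_tokens)).type(torch.bool)
    tgt_tokens = greedy_decode(model, src, src_mask, max_len=num_tokens + 5,
                               start_symbol=BOS_IDX).flatten()
    return " ".join(vocab_transform[TGT_LANGUAGE].lookup_tokens(list(tgt_tokens.cpu().numpy()))).replace("<bos>", "").replace("<eos>", "")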
print(translate(transformer, "Eine Gruppe von Menschen steht vor einem Iglu ."))