XCS224N Assignment 4 Neural Machine Translation With Rnns
Guidelines
1. If you have a question about this homework, we encourage you to post your question on our Slack channel, at
http://xcs224n-scpd.slack.com/
2. Familiarize yourself with the collaboration and honor code policy before starting work.
3. For the coding problems, you must use the packages specified in the provided environment description. Since
the autograder uses this environment, we will not be able to grade any submissions which import unexpected
libraries.
Submission Instructions
Coding Submission: Some questions in this assignment require a coding response. For these questions, you should
submit all files indicated in the question to the online student portal. For further details, see Writing Code
and Running the Autograder below.
Honor code
We strongly encourage students to form study groups. Students may discuss and work on homework problems
in groups. However, each student must write down the solutions independently, and without referring to written
notes from the joint session. In other words, each student must understand the solution well enough in order to
reconstruct it by him/herself. In addition, each student should write on the problem set the set of people with
whom s/he collaborated. Further, because we occasionally reuse problem set questions from previous years, we
expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code
violation to intentionally refer to a previous year’s solutions. More information regarding the Stanford honor code
can be found at https://communitystandards.stanford.edu/policies-and-guidance/honor-code.
Before beginning this course, please walk through the Anaconda Setup for XCS Courses to familiarize yourself with
the coding environment. Use the env defined in src/environment.yml to run your code. This is the same environment
used by the online autograder.
Test Cases
The autograder is a thin wrapper over the python unittest framework. It can be run either locally (on your computer)
or remotely (on SCPD servers). The following description demonstrates what test results will look like for both
local and remote execution. For the sake of example, we will consider two generic tests: 1a-0-basic and 1a-1-hidden.
Local Execution - Hidden Tests
All hidden tests rely on files that are not provided to students. Therefore, the tests can only be run remotely. When
a hidden test like 1a-1-hidden is executed locally, it will produce the following result:
Local Execution - Basic Tests
When a basic test like 1a-0-basic fails locally, the error is printed to the terminal, along with a stack trace indicating
where the error occurred:
Remote Execution
Basic and hidden tests are treated the same by the remote autograder. Here are screenshots of failed basic and
hidden tests. Notice that the same information (error and stack trace) is provided as in the local autograder, now
for both basic and hidden tests.
Finally, here is what it looks like when basic and hidden tests pass in the remote autograder.
1 Neural Machine Translation with RNNs
In Machine Translation, our goal is to convert a sentence from the source language (e.g. Cherokee) to the target
language (e.g. English). In this assignment, we will implement a sequence-to-sequence (Seq2Seq) network with
attention, to build a Neural Machine Translation (NMT) system. In this section, we describe the training proce-
dure for the proposed NMT system, which uses a Bidirectional LSTM Encoder and a Unidirectional LSTM Decoder.
Figure 1: Seq2Seq Model with Multiplicative Attention, shown on the third step of the
decoder. Note that for readability, we do not picture the concatenation of the previous
combined-output with the decoder input.
We run the Bidirectional Encoder over the source sentence and concatenate the forward and backward hidden states and cell states at each source position:

h_i^enc = [←h_i^enc ; →h_i^enc] where h_i^enc ∈ R^{2h×1} and ←h_i^enc, →h_i^enc ∈ R^{h×1}, 1 ≤ i ≤ m   (1)

c_i^enc = [←c_i^enc ; →c_i^enc] where c_i^enc ∈ R^{2h×1} and ←c_i^enc, →c_i^enc ∈ R^{h×1}, 1 ≤ i ≤ m   (2)
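As an illustration of Equations (1)-(2), the following sketch (with made-up sizes; not the assignment's required code) runs a bidirectional LSTM and concatenates the forward and backward final states:

```python
import torch
import torch.nn as nn

# Illustrative sketch: a bidirectional LSTM over a source batch, with
# forward/backward states concatenated as in Equations (1)-(2).
e, h, m, batch = 8, 16, 5, 3          # embed dim, hidden dim, src len, batch
encoder = nn.LSTM(input_size=e, hidden_size=h, bidirectional=True)

X = torch.randn(m, batch, e)          # source embeddings (src_len, batch, e)
enc_hiddens, (last_h, last_c) = encoder(X)

# Per-position outputs already stack both directions: (src_len, batch, 2h)
print(enc_hiddens.shape)              # torch.Size([5, 3, 32])

# last_h has shape (2, batch, h): index 0 is the forward LSTM's final
# state, index 1 the backward LSTM's. Concatenating mirrors [←h ; →h].
h_cat = torch.cat((last_h[0], last_h[1]), dim=1)   # (batch, 2h)
c_cat = torch.cat((last_c[0], last_c[1]), dim=1)   # (batch, 2h)
print(h_cat.shape, c_cat.shape)
```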
With the Decoder initialized, we must now feed it a matching sentence in the target language. On the t-th step,
we look up the embedding for the t-th word, y_t ∈ R^{e×1}. We then concatenate y_t with the combined-output vector
o_{t−1} ∈ R^{h×1} from the previous timestep (we will explain what this is later down this page!) to produce ȳ_t ∈ R^{(e+h)×1}.
Note that for the first target word (i.e. the start token) o_0 is a zero-vector. We then feed ȳ_t as input to the Decoder
LSTM.
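A minimal sketch of this decoder-input construction, batched and with illustrative sizes (the variable names are ours, not the assignment's):

```python
import torch

# Concatenate the target embedding y_t with the previous combined-output
# o_{t-1} to form ybar_t in R^{(e+h) x 1}; here everything is batched.
e, h, batch = 8, 16, 3
y_t = torch.randn(batch, e)      # embedding of the t-th target word
o_prev = torch.zeros(batch, h)   # o_0 is a zero-vector for the start token

ybar_t = torch.cat((y_t, o_prev), dim=1)   # (batch, e + h)
print(ybar_t.shape)              # torch.Size([3, 24])
```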
h_t^dec, c_t^dec = Decoder(ȳ_t, h_{t−1}^dec, c_{t−1}^dec)   (5)
where h_t^dec ∈ R^{h×1}, c_t^dec ∈ R^{h×1}   (6)

We then use h_t^dec to compute multiplicative attention over h_1^enc, …, h_m^enc:

e_{t,i} = (h_t^dec)^T W_attProj h_i^enc where e_t ∈ R^{m×1}, W_attProj ∈ R^{h×2h}, 1 ≤ i ≤ m   (7)
α_t = softmax(e_t) where α_t ∈ R^{m×1}   (8)
a_t = Σ_{i=1}^{m} α_{t,i} h_i^enc where a_t ∈ R^{2h×1}   (9)
We now concatenate the attention output a_t with the decoder hidden state h_t^dec and pass this through a linear layer,
Tanh, and Dropout to attain the combined-output vector o_t.

u_t = [h_t^dec ; a_t] where u_t ∈ R^{3h×1}   (10)
v_t = W_u u_t where v_t ∈ R^{h×1}, W_u ∈ R^{h×3h}   (11)
o_t = Dropout(Tanh(v_t)) where o_t ∈ R^{h×1}   (12)
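Equations (10)-(12) can be sketched as follows (illustrative sizes; the layer name combined_proj is our assumption, not a required attribute name):

```python
import torch
import torch.nn as nn

# Build the combined-output vector o_t from the decoder hidden state and
# the attention output, per Equations (10)-(12).
h, batch = 16, 3
h_dec = torch.randn(batch, h)        # decoder hidden state h_t^dec
a_t = torch.randn(batch, 2 * h)      # attention output a_t

combined_proj = nn.Linear(3 * h, h, bias=False)   # plays the role of W_u
dropout = nn.Dropout(p=0.3)

u_t = torch.cat((h_dec, a_t), dim=1)              # (batch, 3h)
v_t = combined_proj(u_t)                          # (batch, h)
o_t = dropout(torch.tanh(v_t))                    # (batch, h)
print(u_t.shape, v_t.shape, o_t.shape)
```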
Then, we produce a probability distribution P_t over target words at the t-th timestep:

P_t = softmax(W_vocab o_t) where P_t ∈ R^{V_t×1}, W_vocab ∈ R^{V_t×h}   (13)

Here, V_t is the size of the target vocabulary. Finally, to train the network we compute the softmax cross-entropy
loss between P_t and g_t, where g_t is the one-hot vector of the target word at timestep t:

J_t(θ) = CrossEntropy(P_t, g_t)   (14)

Here, θ represents all the parameters of the model and J_t(θ) is the loss on step t of the decoder. Now that we have
described the model, let's try implementing it for Cherokee to English translation!
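The output distribution and loss can be sketched as follows (made-up sizes; target_vocab_proj stands in for the vocabulary projection). Note that PyTorch's F.cross_entropy fuses log-softmax with negative log-likelihood, so it consumes raw logits plus the gold word's index rather than P_t and a one-hot vector:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Project o_t to vocabulary logits, form P_t, and compute the loss.
h, V_t, batch = 16, 50, 3
o_t = torch.randn(batch, h)
target_vocab_proj = nn.Linear(h, V_t, bias=False)

logits = target_vocab_proj(o_t)        # (batch, V_t)
P_t = F.softmax(logits, dim=1)         # distribution over target words
gold = torch.randint(0, V_t, (batch,)) # index of the gold word per example
J_t = F.cross_entropy(logits, gold)    # scalar loss, averaged over batch
print(P_t.shape, J_t.shape)
```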
Make sure your code runs locally before attempting to train it on your VM. GPU time is expensive and limited. It
takes approximately 30 minutes to 1 hour to train the NMT system. We don't want you to accidentally use all your
GPU time for the assignment debugging your model rather than training and evaluating it. Finally, make sure that
your VM is turned off whenever you are not using it.
In order to run the model code on your workstation's CPU, or if you have an Apple silicon GPU, run the
following commands to create the proper virtual environment (you did this at the beginning of the course on your
local computer):
$ conda update -n base conda
$ conda env create --file environment.yml
If you have a local CUDA-capable GPU (Nvidia), or are working on your VM, then instead of the XCS224N conda
environment, create a new environment that supports CUDA, XCS224N_CUDA, by running:

$ conda env create --file environment_cuda.yml
$ conda activate XCS224N_CUDA
For local development and testing, feel free to continue using the same XCS224N environment you've used
for all the assignments so far.
Implementation Assignment
(a) [2 points (Coding)] In order to apply tensor operations, we must ensure that the sentences in a given batch
are of the same length. Thus, we must identify the longest sentence in a batch and pad the others to the
same length. Implement the pad_sents function in submission/utils.py, which will produce these padded
sentences.
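The padding idea can be sketched like this (the function name and signature are ours for illustration, not necessarily those required by pad_sents):

```python
# Extend every sentence in the batch to the length of the longest one
# by appending a pad token.
def pad_to_longest(sents, pad_token):
    max_len = max(len(s) for s in sents)
    return [s + [pad_token] * (max_len - len(s)) for s in sents]

batch = [["the", "cat", "sat"], ["hello"]]
print(pad_to_longest(batch, "<pad>"))
# [['the', 'cat', 'sat'], ['hello', '<pad>', '<pad>']]
```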
(b) [3 points (Coding)] Implement the __init__ function in submission/model_embeddings.py to initialize
the necessary source and target embeddings.
(c) [4 points (Coding)] Implement the __init__ function in submission/nmt_model.py to initialize the neces-
sary layers (LSTM, projection, and dropout) for the NMT system.
(d) [8 points (Coding)] Implement the encode function in submission/nmt_model.py. This function converts
the padded source sentences into the tensor X, generates h_1^enc, …, h_m^enc, and computes the initial hidden
state h_0^dec and initial cell state c_0^dec for the Decoder.
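One common pattern for encoding a padded batch (a sketch under our own assumptions, not the required implementation) is to pack the batch so the LSTM skips pad positions, then unpack to recover the per-position hidden states; the initial decoder state is then typically a linear projection of the concatenated final states:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Pack, encode, unpack. Sizes and variable names are illustrative.
e, h = 8, 16
encoder = nn.LSTM(input_size=e, hidden_size=h, bidirectional=True)

source_lengths = [5, 3]            # true lengths, sorted longest-first
X = torch.randn(5, 2, e)           # padded source batch (src_len, batch, e)

packed = pack_padded_sequence(X, source_lengths)
enc_out, (last_h, last_c) = encoder(packed)
enc_hiddens, _ = pad_packed_sequence(enc_out)   # (src_len, batch, 2h)
print(enc_hiddens.shape)           # torch.Size([5, 2, 32])
```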
(e) [8 points (Coding)] Implement the decode function in submission/nmt_model.py. This function constructs
ȳ and runs the step function over every timestep for the input.
(f) [10 points (Coding)] Implement the step function in submission/nmt_model.py. This function applies
the Decoder's LSTM cell for a single timestep, computing the encoding of the target word h_t^dec, the attention
scores e_t, the attention distribution α_t, the attention output a_t, and finally the combined output o_t.
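The multiplicative-attention part of a single step can be sketched with batched tensors as follows (illustrative sizes and names, not the assignment's exact code):

```python
import torch

# One decoder step of multiplicative attention over the encoder states.
h, m, batch = 16, 5, 3
h_dec = torch.randn(batch, h)                  # h_t^dec
enc_hiddens = torch.randn(batch, m, 2 * h)     # h_1^enc ... h_m^enc
W_attProj = torch.randn(h, 2 * h)              # attention projection

# Attention scores e_{t,i} = (h_t^dec)^T W_attProj h_i^enc
proj = enc_hiddens @ W_attProj.T               # (batch, m, h)
e_t = torch.bmm(proj, h_dec.unsqueeze(2)).squeeze(2)   # (batch, m)

alpha_t = torch.softmax(e_t, dim=1)            # attention distribution
a_t = torch.bmm(alpha_t.unsqueeze(1), enc_hiddens).squeeze(1)  # (batch, 2h)
print(e_t.shape, alpha_t.shape, a_t.shape)
```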
Now it’s time to get things running! Execute the following to generate the necessary vocab file (you can do
this on your local computer):
(XCS224N) $ sh run.sh vocab
As noted earlier, we recommend that you develop the code on your personal computer. Confirm that you are
running in the proper conda environment and then execute the following command to train the model on your
local machine:
(XCS224N) $ sh run.sh train_cpu
Once you have ensured that your code does not crash (i.e. let it run until iter 10 or iter 20), power on your
VM from the Azure Web Portal. Then read the Practical Guide for Using the VM section of the XCS224N
Azure Guide for instructions on how to upload your code to the VM. Next, turn to the Managing Processes
on a VM section of the Practical Guide and follow the instructions to create a new tmux session. Concretely,
run the following command to create tmux session called nmt.
(XCS224N_CUDA) $ tmux new -s nmt
Once your VM is configured and you are in a tmux session, activate the CUDA conda environment and start training.
Note that this is a different conda env, XCS224N_CUDA, based on environment_cuda.yml; details can be found in the
XCS224N Azure Guide.
$ conda activate XCS224N_CUDA
(XCS224N_CUDA) $ sh run.sh train_gpu
Once you know your code is running properly, you can detach from the session and close your ssh connection to
the server. To detach from the session, run:

(XCS224N_CUDA) $ tmux detach
You can return to your training model by ssh-ing back into the server and attaching to the tmux session by
running:
(XCS224N_CUDA) $ tmux a -t nmt
(g) [3 points (Coding)] Once your model is done training (this should take about 30 minutes to 1 hour
on the VM), execute the following command to test the model:
(XCS224N_CUDA) $ sh run.sh test_gpu
After running this command, it should generate the file src/submission/test_outputs.txt, which is needed for submission.
Deliverables
For this assignment, please submit all files within the src/submission subdirectory. This includes:
• src/submission/__init__.py
• src/submission/model_embeddings.py
• src/submission/nmt_model.py
• src/submission/utils.py
• src/submission/test_outputs.txt
2 Quiz
[11 points (Online)] The remainder of this homework is a series of multiple choice questions related to the neural
machine translation model. Please input your answers into the A4 Online Assessment on Gradescope.
This handout includes space for every question that requires a written response. Please feel free to use it to handwrite
your solutions (legibly, please). If you choose to typeset your solutions, the README.md for this assignment includes
instructions to regenerate this handout with your typeset LaTeX solutions.