

XCS224N Assignment 4 Neural Machine Translation with RNNs
Due Sunday, June 23 at 11:59pm PT.

Guidelines

1. If you have a question about this homework, we encourage you to post your question on our Slack channel, at
http://xcs224n-scpd.slack.com/
2. Familiarize yourself with the collaboration and honor code policy before starting work.
3. For the coding problems, you must use the packages specified in the provided environment description. Since
the autograder uses this environment, we will not be able to grade any submissions that import unexpected
libraries.

Submission Instructions
Coding Submission: Some questions in this assignment require a coding response. For these questions, you should
submit all files indicated in the question to the online student portal. For further details, see Writing Code
and Running the Autograder below.

Honor code
We strongly encourage students to form study groups. Students may discuss and work on homework problems
in groups. However, each student must write down the solutions independently, without referring to written
notes from the joint session; in other words, each student must understand the solution well enough to
reconstruct it on their own. In addition, each student should list on the problem set the people with whom
they collaborated. Further, because we occasionally reuse problem set questions from previous years, we
expect students not to copy, refer to, or look at solutions when preparing their answers. It is an honor code
violation to intentionally refer to a previous year's solutions. More information regarding the Stanford honor code
can be found at https://communitystandards.stanford.edu/policies-and-guidance/honor-code.

Writing Code and Running the Autograder


All your code should be entered into the src/submission/ directory. When editing files in src/submission/, please only
make changes between the lines containing ### START_CODE_HERE ### and ### END_CODE_HERE ###. Do not make changes
to files outside the src/submission/ directory.
The unit tests in src/grader.py (the autograder) will be used to verify a correct submission. Run the autograder
locally using the following terminal command within the src/ subdirectory:
$ python grader.py

There are two types of unit tests used by the autograder:


• basic: These tests are provided to make sure that your inputs and outputs are on the right track, and that
the hidden evaluation tests will be able to execute.
• hidden: These unit tests are the evaluated elements of the assignment, and run your code with more complex
inputs and corner cases. Just because your code passes the basic local tests does not necessarily mean that
it will pass all of the hidden tests. These evaluative hidden tests are run when you submit your code to
the Gradescope autograder via the online student portal, and will provide feedback on how many points you
have earned.
For debugging purposes, you can run a single unit test locally. For example, you can run the test case 3a-0-basic
using the following terminal command within the src/ subdirectory:
$ python grader.py 3a-0-basic

Before beginning this course, please walk through the Anaconda Setup for XCS Courses to familiarize yourself with
the coding environment. Use the env defined in src/environment.yml to run your code. This is the same environment
used by the online autograder.

Test Cases
The autograder is a thin wrapper over the Python unittest framework. It can be run either locally (on your computer)
or remotely (on SCPD servers). The following description demonstrates what test results will look like for both
local and remote execution. For the sake of example, we will consider two generic tests: 1a-0-basic and 1a-1-hidden.
Local Execution - Hidden Tests
All hidden tests rely on files that are not provided to students. Therefore, the tests can only be run remotely. When
a hidden test like 1a-1-hidden is executed locally, it will produce the following result:

Local Execution - Basic Tests


When a basic test like 1a-0-basic passes locally, the autograder will indicate success:

When a basic test like 1a-0-basic fails locally, the error is printed to the terminal, along with a stack trace indicating
where the error occurred:

Remote Execution
Basic and hidden tests are treated the same by the remote autograder. Here are screenshots of failed basic and
hidden tests. Notice that the same information (error and stack trace) is provided as in the local autograder, now
for both basic and hidden tests.

Finally, here is what it looks like when basic and hidden tests pass in the remote autograder.

1 Neural Machine Translation with RNNs


We highly recommend reading Zhang et al. (2020) to better understand the Cherokee-to-English translation task,
which served as inspiration for this assignment.

In Machine Translation, our goal is to convert a sentence from the source language (e.g. Cherokee) to the target
language (e.g. English). In this assignment, we will implement a sequence-to-sequence (Seq2Seq) network with
attention to build a Neural Machine Translation (NMT) system. In this section, we describe the training procedure
for the proposed NMT system, which uses a Bidirectional LSTM Encoder and a Unidirectional LSTM Decoder.

Figure 1: Seq2Seq Model with Multiplicative Attention, shown on the third step of the
decoder. Note that for readability, we do not picture the concatenation of the previous
combined-output with the decoder input.

Model description (training procedure)


Given a sentence in the source language, we look up the word embeddings from an embeddings matrix, yielding
x_1, \ldots, x_m (x_i \in \mathbb{R}^{e \times 1}), where m is the length of the source sentence and e is the embedding size. We feed these
embeddings to the bidirectional Encoder, yielding hidden states and cell states for both the forwards (→) and
backwards (←) LSTMs. The forwards and backwards versions are concatenated to give hidden states h_i^{enc} and cell
states c_i^{enc}:

h_i^{enc} = [\overrightarrow{h_i^{enc}} ; \overleftarrow{h_i^{enc}}] \quad \text{where } h_i^{enc} \in \mathbb{R}^{2h \times 1},\ \overrightarrow{h_i^{enc}}, \overleftarrow{h_i^{enc}} \in \mathbb{R}^{h \times 1},\ 1 \le i \le m \quad (1)

c_i^{enc} = [\overrightarrow{c_i^{enc}} ; \overleftarrow{c_i^{enc}}] \quad \text{where } c_i^{enc} \in \mathbb{R}^{2h \times 1},\ \overrightarrow{c_i^{enc}}, \overleftarrow{c_i^{enc}} \in \mathbb{R}^{h \times 1},\ 1 \le i \le m \quad (2)
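As a shape sanity check, the concatenation in equations (1)-(2) can be sketched with plain numpy (toy dimensions; the actual assignment works with PyTorch tensors, and all names here are illustrative placeholders, not the assignment's API):

```python
import numpy as np

h, m = 4, 3  # toy hidden size and source sentence length
rng = np.random.default_rng(0)

# Per-position hidden states from the forward and backward LSTMs (each h x 1).
h_fwd = [rng.standard_normal((h, 1)) for _ in range(m)]
h_bwd = [rng.standard_normal((h, 1)) for _ in range(m)]

# Equation (1): concatenate the two directions at each source position i.
h_enc = [np.concatenate([h_fwd[i], h_bwd[i]], axis=0) for i in range(m)]

assert all(x.shape == (2 * h, 1) for x in h_enc)  # each h_i^enc is 2h x 1
```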

We then initialize the Decoder's first hidden state h_0^{dec} and cell state c_0^{dec} with a linear projection of the Encoder's
final hidden state and final cell state.¹

h_0^{dec} = W_h [\overrightarrow{h_m^{enc}} ; \overleftarrow{h_1^{enc}}] \quad \text{where } h_0^{dec} \in \mathbb{R}^{h \times 1},\ W_h \in \mathbb{R}^{h \times 2h} \quad (3)

c_0^{dec} = W_c [\overrightarrow{c_m^{enc}} ; \overleftarrow{c_1^{enc}}] \quad \text{where } c_0^{dec} \in \mathbb{R}^{h \times 1},\ W_c \in \mathbb{R}^{h \times 2h} \quad (4)
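The projection in equation (3) is a single matrix multiply; here is a minimal numpy sketch with toy dimensions (W_h and both hidden states are random placeholders standing in for learned values):

```python
import numpy as np

h = 4  # toy hidden size
rng = np.random.default_rng(1)

W_h = rng.standard_normal((h, 2 * h))   # projection matrix, h x 2h
h_fwd_m = rng.standard_normal((h, 1))   # forward encoder state at position m
h_bwd_1 = rng.standard_normal((h, 1))   # backward encoder state at position 1

# Equation (3): project the concatenated "final" encoder state down to h x 1.
h_dec_0 = W_h @ np.concatenate([h_fwd_m, h_bwd_1], axis=0)
assert h_dec_0.shape == (h, 1)
```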

With the Decoder initialized, we must now feed it a matching sentence in the target language. On the t-th step,
we look up the embedding for the t-th word, y_t \in \mathbb{R}^{e \times 1}. We then concatenate y_t with the combined-output vector
o_{t-1} \in \mathbb{R}^{h \times 1} from the previous timestep (we will explain what this is later down this page!) to produce \bar{y}_t \in \mathbb{R}^{(e+h) \times 1}.
Note that for the first target word (i.e. the start token) o_0 is a zero-vector. We then feed \bar{y}_t as input to the Decoder
LSTM.

h_t^{dec}, c_t^{dec} = \text{Decoder}(\bar{y}_t, h_{t-1}^{dec}, c_{t-1}^{dec}) \quad \text{where } h_t^{dec} \in \mathbb{R}^{h \times 1},\ c_t^{dec} \in \mathbb{R}^{h \times 1} \quad (5)

We then use h_t^{dec} to compute multiplicative attention over h_1^{enc}, \ldots, h_m^{enc}:

e_{t,i} = (h_t^{dec})^T W_{attProj} h_i^{enc} \quad \text{where } e_t \in \mathbb{R}^{m \times 1},\ W_{attProj} \in \mathbb{R}^{h \times 2h},\ 1 \le i \le m \quad (7)

\alpha_t = \text{Softmax}(e_t) \quad \text{where } \alpha_t \in \mathbb{R}^{m \times 1} \quad (8)

a_t = \sum_{i=1}^{m} \alpha_{t,i} h_i^{enc} \quad \text{where } a_t \in \mathbb{R}^{2h \times 1} \quad (9)
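A numpy sketch of equations (7)-(9), stacking the encoder states as columns of a single matrix (toy dimensions; the names are illustrative, not the assignment's API, and the weights are random placeholders):

```python
import numpy as np

h, m = 4, 3  # toy hidden size and source length
rng = np.random.default_rng(2)

h_dec_t = rng.standard_normal((h, 1))        # decoder state, h x 1
H_enc = rng.standard_normal((2 * h, m))      # encoder states h_i^enc as columns
W_attProj = rng.standard_normal((h, 2 * h))

# Equation (7): e_{t,i} = (h_dec)^T W h_i for all i at once -> m x 1.
e_t = H_enc.T @ W_attProj.T @ h_dec_t
# Equation (8): softmax over source positions (stabilized by subtracting max).
alpha_t = np.exp(e_t - e_t.max()) / np.exp(e_t - e_t.max()).sum()
# Equation (9): attention output a_t is a weighted sum of encoder states, 2h x 1.
a_t = H_enc @ alpha_t

assert e_t.shape == (m, 1) and a_t.shape == (2 * h, 1)
assert abs(alpha_t.sum() - 1.0) < 1e-9  # alpha_t is a distribution
```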
i

We now concatenate the attention output a_t with the decoder hidden state h_t^{dec} and pass this through a linear layer,
Tanh, and Dropout to attain the combined-output vector o_t.

u_t = [h_t^{dec} ; a_t] \quad \text{where } u_t \in \mathbb{R}^{3h \times 1} \quad (10)

v_t = W_u u_t \quad \text{where } v_t \in \mathbb{R}^{h \times 1},\ W_u \in \mathbb{R}^{h \times 3h} \quad (11)

o_t = \text{Dropout}(\text{Tanh}(v_t)) \quad \text{where } o_t \in \mathbb{R}^{h \times 1} \quad (12)
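Equations (10)-(12) in the same toy-numpy style (dropout is omitted here, as it would be at evaluation time; W_u and the inputs are random placeholders):

```python
import numpy as np

h = 4  # toy hidden size
rng = np.random.default_rng(3)

h_dec_t = rng.standard_normal((h, 1))      # decoder hidden state, h x 1
a_t = rng.standard_normal((2 * h, 1))      # attention output, 2h x 1
W_u = rng.standard_normal((h, 3 * h))      # linear layer, h x 3h

u_t = np.concatenate([h_dec_t, a_t], axis=0)  # equation (10): 3h x 1
v_t = W_u @ u_t                               # equation (11): h x 1
o_t = np.tanh(v_t)                            # equation (12), without dropout

assert u_t.shape == (3 * h, 1) and o_t.shape == (h, 1)
```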

Then, we produce a probability distribution P_t over target words at the t-th timestep:

P_t = \text{Softmax}(W_{vocab}\, o_t) \quad \text{where } P_t \in \mathbb{R}^{V_t \times 1},\ W_{vocab} \in \mathbb{R}^{V_t \times h} \quad (13)

Here, V_t is the size of the target vocabulary. Finally, to train the network we compute the softmax cross-entropy
loss between P_t and g_t, where g_t is the one-hot vector of the target word at timestep t:

J_t(\theta) = \text{CE}(P_t, g_t) \quad (14)

Here, θ represents all the parameters of the model and J_t(θ) is the loss on step t of the decoder. Now that we have
described the model, let's try implementing it for Cherokee-to-English translation!
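As a final sanity check, equations (13)-(14) reduce to a softmax over vocabulary logits followed by a negative log-likelihood; a toy numpy sketch (the gold-word index is a made-up example, and all weights are random placeholders):

```python
import numpy as np

h, V = 4, 6  # toy hidden size and vocabulary size
rng = np.random.default_rng(4)

W_vocab = rng.standard_normal((V, h))
o_t = rng.standard_normal((h, 1))

# Equation (13): distribution over the target vocabulary, V x 1.
logits = W_vocab @ o_t
P_t = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()

# Equation (14): with one-hot g_t, cross-entropy is -log of the gold word's probability.
target = 2  # hypothetical index of the gold target word at step t
J_t = -np.log(P_t[target, 0])

assert P_t.shape == (V, 1) and J_t > 0
```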

Setting up your Virtual Machine


Follow the instructions in the XCS224N Azure Guide in order to create your VM instance. Though you will need
the GPU to train your model, we strongly advise that you first develop the code locally and ensure that it runs,
¹ If it's not obvious, think about why we regard [\overleftarrow{h_1^{enc}}, \overrightarrow{h_m^{enc}}] as the 'final hidden state' of the Encoder.

before attempting to train it on your VM. GPU time is expensive and limited. It takes approximately 30 minutes
to 1 hour to train the NMT system. We don’t want you to accidentally use all your GPU time for the assignment,
debugging your model rather than training and evaluating it. Finally, make sure that your VM is turned off
whenever you are not using it.
To run the model code on your workstation's CPU, or if you have an Apple silicon GPU, run the following
commands to create the proper virtual environment (you did this at the beginning of the course on your
local computer):
$ conda update -n base conda
$ conda env create --file environment.yml

If you have a local CUDA-based (Nvidia) GPU, or are working on your VM, then instead of using the XCS224N conda
environment, create a new environment that supports CUDA, XCS224N_CUDA, with the following commands:
$ conda env create --file environment_cuda.yml
$ conda activate XCS224N_CUDA

For local development and testing, feel free to continue using the same XCS224N environment you've used
for all the assignments so far.

Implementation Assignment
(a) [2 points (Coding)] In order to apply tensor operations, we must ensure that the sentences in a given batch
are of the same length. Thus, we must identify the longest sentence in a batch and pad the others to the
same length. Implement the pad_sents function in submission/utils.py, which will produce these padded
sentences.
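A minimal sketch of the padding logic described in part (a) (this is an illustration of the idea only; check the starter code for the exact signature and docstring the autograder expects):

```python
def pad_sents(sents, pad_token):
    """Pad each sentence (a list of tokens) to the length of the longest sentence
    in the batch, appending pad_token as needed. Order is preserved."""
    max_len = max(len(s) for s in sents)
    return [s + [pad_token] * (max_len - len(s)) for s in sents]

# Example: the shorter sentence is padded up to length 2.
padded = pad_sents([["a", "b"], ["c"]], "<pad>")
assert padded == [["a", "b"], ["c", "<pad>"]]
```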
(b) [3 points (Coding)] Implement the __init__ function in submission/model_embeddings.py to initialize
the necessary source and target embeddings.
(c) [4 points (Coding)] Implement the __init__ function in submission/nmt_model.py to initialize the necessary
layers (LSTM, projection, and dropout) for the NMT system.
(d) [8 points (Coding)] Implement the encode function in submission/nmt_model.py. This function converts
the padded source sentences into the tensor X, generates h_1^{enc}, \ldots, h_m^{enc}, and computes the initial
hidden state h_0^{dec} and initial cell state c_0^{dec} for the Decoder.
(e) [8 points (Coding)] Implement the decode function in submission/nmt_model.py. This function constructs
\bar{y} and runs the step function over every timestep of the input.
(f) [10 points (Coding)] Implement the step function in submission/nmt_model.py. This function applies
the Decoder's LSTM cell for a single timestep, computing the encoding of the target word h_t^{dec}, the attention
scores e_t, the attention distribution \alpha_t, the attention output a_t, and finally the combined output o_t.
Now it’s time to get things running! Execute the following to generate the necessary vocab file (you can do
this on your local computer):
(XCS224N) $ sh run.sh vocab

As noted earlier, we recommend that you develop the code on your personal computer. Confirm that you are
running in the proper conda environment and then execute the following command to train the model on your
local machine:
(XCS224N) $ sh run.sh train_cpu

Once you have ensured that your code does not crash (i.e. let it run until iter 10 or iter 20), power on your
VM from the Azure Web Portal. Then read the Practical Guide for Using the VM section of the XCS224N
Azure Guide for instructions on how to upload your code to the VM. Next, turn to the Managing Processes
on a VM section of the Practical Guide and follow the instructions to create a new tmux session. Concretely,
run the following command to create tmux session called nmt.
(XCS224N_CUDA) $ tmux new -s nmt

Once your VM is configured and you are in a tmux session, activate the CUDA environment and start training.
Note that this is a different conda env, XCS224N_CUDA, based on environment_cuda.yml; details can be found in the
XCS224N Azure Guide.
$ conda activate XCS224N_CUDA
(XCS224N_CUDA) $ sh run.sh train_gpu

Once you know your code is running properly, you can detach from the session and close your ssh connection to
the server. To detach from the session, run:
(XCS224N_CUDA) $ tmux detach

You can return to your training model by ssh-ing back into the server and attaching to the tmux session by
running:
(XCS224N_CUDA) $ tmux a -t nmt

(g) [3 points (Coding)] Once your model is done training (this should take about 30 minutes to 1 hour
on the VM), execute the following command to test the model:
(XCS224N_CUDA) $ sh run.sh test_gpu

After running this command, it should generate the file src/submission/test_outputs.txt needed for submission.

Deliverables
For this assignment, please submit all files within the src/submission subdirectory. This includes:
• src/submission/__init__.py
• src/submission/model_embeddings.py
• src/submission/nmt_model.py
• src/submission/utils.py
• src/submission/test_outputs.txt

2 Quiz
[11 points (Online)] The remainder of this homework is a series of multiple choice questions related to the neural
machine translation model. Please input your answers into the A4 Online Assessment on Gradescope.

This handout includes space for every question that requires a written response. Please feel free to use it to handwrite
your solutions (legibly, please). If you choose to typeset your solutions, the README.md for this assignment includes
instructions to regenerate this handout with your typeset LaTeX solutions.

THERE IS NO WRITTEN SUBMISSION FOR THIS ASSIGNMENT.

YOU ARE NOT REQUIRED TO SUBMIT ANYTHING.
