Lecture #6: Assembler: From Nand To Tetris
Assembler
From Nand to Tetris
www.nand2tetris.org
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 2: Boolean Arithmetic slide 1
Where we are at:
[Figure: the Nand to Tetris abstraction hierarchy. Each level exposes an abstract interface to the level above it.
Software hierarchy: Human Thought -> Abstract design (Chapters 9, 12) -> H.L. Language & Operating Sys. -> Compiler (Chapters 10 - 11) -> Virtual Machine -> VM Translator (Chapters 7 - 8) -> Assembly Language -> Assembler (Chapter 6) -> Machine Language
Hardware hierarchy: Machine Language -> Computer Architecture (Chapters 4 - 5) -> Hardware Platform -> Gate Logic (Chapters 1 - 3) -> Chips & Logic Gates -> Electrical Engineering -> Physics]
Why care about assemblers?
Because …
– Written in Java
Assembly example (for now, ignore all details!)

The source code is assembled into the target code, which the hardware then executes.

Source code (example)                      Target code
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1]
    @i                                     0000000000010000
    M=1      // i = 1                      1110111111001000
    @sum                                   0000000000010001
    M=0      // sum = 0                    1110101010001000
(LOOP)
    @i       // if i>RAM[0] goto WRITE     0000000000010000
    D=M                                    1111110000010000
    @R0                                    0000000000000000
    D=D-M                                  1111010011010000
    @WRITE                                 0000000000010010
    D;JGT                                  1110001100000001
    ...      // Etc.                       0000000000010000
                                           1111110000010000
                                           0000000000010001
                                           ...
The assembler's view of an assembly program

Assembly program = a stream of text lines, each being one of the following:
• A-instruction
• C-instruction
• Symbol declaration: (SYMBOL)
• Comment or white space: // comment

Helper Methods:
• cleanLine(raw : String)
• parseCommandType(clean : String)
• Can you write a parse method for each instruction?

The challenge: Translate the program into a sequence of 16-bit instructions that can be executed by the target hardware platform.

Assembly program:
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
    @i
    M=1      // i = 1
    @sum
    M=0      // sum = 0
(LOOP)
    @i       // if i>RAM[0] goto WRITE
    D=M
    @0
    D=D-M
    @WRITE
    D;JGT
    @i       // sum += i
    D=M
    @sum
    M=D+M
    @i       // i++
    M=M+1
    @LOOP    // goto LOOP
    0;JMP
(WRITE)
    @sum
    D=M
    @1
    M=D      // RAM[1] = the sum
(END)
    @END
    0;JMP
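The two helper methods named above could be sketched roughly as follows. This is only one possible reading: the slide gives the method names and parameter types, while the class name HackLines, the CommandType enum, and the return types are assumptions made for illustration.

```java
class HackLines {
    // Hypothetical categories for the four kinds of line the assembler sees.
    enum CommandType { A_INSTRUCTION, C_INSTRUCTION, LABEL, NONE }

    // Strips a trailing "// comment" and all white space from a raw source line.
    static String cleanLine(String raw) {
        int comment = raw.indexOf("//");
        String code = (comment >= 0) ? raw.substring(0, comment) : raw;
        return code.replaceAll("\\s+", "");
    }

    // Classifies an already-cleaned line by its shape.
    static CommandType parseCommandType(String clean) {
        if (clean.isEmpty()) return CommandType.NONE;                  // comment or white space
        if (clean.startsWith("@")) return CommandType.A_INSTRUCTION;  // @value
        if (clean.startsWith("(") && clean.endsWith(")")) return CommandType.LABEL; // (SYMBOL)
        return CommandType.C_INSTRUCTION;                             // dest=comp;jump
    }
}
```

For example, cleanLine("    M=1      // i = 1") yields "M=1", which parseCommandType classifies as a C-instruction.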
Translating / assembling A-instructions

Symbolic: @value   // Where value is either a non-negative decimal number
                   // or a symbol referring to such a number.

Binary: 0 v v v v v v v v v v v v v v v   (each v = 0 or 1; the 15 v bits hold value)

Translation to binary:
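For the numeric case, the translation amounts to writing value in binary and padding it to 15 bits behind a leading 0. A minimal sketch, assuming the symbol (if any) has already been resolved to a number; the class and method names are made up for illustration:

```java
class AInstruction {
    // Translates a non-negative decimal value (0..32767) into a 16-bit A-instruction string.
    static String translate(int value) {
        if (value < 0 || value > 32767) {
            throw new IllegalArgumentException("value does not fit in 15 bits: " + value);
        }
        String bits = Integer.toBinaryString(value);
        StringBuilder sb = new StringBuilder("0");   // leading 0 marks an A-instruction
        for (int i = bits.length(); i < 15; i++) {
            sb.append('0');                          // left-pad the value to 15 bits
        }
        return sb.append(bits).toString();
    }
}
```

With this sketch, translate(16) gives 0000000000010000, the first line of target code in the earlier example (@i, with i at RAM address 16).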
Translating / assembling C-instructions
Symbolic: dest=comp;jump   // Either the dest or jump fields may be empty.
                           // If dest is empty, the "=" is omitted;
                           // If jump is empty, the ";" is omitted.

Binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3

Translation to binary:
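Before any bits can be produced, the symbolic form has to be broken into its three fields. The sketch below shows one way to do that using the "=" and ";" rules stated above; the class name CFields and the choice to represent an empty field as "" are assumptions, not part of the slides.

```java
class CFields {
    // Splits a cleaned C-instruction into { dest, comp, jump }.
    // A missing dest or jump field is returned as the empty string.
    static String[] split(String clean) {
        String dest = "";
        String jump = "";
        String comp = clean;
        int eq = comp.indexOf('=');
        if (eq >= 0) {                        // "dest=" is present
            dest = comp.substring(0, eq);
            comp = comp.substring(eq + 1);
        }
        int semi = comp.indexOf(';');
        if (semi >= 0) {                      // ";jump" is present
            jump = comp.substring(semi + 1);
            comp = comp.substring(0, semi);
        }
        return new String[] { dest, comp, jump };
    }
}
```

For instance, "MD=M-1" splits into dest "MD", comp "M-1" and an empty jump, while "D;JGT" splits into an empty dest, comp "D" and jump "JGT".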
Translating / assembling C-Instructions
• Approach #1:
– Parse C-instruction into dest, comp, jump
• Already know the structure for this! Great!
– For each part, look for M/D/A/etc. symbols
• Add appropriate bit/flag for each symbol in appropriate place
• Return String of bits
– Doable, but complicated!
• Approach #2:
– Realize that there is a small set of possibilities (see previous slide)
– Create a lookup table of all possibilities
• Tedious, but the investment pays off!
• Lookups are EASY
• If string present in lookup table, return matching string of bits
• Otherwise, return null (great for error checking)
• What's a lookup table?
– HashMap
HashMap
• Think dictionaries
– "parsec" : "unit of length for astronomically large distances"
– Key : Value
• So a dictionary is just a set of Key/Value pairs
• Using HashMap:
    import java.util.HashMap;
• Create a HashMap:
    // String key (mnemonic), String value (bits)
    HashMap<String, String> compCodes =
        new HashMap<String, String>();
• Add key/value pair to HashMap:
    compCodes.put("A+1", "0110111");
    // a c1 c2 c3 c4 c5 c6 values for A+1. Why a here?
• Check if key exists in HashMap:
    compCodes.containsKey("A+1"); // returns boolean
• Lookup value for key in HashMap:
    compCodes.get("A+1");
    // returns null if key not present, else value
• Remove key/value pair in HashMap (useful later):
    compCodes.remove("A+1");
    // returns null if key not present, else value
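Putting the lookup-table approach together might look like the sketch below. The class name CodeTables and the method translate are hypothetical. The dest and jump tables are complete, but the comp table lists only a handful of mnemonics for brevity, so a real assembler would need the remaining entries from the Hack specification.

```java
import java.util.HashMap;

class CodeTables {
    static final HashMap<String, String> COMP = new HashMap<>();
    static final HashMap<String, String> DEST = new HashMap<>();
    static final HashMap<String, String> JUMP = new HashMap<>();

    static {
        // Partial comp table (a c1 c2 c3 c4 c5 c6); the full table has more entries.
        COMP.put("0", "0101010");   COMP.put("1", "0111111");   COMP.put("-1", "0111010");
        COMP.put("D", "0001100");   COMP.put("A", "0110000");   COMP.put("M", "1110000");
        COMP.put("A+1", "0110111"); COMP.put("M+1", "1110111"); COMP.put("M-1", "1110010");
        COMP.put("D+A", "0000010"); COMP.put("D+M", "1000010"); COMP.put("D-M", "1010011");

        // dest table (d1 d2 d3); the empty string stands for "no dest".
        DEST.put("", "000");  DEST.put("M", "001");  DEST.put("D", "010");  DEST.put("MD", "011");
        DEST.put("A", "100"); DEST.put("AM", "101"); DEST.put("AD", "110"); DEST.put("AMD", "111");

        // jump table (j1 j2 j3); the empty string stands for "no jump".
        JUMP.put("", "000");    JUMP.put("JGT", "001"); JUMP.put("JEQ", "010"); JUMP.put("JGE", "011");
        JUMP.put("JLT", "100"); JUMP.put("JNE", "101"); JUMP.put("JLE", "110"); JUMP.put("JMP", "111");
    }

    // Returns the 16-bit C-instruction, or null if any field is not in its table.
    static String translate(String dest, String comp, String jump) {
        String c = COMP.get(comp);
        String d = DEST.get(dest);
        String j = JUMP.get(jump);
        if (c == null || d == null || j == null) {
            return null;   // unknown mnemonic: the caller can report an error
        }
        return "111" + c + d + j;
    }
}
```

Here translate("M", "1", "") returns 1110111111001000, matching the target code shown earlier for M=1, and an unknown mnemonic produces null, which is what makes the lookup convenient for error checking.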
The overall assembly logic

For each (real) command:
• Parse the command, i.e. break it into its underlying fields

Assembly program:
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
    @i
    M=1      // i = 1
Handling symbols (aka symbol resolution)

Assembly programs typically have many symbols:
• Labels that mark destinations of goto commands
• Labels that mark special memory locations
• Variables

These symbols fall into two categories:
• User-defined symbols (created by programmers)
• Pre-defined symbols (used by the Hack platform).

Typical symbolic Hack assembly code:
    @R0
    D=M
    @END
    D;JLE
    @counter
    M=D
    @SCREEN
    D=A
    @x
    M=D
(LOOP)
    @x
    A=M
    M=-1
    @x
    D=M
    @32
    D=D+A
    @x
    M=D
    @counter
    MD=M-1
    @LOOP
    D;JGT
(END)
    @END
    0;JMP
Handling symbols: user-defined symbols

Label symbols: Used to label destinations of goto commands. Declared by the pseudo-command (XXX). This directive defines the symbol XXX to refer to the instruction memory location holding the next command in the program.

Variable symbols: Any user-defined symbol xxx appearing in an assembly program that is not defined elsewhere using the (xxx) directive is treated as a variable, and is automatically assigned a unique RAM address, starting at RAM address 16 (why start at 16? Later...)

By convention, Hack programmers use lower-case and upper-case to represent variable and label names, respectively.

See reading/appendix for valid symbol chars.

Q: Who does all the "automatic" assignments of symbols to RAM addresses?

A: As part of the program translation process, the assembler resolves all the symbols into RAM addresses.
Handling symbols: pre-defined symbols

Virtual registers: The symbols R0,…, R15 are automatically predefined to refer to RAM addresses 0,…,15.

I/O pointers: The symbols SCREEN and KBD are automatically predefined to refer to RAM addresses 16384 and 24576, respectively (base addresses of the screen and keyboard memory maps).

VM control pointers: The symbols SP, LCL, ARG, THIS, and THAT (which don't appear in the typical symbolic Hack assembly code shown earlier) are automatically predefined to refer to RAM addresses 0 to 4, respectively.
(The VM control pointers, which overlap R0,…, R4, will come into play in the virtual machine implementation, covered in the next lecture.)

Q: Who does all the "automatic" assignments of symbols to RAM addresses?

A: As part of the program translation process, the assembler resolves all the symbols into RAM addresses.
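The pre-defined symbols listed above could be loaded into a HashMap along these lines. This is a sketch only: the class name PredefinedSymbols and the method create are invented for illustration, and Integer values are just one reasonable choice for storing addresses.

```java
import java.util.HashMap;

class PredefinedSymbols {
    // Builds a fresh table holding every pre-defined Hack symbol and its RAM address.
    static HashMap<String, Integer> create() {
        HashMap<String, Integer> table = new HashMap<>();
        for (int i = 0; i <= 15; i++) {
            table.put("R" + i, i);        // virtual registers R0..R15 -> 0..15
        }
        table.put("SCREEN", 16384);       // base address of the screen memory map
        table.put("KBD", 24576);          // base address of the keyboard memory map
        table.put("SP", 0);               // VM control pointers overlap R0..R4
        table.put("LCL", 1);
        table.put("ARG", 2);
        table.put("THIS", 3);
        table.put("THAT", 4);
        return table;
    }
}
```

Because the VM control pointers are separate keys that share addresses with R0 to R4, the table ends up with 23 entries that map onto 18 distinct addresses.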
Handling symbols: symbol table

Source code (example)
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1]
    @i
    M=1      // i = 1
    @sum
    M=0      // sum = 0
(LOOP)
    @i       // if i>RAM[0] goto WRITE
    D=M
    @R0
    D=D-M
    @WRITE
    D;JGT
    @i       // sum += i
    D=M
    @sum
    M=D+M
    @i       // i++
    M=M+1
    @LOOP    // goto LOOP
    0;JMP
(WRITE)
    @sum
    D=M
    @R1
    M=D      // RAM[1] = the sum
(END)
    @END
    0;JMP

Symbol table
    R0        0
    R1        1
    R2        2
    ...       ...
    R15       15
    SCREEN    16384
    KBD       24576
    SP        0
    LCL       1
    ARG       2
    THIS      3
    THAT      4        (every program has the entries above!)
    LOOP      4
    WRITE     18
    END       22
    i         16
    sum       17

This symbol table is generated by the assembler, and used to translate the symbolic code into binary code.
Handling symbols: constructing the symbol table

Initialization: create an empty symbol table and populate it with all the pre-defined symbols.

First pass: go through the entire source code, and add all the user-defined label symbols to the symbol table (without generating any code).

Second pass: go again through the source code, and use the symbol table to translate all the commands. In the process, handle all the user-defined variable symbols.
The assembly process (detailed algorithm)
• Initialization: create the symbol table and initialize it with the pre-defined symbols
• First pass: march through the source code without generating any code.
For each label declaration (LABEL) that appears in the source code,
add the pair <LABEL , n > to the symbol table
• Second pass: march again (advance) through each line of source code, and process each:
– If the line is a C-instruction, simple
– If the line is @xxx where xxx is a number, simple
– If the line is @xxx and xxx is a symbol, look it up in the symbol table and
proceed as follows:
• If the symbol is found, replace it with its numeric value and complete
the command's translation
• If the symbol is not found, then it must represent a new variable:
add the pair <xxx , n > to the symbol table, where n is the next available RAM address
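The two passes can be sketched as below. To keep the example short it only resolves symbols: each @symbol becomes @number, and C-instructions pass through untouched, so no binary is generated. The class name SymbolResolver and its methods are hypothetical, and the caller is assumed to supply a table already holding the pre-defined symbols.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

class SymbolResolver {
    // Removes a trailing "// comment" and all white space.
    static String clean(String raw) {
        int comment = raw.indexOf("//");
        String code = (comment >= 0) ? raw.substring(0, comment) : raw;
        return code.replaceAll("\\s+", "");
    }

    // Returns the program with every symbolic @xxx replaced by @number.
    // The table is updated in place with the labels and variables found.
    static List<String> resolve(List<String> source, HashMap<String, Integer> table) {
        // First pass: record each (LABEL) with the address of the next instruction.
        int romAddress = 0;
        for (String raw : source) {
            String line = clean(raw);
            if (line.isEmpty()) continue;
            if (line.startsWith("(") && line.endsWith(")")) {
                table.put(line.substring(1, line.length() - 1), romAddress);
            } else {
                romAddress++;              // only real instructions occupy an address
            }
        }

        // Second pass: resolve symbols, allocating variables from RAM address 16.
        int nextVariable = 16;
        List<String> out = new ArrayList<>();
        for (String raw : source) {
            String line = clean(raw);
            if (line.isEmpty() || line.startsWith("(")) continue;
            if (!line.startsWith("@")) {   // C-instruction: nothing to resolve
                out.add(line);
                continue;
            }
            String xxx = line.substring(1);
            if (!xxx.isEmpty() && Character.isDigit(xxx.charAt(0))) {
                out.add(line);             // @number: already numeric
                continue;
            }
            Integer address = table.get(xxx);
            if (address == null) {         // not found: it must be a new variable
                address = nextVariable++;
                table.put(xxx, address);
            }
            out.add("@" + address);
        }
        return out;
    }
}
```

Run on the sum program from the symbol-table slide, this sketch assigns LOOP 4, WRITE 18, END 22, i 16 and sum 17. Labels have to be collected in a separate first pass because @WRITE and @END are used before (WRITE) and (END) are declared; without that pass they would be mistaken for variables.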
Proposed assembler implementation
An assembler program can be written in any high-level language.
We propose a language-independent design, as follows.
Software modules:
• We will call this Assembler, and it also does error handling (exceptions or booleans)!
Parser (a software module in the assembler program)
Parser (a software module in the assembler program) / continued
Code (a software module in the assembler program)
SymbolTable (a software module in the assembler program)
Perspective
• Simple machine language, simple assembler
• Most assemblers are not stand-alone, but rather encapsulated in a translator of a higher order
• C programmers who understand the code generated by a C compiler can improve their code considerably
• C programming (e.g. for real-time systems) may involve re-writing critical segments in assembly, for optimization
• Writing an assembler is excellent practice for writing more challenging translators, e.g. a VM Translator and a compiler, as we will do in the next lectures.