
Lecture #6: Assembler: From Nand To Tetris

This document discusses an assembler and assembly language. It explains that an assembler is the first step up the software hierarchy from machine language and acts as a simple translator. It translates assembly language commands into binary machine instructions. The document provides an example of an assembly program that calculates a sum and the corresponding machine code generated by the assembler. It also shows a screenshot of an emulator running the example program.

Uploaded by Hunter Haggard
Copyright © All Rights Reserved

Lecture #6:

Assembler
From Nand to Tetris
www.nand2tetris.org

Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 2: Boolean Arithmetic slide 1
Where we are at:
[Figure: the layered design of a computer system. Human Thought → H.L. Language & Operating Sys. (Chapters 9, 12) → Compiler (Chapters 10 - 11) → Virtual Machine → VM Translator (Chapters 7 - 8) → Assembly Language → Assembler (Chapter 6) → Machine Language → Computer Architecture (Chapters 4 - 5) → Hardware Platform → Gate Logic (Chapters 1 - 3) → Chips & Logic Gates → Electrical Engineering → Physics. The upper levels form the software hierarchy and the lower levels the hardware hierarchy; adjacent levels are separated by abstract interfaces.]
Why care about assemblers?
Because …

•  Assemblers employ nifty programming tricks
•  Assemblers are the first rung up the software hierarchy ladder
•  An assembler is a translator of a simple language
•  Writing an assembler = low-impact practice for writing compilers.
   –  Written in Java
   –  Will come up with useful methods along the way
Assembly example (for now, ignore all details!)

Source code (example):
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto WRITE
D=M
@R0
D=D-M
@WRITE
D;JGT
... // Etc.

The assembler translates (assembles) the source code into the following target code, which the computer then executes:
0000000000010000
1110111111001000
0000000000010001
1110101010001000
0000000000010000
1111110000010000
0000000000000000
1111010011010000
0000000000010010
1110001100000001
0000000000010000
1111110000010000
0000000000010001
...

The program translation challenge
•  Extract the program's semantics from the source program, using the syntax rules of the source language
•  Re-express the program's semantics in the target language, using the syntax rules of the target language

Assembler = simple translator
•  Translates each assembly command into one or more binary machine instructions
•  Handles symbols (e.g. i, sum, LOOP, …).
Revisiting Hack low-level programming: an example

Assembly program (sum.asm):
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto WRITE
D=M
@0
D=D-M
@WRITE
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(WRITE)
@sum
D=M
@1
M=D // RAM[1] = the sum
(END)
@END
0;JMP

[Screenshot: the CPU emulator after running this program, showing the user-supplied input (RAM[0]) and the program-generated output (RAM[1]).]
The CPU emulator allows loading and executing symbolic Hack code. It resolves all the symbols to memory locations, and executes the code.
The assembler's view of an assembly program

Assembly program = a stream of text lines, each being one of the following:
•  A-instruction
•  C-instruction
•  Symbol declaration: (SYMBOL)
•  Comment or white space: // comment

Assembly program: sum.asm, as listed in the previous example.

Helper Methods:
•  cleanLine(raw : String)
•  parseCommandType(clean : String)
•  Can you write a parse method for each instruction?

The challenge: Translate the program into a sequence of 16-bit instructions that can be executed by the target hardware platform.
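The two helper methods named on this slide could be sketched in Java as follows. This is a minimal sketch, not a reference solution: the class name LineCleaner, the CommandType enum and its constant names, and the return type of parseCommandType are assumptions made for illustration.

```java
// Hypothetical helper class; the slide only names cleanLine and parseCommandType.
final class LineCleaner {

    // Assumed command categories (names are not fixed by the slide).
    enum CommandType { A_COMMAND, C_COMMAND, L_COMMAND, NO_COMMAND }

    /** Drops a trailing "//" comment and every white-space character. */
    static String cleanLine(String raw) {
        int comment = raw.indexOf("//");
        String code = (comment >= 0) ? raw.substring(0, comment) : raw;
        return code.replaceAll("\\s", "");
    }

    /** Classifies a line that has already been cleaned. */
    static CommandType parseCommandType(String clean) {
        if (clean.isEmpty()) {
            return CommandType.NO_COMMAND;   // comment or white space only
        }
        if (clean.startsWith("@")) {
            return CommandType.A_COMMAND;    // @value
        }
        if (clean.startsWith("(") && clean.endsWith(")")) {
            return CommandType.L_COMMAND;    // (SYMBOL) declaration
        }
        return CommandType.C_COMMAND;        // dest=comp;jump
    }

    public static void main(String[] args) {
        String[] lines = {"// Computes 1+...+RAM[0]", "@i", "M=1 // i = 1", "(LOOP)", "   "};
        for (String raw : lines) {
            String clean = cleanLine(raw);
            System.out.println("\"" + clean + "\" -> " + parseCommandType(clean));
        }
    }
}
```

Treating every remaining non-empty line as a C-instruction keeps the classifier trivial; the C-instruction translation step then becomes the place where malformed lines are rejected.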
Translating / assembling A-instructions

Symbolic: @value // Where value is either a non-negative decimal number
                 // or a symbol referring to such a number.

Binary: 0 v v v v v v v v v v v v v v v (the 15 v bits encode value; each v = 0 or 1)

Translation to binary:
•  If value is a non-negative decimal number, simple
•  If value is a symbol, later.

Helper Methods:
•  isValidSymbol(symbol : String) : boolean
•  decimalToBinary(toConvert : int) : String
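For the numeric case, the two helpers might look like this in Java. It is a sketch under stated assumptions: the class name AInstruction and the method translateA are hypothetical, decimalToBinary is assumed to return exactly 15 bits (so the result fits after the leading 0), and isValidSymbol follows the symbol rule from the book's appendix as we understand it (letters, digits, _, ., $ and :, not starting with a digit), so check it against the reading.

```java
// Hypothetical sketch of A-instruction translation for numeric values.
final class AInstruction {

    /** Returns the 15-bit, zero-padded binary form of 0 <= toConvert <= 32767. */
    static String decimalToBinary(int toConvert) {
        if (toConvert < 0 || toConvert > 32767) {
            throw new IllegalArgumentException("A-instruction value out of range: " + toConvert);
        }
        StringBuilder bits = new StringBuilder();
        for (int i = 14; i >= 0; i--) {
            bits.append((toConvert >> i) & 1); // most significant bit first
        }
        return bits.toString();
    }

    /** Assumed rule: letters, digits, '_', '.', '$', ':'; must not start with a digit. */
    static boolean isValidSymbol(String symbol) {
        return symbol.matches("[A-Za-z_.$:][A-Za-z0-9_.$:]*");
    }

    /** Translates "@value" where value is a non-negative decimal number. */
    static String translateA(String clean) {
        int value = Integer.parseInt(clean.substring(1));
        return "0" + decimalToBinary(value); // leading 0 marks an A-instruction
    }

    public static void main(String[] args) {
        System.out.println(translateA("@17"));    // 0000000000010001
        System.out.println(isValidSymbol("sum")); // true
        System.out.println(isValidSymbol("2x"));  // false
    }
}
```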
Translating / assembling C-instructions

Symbolic: dest=comp;jump // Either the dest or jump fields may be empty.
                         // If dest is empty, the "=" is omitted;
                         // If jump is empty, the ";" is omitted.

Translation to binary: simple!

Binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
(comp = a c1 … c6, dest = d1 d2 d3, jump = j1 j2 j3)
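Because "=" and ";" only appear as separators, splitting a C-instruction into its fields is mostly string slicing. A minimal Java sketch, assuming the line has already been cleaned and using the empty string (an assumption of this sketch) to stand for an omitted field; CFields is a hypothetical name:

```java
// Hypothetical holder for the three fields of a C-instruction.
final class CFields {
    final String dest; // "" when "dest=" is omitted
    final String comp;
    final String jump; // "" when ";jump" is omitted

    CFields(String dest, String comp, String jump) {
        this.dest = dest;
        this.comp = comp;
        this.jump = jump;
    }

    /** Splits a cleaned C-instruction of the form dest=comp;jump into its fields. */
    static CFields parse(String clean) {
        String dest = "";
        String rest = clean;
        int eq = rest.indexOf('=');
        if (eq >= 0) {                 // dest present
            dest = rest.substring(0, eq);
            rest = rest.substring(eq + 1);
        }
        String jump = "";
        int semi = rest.indexOf(';');
        if (semi >= 0) {               // jump present
            jump = rest.substring(semi + 1);
            rest = rest.substring(0, semi);
        }
        return new CFields(dest, rest, jump);
    }

    @Override
    public String toString() {
        return "dest=\"" + dest + "\" comp=\"" + comp + "\" jump=\"" + jump + "\"";
    }

    public static void main(String[] args) {
        System.out.println(parse("D=D-M")); // dest="D" comp="D-M" jump=""
        System.out.println(parse("D;JGT")); // dest="" comp="D" jump="JGT"
        System.out.println(parse("0;JMP")); // dest="" comp="0" jump="JMP"
    }
}
```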
Translating / assembling C-Instructions
•  Approach:
   –  Parse C-instruction into dest, comp, jump
      •  Already know the structure for this! Great!
   –  For each part, look for M/D/A/etc. symbols
      •  Add appropriate bit/flag for each symbol in appropriate place
      •  Return String of bits
   –  Doable, but complicated!
•  Approach #2:
   –  Realize that there is a small set of possibilities (see previous slide)
   –  Create a lookup table of all possibilities
      •  Tedious, but the investment pays off!
      •  Lookups are EASY
      •  If string present in look up table, return matching string of bits
      •  Otherwise, return null (great for error checking)
•  What's a look up table?
   –  HashMap
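Approach #2 in Java: one HashMap per field, filled once with every legal mnemonic. The bit patterns are those of the Hack C-instruction specification. The class name matches the Code module proposed later in this lecture, but the method names comp, dest, jump and assembleC, and the use of "" for an omitted field, are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Lookup-table sketch of the Code module (method names are assumptions).
final class Code {
    private static final Map<String, String> COMP = new HashMap<>();
    private static final Map<String, String> DEST = new HashMap<>();
    private static final Map<String, String> JUMP = new HashMap<>();

    static {
        // comp mnemonic -> "a c1 c2 c3 c4 c5 c6"; a=1 selects M instead of A
        String[] comp = {
            "0", "0101010", "1", "0111111", "-1", "0111010", "D", "0001100",
            "A", "0110000", "!D", "0001101", "!A", "0110001", "-D", "0001111",
            "-A", "0110011", "D+1", "0011111", "A+1", "0110111", "D-1", "0001110",
            "A-1", "0110010", "D+A", "0000010", "D-A", "0010011", "A-D", "0000111",
            "D&A", "0000000", "D|A", "0010101", "M", "1110000", "!M", "1110001",
            "-M", "1110011", "M+1", "1110111", "M-1", "1110010", "D+M", "1000010",
            "D-M", "1010011", "M-D", "1000111", "D&M", "1000000", "D|M", "1010101"};
        for (int i = 0; i < comp.length; i += 2) {
            COMP.put(comp[i], comp[i + 1]);
        }
        // dest and jump bits are the 3-bit index of the mnemonic ("" = omitted field)
        String[] dests = {"", "M", "D", "MD", "A", "AM", "AD", "AMD"};
        String[] jumps = {"", "JGT", "JEQ", "JGE", "JLT", "JNE", "JLE", "JMP"};
        for (int i = 0; i < 8; i++) {
            String bits = "" + ((i >> 2) & 1) + ((i >> 1) & 1) + (i & 1);
            DEST.put(dests[i], bits);
            JUMP.put(jumps[i], bits);
        }
    }

    // Each lookup returns null for an unknown mnemonic (handy for error checking).
    static String comp(String mnemonic) { return COMP.get(mnemonic); }
    static String dest(String mnemonic) { return DEST.get(mnemonic); }
    static String jump(String mnemonic) { return JUMP.get(mnemonic); }

    /** Builds 111 a c1..c6 d1 d2 d3 j1 j2 j3, or returns null if any field is invalid. */
    static String assembleC(String destField, String compField, String jumpField) {
        String c = comp(compField);
        String d = dest(destField);
        String j = jump(jumpField);
        if (c == null || d == null || j == null) {
            return null;
        }
        return "111" + c + d + j;
    }

    public static void main(String[] args) {
        System.out.println(assembleC("D", "D-M", "")); // 1111010011010000
        System.out.println(assembleC("", "D", "JGT")); // 1110001100000001
        System.out.println(assembleC("M", "D+X", "")); // null
    }
}
```

Building the dest and jump tables from an index works only because their bit patterns happen to be 0 through 7 in that order; the comp table has no such pattern, so it is listed explicitly.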
HashMap
•  Think dictionaries
   –  "parsec" : "unit of length for astronomically large distances"
   –  Key : Value
•  So a dictionary is just a set of Key/Value pairs
•  Using HashMap:
    import java.util.HashMap;
•  Create a HashMap:
    // String key (mnemonic), String value (bits)
    HashMap<String, String> compCodes = new HashMap<String, String>();
•  Add key/value pair to HashMap:
    compCodes.put("A+1", "0110111");
    // a c1 c2 c3 c4 c5 c6 values for A+1. Why a here?
•  Check if key exists in HashMap:
    compCodes.containsKey("A+1"); // returns boolean
•  Lookup value for key in HashMap:
    compCodes.get("A+1"); // returns null if key not present, else value
•  Remove key/value pair in HashMap (useful later):
    compCodes.remove("A+1"); // returns null if key not present, else the removed value
The overall assembly logic

For each (real) command:
•  Parse the command, i.e. break it into its underlying fields, using the parse() method
•  A-instruction: replace the symbolic reference (if any) with the corresponding memory address, which is a number (…more later)
•  C-instruction: for each field in the instruction, generate the corresponding binary code, using the parse*() methods
•  Assemble the translated binary codes into a complete 16-bit machine instruction, using Code class methods and the parts of the instruction held in Parser variables
•  Write the 16-bit instruction to the output file.

Assembly program: sum.asm, as listed in the earlier example.
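Restricted to programs without symbols (Stage I in the implementation plan), the loop on this slide can be sketched as one self-contained Java class. Assumptions: the class and method names are hypothetical, the translated words are collected in a list rather than written to a .hack file, and an invalid command raises IllegalArgumentException (one of the two error-handling options the lecture mentions). The test program is a small symbol-free example chosen for illustration, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Stage I assembler: handles only programs with no symbols.
final class BasicAssembler {
    private static final Map<String, String> COMP = new HashMap<>();
    private static final Map<String, String> DEST = new HashMap<>();
    private static final Map<String, String> JUMP = new HashMap<>();

    static {
        String[] comp = {
            "0", "0101010", "1", "0111111", "-1", "0111010", "D", "0001100",
            "A", "0110000", "!D", "0001101", "!A", "0110001", "-D", "0001111",
            "-A", "0110011", "D+1", "0011111", "A+1", "0110111", "D-1", "0001110",
            "A-1", "0110010", "D+A", "0000010", "D-A", "0010011", "A-D", "0000111",
            "D&A", "0000000", "D|A", "0010101", "M", "1110000", "!M", "1110001",
            "-M", "1110011", "M+1", "1110111", "M-1", "1110010", "D+M", "1000010",
            "D-M", "1010011", "M-D", "1000111", "D&M", "1000000", "D|M", "1010101"};
        for (int i = 0; i < comp.length; i += 2) COMP.put(comp[i], comp[i + 1]);
        String[] dests = {"", "M", "D", "MD", "A", "AM", "AD", "AMD"};
        String[] jumps = {"", "JGT", "JEQ", "JGE", "JLT", "JNE", "JLE", "JMP"};
        for (int i = 0; i < 8; i++) {
            String bits = "" + ((i >> 2) & 1) + ((i >> 1) & 1) + (i & 1);
            DEST.put(dests[i], bits);
            JUMP.put(jumps[i], bits);
        }
    }

    /** Translates one cleaned, symbol-free command; returns null for non-code lines. */
    static String translate(String clean) {
        if (clean.isEmpty()) return null;
        if (clean.startsWith("@")) {                       // A-instruction
            int value = Integer.parseInt(clean.substring(1));
            if (value < 0 || value > 32767) throw new IllegalArgumentException("Out of range: " + clean);
            String bits = Integer.toBinaryString(value);
            return "0".repeat(16 - bits.length()) + bits;
        }
        String dest = "", jump = "", comp = clean;         // C-instruction: dest=comp;jump
        int eq = comp.indexOf('=');
        if (eq >= 0) { dest = comp.substring(0, eq); comp = comp.substring(eq + 1); }
        int semi = comp.indexOf(';');
        if (semi >= 0) { jump = comp.substring(semi + 1); comp = comp.substring(0, semi); }
        String c = COMP.get(comp), d = DEST.get(dest), j = JUMP.get(jump);
        if (c == null || d == null || j == null) throw new IllegalArgumentException("Bad command: " + clean);
        return "111" + c + d + j;
    }

    /** The overall loop: clean each line, translate real commands, collect the 16-bit words. */
    static List<String> assemble(List<String> source) {
        List<String> out = new ArrayList<>();
        for (String raw : source) {
            int comment = raw.indexOf("//");
            String clean = (comment >= 0 ? raw.substring(0, comment) : raw).replaceAll("\\s", "");
            String word = translate(clean);
            if (word != null) out.add(word);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> program = List.of("// R0 = 2 + 3", "@2", "D=A", "@3", "D=D+A", "@0", "M=D");
        assemble(program).forEach(System.out::println);
        // 0000000000000010, 1110110000010000, 0000000000000011,
        // 1110000010010000, 0000000000000000, 1110001100001000 (one per line)
    }
}
```

A label declaration or an @symbol line makes this sketch throw, which is the intended Stage I behavior; Stage II adds symbol handling in front of the same translation step.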
Handling symbols (aka symbol resolution)

Assembly programs typically have many symbols:
•  Labels that mark destinations of goto commands
•  Labels that mark special memory locations
•  Variables

These symbols fall into two categories:
•  User-defined symbols (created by programmers)
•  Pre-defined symbols (used by the Hack platform).

Typical symbolic Hack assembly code:
@R0
D=M
@END
D;JLE
@counter
M=D
@SCREEN
D=A
@x
M=D
(LOOP)
@x
A=M
M=-1
@x
D=M
@32
D=D+A
@x
M=D
@counter
MD=M-1
@LOOP
D;JGT
(END)
@END
0;JMP
Handling symbols: user-defined symbols

Label symbols: Used to label destinations of goto commands. Declared by the pseudo-command (XXX). This directive defines the symbol XXX to refer to the instruction memory location holding the next command in the program.

Variable symbols: Any user-defined symbol xxx appearing in an assembly program that is not defined elsewhere using the (xxx) directive is treated as a variable, and is automatically assigned a unique RAM address, starting at RAM address 16 (why start at 16? Later...).

By convention, Hack programmers use lower-case and upper-case to represent variable and label names, respectively. See the reading/appendix for the characters allowed in symbols.

Q: Who does all the "automatic" assignments of symbols to RAM addresses?
A: As part of the program translation process, the assembler resolves all the symbols into addresses (RAM addresses for variables, instruction memory addresses for labels).

Typical symbolic Hack assembly code: the same example as on the previous slide.
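The variable rule above amounts to a lookup table plus a running counter. A minimal Java sketch, with hypothetical names (VariableAllocator, addressOf); for brevity it ignores label and pre-defined symbols, which a real symbol table would already contain before any variable is allocated:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: assigns consecutive RAM addresses to new variables.
final class VariableAllocator {
    private final Map<String, Integer> table = new HashMap<>();
    private int nextAddress = 16; // first RAM address available for variables

    /** Returns the address of symbol, allocating the next free RAM address if it is new. */
    int addressOf(String symbol) {
        Integer address = table.get(symbol);
        if (address == null) {
            address = nextAddress++;
            table.put(symbol, address);
        }
        return address;
    }

    public static void main(String[] args) {
        VariableAllocator vars = new VariableAllocator();
        System.out.println(vars.addressOf("i"));   // 16
        System.out.println(vars.addressOf("sum")); // 17
        System.out.println(vars.addressOf("i"));   // 16 (already allocated)
    }
}
```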
Handling symbols: pre-defined symbols

Virtual registers: The symbols R0,…, R15 are automatically predefined to refer to RAM addresses 0,…,15.

I/O pointers: The symbols SCREEN and KBD are automatically predefined to refer to RAM addresses 16384 and 24576, respectively (base addresses of the screen and keyboard memory maps).

VM control pointers: The symbols SP, LCL, ARG, THIS, and THAT (which don't appear in the code example) are automatically predefined to refer to RAM addresses 0 to 4, respectively.
(The VM control pointers, which overlap R0,…, R4, will come into play in the virtual machine implementation, covered in the next lecture.)

Typical symbolic Hack assembly code: the same example as on the previous slides.
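Initializing the symbol table with these pre-defined symbols might look like this in Java. A sketch only: the class name PredefinedSymbols and the choice of HashMap<String, Integer> are assumptions; the addresses are the ones listed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a symbol table pre-populated with the Hack pre-defined symbols.
final class PredefinedSymbols {

    static Map<String, Integer> create() {
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i <= 15; i++) {
            table.put("R" + i, i);        // virtual registers R0..R15
        }
        table.put("SCREEN", 16384);        // base address of the screen memory map
        table.put("KBD", 24576);           // keyboard memory map
        String[] vmPointers = {"SP", "LCL", "ARG", "THIS", "THAT"};
        for (int i = 0; i < vmPointers.length; i++) {
            table.put(vmPointers[i], i);   // overlap R0..R4
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = create();
        System.out.println(table.size());        // 23
        System.out.println(table.get("THAT"));   // 4
        System.out.println(table.get("SCREEN")); // 16384
    }
}
```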
Handling symbols: symbol table

Source code (example):
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto WRITE
D=M
@R0
D=D-M
@WRITE
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(WRITE)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP

Symbol table:
Pre-defined symbols (every program has these!):
R0      0
R1      1
R2      2
...     ...
R15     15
SCREEN  16384
KBD     24576
SP      0
LCL     1
ARG     2
THIS    3
THAT    4
Label symbols:
LOOP    4
WRITE   18
END     22
Variable symbols:
i       16
sum     17

This symbol table is generated by the assembler, and used to translate the symbolic code into binary code.
Handling symbols: constructing the symbol table

Source code and resulting symbol table: as on the previous slide.

Initialization: create an empty symbol table and populate it with all the pre-defined symbols.

First pass: go through the entire source code, and add all the user-defined label symbols to the symbol table (without generating any code).

Second pass: go again through the source code, and use the symbol table to translate all the commands. In the process, handle all the user-defined variable symbols.
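The first pass only needs a counter of real instructions: a label gets the value of the counter at the point where it is declared. A minimal Java sketch with hypothetical names (FirstPass, collectLabels), assuming the same comment and white-space cleaning as before:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical first pass: records label addresses without generating any code.
final class FirstPass {

    /** Maps each (LABEL) to the instruction-memory address of the next real instruction. */
    static Map<String, Integer> collectLabels(List<String> source) {
        Map<String, Integer> labels = new HashMap<>();
        int romAddress = 0; // counts only A- and C-instructions
        for (String raw : source) {
            int comment = raw.indexOf("//");
            String clean = (comment >= 0 ? raw.substring(0, comment) : raw).replaceAll("\\s", "");
            if (clean.isEmpty()) {
                continue;                                     // comment or blank line
            }
            if (clean.startsWith("(") && clean.endsWith(")")) {
                labels.put(clean.substring(1, clean.length() - 1), romAddress); // no code
            } else {
                romAddress++;                                 // real instruction
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String> sum = List.of(
            "// Computes 1+...+RAM[0]", "@i", "M=1 // i = 1", "@sum", "M=0 // sum = 0",
            "(LOOP)", "@i", "D=M", "@R0", "D=D-M", "@WRITE", "D;JGT",
            "@i", "D=M", "@sum", "M=D+M", "@i", "M=M+1", "@LOOP", "0;JMP",
            "(WRITE)", "@sum", "D=M", "@R1", "M=D", "(END)", "@END", "0;JMP");
        Map<String, Integer> labels = collectLabels(sum);
        for (String name : List.of("LOOP", "WRITE", "END")) {
            System.out.println(name + " = " + labels.get(name)); // LOOP = 4, WRITE = 18, END = 22
        }
    }
}
```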
The assembly process (detailed algorithm)

•  Initialization: create the symbol table and initialize it with the pre-defined symbols
•  First pass: march through the source code without generating any code.
   For each label declaration (LABEL) that appears in the source code, add the pair <LABEL, n> to the symbol table, where n is the instruction memory address of the command that follows the declaration
•  Second pass: march again through each line of source code, and process each line:
   –  If the line is a C-instruction, simple
   –  If the line is @xxx where xxx is a number, simple
   –  If the line is @xxx and xxx is a symbol, look it up in the symbol table and proceed as follows:
      •  If the symbol is found, replace it with its numeric value and complete the command's translation
      •  If the symbol is not found, then it must represent a new variable: add the pair <xxx, n> to the symbol table, where n is the next available RAM address, and complete the command's translation.
•  (Platform design decision: variables are allocated consecutive RAM addresses, starting at address 16. The assembler must keep track of the next available RAM address too!)
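The whole algorithm can be sketched in Java as a symbol resolver that outputs symbol-free assembly, which a Stage I translator can then turn into binary. This split into "resolve, then translate" is a design choice of the sketch, not necessarily the course's structure, and SymbolResolver, clean and resolve are hypothetical names. One small deviation from the slide: the first pass also keeps the cleaned instructions, so the second pass walks that list instead of re-reading the source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-pass resolver: replaces every @symbol with @address.
final class SymbolResolver {

    /** Drops a trailing "//" comment and all white space. */
    static String clean(String raw) {
        int comment = raw.indexOf("//");
        return (comment >= 0 ? raw.substring(0, comment) : raw).replaceAll("\\s", "");
    }

    static List<String> resolve(List<String> source) {
        // Initialization: pre-defined symbols.
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i <= 15; i++) {
            table.put("R" + i, i);
        }
        table.put("SCREEN", 16384);
        table.put("KBD", 24576);
        String[] vmPointers = {"SP", "LCL", "ARG", "THIS", "THAT"};
        for (int i = 0; i < vmPointers.length; i++) {
            table.put(vmPointers[i], i);
        }

        // First pass: bind each (LABEL) to the address of the next instruction; emit no code.
        List<String> instructions = new ArrayList<>();
        for (String raw : source) {
            String line = clean(raw);
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("(") && line.endsWith(")")) {
                table.put(line.substring(1, line.length() - 1), instructions.size());
            } else {
                instructions.add(line);
            }
        }

        // Second pass: replace @symbol by @address, allocating new variables from RAM[16] up.
        int nextVariable = 16;
        List<String> resolved = new ArrayList<>();
        for (String line : instructions) {
            String out = line;
            boolean symbolic = line.startsWith("@") && line.length() > 1
                    && !Character.isDigit(line.charAt(1));
            if (symbolic) {
                String symbol = line.substring(1);
                if (!table.containsKey(symbol)) {
                    table.put(symbol, nextVariable++); // new variable
                }
                out = "@" + table.get(symbol);
            }
            resolved.add(out);
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> sum = List.of(
            "@i", "M=1 // i = 1", "@sum", "M=0 // sum = 0",
            "(LOOP)", "@i", "D=M", "@R0", "D=D-M", "@WRITE", "D;JGT",
            "@i", "D=M", "@sum", "M=D+M", "@i", "M=M+1", "@LOOP", "0;JMP",
            "(WRITE)", "@sum", "D=M", "@R1", "M=D",
            "(END)", "@END", "0;JMP");
        resolve(sum).forEach(System.out::println); // 24 lines, starting @16, M=1, @17, M=0, ...
    }
}
```

Variables are allocated only in the second pass, after every label is known; otherwise a forward reference such as @WRITE would wrongly be treated as a new variable.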
The result ...

Target code         Source code (example)
                    // Computes 1+...+RAM[0]
                    // And stores the sum in RAM[1]
0000000000010000    @i
1110111111001000    M=1 // i = 1
0000000000010001    @sum
1110101010001000    M=0 // sum = 0
                    (LOOP)
0000000000010000    @i // if i>RAM[0] goto WRITE
1111110000010000    D=M
0000000000000000    @R0
1111010011010000    D=D-M
0000000000010010    @WRITE
1110001100000001    D;JGT
0000000000010000    @i // sum += i
1111110000010000    D=M
0000000000010001    @sum
1111000010001000    M=D+M
0000000000010000    @i // i++
1111110111001000    M=M+1
0000000000000100    @LOOP // goto LOOP
1110101010000111    0;JMP
                    (WRITE)
0000000000010001    @sum
1111110000010000    D=M
0000000000000001    @R1
1110001100001000    M=D // RAM[1] = the sum
                    (END)
0000000000010110    @END
1110101010000111    0;JMP

Note that comment lines and pseudo-commands (label declarations) generate no code, nor contribute to the final line count.
Proposed assembler implementation
An assembler program can be written in any high-level language. We propose a language-independent design, as follows.

Software modules:
•  Parser: Unpacks each command into its underlying fields
•  Code: Translates each field into its corresponding binary value, and assembles the resulting values
•  SymbolTable: Manages the symbol table
•  Main: Initializes I/O files and drives the show.
•  We will call this Assembler, and it also does error handling (exceptions or booleans)!

Proposed implementation stages
•  Stage I: Build a basic assembler for programs with no symbols
•  Stage II: Extend the basic assembler with symbol handling capabilities.
Parser (a software module in the assembler program)

Parser (a software module in the assembler program) / continued

Code (a software module in the assembler program)

SymbolTable (a software module in the assembler program)
Perspective
•  Simple machine language, simple assembler
•  Most assemblers are not stand-alone, but rather encapsulated in a translator of a higher order
•  C programmers who understand the code generated by a C compiler can improve their code considerably
•  C programming (e.g. for real-time systems) may involve re-writing critical segments in assembly, for optimization
•  Writing an assembler is excellent practice for writing more challenging translators, e.g. a VM Translator and a compiler, as we will do in the next lectures.
