Practice Final Soln
Name:
There are 13 questions, with the point values as shown below. You have 180
minutes, and the exam is worth a total of 150 points. Pace yourself accordingly.
This exam must be individual work. You may not collaborate with your
fellow students. You may use 1 sheet of notes you created, but no other
external resources.
I certify that the work shown on this exam is my own work, and that I
have neither given nor received improper assistance of any form in the
completion of this work.
Signature:
Question 1: Vocabulary [10 pts]
Match each of the following definitions with the appropriate vocab word:
A ALU
U XOR-gate
Question 2: Binary Math [10 pts]
1. Convert the number -42 to signed, 2’s complement 8-bit binary. 1101 0110
2. Write the binary number 0110 1111 0000 0101 in hexadecimal. 0x6F05
4. Add the binary numbers 0101 1110 + 0111 0010. 1101 0000
5. State whether the addition you did in part 4 overflows if the operands
are treated as signed numbers. Yes
6. State whether the addition you did in part 4 overflows if the operands
are treated as unsigned numbers. No
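The answers to parts 1, 4, 5, and 6 can be checked mechanically. The following is a minimal C sketch and not part of the original solution; the helper names (`twos_complement8`, `add8`, `signed_overflow8`, `unsigned_overflow8`) are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for checking the 8-bit answers above. */

/* 8-bit two's complement bit pattern of a value in [-128, 127]. */
uint8_t twos_complement8(int value) {
    return (uint8_t)value;   /* conversion to uint8_t wraps modulo 256 */
}

/* 8-bit sum, discarding any carry out of bit 7. */
uint8_t add8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Unsigned overflow: the true sum does not fit in 8 bits. */
bool unsigned_overflow8(uint8_t a, uint8_t b) {
    return (unsigned)a + (unsigned)b > 0xFFu;
}

/* Signed overflow: both operands have the same sign bit,
   and the sign bit of the sum differs from it. */
bool signed_overflow8(uint8_t a, uint8_t b) {
    uint8_t sum = add8(a, b);
    return ((a ^ sum) & (b ^ sum) & 0x80u) != 0;
}
```

Here `twos_complement8(-42)` gives 0xD6 (1101 0110), and `add8(0x5E, 0x72)` gives 0xD0 (1101 0000): two positive operands produce a result with the sign bit set, so the addition overflows as signed but not as unsigned.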
Question 3: C Programming [10 pts]
Given the following linked list node definition:
struct ll_node {
int data;
struct ll_node * next;
};
Write the reverseList function which reverses a linked list, and returns the
reversed list.
Answer:
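The written answer is not reproduced here. One possible solution is the usual iterative, in-place reversal sketched below. The signature is an assumption, since the question only names the function: it is taken to accept the head of the list and return the new head.

```c
#include <stddef.h>

struct ll_node {
    int data;
    struct ll_node * next;
};

/* Assumed signature: takes the head of a list, reverses it in place,
   and returns the new head. */
struct ll_node * reverseList(struct ll_node * head) {
    struct ll_node * reversed = NULL;        /* already-reversed prefix */
    while (head != NULL) {
        struct ll_node * rest = head->next;  /* remember the remainder */
        head->next = reversed;               /* point this node backwards */
        reversed = head;                     /* it becomes the new front */
        head = rest;
    }
    return reversed;
}
```

The loop visits each node once and allocates nothing; an empty list (NULL) is returned unchanged.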
Question 4: MIPS Assembly [10 pts]
Translate the strUpper function (written in C below) to MIPS assembly:
void strUpper(char * s) {
while (*s != '\0') {
*s = toUpper(*s);
s++;
}
}
Answer:
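strUpper:
addiu $sp, $sp, -32 # allocate a 32-byte frame (the stack grows down)
sw $fp, 0($sp) # save the frame pointer
sw $ra, 4($sp) # save the return address (jal overwrites $ra)
sw $s0, 8($sp) # save callee-saved $s0
addiu $fp, $sp, 28
move $s0, $a0 # s0 = s, preserved across the call
.L_lp:
lbu $a0, 0($s0) # a0 = *s
beqz $a0, .L_done # stop at '\0'
jal toUpper
sb $v0, 0($s0) # *s = result (sb stores one byte, not a word)
addiu $s0, $s0, 1 # s++
b .L_lp
.L_done:
lw $fp, 0($sp) # restore saved registers
lw $ra, 4($sp)
lw $s0, 8($sp)
addiu $sp, $sp, 32 # release the frame
jr $ra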
Question 5: Logic Gates [10 pts]
Given the following circuit:
[Circuit diagram not reproduced: inputs A, B, and C; gates: two AND, one NOT, one OR.]
1. Write the boolean formula for this circuit. (A and not B) or (B and C)
A B C Out
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 1
1 0 0 1
1 0 1 1
1 1 0 0
1 1 1 1
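The truth table can be regenerated from the formula in part 1. The sketch below is illustrative only; `circuit_out` and `truth_table_bits` are hypothetical names, not part of the original solution.

```c
#include <stdbool.h>

/* Out = (A and not B) or (B and C), as in part 1. */
bool circuit_out(bool a, bool b, bool c) {
    return (a && !b) || (b && c);
}

/* Pack the eight outputs into one value: bit i holds the output for
   row i, where i = (A << 2) | (B << 1) | C matches the table's row order. */
unsigned truth_table_bits(void) {
    unsigned bits = 0;
    for (unsigned row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1;
        bool b = (row >> 1) & 1;
        bool c = row & 1;
        if (circuit_out(a, b, c)) {
            bits |= 1u << row;
        }
    }
    return bits;
}
```

The output is 1 exactly for rows 011, 100, 101, and 111, so `truth_table_bits()` returns 0xB8.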
Question 6: Performance [20 pts]
A pipelined processor executes a 100-billion instruction program, and has
the following performance related characteristics:
3. What is the CPI penalty from data memory (D-cache misses)? 0.2
6. How long (in seconds) does the program take to execute? 80 seconds
7. Suppose the pipeline could be re-designed to have more stages and
a faster clock. Under this new design, the clock frequency is 4.0GHz
(0.25ns). The branch mis-prediction penalty increases to 10 cycles.
Now, 50% of loads cause a 2-cycle load-to-use penalty.
(a) What is the new CPI penalty from load-to-use stalls? 0.2
(b) What is the new CPI penalty from branch mis-predictions? 0.2
(c) What is the new CPI penalty from data memory? 0.3
(d) What is the new CPI penalty from instruction memory? 0.3
(f) How long (in seconds) does the program take to execute now?
50 seconds
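The answers to parts (a) through (d) and (f) are consistent with one another if the base CPI is 1 and there are no other penalties. That base CPI is an assumption here, since the table of processor characteristics is not reproduced. A minimal sketch of the calculation, with hypothetical function names:

```c
/* Total CPI = base CPI plus the sum of the per-instruction stall penalties. */
double total_cpi(double base_cpi, const double *penalties, int count) {
    double cpi = base_cpi;
    for (int i = 0; i < count; i++) {
        cpi += penalties[i];
    }
    return cpi;
}

/* Execution time in seconds = instructions * CPI * clock period. */
double exec_time_seconds(double instructions, double cpi, double clock_ns) {
    return instructions * cpi * clock_ns * 1e-9;
}
```

With penalties of 0.2, 0.2, 0.3, and 0.3 on an assumed base CPI of 1, the total CPI is 2.0, and 100 billion instructions at 0.25ns per cycle take 100e9 * 2.0 * 0.25e-9 = 50 seconds.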
Question 7: Datapaths [15 pts]
The following (multi-cycle) datapath has support for J-type absolute jumps
(jal, j, etc), but not for I-type relative branches (beq, bne, etc). Recall that
these relative branches take the immediate value, shift it left by 2, add it
to PC+4, and then use that as the target (if taken). Add support to this
datapath for such branches.
[Datapath diagram not reproduced. Labeled components: PC, +4, IMEM, IW, RegFile, A, B, SX, <<2, Const, O, DMEM, D.]
Question 8: Caches I [10 pts]
You are designing the memory hierarchy for a new processor. The access
latency of main memory is 200 cycles. You have the following choices for the
L2 design:
Answer:
The L2 design is independent of the L1 design, so we can/should select it
first. For the L2, we can compute Tavg for each design:
2MB: 30 + 0.01 * 200 = 32
1MB: 20 + 0.05 * 200 = 30
512K: 15 + 0.10 * 200 = 35
This means the 1MB (Tavg=30) is the best L2 design. Now we can pick the
L1:
64KB: 2 + 0.04 * 30 = 2 + 1.2 = 3.2
32KB: 1 + 0.05 * 30 = 1 + 1.5 = 2.5
This means 32KB (Tavg = 2.5) is the best L1 design.
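Every line of the calculation above applies the same formula, Tavg = Thit + %miss * Tmiss, where Tmiss for one level is the Tavg of the level below it. A minimal sketch (the function name `t_avg` is made up for illustration):

```c
/* Average access time of one cache level: the hit latency plus the
   miss rate times the average cost of going to the next level. */
double t_avg(double t_hit, double miss_rate, double t_miss) {
    return t_hit + miss_rate * t_miss;
}
```

For example, `t_avg(20, 0.05, 200)` gives 30 for the 1MB L2, and feeding that result back in as `t_avg(1, 0.05, 30)` gives 2.5 for the 32KB L1.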
Question 9: Caches II [10 pts]
Assume you have an empty 32B cache with 8B blocks. The cache is 2-way set-
associative. Addresses are 8 bits. The left column below lists the addresses
accessed. For each address, show how it is split into tag, index and offset in
the next three columns, show the new state of the cache after the access in
the next 4 columns, and state the outcome (whether it's a hit or a miss) in
the last column. The first row shows the initial state of the cache. The
columns for the ways of each set show the tags (only) in those ways. Way 0
is MRU, Way 1 is LRU in each set at any given time. You do not need to
worry about data values. All numbers are in hex, as should be all of
the numbers in your answer.
                             Set 0         Set 1
Address  Tag  Index  Offset  Way 0  Way 1  Way 0  Way 1  Outcome
(start)  -    -      -       0      F      7      C      -
F1       F    0      1       F      0      7      C      Hit
1F       1    1      7       F      0      1      7      Miss
18       1    1      0       F      0      1      7      Hit
C2       C    0      2       C      F      1      7      Miss
81       8    0      1       8      C      1      7      Miss
01       0    0      1       0      8      1      7      Miss
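The table can be reproduced with a small simulation. With 8B blocks the offset is 3 bits; a 32B, 2-way cache has 2 sets, so the index is 1 bit and the remaining 4 bits are the tag. The sketch below is illustrative only: the names are hypothetical, and it assumes every way already holds a valid tag, as in the given starting state.

```c
#include <stdbool.h>
#include <stdint.h>

enum { NUM_SETS = 2, OFFSET_BITS = 3, INDEX_BITS = 1 };

/* tags[set][0] is the MRU way and tags[set][1] is the LRU way.
   All ways are assumed to be valid. */
struct cache {
    uint8_t tags[NUM_SETS][2];
};

/* Access one 8-bit address, update the LRU state, and return true on a hit. */
bool cache_access(struct cache *c, uint8_t addr) {
    unsigned index = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint8_t tag = (uint8_t)(addr >> (OFFSET_BITS + INDEX_BITS));
    uint8_t *set = c->tags[index];

    if (set[0] == tag) {
        return true;              /* hit in the MRU way: nothing moves */
    }
    bool hit = (set[1] == tag);   /* hit in the LRU way, otherwise a miss */
    set[1] = set[0];              /* old MRU becomes LRU (a miss evicts the old LRU) */
    set[0] = tag;                 /* the accessed tag becomes MRU */
    return hit;
}
```

Starting from tags {0, F} in set 0 and {7, C} in set 1, accessing F1, 1F, 18, C2, 81, 01 in order yields Hit, Miss, Hit, Miss, Miss, Miss, matching the Outcome column.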
Question 10: Virtual Memory [15 pts]
Suppose that a system has a 32-bit (4GB) virtual address space. It has 1GB
of physical memory, and uses 1MB pages.
1. How many virtual pages are there in the address space? 4096
2. How many physical pages are there in the address space? 1024
6. Some entries of the page table are shown below (all values are in hex,
and all entries shown are valid). Translate virtual address 0x410423 to a
physical address, using the translations in this page table. 0xDD10423
Entry Number Value
0 1F
1 3C
2 55
3 9C
4 DD
5 EE
6 99
... ...
20 2F
21 4C
22 65
23 AC
24 ED
25 FE
26 100
... ...
40 11F
41 13C
42 155
43 19C
44 1DD
45 1EE
46 199
... ...
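With 1MB pages, the low 20 bits of an address are the page offset and the remaining upper bits are the virtual page number, which indexes the page table. The translation in part 6 can be sketched as follows; `translate` is a hypothetical helper that assumes a flat table of physical page numbers in which every entry consulted is valid.

```c
#include <stdint.h>

enum { PAGE_OFFSET_BITS = 20 };   /* 1MB pages: 2^20 bytes per page */

/* Translate a virtual address through a flat page table whose entries
   are physical page numbers. Assumes the entry consulted is valid. */
uint32_t translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_OFFSET_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);
    return (page_table[vpn] << PAGE_OFFSET_BITS) | offset;
}
```

For 0x410423 the virtual page number is 0x4 and the offset is 0x10423; entry 4 holds DD, so the physical address is 0xDD10423. The page counts in parts 1 and 2 follow the same split: 2^(32-20) = 4096 virtual pages and 2^(30-20) = 1024 physical pages.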
Question 11: Branch Prediction [10 pts]
One particular branch (i.e., one specific PC) has the following actual
outcomes. Show the predictions for both a one-bit counter and a two-bit counter
(no history). For the two-bit counter, use T for strongly taken, t for weakly
taken, n for weakly not-taken, and N for strongly not taken. Finally fill in
the accuracy (percentage of predictions that were correct) at the bottom of
the table. The first prediction is done for you.
          1-bit counter          2-bit counter
Outcome   Prediction  Correct?   Prediction  Correct?
T         N           no         n           no
T         T           yes        t           yes
T         T           yes        T           yes
N         T           no         T           no
T         N           no         t           yes
N         T           no         T           no
T         N           no         t           yes
T         T           yes        T           yes
T         T           yes        T           yes
N         T           no         T           no
Accuracy              40%                    60%
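Both columns of the table can be reproduced by simulating the two predictors. The sketch below is illustrative only and uses hypothetical function names. It assumes the 2-bit predictor is a standard saturating counter that moves one state toward T on a taken branch and one state toward N on a not-taken branch, which matches the transitions shown in the table.

```c
#include <stddef.h>

/* Count correct predictions of a 1-bit counter.
   state: 0 = predict not-taken, 1 = predict taken. */
int one_bit_correct(const char *outcomes, int state) {
    int correct = 0;
    for (size_t i = 0; outcomes[i] != '\0'; i++) {
        int taken = (outcomes[i] == 'T');
        if (state == taken) {
            correct++;
        }
        state = taken;            /* remember only the last outcome */
    }
    return correct;
}

/* Count correct predictions of a 2-bit saturating counter.
   state: 0 = N, 1 = n, 2 = t, 3 = T; states 2 and 3 predict taken. */
int two_bit_correct(const char *outcomes, int state) {
    int correct = 0;
    for (size_t i = 0; outcomes[i] != '\0'; i++) {
        int taken = (outcomes[i] == 'T');
        if ((state >= 2) == taken) {
            correct++;
        }
        if (taken && state < 3) {
            state++;              /* move toward strongly taken */
        }
        if (!taken && state > 0) {
            state--;              /* move toward strongly not-taken */
        }
    }
    return correct;
}
```

For the outcome sequence "TTTNTNTTTN", `one_bit_correct` starting in state 0 (N) returns 4 and `two_bit_correct` starting in state 1 (n) returns 6, which are the 40% and 60% accuracies in the table. The 2-bit counter does better because a single not-taken outcome only weakens its prediction instead of flipping it.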
Question 12: Short Answer [10 pts]
Answer:
A segmentation fault occurs when a program attempts to access invalid
memory. The canonical example is an attempt to dereference a NULL
pointer. The hardware detects this situation by finding no valid
translation for the requested address, causing a page fault. The OS then
determines that the requested address lies outside of the program’s
address space and terminates it with a segmentation fault.
2. Compare and contrast caches and virtual memory. Give at least one
similarity and one difference between the two. For the difference,
explain why this difference exists.
Answer:
(Many possible)
Similarity: both split memory into fixed sized chunks (blocks/pages),
and manipulate memory at this granularity.
Difference: In virtual memory, software (the OS) makes the replacement
decision, while in caches, the hardware makes the replacement
decision. This difference exists because Tmiss is so much larger for
memory (which misses to disk) than for any other level of the memory
hierarchy. This justifies the extra time needed to transfer control
into a software routine to make a complex decision, in the hopes of
reducing %miss.
Question 13: Multiple-Multiple-Choice [10 pts]
For each question, circle all that apply. If none of the selections are
appropriate, then choose “e. None of the above”.
a. Memory-to-memory operations
b. 3 operand arithmetic operations
c. Fixed length instruction encodings
d. Load-compare-branch instructions
e. None of the above
4. Which of the following are problems that can arise when pipelining:
a,d
a. Data Hazards
b. Water Hazards
c. Dukes of Hazards
d. Control Hazards
e. None of the above
a. Interrupts are a simpler mechanism for both the hardware and
software.
b. Interrupts allow for more efficient CPU utilization.
c. Interrupts are compatible with virtual memory, while polling is not.
d. Polling causes bad hit rates in the L1 instruction cache.
e. None of the above