Exercises memory-caches
Contents
1 Exercises week 8
  1.1 Home Assignment 1
  1.2 Memory systems, Cache II
2 Exercises week 9
  2.1 Memory systems, Virtual Memory
3 Exercises week 10
  3.1 Storage systems, I/O
4 Exercises week 11
  4.1 Multiprocessors I
5 Exercises week 12
  5.1 Home Assignment 2 - online quiz
  5.2 Multiprocessors II
6 Exercises week 13
  6.1 Old exam 2003-12-17
7 Exercises week 14
  7.1 Questions and answers session
8 Brief answers
  8.1 Memory systems, Cache II
  8.2 Memory systems, Virtual Memory
  8.3 Storage systems, I/O
  8.4 Multiprocessors I
  8.5 Multiprocessors II
  8.6 Old exam 2003-12-17
1 Exercises week 8
1.1 Home Assignment 1
OPTIONAL!
However - An approved home assignment will give you 2 extra points on the
exam.
Select one item from the list below and describe it in as much detail as possible. You should characterize it along all relevant subjects covered by this course (such as ISA, pipeline type, stages, registers, register renaming, type of instruction issue, type of instruction commit, type of scheduling, branch prediction, superscalar, VLIW, caches, cache optimizations, bandwidth, size, number of transistors, etc.). Your report has to include all references used to find the information included.
• AMD Barcelona
• Intel Core 2
• Intel Core i7
• Intel/HP Itanium 2
• IBM Power6
• IBM Power7
• ARMv7
• ARM Cortex-A8
Home assignments are individual. You are not allowed to copy from each other.
You should aim at producing 2-5 A4 pages of text (more if many figures are used), including references.
The text can be either in English or Swedish.
You are allowed to cite 1-3 sentences with explicit reference to the source. Larger pieces
of text copied from any source will be detected by the Urkund system to which your home
assignments will be submitted, and are of course not allowed. Any detected plagiarism will
automatically generate a fail on the assignment.
The home assignment must at least contain:
• Your name
All the sources you have used must be listed in a ’References’ section at the end of your home
assignment. You can use any number of sources to find the information you need to complete
the assignment, for example:
• E-huset physical library (http://www.ehuset.lth.se/english/library/)
1.2 Memory systems, Cache II
Exercise 1.1 Which block replacement strategies are commonly used in caches, and which of the following mapping schemes need a replacement algorithm?
• Direct mapping
• Set-associative mapping
Exercise 1.2 Draw a schematic of how the following cache memory can be implemented: total size 16 kB, 4-way set-associative, block size 16 bytes, replacement algorithm LRU, copy-back (write-back).
The schematic should, among other things, show the central MUXes, comparators and connections. It should clearly indicate how a physical memory address is translated to a cache position. The greater the detail shown, the better.
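As a quick cross-check on the address breakdown implied by these parameters, the following C sketch shows how a physical address would map onto this cache (offset, index and tag fields). The 32-bit address width and the example address are assumptions for illustration, not part of the exercise.

/* Sketch: address breakdown for a 16 kB, 4-way set-associative cache with
 * 16-byte blocks (assuming 32-bit physical addresses):
 *   offset = log2(16) = 4 bits
 *   sets   = 16384 / (4 * 16) = 256  ->  index = 8 bits
 *   tag    = 32 - 8 - 4 = 20 bits, compared in parallel in all four ways
 */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 16u                                    /* bytes per block */
#define NUM_WAYS    4u                                    /* associativity   */
#define CACHE_SIZE (16u * 1024u)                          /* total size      */
#define NUM_SETS   (CACHE_SIZE / (NUM_WAYS * BLOCK_SIZE)) /* 256 sets        */

int main(void)
{
    uint32_t addr   = 0x12345678u;                    /* example address          */
    uint32_t offset = addr % BLOCK_SIZE;              /* byte within the block    */
    uint32_t index  = (addr / BLOCK_SIZE) % NUM_SETS; /* selects one of 256 sets  */
    uint32_t tag    = addr / (BLOCK_SIZE * NUM_SETS); /* matched against all ways */

    printf("addr=0x%08x  tag=0x%05x  index=%u  offset=%u\n",
           (unsigned)addr, (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}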
Exercise 1.3 What is a write-through cache? Is it faster or slower than a write-back cache with respect to the time it takes to perform a write?
Exercise 1.5 Hennessy/Patterson, Computer Architecture, 4th ed., exercise 5.5
Exercise 1.6 Three ways, using hardware and/or software, to decrease the time a program spends on (data) memory accesses are: non-blocking caches, hardware prefetching (e.g. with stream buffers), and software prefetching. How does each of these techniques affect:
a) miss rate
b) memory bandwidth
c) the number of executed instructions
Exercise 1.7 In systems with a write-through L1 cache backed by a write-back L2 cache instead of main memory, a merging write buffer can be simplified.
a) Explain how the merging write buffer can be simplified in this case.
b) Are there situations where having a full write buffer (instead of the simple version you have just proposed) could be helpful?
Exercise 1.9 Explain where replacement policy fits into the three C's model, and explain why this means that misses caused by a replacement policy are "ignored" – or, more precisely, cannot in general be definitively classified – by the three C's model.
2 Exercises week 9
2.1 Memory systems, Virtual Memory
Exercise 2.1 As caches increase in size, blocks often increase in size as well.
a) If a large instruction cache has larger blocks, is there still a need for pre-fetching? Explain
the interaction between pre-fetching and increased block size in instruction caches.
b) Is there a need for data pre-fetch instructions when data blocks get larger?
Exercise 2.2 Some memory systems handle TLB misses in software (as an exception), while
others use hardware for TLB misses.
a) What are the trade-offs between these methods for handling TLB misses?
b) Will TLB miss handling in software always be slower than TLB misses in hardware?
Explain!
c) Are there page table structures that would be difficult to handle in hardware, but possible
in software? Are there any such structures that would be difficult for software to handle
but easy for hardware to manage?
Exercise 2.3 The difficulty of building a memory system to keep pace with faster CPUs is underscored by the fact that the raw material for main memory is the same as that found in the cheapest computer; the performance difference comes from how that memory is arranged.
a) List the four design decisions – block placement, block identification, block replacement and write strategy – that govern the hierarchical construction of virtual memory.
b) What is the main purpose of the Translation-Lookaside Buffer within the memory hierarchy? Give an appropriate set of construction rules and explain why.
Exercise 2.4 Designing caches for out-of-order (OOO) superscalar CPUs is difficult for several
reasons. Clearly, the cache will need to be non-blocking and may need to cope with several
outstanding misses. However, the access pattern for OOO superscalar processors differs from
that generated by in-order execution.
What are the differences, and how might they affect cache design for OOO processors?
Exercise 2.5 Consider the following three hypothetical, but not atypical, processors, which we
run with the SPEC gcc benchmark
1. A simple MIPS two-issue static pipe running at a clock rate of 4 GHz and achieving
a pipeline CPI of 0.8. This processor has a cache system that yields 0.005 misses per
instruction.
2. A deeply pipelined version of a two-issue MIPS processor with slightly smaller caches and
a 5 GHz clock rate. The pipeline CPI of the processor is 1.0, and the smaller caches yield
0.0055 misses per instruction on average.
3. A speculative MIPS, superscalar with a 64-entry window. It achieves one-half of the ideal issue rate measured for this window size (9 instruction issues per cycle). This processor has the smallest caches, which leads to 0.01 misses per instruction, but it hides 25 % of the miss penalty on all misses by dynamic scheduling. This processor has a 2.5 GHz clock.
Assume that the main memory time (which sets the miss penalty) is 50 ns. Determine the
relative performance of these three processors.
Exercise 2.6 a) Give three arguments for larger pages in virtual memory, and one against.
b) Describe the concepts 'page', 'page fault', 'virtual address', 'physical address', 'TLB', and 'memory mapping', and how they are related.
c) How much memory does the page table, indexed by the virtual page number, take for a system using 32-bit virtual addresses, 4 KB pages, and 4 bytes per page table entry? The system has 512 MB of physical memory.
d) In order to save memory, inverted page tables are sometimes used. Briefly describe how they are structured. How much memory would an inverted page table take for the above system?
Exercise 2.7 A) Describe two cache memory optimization techniques that may improve hit
performance (latency and throughput). For each technique, specify how it affects hit time
and fetch bandwidth.
B) Describe two cache memory optimization techniques that may reduce miss rate, and define
the miss type (compulsory, capacity, conflict) that is primarily affected by each technique.
C) Describe two cache memory optimization techniques that may reduce miss penalty.
3 Exercises week 10
3.1 Storage systems, I/O
For Hennessy/Patterson exercises 6.8 - 6.14.
For Hennessy/Patterson exercises 6.19 - 6.22.
Exercise 3.7 Hennessy/Patterson, Computer Architecture, 4th ed., exercise 6.22
4 Exercises week 11
4.1 Multiprocessors I
Exercise 4.1 There are two main varieties (classes) of hardware-based cache coherence proto-
cols. Which are they and what are the main differences, strengths and weaknesses?
Exercise 4.2 Briefly describe MIMD and SIMD computers outlining the differences. Give
examples of computers (or type of computers) from each class.
Exercise 4.3 Assume a directory-based cache coherence protocol. The directory currently has
information that indicates that processor P1 has the data in “exclusive” mode.
If the directory now gets a request for the same cache block from processor P1, what could
this mean? What should the directory controller do?
Exercise 4.4 Although it is widely believed that buses are the ideal way to interconnect small-
scale multiprocessors, this may not always be the case. For example, increases in processor
performance are lowering the processor count at which a more distributed implementation be-
comes attractive. Because a standard bus-based implementation uses the bus both for access to
memory and for inter-processor coherency traffic, it has a uniform memory access time for both.
In comparison, a distributed memory implementation may sacrifice on remote memory access,
but it can have a much better local memory access time.
Consider the design of a multi-processor with 16 processors. Each CPU is driven by a 150
MHz clock. Assume that a memory access takes 150 ns from the time the address is available
from either the local processor or a remote processor until the first word is delivered. The bus
is driven by a 50 MHz clock. Each bus transaction takes five bus clock cycles, each 20 ns in
length, to perform arbitration, resolution, address, decode and acknowledge.
The detection of the miss and the generation of the memory request by the processor consists
of three steps: detecting a miss in the primary on-chip cache; initiating a secondary (off-chip)
cache access and detecting a miss in the secondary cache; and driving the complete address
off-chip through the bus. This process takes about 40 processor clock cycles.
For the bus and memory component, the initial read request is one bus transaction of 5 bus
cycles. The latency until memory is ready to transfer is 12 bus clock cycles. The reply will then
transfer all 128 bytes of a cache block in one reply transaction, taking 5 bus clock cycles. The
total is 22 bus clock cycles, which equals 66 processor clocks. It takes 16 bus cycles to reload the cache line, while restarting the pipeline takes 10 processor cycles; together this is 58 processor cycles.
a) What is the total time, in processor clock cycles, for a memory access (a miss in the second-level cache) on this bus-based design?
b) Assume that the interconnect is a 2-D grid with links that are 16 bits wide and clocked
at 100 MHz, with a start-up time of five cycles for a message. Assume one clock cycle
between nodes in the network, and ignore overhead in the messages and contention (i.e.
assume that the network bandwidth is not the limit). Find the average remote memory
access time, assuming a uniform distribution of remote requests.
Exercise 4.5 Nearly all computer manufacturers offer today multi-core microprocessors. This
assignment focuses on concepts central to how thread-level parallelism can be exploited to offer
a higher computational performance.
a) The performance of a superscalar processor is limited by the amount of instruction-level
parallelism in the program. In particular, when a load instruction must fetch data from
memory, it can be difficult to find a sufficient number of independent instructions to
execute while the data is being fetched from memory. Multithreading is a technique to
do useful work while waiting for the data to be returned from memory. Explain how the
following concepts can keep the processor busy doing useful work:
– Fine-grain multithreading
– Coarse-grain multi-threading
– Simultaneous multithreading
c) Flynn classifies computer architectures into four categories based on their instruction and data streams. Which ones?
– How is the lock primitive in a critical section implemented using test-and-set instructions? (A sketch of one possible implementation is given after the tables below.)
b) What is cache coherence? Give an example of what can happen if cache coherence is
missing.
c) A commonly used cache coherence protocol relies on snooping and invalidations. Below you
find a list of requests that arrive to the cache coherence mechanism. Connect all requests,
A-N, with the correct cache action and explanation, 1-14. Hint: each request matches
exactly one action/explanation. Your answer should be a table listing all connections like
A-3, B-2, C-8, etc...
Request      Source      State of addressed cache block    Label
Read hit     Processor   shared or modified                A
Read miss    Processor   invalid                           B
Read miss    Processor   shared                            C
Read miss    Processor   modified                          D
Write hit    Processor   modified                          E
Write hit    Processor   shared                            F
Write miss   Processor   invalid                           G
Write miss   Processor   shared                            H
Write miss   Processor   modified                          I
Read miss    Bus         shared                            J
Read miss    Bus         modified                          K
Invalidate   Bus         shared                            L
Write miss   Bus         shared                            M
Write miss   Bus         modified                          N
    Type of cache action   Function and explanation
1   normal hit     Write data in cache.
2   coherence      Place invalidate on bus.
3   coherence      Attempt to write block that is shared; invalidate the cache block.
4   normal hit     Read data in cache.
5   replacement    Address conflict miss: write back block, then place write miss on bus.
6   normal miss    Place read miss on bus.
7   replacement    Address conflict miss: write back block, then place read miss on bus.
8   normal miss    Place write miss on bus.
9   coherence      Attempt to write shared block; invalidate the block.
10  coherence      Attempt to write block that is exclusive elsewhere: write back the cache block and make its state invalid.
11  coherence      Attempt to share data: place cache block on bus and change state to shared.
12  replacement    Address conflict miss: place write miss on bus.
13  replacement    Address conflict miss: place read miss on bus.
14  no action      Allow memory to service read miss.
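For the question above on lock primitives, the following C sketch shows one common way a spin lock can be built on a test-and-set primitive. The GCC builtins __sync_lock_test_and_set and __sync_lock_release are used here only as stand-ins for the processor's atomic test-and-set instruction; the example is an illustration, not part of the exercise.

/* Sketch: spin lock built on a test-and-set primitive.
 * __sync_lock_test_and_set() atomically writes 1 to *lock and returns the
 * previous value, which is exactly what a test-and-set instruction does. */
#include <stdio.h>

typedef volatile int spinlock_t;

static void lock_acquire(spinlock_t *lock)
{
    /* Retry until the old value was 0, i.e. the lock was free. Each retry
     * generates coherence traffic on the block holding the lock. */
    while (__sync_lock_test_and_set(lock, 1) == 1)
        ;                               /* spin */
}

static void lock_release(spinlock_t *lock)
{
    __sync_lock_release(lock);          /* atomically store 0 (release) */
}

spinlock_t lock = 0;
int shared_counter = 0;

int main(void)
{
    lock_acquire(&lock);                /* enter the critical section   */
    shared_counter++;                   /* update protected by the lock */
    lock_release(&lock);                /* leave the critical section   */
    printf("counter = %d\n", shared_counter);
    return 0;
}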
5 Exercises week 12
5.1 Home Assignment 2 - online quiz
OPTIONAL!
However - An approved quiz will give you 2 extra points on the exam.
Take the quiz available for Computer Architecture EIT090 at http://courses.eit.lth.se/. It will be open during weeks 12 and 13 (weeks 5 and 6 in HT2: 2009-11-23 – 2009-12-06). You have to log in to be able to see it.
Every student has a username and password based on your official mail address. For example, for [email protected] it will be:
username: et01xy9
password: ePWt01xy9
If you have a problem, contact the course coordinator, Anders Ardö. You can take the quiz any number of times during the period mentioned above.
When you have logged in, choose 'Computer Architecture EIT090' and click on the quiz. Then you can start answering questions. After all questions are answered you can send in your answers by clicking on 'Submit all and finish'. You will get feedback saying how many correct answers you have. Both questions and numeric values in the quiz are selected randomly each time you try the quiz. Redo the quiz until you have at least 90 % correct in order to be approved.
5.2 Multiprocessors II
Exercise 5.1 Assume that we have a function for an application of the form F (i, p) which gives
the fraction of time that exactly i processors are usable given that a total of p processors are
available. This means that
    \sum_{i=1}^{p} F(i, p) = 1
Assume that when i processors are in use, the application runs i times faster. Rewrite
Amdahl’s Law so that it gives the speedup as a function of p for some application.
Exercise 5.2 One proposed solution for the problem of false sharing is to add a valid bit per word (or even per byte). This would allow the protocol to invalidate a word without removing the entire block, letting a cache keep a portion of a block while another processor writes a different portion of the block. What extra complications are introduced into the basic snooping cache coherence protocol (see figure below) if this capability is included? Remember to consider all possible protocol actions.
Exercise 5.3 Some systems do not use multiprocessing for performance. Instead they run the same program in lockstep on multiple processors. What potential benefit is possible on such multiprocessors?
Exercise 5.4 When trying to perform detailed performance evaluation of a multiprocessor sys-
tem, system designers use one of three tools: analytical models, trace-driven simulation, and
execution-driven simulation. Analytical models use mathematical expressions to model the be-
havior of programs. Trace driven simulations run the applications on a real machine and generate
a trace, typically of memory operations. These traces can then be replayed through a cache sim-
ulator or a simulator with a simple processor model to predict the performance of the system
when various parameters are changed. Execution driven simulators simulate the entire execution
including maintaining an equivalent structure for the processor state, and so on. Discuss the
accuracy/speed trade-offs between these approaches.
6 Exercises week 13
6.1 Old exam 2003-12-17
Exercise 6.1
Computer Architecture, EIT 090
Final Exam, Department of Information Technology
17 December 2003, 8–13
The exam consists of a number of problems with a total of 50 points.
Grading: 20 p ≤ grade 3 < 30 p ≤ grade 4 < 40 p ≤ grade 5
Instructions:
• You may use a pocket calculator and an English dictionary on this exam, but no other
aids
• Please start answering each problem on a new sheet – New problem =⇒ New sheet
• Write your name on each sheet of paper that you hand in – Name on each sheet
• You must motivate your answers thoroughly. If, in your opinion, there is not enough information to solve a problem, you may make any reasonable assumptions that you need in order to solve it. State these assumptions clearly!
Problem 1
Briefly (1-2 sentences) describe the following items/concepts concerning computer architecture: (10)
a) dominance
b) basic block
e) register renaming
f) data dependency
g) way prediction
h) unified cache
i) sequential consistency
Problem 2
a) Describe the concept “memory hierarchy”, and state why it is important. State the func-
tion of each part, normally used hardware components, and what problems they solve (if
any). (5)
Problem 3
a) Derive an expression for the speedup gained by pipelining, expressed in the quantities shown in the example below. (4)
Example: [Figure: an unpipelined datapath (IN – T_unpipelined – OUT) compared with a pipelined datapath in which latches, driven by a common clock, separate the stages T_pipestage1, T_pipestage2 and T_pipestage3.]
b) Derive an expression for the speedup gained by pipelining when branches are taken into account, expressed in the following quantities:
– noofstages: number of pipe stages (assume equal pipe stage execution time).
– branch freq (bf): relative frequency of branches in the program.
– branch penalty (bp): number of clock cycles lost due to a branch.
(3)
c) Give an example of a piece of assembly code that contains WAW, RAW and WAR hazards
and identify them. (Use for example the assembly instruction
ADDD Rx,Ry,Rz
which stores (Ry+Rz) in Rx) (3)
Problem 4
Consider the following three computers:
A: a Celeron 2.4 GHz processor, 128 KByte cache, 128 byte blocks, copy-back (write-back)
with an average of 30 % dirty blocks, price 650 SEK
B: a P4 2.4 GHz processor, 512 KByte cache, 128 byte blocks, copy-back (write-back) with
an average of 35 % dirty blocks, price 1495 SEK
C: a P4 3.0 GHz processor, 512 KByte cache, 128 byte blocks, copy-back (write-back) with
an average of 35 % dirty blocks, price 2595 SEK
The main application is program development, so the compiler gcc is considered to be the most used program and is therefore used as the performance indicator. Assume that the processors have the same architecture, and that the base CPI (for gcc) without effects from the above mentioned cache (but including other caches and TLB) is 1.1.
Some statistics for gcc:
Cache size Miss rate
512 KB 0.0075
256 KB 0.0116
128 KB 0.0321
64 KB 0.09
Instruction frequencies
  load           25.8 %
  store          13.4 %
  uncond branch   4.8 %
  cond branch    15.5 %
  int            40.5 %
  fp              0 %
Main memory takes 50 ns to set up, and each transfer of 128 bits from main memory to the cache takes 4 ns. Assume that the memory system can sustain these speeds and widths.
Which of the three computers (A, B, C) has the best price/performance ratio? Motivate your answer thoroughly. (10)
Problem 5
a) There are two main varieties (classes) of hardware-based cache coherence protocols. Which are they, and what are their main differences, strengths and weaknesses? (4)
b) Briefly describe MIMD and SIMD computers, outlining the differences. Give examples of
computers (or type of computers) from each class. (4)
c) Use Amdahl’s law to give a quantitative argument for keeping a computer system balanced
in terms of relative performance (for example processor speed versus I/O speed) as tech-
nological and methodological development improves various sub-systems of a computer.
(2)
7 Exercises week 14
7.1 Questions and answers session
8 Brief answers
8.1 Memory systems, Cache II
1.1 When a cache miss occurs, the controller must select a block to be replaced with the desired
data. Three primary strategies for doing this are: Random, Least-recently used (LRU), and
First-in first-out (FIFO).
A replacement algorithm is needed with set-associative and fully associative caches. For direct-mapped caches there is no choice; the block to be replaced is uniquely determined by the address.
[Figure: schematic of a 4-way set-associative cache (cf. Exercise 1.2). The index part of the address selects one block position in each of the four ways; each way stores a TAG, a dirty bit and the data. Four comparators match the address tag against the stored tags, drive a 4-to-1 MUX that selects the data, and generate the hit/miss signal.]
1.3 A cache write is called write-through when the information is written both to the block in the cache and to the block in the lower-level memory; when the information is written only to the block in the cache, it is called write-back. Write-back is the faster of the two for writes, since the write completes at the speed of the cache memory, and multiple writes within a block require only one write to the lower-level memory.
1.6 a) miss rate:
– non-blocking caches: do not affect the miss rate; the main effect is that the processor does other useful work while the miss is handled.
– hardware prefetching with stream buffers: a hit in the stream buffer cancels the cache request, i.e. the memory reference is not counted as a miss, which means that the miss rate will decrease.
– software prefetching: if correctly done, the miss rate will decrease.
b) memory bandwidth:
– non-blocking: since the processor will have fewer stall cycles it will get a lower CPI
and consequently the requirements on memory bandwidth will increase
– hardware prefetch and software prefetch: prefetching is a form of speculation which
means that some of the memory traffic is unused which in turn might increase the
need for memory bandwidth
c) number of executed instructions: will be unchanged for non-blocking caches and hardware prefetching. Software prefetching adds the prefetch instructions, so the number of executed instructions will increase.
1.7 a) The merging write buffer links the CPU to the write-back L2 cache. Two CPU writes
cannot merge if they are to different sets in L2. So, for each new entry into the buffer a
quick check on only those address bits that determine the L2 set number need be performed
at first. If there is no match in this ‘screening’ test, then the new entry is not merged. If
there is a set number match, then all address bits can be checked for a definitive result.
b) As the associativity of L2 increases, the rate of false positive matches from the simplified
check will increase, reducing performance.
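A minimal sketch of the 'screening' check described in a) is given below. The L2 block size (64 bytes) and number of sets (8192) are assumptions chosen only to make the bit fields concrete; the point is that the cheap set-index comparison rejects most candidates before the full block-address comparison is made.

/* Sketch: screening test for a merging write buffer in front of a write-back
 * L2 cache. Assumed (hypothetical) L2 geometry: 64-byte blocks, 8192 sets,
 * so the set index is bits [18:6] of the address. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define L2_BLOCK_BITS 6u    /* 64-byte blocks (assumed) */
#define L2_INDEX_BITS 13u   /* 8192 sets (assumed)      */

static uint32_t l2_set(uint32_t addr)
{
    return (addr >> L2_BLOCK_BITS) & ((1u << L2_INDEX_BITS) - 1u);
}

/* May a new write to 'addr' merge with the buffered write to 'entry'? */
static bool may_merge(uint32_t entry, uint32_t addr)
{
    if (l2_set(entry) != l2_set(addr))       /* cheap check on a few bits      */
        return false;                        /* different L2 sets: never merge */
    /* Set indexes match: compare the full block address for a definitive answer. */
    return (entry >> L2_BLOCK_BITS) == (addr >> L2_BLOCK_BITS);
}

int main(void)
{
    uint32_t buffered = 0x0001A340u;                              /* entry in buffer */
    printf("same block: %d\n", may_merge(buffered, 0x0001A344u)); /* merges          */
    printf("other set : %d\n", may_merge(buffered, 0x0002B000u)); /* rejected early  */
    return 0;
}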
1.8 The three C’s model sorts the causes for cache misses into three categories:
• Compulsory – The very first access can never be in cache and is therefore bound to generate
a miss;
• Capacity – If the cache cannot contain all the blocks needed for a program, capacity misses may occur;
• Conflict – In direct-mapped or set-associative caches, conflict misses occur when too many blocks map to the same set, so a block is discarded and later retrieved.
1.9 The three C’s give insight into the cause of misses, but this simple model has its limits; it
gives you insight into average behavior but may not explain an individual miss. For example,
changing cache size changes conflict misses as well as capacity misses, since a larger cache spreads
out references to more blocks. Thus, a miss might move from a capacity miss to a conflict miss
as cache size changes. Note that the three C’s also ignore replacement policy, since it is difficult
to model and since, in general, it is less significant. In specific circumstances the replacement
policy can actually lead to anomalous behavior, such as poorer miss rates for larger associativity,
which is contradictory to the three C’s model.
8.2 Memory systems, Virtual Memory
2.1 b) Data structures often comprise lengthy sequences of memory addresses. Program access of a data structure often takes the form of a sequential sweep. Large data blocks work well with such access patterns; pre-fetching is likely still of value due to the highly sequential access patterns. The efficiency of data pre-fetch can be enhanced through a suitable grouping of the data items, taking the block limitations into account. This is especially noteworthy when the data structure exceeds the cache size. Under such circumstances, it becomes critically important to limit the number of out-of-cache block references.
2.2 a) We can expect software to be slower due to the overhead of a context switch to the
handler code, but the sophistication of the replacement algorithm can be higher for soft-
ware and a wider variety of virtual memory organizations can be readily accommodated.
Hardware should be faster, but less flexible.
b) Factors other than whether miss handling is done in software or hardware can quickly
dominate handling time. Is the page table itself paged? Can software implement a more
efficient page table search algorithm than hardware? What about hardware TLB entry
pre-fetching?
c) Page table structures that change dynamically would be difficult to handle in hardware
but possible in software.
2.3 a) – As the miss penalty tends to be severe, one usually decides on a sophisticated placement strategy; usually full associativity is chosen.
– To reduce address translation time, a cache is added to remember the most likely
translations, the Translation Lookaside Buffer.
– Almost all operating systems rely on a replacement of the least-recently used (LRU)
block indicated by a reference bit, which is logically set whenever a page is addressed.
– Since the cost of an unnecessary access to the next-lower level is high, one usually
includes a dirty bit. It allows blocks to be written to lower memory only if they have
been altered since reading.
b) The main purpose of the TLB is to accelerate the address translation for reading/writing
virtual memory. A TLB entry holds a portion of the virtual address, a physical page
frame number, a protection field, a valid bit, a use bit and a dirty bit. The latter two are not always used. The size of the page table is inversely proportional to the page size;
choosing a large page size allows larger caches with fast cache hit times with a small TLB.
A small page size conserves storage, limiting the amount of internal fragmentation. Their
combined effect can be seen in process start-up time, where a large page size lengthens
invocation time but shortens page renewal times. Hence, the balance goes for large pages
in large computers and vice-versa.
                     TLB                 1st-level cache       2nd-level cache        Virtual memory
Block size (bytes)   4-32                16-256                1-4k                   4096-65,536
Block placement      Fully associative   2/4-way set assoc.    8/16-way set assoc.    Direct mapped
Overall size         32-8,192 B          1 MB                  2-16 MB                32 MB - 1 TB
2.4 Out-of-order (OOO) execution will change both the timing of and sequence of cache access
with respect to that of in-order execution. Some specific differences and their effect on what
cache design is most desirable are explored in the following.
Because OOO reduces data hazard stalls, the pace of cache access, both to instructions and
data, will be higher than if execution were in order. Thus, the pipeline demand for available
cache bandwidth is higher with OOO. This affects cache design in areas such as block size, write
policy, and pre-fetching.
Block size has a strong effect on the delivered bandwidth between the cache and the next
lower level in the memory hierarchy. A write-through write policy requires more bandwidth
to the next lower memory level than does write back, generally, and use of a dirty bit further
reduces the bandwidth demand of a write-back policy. Pre-fetching increases the bandwidth
demand. Each of these cache design parameters – block size, write policy, and pre-fetching –
is in competition with the pipeline for cache bandwidth, and OOO increases the competition.
Cache design should adapt for this shift in bandwidth demand toward the pipeline.
Cache accesses for data – and, because of exceptions, also for instructions – occur during execution. OOO execution will change the sequence of these accesses and may also change their pacing.
A change in sequence will interact with the cache replacement policy. Thus, a particular
cache and replacement policy that performs well on a chosen application when execution of the
superscalar pipeline is in order may perform differently – even quite differently – when execution
is OOO.
If there are multiple functional units for memory access, then OOO execution may allow bunching multiple accesses into the same clock cycle. Thus, the instantaneous or peak memory
access bandwidth from the execution portion of the superscalar can be higher with OOO.
Imprecise exceptions are another cause of change in the sequence of memory accesses from
that of in-order execution. With OOO some instructions from earlier in the program order may
not have made their memory accesses, if any, at the time of the exception. Such accesses may
become interleaved with instruction and data accesses of the exception-handling code. This
increases the opportunity for capacity and conflict misses. So a cache design with size and/or
associativity to deliver lower numbers of capacity and conflict misses may be needed to meet
the demands of OOO.
2.5 First, we use the miss penalty and miss rate information to compute the contribution to CPI from cache misses for each configuration. We do this with the formula

    Cache CPI = Misses per instruction × Miss penalty (in clock cycles)

where the miss penalty in clock cycles is the 50 ns main memory time multiplied by the clock rate (200, 250 and 125 cycles for processors 1, 2 and 3, respectively; processor 3 hides 25 % of this penalty).
We know the pipeline CPI contribution for everything but processor 3; its pipeline CPI is given by

    Pipeline CPI = 1 / Issue rate = 1 / (9 × 0.5) = 1 / 4.5 = 0.22

Now we find the CPI for each processor by adding the pipeline and cache CPI contributions:

    1: 0.8 + 0.005 × 200 = 1.8
    2: 1.0 + 0.0055 × 250 ≈ 2.4
    3: 0.22 + 0.01 × 125 × 0.75 ≈ 1.16

Since this is the same architecture, we can compare instruction execution rates in millions of instructions per second (MIPS) to determine relative performance as clock rate / CPI:

    1: 4000 MHz / 1.8 = 2222 MIPS
    2: 5000 MHz / 2.4 = 2083 MIPS
    3: 2500 MHz / 1.16 = 2155 MIPS
In this example, the simple two-issue static superscalar looks best. In practice, performance
depends on both the CPI and clock rate assumption.
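The arithmetic above can be reproduced with a few lines of C. The 50 ns memory time is converted to cycles using each processor's clock rate, and processor 3 is assumed to hide 25 % of the penalty through dynamic scheduling, as stated in the exercise; note that the text above rounds the CPI values (1.8, 2.4, 1.16) before computing MIPS.

/* Sketch: CPI and MIPS for the three hypothetical processors in Exercise 2.5. */
#include <stdio.h>

int main(void)
{
    const double mem_ns  = 50.0;                          /* main memory time            */
    const double clk[3]  = {4.0, 5.0, 2.5};               /* clock rate in GHz           */
    const double pipe[3] = {0.8, 1.0, 1.0 / (9.0 * 0.5)}; /* pipeline CPI                */
    const double mpi[3]  = {0.005, 0.0055, 0.01};         /* misses per instruction      */
    const double expo[3] = {1.0, 1.0, 0.75};              /* fraction of penalty exposed */

    for (int i = 0; i < 3; i++) {
        double penalty = mem_ns * clk[i];                 /* 200, 250, 125 cycles        */
        double cpi  = pipe[i] + mpi[i] * penalty * expo[i];
        double mips = clk[i] * 1000.0 / cpi;              /* clock in MHz divided by CPI */
        printf("Processor %d: CPI = %.2f, about %.0f MIPS\n", i + 1, cpi, mips);
    }
    return 0;
}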
2.6 a) For: a larger page size means a smaller page table (and a TLB that maps more memory, so fewer TLB misses); it allows larger caches with fast cache hit times; and transferring larger pages to or from disk is more efficient than transferring many small ones.
Against: larger pages lead to more wasted storage due to internal fragmentation.
b) In a virtual memory system, the virtual address is a logical address space for a process. It is translated by a combination of hardware and software into a physical address, which accesses main memory. This process is called memory mapping. The virtual address space is divided into pages (blocks of memory). A page fault is an access to a page which is not in physical memory. The TLB, Translation Lookaside Buffer, is a cache of address translations.
c) The page table takes (2^32 / 2^12) × 4 bytes = 2^22 bytes = 4 MB.
d) An inverted page table is like a fully associative cache where each page table entry contains the physical address and, as tag, the virtual address. With 512 MB = 2^29 bytes of physical memory it takes (2^29 / 2^12) × (4 + 4) bytes = 2^20 bytes = 1 MB.
8.3 Storage systems, I/O
3.1 See ’Case Study Solutions’ at http://www.elsevierdirect.com/companion.jsp?ISBN=9780123704900
8.4 Multiprocessors I
4.1 • Snooping
– Status for a block is stored in every cache that has a copy of the block.
– Sends all requests to all processors (broadcast).
– Caches monitor (snoop) the shared memory bus to update status and take actions.
– Popular with single shared memory.
• Directory based
  – Status for a block is stored in one location (the directory).
  – Messages are used to update status.
  – Scales better than snooping.
  – Popular with distributed shared memory.
4.3 The problem illustrates the complexity of cache coherence protocols. In this case, this
could mean that the processor P1 evicted that cache block from its cache and immediately
requested the block in subsequent instructions. Given that the write-back message is longer
than the request message, with networks that allow out-of-order requests, the new request can
arrive before the write-back arrives at the directory. One solution to this problem would be
to have the directory wait for the write-back and then respond to the request. Alternatively,
the directory can send out a negative acknowledge (NACK). Note that these solutions need to
be thought out very carefully since they have the potential to lead to deadlocks based on the
particular implementation details of the system. Formal methods are often used to check for
races and deadlocks.
4.4 The question is to consider a design that is based on a mesh interconnect rather than on a
bus. The idea behind such a design is that local accesses will be faster than a pure shared-memory
approach since access to local memory does not need to go across a shared bus. Additionally,
the cost of a remote access will be a function of start-up time and number of hops across the
network rather than the time to acquire the bus.
a) The cost of local references is easy to compute. Local references require 40 clocks to detect the L2 miss, 66 clocks to deliver the data, and 58 clocks to reload the caches, yielding a total of 164 clocks.
b) For this part of the problem, we are asked to factor in the cost of references that need to
travel across the mesh network and to compute the average remote memory access time
(ARMAT). Since network clocks and processor clocks take different amounts of time, we
refer to network clocks as ‘nclk’ and to processor clocks as ‘pclk’. From the above we
already know that the cost on a shared-memory design is 164 pclks.
For the case of the distributed memory, the result depends on how one measures the average number of hops that a reference must make in the network, so we make the simple assumption that each remote reference travels on average 1.5 hops in the X-direction and 1.5 hops in the Y-direction to reach its target node. Hence the total average distance is 3 hops.
For the complete request, the following times must be added together: the time for the
L2 miss to be recognized locally, the time for the address request to go across the network
(assume only 32 bits are needed for this message), the time for the remote memory to
respond (150 ns), the time for the data to return over the network (a 128-byte cache line),
and the time for the caches to be reloaded. The time across the network is based on the
number of hops and the size of the message. We are given that 2 bytes can be sent every
nclk, and so the time through a switch is
Time To Send = (Number of Bytes ) / 2 bytes per nclk.
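The answer breaks off here in the original. The sketch below combines the listed components under the stated assumptions: 3 hops on average, a 32-bit request, a 128-byte reply, and the same 40-pclk miss detection and 58-pclk reload as in part a). Solutions differ in exactly how start-up time and hops are counted, so the resulting number should be read as approximate.

/* Sketch: average remote memory access time on the 2-D grid.
 * Assumptions: message time = 5 nclk start-up + 1 nclk per hop + bytes/2 nclk,
 * with 3 hops on average; miss detection and cache reload cost the same as in
 * the bus-based case (40 and 58 processor clocks). */
#include <stdio.h>

int main(void)
{
    const double pclk_ns = 1000.0 / 150.0;    /* processor clock, ~6.67 ns */
    const double nclk_ns = 1000.0 / 100.0;    /* network clock, 10 ns      */
    const double hops    = 3.0;               /* 1.5 in X + 1.5 in Y       */

    double req_nclk   = 5.0 + hops + 4.0 / 2.0;    /* 32-bit address request */
    double reply_nclk = 5.0 + hops + 128.0 / 2.0;  /* 128-byte cache block   */

    double total_ns = 40.0 * pclk_ns         /* detect L2 miss, drive address */
                    + req_nclk * nclk_ns     /* request crosses the mesh      */
                    + 150.0                  /* memory access time            */
                    + reply_nclk * nclk_ns   /* reply crosses the mesh        */
                    + 58.0 * pclk_ns;        /* reload cache, restart pipe    */

    printf("average remote access ~ %.0f ns = %.0f processor clocks\n",
           total_ns, total_ns / pclk_ns);
    return 0;
}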
4.6 a) To realize SMT, we need a per-thread renaming table, separate PC registers, and the capability for instructions from multiple threads to commit.
b) Informally, cache coherence means that a value read from the memory system should reflect the latest write to that same memory location. For an example of what happens when cache coherence is missing, refer to the book, Figure 4.3 (page 206).
c) The correct connections:
Request   Action
A         4
B         6
C         13
D         7
E         1
F         2
G         8
H         12
I         5
J         14
K         11
L         9 or 3
M         3 or 9
N         10
Note: 3 and 9 are equivalent.
8.5 Multiprocessors II
5.1 The general form of Amdahl's Law is shown on the inside front cover of the text; all that needs to be done to compute the formula for speedup in this multiprocessor case is to derive the new execution time. The exercise states that the fraction of the original execution time that can use i processors is given by F(i, p). If we let Execution time_old be 1, then the relative time for the application on p processors is given by summing the times required for each portion of the execution time that can be sped up using i processors, where i is between 1 and p. This yields

    Execution time_new = \sum_{i=1}^{p} \frac{F(i, p)}{i}

Substituting this value for Execution time_new into the speedup equation makes Amdahl's Law a function of the available processors p:

    Speedup(p) = \frac{1}{\sum_{i=1}^{p} F(i, p) / i}
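As a concrete illustration, the sketch below evaluates the rewritten law for a hypothetical application in which 60 % of the original execution time can use all p processors and 40 % can use only one; this distribution F(i, p) is invented purely for the example.

/* Sketch: Speedup(p) = 1 / sum_{i=1..p} F(i,p)/i for a made-up F(i,p). */
#include <stdio.h>

static double F(int i, int p)
{
    if (p == 1) return (i == 1) ? 1.0 : 0.0;
    if (i == p) return 0.6;    /* fraction that can use all p processors (assumed) */
    if (i == 1) return 0.4;    /* fraction limited to one processor (assumed)      */
    return 0.0;
}

int main(void)
{
    for (int p = 1; p <= 64; p *= 2) {
        double t_new = 0.0;
        for (int i = 1; i <= p; i++)
            t_new += F(i, p) / i;      /* time of the portion that can use i CPUs */
        printf("p = %2d  speedup = %.2f\n", p, 1.0 / t_new);
    }
    return 0;
}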
5.2 An obvious complication introduced by providing a valid bit per word is the need to match
not only the tag of the block but also the offset within the block when snooping the bus. This
is easy, involving just looking at a few more bits. In addition, however, the cache must be
changed to support write-back of partial cache blocks. When writing back a block, only those
words that are valid should be written to memory because the contents of invalid words are not
necessarily coherent with the system. Finally, given that the state machine of Figure 6.12 is
applied at each cache block, there must be a way to allow this diagram to apply when state can
be different from word to word within a block. The easiest way to do this would be to provide
the state information of the figure for each word in the block. Doing so would require much
more than one valid bit per word, though. Without replication of state information the only
solution is to change the coherence protocol slightly.
5.3 Executing the identical program on more than one processor improves system ability to
tolerate faults. The multiple processors can compare results and identify a faulty unit by its
mismatching results. Overall system availability is increased.
5.4 Analytical models can be used to derive high-level insight on the behavior of the system in a
very short time. Typically, the biggest challenge is in determining the values of the parameters.
In addition, while the results from an analytical model can give a good approximation of the
relative trends to expect, there may be significant errors in the absolute predictions.
Trace-driven simulations typically have better accuracy than analytical models, but need
greater time to produce results. The advantages are that this approach can be fairly accurate
when focusing on specific components of the system (e.g., cache system, memory system, etc.).
However, this method does not model the impact of aggressive processors (mispredicted path)
and may not model the actual order of accesses with reordering. Traces can also be very large,
often taking gigabytes of storage, and determining sufficient trace length for trustworthy results
is important. It is also hard to generate representative traces from one class of machines that will
be valid for all the classes of simulated machines. It is also harder to model synchronization on
these systems without abstracting the synchronization in the traces to their high-level primitives.
Execution-driven simulation models all the system components in detail and is consequently
the most accurate of the three approaches. However, its speed of simulation is much slower than
that of the other models. In some cases, the extra detail may not be necessary for the particular
design parameter of interest.
8.6 Old exam 2003-12-17
Problem 1
b) Straight line code sequence with no branches in except at entry and no branches out except
at the exit.
c) A page table that uses hashing techniques to reduce the size of the page table so that the
length is equal to the number of physical pages in memory.
e) A set of physical registers holds both the architecturally visible registers and temporary data. During instruction issue, architectural registers are mapped to physical registers. Register renaming is used to get rid of WAR and WAW hazards.
g) An attempt to predict which way (block within the set) the next cache access will go to. It allows the multiplexer that selects the cache block to be set up early.
i) Sequential consistency requires that the result of any execution be the same as if the
memory accesses executed by each processor were kept in program order.
j) A GPR (general-purpose register) architecture has only explicit operands, either memory locations or registers, as opposed to implicit operands like a stack top or an accumulator.
Problem 2
a) In real life, bigger memory is slower and faster memory is more expensive. We want to simultaneously increase the speed and decrease the cost. Speed is important because of the widening performance gap between CPU and memory. Size is important since applications and data sets are growing bigger. The solution is to use several types of memory with varying speeds, arranged in a hierarchy that is optimized with respect to the use of memory. Mapping functions provide address translations between the levels.
Problem 3
a) Speedup = T_unpipelined / (max(T_pipestage) + T_latch)
b) Speedup = noofstages / (1 + bf × bp)
c) 1: MOV  R3, R7
   2: LD   R8, (R3)
   3: ADDI R3, R3, 4
   4: LD   R9, (R3)
   5: BNE  R8, R9, Loop
   WAW: 1,3
   RAW: 1,2; 1,3; 2,5; 3,4; 4,5
   WAR: 2,3
Problem 4
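No worked solution is given for Problem 4 here. The sketch below shows one way the comparison could be set up; it assumes that the single cache is unified (instruction fetches and data accesses both go through it) and that a miss to a dirty block first writes the old block back at the same cost as a block fetch (50 ns + 8 x 4 ns = 82 ns). Those assumptions are not stated in the problem, so the absolute numbers should be treated with care.

/* Sketch (not the official solution): price/performance for computers A-C. */
#include <stdio.h>

int main(void)
{
    const double base_cpi      = 1.1;
    const double acc_per_instr = 1.0 + 0.258 + 0.134;          /* fetch + load + store */
    const double block_ns      = 50.0 + (128.0 / 16.0) * 4.0;  /* 82 ns per block move */

    const char  *name[3]  = {"A", "B", "C"};
    const double ghz[3]   = {2.4, 2.4, 3.0};
    const double miss[3]  = {0.0321, 0.0075, 0.0075};          /* 128 KB vs 512 KB     */
    const double dirty[3] = {0.30, 0.35, 0.35};                /* dirty-block fraction */
    const double sek[3]   = {650.0, 1495.0, 2595.0};           /* price                */

    for (int i = 0; i < 3; i++) {
        double penalty = (1.0 + dirty[i]) * block_ns * ghz[i]; /* cycles per miss      */
        double cpi     = base_cpi + acc_per_instr * miss[i] * penalty;
        double mips    = ghz[i] * 1000.0 / cpi;
        printf("%s: CPI = %5.2f, %6.1f MIPS, %.3f MIPS/SEK\n",
               name[i], cpi, mips, mips / sek[i]);
    }
    return 0;
}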
Problem 5
a) – Snooping
∗ Status for a block is stored in every cache that has a copy of the block.
∗ Send all requests to all processors (broadcast)
∗ Caches monitor (snoop) the shared memory bus to update status and take ac-
tions.
∗ Popular with single shared memory.
– Directory based
∗ Status for a block is stored in one location (the directory).
∗ Messages used to update status.
∗ Scales better than snooping
∗ Popular with distributed shared memory.
b) SIMD (Single Instruction stream, Multiple Data stream): e.g. vector processors.
MIMD (Multiple Instruction stream, Multiple Data stream): e.g. multiprocessors such as Symmetric shared-memory Multiprocessors (SMP) with Uniform Memory Access time (UMA) and a bus interconnect.