Computer Organization and Architecture: Themes and Variations, 1st Edition (Alan Clements), Solutions Manual

The document provides a comprehensive guide on the architecture of a hypothetical computer, detailing the sequence of actions required to execute various instructions using microprogramming. It includes specific examples of operations such as addition, fetch cycles, and different addressing modes, as well as solutions to implement these operations. Additionally, it highlights the inefficiencies in the architecture and instruction set due to limited bus connections.

Chapter 7: Processor Control
1. For the microprogrammed architecture of Figure P7.1, give the sequence of actions required to implement the
instruction ADD D0, D1 which is defined in RTL as [D1] ← [D1] + [D0].

[Figure body not reproduced. The datapath comprises a main store (with Read and Write control inputs and gate/enable signals GMSR, EMSR, GMSW, EMSW), the registers MAR, MBR, IR, PC, D0, and D1 (each with its own clock C, gate G, and enable E control signals), two ALU input latches clocked by CL1 and CL2, and an ALU computing F = f(P,Q) under the function-select code F2 F1 F0. The memory performs a read when Read = 1 and a write when Write = 1.]
Figure P7.1 Architecture of a hypothetical computer

You should describe the actions that occur in plain English (e.g., “Put data from this register on that bus”) and as a
sequence of events (e.g., Read = 1, EMSR). The table below defines the effect of the ALU’s function code. Note that
all data has to pass through the ALU (the copy function) to get from bus B or bus C to bus A.

F2 F1 F0 Operation
0 0 0 Copy P to bus A A=P
0 0 1 Copy Q to bus A A=Q
0 1 0 Copy P + 1 to bus A A=P+1
0 1 1 Copy Q + 1 to bus A A=Q+1
1 0 0 Copy P ‐ 1 to bus A A=P–1
1 0 1 Copy Q ‐ 1 to bus A A=Q–1
1 1 0 Copy P + Q to bus A A=P+Q
1 1 1 Copy P ‐ Q to bus A A=P–Q

109
© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly available website, in whole or in part.
SOLUTION

To perform the addition, D0 and D1 must each be latched into an ALU input latch, the ALU set to add,
and the result latched into D1. That is,

ED0 = 1, CL1 ;we can do D0 or D1 in any order and we can use latch L1 or latch L2
ED1 = 1, CL2 ;copy D1 via bus B into latch 2
ALU(f2,f1,f0) = 1,1,0, CD1 ;perform addition and latch result in D1.
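As a sanity check, the three micro-operations can be traced in a tiny register-level model. This is only an illustrative Python sketch: the register and latch names follow Figure P7.1, and the initial values 5 and 7 are arbitrary.

```python
# Minimal model of the Figure P7.1 datapath for ADD D0,D1.
# Register and latch names follow the figure; initial values are arbitrary.
regs = {"D0": 5, "D1": 7}
latch = {"L1": 0, "L2": 0}

latch["L1"] = regs["D0"]                # ED0 = 1, CL1 : D0 to latch 1 via B bus
latch["L2"] = regs["D1"]                # ED1 = 1, CL2 : D1 to latch 2 via B bus
regs["D1"] = latch["L1"] + latch["L2"]  # ALU = 110 (P + Q), CD1

assert regs["D1"] == 12                 # [D1] <- [D1] + [D0]
```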

2. For the architecture of Figure P7.1 write the sequence of signals and control actions necessary to implement the
fetch cycle.

SOLUTION

The fetch cycle involves reading the data at the address in the PC, moving the instruction read from memory to
the IR, and updating the PC.
EPC = 1, CL1 ;move PC via B bus to latch 1
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass PC through ALU and clock into MAR
;the PC is in L1 so we can increment it
ALU(f2,f1,f0) = 0,1,0, CPC ;use the ALU to increment L1 and move to PC
Read = 1, EMSR = 1, CL1 ;move instruction from memory to latch 1 via B bus
ALU(f2,f1,f0) = 0,0,0, CIR ;pass instruction through ALU and clock into IR
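The fetch sequence can be traced in the same style. In this sketch, memory is a Python dict and the address 100 and instruction word 0x1234 are arbitrary illustrative values.

```python
# Trace the fetch cycle of Figure P7.1. The address 100 and the
# instruction word 0x1234 are arbitrary illustrative values.
memory = {100: 0x1234}
PC, MAR, IR, L1 = 100, 0, 0, 0

L1 = PC              # EPC = 1, CL1          : PC to latch 1 via B bus
MAR = L1             # ALU = 000 (copy), CMAR
PC = L1 + 1          # ALU = 010 (P + 1), CPC
L1 = memory[MAR]     # Read = 1, EMSR = 1, CL1
IR = L1              # ALU = 000 (copy), CIR

assert IR == 0x1234 and PC == 101
```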

3. Why is the structure of Figure P7.1 so inefficient?

SOLUTION

Because there is only one bus to the ALU input and no direct connection between the B and A bus. This means
that all data has to go through the ALU, which becomes a bottleneck.

4. Why is the ALU instruction set of Figure P7.1 so inefficient?

SOLUTION

Because three of the eight operations duplicate others. Since the ALU's two inputs are both loaded from the single B bus (via latch L1 or latch L2), it does not matter whether data passes from bus B to bus A via L1 or via L2: the copy, increment, and decrement operations each appear twice.

5. For the architecture of Figure P7.1, write the sequence of signals and control actions necessary to execute the
instruction ADD M,D0 that adds the contents of memory location M to data register D0 and deposits the results
in D0. Assume that the address M is in the instruction register IR.

SOLUTION

This instruction requires a memory read followed by an addition.

EIR = 1, CL1 ;move IR (i.e., address) via B bus to latch 1


ALU(f2,f1,f0) = 0,0,0, CMAR ;pass IR through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus
ED0 = 1, CL2 ;move D0 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CD0 ;perform addition and clock result into D0

6. This question asks you to implement register indirect addressing. For the architecture of Figure P7.1, write the
sequence of signals and control actions necessary to execute the instruction ADD (D1),D0 that adds the
contents of the memory location pointed at by the contents of register D1 to register D0, and deposits the result
in D0. This instruction is defined in RTL form as [D0] ← [[D1]] + [D0].

SOLUTION

Here, we have to read the contents of a register, use it as an address, and read from memory.

ED1 = 1, CL1 ;move D1 (i.e., address) via B bus to latch 1


ALU(f2,f1,f0) = 0,0,0, CMAR ;pass D1 (the pointer) through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is the actual data)
ED0 = 1, CL2 ;move D0 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CD0 ;perform addition and clock result into D0

7. This question asks you to implement memory indirect addressing. For the architecture of Figure P7.1, write the
sequence of signals and control actions necessary to execute the instruction ADD [M],D0 that adds the
contents of the memory location pointed at by the contents of memory location M to register D0, and deposits the
result in D0. This instruction is defined in RTL form as [D0] ← [[M]] + [D0].

SOLUTION

We have to read the contents of a memory location, use it as an address, and read from memory. We can begin
with the same code we used for ADD M,D0.

EIR = 1, CL1 ;move IR (i.e., address) via B bus to latch 1


ALU(f2,f1,f0) = 0,0,0, CMAR ;pass IR through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is a pointer)
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass the pointer through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is the data)
ED0 = 1, CL2 ;move D0 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CD0 ;perform addition and clock result into D0

8. This question asks you to implement memory indirect addressing with index. For the architecture of Figure P7.1,
write the sequence of signals and control actions necessary to execute the instruction ADD [M,D1],D0, that
adds the contents of the memory location pointed at by the contents of memory location M plus the contents of
register D1 to register D0, and deposits the result in D0. This instruction is defined in RTL form as [D0] ←
[[M]+[D1]] + [D0].

SOLUTION

We have to read the contents of a memory location, generate an address by adding this to a data register, and
then use the sum to get the actual data. We can begin with the same code we used for ADD [M],D0.

EIR = 1, CL1 ;move IR (i.e., address) via B bus to latch 1


ALU(f2,f1,f0) = 0,0,0, CMAR ;pass IR through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is a pointer)
ED1 = 1, CL2 ;move D1 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CMAR ;perform addition to get the indexed address and clock result into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is the data)
ED0 = 1, CL2 ;move D0 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CD0 ;perform addition and clock result into D0

Note how microprogramming can implement any arbitrarily complex addressing mode.

9. For the microprogrammed architecture of Figure P7.1, define the sequence of actions (i.e., micro‐operations)
necessary to implement the instruction TXP1 (D0)+,D1 that is defined as:

[D1] ← 2*[M([D0])] + 1
[D0] ← [D0] + 1

Explain the actions in plain English and as a sequence of enables, ALU controls, memory controls and clocks. This
is quite a complex instruction because it requires a register‐indirect access to memory to get the operand and it
requires multiplication by two (there is no ALU multiplication instruction). You will probably have to use a
temporary register to solve this problem and you will find that it requires several cycles to implement this
instruction. A cycle is a sequence of operations that terminates in clocking data into a register.

SOLUTION

Now we have to perform quite a complex operation: read from memory using a register-indirect address,
where the address is obtained by reading the data in the location pointed at by D0, multiplying this value by 2,
and adding 1. We have no multiplier or shifter, so we must add the number to itself.

ED0 = 1, CL1 ;move D0 via B bus to latch 1


ALU(f2,f1,f0) = 0,0,0, CMAR ;pass D0 through ALU and clock into MAR
Read = 1, EMSR = 1, CL1, CL2 ;move data from memory to latch 1 and latch 2 via B bus
;note that we have a copy of [M([D0])] in L1 and L2
ALU(f2,f1,f0) = 1,1,0, CD1 ;perform addition to get 2[M[D0]] in D1 which we use as a temp register
ED1 = 1, CL1 ;move D1 via B bus to latch 1
ALU(f2,f1,f0) = 0,1,0, CMAR ;perform P + 1 in the ALU and clock address 2 × [M[D0]] + 1 into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is the final data)
ALU(f2,f1,f0) = 0,0,0, CD1 ;pass data through ALU and clock into D1
;now increment D0
ED0 = 1, CL1 ;move D0 via B bus to latch 1
ALU(f2,f1,f0) = 0,1,0, CD0 ;perform [D0] + 1 in the ALU and latch into D0
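The ten-step sequence can be checked numerically. The sketch below follows the solution's reading of the instruction, in which 2 × [M([D0])] + 1 is used as the address of the final operand; the memory contents (M[20] = 4, M[9] = 99) are arbitrary illustrative values.

```python
# Trace TXP1 (D0)+,D1 as implemented above. With D0 = 20 and M[20] = 4,
# the computed address is 2*4 + 1 = 9, and M[9] = 99 is the final operand.
memory = {20: 4, 9: 99}
D0, D1, MAR, L1, L2 = 20, 0, 0, 0, 0

L1 = D0                  # ED0 = 1, CL1
MAR = L1                 # ALU = 000 (copy), CMAR
L1 = L2 = memory[MAR]    # Read = 1, EMSR = 1, CL1, CL2
D1 = L1 + L2             # ALU = 110 (P + Q), CD1 : D1 used as a temp
L1 = D1                  # ED1 = 1, CL1
MAR = L1 + 1             # ALU = 010 (P + 1), CMAR
L1 = memory[MAR]         # Read = 1, EMSR = 1, CL1 : final data
D1 = L1                  # ALU = 000 (copy), CD1
L1 = D0                  # ED0 = 1, CL1
D0 = L1 + 1              # ALU = 010 (P + 1), CD0

assert D1 == 99 and D0 == 21
```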

10. Why was microprogramming such a popular means of implementing control units in the 1980s?

SOLUTION

In the 1980s memory was horrendously expensive by comparison with the cost of memory today. Every byte was
precious. Consequently, complex instructions were created to do a lot of work per instruction. These instructions
were interpreted in microcode in the CPU. Today, memory is cheap and simple, regular instructions are the order
of the day (i.e., RISC). However, some processors, such as the IA32, retain legacy complex instructions that are
still interpreted by means of microcode.

11. Why is microprogramming so unpopular today?

SOLUTION

Microcode is not generally used today in new processors because executing microcode involves too many data
paths in series. In particular, there are several ROM look‐up paths in series. First, it is necessary to look up the
instruction to decode it. Then you have to look up each microinstruction in the microinstruction memory. Today,
RISC‐like processors with 32‐bit instructions are encoded so that the instruction word itself is able to directly
generate the signals necessary to interpret the instruction in a single cycle. In other words, the machine itself has
become the new microcode.

12. Figure P7.12 from the text demonstrates the execution of a conditional branch instruction in a flow‐through
computer. The grayed out sections of the computer are not required by a conditional branch instruction. Can you
think of any way in which these unused elements of the computer could be used during the execution of a
conditional branch?
[Figure body not reproduced. The example traces BRA Target, where the target address is [PC]+4+4*L. The datapath comprises a register file (ports S1address/S1data, S2address/S2data, Daddress/Ddata), an instruction memory, an ALU fed through the ALU_MPLX multiplexer, a data memory reached through the Memory_MPLX multiplexer, a PC_adder that computes PC+4, and a Branch_adder that adds the sign-extended literal L (shifted left twice to convert a word offset into a byte offset) to PC+4. The Z-bit from the CCR controls the PC multiplexer, which selects between the next sequential address and the branch target address.]
Figure P7.12 Architecture of a hypothetical computer

SOLUTION

In this example, the register file, ALU, and data memory are not in use. This raises an interesting question: could a
branch be combined with another operation performed in parallel, rather like the VLIW (very long
instruction word) computers that we look at in Chapter 8? For example, you could imagine an instruction BEQ
target: r0++ that performs a conditional branch to target and also increments register r0. Of course, the
price of such an extension would be a reduction in the number of bits available for the target address.

13. What modifications would have to be made to the architecture of the computer in Figure P7.12 to implement
predicated execution like the ARM?

SOLUTION

The ARM predicates instructions; for example, ADDEQ r0,r1,r2. A predicated instruction is executed only if the
stated condition is true; in this case, ADDEQ r0,r1,r2 is executed if the Z-bit of the status register is set. One way of
implementing predicated execution would be to jam a NOP (no operation) instruction into the
instruction register whenever the predicated condition is false. Another solution would be to put AND gates in all paths
that generate signals that clock or update registers and status values. If the predicated condition is false, all
signals that perform an update are negated and the state of the processor does not change.

14. What modifications would have to be added to the computer of Figure P7.12 to add a conditional move
instruction with the format MOVZ r1,r2,r3 that performs [r1] ← [r2] if [r3] == 0?

SOLUTION

The basic data movement can be implemented in the normal way using existing data paths from the register file,
through the ALU and the memory multiplexer, and back to the register file. To implement the conditional action, register r3
must be routed to the ALU and compared with zero. The result of the comparison determines whether the
writeback (i.e., writing r2 into r1) takes place in the next pipeline stage.

15. What modifications would have to be made to the architecture of the computer in Figure P7.12 to implement
operand shifting (as part of a normal instruction) like the ARM?

SOLUTION

As in the ARM processor family, it would require a barrel shifter on one of the inputs to the ALU so that
the operand is shifted before use. The number of shifts to be performed could be taken from the op-code (for
example, from the literal field). However, the existing structure could not implement an ARM-like dynamic shift
such as ADD r0,r1,r2, lsl r3, because the register file does not have three address inputs. To provide
dynamic shifts, it would be necessary to add an extra address input and read port to the register file.

16. Derive an expression for the speedup ratio (i.e., the ratio of the execution time without pipelining to the
execution time with pipelining) of a pipelined processor in terms of the number of stages in the pipeline m and
the number of instructions to be executed N.

SOLUTION

Suppose that N instructions are to be executed. They take N + m ‐ 1 clock cycles: the
factor (m ‐ 1) is the time for the last instruction to drain through the pipeline. The speedup relative to an
unpipelined system that requires N⋅m cycles (N instructions, each passing through m stages) is therefore N⋅m/(N + m ‐ 1).
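The expression is easy to check numerically. In this sketch the stage count of 5 and the instruction counts are arbitrary illustrative choices.

```python
def pipeline_speedup(n: int, m: int) -> float:
    """Speedup of an m-stage pipeline over an unpipelined processor
    for n instructions: n*m / (n + m - 1)."""
    return (n * m) / (n + m - 1)

# One instruction gains nothing; as n grows, the speedup approaches m.
assert abs(pipeline_speedup(1, 5) - 1.0) < 1e-9
assert 4.9 < pipeline_speedup(1000, 5) < 5.0
```

Note that the speedup never reaches m for finite n, which motivates the flaws discussed in the next question.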

17. In what ways is the formula for the speedup of the pipeline derived in the previous question flawed?

SOLUTION

There are two flaws. The first is that the pipeline can be exploited fully only if the pipeline is continually supplied
with instructions. However, interactions between data elements, competition for resources, and branch
operations reduce the efficiency of a pipeline. These factors can introduce stall cycles (wait states for resources)
or force the pipeline to be flushed.

The second flaw is that pipelining a process requires a register between stages. The register has setup and
hold times that must be taken into account; that is, the pipeline register increases the effective length of each
stage.

18. A processor executes an instruction in the following six stages. The time required by each stage in picoseconds
(1,000 ps = 1 ns) is given for each stage.

IF instruction fetch 300 ps


ID Instruction decode 150 ps
OF Operand fetch 250 ps
OE Execute 350 ps
M Memory access 700 ps
OS Operand store (writeback) 200 ps

a. What is the time to execute an instruction if the processor is not pipelined?


b. What is the time taken to fully execute an instruction assuming that this structure is pipelined in six stages
and that there is an additional 20 ps per stage due to the pipeline latches?
c. Once the pipeline is full, what is the average instruction rate?
d. Suppose that 25% of instructions are branch instructions that are taken and cause a 3‐cycle penalty, what is
the effective instruction execute time?

SOLUTION

a. Add up the individual times: 300 + 150 + 250 + 350 + 700 + 200 = 1950 ps = 1.95 ns

b. The longest stage is 700 ps which determines the clock period. With 20 ps for the latches, the time is 720 × 6
= 4320 ps = 4.32 ns.

c. One instruction per clock; that is every 720 ps.

d. 75% of instructions are not taken branches; these contribute 0.75 × 720 ps = 540 ps. The 25% that are taken
branches contribute 0.25 × 3 × 720 ps = 540 ps. The effective instruction execution time is therefore 540 + 540 = 1080 ps.
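The four answers can be reproduced with a short calculation using the stage times given in the question (an illustrative sketch, not part of the original solution):

```python
# Stage times in picoseconds, from the question.
stages = {"IF": 300, "ID": 150, "OF": 250, "OE": 350, "M": 700, "OS": 200}
latch = 20  # additional ps per stage for the pipeline latches

t_unpipelined = sum(stages.values())           # (a) 1950 ps
clock = max(stages.values()) + latch           # slowest stage sets the clock: 720 ps
t_latency = clock * len(stages)                # (b) 4320 ps
# (c) once full, one instruction completes per clock, i.e., every 720 ps
t_effective = 0.75 * clock + 0.25 * 3 * clock  # (d) 1080 ps

assert (t_unpipelined, t_latency, t_effective) == (1950, 4320, 1080)
```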

19. Both RISC and CISC processors have registers. Answer the following questions about registers.

a. Is it true that a larger number of registers in any architecture is always better than a smaller number?
b. What limits the number of registers that can be implemented by any ISA?
c. What are the relative advantages and disadvantages of dedicated registers like the IA32 architecture
compared to general purpose registers like ARM and MIPS?
d. If you have an m‐bit register select field in an instruction, you can't have more than 2^m registers. There are, in
fact, ways round this restriction. Suggest ways of increasing the number of registers beyond 2^m while keeping
an m‐bit register select field.

SOLUTION

a. In principle yes, as long as you don’t have to pay a price for them. More registers means fewer memory
accesses and that is good. However, if you have to perform a context switch when you run a new task, having
to save a lot of registers may be too time‐consuming. Having more registers requires more bits in an
instruction to specify them. If you allocate too many bits to register specification then you have a more
limited instruction set.

b. Today, it’s the number of bits required to specify a register. A processor like the Itanium IA64 with a much
longer instruction word can specify more registers.

c. Having fixed special purpose registers permits more compressed code. For example, if you have a counter
register, any instruction using the counter doesn’t need to specify the register – because that is fixed. The
weakness is that you can’t have two counter registers. Computers that originated in the CISC area like the
IA32 architecture use special‐purpose registers, because they were designed when saving bits (reducing
instruction size) was important. Remember that early 8‐bit microprocessors had an 8‐bit instruction set.
More recent architectures are RISC based and have general‐purpose architectures. ARM processors are
unusual in the sense that they have a small general‐purpose register set that includes two special‐purpose
registers, a link register for return addresses and the program counter itself.

d. Of course, you can’t address more than 2m registers with an m‐bit address field. But you can use a set of more
than 2m registers of which only 2m are currently visible. Such a so‐called windowing technique has been used
in, for example, the Berkeley RISC and the SPARC processor. Essentially, every time you call a
subroutine/function you get a new set of register windows (these are still numbered r0 t0 r31). However,
each function has its own private registers that cannot be accessed from other functions. There are also
global registers common to all functions and parameter passing registers that are shared with parent and
child functions. Such mechanisms have not proved popular. The problem is that if you deeply nest
subroutines, you end up having to dump registers to memory.

20. Someone once said, “RISC is to hardware what UNIX is to software”. What do you think this statement means and
is it true?

SOLUTION

This is one of those pretentious statements that people make for effect. UNIX is the operating system loved by
many computer scientists and is often contrasted with operating systems from large commercial organizations
such as Microsoft. By analogy, RISC processors were once seen as an opportunity for small companies and
academics to develop hardware at a time when existing processors were being developed by large corporations at
considerable expense. Relatively small teams were required to design MIPS or the ARM processor compared to an
Intel IA32 processor. In that sense, RISC and UNIX were both seen as returning hardware and software to the masses. Over the
years, the distinction between RISC and CISC processors has become very blurred, even though the computing world
is still, to some extent, divided into UNIX and Windows spheres.

21. What are the characteristics of a RISC processor that distinguish it from a CISC processor? Does it matter whether
this question is asked in 2015 or 1990?

SOLUTION

The classic distinction between RISC processors and CISC processors is that RISC processors are pipelined and
have small, simple, and highly regular instruction sets. RISC processors are also called load/store processors, with
the only memory access operations being load and store. All data processing operations are register‐to‐register.
CISC processors tend to have irregular instruction sets, special purpose registers, complex instruction
interpretation hardware and memory to memory operations. However, the difference between modern RISC and
CISC processors is blurred and the distinction is no longer as significant as it was. RISC techniques have been
applied to CISC processors and even traditional complex instruction set processors are highly pipelined. Equally,
some RISC processors have quite complex instruction sets. One difference is that today's RISC processors have not
adopted memory‐to‐memory or memory‐to‐register instruction formats.

22. What, in the context of pipelined processors, is a bubble and why is it detrimental to the performance of a
pipelined processor?

SOLUTION

As an instruction flows through a pipeline, various operations are applied to it. For example, in the first stage it is
fetched from memory and it may be decoded. In the second stage any operands it requires are read from the
register file, and so on. Sometimes, it is not possible to perform an operation on an instruction. For example, if an
operand is required and that operand is not ready, the stage processing the operand cannot continue. This results
in a bubble or a stall when ‘nothing happens’. Equally, bubbles appear when a branch is taken and instructions
following the branch are no longer going to be executed. So, a bubble is any condition that leads to a stage in the
pipeline not performing its normal operation because it cannot proceed. A bubble is detrimental to performance
because it means that an operation that could be executed is not executed and its time slot is wasted.

23. To say that the RISC philosophy was all about reducing the size of instruction sets would be wrong and entirely
miss the point. What enduring trends or insights did the so‐called RISC revolution bring to computer architecture
including both RISC and CISC design?

SOLUTION

Designers learned to look at the whole picture rather than just optimizing one or two isolated aspects of the
processor. In particular there was a movement toward the use of benchmarks to improve performance. That is,
engineers applied more rigorous design techniques to the construction of new processors.

24. There are RAW, WAR, and WAW data hazards. What about RAR (read‐after‐read)? Can a RAR operation cause
problems in a pipelined machine?

SOLUTION

No. A read‐after‐read situation would be:


ADD r1,r2,r3
ADD r4,r2,r7

In the above code, register r2 is read by both instructions. Since the value of r2 is altered by neither operation and
it does not matter (semantically) which instruction is executed first, there can be no problem.

25. Consider the instruction sequence in a five‐stage pipeline IF, OF, E, M, OS:

1. ADD r0,r1,r2
2. ADD r3,r0,r5
3. STR r6,[r7]
4. LDR r8,[r7]

Instructions 1 and 2 will create a RAW hazard. What about instructions 3 and 4? Will they also create a RAW
hazard?

SOLUTION

Yes, possibly. The value stored from r6 may not have reached memory location [r7] before the next instruction
reads it back. Of course, part of the problem is the poor code: you are storing a value in memory and then
immediately reading it back. You should replace the LDR r8,[r7] by MOV r8,r6.

26. A RISC processor has a three‐address instruction format and typical arithmetic instructions (i.e., ADD, SUB, MUL,
DIV etc.). Write a suitable sequence of instructions to evaluate the following expression in the minimum time:

X = ((A+B)(A+B+C)E + H) / (G + A + B + D + F(A+B-C))

Assume that all variables are in registers and that the RISC does not include a hardware mechanism for the
elimination of data dependency. Each instance of data dependency causes one bubble in the pipeline and wastes
one clock cycle.

SOLUTION

It is necessary to write the code with the minimum number of RAWs. For example,

ADD T1,A,B ;A+B


ADD T2,G,D ;G+D
ADD T3,T1,C ;A+B+C
ADD T2,T2,T1 ;G+A+B+D
MUL T4,T1,E ;(A+B)E
SUB T5,T1,C ;A+B-C
MUL T4,T4,T3 ;(A+B)(A+B+C)E
MUL T5,T5,F ;F(A+B-C)
ADD T4,T4,H ;(A+B)(A+B+C)E+H
ADD T5,T5,T2 ;G+A+B+D+F(A+B-C)
DIV T4,T4,T5 ;((A+B)(A+B+C)E+H)/(G+A+B+D+F(A+B-C)) (one stall)
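The schedule can be verified by executing each operation with sample operand values and comparing against the expression evaluated directly (an illustrative Python sketch; note that the (A+B)E term is computed as MUL T4,T1,E, and the operand values are arbitrary):

```python
# Execute the three-address sequence with arbitrary sample values and
# compare the result with the expression computed directly.
A, B, C, D, E, F, G, H = 2, 3, 4, 5, 6, 7, 8, 9

T1 = A + B        # ADD T1,A,B   ; A+B
T2 = G + D        # ADD T2,G,D   ; G+D
T3 = T1 + C       # ADD T3,T1,C  ; A+B+C
T2 = T2 + T1      # ADD T2,T2,T1 ; G+A+B+D
T4 = T1 * E       # MUL T4,T1,E  ; (A+B)E
T5 = T1 - C       # SUB T5,T1,C  ; A+B-C
T4 = T4 * T3      # MUL T4,T4,T3 ; (A+B)(A+B+C)E
T5 = T5 * F       # MUL T5,T5,F  ; F(A+B-C)
T4 = T4 + H       # ADD T4,T4,H  ; numerator
T5 = T5 + T2      # ADD T5,T5,T2 ; denominator
X = T4 / T5       # DIV T4,T4,T5

expected = ((A + B) * (A + B + C) * E + H) / (G + A + B + D + F * (A + B - C))
assert abs(X - expected) < 1e-9
```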

27. Figure P7.27 gives a partial skeleton diagram of a pipelined processor. What is the purpose of the flip‐flops
(registers) in the information paths?

Figure P7.27 Structure of a Pipelined Processor

SOLUTION

The problem with an architecture like that of Figure P7.27 is that when an instruction is processed (e.g., an operation
and its operands), all the information must be in place at the same time. For example, if you perform a = b + c
followed by p = q ‐ r, it would be unfortunate if q and r arrived at the ALU at the same time as the + operator. This
would lead to the erroneous operation p = q + r.

Once an instruction goes from PC to instruction memory to instruction register, it is divided into fields (operands,
constants, instructions) and each of these fields provides data that flows along different paths. For example, the
op‐code goes to the ALU immediately, whereas the operands (during a register‐to‐register operation) go via the
register file where operand addresses are translated into operand values. The flip‐flops equalize the time at which
data and operations arrive at the ALU. It is also necessary to put a delay in the destination address path because
the destination address has to wait an extra cycle – the time required for the ALU to perform an operation.

28. Explain why branch operations reduce the efficiency of a pipelined architecture. Describe how branch prediction
improves the performance of a RISC processor and minimizes the effect of branches.

SOLUTION

Four-stage pipeline: IF = instruction fetch, OF = operand fetch, E = execute, S = result store.

i-1                     IF OF E  S
i   (branch BRA N)         IF OF E  S        It is not until after the execute
                                             phase that a fetch from the target
                                             address can begin.
i+1                           IF bubble      These two instructions are fetched
i+2                              IF bubble   but not executed.
N                                   IF OF E  S     First instruction at the
N+1                                    IF OF E  S  branch target address.

The figure demonstrates the effect of a bubble in a pipelined architecture due to a branch. The pipeline inputs a
stream of instructions and executes them in stages; in this example, there are four. Once the pipe is full, four
instructions are in varying stages of completion. If a branch is read into the pipeline and that branch is taken, the
instructions following the branch are not executed, whereas instructions ahead of the branch complete normally. A
bubble is the term used to describe a pipeline state in which the current instruction must be rejected. In this figure,
it takes two clock cycles before normal operation can be resumed.

29. Assume that a RISC processor uses branch prediction to improve its performance. The table below gives the
number of cycles taken for predicted and actual branch outcomes. These figures include both the cycles taken by
the branch itself and the branch penalty associated with branch instructions.

Actual
Prediction Not taken Taken
Not taken 1 4
Taken 2 1

If pb is the probability that a particular instruction is a branch, pt is the probability that a branch is taken, and pw is
the probability of a wrong prediction, derive an expression for the average number of cycles per instruction, TAVE.
All non‐branch instructions take one cycle to execute.

SOLUTION

The possible outcomes of an instruction are:

Non‐branch cycles + branches not taken and predicted not taken + branches not taken and predicted taken +
branches taken and predicted taken + branches taken and predicted not taken

In each case, we multiply the probability of the event by the cost of the event; that is:

TAVE = (1 ‐ pb)⋅1 + pb ⋅((1 ‐ pt)⋅(1 ‐ pw)⋅1 + (1 ‐ pt)⋅pw⋅2 + pt⋅(1 ‐ pw)⋅1 + pt⋅pw⋅4 )

Remember that if pt is the probability of a branch being taken, 1 - pt is the probability of a branch not being taken.
If pw is the probability of a wrong prediction, (1 - pw) is the probability of a correct prediction.

Therefore, the average number of cycles is 1 ‐ pb(1 ‐ 1 + pt + pw ‐ pt⋅pw ‐ 2⋅pw + 2⋅pt⋅pw ‐ pt + pt⋅pw ‐ 4⋅pt⋅pw)
= 1 ‐ pb⋅ ( ‐pw ‐ 2⋅pt⋅pw ) =1 + pb⋅pw(1 + 2⋅pt)
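The derivation can be checked numerically. The sketch below evaluates both the expanded expression and the closed form for arbitrary sample probabilities (the values are illustrative only):

```python
def t_ave(pb, pt, pw):
    # Expected cycles per instruction from the cost table above:
    # not taken & predicted not taken = 1, not taken & predicted taken = 2,
    # taken & predicted taken = 1, taken & predicted not taken = 4.
    return (1 - pb) * 1 + pb * ((1 - pt) * (1 - pw) * 1
                                + (1 - pt) * pw * 2
                                + pt * (1 - pw) * 1
                                + pt * pw * 4)

pb, pt, pw = 0.2, 0.6, 0.1
closed_form = 1 + pb * pw * (1 + 2 * pt)   # the simplified result derived above
assert abs(t_ave(pb, pt, pw) - closed_form) < 1e-12
```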

30. IDT application note AN33 [IDT89] gives an expression for the average number of cycles per instruction in a RISC
system as:

Cave = Pb(1 + b) + Pm(1 + m) + (1 ‐ Pb ‐ Pm) where:

Pb = probability that an instruction is a branch
b = branch penalty
Pm = probability that an instruction is a memory reference
m = memory reference penalty
Explain the validity of this expression. How do you think that it might be improved?

SOLUTION

The first term, Pb(1 + b), is the probability of a branch multiplied by the total cost of a branch (i.e., 1 plus the
branch penalty). The second Pm(1 + m) term deals with memory accesses and is the probability of a memory
access multiplied by the total memory access cost. The final term, (1 - Pb - Pm), covers what is left over:
instructions that are neither branches nor memory accesses, each costing one cycle.
This formula is limited in the sense that it does not distinguish between branches that are taken and not taken, or
between memory accesses that hit and miss the cache. However, its message is clear: reduce both branches and
memory accesses.

31. RISC processors rely (to some extent) on on‐chip registers for their performance increase. A cache memory can
provide a similar level of performance increase without restricting the programmer to a fixed set of registers.
Discuss the validity of this statement.

SOLUTION

Memory accesses can take orders of magnitude longer than register accesses. Because RISC style processors have
far more registers than CISC processors, it is possible to operate on a subset of data stored within the chip and to
reduce memory accesses.

However, cache memory, which is a copy of some frequently‐used memory, can reduce the memory access
penalty by keeping data in the on‐chip cache.

One argument in favor of cache is that it is handled automatically by the hardware. Registers have to be allocated
by the programmer or the compiler. If the number of registers is limited, it is possible that the on‐chip registers
may be used/allocated non‐optimally.

Cache memory also has the advantage that it supports dynamic data structures like the stack. Most computers do
not allow dynamic data structures based on registers (that is, you can’t access register ri, where i is an index). The
Itanium IA64 that we discuss in Chapter 8 does indeed have dynamic registers.

32. RISC processors best illustrate the difference between architecture and implementation. To what extent is this
statement true (or not true)?

SOLUTION

We have already stated that architecture and organization are orthogonal; that is, they are independent. In
principle, this statement is true. You can create an instruction set on paper and then implement it any way you
want: via direct logic (called random logic) or via a structure such as microprogramming. However, some design
or organization techniques may be suited or unsuited to a particular architecture. CISC processors are
characterized both by complicated instructions (multiple-part instructions or instructions with complex
addressing modes; for example, BFFFO, which locates the first bit set to 1, can be regarded as a complex
instruction) and by irregular instruction encodings. Consequently, CISC instruction sets are well suited to
implementation/interpretation via microcode. The instruction lookup table simply translates a machine code
value into the location of the appropriate microcode. It doesn’t matter how odd the instruction encoding is.

RISC processors with simple instructions are well suited to implementation by pipelining because of the regularity
of a pipeline; that is, all instructions are executed in approximately the same way.

33. A RISC processor executes the following code. There are no data dependencies.

ADD r0,r1,r2
ADD r3,r4,r5
ADD r6,r7,r8
ADD r9,r10,r11
ADD r12,r13,r14
ADD r15,r16,r17

a. Assuming a 4‐stage pipeline fetch, operand fetch, execute, write, what registers are being read during the 6th
clock cycle and what register is being written?

b. Assuming a 5‐stage pipeline fetch, operand fetch, execute, write, store, what registers are being read during
the 6th clock cycle and what register is being written?

SOLUTION

a. Four‐stage pipeline
Cycle 1 2 3 4 5 6 7 8
ADD r0,r1,r2 IF OF E W
ADD r3,r4,r5 IF OF E W
ADD r6,r7,r8 IF OF E W
ADD r9,r10,r11 IF OF E W
ADD r12,r13,r14 IF OF E W
ADD r15,r16,r17 IF OF E

During the 6th clock cycle, operands r13 and r14 are being read and operand r6 is being written.

b. Five-stage pipeline


Cycle 1 2 3 4 5 6 7 8
ADD r0,r1,r2 IF OF E M W
ADD r3,r4,r5 IF OF E M W
ADD r6,r7,r8 IF OF E M W
ADD r9,r10,r11 IF OF E M W
ADD r12,r13,r14 IF OF E M
ADD r15,r16,r17 IF OF E

During the 6th clock cycle, operands r13 and r14 are being read and operand r3 is being written.
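In a stall-free pipeline the bookkeeping reduces to one formula: instruction i (1-based) occupies stage s during cycle i + s - 1. A small illustrative helper (the function name is ours, not the book's) confirms both answers:

```python
def instr_in_stage(cycle, stage, n_instr):
    # Which instruction (1-based) is in the given stage during the given
    # cycle of a stall-free pipeline? Returns None if no instruction is there.
    i = cycle - stage + 1
    return i if 1 <= i <= n_instr else None

# Part a: 4-stage pipeline (IF=1, OF=2, E=3, W=4), six ADD instructions.
assert instr_in_stage(6, 2, 6) == 5   # instr 5 in OF: reads r13 and r14
assert instr_in_stage(6, 4, 6) == 3   # instr 3 in W:  writes r6

# Part b: 5-stage pipeline (IF, OF, E, M, W), so W is stage 5.
assert instr_in_stage(6, 5, 6) == 2   # instr 2 in W: writes r3
```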

34. A RISC processor executes the following code. There are data dependencies but no internal forwarding. A source
operand cannot be used until it has been written.

ADD r0,r1,r2
ADD r3,r0,r4
ADD r5,r3,r6
ADD r7,r0,r8
ADD r9,r0,r3
ADD r0,r1,r3

a. Assuming a 4‐stage pipeline: fetch, operand fetch, execute, result write, what registers are being read during
the 10th clock cycle and what register is being written?
b. How long will it take to execute the entire sequence?

SOLUTION

Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13
ADD r0,r1,r2 IF OF E W
ADD r3,r0,r4 IF OF E W
ADD r5,r3,r6 IF OF E W
ADD r7,r0,r8 IF OF E W
ADD r9,r0,r3 IF OF E W
ADD r0,r1,r3 IF OF E W

a. In the 10th cycle registers r0 and r3 are being read and register r5 is being written.

b. It takes 13 cycles to complete the sequence.

35. A RISC processor has an eight-stage pipeline: F D O E1 E2 MR MW WB (fetch, decode, register read operands,
execute 1, execute 2, memory read, memory write, result writeback to register). Simple logical and arithmetic
operations are complete by the end of E1. Multiplication is complete by the end of E2. How many cycles are
required to execute the following code assuming that internal forwarding is not used?

MUL r0,r1,r2
ADD r3,r1,r4
ADD r5,r1,r6
ADD r6,r5,r7
LDR r1,[r2]

SOLUTION

Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 MUL r0,r1,r2 F D O E1 E2 MR MW WB
2 ADD r3,r1,r4 F D O E1 E2 MR MW WB
3 ADD r5,r1,r6 F D O E1 E2 MR MW WB
4 ADD r6,r5,r7 F D O E1 E2 MR MW WB
5 LDR r1,[r2] F D O E1 E2 MR MW WB

There’s only one RAW dependency in instruction 4 involving r5. The total number of cycles is 17.

36. Repeat the previous problem assuming that internal forwarding is implemented.

SOLUTION

Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 MUL r0,r1,r2 F D O E1 E2 MR MW WB
2 ADD r3,r1,r4 F D O E1 E2 MR MW WB
3 ADD r5,r1,r6 F D O E1 E2 MR MW WB
4 ADD r6,r5,r7 F D O E1 E2 MR MW WB
5 LDR r1,[r2] F D O E1 E2 MR MW WB
With forwarding, r5 produced at the end of E1 of instruction 3 (cycle 6) can be fed to E1 of instruction 4 in cycle 7,
so the stall disappears and the sequence completes in 12 cycles.

37. Consider the same structure as question 35 but with the following code fragment. Assume that internal
forwarding is possible and an operand can be used as soon as it is generated. Show the execution of this code.

LDR r0,[r2]
ADD r3,r0,r1
MUL r3,r3,r4
ADD r6,r5,r7
STR r3,[r2]
ADD r6,r5,r7

SOLUTION
Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 LDR r0,[r2] F D O E1 E2 MR MW WB
2 ADD r3,r0,r1 F D O E1 E2 MR MW WB
3 MUL r3,r3,r4 F D O E1 E2 MR MW WB
4 ADD r6,r5,r7 F D O E1 E2 MR MW WB
5 STR r3,[r2] F D O E1 E2 MR MW WB
6 ADD r6,r5,r7 F D O E1 E2 MR MW WB

38. The following table gives a sequence of instructions that are performed on a 4‐stage pipelined computer. Detect
all hazards. For example if instruction m uses operand r2 generated by instruction m‐1, then write m‐1,r2 in the
RAW column in line m.

Number Instruction RAW WAR WAW


1 Add r1,r2,r3
2 Add r4,r1,r3
3 Add r5,r1,r2
4 Add r1,r2,r3
5 Add r5,r2,r3
6 Add r1,r6,r6
7 Add r8,r1,r5

SOLUTION

Number  Instruction    RAW         WAR    WAW
1       Add r1,r2,r3   -           -      -
2       Add r4,r1,r3   1,r1        -      -
3       Add r5,r1,r2   1,r1        -      -
4       Add r1,r2,r3   -           3,r1   1,r1
5       Add r5,r2,r3   -           -      3,r5
6       Add r1,r6,r6   -           -      4,r1; 1,r1
7       Add r8,r1,r5   6,r1; 5,r5  -      -

Note that some of the hazards are technical hazards and not real hazards. For example, instruction 3 does not
suffer a RAW hazard on r1 because any delay will have been swallowed by the previous instruction.

39. Consider the following code:

LDR r1,[r6] ;Load r1 from memory. r6 is a pointer


ADD r1,r1,#1 ;Increment r1 by 1
LDR r2,[r6,#4] ;Load r2 from memory
ADD r2,r2,#1 ;Increment r2 by 1
ADD r3,r1,r2 ;Add r1 and r2 with total in r3
ADD r8,r8,#4 ;Increment r8 by 4
STR r2,[r6,#8] ;Store r2 in memory
SUB r4,r4,#64 ;Subtract 64 from r4

The processor has a five‐stage pipeline F O E M S; that is, instruction fetch, operand fetch, operand execute,
memory, operand writeback to register file.

a. How many cycles does this code take to execute assuming internal forwarding is not used?
b. How many cycles does this code take to execute assuming internal forwarding is used?
c. How many cycles does the code take to execute assuming that it is reordered (no internal forwarding)?
d. How many cycles does the code take to execute assuming reordering and internal forwarding?

SOLUTION

a. No forwarding
Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 LDR r1,[r6] F O E M S
2 ADD r1,r1,#1 F O E M S
3 LDR r2,[r6,#4] F O E M S
4 ADD r2,r2,#1 F O E M S
5 ADD r3,r1,r2 F O E M S
6 ADD r8,r8,#4 F O E M S
7 STR r2,[r6,#8] F O E M S
8 SUB r4,r4,#64 F O E M S

b. Forwarding
Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 LDR r1,[r6] F O E M S
2 ADD r1,r1,#1 F O E M S
3 LDR r2,[r6,#4] F O E M S
4 ADD r2,r2,#1 F O E M S
5 ADD r3,r1,r2 F O E M S
6 ADD r8,r8,#4 F O E M S
7 STR r2,[r6,#8] F O E M S
8 SUB r4,r4,#64 F O E M S

c. Reordering
Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 LDR r1,[r6] F O E M S
2 LDR r2,[r6,#4] F O E M S
3 ADD r8,r8,#4 F O E M S
4 ADD r1,r1,#1 F O E M S
5 ADD r2,r2,#1 F O E M S
6 SUB r4,r4,#64 F O E M S
7 ADD r3,r1,r2 F O E M S
8 STR r2,[r6,#8] F O E M S

d. Reordering and forwarding


Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 LDR r1,[r6] F O E M S
2 LDR r2,[r6,#4] F O E M S
3 ADD r8,r8,#4 F O E M S
4 ADD r1,r1,#1 F O E M S
5 ADD r2,r2,#1 F O E M S
6 SUB r4,r4,#64 F O E M S
7 ADD r3,r1,r2 F O E M S
8 STR r2,[r6,#8] F O E M S

40. Why do conditional branches have a greater effect on a pipelined processor than unconditional branches?

SOLUTION

The outcome of an unconditional branch is known the moment it is first detected. Consequently, instructions at
the target address can be fetched immediately. The outcome of a conditional branch is not known until the
condition has been tested, which may be at a later stage in the pipeline.

41. Describe the various types of change of flow‐of‐control operations that modify the normal sequence in which a
processor executes instructions. How frequently do these operations occur in typical programs?

SOLUTION

Operations that affect the flow of control are:


Branch/jump (programmer initiated)
Subroutine call
Subroutine return
Trap (operating system call)
Software exception
Hardware exception (interrupt).

All these events cause a change in the flow of control (non‐sequential instruction execution). Interrupts and
exceptions are relatively rare (expressed as a percentage of total instructions executed). The frequency of
branches and jumps may be expressed statically or dynamically. The static frequency is the fractional number of
branches in the code. The dynamic frequency is more meaningful and is the number of branches executed when
the code is run. Branch instructions make up about 20% of a typical program. Subroutine calls and returns are less
frequent (of the order of 2%).

42. Consider the following code:

MOV r0,#Vector ;point to Vector


MOV r2,#10 ;loop count
Loop LDR r1,[r0] ;Repeat: get element
SUBS r2,r2,#1 ;decrement loop count and set Z flag
MUL r1,r1,#5
STR r1,[r0] ;save result
ADD r0,r0,#4 ;point to next
BNE Loop ;until all done (branch on Z flag).

Suppose this ARM‐like code is executed on a 4‐stage pipeline with internal forwarding. The load instruction has
one cycle penalty and the multiply instruction introduces two stall cycles into the execute phase. Assume the
taken branch has no penalty.

a. How many instructions are executed by this code?


b. Draw a timing diagram for the first iteration showing stalls. Assume internal forwarding.
c. How many cycles does it take to execute this code?

SOLUTION

a. There are two pre‐loop instructions and a 6‐instruction loop repeated 10 times. Total = 2 + 10 × 6 = 62.

b. The following shows the code of one pass round the loop

Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1. LDR r1,[r0] F O E S
2. SUBS r2,r2,#1 F O E S
3. MUL r1,r1,#5 F O E S
4. STR r1,[r0] F O E S
5. ADD r0,r0,#4 F O E S
6. BNE Loop F O E S
1. LDR r1,[r0] F O E S (repeat)

c. It takes 11 cycles to make one pass round the loop. However, it takes 14 cycles to execute all the instructions
in a loop fully. The total number of cycles is 2 (preloop) + 10 × 11 + 3 (post loop) = 115.

43. Branch instructions may be taken or not taken. What is the relative frequency of taken to not taken, and why is
this so?

SOLUTION

At first sight it might appear that the probability of branches being taken or not taken is 50:50 because there are
two alternatives. However, this logic is entirely misleading because of the way in which branches are used. A
paper (albeit old) by Y. Wu and J.R. Larus (Static branch frequency and program profile analysis, MICRO-27, Nov
1994) suggests that loop branches have a probability of 88% of being taken.
44. What is branchless computing?

SOLUTION

If branches are considered harmful because a misprediction can lead to bubbles in the pipeline, it is a good idea to
reduce the frequency of branches. Doing this is called branchless computing. In particular, it refers to predicated
computing where an instruction is conditionally executed; for example, the ARM’s ADDEQ r0,r0,#1
increments the value of register r0 if the result of the last operation that set the condition code was zero. The
IA32 MMX instruction set extension also permits branchless computing by turning a condition into a value; that is,
if a test yields true, the value 1111…1 is generated and if the condition is false, the value 0000…0 is generated.
These two constants can then be used as masks in Boolean operations.

45. What is a delayed branch and how does it contribute to minimizing the effect of pipeline bubbles? Why are
delayed branch mechanisms less popular than they were?

SOLUTION

The term delayed in delayed branch is not a very good description. In a pipelined computer a taken branch means
that the pipeline must be (partially) flushed. If the instruction sequence is P,B,Q where P, B, and Q are three
instructions and B is a branch, the instruction Q is executed if the branch is not taken and not executed if the
branch is taken. A delayed branch mechanism always executes the instruction after the branch. Thus, the
sequence P,Q,B (where P and Q are executed before the branch) becomes P,B,Q where Q is still executed before
the branch. Of course, if a suitable instruction P cannot be found, the so‐called delayed branch slot must be filled
with a NOP (no operation).

46. How does branch prediction reduce the branch penalty?

SOLUTION

In a pipelined processor, an instruction flows through the pipeline and is executed in stages. If an instruction is a
branch and the branch is taken, all instructions behind it in the pipeline have to be flushed. The earlier a branch is
detected and the outcome resolved the better. Branch prediction makes a guess about the direction (outcome) of
the branch; taken or not taken. If the branch is predicted not taken, nothing happens and execution continues. If
the branch is predicted as taken, instructions can be obtained from the branch target address and loaded into the
instruction stream immediately. If the prediction is incorrect, the pipeline has to be flushed in the normal way.

47. A pipelined computer has a four-stage pipeline: fetch/decode, operand fetch, execute, writeback. Operations
other than loads and branches introduce no stalls. A load introduces one stall cycle. A non-taken branch introduces
no stalls and a taken branch introduces two stall cycles. Consider the following loop.

for (j=1023; j > 0; j--) {x[j]=x[j]+2;}

a. Express this code in an ARM‐like assembly language (assume that you cannot use autoindexed addressing and
that the only addressing mode is register indirect of the form [r0]).
b. Show a single trip round the loop and indicate how many clock cycles are required.
c. How many cycles will it take to execute this code in total?
d. How can you modify the code to reduce the number of cycles?

SOLUTION

a. The code
mov r2,#1023
Loop ldr r0,[r1]
add r0,r0,#2
str r0,[r1]
add r1,r1,#4
subs r2,r2,#1
BNE Loop

b. A trip round the loop has 6 instructions. The load has a one cycle stall and the taken branch back has two
cycles. The total is 6 + 1 + 2 = 9 cycles.

c. The loop body executes 1,023 times (j runs from 1023 down to 1). The total number of cycles is
1 + 1,023 × 9 - 2 (the minus 2 is there because the branch is not taken on the last iteration). This is 9,206 cycles.

d. You can speed up the code by unrolling the loop, performing multiple iterations per trip and avoiding most of
the two-cycle taken-branch penalties. You could also save a cycle of latency by moving the ADD r1,r1,#4 to
just after the load to hide the load stall.

48. Suppose that you design an architecture with the following characteristics

Cost of a non‐branch instruction 1 cycle


Fraction of instructions that are branches 20%
Fraction of branches that are taken 85%
Fraction of delay slots that can be filled 50%
Cost of an unfilled delay slot 1 cycle

For this architecture

a. calculate the average number of cycles per instruction


b. calculate the improvement (as a percentage) if the fraction of delay slots that are filled can be increased to
95%.

SOLUTION

a. Average cycles = non-branch cycles + non-taken branches + taken branches with slot filled + taken branches
with slot unfilled
= 80% × 1 + 20% × (15% × 1 + 85% × (50% × 1 + 50% × 2))
= 0.80 + 0.20 × (0.15 + 0.85 × 1.50) = 0.80 + 0.20 × 1.425 = 1.085

b. The only thing different is the fraction of unfilled slots. We can write
Average cycles = 80% × 1 + 20% × (15% × 1 + 85% × (95% × 1 + 5% × 2))
= 0.80 + 0.20 × (0.15 + 0.8925) = 1.0085
The improvement is (1.085 - 1.0085)/1.085 ≈ 7%.
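The same arithmetic, parameterized (a sketch; the function name and argument order are ours):

```python
def cpi(p_branch, p_taken, p_filled):
    # CPI with delayed branches, per the question's cost model:
    # non-branch = 1 cycle; non-taken branch = 1 cycle;
    # taken branch = 1 cycle if the delay slot is filled, else 2 cycles.
    not_taken = (1 - p_taken) * 1
    taken = p_taken * (p_filled * 1 + (1 - p_filled) * 2)
    return (1 - p_branch) * 1 + p_branch * (not_taken + taken)

a = cpi(0.20, 0.85, 0.50)   # 50% of delay slots filled
b = cpi(0.20, 0.85, 0.95)   # 95% of delay slots filled
assert abs(a - 1.085) < 1e-9
assert abs(b - 1.0085) < 1e-9
print(f"improvement: {(a - b) / a:.1%}")   # roughly 7%
```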

49. A pipelined processor has the following characteristics:


• Loads 18%
• Load stall (load penalty) 1 cycle
• Branches 22%
• Probability a branch is taken 80%
• Branch penalty on taken 3 cycles
• RAW dependencies 20% of all instructions except branches
• RAW penalty 1 cycle

Estimate the average cycles per instruction for this processor.

SOLUTION

We have to add the load and data and branch stalls.


Load stalls: 18% × 1 = 0.18.
Data stalls: 78% × 20% × 1 = 0.156.
Branch stalls: 22% × 80% × 3 = 0.528.
Total = 1 + 0.18 + 0.156 + 0.528 = 1.864 CPI.
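The stall accounting can be written out directly (a quick check of the same arithmetic; the RAW fraction applies only to the non-branch instructions, as the question states):

```python
loads, load_stall = 0.18, 1
branches, p_taken, taken_stall = 0.22, 0.80, 3
raw_frac, raw_stall = 0.20, 1

load_stalls = loads * load_stall                      # 0.18
data_stalls = (1 - branches) * raw_frac * raw_stall   # 0.78 * 0.20 = 0.156
branch_stalls = branches * p_taken * taken_stall      # 0.22 * 0.80 * 3 = 0.528

cpi = 1 + load_stalls + data_stalls + branch_stalls
assert abs(cpi - 1.864) < 1e-9
```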

50. What is the difference between static and dynamic branch prediction?

SOLUTION

Static prediction takes place before any code is executed; that is, it does not use feedback from the actual running
of the code to make a prediction. Dynamic prediction uses information from the past behavior of the program to
predict the future behavior. Dynamic prediction is more accurate than static prediction.

Static prediction relies on factors such as the static behavior of individual branches (e.g., this branch type is
usually taken, this one is not). Such an approach is relatively crude. The compiler can analyze code and make a
guess about the outcome of branches and then set a hint bit in the code. The processor uses this hint bit to decide
whether the branch will be taken. Note that not all computers have a branch hint bit.

Dynamic branch prediction observes the history of branches (either individually or collectively) and the position of
branches in the program to decide whether to take or not take a branch. Dynamic prediction can be very accurate
in many circumstances.

51. A processor has a branch‐target buffer. If a branch is in the buffer and it is correctly predicted, there is no branch
penalty. The prediction rate is 85% correct. If it is incorrectly predicted, the penalty is 4 cycles. If the branch is not
in the buffer, and not taken, the penalty is 2 cycles. Seventy percent of branches are taken. If the branch is not in
the buffer and is taken the penalty is 3 cycles. The probability that a branch is in the buffer is 90%. What is the
average branch penalty?

SOLUTION

Branch penalty = mispredict penalty (in buffer) + taken penalty (not in buffer) + not taken penalty (not in buffer) =
90% × 15% × 4 + 10% × 70% × 3 + 10% × 30% × 2
= 0.54 + 0.21 + 0.06 = 0.81 cycles per branch.
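The expected penalty is a weighted sum over the possible cases; a quick check of the arithmetic:

```python
p_in_btb = 0.90      # probability the branch is in the branch-target buffer
p_mispredict = 0.15  # 85% of in-buffer predictions are correct
p_taken = 0.70       # 70% of branches are taken

penalty = (p_in_btb * p_mispredict * 4            # in BTB, wrongly predicted
           + (1 - p_in_btb) * p_taken * 3         # not in BTB, taken
           + (1 - p_in_btb) * (1 - p_taken) * 2)  # not in BTB, not taken
assert abs(penalty - 0.81) < 1e-9
```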

52. How can the compiler improve the efficiency of some processors with branch prediction mechanisms?

SOLUTION

Some processors allow the compiler to set/clear bits in the op‐code that tell the processor whether to treat this
branch as taken or not taken; for example, if you have a loop in a high level language, the terminating conditional
branch will be taken back to the start of the loop n‐1 times for n iterations. The compiler would set the take
branch bit in the opcode and the processor would automatically assume ‘branch taken’.

53. Consider the following two streams of branch outcomes (T = taken and N = not taken). In each case what is the
simplest form of branch prediction mechanism that would be effective in reducing the branch penalty?

a. T, T, T, T, T, N, T, T, T, T, T, T, T, N, T, T, T, T, T, N, T, T, T, T, T, T, T, N, T, T, T, T, T
b. T, T, T, T, T, N, N, N, N, N, N, N, N, N, T, T, T, T, T, T, T, T, T, T, T, N, N, N, N, N, N, N, N

SOLUTION

a. Static prediction: always predict taken.

b. A 1-bit predictor that predicts the last outcome (changing direction on the first misprediction).

54. A processor uses a 2‐bit saturation‐counter dynamic branch predictor with the states strongly taken, weakly
taken, weakly not taken, and strongly not taken. The symbol T indicates a branch that is taken and an N indicates
a branch that is not taken. Suppose that the following predicted sequence of branches is recorded: T T T N T X

What is the value of X?

SOLUTION

In order to make the N prediction, the previous two states would have to be not taken states. If the next
prediction is T then the previous branch must have been T to move from the weakly not taken predicted state to
the weakly predicted taken state. Therefore the next prediction X will be T.

55. The following sequence of branch outcomes is applied to a saturating counter branch predictor
TTTNTTNNNTNNNTTTTTNTTTNNTTTTNT. If the branch penalty is two cycles for a mispredicted branch, how
many additional cycles does the system incur for the above sequence of 30 branches? Assume that the predictor
is initially in the strongly predicted taken state.

SOLUTION
Branch sequence
T T T N T T N N N T N N N T T T T T N T T T N N T T T T N T
Next predictor state (ST = strongly taken, WT = weakly taken, WN = weakly not taken, SN = strongly not taken)
ST ST ST WT ST ST WT WN SN WN SN SN SN WN WT ST ST ST WT ST ST ST WT WN WT ST ST ST WT ST
Outcome (decision)
T T T T T T T N N N N N N N T T T T T T T T T N T T T T T T
Wrong decision
W W W W W W W W W W W

The number of wrong decisions is 11 costing 11 × 2 = 22 cycles. This is no better than guessing taken.
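The count of wrong decisions can be verified by simulating the predictor. The sketch below encodes the states as 0-3 (0 = strongly not taken, 3 = strongly taken) and predicts taken for states 2 and 3, matching the ST/WT/WN/SN scheme used above:

```python
def mispredictions_2bit(outcomes, state=3):
    # 2-bit saturating counter: 0 = strongly not taken ... 3 = strongly taken.
    # Predict taken when state >= 2; count how many predictions are wrong.
    wrong = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            wrong += 1
        state = min(3, state + 1) if taken else max(0, state - 1)
    return wrong

seq = "TTTNTTNNNTNNNTTTTTNTTTNNTTTTNT"          # the 30 branch outcomes above
wrong = mispredictions_2bit(c == "T" for c in seq)
assert wrong == 11                               # 11 x 2 = 22 penalty cycles
```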

56. The state diagram below represents one of the many possible 2‐bit state machines that can be used to perform
prediction. Explain, in plain English, what it does.

[State diagram: four states S0-S3 with T (taken) and NT (not taken) transitions.
NT transitions: S3 -> S2, S2 -> S1, S1 -> S0, S0 -> S0.
T transitions: S0 -> S1, S1 -> S3, S2 -> S3, S3 -> S2.]

SOLUTION

We can regard S0 as a strongly not taken state and all not taken branches lead towards this state. States S0, S1,
S2, S3 behave exactly like the corresponding states in a saturating counter with respect to not taken branches.
The differences between this and a saturating counter are:

1. If you are in state S1 (not taken) and the next branch is taken, you go straight to state S3, the strongly taken
state.
2. If you are in state S3, a taken branch takes you to state S2 (rather than back to state S3). State S3 is not a
saturating state. If there is a sequence of taken branches, the system oscillates between S2 and S3. From
state S3 the next state is always state S2 (since a taken and a not taken have the same destination).

57. What is a branch target buffer and how does it contribute to a reduction of the branch penalty?

SOLUTION

The fundamental problem with a branch is that if it is taken, instructions already in the pipeline have to be
flushed. Consequently, you want to detect a branch instruction as soon as possible. Then you can begin execution
at the target address.

Branch target prediction operates by detecting the branch, guessing its outcome and fetching instructions from
the next or target address as soon as possible.

The branch target buffer, BTB, is a form of memory cache that caches the addresses of branch instructions. The
program counter searches the BTB. If the current instruction address corresponds to a branch, the cache can be
accessed and the predicted outcome of the branch read (This is true only of BTBs that have a prediction bit. In
general, it is assumed that every cached branch will be taken). The BTB contains the address of the target of the
branch. This means that instructions can be loaded from that address immediately (without having to read the
branch instruction and compute the target address). If you also cache the instruction at the target address you
can get the instruction almost immediately. The BTB lets you resolve the branch much earlier in the pipeline and
therefore reduce the branch penalty.

58. Consider a 4-bit saturating counter as a branch predictor with 16 states from 1111 to 0000. Describe in words
the circumstances where such a counter might be effective.

SOLUTION

If the branch predictor works in the same way as a 2-bit saturating counter, it has 16 states, 8 of which predict
taken and 8 of which predict not taken. A run of more than 15 taken (or not taken) branches leaves the counter in
the strongest taken (or not taken) state. It will then take a run of eight wrongly predicted branches in sequence to
reverse the decision. Therefore, you might use such a system in circumstances where very long runs of branches
go in one direction, and you do not wish to reverse the prediction unless a change of direction spans eight
branches.

59. Draw the state diagram of a branch predictor using a three-bit saturating counter. Under what circumstances do
you think such a predictor might prove effective?

SOLUTION

The predictor will not change direction when fully saturated until four consecutive wrong decisions have been
made.

60. Given the branch sequence TTTTNTTNNTTTTTNNNNNNNNNTNTTTTTTTTTTTT and assuming that the 3‐bit
saturating predictor starts in its saturated T state, what will the predicted sequence be?

SOLUTION

Input
T T T T N T T N N T T T T T N N N N N N N N N T N T T T T T T T T T T T T
State
S7 S7 S7 S7 S6 S7 S7 S6 S5 S6 S7 S7 S7 S7 S6 S5 S4 S3 S2 S1 S0 S0 S0 S1 S0 S1 S2 S3 S4 S5 S6 S7 S7 S7 S7 S7 S7
Predict
T T T T T T T T T T T T T T T T T N N N N N N N N N N N T T T T T T T T T
Outcome (c = correct, w = wrong)
C C C W C C W W C C C C C W W W W C C C C C W C W W W W C C C C C C C C C
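The prediction sequence can be generated by simulating the counter. The sketch below assumes states 0-7 (S0-S7) with taken predicted for states 4-7 and the counter starting saturated at S7:

```python
def predictions_3bit(outcomes, state=7):
    # 3-bit saturating counter, states 0-7; predict taken when state >= 4.
    # Returns the prediction made before each branch is resolved.
    preds = []
    for taken in outcomes:
        preds.append("T" if state >= 4 else "N")
        state = min(7, state + 1) if taken else max(0, state - 1)
    return "".join(preds)

seq = "TTTTNTTNNTTTTTNNNNNNNNNTNTTTTTTTTTTTT"   # the 37 outcomes above
preds = predictions_3bit(c == "T" for c in seq)
assert preds == "T" * 18 + "N" * 11 + "T" * 8
```

Under these assumptions the predictor mispredicts 12 of the 37 branches, switching to predicting not taken only after the long run of N outcomes drags the counter below S4.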

61. The following code is executed by an ARM processor:

MOV r0,#4
B1 MOV r2,#5
SUB r2,r2,r0
B2 SUBS r2,r2,#1
BNE B2 ;Branch 1
SUBS r0,r0,#1
BNE B1 ;Branch 2

Assume that a 1‐bit branch predictor is used for both branch 1 and branch 2 and that both predictors are initially
set to N. Complete the following table by running through this code.

Branch 1 Branch 2
Cycle Branch prediction Branch outcome Cycle Branch prediction Branch outcome
1 N N 1 N T
2 2
3 3
4 4
5
6
7
8
9
10

Repeat the same exercise with the same initial conditions but assume a 2‐bit saturating counter branch predictor.

SOLUTION

1-bit predictor (the prediction is simply the last outcome):

Branch 1 Branch 2
Cycle Branch prediction Branch outcome Cycle Branch prediction Branch outcome
1 N N 1 N T
2 N T 2 T T
3 T N 3 T T
4 N T 4 T N
5 T T
6 T N
7 N T
8 T T
9 T T
10 T N

2-bit saturating counter (assuming both counters start in the strongly not-taken state):

Branch 1 Branch 2
Cycle Branch prediction Branch outcome Cycle Branch prediction Branch outcome
1 N N 1 N T
2 N T 2 N T
3 N N 3 T T
4 N T 4 T N
5 N T
6 T N
7 N T
8 T T
9 T T
10 T N
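Both predictors can be simulated with a short Python sketch. The outcome streams follow from tracing the loop: branch 1 (BNE B2) executes 1 + 2 + 3 + 4 = 10 times as r2 counts down from 1, 2, 3, and 4, while branch 2 (BNE B1) executes 4 times; the 2-bit counter is assumed to start in the strongly not-taken state.

```python
# Outcome streams obtained by tracing the ARM loop.
branch1 = "NTNTTNTTTN"
branch2 = "TTTN"

def run_1bit(outcomes):
    pred, last = [], False            # initially predicts not taken
    for o in outcomes:
        pred.append("T" if last else "N")
        last = o == "T"               # 1-bit: remember the last outcome
    return "".join(pred)

def run_2bit(outcomes):
    pred, state = [], 0               # 0 = strongly not taken
    for o in outcomes:
        pred.append("T" if state >= 2 else "N")
        state = min(state + 1, 3) if o == "T" else max(state - 1, 0)
    return "".join(pred)

print(run_1bit(branch1), run_1bit(branch2))   # NNTNTTNTTT NTTT
print(run_2bit(branch1), run_2bit(branch2))   # NNNNNTNTTT NNTT
```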
62. A processor executes all non‐branch instructions in one cycle. This processor implements branch prediction,
which incurs an additional penalty of 2 cycles if the prediction is correct and 4 cycles if the prediction is incorrect.

a. If conditional branch instructions occupy 15% of the instruction stream, and the probability of an incorrect
branch prediction is 20%, what is the average number of cycles per instruction?
b. If the same processor is to run no more than 28% slower than a machine with a zero branch penalty when up
to 20% of the instructions are conditional branches, what level of accuracy must the branch prediction
achieve on average?

SOLUTION

a. CPI = non‐branch cycles + branch cycles (correct prediction) + branch cycles (incorrect prediction)
= 0.85 × 1 + 0.15(0.80 × 2 + 0.20 × 4) = 0.85 + 0.15(2.4) = 0.85 + 0.36 = 1.21 CPI

b. A machine with a zero branch penalty runs at 1.0 CPI.

CPI = 0.80 + 0.20(Pc × 2 + (1 - Pc) × 4) = 0.80 + 0.20(4 - 2Pc) = 0.80 + 0.80 - 0.4Pc = 1.60 - 0.4Pc.
This must be no more than 28% greater than the 1.0 CPI of a machine with no branch penalties; that is, 1.60 -
0.4Pc = 1.28 and Pc = (1.60 - 1.28)/0.40 = 0.80; that is, the branch prediction must be at least 80% accurate.
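The arithmetic can be checked with a few lines of Python, using the manual's convention that a correctly predicted branch costs 2 cycles in total and a mispredicted one 4 cycles:

```python
# Part (a): weighted sum over non-branch and branch instructions.
cpi_a = 0.85 * 1 + 0.15 * (0.80 * 2 + 0.20 * 4)
print(round(cpi_a, 2))            # 1.21

# Part (b): solve 0.80 + 0.20*(Pc*2 + (1 - Pc)*4) = 1.28 for Pc.
pc = (1.60 - 1.28) / 0.40
print(round(pc, 2))               # 0.8 -> 80% accuracy required
```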

63. A computer has a branch target buffer, BTB. Derive an expression for the average branch penalty if:

• a branch not in the BTB that is not taken incurs a penalty of 0 cycles
• a branch not in the BTB that is taken incurs a penalty of 6 cycles
• a branch in the BTB that is not taken incurs a penalty of 4 cycles
• a branch in the BTB that is taken incurs a penalty of 0 cycles
• the probability that a branch instruction is cached in the BTB is 80%
• the probability that a branch not in the BTB is taken is 20%
• the probability that a branch in the BTB is taken is 90%

SOLUTION

We have to add up the penalties for all four outcomes:

Not in BTB, not taken 20% × 80% × 0 = 0.00
Not in BTB, taken 20% × 20% × 6 = 0.24
In BTB, not taken 80% × 10% × 4 = 0.32
In BTB, taken 80% × 90% × 0 = 0.00
Total = 0.56 cycles
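The expectation can be written out directly as probability × penalty summed over the four cases (a minimal sketch; variable names are ours):

```python
# Expected branch penalty for the BTB of question 63.
p_btb, p_taken_out, p_taken_in = 0.80, 0.20, 0.90

penalty = ((1 - p_btb) * (1 - p_taken_out) * 0    # not in BTB, not taken
           + (1 - p_btb) * p_taken_out * 6        # not in BTB, taken
           + p_btb * (1 - p_taken_in) * 4         # in BTB, not taken
           + p_btb * p_taken_in * 0)              # in BTB, taken
print(round(penalty, 2))   # 0.56
```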

64. A RISC processor implements a subroutine call using a link register (i.e., the return address is saved in the link
register). The cost of a call is 2 cycles and the return costs 1 cycle. If a subroutine is called from another subroutine
(i.e., the subroutine is nested), the contents of the link register must be saved and later restored. The cost of
saving the link register is 6 cycles and the cost of restoring the link register is 8 cycles. Assume that a certain
instruction mix contains 20% subroutine calls and returns (i.e., 10% calls, 10% returns). The probability of a single
subroutine call and return without nesting is 60%. The probability that a subroutine call will be followed by a
single nested call is 40%. Assume that the probability of further nesting is vanishingly small. What is the overall
cost of subroutine calls? The average cost of all other instructions is 1.5 cycles. What is the average number of
cycles per instruction?

SOLUTION

There are five possibilities: an instruction is not a subroutine call or return, it is a single call, it is a nested call, it is
a single return, or it is a nested return. Note that a nested call or return incurs the unnested call/return cost plus
the extra save/restore time. The probabilities and costs are:

Not subroutine 80% × 1.5 cycles 1.20
Subroutine call (not nested) 10% × 60% × 2 cycles 0.12
Subroutine call (nested) 10% × 40% × (2 + 6) cycles 0.32
Subroutine return (not nested) 10% × 60% × 1 cycle 0.06
Subroutine return (nested) 10% × 40% × (1 + 8) cycles 0.36
Average 2.06 cycles
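The weighted average above can be reproduced with a short sketch:

```python
# Weighted-average cycles per instruction over the five classes.
cases = [
    (0.80,        1.5),     # not a subroutine call or return
    (0.10 * 0.60, 2),       # call, not nested
    (0.10 * 0.40, 2 + 6),   # nested call: call plus link-register save
    (0.10 * 0.60, 1),       # return, not nested
    (0.10 * 0.40, 1 + 8),   # nested return: return plus link-register restore
]
cpi = sum(p * cycles for p, cycles in cases)
print(round(cpi, 2))   # 2.06
```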

65. Why is the literal in the op‐code sign‐extended before use (in most computer architectures)?

SOLUTION

Literals in instructions are invariably shorter than the register size of the computer; for example, a 32-bit
processor might have a 16-bit literal and 32-bit registers. When the literal is loaded into the low-order bits of a
register, the upper-order bits must either be cleared, left unchanged, or used to extend the loaded value to the
full length of the register (i.e., sign extension). Since many computer instructions operate on signed values or
address offsets, it makes sense to sign-extend an operand when it is loaded. Some processors, like the 68K,
have separate address (pointer) and general-purpose data registers. Values loaded into address registers are always
sign-extended, whereas those loaded into data registers are not.
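The effect of sign extension can be illustrated with a small helper (hypothetical code, not from the text; Python integers are unbounded, so the sign bit has to be handled explicitly):

```python
# Sign-extend a 16-bit literal to the full register width.
def sign_extend_16(value):
    value &= 0xFFFF              # keep only the low 16 bits
    if value & 0x8000:           # sign bit set?
        value -= 0x10000         # reinterpret as a negative number
    return value

print(sign_extend_16(0x7FFF))                     # 32767
print(hex(sign_extend_16(0x8001) & 0xFFFFFFFF))   # 0xffff8001
```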

66. Why is the address offset shifted two places left in branch/jump operations in 32‐bit RISC‐like processors?

SOLUTION

Typical processors have 32-bit (four-byte) instructions, yet memory is byte addressed; that is, consecutive
instruction words have the hexadecimal addresses 0, 4, 8, C, 10, 14, … The address bus itself can access any byte
location; for example, address 0xABC3, which is not word-aligned. Because the two lowest bits of a word-aligned
address are always zero, there is no point in storing them when an address is stored in an instruction as an
offset; for example, if the address offset is xxxxxxxx00, it is stored as xxxxxxxx. Consequently, when used it must
be shifted left by two places to regenerate xxxxxxxx00. Doing this extends the effective size of the literal by two bits.
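A sketch of the target calculation (the 24-bit offset field and the addresses are illustrative, not any particular ISA's exact encoding):

```python
# Form a branch target: sign-extend the stored offset field, shift it
# left two places to restore the implicit 00, and add it to the PC.
def branch_target(pc, stored_offset, bits=24):
    if stored_offset & (1 << (bits - 1)):   # sign-extend the field
        stored_offset -= 1 << bits
    return (pc + (stored_offset << 2)) & 0xFFFFFFFF

print(hex(branch_target(0x8000, 0x10)))            # 0x8040: forward 16 words
print(hex(branch_target(0x8000, (1 << 24) - 1)))   # 0x7ffc: back one word
```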
