answer practice set 2
Q.13 A block-set-associative cache consists of a total of 64 blocks divided into 4-block sets. The main memory contains 4096 blocks, each consisting of 128 words. a) Calculate the number of bits in a main memory address. b) Calculate the number of bits in each of the TAG, SET and WORD fields.
a) To calculate the number of bits in a main memory address, we need to know the total number of
words in the main memory. Since each block has 128 words and there are 4096 blocks, the total
number of words is 128 x 4096 = 524288. To represent this number in binary, we need log2(524288)
= 19 bits. Therefore, the main memory address has 19 bits.
b) To calculate the number of bits in each of the TAG, SET and WORD fields, we need to know how
the cache is organized. Since the cache has 64 blocks divided into 4-block sets, there are 64 / 4 = 16
sets. To identify the set number, we need log2(16) = 4 bits, so the SET field has 4 bits. Since each
block has 128 words, to identify the word number we need log2(128) = 7 bits, so the WORD field has
7 bits. The remaining bits are used for the tag: the TAG field has 19 - 4 - 7 = 8 bits.
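The field widths can be checked with a short sketch, assuming the cache holds 64 blocks organized as 4-block sets (the only cache size consistent with a 19-bit address here):

```python
import math

# Assumed spec: 64-block cache in 4-block sets; 4096 main-memory
# blocks of 128 words each.
total_words = 4096 * 128
address_bits = int(math.log2(total_words))      # 19-bit address
word_bits = int(math.log2(128))                 # 7 bits to pick a word in a block
num_sets = 64 // 4                              # 16 sets
set_bits = int(math.log2(num_sets))             # 4 bits to pick a set
tag_bits = address_bits - set_bits - word_bits  # 8 bits left for the tag

print(address_bits, tag_bits, set_bits, word_bits)  # 19 8 4 7
```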
Q.14 A computer uses RAM chips of 1024x1 capacity. a) How many chips are needed to
provide a memory capacity of 1024 bytes? b) How many chips are needed to provide a
memory capacity of 16K bytes?
a) To provide a memory capacity of 1024 bytes, we need 1024 x 8 bits of storage. Since each
RAM chip has 1024 x 1 bits of capacity, we need 8 chips to store 1024 bytes. Each chip will
have the same address lines and output one bit of data.
b) To provide a memory capacity of 16K bytes, we need 16K x 8 bits of storage. Since each
RAM chip has 1024 x 1 bits of capacity, we need 128 chips to store 16K bytes. Each chip
will have the same address lines and output one bit of data.
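The chip counts follow directly from dividing total bits by bits per chip; a minimal sketch (the function name is ours):

```python
def chips_needed(capacity_bytes, chip_words=1024, chip_width=1):
    """How many chip_words x chip_width RAM chips cover capacity_bytes."""
    total_bits = capacity_bytes * 8
    return total_bits // (chip_words * chip_width)

print(chips_needed(1024))       # 8 chips for 1024 bytes
print(chips_needed(16 * 1024))  # 128 chips for 16K bytes
```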
Q.15 A non-pipeline system takes 50 ns to process a task. The same task can be processed in
six-segment pipeline with a clock cycle of 10 ns. Determine the speedup ratio of the pipeline
for 100 tasks. What is the maximum speed up that can be achieved?
a) To determine the speedup ratio of the pipeline for 100 tasks, we need to compare the
execution time of the non-pipeline system and the pipeline system. The execution time of the
non-pipeline system is simply the product of the number of tasks and the time per task, which
is 100 x 50 ns = 5000 ns. The execution time of the pipeline system for n tasks is (k + n - 1)
clock cycles, where k is the number of segments: the first task needs one clock cycle per
segment to fill the pipeline, and each of the remaining n - 1 tasks completes one clock cycle
later. Filling the pipeline takes 6 x 10 ns = 60 ns, and the remaining 99 tasks take 99 x 10 ns
= 990 ns. Therefore, the execution time of the pipeline system is 60 ns + 990 ns = 1050 ns. The
speedup ratio is the ratio of the execution time of the non-pipeline system to the execution
time of the pipeline system, which is 5000 ns / 1050 ns = 4.76.
b) To determine the maximum speed up that can be achieved, we need to assume that there is
no latency in the pipeline system, which means that the pipeline is always full and there is no
overhead. In this ideal case, the execution time of the pipeline system is the product of the
number of tasks and the clock cycle, which is 100 x 10 ns = 1000 ns. The maximum speed up
is the ratio of the execution time of the non-pipeline system to the execution time of the ideal
pipeline system, which is 5000 ns / 1000 ns = 5. Equivalently, the maximum speedup is
t_n / t_p = 50 ns / 10 ns = 5. Note that this equals the number of segments k only when
t_n = k x t_p; here 50 ns < 6 x 10 ns = 60 ns, so the maximum speedup is 5, not the 6 segments.
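The speedup formula S = (n * t_n) / ((k + n - 1) * t_p) used above can be sketched as:

```python
def speedup(n_tasks, t_nonpipe, k_segments, t_clock):
    """Speedup of a k-segment pipeline over a non-pipelined unit."""
    t_seq = n_tasks * t_nonpipe                    # 100 * 50 ns = 5000 ns
    t_pipe = (k_segments + n_tasks - 1) * t_clock  # (6 + 99) * 10 ns = 1050 ns
    return t_seq / t_pipe

s_100 = speedup(100, 50, 6, 10)  # ~4.76 for 100 tasks
s_max = 50 / 10                  # limit as n_tasks grows: t_n / t_p = 5.0
```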
Q.16 A computer has 16 MB main memory and 64 KB cache. The block size is 16 bytes.
Determine the number of cache lines the computer has and the number of blocks the main
memory has. Explain how a given address is retrieved from the memory system.
a) To determine the number of cache lines and the number of blocks, we need to divide the
total size of the cache and the main memory by the block size. The number of cache lines is
64 KB / 16 bytes = 4096. The number of blocks is 16 MB / 16 bytes = 1048576.
b) To explain how a given address is retrieved from the memory system, we need to know
how the cache is mapped. There are three different types of mapping: direct, associative, and
set-associative. In direct mapping, each block of main memory is mapped to only one cache
line. In associative mapping, each block of main memory can be mapped to any cache line. In
set-associative mapping, each block of main memory is mapped to a subset of cache lines,
called a set. Each cache line has a tag field, which stores the address of the block it contains,
and a data field, which stores the actual data. A given address is divided into three parts: the
tag, the set (or line), and the word (or offset). The tag is used to identify the block, the set (or
line) is used to locate the cache line (or set), and the word (or offset) is used to access the data
within the block. The cache controller compares the tag of the given address with the tag of
the cache line (or set) to determine if there is a cache hit or a cache miss. If there is a cache
hit, the data is read from the cache. If there is a cache miss, the block is fetched from the
main memory and stored in the cache, replacing an existing block if necessary.
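As a concrete check of part (a), and of the address split described in part (b), here is a sketch that assumes direct mapping (one of the three mappings, chosen for illustration):

```python
import math

MAIN = 16 * 2**20   # 16 MB main memory
CACHE = 64 * 2**10  # 64 KB cache
BLOCK = 16          # bytes per block

cache_lines = CACHE // BLOCK   # 4096 lines
memory_blocks = MAIN // BLOCK  # 1048576 blocks

# Direct-mapped split of a 24-bit address (16 MB address space):
offset_bits = int(math.log2(BLOCK))      # 4 bits: byte within the block
line_bits = int(math.log2(cache_lines))  # 12 bits: which cache line
tag_bits = 24 - line_bits - offset_bits  # 8 bits: which block maps here
```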
Q.17 Consider a direct mapped cache of size 32 KB with block size 32 bytes. The CPU
generates 32-bit addresses. Find the number of bits needed for cache indexing and the
number of tag bits.
a) To find the number of bits needed for cache indexing, we need to divide the total size of
the cache by the block size. The number of cache lines is 32 KB / 32 bytes = 1024. To
identify the cache line number, we need log2(1024) = 10 bits. The cache index has 10 bits.
b) To find the number of tag bits, we need to subtract the number of bits for cache indexing
and the number of bits for block offset from the total number of bits in the address. The
number of bits for block offset is log2(32) = 5 bits, since each block has 32 bytes. The
number of tag bits is 32 - 10 - 5 = 17 bits. The cache tag has 17 bits.
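The same arithmetic as a short sketch:

```python
import math

cache_size = 32 * 2**10  # 32 KB direct-mapped cache
block_size = 32          # bytes per block
addr_bits = 32           # CPU address width

index_bits = int(math.log2(cache_size // block_size))  # 10 (1024 lines)
offset_bits = int(math.log2(block_size))               # 5
tag_bits = addr_bits - index_bits - offset_bits        # 17
```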
Q.18 Formulate the logical and physical address formats for the following specifications: The
logical address space in a computer system consists of 128 segments. Each segment can have
up to 32 pages of 4K words in each. Physical memory consists of 4K blocks of 4K words in
each.
A.18 The logical address must identify one of 128 segments (log2(128) = 7 bits), one of 32
pages within the segment (log2(32) = 5 bits), and one of 4K words within the page
(log2(4096) = 12 bits), so the logical address is 7 + 5 + 12 = 24 bits, laid out as
| segment (7) | page (5) | word (12) |. The physical address must identify one of 4K blocks
(12 bits) and one of 4K words within the block (12 bits), so it is also 24 bits, laid out as
| block (12) | word (12) |.
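The bit widths follow from taking log2 of each count given in the question; a sketch:

```python
import math

seg_bits = int(math.log2(128))        # 7 bits for the segment number
page_bits = int(math.log2(32))        # 5 bits for the page within a segment
word_bits = int(math.log2(4 * 1024))  # 12 bits for the word within a page
logical_bits = seg_bits + page_bits + word_bits  # 24-bit logical address

block_bits = int(math.log2(4 * 1024))  # 12 bits for the physical block
physical_bits = block_bits + word_bits # 24-bit physical address
```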
Q.19 Explain the concept of Arithmetic Pipeline with the help of a numerical example
A.19 Arithmetic Pipeline is a technique that divides an arithmetic problem into subproblems
that can be executed in different pipeline segments. This allows for parallel processing of
multiple instructions and faster computation, which improves the performance of the system.
For example, consider the floating-point addition X + Y, where X = 0.3214 * 10^3 and
Y = 0.4500 * 10^1.
To perform this operation using an arithmetic pipeline, we can divide it into four segments:
Segment 1: Compare the exponents by subtraction. The larger exponent is chosen as the
exponent of the result. The difference of the exponents determines how many times the
mantissa associated with the smaller exponent must be shifted to the right.
Difference = 3 - 1 = 2
Segment 2: Align the mantissas by shifting the smaller one to the right according to the
difference of exponents.
X = 0.3214 * 10^3, Y = 0.0045 * 10^3
Segment 3: Add the mantissas.
Z = (0.3214 + 0.0045) * 10^3 = 0.3259 * 10^3
Segment 4: Normalize the result by adjusting the exponent and the mantissa so that the
mantissa lies between 0.1 and 1. Here 0.3259 already lies in that range, so no adjustment is
needed.
Z = 0.3259 * 10^3
This is the final result of the floating-point addition using an arithmetic pipeline.
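The four segments can be simulated step by step on (mantissa, exponent) pairs. This is a sketch of the data flow, not the parallel hardware, and the input values are illustrative:

```python
def fp_add(mx, ex, my, ey):
    """Four-segment decimal floating-point add on (mantissa, exponent) pairs."""
    # Segment 1: compare exponents; keep the larger as the result exponent.
    if ex < ey:
        mx, ex, my, ey = my, ey, mx, ex
    diff = ex - ey
    # Segment 2: align the smaller mantissa by shifting it right diff places.
    my = my / 10**diff
    # Segment 3: add the mantissas.
    mz, ez = mx + my, ex
    # Segment 4: normalize so the mantissa lies in [0.1, 1).
    while mz >= 1:
        mz, ez = mz / 10, ez + 1
    while 0 < mz < 0.1:
        mz, ez = mz * 10, ez - 1
    return round(mz, 4), ez

print(fp_add(0.3214, 3, 0.4500, 1))  # (0.3259, 3)
```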
Q.20 Explain the concept of a six-segment instruction pipeline.
A.20 A six-segment instruction pipeline is a technique that divides an instruction cycle into
six stages, each of which can process a different instruction simultaneously. This improves
the performance of the computer. The six stages are:
Fetch instruction (FI): This stage fetches the instruction from the memory into a buffer.
Decode instruction (DI): This stage decodes the instruction and determines the opcode and
operands.
Calculate operand (CO): This stage calculates the effective address of the operands, if
needed.
Fetch operand (FO): This stage fetches the operands from the memory or registers, if needed.
Execute instruction (EI): This stage performs the arithmetic or logical operation on the
operands.
Write operand (WO): This stage writes the result of the operation to the memory or registers.
In such a design, the six stages are connected in sequence with a buffer between each pair of
stages, so up to six instructions can occupy different stages in the same clock cycle.
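A simple way to see the overlap is to tabulate which clock cycle each instruction spends in each stage, assuming hazard-free execution and one cycle per stage:

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def schedule(n_instructions):
    """Cycle number in which instruction i occupies each stage."""
    return [{stage: i + k + 1 for k, stage in enumerate(STAGES)}
            for i in range(n_instructions)]

timing = schedule(3)
print(timing[0]["WO"])  # first instruction completes in cycle 6
print(timing[2]["WO"])  # third completes in cycle 8, not 3 * 6 = 18
```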
Q.21 Design parallel priority interrupt hardware for a system with four interrupt sources
using priority encoder.
A.21 Parallel priority interrupt is a hardware-based method that enables multiple devices to
generate interrupts simultaneously and allows the processor to handle them in parallel. Each
device is assigned a priority level.
To design parallel priority interrupt hardware for a system with four interrupt sources using
priority encoder, we can use the following steps:
Connect the interrupt request lines of the four devices to the inputs of a 4-to-2 priority
encoder. The priority encoder will output a 2-bit code that represents the highest priority
interrupt among the four inputs. For example, if we assign the priority as I0 > I1 > I2 > I3,
then the output code will be:
I0 I1 I2 I3 | Y1 Y0 | V
1  x  x  x  |  0  0 | 1
0  1  x  x  |  0  1 | 1
0  0  1  x  |  1  0 | 1
0  0  0  1  |  1  1 | 1
0  0  0  0  |  x  x | 0
(x = don't care; the full 16-row table collapses to these five rows. V is the interrupt-status
output: V = 0 means no interrupt is pending, which distinguishes the all-zero input from an
I0 request, since both would otherwise produce the code 00.)
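In behavioral terms the encoder is a priority scan from I0 down to I3; a toy model (the valid output V is included so the all-zero input is distinguishable from an I0 request):

```python
def priority_encode(i0, i1, i2, i3):
    """4-to-2 priority encoder, I0 highest priority. Returns (V, code)."""
    for code, line in enumerate((i0, i1, i2, i3)):
        if line:
            return 1, code  # code 0 -> I0 ... code 3 -> I3
    return 0, 0             # V = 0: no interrupt pending

print(priority_encode(0, 0, 1, 1))  # (1, 2): I2 outranks I3
print(priority_encode(1, 1, 1, 1))  # (1, 0): I0 outranks everything
```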
Q.22 Explain the Direct Memory Access (DMA) techniques used for data transfer in a
computer system in detail.
A.22 Direct Memory Access (DMA) is a technique that allows input/output (I/O) devices to
transfer data directly to or from the main memory, without involving the central processing
unit (CPU) in every word transferred; the CPU only sets up the transfer and is notified when
it completes. This improves system performance and efficiency.
There are different DMA techniques used for data transfer in a computer system, such as:
Bus Master DMA: In this technique, the I/O device has direct control over the system bus and
can access the memory independently of the CPU. The I/O device initiates the DMA transfer
by sending a request to the bus arbiter, which grants the bus to the device. The device then
transfers the data to or from the memory.
Scatter/Gather DMA: In this technique, the I/O device can transfer data to multiple non-
contiguous memory locations in a single DMA operation. The CPU sets up a list of memory
addresses and data lengths in a table, which is stored in the memory. The I/O device reads the
table and performs the DMA transfer according to the list. This technique is useful for
applications that scatter data to, or gather data from, multiple non-contiguous memory regions.
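A scatter/gather descriptor list is just a table of (address, length) pairs; here is a toy model of the device-side copy (the function name and layout are ours, and a real engine works on physical addresses, not Python buffers):

```python
def scatter_dma(device_data, descriptors, memory):
    """Write one device buffer into the non-contiguous regions listed
    in descriptors, as a scatter/gather DMA engine would."""
    pos = 0
    for addr, length in descriptors:
        memory[addr:addr + length] = device_data[pos:pos + length]
        pos += length

memory = bytearray(24)
scatter_dma(b"HEADERBODY", [(0, 6), (16, 4)], memory)
print(bytes(memory[0:6]), bytes(memory[16:20]))  # b'HEADER' b'BODY'
```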
Common Buffer System DMA: In this technique, the I/O device and the CPU share a
common buffer in the memory, which is used for the DMA transfer. The I/O device transfers
data to or from the buffer, while the CPU transfers data to or from the buffer and the final
destination or source in the memory. This technique avoids the direct conflict between the
I/O device and the CPU for the memory access.
Q.23 Differentiate between the following: a) RISC vs CISC Processor b) Hardwired Control
Unit vs Micro-programmed Control Unit c) Vectored and Non-vectored Interrupts d) Write-
Through and Write-Back policy of Cache
a) RISC vs CISC Processor
RISC stands for Reduced Instruction Set Computer, while CISC stands for Complex
Instruction Set Computer.
RISC processors use a smaller and simpler set of instructions that can be executed faster and
more efficiently, while CISC processors use a larger and more complex set of instructions
that can perform multiple operations in one instruction.
RISC processors have more general-purpose registers and simpler addressing modes, while
CISC processors have fewer registers and more complex addressing modes.
RISC processors are easier to design and manufacture, but require more instructions to
perform complex tasks, while CISC processors are more difficult to design and manufacture,
but require fewer instructions to perform complex tasks.
RISC processors are suitable for applications that require high performance and low power
consumption while CISC processors are suitable for applications that require compatibility
and flexibility.
b) Hardwired Control Unit vs Micro-programmed Control Unit
A hardwired control unit uses a fixed network of logic gates to generate the control signals
for each instruction, while a micro-programmed control unit uses microcode stored in a
control memory to generate the control signals for each instruction.
A hardwired control unit is faster and simpler, but less flexible and more difficult to modify,
while a micro-programmed control unit is slower and more complex, but more flexible and
easier to modify.
A hardwired control unit is suitable for simple and fixed instruction sets, such as RISC
architectures, while a micro-programmed control unit is suitable for complex and variable
instruction sets, such as CISC architectures.
c) Vectored and Non-vectored Interrupts
A vectored interrupt is an interrupt that has a fixed vector address, which is the starting
address of the ISR, while a non-vectored interrupt is an interrupt that does not have a fixed
vector address.
A vectored interrupt is faster and simpler to service, because the device supplies the ISR
address directly, but it requires more memory and hardware, while a non-vectored interrupt
is slower, because the processor must poll the devices or branch to a fixed location to find
the ISR, but it requires less memory and hardware.
d) Write-Through and Write-Back policy of Cache
A cache’s write policy determines when a write operation updates the main memory in
addition to updating the cache.
A write-through policy is a write policy that writes the data to both the cache and the main
memory simultaneously, while a write-back policy is a write policy that writes the data to
only the cache and marks it as dirty, and writes it to the main memory later when the cache
block is evicted.
A write-through policy provides better consistency and reliability, but lower performance and
higher power consumption, while a write-back policy provides better performance and lower
power consumption, but lower consistency and reliability.
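The behavioral difference between the two policies can be sketched with a toy cache over a dict-backed memory (class and method names are ours):

```python
class ToyCache:
    """Contrast write-through and write-back on a dict-backed memory."""
    def __init__(self, memory, policy):
        self.memory = memory   # address -> value
        self.lines = {}        # cached copies
        self.dirty = set()     # addresses modified but not yet written back
        self.policy = policy

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value  # memory updated on every write
        else:                          # "write-back"
            self.dirty.add(addr)       # defer the memory update

    def evict(self, addr):
        if addr in self.dirty:         # write-back: flush dirty data now
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

mem = {0: 0}
cache = ToyCache(mem, "write-back")
cache.write(0, 42)  # mem[0] is still 0: memory is stale until eviction
cache.evict(0)      # dirty block flushed: mem[0] becomes 42
```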