MODULE 4
The maximum size of the Main Memory (MM) that can be used in any computer is
determined by its addressing scheme. For example, a 16-bit computer that generates 16-bit
addresses is capable of addressing up to 2^16 = 64K memory locations. If a machine generates
32-bit addresses, it can access up to 2^32 = 4G memory locations. This number represents the
size of the address space of the computer.
Word address    Byte addresses
0               0    1    2    3
4               4    5    6    7
8               8    9    10   11
...             ...
With the above structure, a READ or WRITE may involve an entire memory word or
only a byte. In the case of a byte read, the other bytes may also be read but are ignored
by the CPU. However, during a write cycle, the control circuitry of the MM must ensure that
only the specified byte is altered. In this case, the higher-order 30 bits specify the word
and the lower-order 2 bits specify the byte within the word.
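The word/byte address split above can be sketched as a few lines of code (a minimal illustration assuming 4 bytes per word, as in the text; the function name is invented for this sketch):

```python
# Hypothetical sketch: splitting a byte address into a word address and a
# byte offset, assuming 32-bit (4-byte) words as described in the text.

def split_address(addr):
    word = addr >> 2      # higher-order 30 bits select the word
    byte = addr & 0b11    # lower-order 2 bits select the byte within the word
    return word, byte

# Byte address 11 lies in word 2 (starting at byte 8), at byte offset 3.
print(split_address(11))  # (2, 3)
```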
Memory Access Time :-
It is a useful measure of the speed of the memory unit. It is the time that elapses
between the initiation of an operation and the completion of that operation (for example,
the time between a READ request and the arrival of MFC).
Memory Cycle Time :-
It is an important measure of the memory system. It is the minimum time delay
required between the initiations of two successive memory operations (for example, the
time between two successive READ operations). The cycle time is usually slightly longer
than the access time.
RAM: A memory unit is called a Random Access Memory if any location can be
accessed for a READ or WRITE operation in some fixed amount of time that is
independent of the location's address. Main memory units are of this type. This
distinguishes them from serial or partly serial access storage devices such as magnetic
tapes and disks which are used as the secondary storage device.
Cache Memory:-
The CPU of a computer can usually process instructions and data faster than they can be
fetched from a comparably priced main memory unit. Thus the memory cycle time becomes
the bottleneck in the system. One way to reduce the memory access time is to use cache
memory. This is a small and fast memory that is inserted between the larger, slower main
memory and the CPU. This holds the currently active segments of a program and its data.
Because of the locality of address references, the CPU can, most of the time, find the
relevant information in the cache memory itself (a cache hit) and only infrequently needs
access to the main memory (a cache miss). With a suitable size of cache memory, cache hit
rates of over 90% are possible, leading to a cost-effective increase in the performance of
the system.
Memory Interleaving: -
This technique divides the memory system into a number of memory modules and
arranges addressing so that successive words in the address space are placed in different
modules. When requests for memory access involve consecutive addresses, the access
will be to different modules. Since parallel access to these modules is possible, the
average rate of fetching words from the Main Memory can be increased.
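The interleaved placement described above can be sketched as follows (a minimal illustration with an assumed count of 4 modules; the function name is invented):

```python
# Sketch of low-order interleaving: consecutive word addresses are placed
# in different modules, so they can be accessed in parallel.

NUM_MODULES = 4   # assumed module count for illustration

def module_of(word_addr):
    module = word_addr % NUM_MODULES    # which module holds the word
    offset = word_addr // NUM_MODULES   # position of the word inside that module
    return module, offset

# Addresses 0..3 land in modules 0..3, so a 4-word burst touches all modules.
for a in range(4):
    print(a, module_of(a))
```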
Virtual Memory: -
In a virtual memory System, the address generated by the CPU is referred to as a virtual
or logical address. The corresponding physical address can be different and the required
mapping is implemented by a special memory control unit, often called the memory
management unit. The mapping function itself may be changed during program execution
according to system requirements.
Because of the distinction made between the logical (virtual) address space and the
physical address space; while the former can be as large as the addressing capability of
the CPU, the actual physical memory can be much smaller. Only the active portion of the
virtual address space is mapped onto the physical memory and the rest of the virtual
address space is mapped onto the bulk storage device used. If the addressed information
is in the Main Memory (MM), it is accessed and execution proceeds. Otherwise, an
exception is generated, in response to which the memory management unit transfers a
contiguous block of words containing the desired word from the bulk storage unit to the
MM, displacing some block that is currently inactive. If the memory is managed in such a
way that such transfers are required relatively infrequently (i.e., the CPU will generally
find the required information in the MM), the virtual memory system can provide
reasonably good performance and succeed in creating the illusion of a large memory with
a small, inexpensive MM.
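The translation step performed by the memory management unit can be sketched as below (a toy model: the page size, the page-table contents, and all names are illustrative, not from the text):

```python
# Toy model of virtual-to-physical address mapping. A missing entry stands
# for information that is still on the bulk storage device; the exception
# models the case where the MMU must bring the block into the MM.

PAGE_SIZE = 4096                 # assumed block (page) size
page_table = {0: 5, 1: 9}        # virtual page -> physical frame (illustrative)

def translate(vaddr):
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in page_table:
        # exception: block must be transferred from bulk storage to the MM
        raise LookupError("page fault")
    return page_table[vpage] * PAGE_SIZE + offset

print(translate(4100))  # virtual page 1, offset 4 -> 9*4096 + 4 = 36868
```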
Memory cells are usually organized in the form of an array, in which each cell is
capable of storing one bit of information. Each row of cells constitutes a memory word,
and all cells of a row are connected to a common line referred to as the word line, which
is driven by the address decoder on the chip. The cells in each column are connected to a
Sense/Write circuit by two bit lines. The Sense/Write circuits are connected to the data
I/O lines of the chip. During the read operation, these circuits sense, or read, the
information stored in the cells selected by a word line and transmit this information to the
output data lines. During the write operation, the Sense/Write circuits receive the input
information and store it in the cells of the selected word.
The above figure is an example of a very small memory chip consisting of 16 words of 8
bits each. This is referred to as a 16×8 organization. The data input and the data output of
each Sense/Write circuit are connected to a single bidirectional data line that can be
connected to the data bus of a computer. There are two control lines: the R/W (Read/Write)
input specifies the required operation, and the CS (Chip Select) input selects a given chip in a
multichip memory system.
The memory circuit given above stores 128 bits and requires 14 external connections for
address, data and control lines. Of course, it also needs two lines for power supply and
ground connections. Consider now a slightly larger memory circuit, one that has 1K
(1024) memory cells. For a 1K×1 memory organization, the representation is given next.
The required 10-bit address is divided into two groups of 5 bits each to form the row and
column addresses for the cell array. A row address selects a row of 32 cells, all of which
are accessed in parallel. However, according to the column address, only one of these
cells is connected to the external data line by the output multiplexer and input de-
multiplexer.
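The row/column split described above can be sketched as follows (a minimal illustration of the 5+5-bit decomposition; the function name is invented):

```python
# Sketch of the 10-bit address split for a 1K x 1 chip: the upper 5 bits
# select one of 32 rows, the lower 5 bits drive the column multiplexer.

def split_1k_address(addr):
    row = (addr >> 5) & 0b11111   # row address: selects a row of 32 cells
    col = addr & 0b11111          # column address: selects 1 of the 32 cells
    return row, col

print(split_1k_address(0b1010111001))  # row 0b10101 = 21, column 0b11001 = 25
```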
5.2.2 Static Memories
Memories that consist of circuits capable of retaining their state as long as power is
applied are known as static memories.
[Figure: SRAM cell – a latch with points X and Y, connected to the bit lines by transistors controlled by the word line.]
The above figure illustrates how a static RAM (SRAM) cell may be implemented. Two
inverters are cross-connected to form a latch. The latch is connected to two bit lines by
transistors T1 and T2. These transistors act as switches that can be opened or closed under
control of the word line. When the word line is at ground level, the transistors are turned
off and the latch retains its state. For example, let us assume that the cell is in state 1 if the
logic value at point X is 1 and at point Y is 0. This state is maintained as long as the signal
on the word line is at ground level.
Read Operation
In order to read the state of the SRAM cell, the word line is activated to close switches T1
and T2. If the cell is in state 1, the signal on the bit line b is high and the signal on the bit
line b' is low. The opposite is true if the cell is in state 0. Thus b and b' are complements
of each other. Sense/Write circuits at the end of the bit lines monitor the state of b and b'
and set the output accordingly.
Write Operation
The state of the cell is set by placing the appropriate value on bit line b and its
complement b', and then activating the word line. This forces the cell into the
corresponding state. The required signals on the bit lines are generated by the Sense/Write
circuit.
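The read and write behaviour described above can be sketched as a toy model (the class and its interface are invented for illustration; real cells are analog circuits, not objects):

```python
# Toy model of the SRAM cell: the latch holds the value at point X and its
# complement at point Y. The word line must be active for the bit lines
# (b, b') to observe or set the state; otherwise the latch is isolated.

class SRAMCell:
    def __init__(self):
        self.x = 0                     # state at point X (Y is its complement)

    def read(self, word_line):
        if not word_line:
            return None                # transistors off: bit lines isolated
        return self.x, 1 - self.x      # bit lines b and b' are complements

    def write(self, word_line, b):
        if word_line:
            self.x = b                 # latch forced into the driven state

cell = SRAMCell()
cell.write(word_line=1, b=1)
print(cell.read(word_line=1))  # (1, 0): cell in state 1, b high, b' low
```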
CMOS Cell
A sense amplifier connected to the bit line detects whether the charge stored on the
capacitor is above the threshold. If so, it drives the bit line to a full voltage that represents
logic value 1; this voltage recharges the capacitor to the full charge that corresponds to
logic value 1. If the sense amplifier detects that the charge is below the threshold, it pulls
the bit line down to ground level, ensuring that the capacitor has no charge, representing
logic value 0.
A 16-megabit DRAM chip, configured as 2M×8, is shown below.
◾ Each row can store 512 bytes. 12 bits to select a row, and 9 bits to select a group
in a row. Total of 21 bits.
• First apply the row address; the RAS signal latches the row address. Then apply the
column address; the CAS signal latches the column address.
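The 12+9-bit address split for this 2M×8 chip can be sketched as follows (the function name is invented for illustration):

```python
# Sketch of the 21-bit address split for the 2M x 8 DRAM: 12 bits select
# one of 4096 rows (latched by RAS), 9 bits select one of the 512 byte
# groups in that row (latched by CAS).

def dram_address(addr):
    row = addr >> 9          # 12-bit row address
    col = addr & 0x1FF       # 9-bit column address
    return row, col

print(dram_address(0x1FFFFF))  # last byte: row 4095, column 511
```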
In these DRAMs, operation is directly synchronized with a clock signal. The below
given figure indicates the structure of an SDRAM.
The above figure shows the timing diagram for a burst read of length 4.
• First, the row address is latched under control of the RAS signal.
• Then, the column address is latched under control of the CAS signal.
• After a delay of one clock cycle, the first set of data bits is placed on the
data lines.
• The SDRAM automatically increments the column address to access the next
three sets of bits in the selected row, which are placed on the data lines
in the next clock cycles.
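The automatic column increment during the burst can be sketched as below (a minimal illustration; the function name and default burst length are assumptions for this sketch):

```python
# Sketch of a burst read of length 4: the SDRAM latches one column address
# and internally increments it to produce the remaining transfers.

def burst_columns(start_col, burst_length=4):
    return [start_col + i for i in range(burst_length)]

print(burst_columns(20))  # [20, 21, 22, 23]: one latched address, 4 transfers
```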
◾ Memory latency is the time it takes to transfer a word of data to or from memory
◾ Memory bandwidth is the number of bits or bytes that can be transferred in one
second.
To assist the processor in accessing data at a high enough rate, the cell array is organized
in two banks. Each bank can be accessed separately. Consecutive words of a given block
are stored in different banks. Such interleaving of words allows simultaneous access to
two words that are transferred on the successive edges of the clock. This type of SDRAM
is called Double Data Rate SDRAM (DDR- SDRAM).
◾ Placing large memory systems directly on the motherboard will occupy a large
amount of space.
◾ Memory modules are an assembly of memory chips on a small board that plugs
vertically onto a single socket on the motherboard.
Recall that in a dynamic memory chip, to reduce the number of pins, multiplexed
addresses are used.
Refresh Operation:-
The Refresh control block periodically generates Refresh requests, causing the access
control block to start a memory cycle in the normal way. The access control block allows
the refresh operation by activating the Refresh Grant line. It arbitrates between Memory
Access requests and Refresh requests, giving priority to Refresh requests in the case of a
tie to ensure the integrity of the stored data.
As soon as the Refresh control block receives the Refresh Grant signal, it activates the
Refresh line. This causes the address multiplexer to select the Refresh counter as the
source and its contents are thus loaded into the row address latches of all memory chips
when the RAS signal is activated.
Data are written into a ROM at the time of manufacture. Programmable ROM
(PROM) devices allow the data to be loaded by the user. Programmability is achieved by
connecting a fuse between the emitter and the bit line. Thus, prior to programming, the
memory contains all 1s. The user can insert 0s at the required locations by burning out
the fuses at these locations using high-current pulses. This process is irreversible.
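The irreversibility of fuse programming can be sketched with bit operations (a toy model of an assumed 8-bit word; the function name is invented):

```python
# Toy model of PROM programming: a fresh device reads as all 1s, and
# programming can only burn fuses (clear bits to 0) - it can never
# restore a burned fuse back to 1, so the process is irreversible.

def program(word, zero_mask):
    return word & ~zero_mask & 0xFF   # each 1 in zero_mask burns one fuse

blank = 0b11111111                    # fresh PROM: all fuses intact
data = program(blank, 0b00100101)     # insert 0s at bit positions 0, 2, 5
print(format(data, '08b'))            # 11011010
```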
ROMs are attractive when high production volumes are involved. For smaller numbers,
PROMs provide a faster and considerably less expensive approach. Some chips allow
the stored data to be erased and new data to be loaded; such a chip is an erasable,
programmable ROM, usually called an EPROM. It provides considerable flexibility
during the development phase. An EPROM cell bears considerable resemblance to the
dynamic memory cell. As in the case of dynamic memory, information is stored in the
form of a charge on a capacitor. The main difference is that the capacitor in an EPROM
cell is very well insulated. Its rate of discharge is so low that it retains the stored
information for very long periods. Information is written by allowing charge to be stored
on the capacitor.
The contents of EPROM cells can be erased by increasing the discharge rate of the
storage capacitor by several orders of magnitude. This can be accomplished by allowing
ultraviolet light into the chip through a window provided for that purpose, or by the
application of a high voltage similar to that used in a write operation. If ultraviolet light is
used, all cells in the chip are erased at the same time. When electrical erasure is used,
however, the process can be made selective. An electrically erasable EPROM is often
referred to as an EEPROM. However, the circuit must now include high-voltage generation.
Some EEPROM chips incorporate the circuitry for generating these voltages on the chip
itself. Depending on the requirements, a suitable device can be selected.
Flash memory:
• Flash devices read the contents of a single cell, but write the contents of an entire
block of cells.
• Single flash chips are not sufficiently large, so larger memory modules are
implemented using flash cards and flash drives.
(Refer slides for point-wise notes on ROM and types of ROM)
Static RAM: Very fast, but expensive, because a basic SRAM cell has a complex circuit
making it difficult to pack a large number of cells onto a single chip.
Dynamic RAM: Simpler basic cell circuit, hence are much less expensive, but
significantly slower than SRAMs.
Magnetic disks: Storage provided by DRAMs is higher than SRAMs, but is still less than
what is necessary. Secondary storage such as magnetic disks provides a large amount of
storage, but is much slower than DRAMs.
Fastest access is to the data held in processor registers. Registers are at the top of the
memory hierarchy. Relatively small amount of memory that can be implemented on the
processor chip. This is processor cache. Two levels of cache. Level 1 (L1) cache is on the
processor chip. Level 2 (L2) cache is in between main memory and processor. Next level
is main memory, implemented as SIMMs. Much larger, but much slower than cache
memory. Next level is magnetic disks. Huge amount of inexpensive storage. Since the speed
of memory access is critical, the idea is to bring instructions and data that will be used in the
near future as close to the processor as possible.
5.5 Cache memories
Processor is much faster than the main memory. As a result, the processor has to spend
much of its time waiting while instructions and data are being fetched from the main
memory. This serves as a major obstacle towards achieving good performance. Speed of
the main memory cannot be increased beyond a certain point. So we use Cache
memories. Cache memory is an architectural arrangement which makes the main memory
appear faster to the processor than it really is. Cache memory is based on the property of
computer programs known as “locality of reference”.
Analysis of programs indicates that many instructions in localized areas of a program are
executed repeatedly during some period of time, while the others are accessed relatively
less frequently. These instructions may be the ones in a loop, nested loop or a few
procedures calling each other repeatedly. This is called “locality of reference”. Its types
are:
• Temporal locality: a recently executed instruction is likely to be executed again
very soon.
• Spatial locality: instructions stored near a recently executed instruction are likely
to be executed soon.
• Processor issues a Read request, a block of words is transferred from the main
memory to the cache, one word at a time.
• Subsequent references to the data in this block of words are found in the cache.
• At any given time, only some blocks in the main memory are held in the cache.
Which blocks in the main memory are in the cache is determined by a “mapping
function”.
• When the cache is full, and a block of words needs to be transferred from the main
memory, some block of words in the cache must be replaced. This is determined
by a “replacement algorithm”.
Cache hit:
Existence of a cache is transparent to the processor. The processor issues Read and
Write requests in the same manner. If the data is in the cache it is called a Read or Write
hit.
Write hit: The cache has a replica of the contents of the main memory. One option is to
update the contents of the cache and the main memory simultaneously; this is the
write-through protocol. The other option is to update only the contents of the cache and
mark it as updated by setting a bit known as the dirty bit or modified bit; the contents of
the main memory are then updated when this block is replaced. This is the write-back or
copy-back protocol.
Cache miss:
• If the data is not present in the cache, then a Read miss or Write miss occurs.
• Read miss: Block of words containing this requested word is transferred from the
memory. After the block is transferred, the desired word is forwarded to the
processor. The desired word may also be forwarded to the processor as soon as it
is transferred without waiting for the entire block to be transferred. This is called
load-through or early-restart.
• Write-miss: Write-through protocol is used, then the contents of the main memory
are updated directly. If write-back protocol is used, the block containing the
addressed word is first brought into the cache. The desired word is overwritten
with new information.
A bit called as “valid bit” is provided for each block. If the block contains valid data, then
the bit is set to 1, else it is 0. Valid bits are set to 0 when the power is just turned on.
When a block is loaded into the cache for the first time, the valid bit is set to 1. Data
transfers between main memory and disk occur directly bypassing the cache. When the
data on a disk changes, the main memory block is also updated. However, if the data is
also resident in the cache, then the valid bit is set to 0.
In this case, the copies of the data in the cache and the main memory are different. This is
called the cache coherence problem.
Mapping functions: Mapping functions determine how memory blocks are placed in the
cache.
1. Direct mapping
2. Associative mapping
3. Set-associative mapping.
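The simplest of the three, direct mapping, can be sketched as follows (a minimal illustration with an assumed cache of 8 block frames; the function name is invented):

```python
# Sketch of direct mapping: each main-memory block can be placed in exactly
# one cache block, chosen by (block number mod number of cache blocks).
# The tag (the remaining high-order bits) is stored to identify which of
# the competing memory blocks currently occupies that cache block.

NUM_CACHE_BLOCKS = 8   # assumed cache size for illustration

def direct_map(mem_block):
    index = mem_block % NUM_CACHE_BLOCKS   # the only cache block it may use
    tag = mem_block // NUM_CACHE_BLOCKS    # identifies the block on lookup
    return index, tag

print(direct_map(29))  # memory blocks 5, 13, 21, 29 all compete for index 5
```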