Unit IV Advanced Microprocessor Notes
The address unit is responsible for calculating the physical address of instructions and data
that the CPU wants to access. Also the address lines derived by this unit may be used to
address different peripherals.
The 80286 CPU contains almost the same set of registers as in the 8086, namely:
Segment Registers: Four 16-bit special-purpose registers are used to select the segments of
memory that are immediately addressable for code, stack, and data.
Base and Index Registers: Four of the general-purpose registers can also be used to
determine offset addresses of operands in memory. Usually, these registers hold base
addresses or indexes to particular locations within a segment. Any specified addressing mode
determines the specific registers used for operand address calculations.
Status and Control Registers: Three 16-bit special-purpose registers are used to record
and control the state of the 80286 processor. The instruction pointer contains the offset
address of the next sequential instruction to be executed.
The flags D0, D2, D4, D6, D7 and D11, which are modified according to the result of the
execution of logical and arithmetic instructions, are called status flag bits.
The bits D8 (Trap Flag) and D9 (Interrupt Flag) are used for controlling machine operation
and thus they are called control flags.
CF Carry Flag (bit D0) Set on high-order bit carry or borrow; cleared otherwise.
PF Parity Flag (bit D2) Set if the low-order 8 bits of the result contain an even number of 1
bits; cleared otherwise.
AF Auxiliary Carry Flag (bit D4) Set on carry out of or borrow into the low-order four bits
of AL; cleared otherwise.
ZF Zero Flag (bit D6) Set if the result is zero; cleared otherwise.
SF Sign Flag (bit D7) Set equal to high-order bit of result (0 if positive, 1 if negative).
TF Trap Flag (bit D8) Once set, a single-step interrupt occurs after the next instruction
executes. TF is cleared by the single step interrupt.
IF Interrupt–enable Flag (bit D9) When set, maskable interrupts will cause the CPU to
transfer control to an interrupt vector specified location.
DF Direction Flag (bit D10) Causes string instructions to auto-decrement the appropriate
index registers when set. Clearing DF causes auto increment.
OF Overflow Flag (bit D11) Set if the result is too large a positive number or too small a
negative number (excluding the sign bit) to fit in the destination operand; cleared otherwise.
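The flag rules above can be tried out in code. A minimal sketch for a 16-bit addition follows (the bit positions match the table above; the helper itself is illustrative):

```python
def add16_flags(a, b):
    """Add two 16-bit values and derive the 80286 status flags (a sketch)."""
    result = (a + b) & 0xFFFF
    cf = int(a + b > 0xFFFF)                          # D0: carry out of bit 15
    pf = int(bin(result & 0xFF).count("1") % 2 == 0)  # D2: even parity of the low byte
    af = int(((a & 0xF) + (b & 0xF)) > 0xF)           # D4: carry out of the low nibble
    zf = int(result == 0)                             # D6: result is zero
    sf = (result >> 15) & 1                           # D7: copy of the sign bit
    # D11: overflow -- both operands share a sign but the result's sign differs
    of = int(((a ^ result) & (b ^ result) & 0x8000) != 0)
    return {"CF": cf, "PF": pf, "AF": af, "ZF": zf, "SF": sf, "OF": of}

print(add16_flags(0x7FFF, 0x0001))  # 0x7FFF + 1 -> 0x8000: OF=1, SF=1
```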
The machine status word (MSW) consists of four flags - PE, MP, EM and TS - held in the
four lower-order bits D16 to D19 of the upper word of the flag register.
The LMSW and SMSW instructions are available in the instruction set of 80286 to write
and read the MSW in real address mode.
The 80286 addresses only 1 Mbyte of physical memory using A0-A19. The lines A20-A23
are not used by the internal circuit of the 80286 in this mode. In real address mode, while
addressing the physical memory, the 80286 uses BHE along with A0-A19. The 20-bit
physical address is formed in the same way as in the 8086.
The contents of segment registers are used as segment base addresses. The other registers,
depending upon the addressing mode, contain the offset addresses. Because of extra
pipelining and other circuit level improvements, in real address mode also, the 80286
operates at a much faster rate than 8086, although functionally they work in an identical
fashion. As in 8086, the physical memory is organized in terms of segments of 64Kbyte
maximum size.
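In real address mode the 20-bit physical address is formed exactly as on the 8086: the 16-bit segment value is shifted left by four bits and added to the 16-bit offset. A quick sketch:

```python
def real_mode_physical(segment, offset):
    """Real-mode address formation: (segment << 4) + offset,
    truncated to the 20 address lines A0-A19."""
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(real_mode_physical(0x1234, 0x0010)))  # 0x12350
```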
An exception is generated, if the segment size limit is exceeded by the instruction or the data.
The overlapping of physical memory segments is allowed to minimize the memory
requirements for a task. The 80286 reserves two fixed areas of physical memory for system
initialization and interrupt vector table. In the real mode the first 1Kbyte of memory starting
from address 00000H to 003FFH is reserved for the interrupt vector table. Also the addresses from
FFFF0H to FFFFFH are reserved for system initialization.
The program execution starts from FFFF0H after reset and initialization. The interrupt vector
table of 80286 is organized in the same way as that of 8086. Some of the interrupt types are
reserved for exceptions, single-stepping and processor extension segment overrun, etc.
When the 80286 is reset, it always starts the execution in real address mode. In real address
mode, it performs the following functions: it initializes the IP and other registers of 80286, it
prepares for entering the protected virtual address mode.
The Segment of the program or data required for actual execution at that instant is fetched
from the secondary memory into physical memory. After the execution of this fetched
segment, the next segment required for further execution is again fetched from the secondary
memory, while the results of the executed segment are stored back into the secondary
memory for further references. This continues till the complete program is executed.
During the execution the partial results of the previously executed portions are again fetched
into the physical memory, if required for further execution. The procedure of fetching the
chosen program segments or data from the secondary storage into physical memory is called
swapping. The procedure of storing the partial results or data back on the secondary
storage is called unswapping (swapping out). The virtual memory is allotted per task.
The 80286 is able to address 1 Gbyte (2^30 bytes) of virtual memory per task. The complete
virtual memory is mapped on to the 16-Mbyte physical memory. If a program larger than
16 Mbytes is stored on the hard disk and is to be executed, it is fetched in terms of data or
program segments of less than 16 Mbytes in size into the program memory by swapping
sequentially as per the sequence of execution.
Whenever a portion of a program is required for execution by the CPU, it is fetched from
the secondary memory and placed in the physical memory; this is called swapping in of the
program. A portion of the program, or important partial results required for further execution,
may be saved back on secondary storage to make the physical memory free for further
execution of another required portion of the program; this is called swapping out of the
executable program.
80286 uses the 16-bit content of a segment register as a selector to address a descriptor stored
in the physical memory. The descriptor is a block of contiguous memory locations containing
information of a segment, like segment base address, segment limit, segment type, privilege
level, segment availability in physical memory, descriptor type and whether the segment is
used by another task.
A23-A0: These are the physical address output lines used to address memory or I/O devices.
The address lines A23 - A16 are zero during I/O transfers.
BHE: This output signal, as in 8086, indicates that there is a transfer on the higher byte of the
data bus (D15 – D8) .
S1, S0: These are the active-low status output signals which indicate initiation of a bus cycle
and with M/IO and COD/INTA, they define the type of the bus cycle.
M/IO': This output line differentiates memory operations from I/O operations. If this signal
is "0", it indicates that an I/O cycle or INTA cycle is in process, and if it is "1", it indicates
that a memory or a HALT cycle is in progress.
COD/ INTA’: This output signal, in combination with M/ IO signal and S1, S0 distinguishes
different memory, I/O and INTA cycles.
LOCK: This active-low output pin is used to prevent the other masters from gaining the
control of the bus for the current and the following bus cycles. This pin is activated by a
"LOCK" instruction prefix, or automatically by hardware during XCHG, interrupt
acknowledge or descriptor table access.
READY: This active-low input pin is used to insert wait states in a bus cycle, for interfacing
low speed peripherals. This signal is neglected during HLDA cycle.
HOLD and HLDA: This pair of pins is used by external bus masters to request for the
control of the system bus (HOLD) and to check whether the main processor has granted the
control (HLDA) or not, in the same way as it was in 8086.
INTR: Through this active high input, an external device requests 80286 to suspend the
current instruction execution and serve the interrupt request. Its function is exactly similar to
that of INTR pin of 8086.
BUSY and ERROR: The processor extension BUSY and ERROR active-low input signals
indicate the operating conditions of a processor extension to the 80286. BUSY going low
indicates to the 80286 to suspend execution and wait until BUSY becomes inactive. In this
duration, the processor extension is busy with its allotted job. Once the job is completed, the
processor extension drives the BUSY input high, indicating to the 80286 to continue with the
program execution. An active ERROR signal causes the 80286 to perform the processor
extension interrupt while executing the WAIT and ESC instructions. The active ERROR
signal indicates to the 80286 that the processor extension has committed a mistake, and
hence the processor extension interrupt is activated.
CAP: A 0.047 μF, 12 V capacitor must be connected between this input pin and ground to
filter the output of the internal substrate bias generator. For correct operation of the 80286
the capacitor must be charged to its operating voltage. Till this capacitor charges fully, the
80286 may be kept in reset to avoid any spurious activity.
Vcc: This pin is used to apply +5V power supply voltage to the internal circuit of 80286.
RESET: The active-high reset input pulse width should be at least 16 clock cycles. The
80286 requires at least 38 clock cycles after the trailing edge of the RESET input signal,
before it makes the first opcode fetch cycle.
MEMORY HIERARCHY
o The memory hierarchy system consists of all storage devices employed in a computer
system.
o The goal of using a memory hierarchy is to obtain the highest possible average
access speed while minimizing the total cost of the entire memory system.
o Going down the hierarchy, the following occurs:
o Decreasing cost per bit
o Increasing capacity
o Increasing access time
o Decreasing frequency of access of the memory by the processor.
o In the memory hierarchy, the registers are at the top in terms of speed of access.
o At the next level of the hierarchy is a relatively small amount of memory that can be
implemented directly on the processor chip. This memory, called a cache, holds
copies of the instructions and data stored in a much larger memory that is provided
externally.
o The processor cache is of two or more levels:
Level 1 (L1) cache
Level 2 (L2) cache
Level 3 (L3) cache
o A primary cache is always located on the processor chip. This cache is small and its
access time is comparable to that of processor registers. The primary cache is referred
to as the Level1 (L1) cache.
o A larger, and slower, secondary cache is placed between the primary cache and the
rest of the memory. It is referred to as the Level2 (L2) cache.
o Some computers have a level 3 (L3) cache of even larger size, in addition to the L1
and L2 caches. An L3 cache is also implemented in SRAM technology.
o The next level in the hierarchy is the main memory. The main memory is much larger
but slower than cache memories.
o At the bottom level in the memory hierarchy are the magnetic disk and tape devices.
They provide a very large amount of inexpensive storage.
CACHE MEMORY
o Cache memory, also called CPU memory, is high-speed static random access
memory (SRAM).
o Cache memory is responsible for speeding up computer operations and processing.
o This memory is typically integrated directly into the
CPU chip or placed on a separate chip that has a separate bus interconnect with the
CPU.
o The purpose of cache memory is to store program instructions and data that are
used repeatedly in the operation of programs or information that the CPU is likely to
need next.
o The CPU can access this information quickly from the cache rather than having to get
it from computer's main memory.
o Fast access to these instructions increases the overall speed of the program.
o A cache memory system includes a small amount of fast memory and a large amount
of slow memory(DRAM). This system is configured to simulate a large amount of
fast memory.
o The cache memory system consists of the following units:
Cache - consists of static RAM(SRAM)
Main Memory –consists of dynamic RAM(DRAM)
Cache Controller – implements the cache logic. This controller decides which
block of memory should be moved in or out of the cache.
CACHE LEVELS
o The processor cache is of two or more levels :
Level 1 (L1) cache
Level 2 (L2) cache
Level 3 (L3) cache
o A primary cache is always located on the processor chip. This cache is small and its
access time is comparable to that of processor registers. The primary cache is referred
to as the Level 1 (L1) cache.
o A larger, and slower, secondary cache is placed between the primary cache and the
rest of the memory. It is referred to as the Level 2 (L2) cache.
o Some computers have a Level 3 (L3) cache of even larger size, in additionto the L1
and L2 caches.
TYPES OF CACHE
Two types of cache exists. They are
Unified cache and Split cache
Unified cache : Data and instructions are stored together (Von Neumann architecture)
Split cache : Data and instructions are stored separately (Harvard architecture)
LOCALITY OF REFERENCE
o Cache memory is based on the property known as “locality of reference”.
o Locality of reference, also known as the principle of locality, is the tendency of
a processor to access the same set of memory locations repetitively over a short
period of time.
o The locality of reference has been implemented in two ways :
Temporal Locality
Temporal Locality means that a recently executed instruction is likely
to be executed again very soon.
Spatial Locality
Spatial Locality means that instructions in close proximity to a recently
executed instruction are also likely to be executed soon.
CACHE PERFORMANCE
o If a process needs some data, it first searches in the cache memory.
o If the data is available in the cache, this is termed a cache hit, and the data is
accessed as required.
o If the data is not in the cache, then it is termed as a cache miss.
o Then the data is obtained from the main memory.
HIT RATIO
o Average memory access time (AMAT) is the average time to access memory
considering both hits and misses and the frequency of different accesses.
AMAT = Hit time + (Miss Rate x Miss Penalty)
where
Hit Time - Time to hit in the cache.
Miss Penalty - Cost of a cache miss in terms of time.
Miss Rate - Frequency of cache misses.
Problem 1:
Consider the following details - 1 cycle hit cost, 10 cycle miss penalty (11 cycles total
for a miss) and the program has a 10% miss rate. Calculate the AMAT.
Solution :
AMAT = Hit time + (Miss Rate x Miss Penalty) = 1.0 + (0.1 x 10) = 2.0
Problem 2:
If a direct mapped cache has a hit rate of 95%, a hit time of 4 ns, and a miss penalty
of 100 ns, what is the AMAT?
Solution :
AMAT = Hit time + (Miss Rate x Miss Penalty) = 4 + (0.05 x 100) = 9 ns
Problem 3:
If replacing the cache with a 2-way set associative increases the hit rate to 97%, but
increases the hit time to 5 ns, what is the new AMAT?
Solution :
AMAT = Hit time + (Miss Rate x Miss Penalty) = 5 + (0.03 x 100) = 8 ns
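All three problems plug into the same formula; a small helper reproduces them (times are in whatever unit the problem uses):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

print(amat(1.0, 0.10, 10))   # Problem 1: 2.0 cycles
print(amat(4, 0.05, 100))    # Problem 2: 9.0 ns
print(amat(5, 0.03, 100))    # Problem 3: 8.0 ns
```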
Problem 4 :
Suppose that in 1000 memory references there are 40 misses in L1 cache and 10 misses
in L2 cache. If the miss penalty of L2 is 200 clock cycles, hit time of L1 is 1 clock cycle,
and hit time of L2 is 15 clock cycles. What will be the average memory access time?
Solution :
L1 miss rate = 40 / 1000 = 0.04
L2 local miss rate = 10 / 40 = 0.25
Average memory access time
= L1 hit time + L1 miss rate x (L2 hit time + L2 miss rate x L2 miss penalty)
= 1 + 0.04 x (15 + 0.25 x 200) = 1 + 0.04 x 65 = 3.6 clock cycles
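The two-level calculation can be checked with a short helper that uses the local L2 miss rate (L2 misses divided by L2 accesses, i.e. by L1 misses); with the figures given, this works out to 3.6 cycles:

```python
def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, l2_penalty):
    """AMAT for a two-level cache: every access pays the L1 hit time; L1 misses
    additionally pay the L2 hit time, and L2 misses the main-memory penalty."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_local_miss_rate * l2_penalty)

# 40 L1 misses and 10 L2 misses per 1000 references
print(amat_two_level(1, 40 / 1000, 15, 10 / 40, 200))  # about 3.6 clock cycles
```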
Problem 5 :
Problem 6 :
The application program in a computer system with cache uses 1400 instruction
acquisition bus cycles from cache memory and 100 from main memory. What is the hit
rate? If the cache memory operates with zero wait states and the main memory bus cycles
use three wait states, what is the average number of wait states experienced during the
program execution?
Solution :
Hit rate = 1400 / (1400 + 100) = 0.933 = 93.3 %
Average number of wait states = (1400 x 0 + 100 x 3) / 1500 = 0.2
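The same arithmetic in code (bus-cycle counts taken from the problem statement):

```python
cache_cycles, memory_cycles = 1400, 100   # bus cycles served by cache / main memory
total = cache_cycles + memory_cycles

hit_rate = cache_cycles / total                            # about 0.933
avg_wait = (cache_cycles * 0 + memory_cycles * 3) / total  # cache: 0 waits, memory: 3
print(hit_rate, avg_wait)
```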
MEASURING AND IMPROVING CACHE PERFORMANCE
CPU time can be divided into the clock cycles that the CPU spends executing the
program and the clock cycles that the CPU spends waiting for the memory system.
Memory-Stall Cycles
Memory-stall clock cycles can be defined as the sum of the stall cycles coming from
reads plus those coming from writes.
Read-Stall Cycles
The read-stall cycles can be defined in terms of the number of read accesses per program,
the miss penalty in clock cycles for a read, and the read miss rate.
Write-Stall Cycles
For a write-through scheme, we have two sources of stalls: write misses, which usually
require that we fetch the block before continuing the write, and write buffer stalls, which
occur when the write buffer is full when a write occurs. Thus, the cycles stalled for
writes equal the sum of these two.
Read and Write stall cycles can be combined by using single miss rate and miss penalty
(the write and read miss penalties are the same, i.e. time to fetch a block from Main
Memory).
Memory-stall clock cycles = (Memory accesses / Program) x Miss rate x Miss penalty
It can also be written as:
Memory-stall clock cycles = (Instructions / Program) x (Misses / Instruction) x Miss penalty
Problem 1:
Assume the miss rate of an instruction cache is 2% and the miss rate of the data cache is
4%. If a processor has a CPI of 2 without any memory stalls and the miss penalty is 40
cycles for all misses, determine how much faster a processor would run with a perfect
cache that never missed. Assume the frequency of all loads and stores is 36%.
Solution :
Instruction miss cycles = 2% x 40 = 0.80 per instruction
Data miss cycles = 36% x 4% x 40 = 0.576 per instruction
Total memory-stall cycles = 0.80 + 0.576 = 1.376 per instruction
CPI with stalls = 2 + 1.376 = 3.376
The processor with a perfect cache is therefore faster by 3.376 / 2 = 1.69.
Problem 2 :
Suppose that clock rate of the machine used in the previous example is doubled but the
memory speed, cache misses, and miss rate are same. How much faster the machine be
with the faster clock?
Solution:
With the clock rate doubled, the miss penalty doubles to 80 clock cycles, since the
memory speed is unchanged.
Memory-stall cycles = 0.02 x 80 + 0.36 x 0.04 x 80 = 1.6 + 1.152 = 2.752 per instruction
CPI with stalls = 2 + 2.752 = 4.752 (the original machine's CPI was 2 + 1.376 = 3.376)
Relative performance = 2 x 3.376 / 4.752 = 1.42
The machine with the faster clock is therefore only about 1.42 times faster, not 2 times.
Problem 3 :
Suppose we have a 500 MHz processor with a base CPI of 1.0 with no cache misses.
Assume memory access time is 200 ns and average cache miss rate is 5%. Compare
performance after adding a second level cache, with access time 20 ns, that reduces miss
rate to main memory to 2%.
Solution:
Clock cycle time = 1 / 500 MHz = 2 ns
Main memory miss penalty = 200 ns / 2 ns = 100 clock cycles
CPI with one cache level = 1.0 + 0.05 x 100 = 6.0
Second-level cache penalty = 20 ns / 2 ns = 10 clock cycles
CPI with two cache levels = 1.0 + 0.05 x 10 + 0.02 x 100 = 3.5
Speedup with the second-level cache = 6.0 / 3.5 = 1.7
CACHE OPERATIONS
Notes:
Main memory is divided into equal-size partitions called blocks or frames.
Cache memory is divided into partitions having the same size as that of blocks, called
lines.
During cache mapping, a block of main memory is simply copied to the cache; the block
is not removed from the main memory.
(1) Direct Mapping:-
In direct mapping, a particular block of main memory can map only to a particular line
of the cache. The direct mapping expression is
i = j mod n
where 'i' is the cache line number, 'j' is the main memory block number, and 'n' is the
number of lines in the cache.
For example,
Let us consider that particular cache memory is divided into a total of ‘n’ number of
lines.
Then, the block ‘j’ of the main memory would be able to map to line number (j mod n) of
the cache.
(2) Associative Mapping:-
In fully associative mapping,
A block of main memory can map to any line of the cache that is freely available at
that moment.
This makes fully associative mapping more flexible than direct mapping.
All the lines of cache are freely available.
Thus, any block of main memory can map to any line of the cache.
Had all the cache lines been occupied, one of the existing blocks would have to be
replaced.
(3) Set-Associative Mapping:-
In k-way set-associative mapping,
Cache lines are grouped into sets where each set contains k number of lines.
A particular block of main memory can map to only one particular set of the cache.
However, within that set, the memory block can map to any cache line that is freely
available.
The set of the cache to which a particular block of the main memory can map
is given by-
Cache set number = (Main memory block number) mod (Number of sets in the cache)
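Both mapping expressions are one-line computations; the block and cache sizes below are illustrative:

```python
def direct_mapped_line(block, num_lines):
    """Direct mapping: block j of main memory goes to cache line j mod n."""
    return block % num_lines

def set_associative_set(block, num_sets):
    """Set-associative mapping: block j may go to any line of set j mod (number of sets)."""
    return block % num_sets

print(direct_mapped_line(25, 8))   # block 25 -> line 1 in an 8-line cache
print(set_associative_set(25, 4))  # block 25 -> set 1 in a 4-set cache
```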
Problem 1 :
Problem 2 :
Consider a 64-byte cache with 8 byte blocks, an associativity of 2 and LRU block replacement.
Virtual addresses are 16 bits. The cache is physically tagged. The processor has 16KB of
physical memory. What is the total number of tag bits?
Solution :
The cache is 64-bytes with 8-byte blocks, so there are 8 blocks.
The associativity is 2, so there are 4 sets.
Since there are 16KB of physical memory, a physical address is 14 bits long.
Of these, 3 bits are taken for the offset (8-byte blocks), and 2 for the index (4 sets). That
leaves 9 tag bits per block.
Since there are 8 blocks, that makes 72 tag bits.
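The address breakdown in Problem 2 can be reproduced with a few log2 calculations:

```python
from math import log2

cache_bytes, block_bytes, ways = 64, 8, 2
phys_mem_bytes = 16 * 1024                        # 16 KB of physical memory

blocks = cache_bytes // block_bytes               # 8 blocks
sets = blocks // ways                             # 4 sets
addr_bits = int(log2(phys_mem_bytes))             # 14-bit physical address
offset_bits = int(log2(block_bytes))              # 3 offset bits
index_bits = int(log2(sets))                      # 2 index bits
tag_bits = addr_bits - offset_bits - index_bits   # 9 tag bits per block
print(tag_bits * blocks)                          # 72 tag bits in total
```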
Problem 3 :
Problem 4 :
Problem 5:
A block-set associative cache memory consists of 128 blocks divided into four-block sets.
The main memory consists of 16,384 blocks and each block contains 256 eight-bit words.
1. How many bits are required for addressing the main memory?
2. How many bits are needed to represent the TAG, SET and WORD fields?
Solution
Given-
Number of blocks in cache memory = 128
Number of blocks in each set of cache = 4
Main memory size = 16384 blocks, each block containing 256 eight-bit words
1. Main memory size = 16384 x 256 = 2^22 words, so 22 bits are required for addressing
the main memory.
2. Number of sets = 128 / 4 = 32, so the SET field needs 5 bits. 256 words per block need
an 8-bit WORD field. The TAG field takes the remaining 22 - 5 - 8 = 9 bits.
Problem 6:
Solution
There are 8 blocks in cache memory numbered from 0 to 7. In direct mapping, a particular
block of main memory can map only to a particular line of the cache.
Hit ratio = 3 / 20
Miss ratio = 17 / 20
Problem 7:
Consider a fully associative cache with 8 cache blocks (0-7). The memory block requests
are in the order-
Solution
There are 8 blocks in cache memory numbered from 0 to 7. In fully
associative mapping, any block of main memory can be mapped to
any line of the cache that is freely available.
If all the cache lines are already occupied, then a block is replaced in
accordance with the replacement policy.
Thus, Line-5 contains block-7.
Hit ratio = 5 / 17
Miss ratio = 12 / 17
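A fully associative cache can be simulated in a few lines. The sketch below assumes FIFO replacement and uses a made-up reference string for illustration:

```python
from collections import deque

def fully_associative_hits(references, num_lines):
    """Simulate a fully associative cache with FIFO replacement.
    Returns (hits, misses) for the given block reference string."""
    cache = deque()                 # resident block numbers, oldest first
    hits = misses = 0
    for block in references:
        if block in cache:
            hits += 1
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popleft()     # evict the oldest resident block
            cache.append(block)
    return hits, misses

# Hypothetical reference string, 8 cache lines
print(fully_associative_hits([4, 1, 6, 1, 4, 9, 1], 8))  # (3, 4)
```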
VIRTUAL MEMORY
o Virtual memory is an architectural solution to increase the effective size of the
memory system.
o Virtual memory is a memory management technique that allows the execution of
processes that are not completely in memory.
o In some cases during the execution of the program the entire program may not be
needed.
o Virtual memory allows files and memory to be shared by two or more processes
through page sharing.
o The techniques that automatically move program and data between main memory and
secondary storage when they are required for execution are called virtual-memory
techniques.
ADVANTAGES
o One major advantage of this scheme is that programs can be larger than physical
memory.
o Virtual memory also allows processes to share files easily and to implement shared
memory.
o Increase in processor utilization and throughput.
o Less I/O would be needed to load or swap user programs into memory
Segment Translation
Segment Translation is the process of converting a logical address (virtual address) into a
linear address.
A logical address consists of a Selector and an Offset.
A Selector is the contents of a segment register. A selector is used to point to a descriptor
for the segment in a table of descriptors.
Every selector has a linear base address associated with it, and it is stored in the segment
descriptor.
The linear base address is then added to the offset to generate the Linear Address.
If paging is not enabled, then the linear address corresponds to the Physical Address. But
if paging is enabled, then page translation must be done, which translates the linear
address into a physical address.
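The selector -> descriptor -> linear-address flow can be sketched as follows; the descriptor table and its base/limit values are invented for illustration:

```python
# Hypothetical descriptor table: selector index -> (linear base address, segment limit)
DESCRIPTOR_TABLE = {
    1: (0x0010_0000, 0xFFFF),    # segment 1: base 1 MB, 64 KB limit
    2: (0x0200_0000, 0x7FFF),    # segment 2: base 32 MB, 32 KB limit
}

def segment_translate(selector_index, offset):
    """Translate a (selector, offset) logical address into a linear address.
    Raises an exception if the offset exceeds the segment limit."""
    base, limit = DESCRIPTOR_TABLE[selector_index]
    if offset > limit:
        raise ValueError("segment limit exceeded -> protection exception")
    return base + offset

print(hex(segment_translate(1, 0x1234)))   # 0x101234
```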
Page Translation
Page Translation is the process of converting a linear address into a physical address.
When paging is enabled, the linear address is broken into a virtual page number and a
page offset.
Page Table - The page table contains the information about the main memory address
where the page is stored and the current status of the page.
Page Frame - An area in the main memory that holds one page.
Page Table Base Register - It contains the starting address of the page table, if that
page currently resides in memory.
Control Bits in Page Table - The control bits specify the status of the page while it is
in main memory. There are two control bits.
They are
(i) Valid Bit – The valid bit indicates the validity of the page. If the bit is 0 , the
page is not present in main memory and a page fault occurs. If the bit is 1, the
page is in memory and the entry contains the physical page number.
(ii) Dirty Bit - The dirty bit is set when any word in a page is written.
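A minimal page-table lookup using the valid and dirty bits might look like this (the page size, table contents and fault handling are illustrative assumptions):

```python
PAGE_SIZE = 4096  # assumed 4 KB pages

# Hypothetical page table: virtual page number -> [valid bit, dirty bit, physical frame]
page_table = {
    0: [1, 0, 5],      # page 0 is resident in frame 5
    1: [0, 0, None],   # page 1 is not in memory -> page fault on access
}

def translate(virtual_addr):
    """Translate a virtual address to a physical address via the page table."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    valid, dirty, frame = page_table[vpn]
    if not valid:
        raise RuntimeError("page fault: page %d is not in main memory" % vpn)
    return frame * PAGE_SIZE + offset

def write(virtual_addr):
    """A store sets the dirty bit of the page it writes to."""
    phys = translate(virtual_addr)
    page_table[virtual_addr // PAGE_SIZE][1] = 1   # mark the page dirty
    return phys

print(hex(translate(0x123)))   # frame 5 -> 0x5123
```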
1. FIFO PAGE REPLACEMENT
o The FIFO page-replacement algorithm replaces the page that has been in memory
the longest (first-in, first-out).
EXAMPLE:
Advantages:
o The FIFO page-replacement algorithm is easy to understand and program.
Disadvantages:
o The performance is not always good.
2. OPTIMAL PAGE REPLACEMENT
o The optimal page-replacement algorithm replaces the page that will not be used
for the longest period of time.
EXAMPLE:
Advantage:
o Optimal replacement is much better than a FIFO algorithm
Disadvantage:
o The optimal page-replacement algorithm is difficult to implement, because it
requires future knowledge of the reference string.
3. LRU PAGE REPLACEMENT
o LRU (Least Recently Used) replacement replaces the page that has not been used
for the longest period of time.
EXAMPLE:
Advantage:
o The LRU policy is often used as a page-replacement algorithm and is
considered to be good.
o LRU replacement does not suffer from Belady’s anomaly.
Disadvantage:
o The problem is to determine an order for the frames defined by the time of last
use.
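FIFO and LRU can be compared with a short simulation; the reference string below is invented for illustration:

```python
def count_faults(references, num_frames, policy="FIFO"):
    """Count page faults for FIFO or LRU replacement over a reference string."""
    frames = []  # resident pages; index 0 is the next victim
    faults = 0
    for page in references:
        if page in frames:
            if policy == "LRU":
                frames.remove(page)   # move to the most-recently-used position
                frames.append(page)
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.pop(0)             # evict: oldest (FIFO) or least recent (LRU)
        frames.append(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2]   # hypothetical reference string
print(count_faults(refs, 3, "FIFO"), count_faults(refs, 3, "LRU"))  # 10 9
```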
Problem 1 :
Problem 2:
Problem 3 :
A computer system has a 36-bit virtual address space with a page size of 8K, and 4bytes per
page table entry.
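The statement above does not spell out the question; assuming it asks for the number of page-table entries and the total table size, the arithmetic is:

```python
virtual_bits = 36
page_size = 8 * 1024          # 8 KB pages -> 13 offset bits
pte_bytes = 4

entries = 2**virtual_bits // page_size        # 2^36 / 2^13 = 2^23 entries
table_bytes = entries * pte_bytes             # 2^25 bytes = 32 MB
print(entries, table_bytes // (1024 * 1024))  # 8388608 entries, 32 MB
```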
PAGING:
Paging is the memory management technique in which secondary memory is divided into
fixed-size blocks called pages, and main memory is divided into fixed-size blocks called
frames. The Frame has the same size as that of a Page. The processes are initially in
secondary memory, from where the processes are shifted to main memory (RAM) when there
is a requirement. Each process is mainly divided into parts where the size of each part is the
same as the page size. One page of a process is mainly stored in one of the memory frames.
Paging follows no contiguous memory allocation. That means pages in the main memory can
be stored at different locations in the memory.
Advantages of paging:
o There is no external fragmentation, since any free frame can hold any page.
o Memory allocation is simple, because all pages and frames are the same size.
SEGMENTATION:
Segmentation is another memory management technique used by operating systems. The
process is divided into segments of different sizes and then put in the main memory. The
program/process is divided into modules, unlike paging, in which the process was divided
into fixed-size pages or frames. The corresponding segments are loaded into the main
memory when the process is executed. Segments contain the program’s utility functions,
main function, subroutines, stack, array and so on.
Compaction:
As in the figure, we have some used memory (black colour) and some unused
memory (white colour). The used memory is brought together and all the empty spaces
are combined into one large free area. This process is called compaction.
This is done to solve the problem of fragmentation, but it requires too much
CPU time.
By compacting memory, the operating system can reduce or eliminate fragmentation
and make it easier for programs to allocate and use memory.
1. Move all pages that are in use into one large contiguous area.
2. The space freed in this way then forms a single large block of free memory.
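A toy model of compaction over a list of fixed-size memory cells (None marks a free cell; the layout is invented for illustration):

```python
def compact(memory):
    """Slide all used cells to the front so the free cells form one contiguous run."""
    used = [cell for cell in memory if cell is not None]
    return used + [None] * (len(memory) - len(used))

# Fragmented memory: used blocks A-C interleaved with free cells
before = ["A", None, "B", None, None, "C"]
print(compact(before))  # ['A', 'B', 'C', None, None, None]
```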
PIPELINING
Instruction Fetch - The CPU reads instructions from the address in the
memory whose value is present in the program counter.
Instruction Decode - The instruction is decoded and the register file is accessed to
get the values from the registers used in the instruction.
Execute - ALU operations are performed.
Memory Access - Memory operands are read and written from/to the memory
address that is present in the instruction.
Write Back - The computed value is written back to the register file.
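Under ideal conditions (no hazards, one instruction issued per clock), the stage occupancy of each instruction can be tabulated:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(num_instructions):
    """Return the cycle in which each instruction occupies each of the five stages,
    assuming one instruction enters the pipeline per clock with no stalls."""
    return [{stage: i + s + 1 for s, stage in enumerate(STAGES)}
            for i in range(num_instructions)]

chart = pipeline_chart(3)
print(chart[0])        # {'IF': 1, 'ID': 2, 'EX': 3, 'MEM': 4, 'WB': 5}
print(chart[2]["WB"])  # the third instruction finishes in cycle 7
```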
1. Structural Hazard – The situation when two instructions require the use of a
given hardware resource at the same time.
2. Data Hazard – Any condition in which either the source or the destination
operands of an instruction are not available at the time expected in the
pipeline. So some operation has to be delayed, and the pipeline stalls.
STRUCTURAL HAZARD
A structural hazard occurs when two or more instructions that are already in the
pipeline need the same resource.
These hazards are because of conflicts due to insufficient resources.
The result is that the instructions must be executed in series rather than in parallel
for a portion of the pipeline.
Structural hazards are sometime referred to as resource hazards.
Example:
A situation in which multiple instructions are ready to enter the execute
instruction phase and there is a single ALU (Arithmetic Logic Unit).
One solution to such a resource hazard is to increase the available resources,
such as having multiple ALUs.
DATA HAZARD
A data hazard occurs when there is a conflict in the access of an operand location. There
are three types of data hazards. They are
Read After Write (RAW) or True Dependency:
An instruction modifies a register or memory location and a succeeding
instruction reads the data in that memory or register location.
A RAW hazard occurs if the read takes place before the write operation is
complete.
Example
I1 : R2 ← R5 + R3
I2 : R4 ← R2 + R3
Write After Read (WAR) or Antidependency:
An instruction reads a register or memory location and a succeeding
instruction writes to that location.
A WAR hazard occurs if the write operation completes before the read.
Example
I1 : R2 ← R5 + R3
I2 : R5 ← R1 + R2
Write After Write (WAW) or Output Dependency:
Two instructions both write to the same register or memory location.
A WAW hazard occurs if the write operations take place in the reverse of the
intended order.
Example
I1 : R2 ← R1 + R3
I2 : R2 ← R5 + R3
There are two techniques using which we can handle data hazards.
They are
(1) Using Operand Forwarding (2) Using Software
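Detecting the three dependencies between a pair of register-transfer instructions can be sketched as follows; the (destination, sources) encoding is an assumption made for illustration:

```python
def classify_hazards(i1, i2):
    """Return the data dependencies from instruction i1 to a later instruction i2.
    Each instruction is a (destination_register, [source_registers]) tuple."""
    dest1, srcs1 = i1
    dest2, srcs2 = i2
    hazards = []
    if dest1 in srcs2:
        hazards.append("RAW")   # i2 reads what i1 writes (true dependency)
    if dest2 in srcs1:
        hazards.append("WAR")   # i2 writes what i1 reads (antidependency)
    if dest1 == dest2:
        hazards.append("WAW")   # both write the same register (output dependency)
    return hazards

# I1: R2 <- R5 + R3 ; I2: R4 <- R2 + R3  -> true dependency on R2
print(classify_hazards(("R2", ["R5", "R3"]), ("R4", ["R2", "R3"])))  # ['RAW']
```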
1) MULTIPLE STREAMS
o The approach is to replicate the initial portions of the pipeline and allow the
pipeline to fetch both instructions, making use of multiple streams.
o There are two problems with this approach:
1. Contention delays for access to the registers and to memory.
2. Additional branch instructions may enter the pipeline before the original
branch decision is resolved.
2) PREFETCH BRANCH TARGET
o When a conditional branch is recognized, the target of the branch is prefetched, in
addition to the instruction following the branch. This target is then saved until the
branch instruction is executed; if the branch is taken, the target has already been
fetched.
Features of 80286
The Intel 80286 was introduced in early 1982. It is an x86 16-bit microprocessor
with 134,000 transistors. It was the first Intel processor that could run all the software
written for its predecessor.
The 80286’s performance is more than twice that of its predecessors, i.e., Intel 8086
and Intel 8088 per clock cycle.
The 80286 processors have a 24-bit address bus. Therefore, it is able to address up
to 16 MB of RAM.
The 80286 is the first x86 processor that can operate in protected
mode. The protected mode enabled up to 16 MB of memory to be addressed by the
on-chip linear memory management unit (MMU), with a 1 GB logical address space.
The 80286 with an 8 MHz clock provides throughput up to 6 times higher than that of the 5 MHz 8086.
The 80286 operates in two different modes such as real mode and protected mode.
The real mode is used for compatibility with existing 8086/8088 software base, and
the protected mode is used for enhanced system level features such as memory
management, multitasking, and protection.
The 80286 introduced several new instructions, including support for signed and
unsigned multiplication and division, as well as a new set of instructions for handling
interrupts.
Features of 80386
The 80386 is a 32-bit microprocessor that can support 8-bit, 16-bit and 32-bit
operands. It has 32-bit registers, 32-bit internal and external data buses, and a 32-bit
address bus.
Due to its 32-bit address bus, the 80386 can address up to 4GB of physical
memory. The physical memory of this processor is organized in terms of segments
of 4 GB size at maximum.
The 80386 CPU is able to support 16K segments, and the total virtual
memory space is 4 Gbytes x 16K = 64 Tbytes.
It operates in real, protected and virtual real mode. The protected mode of 80386 is
fully compatible with 80286.
The 80386 can run 8086 applications under a protected mode in its virtual 8086
mode of operation, which allows multiple programs to run simultaneously in
protected mode, while still allowing legacy programs to run in real mode.
The 80386 instruction set is upward compatible with all its predecessors. The 80386
introduced several new instructions, including bit-test and bit-scan instructions and
32-bit extensions of the existing instruction set.
The 80386 supported external cache memory, which allowed frequently accessed
data to be stored close to the processor, reducing the time it takes to access that data.
The 80386 ran at clock speeds up to 33 MHz, which was significantly faster than
the 80286's maximum speed of 12.5 MHz.
Features of 80486
It has complete 32-bit architecture which can support 8-bit, 16-bit and 32-bit data
types.
8 KB unified level 1 cache for code and data has been added to the CPU. In advanced
versions of the 80486 processor, the size of level 1 cache has been increased to 16
KB.
The 80486 is packaged in a 168-pin grid array package. The 25 MHz, 33 MHz, 50
MHz and 100 MHz (DX-4) versions of the 80486 are available in the market.
This processor retains the complete complex instruction set of the 80386, and
deeper pipelining has been introduced to improve execution speed.
For fast execution of complex instructions, the 80486 has a five-stage pipeline. Two
out of the five stages are used for decoding the complex instructions and the other
three stages are used for execution.
The 80486 included power management features that allowed it to consume less
power when idle, which was an important consideration for portable computers.
Features of Pentium IV
While longer pipelines are less efficient than shorter ones, they allow the CPU core
to reach higher frequencies, and thus increase CPU performance.
To improve the efficiency of its very deep pipeline, the Pentium 4 includes new
features: the Execution Trace Cache, enhanced branch prediction, and a
quad-pumped (Quad Data Rate) bus.
The instruction set of the Pentium 4 processor is compatible with x86 (i386),
x86-64, MMX, SSE, SSE2 and SSE3 instructions. These include 128-bit SIMD
integer arithmetic and 128-bit SIMD double-precision floating-point operations.
Another feature of the Pentium 4 processor is its support for a faster system bus,
from 400 MHz (3.2 GB/s of bandwidth) up to 1066 MHz (8.5 GB/s).
The Pentium 4 processor has two arithmetic logic units (ALUs) which are operated at
twice the core processor frequency.
A core, or CPU core, is the "brain" of a CPU. It receives instructions, and performs
calculations, or operations, to satisfy those instructions. A CPU can have multiple cores.
A processor with two cores is called a dual-core processor; with four cores, a quad-core; six
cores, hexa-core; eight cores, octa-core.
Each CPU core can perform operations separately from the others. Multiple cores may also
work together to perform parallel operations on a shared set of data in the CPU's
memory cache.
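As a rough illustration of cores working in parallel on a shared data set, the sketch below uses Python's multiprocessing module; the operating system may schedule each worker process on a different core. The function and data here are invented for the example:

```python
# Sketch: several worker processes squaring a list of numbers in
# parallel; each process may run on a separate CPU core.
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    data = list(range(8))
    with Pool() as pool:          # one worker per available core by default
        results = pool.map(square, data)
    print(results)                # [0, 1, 4, 9, 16, 25, 36, 49]
```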
What is a multicore processor?
A multicore processor is an integrated circuit that has two or more processor cores attached
for enhanced performance and reduced power consumption. These processors also enable
more efficient simultaneous processing of multiple tasks, such as with parallel
processing and multithreading. A dual core setup is similar to having multiple, separate
processors installed on a computer. However, because the two processors are plugged into
the same socket, the connection between them is faster.
Multicore processors are used to boost performance without exceeding the practical
limits of semiconductor design and fabrication. Using multiple cores also helps keep
heat generation within safe operating limits.
A multicore processor's design enables communication between all available cores,
and processing duties are divided and assigned appropriately among them. Once all
processing operations are finished, the processed data from each core is transmitted
back to the motherboard through a single shared gateway. This approach outperforms
a single-core CPU in overall performance.
Clock speed. One approach was to make the processor's clock faster.
Hyper-threading. Another approach involved handling multiple instruction
threads, called hyper-threading. With hyper-threading, processor cores are designed to
handle two separate instruction threads at the same time.
More chips. The next step was to add processor chips -- or dies -- to the processor
package, which is the physical device that plugs into the motherboard. A dual-core
processor includes two separate processor cores. A quad-core processor includes four
separate cores. Today's multicore processors can easily include 12, 24 or even more
processor cores.
Analytics and HPC. Big data analytics, such as machine learning, and high-
performance computing (HPC) both require breaking large, complex tasks into smaller
and more manageable pieces.
Cloud. Organizations building a cloud will almost certainly adopt multicore processors
to support all the virtualization needed to accommodate the highly scalable and highly
transactional demands of cloud software platforms such as OpenStack.
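The hyper-threading idea above, one core interleaving two instruction threads, can be loosely imitated in software. The sketch below uses Python's threading module as a stand-in for the two hardware threads; the counter names and iteration count are invented for the example:

```python
# Two software threads updating shared data; a lock serializes the
# updates so the final counts are deterministic.
import threading

counts = {"a": 0, "b": 0}
lock = threading.Lock()

def work(name, iterations):
    for _ in range(iterations):
        with lock:                # protect the shared dictionary
            counts[name] += 1

t1 = threading.Thread(target=work, args=("a", 1000))
t2 = threading.Thread(target=work, args=("b", 1000))
t1.start(); t2.start()
t1.join(); t2.join()
print(counts)                     # {'a': 1000, 'b': 1000}
```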
Multicore disadvantages
Software dependent.
Performance boosts are limited.
Power, heat and clock restrictions.
INTRODUCTION TO POWER PC
PowerPC – Performance Optimization With Enhanced RISC – Performance Computing.
The PowerPC machine architecture is organized into several layers, each with its own set of
functions and responsibilities. These layers include:
1. Processor hardware: This layer includes the microprocessor chip and its associated
components, such as the cache, memory management unit, and bus interface.
2. Operating system interface: This layer provides the interface between the processor
hardware and the operating system software. It includes system calls, interrupt
handling routines, and other low-level functions that the operating system needs to
interact with the processor hardware.
3. Operating system kernel: This layer provides the core functions of the operating
system, such as memory management, process scheduling, and device driver support.
4. Application programming interface (API): This layer provides a set of functions
and libraries that developers can use to write applications for the PowerPC
architecture. The API includes standard C and C++ libraries, as well as platform-
specific libraries that provide access to hardware features such as graphics and sound.
5. Applications: This layer includes the user-facing applications that run on top of the
operating system. These can range from simple command-line utilities to complex
graphical applications such as web browsers and video games.
Memory:
Memory consists of 8-bit bytes. PowerPC programs can be written using a virtual
address space of 2^64 bytes. The address space is divided into fixed-length segments,
which are further divided into pages.
Registers:
There are 32 general-purpose registers (GPRs), GPR0 to GPR31, each 64 bits long.
The general-purpose registers are used to store and manipulate data and addresses.
As the PowerPC machine supports floating-point data formats, it also has a
floating-point unit (FPU) for computation.
Register Operations
Link Register (LR): contains the address to return to at the end of a function call
Condition Register (CR): signifies the result of an instruction
Count Register (CTR): holds the loop count
Data Formats:
Integers are stored as 8-, 16-, 32-, or 64-bit binary numbers.
Characters are represented using 8-bit ASCII codes.
Floating-point values are represented using two different formats,
namely the single-precision format and the double-precision format.
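The sizes of these data formats can be checked with Python's struct module, using big-endian (">") byte order, which PowerPC uses natively. This is a minimal sketch, not PowerPC-specific code:

```python
# Packing sample values into the formats listed above and checking
# their sizes in bytes.
import struct

i16 = struct.pack(">h", -1234)    # 16-bit two's-complement integer
i64 = struct.pack(">q", 2 ** 40)  # 64-bit integer
f32 = struct.pack(">f", 1.5)      # single-precision floating point
f64 = struct.pack(">d", 1.5)      # double-precision floating point
ch = "A".encode("ascii")          # 8-bit ASCII character

print(len(i16), len(i64), len(f32), len(f64), len(ch))  # 2 8 4 8 1
```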
Instruction Formats:
PowerPC supports seven basic instruction formats, all of which are 32 bits long.
PowerPC instruction formats have more variety and complexity than those of other
RISC systems such as SPARC. Bit numbering for PowerPC is the opposite of most
other definitions:
bit 0 is the most significant bit, and
bit 31 is the least significant bit.
Instructions are first decoded by the upper 6 bits, in a field called the primary
opcode. The remaining 26 bits contain fields for operand specifiers, immediate
operands and extended opcodes; some of these may be reserved bits or fields.
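Because bit 0 is the most significant bit, the 6-bit primary opcode occupies the top 6 bits of the instruction word. A small sketch of this decoding step follows; the sample word is a standard encoding of add r1, r2, r3, whose primary opcode is 31:

```python
# Extract the primary opcode: shift the lower 26 bits away, keeping
# the upper 6 bits (PowerPC bits 0-5).
def primary_opcode(word):
    return (word >> 26) & 0x3F

# 0x7C221A14 encodes "add r1, r2, r3"; its primary opcode is 31.
print(primary_opcode(0x7C221A14))  # 31
```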
Addressing Mode:
Load and store operations use one of the following three addressing modes,
depending upon the operand:
Mode: Target address (TA) calculation
Register indirect: TA = (register)
Register indirect with index: TA = (register-1) + (register-2)
Register indirect with immediate index: TA = (register) + displacement
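The three TA calculations above can be sketched directly; "regs" below stands in for the general-purpose register file, and the register contents and displacement are invented for the example:

```python
# Target-address (TA) calculations for the three load/store
# addressing modes; regs maps register numbers to their contents.
regs = {1: 0x1000, 2: 0x0020}

def register_indirect(r):
    return regs[r]                      # TA = (register)

def register_indirect_index(ra, rb):
    return regs[ra] + regs[rb]          # TA = (register-1) + (register-2)

def register_indirect_immediate(r, disp):
    return regs[r] + disp               # TA = (register) + displacement

print(hex(register_indirect(1)))                  # 0x1000
print(hex(register_indirect_index(1, 2)))         # 0x1020
print(hex(register_indirect_immediate(1, 0x40)))  # 0x1040
```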
Instruction Set:
The PowerPC architecture is more complex than other RISC systems and has
approximately 200 machine instructions. The architecture uses pipelined execution
of instructions, which means that while one instruction is being executed, the next
one is being fetched from memory and decoded.
The PowerPC architecture follows two different methods for performing I/O
operations. In one approach the virtual address space is used, while in the other,
I/O is performed using virtual memory management.
Features of Power PC601
The PowerPC 601 is a microprocessor chip that was designed by IBM and Motorola in the
early 1990s. Here are some of its features:
1. Architecture: The PowerPC 601 is based on the RISC (Reduced Instruction Set
Computing) architecture, which allows for efficient and fast processing of
instructions.
2. Clock speed: The PowerPC 601 ran at clock speeds of 50 MHz and above (later
versions reached 80 MHz), which was considered fast for its time.
3. Pipeline depth: The PowerPC 601 had a pipeline depth of 5 stages, which allowed it
to process instructions quickly.
4. Cache memory: The PowerPC 601 had a 32 KB unified on-chip cache, and systems
could add an external level 2 (L2) cache (typically 256 KB), which helped improve its
performance.
5. Bus interface: The PowerPC 601 had a 64-bit bus interface, which allowed for fast
data transfer between the processor and memory.
6. Instruction set: The PowerPC 601 supported the 32-bit PowerPC instruction set;
the PowerPC architecture also defines 64-bit operation, implemented by later
processors.
7. Floating-point performance: The PowerPC 601 had a high-performance floating-
point unit, which allowed it to perform complex mathematical calculations quickly.
Overall, the PowerPC 601 was a powerful microprocessor for its time, and it was used in a
variety of applications, including Apple's Power Macintosh computers.
Features of AMD Athlon
The AMD Athlon processor family features the industry's first seventh-generation x86
microarchitecture, which is designed to support the growing processor and system
bandwidth requirements of emerging software, graphics, I/O, and memory technologies.
The AMD Athlon processors are implemented in AMD’s advanced 0.18-micron process
technology to achieve maximum performance and scalability.
1. High Clock Speeds: AMD Athlon processors offer high clock speeds, which means
they can execute instructions quickly.
2. Multiple Cores: Many AMD Athlon processors have multiple cores, which allows
for better multitasking and overall performance.
3. Cache Memory: The AMD Athlon processors come with varying levels of cache
memory, which is used to temporarily store frequently accessed data for quick
access.
4. HyperTransport Technology: AMD Athlon processors use HyperTransport
technology, which allows for high-speed communication between the processor and
other components in the computer.
5. 64-bit Architecture: The AMD Athlon processors are designed with 64-bit
architecture, which allows for more memory to be used and better performance in
certain applications.
6. AMD Virtualization Technology: Some AMD Athlon processors feature AMD
Virtualization technology, which allows for better performance in virtualized
environments.
7. Overclocking: Many AMD Athlon processors can be overclocked, which means
running the processor at higher speeds than it was designed for, to achieve even
better performance.
INTRODUCTION TO SPARC
SPARC was one of the most successful early commercial RISC systems, and its success led
to the introduction of similar RISC designs from many vendors through the 1980s and
1990s. The first implementation of the original 32-bit architecture (SPARC V7) was used in
Sun's Sun-4 workstation and server systems, replacing the earlier Sun-3 systems
based on the Motorola 68000 series of processors.
SPARC has become a widely used architecture for hardware used with UNIX-based
operating systems, including Sun's own Solaris systems.
• The SPARC structure is scalable and adaptable, both in terms of cost and capacity.
• SPARC incorporates object-oriented programming (OOP) features.
• It is versatile, with numerous possibilities for commercial, aerospace, military, and
technical applications.
• SPARC is highly scalable, and its architecture specification is open.
SuperSPARC PROCESSOR
The SuperSPARC chip, with an optional second-level cache controller, is targeted at a wide
range of systems, from uniprocessor desktop machines to multiprocessing file and compute
servers.
Applications of SuperSPARC processor are as follows:
1. Workstations: SuperSPARC was primarily used in Sun's high-end workstations,
such as the SPARCstation 20 and SPARCstation 10. These workstations were
popular in engineering and scientific fields where high-performance computing was
necessary.
2. Servers: SuperSPARC was also used in Sun's servers, such as the Sun Enterprise
3000 and Sun Enterprise 4000. These servers were used in businesses and
organizations that required high levels of computing power and reliability.
3. Database management: SuperSPARC was particularly well-suited for database
management applications. Its high performance and ability to handle large amounts
of data made it a popular choice for businesses that relied heavily on databases.
4. Scientific computing: SuperSPARC was also used in scientific computing
applications, such as weather forecasting and climate modeling. Its ability to handle
complex calculations made it well-suited for these types of applications.
5. Graphics and visualization: SuperSPARC was used in graphics and visualization
applications, such as computer-aided design (CAD) and 3D modeling. Its high
performance and ability to handle large data sets made it ideal for these applications.
Overall, SuperSPARC was a versatile microprocessor chip that was used in a variety of
applications requiring high-performance computing, reliability, and scalability.