
Course Notes

The document provides a history of computation from the mechanical age to the integrated circuits age. It discusses the development of microprocessors from the Intel 4004 in 1971 to modern microprocessors from Intel, AMD, and Motorola. Key aspects of microprocessor-based computer systems are described, including memory organization, buses, I/O systems, and the role of the microprocessor. Details are given on memory addressing modes, microprocessor architecture including registers, and the evolution from real to protected memory addressing modes.

Uploaded by

zihad bin islam

Microprocessor Systems

97.461

Maitham Shams

Course Slide Presentations

Department of Electronics
Carleton University
History of Computation
• Mechanical Age: B.C. to 1800s
– 500 B.C. Babylonians invented abacus, first mechanical
calculator
– 1642 Blaise Pascal invented calculator using wheels and
gears
– 1823 Charles Babbage created Analytical Engine capable of
storing data using punch cards

• Electrical Age: 1800s to 1970s


– Triggered by advent of electric motor (conceived by
Faraday)
– Motor driven adding machines based on Pascal’s idea
– 1896 Hollerith formed Tabulating Machine Company
(Today’s IBM)
– 1946 ENIAC (Electronic Numerical Integrator and
Computer): first general-purpose programmable electronic
machine. Used 17000 vacuum tubes and 500 miles of wires,
weighed 30 tons, performed 100K operations/second, and was
programmed by rewiring

• Integrated Circuits Age: 1960s to present


– Triggered by development of transistor at Bell Labs, 1948
– 1958 IC technology invented by Jack Kilby of Texas
Instruments
– 1971 World’s first microprocessor, Intel 4004: 4-bit bus, 4K
4-bit (nibble) memory, 50 KIPs, 2300 transistors, 10 μm
technology
– 1972 first 8-bit μP, Intel 8008, 16K bytes, 50 KIPs
– 1973 Intel 8080, 64K bytes, 500 KIPs, 6000 transistors,
6 μm, followed by other 8-bit μPs like Motorola
MC6800 (1974) and Z-8
– 1978 Intel 8086, 16-bit μP, 1M bytes, 2.5 MIPs
Used a 4-byte instruction cache (queue) to speed up
execution time
Base for 80286 μP, also 16-bit with 16M bytes
– 1986 Intel 80386, 32-bit μP, 32-bit data and address
busses
4G bytes, 16 to 33 MHz, 275000 transistors, 1 μm
– 1989 Intel 80486, like 80386 with numeric co-processor.
4G bytes + 8K-byte cache, 25 to 50 MHz, 1.2M
transistors, 1 and 0.8 μm
– Advancement continues with Intel, AMD, Motorola,
and other μPs
Reasons Behind μP Technology
• Speed
– Graphics, Numerical Analysis, CAD, and Signal Processing
applications
• Convenience
– Large memory, smaller size, and lower weight
• Power Dissipation
– Portable computers and wireless services
• Reliability
– Noise tolerance in adverse environments and
temperatures
• Cost
– Get more done for the money
μP-Based Computer Systems
• Buses connect the memory system, the microprocessor, and the I/O system
• Memory System
– Dynamic RAM (DRAM), Static RAM (SRAM), Cache, Read-Only (ROM),
Flash Memory, EEPROM
• Microprocessor
– 8086, 8088, 80186, 80286, 80386, 80486, Pentium, Pentium Pro,
Pentium II
• I/O System
– Printer, Hard disk drive, Mouse, CD-ROM Drive, Keyboard, Monitor,
Scanner
Memory
• Transient Program Area (TPA): 640K bytes
• System Area: 384K bytes
– TPA and System Area together form the 1M bytes of real
(conventional) memory
• Extended Memory System (XMS): memory above the first 1M bytes
– 15M bytes in the 80286
– 31M bytes in the 80386SL/SLC
– 63M bytes in the 80386EX
– 4095M bytes in the 80386DX, 80486, and Pentium
– 64G bytes in the Pentium Pro and Pentium II
• Transient Program Area (TPA)
Memory map, from the top of the TPA (9FFFF) down to 00000:
Start (hex)   Contents
9FFF0         MSDOS Program
08E30         Free TPA
08490         COMMAND.COM
02530         Device Drivers such as MOUSE.SYS
01160         MSDOS Programs
00700         IO.SYS Program
00500         DOS communications area
00400         BIOS communications area
00000         Interrupt Vectors
• Programs that control computer system (Operating
Systems)
• Also contains data, drivers, and application programs
• Consists of RAM, ROM, EEPROM, and Flash Memory
• DOS controls memory organization and some I/O
devices
• Interrupt Vectors contain addresses of interrupt service
procedures
• BIOS (Basic I/O system) area controls I/O devices
• IO program allows use of keyboard, video display,
printer, etc.
• Command program controls operation of computer
through keyboard
• System Area
Memory map, from the top of the system area (FFFFF) down to A0000:
Start (hex)   Contents
F0000         BIOS system ROM
E0000         BASIC language ROM (earlier PCs)
C8000         Free Area; Hard disk controller ROM; LAN controller ROM
C0000         Video BIOS ROM
B0000         Video RAM (Text area)
A0000         Video RAM (Graphics area)


• I/O Space
– Addresses I/O ports
– Up to 64K 8-bit devices
I/O map, from the top of the I/O space (FFFF) down to 0000:
Start (hex)   Device
(above 03F8)  I/O Expansion Area
03F8          COM1
03F0          Floppy Disk Controller
03D0          CGA Adapter
0378          LPT1
0320          Hard disk Controller
02F8          COM2
0060          8255 (PIA)
0040          Timer (8253)
0020          Interrupt controller
0000          DMA Controller
Microprocessor
• Data transfer between itself and memory or I/O
system
– Using data, address, and control buses
• Simple arithmetic and logic operations
– Add, Sub, Mul, Div, AND, OR, NOT, NEG, Shift, Rotate
– Data width: byte (8-bit), word (16-bit), and double
word (32-bit)
• Program flow via simple decisions
– Zero, Sign, Carry, Parity, Overflow
• Why is it so important?
Computer System Block Diagram
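[Figure: the µP drives the Address Bus, the Data Bus, and the control
lines MWTC, MRDC, IOWC, and IORC, which connect it to Read-only
Memory (ROM), Read/Write memory (RAM), the Keyboard, and the Printer]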
• Bus is a common group of wires for
interconnection
• Address Bus: 16-bit for I/O and 20 to 36-bit for
memory
• Data Bus: 8 to 64-bit, the wider the bus, the more
data can be transferred
• Control Bus: contains lines that select the
memory or I/O to perform a read or write
operation
– Four main control lines
– MRDC' (memory read control)
– MWTC' (memory write control)
– IORC' (I/O read control)
– IOWC' (I/O write control)
Intel Microprocessor Architecture
• Operation Modes
– Real: uses 1st M byte of memory in all versions
– Protected: uses all parts of memory in 80286 and
above
• Register Types
– Program Visible: used during application programs
– Program Invisible: not directly addressable, but used
by system
• Program Visible Registers
– 4 Data Registers, 4 Pointer/Index Registers, 4-6
Segment Registers, Instruction Pointer, and Flags
• Compatibility is a successful strategy
– Register A may be used as 8-bit (AH and AL), 16-bit
(AX), and 32-bit (EAX) for the 80386 and later processors
– e.g. ADD AL, AH; ADD DX, CX; ADD ECX, EBX
– Instructions only affect the intended part of a register
– Later µP versions support earlier version codes

• Some registers are Multipurpose, some are Special Purpose
– Segment Registers generate memory addresses
Real Mode Memory Addressing
• Location = Segment + Offset
– Segment address is held in a segment register and is always
appended with 0H (i.e. multiplied by 10H)
– Segments always have a length of 64K bytes
– Offset or displacement selects a location within the 64K bytes
of the segment
– e.g. 1000:2000 gives location 12000H
• Default Segment and Offset Registers
– e.g. code segment and instruction pointer CS:IP and stack
segment and stack pointer SS:SP
[Figure: real mode memory from 00000 to FFFFF; segment register
value 1000 selects the 64K-byte segment 10000 to 1FFFF, and
Offset = F000 selects location 1F000]
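The segment and offset arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function name is made up here, and the 20-bit mask assumes the 1M-byte address space of the 8086/8088.

```python
def real_mode_address(segment, offset):
    """Return the 20-bit physical address for a real-mode segment:offset pair."""
    # Appending 0H to the segment value equals shifting it left by 4 bits.
    # The mask models a 20-bit address bus (assumption: 8086/8088 behaviour).
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


print(f"{real_mode_address(0x1000, 0x2000):05X}H")  # 12000H
print(f"{real_mode_address(0x1000, 0xF000):05X}H")  # 1F000H
```

The two calls reproduce the 1000:2000 example and the Offset = F000 case from the figure.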
Protected Mode Memory Addressing
• Accessed via segment and offset address, but
– Segment register contains a selector
– Selector selects a descriptor from descriptor table
– Descriptor: memory segment location, length, and
access rights
• Two types of descriptor tables
– Global/system descriptors used for all programs
– Local/application descriptors used for applications
– Each descriptor is 8 bytes
• 16-bit segment register contains 3 parts
– Leftmost 13 bits address a descriptor
– TI bit selects the global (0) or local (1) descriptor table
– Rightmost 2 bits select the requested privilege level
(priority) for memory segment access
• How many global and local descriptors in a
table?
• How large is a global and a local descriptor
table?
• How many memory segments are allowed?
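One way to reason about the three questions above is to decode a selector and count what 13 index bits and 8-byte descriptors allow. The sketch below is illustrative only; the function and constant names are assumptions made for this example.

```python
def decode_selector(selector):
    """Split a 16-bit protected-mode selector into (index, TI, RPL)."""
    index = (selector >> 3) & 0x1FFF   # leftmost 13 bits: descriptor number
    ti = (selector >> 2) & 1           # 0 = global table, 1 = local table
    rpl = selector & 0b11              # rightmost 2 bits: requested privilege
    return index, ti, rpl


DESCRIPTORS_PER_TABLE = 2 ** 13              # 13 index bits
TABLE_BYTES = DESCRIPTORS_PER_TABLE * 8      # each descriptor is 8 bytes
MAX_SEGMENTS = 2 * DESCRIPTORS_PER_TABLE     # global table plus local table

print(decode_selector(0x001F))   # (3, 1, 3)
print(DESCRIPTORS_PER_TABLE, TABLE_BYTES, MAX_SEGMENTS)  # 8192 65536 16384
```

Under these definitions each table holds up to 8192 descriptors in 64K bytes, and one application can reach up to 16384 segments.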
Descriptor Formats
Access Right Byte
Program-Invisible Registers
• Each segment register contains a program-invisible
portion
– This portion is reloaded whenever the segment register changes
– Contains base address, limit, and access information
– These registers are also called the descriptor cache
• Other program-invisible registers
– GDTR (global descriptor table register) contains the base
address and limit for the global descriptor table
– Location of local descriptor table is selected from global
descriptor table using the selector held in LDTR (local
descriptor table register)
Memory Paging
• Memory paging changes a linear address to
physical
– Linear address is produced by software
– Page directory base is held in a control register (CR3)
– Linear address is broken into 3 sections: directory,
page table, offset
– Page directory contains 1024 entries of 4 bytes each
which addresses a page table that contains 1024
entries of 4 bytes each
– Each memory page is 4K bytes
– TLB (translation look-aside buffer) is a cache which contains
the 32 most recent page translation addresses
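Because a page directory and a page table each hold 1024 entries and a page is 4K bytes, a 32-bit linear address splits into 10 + 10 + 12 bits. A minimal sketch of that split follows; the function name and the sample address are chosen only for illustration.

```python
def split_linear_address(linear):
    """Return (directory entry, page table entry, offset) for a 32-bit linear address."""
    directory = (linear >> 22) & 0x3FF   # top 10 bits: 1 of 1024 directory entries
    table = (linear >> 12) & 0x3FF       # next 10 bits: 1 of 1024 page table entries
    offset = linear & 0xFFF              # low 12 bits: byte within the 4K-byte page
    return directory, table, offset


print(split_linear_address(0x00403025))  # (1, 3, 37)

# 1024 directory entries x 1024 table entries x 4K-byte pages cover 4G bytes.
print(1024 * 1024 * 4096 == 2 ** 32)     # True
```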
Addressing Modes
Data Addressing Modes
• Intel family supports 8 data addressing modes
• Modes differ in the location of data and address
calculations
• All modes involve physical address generation
• Consider MOV opcode as example: MOV AX, BX
– Opcode or operation code tells µP which operation to
perform
– Source operand is to the right
– Destination operand is to the left
• Register Addressing: MOV CX, DX
– Copy content of source register to destination register
– Source and destination must be of the same size
• Immediate Addressing: MOV AL, 22H
– Transfer the immediate data into destination register
– Immediate data are constant data, whereas data transferred
from a register are variable data
• Direct Addressing: MOV CX, LIST
– Move a byte or word between a memory location and
a register
– Memory address, instead of data, appears in the
instruction
• Register Indirect Addressing: MOV AX, [BX]
– Transfer data between a register and a memory
location addressed by a register
– Sometimes requires the special assembler directives
BYTE PTR, WORD PTR, or DWORD PTR when the size is not
clear
– For example, MOV DWORD PTR [DI], 10H instead of
MOV [DI], 10H
• Base-plus-index Addressing: MOV [BX+DI], CL
– Transfer data between a register and a memory
location addressed by a base register and an index
register
• Register Relative Addressing: MOV AX, [BX+4]
– Move data between a register and a memory location
specified by a register plus a displacement
• Base relative-plus-index Addressing:
MOV AX, ARRAY[BX+DI]
– Transfer data between a register and a memory
location specified by a base and index register plus a
displacement
– Another example is MOV AX, [BX+DI+4]

• Scaled-index Addressing: MOV EDX, [EAX+4*EBX]
– Address in the second register is modified by a scale
factor
– Scale factors are 2, 4, or 8, for word, double-word, and
quad-word access, respectively
– Only available in the 80386 and later μPs
– Other examples: MOV AL, [EBX+ECX] and MOV AL,
[2*EBX]
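The address calculations behind these modes can be compared side by side. The sketch below assumes real mode with DS as the default segment; the register contents and the offset of ARRAY are hypothetical values picked for illustration, and a scale factor of 1 (no scaling) is accepted in addition to the factors listed above.

```python
def physical_address(segment, effective_address):
    """Real-mode physical address for a 16-bit effective address."""
    return ((segment << 4) + (effective_address & 0xFFFF)) & 0xFFFFF


def scaled_index(base, index, scale):
    """32-bit effective address for the scaled-index form [base + scale*index]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (base + scale * index) & 0xFFFFFFFF


# Hypothetical register contents and data offset, for illustration only.
DS, BX, DI = 0x1000, 0x0100, 0x0010
ARRAY = 0x0200

examples = {
    "MOV AX,[BX]": physical_address(DS, BX),                 # register indirect
    "MOV [BX+DI],CL": physical_address(DS, BX + DI),         # base-plus-index
    "MOV AX,[BX+4]": physical_address(DS, BX + 4),           # register relative
    "MOV AX,ARRAY[BX+DI]": physical_address(DS, ARRAY + BX + DI),
}
for instruction, address in examples.items():
    print(f"{instruction:22s} -> {address:05X}H")

# Scaled-index: [EAX+4*EBX] with EAX = 1000H and EBX = 3 gives offset 100CH.
print(f"{scaled_index(0x1000, 3, 4):X}H")
```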
Program Memory-Addressing Modes
• Three forms, used with JMP and CALL instructions
• Direct Program Memory Addressing: JMP Label
– Like GOTO or GOSUB in BASIC language
– Allows going to any location in memory for next
instruction
• Relative Program Memory Addressing: JMP [2]
– Jump relative to instruction pointer (IP)
• Indirect Program Memory Addressing: JMP AX
– Jump to current code segment location addressed by
content of AX
– Other examples: JMP [DI+2] and JMP [BX]
Stack Memory-Addressing Modes
• Stack is a LIFO (last-in, first-out) memory
• Data are placed by PUSH and removed by POP
– Stack memory is maintained by stack segment register
(SS) and stack pointer (SP)
– When a word is pushed, high 8 bits are stored at SP-1,
low 8 bits are stored at SP-2, then SP is decremented by
2
– When a word is popped, low 8 bits are removed from
location addressed by SP, high 8 bits are removed
from location addressed by SP+1, then SP is
incremented by 2
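The PUSH and POP byte ordering described above can be modelled with a small simulation. This is a sketch only: the class name, the dictionary used as memory, and the SS and SP values are assumptions made for illustration.

```python
class Stack8086:
    """Toy model of the real-mode stack: memory is a dict of byte values."""

    def __init__(self, ss, sp):
        self.ss, self.sp = ss, sp
        self.memory = {}

    def _address(self, offset):
        # Physical address = SS appended with 0H, plus a 16-bit offset.
        return ((self.ss << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def push(self, word):
        self.memory[self._address(self.sp - 1)] = (word >> 8) & 0xFF  # high byte
        self.memory[self._address(self.sp - 2)] = word & 0xFF         # low byte
        self.sp = (self.sp - 2) & 0xFFFF

    def pop(self):
        low = self.memory[self._address(self.sp)]
        high = self.memory[self._address(self.sp + 1)]
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low


stack = Stack8086(ss=0x3000, sp=0x0800)   # hypothetical SS and SP
stack.push(0x1234)
print(f"SP={stack.sp:04X}H")              # SP=07FEH
print(f"{stack.pop():04X}H")              # 1234H
```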
Instruction Encoding
• Assembler translates assembly code into
machine language
• Machine language is the native binary code μP
understands
• Override Prefixes
– First two bytes in 32-bit instructions:
Address size-prefix (67H) and Register size-prefix (66H)
– They toggle size of register and operand address from
16-bit to 32-bit or vice versa
[Figure: first byte of instruction = 6-bit Opcode followed by the D and W bits]
• First byte of instruction: opcode
– First 6 bits of instruction are the binary opcode
– Direction bit (D) determines the direction of data flow
– Width bit (W) determines data size: 0 for byte, 1 for
word and double word
• Second byte of instruction: MOD-REG-R/M
– Fields in order: MOD (2 bits), REG (3 bits), R/M (3 bits)
– MOD specifies addressing mode for instruction and
whether displacement is present
– If MOD=11, then register addressing mode, else
memory addressing mode
– In register addressing mode, R/M specifies a register
– In memory addressing mode, R/M selects a mode
from table
– If D=1, data flow to REG from R/M, if D=0 data flow to
R/M from REG
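The field layout above can be checked by decoding a two-byte register-to-register MOV. The sketch assumes 16-bit operand size with no override prefix, uses the standard x86 register numbering (which the slides do not list), and handles only the MOV opcode 100010 with MOD=11.

```python
# Standard x86 register codes for the REG and R/M fields (not listed in the slides).
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def decode_fields(byte1, byte2):
    """Split the opcode byte and the MOD-REG-R/M byte into their fields."""
    return {
        "opcode": byte1 >> 2,         # first 6 bits
        "d": (byte1 >> 1) & 1,        # direction bit
        "w": byte1 & 1,               # width bit
        "mod": byte2 >> 6,            # 2 bits
        "reg": (byte2 >> 3) & 0b111,  # 3 bits
        "rm": byte2 & 0b111,          # 3 bits
    }


def decode_register_mov(byte1, byte2):
    """Decode a register-to-register MOV (opcode 100010, MOD=11) to text."""
    f = decode_fields(byte1, byte2)
    if f["opcode"] != 0b100010 or f["mod"] != 0b11:
        raise ValueError("only register-to-register MOV is handled in this sketch")
    names = REG16 if f["w"] else REG8
    reg, rm = names[f["reg"]], names[f["rm"]]
    # D=1: data flow to REG from R/M; D=0: data flow to R/M from REG.
    dest, src = (reg, rm) if f["d"] else (rm, reg)
    return f"MOV {dest},{src}"


print(decode_register_mov(0x8B, 0xEC))  # MOV BP,SP
print(decode_register_mov(0x88, 0xC3))  # MOV BL,AL
```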
Intel Family Instruction Set
• PUSH and POP for stack operations
• Load Effective Address
– LEA loads a 16- or 32-bit register with offset address
– LDS, LES, LFS, LGS, and LSS load a 16- or 32-bit register with
offset address and a corresponding segment register DS,
ES, FS, GS, or SS with a segment address

• String Data Transfer


– Uses destination index (DI) and source index (SI) registers
– Two modes: auto-increment (D=0) and auto-decrement
(D=1)
• By default DI accesses data in the extra segment and SI in
the data segment
• LODS loads AL, AX, or EAX with data addressed by SI in
data segment and increments or decrements SI
• STOS stores AL, AX or EAX in the extra segment at the
location addressed by DI and increments or decrements DI
• REP STOS repeats the instruction the number of times
stored in CX, i.e. terminates when CX=0
• MOVS is the only instruction that transfers data
between memory locations
• INS transfers data from I/O device into extra segment
addressed by DI; I/O address is in DX register
• OUTS transfers data from data segment memory
addressed by SI to an I/O device addressed by DX
– For inputting or outputting a block of data INS and
OUTS are repeated

• Miscellaneous Data Transfer Instructions


– XCHG exchange contents of a register with any other
register or memory location
– IN and OUT instructions perform I/O operations
– Two I/O addressing modes: fixed-port and variable
port
– In fixed-port addressing the port address appears in
the instruction, e.g. when the program is stored in ROM
– In variable-port addressing the I/O address is held in a
register (DX)
– MOVSX is move and sign-extend; MOVZX is move and
zero-extend
– CMOV, new to the Pentium Pro and later, moves data only
if a condition is true; conditions are checked against the
results of some prior instruction

• Segment Override Prefix


– May be added to any instruction to deviate from
default segment

• Arithmetic and Logic Instructions


– ADD simply adds two numbers and sets the flags
– ADC adds also the carry flag (C)
– INC adds one to a register or memory location
– SUB subtracts two numbers and sets the flags
– SBB subtract-with-borrow also subtracts (C) from
difference
– DEC subtracts one from a register or memory location
– CMP is a subtract that only changes the flag bits; this
is normally followed by a conditional jump instruction
– Multiplication can be unsigned (MUL) or signed
(IMUL)
– Division can also be unsigned (DIV) or signed (IDIV)
– Basic logic instructions are AND, OR, XOR, NOT
– TEST is like CMP, but for bits: zero flag Z=1 if the tested
bit is 0 and Z=0 if the tested bit is 1
– TEST performs an AND operation, so TEST AL,1 tests the
rightmost bit and TEST AL,128 tests the leftmost bit of the
byte in AL
– NOT is logical inversion or one’s complement
– NEG is arithmetic sign inversion or two’s complement
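How ADD sets the flags can be illustrated with a 16-bit model. The function below is a sketch written for this purpose (its name and the operand values are assumptions); it covers only the C, A, S, Z, and O flags.

```python
def add16(a, b):
    """Add two 16-bit values and return (result, flags) as ADD would set them."""
    total = a + b
    result = total & 0xFFFF
    flags = {
        "C": int(total > 0xFFFF),                     # carry out of bit 15
        "A": int((a & 0xF) + (b & 0xF) > 0xF),        # carry out of bit 3
        "S": result >> 15,                            # copy of the sign bit
        "Z": int(result == 0),                        # result is zero
        # Overflow: both operands have the same sign and the result differs.
        "O": int(((a ^ result) & (b ^ result) & 0x8000) != 0),
    }
    return result, flags


print(add16(0x7FFF, 0x0001))
# (32768, {'C': 0, 'A': 1, 'S': 1, 'Z': 0, 'O': 1})
```

A conditional jump placed after such an instruction would then branch on one of these flag bits.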

• Shift and Rotate Instructions


– SHL and SHR are logical shift left and right that insert 0 and
put one bit in the carry flag C
– SAL and SAR are arithmetic shift operations; SAL is similar
to SHL, but SAR is different than SHR because it inserts the
sign bit instead of 0
– Rotate instructions rotate data from one end to another,
ROL (rotate left) and ROR (rotate right), or through the
carry flag (RCL and RCR)
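The difference between the logical, arithmetic, and rotate forms is easiest to see on a single byte shifted by one position. The helper names below are made up for this sketch; each returns the new 8-bit value and the bit that ends up in the carry flag.

```python
def shl8(value):
    """Logical/arithmetic shift left by 1: insert 0, bit 7 goes to carry."""
    return (value << 1) & 0xFF, (value >> 7) & 1


def shr8(value):
    """Logical shift right by 1: insert 0 at bit 7, bit 0 goes to carry."""
    return (value >> 1) & 0xFF, value & 1


def sar8(value):
    """Arithmetic shift right by 1: bit 7 (the sign) is kept, bit 0 goes to carry."""
    return ((value >> 1) | (value & 0x80)) & 0xFF, value & 1


def rol8(value):
    """Rotate left by 1: bit 7 wraps around to bit 0 and is copied to carry."""
    carry = (value >> 7) & 1
    return ((value << 1) & 0xFF) | carry, carry


def rcl8(value, carry_in):
    """Rotate left through carry by 1: old carry enters bit 0, bit 7 goes to carry."""
    return ((value << 1) & 0xFF) | (carry_in & 1), (value >> 7) & 1


for name, (result, carry) in {
    "SHR": shr8(0x81), "SAR": sar8(0x81), "ROL": rol8(0x81), "RCL": rcl8(0x81, 0),
}.items():
    print(f"{name} 81H -> {result:02X}H, C={carry}")
```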

• String Data Comparing


– String scan instruction SCAS compares register A with
memory
– Compare string instruction CMPS compares two memory
locations
Intel 8086 Hardware
• Similar to 8088 but has 16-bit data bus instead of
8-bit
• Power Supply Requirements
– Requires 5V with 10% tolerance
– Maximum supply current of 360 mA
– Operates between 32 to 180 degrees F
– CMOS version uses only 10mA and operates in -40 to
225 degrees F
• Noise Immunity
– Difference between logic 0 output and logic 0 input
voltages (= 0.35V)
• Pin Connections
– AD15-AD0: multiplexed address/data pins
– A19/S6-A16/S3: multiplexed address/status pins
S6 always remains 0, S5 is related to Flags, S4 and
S3 show which segment in memory is accessed
– RD: read signal (0 when receiving data from
memory or I/O)
– READY: for inserting wait states in μP timing (when 0)
– INTR: for requesting hardware interrupt if IF=1
– TEST: tested by the WAIT instruction
– NMI: non-maskable interrupt (regardless of IF bit)
– RESET: causes reset and disables interrupts
– CLK: clock input pin of μP with 1/3 (33%) duty cycle
– Vcc: power supply input
– GND: two ground connections
– MN/MX: minimum/maximum operation mode
– BHE/S7: bus high enable used to enable D15-D8

• Minimum Mode Pins


– IO/M: selects memory or I/O for address bus
– WR: indicates μP is outputting data
– INTA: interrupt acknowledge responds to INTR
input
– ALE: address latch enable shows μP bus contains address
– DT/R: data transmit/receive shows that μP is transmitting
(1) or receiving data (0)
– DEN: data bus enable activates external data bus buffers
– HOLD: requests direct memory access (DMA) if 1;
another bus master wants to control the bus
– HLDA: hold acknowledge indicates the μP is in hold state
and all buses are floating
– SS0: used with IO/M and DT/R to detect function of
current bus cycle

• Maximum Mode Pins for use with a co-processor


– S2, S1, S0: status bits indicate function of current bus cycle
– RQ/GT0 and RQ/GT1: request/grant bi-directional
pins request and grant DMA
– LOCK: lock output locks peripherals off the system
– QS1 and QS0: queue status pins indicate the status of
the internal instruction queue for the numeric
co-processor
Clock Generator
• Provides 5 MHz for μP and 2.5 MHz for
peripherals
• Uses an external 15 MHz crystal (divided by 3 to give the 5 MHz μP clock)
• Provides a system reset signal
Bus Buffering and Latching
• Multiplexing reduces number of pins
• Demultiplexing required to have stable
addresses for memory or I/O
• Transparent latch is like a wire when enabled
and holds previous state when disabled
• Buffers used to drive high-capacitance loads
• Data bus uses bi-directional buffers
Bus Timing
• μP uses memory or I/O in periods called bus cycles
• Each bus cycle equals 4 system clock periods (T
states)
• In T1 the address is placed, ALE, DT/R, and IO/M
are activated
• In case of write, data appears on data bus in T2
• READY is sampled at the end of T2, if low then T3
is wait state
• In T4 all signals are deactivated and prepared for
next cycle
• Ready and Wait State
– READY input to μP causes wait state for slower access
– Wait states appear between T2 and T3 to lengthen
bus cycle
– Memory access time is the period between when
address appears on bus until data is sampled by μP
– For 8086, at 5 MHz, each state is 200 ns; normal
access times are 460 ns
– READY is sampled at the end of T2 and again in the middle
of Tw
– Clock generator is used to synchronize READY signal
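The 200 ns and 460 ns figures above can be reproduced with a short calculation. The sketch assumes that the access window spans three T states, less an address-valid delay of 110 ns and a data setup time of 30 ns; those two values are assumptions taken as typical 5 MHz 8086 timing parameters and are not stated in the slides.

```python
def t_state_ns(clock_hz):
    """Length of one T state in nanoseconds."""
    return 1e9 / clock_hz


def bus_cycle_ns(clock_hz, wait_states=0):
    """A bus cycle is 4 T states plus any inserted wait states."""
    return (4 + wait_states) * t_state_ns(clock_hz)


def access_time_ns(clock_hz, wait_states=0, address_delay_ns=110, data_setup_ns=30):
    """Memory access time allowed by the μP.

    Assumption: address_delay_ns and data_setup_ns are typical 5 MHz 8086
    values, used here only to reproduce the 460 ns figure.
    """
    window = (3 + wait_states) * t_state_ns(clock_hz)
    return window - address_delay_ns - data_setup_ns


print(t_state_ns(5e6))                     # 200.0
print(bus_cycle_ns(5e6))                   # 800.0
print(access_time_ns(5e6))                 # 460.0
print(access_time_ns(5e6, wait_states=1))  # 660.0
```

Each wait state adds one clock period to the access time, which is how READY accommodates slower memory.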
Memory Interface
• Two types: ROM and RAM, with 4 types of connection
lines
• Address Connection: labeled A0 to An for n+1 lines
– Number of locations = 2^(n+1), e.g. 10 pins means 1K locations
• Data Connections: outputs (Os) or input/output (Ds)
– A byte-wide memory stores 8 bits per memory location.
Memories often referred to as locations times bits per
location
– E.g. 16K x 1 memory has 16K 1-bit locations
• Selection Connections
– Enables memory like chip select (CS), or select (S) pins in
RAM and chip enable (CE) in ROM
• Control Connections
– One or more pins for operation control
ROM has only one called output enable (OE) or
gate (G) which enable or disable tri-state output
buffers;
RAM sometimes has one: (R/W) which enables
read/write, and sometimes two: (WE or W) for
enabling writing and (OE) for enabling reading
ROM
• Programmed during manufacturing; Data is
permanent, so called nonvolatile memory
• Programmable ROM (PROM)
– Programmed in-field by burning nichrome or silicon oxide
fuses
• Erasable Programmable ROM (EPROM)
– Programmed in-field with EPROM programmer; erasable if
exposed to high-intensity ultraviolet light
• Electrically Erasable Programmable ROM (EEPROM)
– Erasable in system but needs more time to erase than normal
RAM; also called read-mostly memory (RMM), flash
memory, electrically alterable ROM, and nonvolatile
RAM (NVRAM)
– Flash memory stores system setup information
Static RAM (SRAM)
• Retains data as long as DC power applied
(volatile)
• Used for cache memory because of fast access
Dynamic RAM (DRAM)
• Retains data on integrated capacitances
• Needs to be refreshed every 2 to 4 ms
• Much larger capacity than SRAM
• Refresh is done by reading and rewriting data
– RAS selects a row for refreshing while DRAM is
operational. Called hidden refresh, transparent refresh, or
cycle stealing

• Extended Data Output (EDO): DRAM with output


latches
– Latch holds next data, thus 15% to 25% faster
– Refresh is done by writing into these latches

• Synchronous DRAM (SDRAM) used with newer systems


– SDRAM reads four 64-bit numbers in one burst
– First number takes 3 to 4 clock cycles, rest only 1
– Faster than both normal DRAM and EDO
Address Decoding
• Usually more than one memory chip is connected to the μP
• Decoding allocates each chip to a part of memory map
• Types of decoders
– NAND gate: expensive because multi-input NAND gates are
required for each memory device
– Decoder chips: more commonly used than NAND gates, like the
3-to-8 line decoder
– PLD: programmable logic device; used today
1-PROM: economical because of large number of inputs
2-PLA: programmable logic arrays; has replaced PROM
because of higher flexibility
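What a 3-to-8 line decoder does to the memory map can be modelled in software. The sketch below assumes a hypothetical layout in which eight 8K-byte devices fill F0000H to FFFFFH: address lines A19-A16 must all be 1 to enable the decoder, and A15-A13 pick one of its eight outputs.

```python
def decoder_3_to_8(address, enable_mask=0xF0000, enable_value=0xF0000, select_shift=13):
    """Return the selected output (0-7), or None when the decoder is disabled.

    Assumption: the default arguments describe a hypothetical map of eight
    8K-byte devices at F0000H-FFFFFH.
    """
    if (address & enable_mask) != enable_value:
        return None                           # upper address bits do not match
    return (address >> select_shift) & 0b111  # A15-A13 choose the device


for address in (0xF0000, 0xF2000, 0xFFFFF, 0xE0000):
    print(f"{address:05X}H -> {decoder_3_to_8(address)}")
```

A NAND-gate decoder corresponds to the enable test alone: it selects a single device when every chosen address bit matches.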
Hamming Codes
• By R. W. Hamming; Commonly used in RAM
• k parity bits added to n data bits
• Bit positions numbered 1 to n + k
• Positions numbered with powers of two are for parity
• The k parity bits are generated as follows
– P1 = XOR of all data bits whose position number has a 1 in its 1st (least significant) bit
– P2 = XOR of all data bits whose position number has a 1 in its 2nd bit
– P4 = XOR of all data bits whose position number has a 1 in its 3rd bit
– P8 = XOR of all data bits whose position number has a 1 in its 4th bit
• When the n + k bits are read, the check bits are evaluated
– C1 = XOR of all bits (data and parity) whose position number has a 1 in its 1st bit
– C2 = XOR of all bits (data and parity) whose position number has a 1 in its 2nd bit
– C4 = XOR of all bits (data and parity) whose position number has a 1 in its 3rd bit
– C8 = XOR of all bits (data and parity) whose position number has a 1 in its 4th bit

• C = C8 C4 C2 C1 = 0000 means no error; otherwise there is an error

• Decimal value of C indicates the error bit position
– Error may be in data or parity bits
– Hamming code detects and corrects an error only in a single bit
– With an additional parity bit, two errors are detected but not
corrected
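The generation and checking rules above can be sketched as follows. The function names and the sample data word are chosen for illustration; data bits are supplied in order of increasing position number (for 4 data bits: positions 3, 5, 6, 7).

```python
def hamming_encode(data_bits):
    """Return the code word as a list of bits for positions 1..n+k."""
    n = len(data_bits)
    k = 0
    while (1 << k) < n + k + 1:       # smallest k with 2**k >= n + k + 1
        k += 1
    total = n + k
    word = [0] * (total + 1)          # index 0 unused so indexes equal positions
    remaining = iter(data_bits)
    for position in range(1, total + 1):
        if position & (position - 1): # not a power of two: data position
            word[position] = next(remaining)
    for j in range(k):
        parity_position = 1 << j
        parity = 0
        for position in range(1, total + 1):
            if position & parity_position and position != parity_position:
                parity ^= word[position]
        word[parity_position] = parity
    return word[1:]


def hamming_check(word):
    """Return C: 0 for no error, else the position of a single-bit error."""
    c = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            c ^= position             # bit j of C is the check bit C(2**j)
    return c


code = hamming_encode([1, 0, 1, 1])   # D3=1, D5=0, D6=1, D7=1
print(code)                           # [0, 1, 1, 0, 0, 1, 1]
print(hamming_check(code))            # 0

corrupted = code.copy()
corrupted[6 - 1] ^= 1                 # flip the bit at position 6
print(hamming_check(corrupted))       # 6
```

Flipping the bit at the reported position restores the original code word, which is the single-error correction step.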
Basic I/O Interface
• Two methods: isolated I/O and memory-
mapped I/O
• Isolated I/O
– I/O locations are isolated from memory system
– Only instructions IN, INS, OUT and OUTS used
• Memory-Mapped I/O
– May use any instruction that references memory
– I/O is treated like a memory location
Interface Units
• Links between CPU and I/O
– Some peripherals are electromechanical devices
and need conversion of signal values
– Data transfer rates of peripherals may differ from
CPU clock and need synchronization mechanism
– Data codes and formats in peripherals may differ
from CPU
– Operation modes of peripherals differ from each
other and they need to be controlled so that they
do not interfere with each other's operation
Asynchronous Data Transfer
• CPU and I/O usually have different clocks
• Strobing: simple method with one control signal
– Transfer may be initiated by the source or destination
– No indication that data ever captured by destination
– No indication that source has put the data on bus
– Speed is as low as that of slowest attached device
• Handshaking: two control signals, one from each
side
– Based on request and acknowledge signals
Impact of I/O on System Performance
• Some applications require high throughput, like Tax
Service
• Some require low response time, like Personal
Computers
• Many require both, like Automatic Teller Machines
– Suppose a benchmark executes in 100 seconds of elapsed
time, where 90 sec is CPU time and the rest is I/O. If CPU
speed improves by 50% per year but I/O performance
doesn’t, how much faster does the program run at the end
of 5 years?

– Answer: about 4.5 times faster
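The answer follows from shrinking only the CPU part of the elapsed time. A minimal sketch of the calculation (function and parameter names are illustrative):

```python
def elapsed_after(years, cpu_seconds=90.0, io_seconds=10.0, yearly_cpu_gain=1.5):
    """Elapsed time when only the CPU part speeds up by 50% per year."""
    return cpu_seconds / yearly_cpu_gain ** years + io_seconds


before = elapsed_after(0)          # 100 seconds
after = elapsed_after(5)           # about 21.9 seconds
print(round(after, 1))             # 21.9
print(round(before / after, 2))    # 4.58
```

The CPU itself becomes about 7.6 times faster, yet the program gains far less because the 10 seconds of I/O do not shrink; rounding the intermediate values gives the slide's figure of about 4.5.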
Magnetic Disks
• Components
– One to 15 platters with two recordable surfaces each
– Stack of platters has diameter of 1 to 8 inches and rotates
at 3600 to 7200 RPM
– Each disk surface divided into 1000 to 5000 concentric
circles called tracks
– Each track divided into 64 to 200 sectors which contain
information
• Access time
– Seek time + rotational latency + transfer time + controller
time
– What is the average rotational latency? 8.3 ms to 4.2 ms
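The average rotational latency is the time for half a rotation, which gives the two figures above. A short sketch (the function name is illustrative):

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency: half of one full rotation, in milliseconds."""
    seconds_per_rotation = 60.0 / rpm
    return 0.5 * seconds_per_rotation * 1000.0


print(round(average_rotational_latency_ms(3600), 1))  # 8.3
print(round(average_rotational_latency_ms(7200), 1))  # 4.2
```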
Serial Communication
• Parallel: all bits sent at once
– Fast, but lots of wires; good for short distance
• Serial: bits sent in sequence one at a time
– Slow, but less expensive
– Modems (modulator-demodulator) allow use of telephone
lines
– Simplex: only one way, like radio and television
broadcasting
– Half-duplex: both directions, but one at a time. Modems at
both ends change roles as transmitter and receiver in a
turnaround time
– Full-duplex: both directions simultaneously
Communication Between I/O and CPU
• CPU to I/O
– Isolated I/O or memory-mapped I/O
• I/O to CPU
– Operating system needs to know when I/O
finished a task
– Operating system should be notified of any errors
in I/O
– Two methods: Polling and Interrupt Driven
– I/O may access memory directly (DMA)
• Polling (Programmed I/O)
– I/O puts information in a status register
– The OS periodically checks the status register
– Busy wait loop is used to implement polling
– Checks for I/O completion is dispersed among
code
– Advantage: simple, CPU controls all the work
– Disadvantage: Polling overhead consumes a lot of
CPU time
• Interrupt Driven (Exception Strategy)
• I/O interrupts CPU to get its attention
• Step 1: CPU receives interrupt signal from I/O
• Step 2: Current PC or IP is saved
• Step 3: CPU gets address of interrupt service
routine
• Step 4: After executing ISR, CPU jumps back
• Advantage: user program is only halted during
actual transfer
• Disadvantage: Special hardware needed to cause
interrupt (I/O), detect interrupt (CPU), and save
proper states to resume after interrupt (CPU)
• Compare I/O Interrupt and Processor
Exceptions
Overhead of Polling in I/O Systems
• Polling is only suitable for low-bandwidth devices
• Polling should be frequent enough not to lose any
data
• Assume a 500 MHz μP and a polling operation that takes
400 clock cycles
– Mouse must be polled 30 times per second
• Fraction of processor clock cycles consumed is 0.002%
• Polling can be used for the mouse without much impact on
performance
– Floppy disk transfers data to processor in 16-bit units
and has data rate of 50 KB/sec
• Fraction of processor clock cycles consumed is 2%
• This is significant but can be tolerated in low-end systems
– Hard disk transfers data to processor in four 32-bit
chunks and has data rate of 4 MB/sec
• Fraction of processor clock cycles consumed is 20%
• This is one-fifth of CPU time and is not acceptable
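The three percentages come from multiplying the required polls per second by the cost of one poll. The sketch below assumes 1 KB = 1000 bytes and 1 MB = 10**6 bytes; the names are illustrative.

```python
CLOCK_HZ = 500e6     # 500 MHz processor
POLL_CYCLES = 400    # clock cycles per polling operation


def polling_fraction(polls_per_second):
    """Fraction of processor clock cycles spent polling."""
    return polls_per_second * POLL_CYCLES / CLOCK_HZ


mouse = polling_fraction(30)            # 30 polls per second
floppy = polling_fraction(50e3 / 2)     # 50 KB/sec in 2-byte (16-bit) units
hard_disk = polling_fraction(4e6 / 16)  # 4 MB/sec in 16-byte (four 32-bit) chunks

print(f"mouse:     {mouse:.4%}")        # 0.0024%
print(f"floppy:    {floppy:.0%}")       # 2%
print(f"hard disk: {hard_disk:.0%}")    # 20%
```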


Overhead of Interrupt-Driven I/O Systems

• Has overhead for CPU only during actual data transfer


– For previous hard disk system, assume each interrupt
overhead is 500 clock cycles and hard disk transfers data
5% of the time

• Average fraction of CPU time consumed is 1.25%
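The 1.25% figure is the overhead during transfers scaled by how often the disk is actually transferring. A sketch under the same assumptions as before (500 MHz clock, 16-byte transfers, 1 MB = 10**6 bytes); the names are illustrative.

```python
def interrupt_overhead(data_rate=4e6, bytes_per_transfer=16,
                       cycles_per_interrupt=500, clock_hz=500e6, active_fraction=0.05):
    """Return (overhead while transferring, average overhead)."""
    interrupts_per_second = data_rate / bytes_per_transfer
    while_transferring = interrupts_per_second * cycles_per_interrupt / clock_hz
    return while_transferring, while_transferring * active_fraction


busy, average = interrupt_overhead()
print(f"{busy:.0%} while transferring, {average:.2%} on average")
# 25% while transferring, 1.25% on average
```

Unlike polling, the processor pays nothing while the disk is idle, which is why the average is so much lower.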


Direct Memory Access (DMA)
• Allows devices to talk to memory directly
• Less overhead for CPU compared to polling and
interrupt
• Suitable for high bandwidth devices like hard disk and
transfer of large chunks of data at a time
• Interrupt still used but on completion of transfer or
error
• DMA done by special controller device to master the
bus
• Many DMA controllers are flexible with respect to
delays
• Three main steps are involved in DMA
– Processor sets up DMA by supplying
1) device identity, 2) operation to perform, 3) Source or
destination memory address, and 4) number of bytes to
transfer
– DMA starts and DMA controller arbitrates for the bus
– DMA transfer completes
DMA controller interrupts CPU, and CPU checks for any
possible errors
• Overhead of DMA I/O Systems
– For previous hard disk systems
assume initial setup for DMA takes 1000 clock cycles,
handling interrupt at DMA completion takes 500
cycles, and average transfer from disk is 8 kB

• Average fraction of CPU time consumed is 0.15%
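The DMA figure can be reproduced by charging the setup and completion cycles once per 8 kB transfer. The sketch assumes the disk transfers continuously (the worst case), a 500 MHz clock, 8 kB = 8000 bytes, and 1 MB = 10**6 bytes; the names are illustrative.

```python
CLOCK_HZ = 500e6          # 500 MHz processor
DISK_RATE = 4e6           # bytes per second (4 MB/sec)
TRANSFER_BYTES = 8e3      # average DMA transfer of 8 kB
SETUP_CYCLES = 1000       # processor cycles to set up the DMA
COMPLETION_CYCLES = 500   # processor cycles to handle the completion interrupt


def dma_fraction():
    """Fraction of processor cycles consumed if the disk transfers continuously."""
    transfer_seconds = TRANSFER_BYTES / DISK_RATE        # 2 ms per transfer
    cycles_per_transfer = SETUP_CYCLES + COMPLETION_CYCLES
    return cycles_per_transfer / (transfer_seconds * CLOCK_HZ)


print(f"{dma_fraction():.2%}")  # 0.15%
```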


Computer Performance
• Performance is major measure of evaluating computers
• Performance is task dependent and defined differently
– Single users are interested in Response Time or Execution Time:
the time between start and completion of a task; improves with
faster processors
– Computer center managers are interested in Throughput:
the total amount of work done in a given time; improves with
faster processors or with multiprocessors

• Performance of a machine: PER = 1 / EXE


– Relative performance of machine A to machine B is given by
PER(A)/PER(B) = EXE(B)/EXE(A)
– Performance improves by reducing the length of clock
cycles (TAU) or number of clock cycles required for
executing a program (CYC) EXE = CYC * TAU
– Execution time depends on the number of instructions
in a program (INS) and the average clock cycles
needed for instructions (CPI)
CYC = INS * CPI
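• Basic Performance equations
– EXE = INS * CPI * TAU = INS * CPI / FRE (FRE = 1 / TAU is the clock frequency)
– Check units:
$\text{Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock Cycle}}$
– CPU clock cycles are measured by looking at the n different
types of instructions (indexed by i), their cycles per
instruction (CPI_i), and their counts (COU_i)
$\text{CYC} = \sum_{i=1}^{n} \text{CPI}_i \times \text{COU}_i$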
• MIPS: Million Instruction Per Second
– Alternative to execution time for evaluating
systems
– $\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^{6}}$
– Faster machine means higher MIPS
– Higher MIPS does not necessarily mean higher or
better performance!
– MIPS is instructions execution rate with no regard
to capabilities
– Cannot use MIPS to compare computers with
different instructions
– MIPS varies for different programs on the same
machine
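The performance equations and the MIPS caveat can be tried out numerically. The two machines below are hypothetical, invented only to show that a higher MIPS rating need not mean a shorter execution time.

```python
def execution_time(ins, cpi, fre):
    """EXE = INS * CPI / FRE, in seconds."""
    return ins * cpi / fre


def mips(ins, exe):
    """MIPS = instruction count / (execution time * 10**6)."""
    return ins / (exe * 1e6)


def total_cycles(mix):
    """CYC = sum of CPI_i * COU_i over (CPI_i, COU_i) pairs."""
    return sum(cpi * count for cpi, count in mix)


# Hypothetical machines running the same task at 100 MHz.
exe_a = execution_time(ins=10e6, cpi=1, fre=100e6)  # many simple instructions
exe_b = execution_time(ins=4e6, cpi=2, fre=100e6)   # fewer, slower instructions

print(f"A: {exe_a:.2f} s, {mips(10e6, exe_a):.0f} MIPS")  # A: 0.10 s, 100 MIPS
print(f"B: {exe_b:.2f} s, {mips(4e6, exe_b):.0f} MIPS")   # B: 0.08 s, 50 MIPS
print(f"PER(B)/PER(A) = {exe_a / exe_b:.2f}")             # 1.25
```

Machine B finishes first although its MIPS rating is half that of machine A, because it needs fewer instructions for the same work.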
Exercises on Organization
• For a Pentium II processor descriptor that contains a base address of
00280000 H, a limit of 00010 H, and G=1, what starting and ending
locations are addressed?

• Code a descriptor that describes a memory segment that begins at
location 03000000 H and ends at location 05FFFFFF H. This is a data
segment that grows upward in the memory system and can be written, for
an 80386 Intel processor.
• If the processor sends linear address 00200000 H to the paging
mechanism, which page directory entry is accessed and which
page table entry is accessed?

• What is wrong with a MOV [BX],[DI] instruction?

• What, if anything, is wrong with MOV AL,[BX][DI] instruction?

• Suppose DS=1100 H, BX=0200 H, LIST=0250 H, and SI=0500 H,


determine the address accessed by each of the following
instructions.
a) MOV LIST[SI],EDX
b) MOV CL,LIST[BX+SI]
c) MOV CH,[BX+SI]
• Explain what happens when the PUSH EAX instruction is
executed. Assume SP=0100 H and SS=0200 H.

• Develop a sequence of instructions that copies 12 bytes
of data from an area of memory addressed by SOURCE
into an area of memory addressed by DEST.

• What is wrong with a MOV CS,AX instruction?


• If AX=1001 H and DX=20FF H, list the sum and the
content of each flag register bit (C, A, S, Z and O) after
the ADD AX, DX instruction executes.

• What is wrong with the INC [BX] instruction?

• Develop a sequence of instructions that sets (to 1) the
rightmost 4 bits of AX, clears (to 0) the leftmost three
bits of AX, and inverts bits 7,8, and 9 of AX.

• Why are buffers required in 8086- and 8088-based
systems?

• What two 8086 operations occur during a bus cycle?


• Briefly describe the purpose of each T state from T1 to
T4.

• Modify the NAND gate decoder in Figure 10-13 to
select the memory for address range DF800 H- DFFFF
H.
• Modify Figure 10-19 by rewriting the PAL
program to address B0000 H- BFFFF H.

• Modify the circuit of Figure 10-20 to select
memory locations 68000 H-6FFFF H.
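Several of the addressing exercises above reduce to simple arithmetic, and a short Python sketch can serve as a check on hand calculations. It assumes real-mode operation with DS as the default segment and 16-bit offsets, a 20-bit descriptor limit scaled by 4 KB when G=1, and 80386-style two-level paging with 4 KB pages. The function names are invented for this example.

```python
def real_mode_address(segment, *offset_parts):
    """Physical address = segment x 10H + 16-bit effective offset."""
    offset = sum(offset_parts) & 0xFFFF    # offset wraps within 64 KB
    return (segment << 4) + offset


def descriptor_range(base, limit, g):
    """Starting and ending address described by base, 20-bit limit, and G bit."""
    if g:                                   # G=1: limit counts 4 KB pages
        last_offset = (limit << 12) | 0xFFF
    else:                                   # G=0: limit counts bytes
        last_offset = limit
    return base, base + last_offset


def paging_indices(linear):
    """Page directory index (bits 31-22) and page table index (bits 21-12)."""
    return linear >> 22, (linear >> 12) & 0x3FF


DS, BX, LIST, SI = 0x1100, 0x0200, 0x0250, 0x0500
print(f"LIST[SI]    -> {real_mode_address(DS, LIST, SI):05X} H")
print(f"LIST[BX+SI] -> {real_mode_address(DS, LIST, BX, SI):05X} H")
print(f"[BX+SI]     -> {real_mode_address(DS, BX, SI):05X} H")

start, end = descriptor_range(0x00280000, 0x00010, g=1)
print(f"descriptor  -> {start:08X} H to {end:08X} H")

directory, table = paging_indices(0x00200000)
print(f"paging      -> directory entry {directory}, page table entry {table:03X} H")
```

The sketch only covers the address arithmetic; the questions about illegal instructions, the stack, and the decoder circuits still need the reasoning from the corresponding slides.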
Exercises on Hamming Code
• Given the 11-bit data word 00100111010, generate the corresponding 15-bit
Hamming Code word.

• A 12-bit Hamming code word containing 8 bits of data and 4 parity bits is read from
memory. What was the original 8-bit data word that was written into memory
if the 12-bit word read out is:
a) 000010101010, b) 111110010110, and c) 100111110100?

• How many parity check bits must be included with the data word to achieve single
error correction and double error detection when the data word contains: a) 16
bits, b) 32 bits, and c) 64 bits?
• It is necessary to formulate the Hamming code for 4 data
bits D3, D5, D6, D7 together with 3 parity bits P1, P2 and
P4.
a) Evaluate the 7-bit composite code word for the data
word 0101

b) Evaluate the 3 check bits C1, C2 and C4, assuming no
error.

c) Assume an error in bit D5 during storage into memory.
Show how the error in the bit is detected and corrected.

d) Add a parity bit P to include double error detection in the
code. Assume that errors occurred in bits P2 and D5. Show
how this double error is detected.
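The 7-bit exercise above can be explored with a minimal Python sketch. It assumes even parity, bit positions 1 to 7 laid out as P1 P2 D3 P4 D5 D6 D7, and that the data word 0101 is read in the order D3 D5 D6 D7; if the course uses odd parity or a different bit ordering, the values change accordingly. All function names are invented for this example.

```python
def encode(d3, d5, d6, d7):
    """Return the 7-bit code word as a list for positions 1..7."""
    p1 = d3 ^ d5 ^ d7        # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7        # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7        # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


def syndrome(word):
    """Check bits C4 C2 C1 read as a number; 0 means no error detected."""
    c1 = word[0] ^ word[2] ^ word[4] ^ word[6]
    c2 = word[1] ^ word[2] ^ word[5] ^ word[6]
    c4 = word[3] ^ word[4] ^ word[5] ^ word[6]
    return c4 * 4 + c2 * 2 + c1


def overall_parity(word, p):
    """XOR of all 7 code bits and the extra bit P; 0 if even parity holds."""
    total = p
    for bit in word:
        total ^= bit
    return total


code = encode(0, 1, 0, 1)                 # data word 0101
p = overall_parity(code, 0)               # choose P so total parity is even

single = code.copy()
single[4] ^= 1                            # error in D5 (position 5)
position = syndrome(single)               # syndrome points at the bad bit
corrected = single.copy()
corrected[position - 1] ^= 1              # flip it back

double = code.copy()
double[1] ^= 1                            # error in P2 (position 2)
double[4] ^= 1                            # error in D5 (position 5)

print("code word:", "".join(map(str, code)), "P =", p)
print("single error syndrome:", position, "corrected:", corrected == code)
print("double error syndrome:", syndrome(double),
      "overall parity check:", overall_parity(double, p))
```

A nonzero syndrome together with a passing overall parity check is the signature of a double error: the code can report it but cannot say which two bits flipped.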
Exercises on Computer Performance
• Suppose we have two implementations of the same instruction set
architecture. Machine A has a clock cycle time of 1 ns and a CPI of
2.0 for some program, and machine B has a clock cycle time of 2 ns
and a CPI of 1.2 for the same program. Which machine is faster for
this program, and by how much?
• Our favorite program runs in 10 seconds on computer A, which has
a 400 MHz clock. We are trying to help a computer designer build a
machine, B, that will run this program in 6 seconds. The designer
has determined that a substantial increase in the clock rate is
possible, but this increase will affect the rest of the CPU design,
causing machine B to require 1.2 times as many clock cycles as
machine A for this program. What clock rate should we tell the
designer to target?
• A compiler designer is trying to decide between two code
sequences for a particular machine. The hardware designers
have supplied the following facts:
Instruction Class    CPI for this Instruction Class
A                    1
B                    2
C                    3

For a particular high-level language statement, the compiler
writer is considering two code sequences that require the
following instruction counts:

Code Sequence    Instruction Counts for Instruction Class
                 A    B    C
1                2    1    2
2                4    1    1

Which code sequence executes the most instructions? Which
will be faster? What is the CPI for each sequence?
• Consider the machine with three instruction classes and CPI
measurements from the previous problem. Now suppose
we measure the code for the same program from two
different compilers and obtain the following data:
Code From    Instruction Counts (in billions) for Instruction Class
Compiler     A     B     C
1            5     1     1
2            10    1     1

• Assume that the machine’s clock rate is 500 MHz. Which
code sequence will execute faster according to MIPS?
According to execution time?
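For the first two exercises in this set, the relation PER(A)/PER(B) = EXE(B)/EXE(A) and the equation EXE = INS * CPI * TAU can be checked with a few lines of Python. This is only a sketch for verifying hand calculations; the variable names are invented here, and the instruction count is set to 1 because it cancels when both machines run the same program.

```python
def exe_time(ins, cpi, tau):
    """EXE = INS * CPI * TAU, in seconds."""
    return ins * cpi * tau


# Exercise 1: same program on machines A and B, so INS is identical.
INS = 1.0                                  # placeholder, cancels in the ratio
exe_a = exe_time(INS, cpi=2.0, tau=1e-9)   # machine A: CPI 2.0, 1 ns cycle
exe_b = exe_time(INS, cpi=1.2, tau=2e-9)   # machine B: CPI 1.2, 2 ns cycle
ratio = exe_b / exe_a                      # PER(A)/PER(B) = EXE(B)/EXE(A)

# Exercise 2: clock rate needed by machine B.
cycles_a = 10 * 400e6                      # EXE(A) * FRE(A) = CYC(A)
cycles_b = 1.2 * cycles_a                  # B needs 1.2 times as many cycles
target_hz = cycles_b / 6                   # FRE(B) = CYC(B) / EXE(B)

print(f"PER(A)/PER(B) = {ratio:.2f}")
print(f"target clock rate for B = {target_hz / 1e6:.0f} MHz")
```

A ratio above 1 means machine A is the faster of the two for this program.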
