Computer Architecture Note ND2
CTE 214
TWO UNITS
1.1 INTRODUCTION TO COMPUTER ARCHITECTURE
Computer Architecture is the science and art of selecting and interconnecting hardware
components to create computers that meet functional, performance and cost goals. Computer
architecture is not about using computers to design buildings. In computer science and computer
engineering, computer architecture or digital computer organization is the conceptual design
and fundamental operational structure of a computer system. It's a functional description of
requirements and design implementations for the various parts of a computer, focusing largely on
how the central processing unit (CPU) operates internally and accesses addresses in memory.
Instruction set architecture, or ISA, is the abstract image of a computing system that is
seen by a machine language (or assembly language) programmer, including the
instruction set, word size, memory address modes, processor registers, and address and
data formats.
System design, which includes all of the other hardware components within a computing
system, such as data paths, memory controllers, and support for input/output and direct
memory access (DMA).
Once both ISA and microarchitecture have been specified, the actual device needs to be designed
into hardware. This design process is called the implementation. Implementation is usually not
considered architectural definition, but rather hardware design engineering.
Implementation can be further broken down into three (not fully distinct) pieces: logic
implementation, circuit implementation, and physical implementation.
1.2 WORD PROCESSING
The use of specialized document manipulation software running on a computer or terminal that
allows a user to create, edit, store and print out text based documents. Most modern companies
that have a need for producing business letters or other types of text documents will have access
to word processing software and a printer. A word processor, or word processing program, does
exactly what the name implies. It processes words. It also processes paragraphs, pages, and entire
papers. Some examples of word processing programs include Microsoft Word, WordPerfect
(Windows only), AppleWorks (Mac only), and OpenOffice.org.
The first word processors were basically computerized typewriters, which did little more than
place characters on a screen, which could then be printed by a printer. Modern word processing
programs, however, include features to customize the style of the text, change the page
formatting, and may be able to add headers, footers, and page numbers to each page. Some may
also include a "Word Count" option, which counts the words and characters within a document.
Microsoft Word is a word processor developed by Microsoft. It was first released in 1983
under the name Multi-Tool Word for Xenix systems. Subsequent versions were later written for
several other platforms including IBM PCs running DOS (1983), the Apple Macintosh (1985),
the AT&T Unix PC (1985), Atari ST (1988), SCO UNIX (1994), OS/2 (1989), and Windows
(1989). Commercial versions of Word are licensed as a standalone product or as a component of
Microsoft Office, Windows RT or the discontinued Microsoft Works Suite. Freeware editions of
Word are Microsoft Word Viewer and Word Web App on SkyDrive, both of which have limited
feature sets.
Other word processors have their own standards as well. OpenOffice Writer, for example, uses
the OpenDocument, or ODF, format. Kingsoft Writer uses a format called WPS. And so on.
Fortunately, these and other programs can save documents in multiple formats, thereby making
them easier to access in, well, other programs. That's why, in Microsoft Word, if you click the
Save as type pull-down in the Save dialog, you'll see a wealth of choices. Below I've identified
some of the more popular ones, and in what circumstances you might use them.
1.2.1 The Different Formats
Commonly used Save As formats include .docx (the default XML-based format in Word 2007
and later), .doc (the older Word 97-2003 binary format, useful when sharing with users of
older versions), .rtf (Rich Text Format, readable by almost any word processor), .txt (plain
text, with all formatting stripped), and .pdf (a fixed-layout format for distribution and
printing).
Word can also save files as Web pages, XML documents, templates, and more. Needless to say,
if you need to learn about those formats, a little Google searching should reveal all.
When preparing a publication, different authors contribute to one document. As many different
MS Word versions exist (Word 2010 / 2007, Word 2003/02, older versions), each with different
possibilities and constraints, problems can arise when files are exchanged across these versions.
Problems can arise when .docx files are worked on in Word 2003 or earlier. In particular figures
and equations may become unusable.
To avoid compatibility issues as far as possible, avoid using the .docx format unless you are
entirely sure that everyone working on the original document has Word 2007 or 2010
(backwards compatibility between these two versions appears to be ensured without further problems).
You can use Microsoft Office Word 2007 to open or save files in other formats. For example,
you can open a Web page and then upgrade it to access the new and enhanced features in Office
Word 2007. For more information on upgrading documents, see Use Microsoft Office Word
2007 to open documents created in previous versions of Word.
You can use Office Word 2007 to open files in any of several formats.
1. Click the Microsoft Office Button, and then click Open.
2. In the Open dialog box, click the type of file that you want to open.
3. Click the file, and then click Open.
You can save Office Word 2007 documents to any of several file formats.
Note You cannot use Microsoft Office Word 2007 to save a document as a JPEG (.jpg) or GIF
(.gif) file, but you can save a file as a PDF (.pdf) file.
1. Click the Microsoft Office Button, and then click Save As.
Note If you point to Save As, the menu that appears does not show a complete list of file
formats. To view all of the possible file formats, you must click Save As to open the Save As
dialog box.
2. In the Save As dialog box, click the arrow to the right of the Save as type list, and then
click the file type that you want.
There are basically two types of digital computer architecture. The first is the Von Neumann
architecture; the Harvard architecture was adopted later for designing digital computers.
VON NEUMANN ARCHITECTURE
It is named after the mathematician and early computer scientist John von Neumann.
The computer has a single storage system (memory) for storing data as well as the program to
be executed.
The processor needs two clock cycles to complete an instruction, so pipelining instructions is
not possible with this architecture.
In the first clock cycle the processor gets the instruction from memory and decodes it. In
the next clock cycle the required data is taken from memory. This cycle repeats for each
instruction, hence the two cycles needed to complete an instruction.
This is a relatively older architecture and was replaced by Harvard architecture.
The processor takes more time to execute because it has to distinguish between data and
instructions, as both are stored in the same memory.
The shared bus between the program memory and data memory leads to the Von Neumann
bottleneck: the limited throughput (data transfer rate) between the CPU and memory
compared to the amount of memory.
Also, there are two types of memory access: first to access data and next for an instruction,
or vice versa.
Because program memory and data memory cannot be accessed at the same time,
throughput is much smaller than the rate at which the CPU can work.
This seriously limits the effective processing speed when the CPU is required to perform
minimal processing on large amounts of data.
The CPU is continually forced to wait for needed data to be transferred to or from
memory.
Since CPU speed and memory size have increased much faster than the throughput
between them, the bottleneck has become more of a problem, a problem whose severity
increases with every newer generation of CPU.
HARVARD ARCHITECTURE
Harvard architecture which is also a stored-program system but has one dedicated set of address
and data buses for reading data from and writing data to memory, and another set of address and
data buses for fetching instructions.
The name originates from the "Harvard Mark I", an early relay-based computer.
The computer has two separate memories for storing data and program.
Processor can complete an instruction in one cycle if appropriate pipelining strategies are
implemented.
In the first stage of the pipeline the instruction to be executed is taken from program
memory. In the second stage of the pipeline data is taken from the data memory using the
decoded instruction or address.
Most modern computing architectures are based on the Harvard architecture, but the
number of stages in the pipeline varies from system to system.
These are the basic differences between the two architectures. A more comprehensive
comparison can be found in documentation for the ARM class of processors.
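The cycle-count contrast between the two architectures can be sketched with an idealized model. This is only an illustration under the simplifying assumptions stated above (two cycles per instruction without a pipeline, a two-stage pipeline with one-cycle stages); real processors are far more complex.

```python
# Idealized cycle counts; function names and the cost model are
# illustrative assumptions, not a description of any real chip.

def von_neumann_cycles(n_instructions: int) -> int:
    """Single shared memory: one cycle to fetch/decode the instruction,
    one cycle to fetch its data."""
    return 2 * n_instructions

def harvard_cycles(n_instructions: int) -> int:
    """Separate instruction and data memories allow a two-stage pipeline:
    after the first instruction fills the pipeline, one instruction
    completes per cycle."""
    return n_instructions + 1 if n_instructions else 0

print(von_neumann_cycles(100))  # 200
print(harvard_cycles(100))      # 101
```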
The Von Neumann architecture
A CPU consists of a set of registers that function as a level of memory above main memory and
cache memory. The central processing unit (CPU) is the brain of any computer. It carries out all
the processing in the computer. The CPU itself consists of three main subsystems.
The first one is Control Unit, the second is Registers, and the third is Arithmetic and Logic Unit
(ALU).
A CPU works in a fetch-execute cycle. On power-on, the CPU fetches the first instruction from a
location specified by the program counter. This instruction is brought into the instruction register
and decoded by the control unit. Based on the instruction, the control unit will either fetch
the operand and carry out arithmetic or logical operations on it, or store the result of such an
operation into a specified memory location. After one instruction is executed, the next instruction
is fetched by the processor and executed. This process goes on until the processor reaches a
halt instruction. A real-life processor has a large number of registers, a sophisticated
microprogram control unit and a sophisticated arithmetic and logic unit, as in popular
processors from Intel such as the Pentium III and Pentium 4.
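The fetch-execute cycle can be sketched as a toy accumulator machine. The instruction set here is invented purely for illustration; it is not any real processor's instruction set.

```python
# Toy accumulator machine illustrating the fetch-execute cycle.
# Hypothetical instructions: ("LOAD", addr), ("ADD", addr), ("STORE", addr), ("HALT",)

def run(program, memory):
    pc, acc = 0, 0                   # program counter and accumulator
    while True:
        instr = program[pc]          # fetch into the "instruction register"
        pc += 1                      # advance the program counter
        op = instr[0]                # decode
        if op == "LOAD":             # execute
            acc = memory[instr[1]]
        elif op == "ADD":
            acc += memory[instr[1]]
        elif op == "STORE":
            memory[instr[1]] = acc
        elif op == "HALT":
            return memory

memory = {0: 7, 1: 5, 2: 0}
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT",)]
print(run(program, memory)[2])   # 12
```

Each loop iteration is one pass through the cycle: fetch, advance the program counter, decode, then execute until a halt instruction is reached.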
The steps in the instruction cycle are performed by a variety of functional components within
the CPU. These components work very closely with the PC's memory and bus systems to
carry out their designated tasks.
The control unit (sometimes called the fetch / decode unit) is responsible for retrieving
individual instructions from their location in memory, then translating them into commands
that the CPU can understand. These commands are commonly referred to as machine-
language instructions, but are sometimes called micro-operations, or UOPs. When the
translation is complete, the control unit sends the UOPs to the execution unit for processing.
The execution unit is responsible for performing the third step of the instruction cycle,
namely, executing, or performing the operation that was specified by the instruction. The
execution unit itself is made up of several functional components as follows:
Registers are temporary holding areas for UOPs and any data that the UOPs require
for processing. Typically, registers will be the same size as the CPU's word size
(the number of bits that the CPU can process at one time).
The arithmetic / logic unit (ALU) contains the electronic circuitry needed to perform
arithmetic and logical operations on data in the registers. Arithmetic operations
include addition, subtraction, multiplication and division. Logical operations consist
of comparing one data item to another in order to determine whether the first data item
is equal to, greater than, or less than the second. The results of such a comparison
may cause different processing to occur.
The floating point unit (FPU) performs arithmetic operations on numbers with
decimals, known as floating-point numbers. This is in contrast to the ALU, which
performs its operations on whole numbers only. (Early CPUs used a separate chip,
called a math co-processor, to perform operations on decimal numbers).
The multimedia execution (MMX) unit performs special operations associated with
graphics, audio or video.
The following diagram shows how the functional components of the CPU work together to
fetch data and instructions that are stored in RAM, decode and execute the instructions, and
store the results back in RAM:
Functional components of a CPU
In this diagram,
1. Information from a software program, running in RAM, is sent along the 64-bit data
bus and enters the CPU via the Bus Interface Unit (BIU). The BIU first makes a copy
of the information and sends it to the L2 cache. It then determines if the information
is data or an instruction. Data is sent down a 64-bit path to a small (e.g., 32KB) data
cache. Instructions are sent down a separate 64-bit path to a similar instruction cache.
The data cache and instruction cache are collectively known as the internal cache,
processor cache or L1 cache.
2. The control unit retrieves instructions from the instruction cache, breaks them down
into smaller micro-operations (UOPs), and moves them to the execution unit, where
they are held in temporary storage areas called registers.
3. The execution unit checks each UOP to see if it needs any data. If the required data
currently resides in the L1 or L2 caches, it is moved into the registers. If not, a fetch
to the slower RAM is required (in which case, the process begins all over again). The
execution unit is able to "buffer" several instructions, so that it can move on to another
instruction while waiting for the data for the previous one.
4. The execution unit now executes the instruction. If calculations are involved, the
execution unit will enlist the help of several other units to perform the calculations,
specifically:
Arithmetic / Logic Unit (ALU) for integer calculations, relational or logical
operations
Floating Point Unit (FPU) for floating point number calculations
MMX Unit for special calculations associated with graphics, audio or video.
5. The execution unit sends the result of the calculation back to the L2 and L1 caches, in
case it is needed again soon by another instruction. The data cache sends the result to
the BIU, which in turn sends it back to RAM.
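The L1 -> L2 -> RAM lookup order described in these steps can be sketched with a simplified model. Dictionaries stand in for the caches here as an assumption for clarity; real caches are organized into fixed-size lines and sets, not arbitrary key-value stores.

```python
# Simplified sketch of the lookup order described above.

def read(address, l1, l2, ram):
    """Return (value, source); on a miss, copy the value into the caches."""
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        l1[address] = l2[address]     # promote to L1 for next time
        return l1[address], "L2"
    value = ram[address]
    l2[address] = value               # the BIU copies fetched data into L2 ...
    l1[address] = value               # ... and into the L1 data cache
    return value, "RAM"

l1, l2, ram = {}, {}, {100: 42}
print(read(100, l1, l2, ram))  # (42, 'RAM')  first access misses both caches
print(read(100, l1, l2, ram))  # (42, 'L1')   now served from L1
```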
1.4.2 REGISTERS
User-Visible Registers
User-visible registers are those registers that can be referenced by the machine language
executed by the CPU. They enable the machine or assembly language programmer to
minimize main memory references by using registers instead. All CPU designs provide a
number of user-visible registers. These registers can be categorized into the following types:
· General Purpose Registers: They can be assigned a variety of functions by the programmer.
General purpose registers can be used orthogonally or non-orthogonally. If any general
purpose register can contain the operand for any opcode, we refer to this as orthogonal usage.
Sometimes the use of a general purpose register is restricted; for example, there may be
dedicated registers that are used for floating point operations. We then refer to this as
non-orthogonal usage.
· Data registers: They can only be used to hold data, and cannot be employed in the calculation
of an operand address.
· Address registers: They may be used either in general purpose addressing modes, or may be
devoted to a particular addressing mode.
- Segment Pointers: In a CPU with segmented addressing, a register holds the address of the
base of a segment. There may be multiple segment registers: for example, one for the
operating system and one for the current process.
-Index Registers: These are used for indexed addressing, and may be auto indexed.
- Stack Pointer: If there is user-visible stack addressing, then typically the stack is in memory
and there is a dedicated register that points to the top of the stack. This allows implicit
addressing, that is push, pop and other stack instructions need not contain an explicit stack
operand.
· Condition codes: These are at least partially visible registers, also referred to as flags. The
bits of the flag register are set according to the result of an operation. There can be a Zero
flag, a Carry flag, etc., which indicate whether the result is zero or whether the result
produced a carry out of the most significant bit, respectively.
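How these flag bits might be derived can be sketched for an 8-bit addition. The function name and the 8-bit width are illustrative assumptions, not any specific CPU's design.

```python
# Sketch: setting Zero and Carry flags after an 8-bit addition.

def add8(a: int, b: int):
    total = a + b
    result = total & 0xFF            # keep only the low 8 bits
    flags = {
        "zero":  result == 0,        # Zero flag: the 8-bit result is zero
        "carry": total > 0xFF,       # Carry flag: carry out of bit 7
    }
    return result, flags

print(add8(200, 100))  # (44, {'zero': False, 'carry': True})
print(add8(128, 128))  # (0, {'zero': True, 'carry': True})
```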
Control and Status Registers
These registers are employed to control the operation of the CPU. They are used by the control
unit to control the operation of the CPU and by privileged operating system programs to control
the execution of programs.
Four registers are essential to instruction execution and are used for the movement of data
between the CPU and memory:
· Program Counter (PC): It contains the address of the next instruction to be fetched.
· Instruction Register (IR): It contains the instruction most recently fetched.
· Memory Address Register (MAR): It contains the address of a location in memory.
· Memory Buffer Register (MBR): It contains a word of data to be written to memory or the
word most recently read.
Within the CPU, data must be presented to the ALU for processing. The ALU may have direct
access to the MBR and user-visible registers. Alternatively, there may be additional buffering
registers at the boundary to the ALU; and these registers serve as input and output registers for
the ALU and exchange data with MBR and user-visible registers. This depends on the design of
the CPU and ALU.
The program status word (PSW) is a register or set of registers that contains status
information. The PSW typically contains condition codes plus other status information. Some
of these may be user visible. Common flags include:
· Sign: contains the sign bit of the result of the last arithmetic operation.
· Carry: set if an operation resulted in a carry (addition) into or borrow (subtraction) out of the
high-order bit.
· Supervisor: Indicates whether the CPU is executing in supervisor mode or user mode. Certain
privileged instructions can be executed only in supervisor mode (e.g. the halt instruction), and certain
areas of memory can be accessed only in supervisor mode.
As discussed earlier, the CPU, which is the heart of a computer, consists of Registers, Control Unit
and Arithmetic Logic Unit. The interconnection of these units is achieved through the system bus
as shown in figure 4.1.
1. Fetch instructions: The CPU must read instructions from the memory.
2. Interpret instructions: The instruction must be decoded to determine what action is required.
3. Fetch data: The execution of an instruction may require reading data from memory or an I/O
module.
4. Process data: The execution of an instruction may require performing some arithmetic or
logical operations on data.
5. Write data: The results of an execution may require writing data to the memory or an I/O
module.
1.5 HARDWARE/SOFTWARE TRADEOFFS
The Architecture Tradeoff Analysis Method (ATAM) is a risk-mitigation process used early in
the software development life cycle to evaluate how well a software architecture satisfies
particular quality goals, and to identify the tradeoffs made among those goals.
ATAM benefits
The following are some of the benefits of the ATAM process: identified risks early in the life
cycle, increased communication among the stakeholders, a clarified set of quality attribute
requirements, improved architecture documentation, and a documented basis for
architectural decisions.
ATAM process
The ATAM process consists of gathering stakeholders together to analyze business drivers
(system functionality, goals, constraints, desired non-functional properties) and from these
drivers extract quality attributes that are used to create scenarios. These scenarios are then used
in conjunction with architectural approaches and architectural decisions to create an analysis of
trade-offs, sensitivity points, and risks (or non-risks). This analysis can be converted to risk
themes and their impacts whereupon the process can be repeated. With every analysis cycle, the
analysis process proceeds from the more general to the more specific, examining the questions
that have been discovered in the previous cycle, until such time as the architecture has been fine-
tuned and the risk themes have been addressed.
1. Present ATAM – Present the concept of ATAM to the stakeholders, and answer any
questions about the process.
2. Present business drivers – everyone in the process presents and evaluates the business
drivers for the system in question.
3. Present the architecture – the architect presents the high-level architecture to the team,
with an 'appropriate level of detail'.
4. Identify architectural approaches – different architectural approaches to the system are
presented by the team, and discussed.
5. Generate quality attribute utility tree – define the core business and technical
requirements of the system, and map them to an appropriate architectural property.
Present a scenario for this given requirement.
6. Analyze architectural approaches – Analyze each scenario, rating them by priority. The
architecture is then evaluated against each scenario.
7. Brainstorm and prioritize scenarios – among the larger stakeholder group, present the
current scenarios, and expand.
8. Analyze architectural approaches – Perform step 6 again with the added knowledge of the
larger stakeholder community.
9. Present results – provide all documentation to the stakeholders.
These steps are separated into two phases: Phase 1 consists of steps 1-6; after this phase, the
state and context of the project, the driving architectural requirements and the state of the
architectural documentation are known. Phase 2 consists of steps 7-9 and finishes the evaluation.
2.1 Microcomputer Address Bus, Data Bus and Control Bus
It is a group of wires or lines that are used to transfer the addresses of memory or I/O devices. It
is unidirectional. In the Intel 8085 microprocessor, the address bus is 16 bits wide. This means
that the 8085 can transfer a 16-bit address, and so can address 65,536 different memory
locations. The lower half of this bus is multiplexed with the 8-bit data bus: the most significant
byte of the address goes out on the dedicated address lines (A15-A8), and the least significant
byte goes out on the multiplexed address/data lines (AD7-AD0). The address bus consists of
all the signals necessary to define any of the
possible memory address locations within the computer, or for modular memories any of the
possible memory address locations within a module. An address is defined as a label, symbol, or
other set of characters used to designate a location or register where information is stored. Before
data or instructions can be written into or read from memory by the CPU or I/O sections, an
address must be transmitted to memory over the address bus.
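The split of a 16-bit 8085 address across the dedicated high-order lines and the multiplexed low-order lines can be sketched as follows; the helper name is hypothetical.

```python
# Sketch: splitting a 16-bit address across the 8085's bus lines.

def split_address(addr: int):
    assert 0 <= addr <= 0xFFFF       # a 16-bit address -> 65,536 locations
    high = (addr >> 8) & 0xFF        # driven on the dedicated lines A15-A8
    low = addr & 0xFF                # driven on the multiplexed lines AD7-AD0
    return high, low

print(2 ** 16)                 # 65536 addressable locations
print(split_address(0x20F0))   # (32, 240)
```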
As the name suggests, the data bus is used to transfer data between the microprocessor and
memory or input/output devices. It is bidirectional, as the microprocessor needs to both send
and receive data. The data bus also works as an address bus when multiplexed with the
lower-order address bus. In the 8085 the data bus is 8 bits wide. The word length of a
processor depends on its data bus; that is why the Intel 8085 is called an 8-bit
microprocessor. The bidirectional data bus, sometimes called the memory bus, handles the
transfer of all data and instructions between functional areas of the computer, though it can
only transmit in one direction at a time. The data bus is used to transfer instructions from
memory to the CPU for execution. It carries data (operands) to and from the CPU and
memory as required by instruction translation. The data bus is also used to transfer data
between memory and the I/O section during input/output operations. The information on the
data bus is either written into memory or read from memory by the CPU.
The control bus is used by the CPU to direct and monitor the actions of the other functional
areas of the computer. It is used to transmit a variety of individual signals (read, write,
interrupt, acknowledge, and so forth) necessary to control and coordinate the operations of
the computer. The individual signals transmitted over the control bus and their functions are
covered in the appropriate functional area description.
Memory management is the act of managing computer memory. In its simpler forms, this
involves providing ways to allocate portions of memory to programs at their request, and freeing
it for reuse when no longer needed. The management of main memory is critical to the computer
system. The memory management function keeps track of the status of each memory location,
either allocated or free. It determines how memory is allocated among competing processes,
deciding who gets memory, when they receive it, and how much they are allowed. When
memory is allocated it determines which memory locations will be assigned. It tracks when
memory is freed or unallocated and updates the status.
Virtual memory systems separate the memory addresses used by a process from actual physical
addresses, allowing separation of processes and increasing the effectively available amount of
RAM using disk swapping. The quality of the virtual memory manager can have a big impact on
overall system performance.
Garbage collection is the automatic reclamation (deallocation) of computer memory resources
that a program no longer uses. It is generally implemented at the programming language level
and is in opposition to manual memory management, the explicit allocation and deallocation of
computer memory resources. Region-based memory management is an efficient variant of
explicit memory management that can deallocate large groups of objects simultaneously.
Single allocation is the simplest memory management technique. All the computer's memory,
usually with the exception of a small portion reserved for the operating system, is available to the
single application. MS-DOS is an example of a system which allocates memory in this way. An
embedded system running a single application might also use this technique. A system using
single contiguous allocation may still multitask by swapping the contents of memory to switch
among users. Early versions of the MUSIC operating system used this technique.
Partitioned allocation divides primary memory into multiple memory partitions, usually
contiguous areas of memory. Each partition might contain all the information for a specific job
or task. Memory management consists of allocating a partition to a job when it starts and
unallocating it when the job ends. Partitioned allocation usually requires some hardware support
to prevent the jobs from interfering with one another or with the operating system. The IBM
System/360 used a lock-and-key technique. Other systems used base and bounds registers which
contained the limits of the partition and flagged invalid accesses. The UNIVAC 1108 Storage
Limits Register had separate base/bound sets for instructions and data. The system took
advantage of memory interleaving to place what were called the i bank and d bank in separate
memory modules. Partitions may be either static, that is defined at Initial Program Load (IPL) or
boot time or by the computer operator, or dynamic, that is automatically created for a specific
job. IBM System/360 Operating System Multiprogramming with a Fixed Number of Tasks
(MFT) is an example of static partitioning, and Multiprogramming with a Variable Number of
Tasks (MVT) is an example of dynamic partitioning. MVT and successors use the term region to
distinguish dynamic partitions from static ones in other systems.
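The base-and-bounds protection described above can be sketched as follows. The function name is illustrative, and real hardware performs this check in the memory management circuitry, not in software.

```python
# Sketch of a base-and-bounds check used to protect partitions.

def translate(logical_addr: int, base: int, bound: int) -> int:
    """Map a job-relative address to a physical one, flagging invalid accesses."""
    if not 0 <= logical_addr < bound:
        raise MemoryError(
            f"access {logical_addr} is outside a partition of size {bound}")
    return base + logical_addr       # relocate relative to the partition base

print(translate(100, base=4096, bound=1024))   # 4196
```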
Paged allocation divides the computer's primary memory into fixed-size units called page
frames, and the program's address space into pages of the same size. The hardware memory
management unit maps pages to frames. The physical memory can be allocated on a page basis
while the address space appears contiguous. Usually, with paged memory management, each job
runs in its own address space; however, IBM OS/VS2 (SVS) ran all jobs in a single 16 MiB virtual
address space. Paged memory can be demand-paged when the system can move pages as required
between primary and secondary memory.
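The page-to-frame mapping can be sketched with a simple page table. The page size and the table contents are illustrative assumptions; a real MMU does this translation in hardware on every access.

```python
# Sketch: translating a virtual address through a page table.

PAGE_SIZE = 4096

def translate(virtual_addr: int, page_table: dict) -> int:
    page = virtual_addr // PAGE_SIZE       # which page of the address space
    offset = virtual_addr % PAGE_SIZE      # position within that page
    frame = page_table[page]               # the MMU's page-table lookup
    return frame * PAGE_SIZE + offset

# Pages 0 and 1 look contiguous to the program but live in frames 7 and 3.
page_table = {0: 7, 1: 3}
print(translate(100, page_table))    # 28772  (7*4096 + 100)
print(translate(5000, page_table))   # 13192  (3*4096 + 904)
```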
Segmented memory is the only memory management technique that does not provide the user's
program with a linear and contiguous address space. Segments are areas of memory that usually
correspond to a logical grouping of information such as a code procedure or a data array.
Segments require hardware support in the form of a segment table, which usually contains the
physical address of the segment in memory, its size, and other data such as access protection bits
and status (swapped in, swapped out, etc.).
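Segment-table translation with a size check and a protection bit can be sketched as follows. The table contents and the single "writable" bit are simplifying assumptions; real segment tables carry more status fields.

```python
# Sketch of segment-table translation with bounds and protection checks.

segments = {
    # segment number: (physical base, size, writable)
    0: (0x1000, 0x400, False),   # a code segment (read-only)
    1: (0x8000, 0x200, True),    # a data segment
}

def translate(seg: int, offset: int, write: bool = False) -> int:
    base, size, writable = segments[seg]
    if offset >= size:
        raise MemoryError("offset past the end of the segment")
    if write and not writable:
        raise PermissionError("segment is read-only")
    return base + offset

print(hex(translate(1, 0x10, write=True)))  # 0x8010
```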
The primary role of the memory management system is to satisfy requests for memory
allocation. Sometimes this is implicit, as when a new process is created. At other times,
processes explicitly request memory. Either way, the system must locate enough
unallocated memory and assign it to the process.
1. First Fit
The first of these is called first fit. The basic idea with first fit allocation is that we
begin searching the list and take the first block whose size is greater than or equal to the
request size, as illustrated in Example 9.3. If we reach the end of the list without finding
a suitable block, then the request fails. Because the list is often kept sorted in order of
address, a first fit policy tends to cause allocations to be clustered toward the low memory
addresses. The net effect is that the low memory area tends to get fragmented, while the
upper memory area tends to have larger free blocks.
2. Next Fit
If we want to spread the allocations out more evenly across the memory space, we often
use a policy called next fit. This scheme is very similar to the first fit approach, except
for the place where the search starts. In next fit, we begin the search with the free block
that was next on the list after the last allocation. During the search, we treat the list as
a circular one. If we come back to the place where we started without finding a suitable
block, then the search fails.
3. Best Fit
In many ways, the most natural approach is to allocate the free block that is closest in
size to the request. This technique is called best fit. In best fit, we search the list for the
block that is smallest but greater than or equal to the request size. This is illustrated in
Example 9.5. Like first fit, best fit tends to create significant external fragmentation, but
keeps large blocks available for potential large allocation requests.
4. Worst Fit
If best fit allocates the smallest block that satisfies the request, then worst fit allocates
the largest block for every request. Although the name would suggest that we would
never use the worst fit policy, it does have one advantage: If most of the requests are of
similar size, a worst fit policy tends to minimize external fragmentation.
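The four placement policies can be sketched over a free list of (address, size) pairs. This is a minimal model under stated assumptions: the list is kept sorted by address, and a real allocator would also split the chosen block and coalesce freed neighbors.

```python
# Sketch of the four placement policies over a free list of (address, size) blocks.

def first_fit(free, size):
    """First block at least as large as the request, searching from low addresses."""
    return next((a for a, s in free if s >= size), None)

def best_fit(free, size):
    """Smallest block that is still >= the request size."""
    fits = [(s, a) for a, s in free if s >= size]
    return min(fits)[1] if fits else None

def worst_fit(free, size):
    """Largest available block, regardless of the request size."""
    fits = [(s, a) for a, s in free if s >= size]
    return max(fits)[1] if fits else None

def next_fit(free, size, start):
    """Like first fit, but resume from index `start` and treat the list as circular."""
    n = len(free)
    for i in range(n):
        a, s = free[(start + i) % n]
        if s >= size:
            return a
    return None

free = [(0, 100), (200, 50), (400, 300)]    # sorted by address
print(first_fit(free, 60))      # 0   (first block >= 60)
print(best_fit(free, 60))       # 0   (100 is the closest fit >= 60)
print(worst_fit(free, 60))      # 400 (the 300-byte block)
print(next_fit(free, 60, 1))    # 400 (search resumed after the last allocation)
```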
6. Fragmentation
When allocating memory, we can end up with some wasted space. This happens in two
ways. First, if we allocate memory in such a way that we actually allocate more than
is requested, some of the allocated block will go unused. This type of waste is called
internal fragmentation. The other type of waste is unused memory outside of any
allocated unit. This can happen if there are available free blocks that are too small to
satisfy any request. Wasted memory that lies outside allocation units is called external
fragmentation.
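The two kinds of waste can be sketched numerically. The 64-byte allocation unit and the function names are illustrative assumptions.

```python
# Sketch: measuring internal and external fragmentation.
# Assume each request is rounded up to a 64-byte allocation unit.

UNIT = 64

def internal_fragmentation(requests):
    """Bytes allocated beyond what was actually asked for."""
    waste = 0
    for r in requests:
        allocated = -(-r // UNIT) * UNIT    # round up to the unit size
        waste += allocated - r
    return waste

def external_fragmentation(free_blocks, min_request):
    """Free bytes trapped in blocks too small to satisfy any request."""
    return sum(s for s in free_blocks if s < min_request)

print(internal_fragmentation([100, 65]))          # 91  (28 + 63)
print(external_fragmentation([16, 48, 500], 64))  # 64  (16 + 48)
```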
7. Partitioning
The simplest methods of allocating memory are based on dividing memory into areas with
fixed partitions. Typically, we administratively define fixed partitions between blocks of
varying size. These partitions are in effect from the time the system starts to the time it
is shut down. Memory requests are all satisfied from the fixed set of defined partitions.
Before we can allocate memory, we must locate the free memory. Naturally, we want to
represent the free memory blocks in a way that makes the search efficient.
Before getting into the details, however, we should ask whether we are talking about
locating free memory in the physical memory space or the virtual memory space. Throughout
this chapter, we look at memory management techniques primarily from the perspective
of the operating system managing the physical memory resource. Consequently, these
techniques can all be viewed as operating in the physical memory space. However, many
of these techniques can also be used to manage virtual memory space. Application-level
dynamic memory allocation, using familiar operations such as the malloc() call in C or the
new operator in C++, often allocates large blocks from the OS and then subdivides them
into smaller allocations. It may well use some of these same techniques to manage
its own usage of memory. The operating system, however, does not concern itself with
that use of these techniques.
Free Bitmaps
If we are operating in an environment with fixed-sized pages, then the search becomes
easy. We don’t care which page, because they’re all the same size. It’s quite common in
this case to simply store one bit per page frame, which is set to one if the page frame is
free, and zero if it is allocated. With this representation, we can mark a page as either
free or allocated in constant time by just indexing into this free bitmap. Finding a free
page is simply a matter of locating the first nonzero bit in the map. To make this search
easier, we often keep track of the first available page. When we allocate it, we search from
that point on to find the next available one.
The memory overhead for a free bitmap representation is quite small. For example,
if we have pages that are 4096 bytes each, the bitmap uses 1 bit for each 32,768 bits of
memory, a 0.003% overhead.
Generally, when we allocate in an environment that uses paging address translation,
we don’t care which page frame we give a process, and the process never needs to have
control over the physical relationship among pages. However, there are exceptions. One
exception is a case where we do allocate memory in fixed sized units, but where there is
no address translation. Another is where not all page frames are created equally. In both
cases, we might need to request a number of physically contiguous page frames. When we
allocate multiple contiguous page frames, we look not for the first available page, but for
a run of available pages at least as large as the allocation request.
Free Lists
We can also represent the set of free memory blocks by keeping them in a linked list.
When dealing with fixed-sized pages, allocation is again quite easy. We just grab the first
page off the list. When pages are returned to the free set, we simply add them to the list.
Both of these are constant time operations.
If we are allocating memory in variable-sized units, then we need to search the list to
find a suitable block. In general, this process can take an amount of time proportional
to the number of free memory blocks. Depending on whether we choose to keep the list
sorted, adding a new memory block to the free list can also take O(n) time (proportional
to the number of free blocks). To speed the search for particular sized blocks, we often
use more complex data structures. Standard data structures such as binary search trees
and hash tables are among the more commonly used ones.
Using the usual linked list representation, we have a structure that contains the starting
address, the size, and a pointer to the next element in the list. In a typical 32-bit system,
this structure takes 12 bytes. So if the average size of a block is 4096 bytes, the free list
would take about 0.3% of the available free space. However, there’s a classic trick we can
play to reduce this overhead to nothing except for a pointer to the head of the list. This
trick is based on finding some other way to keep track of the starting address, the size,
and the pointers that define the list structure. Because each element of the list represents
free space, we can store the size and pointer to the next one in the free block itself. (The
starting address is implicit.) This technique is illustrated in Example 9.1.
Relocation
In systems with virtual memory, programs in memory must be able to reside in different parts of
the memory at different times. This is because when the program is swapped back into memory
after being swapped out for a while it cannot always be placed in the same location. The virtual
memory management unit must also deal with concurrency. Memory management in the
operating system should therefore be able to relocate programs in memory and handle memory
references and addresses in the code of the program so that they always point to the right
location in memory.
Protection
Processes should not be able to reference the memory for another process without permission.
This is called memory protection, and prevents malicious or malfunctioning code in one program
from interfering with the operation of other running programs.
Sharing
Even though the memory for different processes is normally protected from each other, different
processes sometimes need to be able to share information and therefore access the same part of
memory. Shared memory is one of the fastest techniques for Inter-process communication.
Logical organization
Programs are often organized in modules. Some of these modules could be shared between
different programs, some are read only and some contain data that can be modified. The memory
management is responsible for handling this logical organization that is different from the
physical linear address space. One way to arrange this organization is segmentation.
Cache memory, also called Cache, a supplementary memory system that temporarily stores
frequently used instructions and data for quicker processing by the central processor of a
computer. The cache augments, and is an extension of, a computer’s main memory. Both main
memory and cache are internal, random-access memories (RAMs) that use semiconductor-based
transistor circuits. Cache holds a copy of only the most frequently used information or program
codes stored in the main memory; the smaller capacity of the cache reduces the time required to
locate data within it and provide it to the computer for processing. When a computer’s central
processor accesses its internal memory, it first checks to see if the information it needs is stored
in the cache. If it is, the cache returns the data to the processor. If the information is not in the
cache, the processor retrieves it from the main memory. Disk cache memory operates similarly,
but the cache is used to hold data that has been recently written on, or retrieved from, a magnetic
disk or other external storage device.
Cache memory is random access memory (RAM) that a computer microprocessor can access
more quickly than it can access regular RAM. As the microprocessor processes data, it looks first
in the cache memory and if it finds the data there (from a previous reading of data), it does not
have to do the more time-consuming reading of data from larger memory. Cache memory is
sometimes described in levels of closeness and accessibility to the microprocessor. An L1 cache
is on the same chip as the microprocessor. (For example, the PowerPC 601 processor has a 32
kilobyte level-1 cache built into its chip.) L2 is usually a separate static RAM (SRAM) chip. The
main RAM is usually a dynamic RAM (DRAM) chip.
Data is transferred between memory and cache in blocks of fixed size, called cache lines. When a
cache line is copied from memory into the cache, a cache entry is created. The cache entry will
include the copied data as well as the requested memory location (identified by a tag).
When the processor needs to read or write a location in main memory, it first checks for a
corresponding entry in the cache. The cache checks for the contents of the requested memory
location in any cache lines that might contain that address. If the processor finds that the memory
location is in the cache, a cache hit has occurred. However, if the processor does not find the
memory location in the cache, a cache miss has occurred. In the case of:
a cache hit, the processor immediately reads or writes the data in the cache line
a cache miss, the cache allocates a new entry, and copies in data from main memory;
then, the request is fulfilled from the contents of the cache.
Cache performance
The proportion of accesses that result in a cache hit is known as the hit rate, and can be a
measure of the effectiveness of the cache for a given program or algorithm.
When a computer architecture is designed, the choice of a word size is of substantial importance.
There are design considerations which encourage particular bit-group sizes for particular uses
(e.g. for addresses), and these considerations point to different sizes for different uses. However,
considerations of economy in design strongly push for one size, or a very few sizes related by
multiples or fractions (submultiples) to a primary size. That preferred size becomes the word size
of the architecture. Early machine designs included some that used what is often termed a
variable word length. In this type of organization, a numeric operand had no fixed length but
rather its end was detected when a character with a special marking was encountered.
In computer architecture, 32-bit integers, memory addresses, or other data units are those that are
at most 32 bits (4 octets) wide. Also, 32-bit CPU and ALU architectures are those that are based
on registers, address buses, or data buses of that size. 32-bit is also a term given to the generation
of computers in which 32-bit processors were the norm.
While 64-bit architectures make working with large data sets in applications such as
digital video, scientific computing, and large databases easier, there has
been considerable debate as to whether they or their 32-bit compatibility modes will be faster
than comparably priced 32-bit systems for other tasks. In the x86-64 architecture
(AMD64), the majority of the 32-bit operating systems and applications are able to run smoothly
on the 64-bit hardware.
3.1.4 Speed
Speed is not the only factor to consider in a comparison of 32-bit and 64-bit processors.
Applications such as multi-tasking, stress testing, and clustering—for HPC
(high-performance computing)—may be more suited to a 64-bit architecture when deployed
appropriately. 64-bit clusters have been widely deployed in large organizations such as IBM, HP
and Microsoft.
Instruction pipelining is a method for increasing the throughput of a digital circuit, particularly a
CPU, and implements a form of instruction level parallelism. The idea is to divide the logic into
stages, and to work on different data within each stage.
Pipelining is most suited for tasks in which essentially the same sequence of steps must be
repeated many times for different data. This is true, for example, in many numerical problems
which systematically process data from arrays. Arithmetic pipelining is used in some specialized
computers discussed elsewhere. One action common to all computers, however, is the systematic
fetch and execute of instructions. This process can be effectively pipelined, and this instruction
pipelining is the subject to be considered in this chapter.
The first step in applying pipelining techniques to instruction processing is to divide the task into
steps that may be performed with independent hardware. The most obvious division is between the
FETCH cycle (fetch and interpret instructions) and the EXECUTE cycle (access operands and
perform operation). If these two activities are to run simultaneously, they must use independent
registers and processing circuits, including independent access to memory (separate MAR and
MBR).
It is possible to further divide FETCH into fetching and interpreting, but since interpreting is
very fast this is not generally done. To gain the benefits of pipelining it is desirable that each
stage take a comparable amount of time.
A more practical division would split the EXECUTE cycle into three parts: Fetch operands,
perform operation, and store results. A typical pipeline might then have four stages through
which instructions pass, and each stage could be processing a different instruction at the same
time. The result of each stage is passed on to the next stage.
Several difficulties prevent instruction pipelining from being as simple as the above description
suggests. The principal problems are:
Timing variations:
Not all stages take the same amount of time. This means that the speed gain of a pipeline will be
determined by its slowest stage. This problem is particularly acute in instruction processing,
since different instructions have different operand requirements and sometimes vastly different
processing time. Moreover, synchronization mechanisms are required to ensure that data is
passed from stage to stage only when both stages are ready.
Data Hazards
When several instructions are in partial execution, a problem arises if they reference the same
data. We must ensure that a later instruction does not attempt to access data sooner than a
preceding instruction, if this will lead to incorrect results. For example, instruction N+1 must not
be permitted to fetch an operand that is yet to be stored into by instruction N.
Branching
In order to fetch the "next" instruction, we must know which one is required. If the present
instruction is a conditional branch, the next instruction may not be known until the current one is
processed.
Interrupts
Interrupts insert unplanned "extra" instructions into the instruction stream. The interrupt must
take effect between instructions, that is, when one instruction has completed and the next has not
yet begun. With pipelining, the next instruction has usually begun before the current one has
completed. All of these problems must be solved in the context of our need for high speed
performance. If we cannot achieve sufficient speed gain, pipelining may not be worth the cost.
Timing Variations
To maximize the speed gain, stages must first be chosen to be as uniform as possible in timing
requirements. However, a timing mechanism is needed. A synchronous method could be used, in
which a stage is assumed to be complete in a definite number of clock cycles. However,
asynchronous techniques are generally more efficient. A flag bit or signal line is passed forward
to the next stage indicating when valid data is available. A signal must also be passed back from
the next stage when the data has been accepted. In all cases there must be a buffer register
between stages to hold the data; sometimes this buffer is expanded to a memory which can hold
several data items. Each stage must take care not to accept input data until it is valid, and not to
produce output data until there is room in its output buffer.
Data Hazards
To guard against data hazards it is necessary for each stage to be aware of the operands in use by
stages further down the pipeline. The type of use must also be known, since two successive reads
do not conflict and should not be cause to slow the pipeline. Only when writing is involved is
there a possible conflict.
The pipeline is typically equipped with a small associative check memory which can store the
address and operation type (read or write) for each instruction currently in the pipe. The concept
of "address" must be extended to identify registers as well. Each instruction can affect only a
small number of operands, but indirect effects of addressing must not be neglected. As each
instruction prepares to enter the pipe, its operand addresses are compared with those already
stored. If there is a conflict, the instruction (and usually those behind it) must wait. When there is
no conflict, the instruction enters the pipe and its operand addresses are stored in the check
memory. When the instruction completes, these addresses are removed. The memory must be
associative to handle the high-speed lookups required.
Branching
The problem in branching is that the pipeline may be slowed down by a branch instruction
because we do not know which branch to follow. In the absence of any special help in this area,
it would be necessary to delay processing of further instructions until the branch destination is
resolved. Since branches are extremely frequent, this delay would be unacceptable.
One solution which is widely used, especially in RISC architectures, is deferred branching. In
this method, the instruction set is designed so that after a conditional branch instruction, the next
instruction in sequence is always executed, and then the branch is taken. Thus every branch must
be followed by one instruction which logically precedes it and is to be executed in all cases. This
gives the pipeline some breathing room. If necessary this instruction can be a no-op, but frequent
use of no-ops would destroy the speed benefit. Use of this technique requires a coding method
which is confusing for programmers but not too difficult for compiler code generators.
Interrupts
The fastest but most costly solution to the interrupt problem would be to include as part of the
saved "hardware state" of the CPU the complete contents of the pipeline, so that all instructions
may be restored to their original state in the pipeline. This strategy is too expensive in other ways
and is not practical. The simplest solution is to wait until all instructions in the pipeline complete,
that is, flush the pipeline from the starting point, before admitting the interrupt sequence. If
interrupts are frequent, this would greatly slow down the pipeline; moreover, critical interrupts
would be delayed.
Superpipelining refers to dividing the pipeline into more steps. The more pipe stages there are,
the faster the pipeline is because each stage is then shorter. Ideally, a pipeline with five stages
should be five times faster than a non-pipelined processor (or rather, a pipeline with one stage).
The instructions are executed at the speed at which each stage is completed, and each stage takes
one fifth of the amount of time that the non-pipelined instruction takes. Thus, a processor with an
8-step pipeline (the MIPS R4000) will be even faster than its 5-step counterpart. The MIPS
R4000 chops its pipeline into more pieces by dividing some steps into two. Instruction fetching,
for example, is now done in two stages rather than one. The eight stages of the R4000 pipeline
are: instruction fetch (first half), instruction fetch (second half), register fetch, execution, data
fetch (first half), data fetch (second half), tag check, and write back.
Dynamic pipelines have the capability to schedule around stalls. A dynamic pipeline is divided
into three units: the instruction fetch and decode unit, five to ten execute or functional units, and
a commit unit. Each execute unit has reservation stations, which act as buffers and hold the
operands and operations.
Diagram of a Pipelining process
While the functional units have the freedom to execute out of order, the instruction fetch/decode
and commit units must operate in-order to maintain simple pipeline behavior. When the
instruction is executed and the result is calculated, the commit unit decides when it is safe to
store the result. If a stall occurs, the processor can schedule other instructions to be executed
until the stall is resolved. This, coupled with the efficiency of multiple units executing
instructions simultaneously, makes a dynamic pipeline an attractive alternative.
Reduced Instruction Set Computing (RISC) is a microprocessor CPU design philosophy that
favors a smaller and simpler set of instructions that all take about the same amount of time to
execute. A RISC processor pipeline operates in much the same way, although the stages in the
pipeline are different. While different processors have different numbers of steps, they are
basically variations of these five, used in the MIPS R3000 processor: instruction fetch,
instruction decode and register fetch, execute, memory access, and register write back.
If you glance back at the diagram of the laundry pipeline, you'll notice that although the washer
finishes in half an hour, the dryer takes an extra ten minutes, and thus the wet clothes must wait
ten minutes for the dryer to free up. Thus, the length of the pipeline is dependent on the length of
the longest step. Because RISC instructions are simpler than those used in pre-RISC processors
(now called CISC, or Complex Instruction Set Computer), they are more conducive to
pipelining. While CISC instructions varied in length, RISC instructions are all the same length
and can be fetched in a single operation. Ideally, each of the stages in a RISC processor pipeline
should take 1 clock cycle so that the processor finishes an instruction each clock cycle and
averages one cycle per instruction (CPI).
The term Harvard architecture originally referred to computer architectures that used physically
separate storage and signal pathways for their instructions and data (in contrast to the von
Neumann architecture). In a computer with a von Neumann architecture, the CPU can be either
reading an instruction or reading/writing data from/to the memory. Both cannot occur at the
same time since the instructions and data use the same signal pathways and memory. In a
computer with Harvard architecture, the CPU can read both an instruction and data from memory
at the same time. A computer with Harvard architecture can be faster because it is able to fetch
the next instruction at the same time it completes the current instruction. Speed is gained at the
expense of more complex electrical circuitry.
Instructions are operations performed by the CPU. Operands are entities operated upon by the
instruction. Addresses are the locations in memory of specified data.
4.1.1 Instructions
An assembly language statement has up to four fields: Label, Instruction, Operands, and
Comment.
The terms Instruction and Mnemonic are used interchangeably in this document to refer to the
names of x86 instructions. Although the term Opcode is sometimes used as a synonym for
Instruction, this document reserves the term Opcode for the hexadecimal representation of the
instruction value.
For most instructions, the Solaris x86 assembler mnemonics are the same as the Intel or AMD
mnemonics. However, the Solaris x86 mnemonics might appear to be different because the
Solaris mnemonics are suffixed with a one-character modifier that specifies the size of the
instruction operands. That is, the Solaris assembler derives its operand type information from the
instruction name and the suffix. If a mnemonic is specified with no type suffix, the operand type
defaults to long.
Assembly language, or just assembly, is a low-level programming language, which uses
mnemonics, instructions and operands to represent machine code. This enhances the readability
while still giving precise control over the machine instructions. Most programming is currently
done using high-level programming languages, which are typically easier to read and write. These
languages need to be compiled (translated into assembly language), or run through other
compiled programs.
4.1.2 Opcodes
Opcodes are also given mnemonics (short names) so that they can be easily referred to in code
listings and similar documentation. For example, an instruction to store the contents of the
accumulator in a given memory address could be given the binary opcode 000001, which may
then be referred to using the mnemonic STA (short for STore Accumulator). Such mnemonics
will be used for the examples on upcoming pages.
4.1.3 Operand
In computers, an Operand is the part of a computer instruction that specifies data that is to be
operated on or manipulated and, by extension, the data itself. Basically, a computer instruction
describes an operation (add, subtract, and so forth) and the operand or operands on which the
operation is to be performed.
An Opcode is an identifier that starts with a letter character and may be followed by up to
fourteen more characters. Each additional character may be a letter or a digit or the underscore
character. Traditionally, no uppercase letters are used in opcode names that are to be used by
more than one program.
An Operand is either a set of contiguous non-white space printing characters or a string. A string
is a set of contiguous printing characters delimited by a quote (ASCII code: 34 decimal, 0x22
hexadecimal) character at each end. A string value must have less than 256 bytes of data. If at
least one operand is present in an operation, there is a single space between the opcode and the
first operand. If more than one operand is present in an operation, there is a single blank
character between every two adjacent operands. If there are no operands, a semicolon character
is appended to the opcode to mark the end of the operation. If any operands appear, the last
operand has an appended semicolon that marks the end of the operation.
The exact format of the machine codes is again CPU dependent. For the purpose of this tutorial,
we will presume we are using a 24-bit CPU. This means that the minimum length of the machine
codes used here should be 24 binary bits, which in this instance are split as shown in the table
below:
Operands can be immediate (that is, constant expressions that evaluate to an inline value),
register (a value in one of the processor's registers), or memory (a value stored in memory). An
indirect operand contains the address of the actual operand value. Indirect operands are specified
by prefixing the operand with an asterisk (*) (ASCII 0x2A). Only jump and call instructions can
use indirect operands.
Indirect Operands
The 16-bit registers used for indirect addressing are SI, DI, BX, and BP.
4.3 INSTRUCTION CYCLE
The fetch-execute cycle is the process by which the computer reads instructions from
memory and executes them. This continuous cycle repeats until the computer is turned
off or there are no more instructions to process.
4.3.2 Fetch Cycle
The fetch cycle reads the instruction at the address held in the program counter, stores it in the
instruction register, and moves the program counter on one so that it points to the next instruction.
The fetch part of the cycle starts by instructions being collected either from the hard drive, the
RAM, the cache or the registers. The processor knows which instruction to retrieve next because
each instruction has a unique memory address, and the program counter holds the address of the
next one, so the control unit knows exactly what it is looking for (similar to how a computer has
a unique IP address on a network).
4.3.3 Decode
Here, the control unit checks the instruction that is now stored within the instruction register. It
determines which opcode and addressing mode have been used, and as such what actions need to
be carried out in order to execute the instruction in question.
4.3.4 Execute
The actual actions which occur during the execute cycle of an instruction depend on both the
instruction itself, and the addressing mode specified to be used to access the data that may be
required. However, four main groups of actions do exist, which are discussed in full later on.
After the correct instructions have been fetched the CPU will then interpret what the instruction
is telling it to do then it will simply execute the instruction and the whole process will begin
again until there are no more instructions or the computer is turned off. Once a program is in
memory it has to be executed. To do this, each instruction must be looked at, decoded and acted
upon in turn until the program is completed. This is achieved by the use of what is termed the
'instruction execution cycle', which is the cycle by which each instruction in turn is processed.
However, to ensure that the execution proceeds smoothly, it is also necessary to synchronise
the activities of the processor.
Program counter (PC) - an incrementing counter that keeps track of the memory
address of the instruction that is to be executed next.
Memory address register (MAR) - holds the address of a memory block to be read from
or written to.
Memory data register (MDR) - a two-way register that holds data fetched from memory
(and ready for the CPU to process) or data waiting to be stored in memory
Instruction register (IR) - a temporary holding ground for the instruction that has just
been fetched from memory
Control unit (CU) - decodes the program instruction in the IR, selecting machine
resources such as a data source register and a particular arithmetic operation, and
coordinates activation of those resources
Arithmetic logic unit (ALU) - performs mathematical and logical operations
When an instruction requires two operands, the first operand is generally the destination, which
contains data in a register or memory location and the second operand is the source. Source
contains either the data to be delivered (immediate addressing) or the address (in register or
memory) of the data. Generally, the source data remains unaltered after the operation.
Register addressing
Immediate addressing
Memory addressing
A register operand is one of the eight general- and special-purpose 16-bit registers listed above,
or one of the eight general-purpose 8-bit registers (AL, AH, ...), or one of the four segment
registers. The contents of the register are used and/or modified by the operation. In the MOV AL
examples below, the destination operand of the MOV instruction is the low byte of the accumulator,
AL; the effect of the instruction is to store the binary number 00001101 into the bottom eight bits
of AX (leaving the other bits unchanged).
In this addressing mode, a register contains the operand. Depending upon the instruction, the
register may be the first operand, the second operand or both.
For example, MOV DX, CX copies the contents of the CX register into the DX register; both
operands are registers.
An immediate operand is just a number (or a label, which the assembler converts to the
corresponding address). An immediate operand is used to specify a constant for one of the
arithmetic or logical operations, or to give the jump address for a branching instruction. Most
assemblers, including NASM, allow simple arithmetic expressions when computing immediate
operands. For example, all of the following are equivalent:
MOV AL, 13
MOV AL, 0xD
MOV AL, 0Ah + 3 ;Note leading 0 to distinguish from register AH
MOV AL, George * 2 - 1
assuming that the label George is associated with the address 7.
An immediate operand has a constant value or an expression. When an instruction with two
operands uses immediate addressing, the first operand may be a register or memory location, and
the second operand is an immediate constant. The first operand defines the length of the data.
For example, MOV CL, 10 stores the immediate constant 10 in the 8-bit register CL.
A memory operand gives the address of a location in main memory to use in the operation. The
NASM syntax for this is very simple: put the address in square brackets. The address can be
given as an arithmetic expression involving constants and labels (the displacement), plus an
optional base or index register. Here are some examples (wordvar is a hypothetical label defined
in the data segment):
MOV AX, [wordvar]     ; the contents of memory at label wordvar
MOV AX, [wordvar + 2] ; a displacement of 2 from the label
MOV AX, [BX]          ; address taken from the BX register
When operands are specified in memory addressing mode, direct access to main memory,
usually to the data segment, is required. This way of addressing results in slower processing of
data. To locate the exact location of data in memory, we need the segment start address, which is
typically found in the DS register, and an offset value. This offset value is also called the effective
address.
In direct addressing mode, the offset value is specified directly as part of the instruction, usually
indicated by the variable name. The assembler calculates the offset value and maintains a symbol
table, which stores the offset values of all the variables used in the program.
In direct memory addressing, one of the operands refers to a memory location and the other
operand references a register.
For example, MOV DX, COUNT copies the word stored at the memory location labelled COUNT
(a variable defined in the data segment) into the DX register.
Other types of Addressing modes include the following:
Direct-Offset Addressing
This addressing mode uses the arithmetic operators to modify an address. For example, look at
the following definitions that define tables of data (example values invented for illustration):
BYTE_TABLE DB 14, 15, 22, 45     ; table of bytes
WORD_TABLE DW 134, 345, 564, 123 ; table of words
The following operations access data from the tables in the memory into registers:
MOV CL, BYTE_TABLE[2]  ; gets the 3rd element of the byte table
MOV CX, WORD_TABLE[3]  ; gets the 4th element of the word table
Indirect Memory Addressing
This addressing mode utilizes the computer's ability of Segment:Offset addressing. Generally, the
base registers EBX, EBP (or BX, BP) and the index registers (DI, SI), coded within square
brackets for memory references, are used for this purpose.
Indirect addressing is generally used for variables containing several elements like, arrays.
Starting address of the array is stored in, say, the EBX register.
The following code snippet shows how to access different elements of the variable (a sketch;
MY_TABLE is a hypothetical array):
MY_TABLE TIMES 10 DW 0  ; allocates 10 words, each initialized to 0
MOV EBX, MY_TABLE       ; starting address of MY_TABLE in EBX
MOV WORD [EBX], 110     ; MY_TABLE[0] = 110
ADD EBX, 2              ; point EBX at the next word
MOV WORD [EBX], 123     ; MY_TABLE[1] = 123
5.1 INTERRUPT
Interrupt is a signal to the processor emitted by hardware or software indicating an event that
needs immediate attention. An interrupt alerts the processor to a high-priority condition requiring
the interruption of the current code the processor is executing (the current thread). The processor
responds by suspending its current activities, saving its state, and executing a small program
called an interrupt handler (or interrupt service routine, ISR) to deal with the event. This
interruption is temporary, and after the interrupt handler finishes, the processor resumes
execution of the previous thread. An interrupt is a signal from a device attached to a computer or from
a program within the computer that causes the main program that operates the computer (the operating
system ) to stop and figure out what to do next.
Basically, a single processor can execute only one instruction at a time. But because it can be
interrupted, it can take turns among the programs or sets of instructions that it performs. This
is known as multitasking. It allows the user to do a number of different things at the same time;
the computer simply takes turns managing the programs that the user starts. Of course, the
computer operates at speeds that make it seem as though all of the user's tasks are being
performed simultaneously. (The operating system is good at using little pauses in operations, and
user think time, to work on other programs.) An operating system usually has some code called an
interrupt handler. The interrupt handler prioritizes the interrupts and saves them in a queue if
more than one is waiting to be handled. The operating system has another small program, sometimes
called a scheduler, that figures out which program to give control to next.
Exceptions are events occurring during program execution that are exceptional enough that they
cannot be handled within the program itself. For example, if the processor's arithmetic logic
unit is commanded to divide a number by zero, this impossible demand will cause a divide-by-zero
exception, perhaps causing the computer to abandon the calculation or display an error message.
Software interrupt instructions function similarly to subroutine calls and are used for a variety
of purposes, such as to request services from low-level system software such as device drivers.
For example, computers often use software interrupt instructions to communicate with the disk
controller to request that data be read from or written to the disk.
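For instance, a DOS program requests an operating-system service through software interrupt
INT 21h. A minimal sketch (MASM-style; MSG is a hypothetical string, and the data segment is
assumed to be loaded into DS):

```asm
MSG DB 'Hello, world!$'   ; DOS strings are terminated with '$'

MOV AH, 09h               ; DOS function 09h: write string to standard output
MOV DX, OFFSET MSG        ; DS:DX must point to the string
INT 21h                   ; software interrupt: control transfers to the DOS service routine
```

The INT instruction behaves like a call into the operating system: it saves the return context
and vectors to the handler registered for interrupt number 21h.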
An interrupt that leaves the machine in a well-defined state is called a precise interrupt. Such
an interrupt has four properties:
1. The program counter (PC) is saved in a known place.
2. All instructions before the one pointed to by the PC have completed execution.
3. No instruction beyond the one pointed to by the PC has been executed (or any such side
effects are undone before the interrupt is handled).
4. The execution state of the instruction pointed to by the PC is known.
An interrupt that does not meet these requirements is called an imprecise interrupt.
The phenomenon where the overall system performance is severely hindered by excessive
amounts of processing time spent handling interrupts is called an interrupt storm.
Message-signaled: A message-signaled interrupt does not use a physical interrupt
line. Instead, a device signals its request for service by sending a short message over
some communications medium, typically a computer bus. The message might be of a
type reserved for interrupts, or it might be of some pre-existing type such as a memory
write. Message-signaled interrupts behave very much like edge-triggered interrupts, in
that the interrupt is a momentary signal rather than a continuous condition. Interrupt-
handling software treats the two in much the same manner. Typically, multiple pending
message-signaled interrupts with the same message (the same virtual interrupt line) are
allowed to merge, just as closely spaced edge-triggered interrupts can merge.
Performance issues
Interrupts provide low overhead and good latency at low load, but degrade significantly at high
interrupt rates unless care is taken to prevent several pathologies. These are various forms of
livelock, in which the system spends all of its time processing interrupts to the exclusion of
other required tasks. Under extreme conditions, a large number of interrupts (as with very high
network traffic) may completely stall the system. To avoid such problems, an operating system
must schedule network interrupt handling as carefully as it schedules process execution.[2]
Typical uses of interrupts include the following: system timers, disk I/O, power-off signals, and
traps. Other interrupts exist to transfer data bytes using UARTs or Ethernet; sense key-presses;
control motors; or anything else the equipment must do.
One typical use is to generate interrupts periodically by dividing the output of a crystal
oscillator and having an interrupt handler count the interrupts in order to keep time.
These periodic interrupts are often used by the OS's task scheduler to reschedule the
priorities of running processes. Some older computers generated periodic interrupts from
the power-line frequency, because that frequency was regulated by utilities to eliminate
long-term drift in electric clocks.
A disk interrupt signals the completion of a data transfer from or to the disk peripheral. A
process waiting to read or write a file starts up again.
A power-off interrupt predicts or requests a loss of power. It allows the computer
equipment to perform an orderly shut-down.
Interrupts are also used in typeahead features for buffering events like keystrokes.
Branch instructions are those that tell the processor to make a decision about what the next
instruction to be executed should be based on the results of another instruction. Branch
instructions can be troublesome in a pipeline if a branch is conditional on the results of an
instruction which has not yet finished its path through the pipeline.
For example:
Loop:  add $r3, $r2, $r1
       sub $r6, $r5, $r4
       beq $r3, $r6, Loop
The example above instructs the processor to add r1 and r2 and put the result in r3, then subtract
r4 from r5, storing the difference in r6. In the third instruction, beq stands for branch if equal. If
the contents of r3 and r6 are equal, the processor should execute the instruction labeled "Loop."
Otherwise, it should continue to the next instruction. In this example, the processor cannot yet
decide which path to take, because neither the value of r3 nor that of r6 has been written into
the registers.
The processor could stall, but a more sophisticated method of dealing with branch instructions is
branch prediction. The processor makes a guess about which path to take - if the guess is wrong,
anything written into the registers must be cleared, and the pipeline must be started again with
the correct instruction. Some methods of branch prediction depend on stereotypical behavior.
Branches pointing backward are taken about 90% of the time, since backward-pointing branches are
often found at the bottom of loops. Branches pointing forward, on the other hand, are taken only
about 50% of the time. Thus, it is logical for processors to always follow a branch when it
points backward, but not when it points forward. Other methods of branch prediction are less
static: processors that use dynamic prediction keep a history for each branch and use it to
predict future branches. These processors are correct in their predictions about 90% of the time.