
COMPUTER ARCHITECTURE I

CTE 214

ND 2 ( FULLTIME AND PART-TIME)

TWO UNITS

FIRST SEMESTER COURSE

LECTURE: MRS AKINGBADE L.O.

THE FEDERAL POLYTECHNIC

ILARO, OGUN STATE.

1.1 INTRODUCTION TO COMPUTER ARCHITECTURE

Computer Architecture is the science and art of selecting and interconnecting hardware
components to create computers that meet functional, performance and cost goals. Computer
architecture is not about using computers to design buildings. In computer science and computer
engineering, computer architecture or digital computer organization is the conceptual design
and fundamental operational structure of a computer system. It is a functional description of
requirements and design implementations for the various parts of a computer, focusing largely on
how the central processing unit (CPU) operates internally and accesses addresses in memory.

Computer architecture comprises at least three main subcategories:

 Instruction set architecture, or ISA, is the abstract image of a computing system that is
seen by a machine language (or assembly language) programmer, including the
instruction set, word size, memory address modes, processor registers, and address and
data formats.

 Microarchitecture, also known as computer organization, is a lower-level, more concrete
and detailed description of the system that involves how the constituent parts of the
system are interconnected and how they interoperate in order to implement the ISA.[2]
The size of a computer's cache, for instance, is an organizational issue that generally has
nothing to do with the ISA.

 System design, which includes all of the other hardware components within a computing
system, such as system interconnects (e.g. computer buses and switches), memory controllers
and hierarchies, and CPU off-load mechanisms such as direct memory access (DMA).

Once both ISA and microarchitecture have been specified, the actual device needs to be designed
into hardware. This design process is called the implementation. Implementation is usually not
considered architectural definition, but rather hardware design engineering.

Implementation can be further broken down into three (not fully distinct) pieces:

 Logic Implementation — design of blocks defined in the microarchitecture at
(primarily) the register-transfer and gate levels.
 Circuit Implementation — transistor-level design of basic elements (gates,
multiplexers, latches etc.) as well as of some larger blocks (ALUs, caches etc.) that may
be implemented at this level, or even (partly) at the physical level, for performance
reasons.
 Physical Implementation — physical circuits are drawn out, the different circuit
components are placed in a chip floorplan or on a board and the wires connecting them
are routed.

1.2 DOCUMENT FORMATS IN WORD

WORD FORMAT PROCESSING

Word processing is the use of specialized document manipulation software running on a computer
or terminal that allows a user to create, edit, store and print out text-based documents. Most
modern companies that need to produce business letters or other types of text documents will have
access to word processing software and a printer. A word processor, or word processing program,
does exactly what the name implies: it processes words. It also processes paragraphs, pages, and
entire papers. Some examples of word processing programs include Microsoft Word, WordPerfect
(Windows only), AppleWorks (Mac only), and OpenOffice.org.

The first word processors were basically computerized typewriters, which did little more than
place characters on a screen, which could then be printed by a printer. Modern word processing
programs, however, include features to customize the style of the text, change the page
formatting, and may be able to add headers, footers, and page numbers to each page. Some may
also include a "Word Count" option, which counts the words and characters within a document.

Microsoft Word is a word processor developed by Microsoft. It was first released in 1983
under the name Multi-Tool Word for Xenix systems.[3] Subsequent versions were later written for
several other platforms including IBM PCs running DOS (1983), the Apple Macintosh (1985),
the AT&T Unix PC (1985), Atari ST (1988), SCO UNIX (1994), OS/2 (1989), and Windows
(1989). Commercial versions of Word are licensed as a standalone product or as a component of
Microsoft Office, Windows RT or the discontinued Microsoft Works Suite. Freeware editions of
Word are Microsoft Word Viewer and Word Web App on SkyDrive, both of which have limited
feature sets.

Other word processors have their own standards as well. OpenOffice Writer, for example, uses
the OpenDocument, or ODF, format. Kingsoft Writer uses a format called WPS. And so on.

Fortunately, these and other programs can save documents in multiple formats, thereby making
them easier to access in, well, other programs. That's why, in Microsoft Word, if you click the
Save as type pull-down in the Save dialog, you'll see a wealth of choices. Below I've identified
some of the more popular ones, and in what circumstances you might use them.

1.2.1 The Different formats

 Rich Text Format (RTF) might best be described as a "universal word-processing
format," as it's supported by just about every word processor. However, unlike plain text,
it retains basic formatting information, like font sizes and styles.
 PDF Adobe's Portable Document Format also has universal appeal, as it can be opened
using any number of viewers (including, most commonly, Adobe Reader). You'd use
PDF to produce your document in a read-only format, meaning it couldn't easily be
edited. It's also a good way to distribute documents online, as most browsers can view
PDFs without the need to download them first.
 Plain Text Just like it sounds, this format saves only the raw text--no formatting, no
hidden codes, just your words. You might use this to export text that needs to be imported
into another program, like a blog tool or text editor--something that won't like all of
Word's underlying extras.
 Word 97-2003 Document So you've got Word 2010, but your parents are still plugging
along with Word 97. The latter can't open documents created by the former (not without a
converter, anyway), but at least Word lets you save files using the older formats. Some
kinds of formatting may get lost in translation, but this should work for most kinds of
documents.

Word can also save files as Web pages, XML documents, templates, and more. Needless to say,
if you need to learn about those formats, a little Google searching should reveal all.

When preparing a publication, different authors contribute to one document. As many different
MS Word versions exist (Word 2010 / 2007, Word 2003/02, older versions), each with different
possibilities and constraints, problems can arise when files are exchanged across these versions.

 .doc format: used until Word 2003
 .docx format: used in Word 2007 and 2010

Problems can arise when .docx files are worked on in Word 2003 or earlier. In particular figures
and equations may become unusable.

To avoid compatibility issues as far as possible, please avoid using .docx format when you are
not entirely sure that all of the participants working on the original document will have
Word 2007 or 2010 (backwards compatibility seems to be ensured without further problems).

1.2.2 Use Word to open or save a file in another file format

You can use Microsoft Office Word 2007 to open or save files in other formats. For example,
you can open a Web page and then upgrade it to access the new and enhanced features in Office
Word 2007. For more information on upgrading documents, see Use Microsoft Office Word
2007 to open documents created in previous versions of Word.

Open a file in Office Word 2007

You can use Office Word 2007 to open files in any of several formats.

1. Click the Microsoft Office Button, and then click Open.

2. In the Open dialog box, click the type of file that you want to open.
3. Click the file, and then click Open.

Save an Office Word 2007 file in another file format

You can save Office Word 2007 documents to any of several file formats.

Note: You cannot use Microsoft Office Word 2007 to save a document as a JPEG (.jpg) or GIF
(.gif) file, but you can save a file as a PDF (.pdf) file.

1. Click the Microsoft Office Button, and then click Save As.

Note: If you point to Save As, the menu that appears does not show a complete list of file
formats. To view all of the possible file formats, you must click Save As to open the Save As
dialog box.

2. In the Save As dialog box, click the arrow to the right of the Save as type list, and then
click the file type that you want.

For this type of file        Choose
.docx                        Word Document
.docm                        Word Macro-Enabled Document
.doc                         Word 97-2003 Document
.dotx                        Word Template
.dotm                        Word Macro-Enabled Template
.dot                         Word 97-2003 Template
.pdf                         PDF
.xps                         XPS Document
.mht (MHTML)                 Single File Web Page
.htm (HTML)                  Web Page
.htm (HTML, filtered)        Web Page, Filtered
.rtf                         Rich Text Format
.txt                         Plain Text
.xml (Word 2007)             Word XML Document
.xml (Word 2003)             Word 2003 XML Document
.wps                         Works 6.0-9.0

3. In the File name box, type a name for the file.


4. Click Save.

1.3 VON NEUMANN AND HARVARD ARCHITECTURE

There are basically two types of digital computer architecture. The first is the Von
Neumann architecture; the Harvard architecture was adopted later for designing digital
computers.

VON NEUMANN ARCHITECTURE


The term has evolved to mean any stored-program computer in which an instruction fetch and a
data operation cannot occur at the same time because they share a common bus.

 It is named after the mathematician and early computer scientist John Von Neumann.
 The computer has a single storage system (memory) for storing both data and the program to
be executed.
 The processor needs two clock cycles to complete an instruction, and pipelining of
instructions is not possible with this architecture.

 In the first clock cycle the processor gets the instruction from memory and decodes it. In
the next clock cycle the required data is taken from memory. For each instruction this
cycle repeats and hence needs two cycles to complete an instruction.
 This is a relatively older architecture and was replaced by Harvard architecture.

LIMITATIONS OF VON NEUMANN

 The processor takes more time to execute, as it has to distinguish between data and
instruction, since both are stored in the same memory.
 The shared bus between the program memory and data memory leads to the Von
Neumann bottleneck: the limited throughput (data transfer rate) between the CPU and
memory compared to the amount of memory.
 Also, there are two kinds of memory access: one to access data and the next for an
instruction, or vice versa.
 Because program memory and data memory cannot be accessed at the same time,
throughput is much smaller than the rate at which the CPU can work.
 This seriously limits the effective processing speed when the CPU is required to perform
minimal processing on large amounts of data.
 The CPU is continually forced to wait for needed data to be transferred to or from
memory.
 Since CPU speed and memory size have increased much faster than the throughput
between them, the bottleneck has become more of a problem, a problem whose severity
increases with every newer generation of CPU.

HARVARD ARCHITECTURE
The Harvard architecture is also a stored-program system, but it has one dedicated set of address
and data buses for reading data from and writing data to memory, and another set of address and
data buses for fetching instructions.

 The name originates from the "Harvard Mark I", an early relay-based computer.
 The computer has two separate memories for storing data and program.
 The processor can complete an instruction in one cycle if appropriate pipelining strategies are
implemented.
 In the first stage of the pipeline, the instruction to be executed is fetched from program
memory. In the second stage, data is taken from the data memory using the
decoded instruction or address.
 Most modern computing architectures are based on the Harvard architecture, but the
number of stages in the pipeline varies from system to system.

These are the basic differences between the two architectures.

Figure: The Von Neumann architecture

1.4 CENTRAL PROCESSING UNIT (CPU)

A CPU consists of a set of registers that function as a level of memory above main memory and
cache memory. The central processing unit (CPU) is the brain of any computer; it carries out all
the processing in the computer. The CPU itself consists of three main subsystems:
the control unit, the registers, and the arithmetic and logic unit (ALU).

A CPU works in a fetch-execute cycle. On power-on, the CPU fetches the first instruction from a
location specified by the program counter. This instruction is brought into the instruction register,
where it is decoded by the control unit. Based on the instruction, the control unit will fetch
the operand, carry out arithmetic or logical operations on it, or store the result of such an
operation into a specified memory location. After one instruction is executed, the next instruction
is fetched by the processor and executed. This process continues until the processor reaches a
halt instruction. A real-life processor has a large number of registers, a sophisticated
microprogram control unit and a sophisticated arithmetic and logic unit. Popular
processors include those from Intel, such as the Pentium III and Pentium 4.

1.4.1 Functional Units of the CPU

The steps in the instruction cycle are performed by a variety of functional components within
the CPU. These components work very closely with the PC's memory and bus systems to
carry out their designated tasks.

The control unit (sometimes called the fetch / decode unit) is responsible for retrieving
individual instructions from their location in memory, then translating them into commands
that the CPU can understand. These commands are commonly referred to as machine-
language instructions, but are sometimes called micro-operations, or UOPs. When the
translation is complete, the control unit sends the UOPs to the execution unit for processing.

The execution unit is responsible for performing the third step of the instruction cycle,
namely, executing, or performing the operation that was specified by the instruction. The
execution unit itself is made up of several functional components as follows:

 Registers are temporary holding areas for UOPs and any data that the UOPs require
for processing. Typically, registers will be the same size as the CPU's word size
(the number of bits that the CPU can process at one time).
 The arithmetic / logic unit (ALU) contains the electronic circuitry needed to perform
arithmetic and logical operations on data in the registers. Arithmetic operations
include addition, subtraction, multiplication and division. Logical operations consist
of comparing one data item to another in order to determine whether the first data item is
equal to, greater than, or less than the second data item. The results of such a comparison
may cause different processing to occur.
 The floating point unit (FPU) performs arithmetic operations on numbers with
decimals, known as floating-point numbers. This is in contrast to the ALU, which
performs its operations on whole numbers only. (Early CPUs used a separate chip,
called a math co-processor, to perform operations on decimal numbers).
 The multimedia execution (MMX) unit performs special operations associated with
graphics, audio or video.

The following diagram shows how the functional components of the CPU work together to
fetch data and instructions that are stored in RAM, decode and execute the instructions, and
store the results back in RAM:

Figure: Functional components of a CPU

In this diagram,

1. Information from a software program, running in RAM, is sent along the 64-bit data
bus and enters the CPU via the Bus Interface Unit (BIU). The BIU first makes a copy
of the information and sends it to the L2 cache. It then determines if the information
is data or an instruction. Data is sent down a 64-bit path to a small (e.g., 32 KB) data
cache. Instructions are sent down a separate 64-bit path to a similar instruction cache.
The data cache and instruction cache are collectively known as the internal cache,
processor cache or L1 cache.
2. The control unit retrieves instructions from the instruction cache, breaks them down
into smaller micro-operations (UOPs), and moves them to the execution unit, where
they are held in temporary storage areas called registers.
3. The execution unit checks each UOP to see if it needs any data. If the required data
currently resides in the L1 or L2 caches, it is moved into the registers. If not, a fetch
to the slower RAM is required (in which case, the process begins all over again). The
execution unit is able to "buffer" several instructions, so that it can move on to another
instruction while waiting for the data for the previous one.
4. The execution unit now executes the instruction. If calculations are involved, the
execution unit will enlist the help of several other units to perform the calculations,
specifically:
 Arithmetic / Logic Unit (ALU) for integer calculations, relational or logical
operations
 Floating Point Unit (FPU) for floating point number calculations
 MMX Unit for special calculations associated with graphics, audio or video.

5. The execution unit sends the result of the calculation back to the L2 and L1 caches, in
case it is needed again soon by another instruction. The data cache sends the result to
the BIU, which in turn sends it back to RAM.

1.4.2 REGISTERS

The registers in the CPU are of two types:

· User-visible Registers

· Control and Status Registers

 User-Visible Registers

User-visible registers are those that can be referenced by the machine-language instructions
executed by the CPU. They enable the machine- or assembly-language programmer to
minimize main memory references by making use of registers. All CPU designs provide a number
of user-visible registers. These registers can be categorized into the following types:

· General Purpose Registers: They can be assigned a variety of functions by the programmer.
The general purpose registers can be considered for orthogonal usage and non-orthogonal usage.
If any general purpose register can contain the operand for any opcode, then we refer to this
as orthogonal usage. Sometimes the use of a general purpose register is restricted;
for example, there may be dedicated registers used for floating point operations. Then
we refer to the usage as non-orthogonal.

· Data registers: They can only be used to hold data, and cannot be employed in the calculation
of an operand address.

· Address registers: They may be used either in general purpose addressing modes, or may be
devoted to a particular addressing mode.

- Segment Pointers: In a CPU with segmented addressing, a segment register holds the address
of the base of a segment. There may be multiple segment registers: for example, one for the
operating system and one for the current process.

-Index Registers: These are used for indexed addressing, and may be auto indexed.

- Stack Pointer: If there is user-visible stack addressing, then typically the stack is in memory
and there is a dedicated register that points to the top of the stack. This allows implicit
addressing; that is, push, pop and other stack instructions need not contain an explicit stack
operand.

· Condition codes: These are at least partially visible register types. They are also referred to as
flags. The bits of the flag register are set according to the result of an operation. There can be a
Zero flag, a Carry flag, etc., which indicate whether the result is zero or whether the operation
produced a carry, respectively.

 Control and Status Registers

These registers are employed to control the operation of the CPU. These are used by the control
unit to control the operation of the CPU and by privileged operating system programs to control
the execution of programs.

Four registers are essential to instruction execution and are used for the movement of data
between the CPU and memory:

· Program Counter (PC): It contains the address of an instruction to be fetched.

· Instruction Register (IR): It contains the instruction most recently fetched.

· Memory Address Register (MAR): It contains the address of a location in memory.

· Memory Buffer Register (MBR): It contains a word of data to be written to memory or the
word most recently read.

Within the CPU, data must be presented to the ALU for processing. The ALU may have direct
access to the MBR and user-visible registers. Alternatively, there may be additional buffering
registers at the boundary to the ALU; and these registers serve as input and output registers for
the ALU and exchange data with MBR and user-visible registers. This depends on the design of
the CPU and ALU.

Program Status Word (PSW):

It contains status information. PSW typically contains condition codes plus other status
information. Some of these may be user visible. Common flags include:

· Sign: contains the sign bit of the result of the last arithmetic operation.

· Zero: set when the result is 0.

· Carry: set if an operation resulted in a carry (addition) into or borrow (subtraction) out of the
high-order bit.

· Equal: set if logical compare result is equality.

· Overflow: used to indicate arithmetic overflow.

· Interrupt enable/disable: used to enable or disable interrupts.

· Supervisor: Indicates whether the CPU is executing in supervisor mode or user mode. Certain
privilege instructions can be executed only in supervisor mode (e.g. halt instruction), and certain
areas of memory can be accessed only in supervisor mode.
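
As an illustration of how such flags might be packed into a status word, the following C sketch
uses invented bit positions (real PSW layouts vary from CPU to CPU) and sets the Carry, Zero
and Sign flags after an 8-bit addition.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical PSW bit positions, for illustration only. */
    #define FLAG_CARRY 0x01
    #define FLAG_ZERO  0x02
    #define FLAG_SIGN  0x04

    /* Add two 8-bit values and update a simulated status word. */
    static uint8_t add8(uint8_t a, uint8_t b, uint8_t *psw) {
        uint16_t wide = (uint16_t)a + b;   /* widen so the carry-out is visible */
        uint8_t result = (uint8_t)wide;
        *psw = 0;
        if (wide > 0xFF)   *psw |= FLAG_CARRY; /* carry out of the high-order bit */
        if (result == 0)   *psw |= FLAG_ZERO;  /* result is zero                  */
        if (result & 0x80) *psw |= FLAG_SIGN;  /* sign bit of the result          */
        return result;
    }

    int main(void) {
        uint8_t psw;
        add8(0xFF, 0x01, &psw);   /* 255 + 1 wraps to 0: sets Carry and Zero */
        printf("carry=%d zero=%d sign=%d\n",
               !!(psw & FLAG_CARRY), !!(psw & FLAG_ZERO), !!(psw & FLAG_SIGN));
        return 0;
    }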

1.4.3 CPU ORGANIZATION

Figure: CPU with System Bus

As discussed earlier, the CPU, which is the heart of a computer, consists of registers, a control
unit and an arithmetic logic unit. The interconnection of these units is achieved through the
system bus, as shown in the figure above.

The following tasks are to be performed by the CPU:

1. Fetch instructions: The CPU must read instructions from the memory.

2. Interpret instructions: The instructions must be decoded to determine what action is
required.

3. Fetch data: The execution of an instruction may require reading data from memory or an I/O
module.

4. Process data: The execution of an instruction may require performing some arithmetic or
logical operations on data.

5. Write data: The results of an execution may require writing data to the memory or an I/O
module.

1.5 HARDWARE/ SOFTWARE TRADEOFFS

A hardware/software trade-off is the establishment of the division of responsibility for
performing system functions between the software, firmware and hardware. This is part and
parcel of the fundamental process of defining computer architecture. It begins the day a
computer is conceived and may be carried on by an ever-widening group of individuals until the
last computer of a given model is retired. There are areas of the trade-off which are the sole
preserve of the manufacturer and his hardware/software team. Other areas of the trade-off are the
responsibility of the user, or independent equipment manufacturers.
In software engineering, the Architecture Tradeoff Analysis Method (ATAM) is a risk-mitigation
process used early in the software development life cycle.
ATAM was developed by the Software Engineering Institute at the Carnegie Mellon University.
Its purpose is to help choose a suitable architecture for a software system by discovering trade-
offs and sensitivity points.
ATAM is most beneficial when done early in the software development life-cycle, when the cost
of changing architectures is minimal.


ATAM benefits
The following are some of the benefits of the ATAM process:

 Promotes the gathering of precise quality requirements
 Creates an early start at architecture documentation
 Creates a documented basis for architectural decisions
 Promotes identification of risks early in the life-cycle
 Encourages increased communication among stakeholders
 Results in the prioritization of conflicting goals
 Forces a clear explication of the architecture
 Uncovers opportunities for cross-project reuse
 Results in improved architecture practices

ATAM process
The ATAM process consists of gathering stakeholders together to analyze business drivers
(system functionality, goals, constraints, desired non-functional properties) and from these
drivers extract quality attributes that are used to create scenarios. These scenarios are then used
in conjunction with architectural approaches and architectural decisions to create an analysis of
trade-offs, sensitivity points, and risks (or non-risks). This analysis can be converted to risk
themes and their impacts, whereupon the process can be repeated. With every analysis cycle, the
analysis process proceeds from the more general to the more specific, examining the questions
that have been discovered in the previous cycle, until such time as the architecture has been fine-
tuned and the risk themes have been addressed.

Steps of the ATAM process


ATAM formally consists of nine steps, outlined below:

1. Present the ATAM – Present the concept of ATAM to the stakeholders, and answer any
questions about the process.
2. Present business drivers – Everyone in the process presents and evaluates the business
drivers for the system in question.
3. Present the architecture – The architect presents the high-level architecture to the team,
with an 'appropriate level of detail'.
4. Identify architectural approaches – Different architectural approaches to the system are
presented by the team, and discussed.
5. Generate quality attribute utility tree – Define the core business and technical
requirements of the system, and map them to an appropriate architectural property.
Present a scenario for this given requirement.
6. Analyze architectural approaches – Analyze each scenario, rating them by priority. The
architecture is then evaluated against each scenario.
7. Brainstorm and prioritize scenarios – Among the larger stakeholder group, present the
current scenarios, and expand.
8. Analyze architectural approaches – Perform step 6 again with the added knowledge of the
larger stakeholder community.
9. Present results – Provide all documentation to the stakeholders.
These steps are separated into two phases: Phase 1 consists of steps 1-6; after this phase, the
state and context of the project, the driving architectural requirements and the state of the
architectural documentation are known. Phase 2 consists of steps 7-9 and finishes the evaluation.


2.1 Microcomputer Address Bus, Data Bus and Control Bus

The microprocessor is the processing device of every computing device. It is like an artificial
brain, and it needs to communicate with the outside world: for example, it needs to communicate
with input devices to get data, with memory to process data according to the instructions written
in memory, and with output devices to display the output. To communicate with the external
world, the microprocessor makes use of buses. There are different types of buses used in a
microprocessor, but all computers use three basic types. The name of a bus is generally
determined by the type of signal it carries or its method of operation. We group the buses into
three areas as you see them in their most common uses. They are as follows:

2.1.1 Address Bus:

The address bus is a group of wires or lines used to transfer the addresses of memory or I/O
devices. It is unidirectional. In the Intel 8085 microprocessor, the address bus is 16 bits wide.
This means the 8085 can transfer a 16-bit address, so it can address 65,536 (2^16) different
memory locations. The lower half of this bus is multiplexed with the 8-bit data bus: the most
significant byte of the address goes out on the dedicated address lines (A8-A15) and the least
significant byte goes out on the multiplexed address/data lines (AD0-AD7). The address bus
consists of all the signals necessary to define any of the possible memory address locations
within the computer, or for modular memories any of the possible memory address locations
within a module. An address is defined as a label, symbol, or other set of characters used to
designate a location or register where information is stored. Before data or instructions can be
written into or read from memory by the CPU or I/O sections, an
address must be transmitted to memory over the address bus.
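
The high/low split of a 16-bit address can be shown in C. This sketch separates an address into
the byte that would be driven on the dedicated lines A8-A15 and the byte that would be driven on
the multiplexed lines AD0-AD7 (a simplification that ignores the real bus timing):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint16_t address = 0x2050;     /* a 16-bit, 8085-style address       */
        uint8_t high = address >> 8;   /* byte for dedicated lines A8-A15    */
        uint8_t low  = address & 0xFF; /* byte for multiplexed lines AD0-AD7 */
        printf("A8-A15 = 0x%02X, AD0-AD7 = 0x%02X\n", high, low);
        printf("locations addressable with 16 lines: %u\n", 1u << 16); /* 65536 */
        return 0;
    }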

2.1.2 Data Bus:

As the name suggests, the data bus is used to transfer data between the microprocessor and
memory or input/output devices. It is bidirectional, as the microprocessor needs to both send and
receive data. The lower-order address lines also work as the data bus when multiplexed, as
described above. In the 8085, the data bus is 8 bits wide. The word length of a processor depends
on the data bus; that is why the Intel 8085 is called an 8-bit microprocessor, because it has an
8-bit data bus. The bidirectional data bus, sometimes called the memory bus, handles the transfer
of all data and instructions between functional areas of the computer. The bidirectional data bus
can only transmit in one direction at a time. The data bus is used to transfer instructions from
memory to the CPU for execution. It carries data (operands) to and from the CPU and memory as
required by instruction translation. The data bus is also used to transfer data between memory
and the I/O section during input/output operations. The information on the data bus is either
written into memory or read from it.

2.1.3 Control Bus:

The microprocessor uses the control bus to indicate what it intends to do
with the selected memory location. Some control signals are Read, Write and Opcode Fetch.
Various operations are performed by the microprocessor with the help of the control bus. This is
a dedicated bus, because all timing signals are generated according to the control signals. The
control bus is used by the CPU to direct and monitor the actions of the other functional areas of the
computer. It is used to transmit a variety of individual signals (read, write, interrupt,
acknowledge, and so forth) necessary to control and coordinate the operations of the computer.
The individual signals transmitted over the control bus and their functions are covered in the
appropriate functional area description.

2.2 PRINCIPLES OF MEMORY MANAGEMENT

Memory management is the act of managing computer memory. In its simpler forms, this
involves providing ways to allocate portions of memory to programs at their request, and freeing
it for reuse when no longer needed. The management of main memory is critical to the computer
system. The memory management function keeps track of the status of each memory location,
either allocated or free. It determines how memory is allocated among competing processes,
deciding who gets memory, when they receive it, and how much they are allowed. When
memory is allocated it determines which memory locations will be assigned. It tracks when
memory is freed or unallocated and updates the status.

Virtual memory systems separate the memory addresses used by a process from actual physical
addresses, allowing separation of processes and increasing the effectively available amount of
RAM using disk swapping. The quality of the virtual memory manager can have a big impact on
overall system performance.

Garbage collection is the automated deallocation of computer memory resources
for a program. It is generally implemented at the programming language level and is in
opposition to manual memory management, the explicit allocation and deallocation of computer
memory resources. Region-based memory management is an efficient variant of explicit memory
management that can deallocate large groups of objects simultaneously.

2.2.1 Memory management techniques

 Single contiguous allocation management

Single allocation is the simplest memory management technique. All the computer's memory,
usually with the exception of a small portion reserved for the operating system, is available to the
single application. MS-DOS is an example of a system which allocates memory in this way. An
embedded system running a single application might also use this technique. A system using
single contiguous allocation may still multitask by swapping the contents of memory to switch
among users. Early versions of the MUSIC operating system used this technique.

 Partitioned allocation management

Partitioned allocation divides primary memory into multiple memory partitions, usually
contiguous areas of memory. Each partition might contain all the information for a specific job
or task. Memory management consists of allocating a partition to a job when it starts and
unallocating it when the job ends. Partitioned allocation usually requires some hardware support
to prevent the jobs from interfering with one another or with the operating system. The IBM
System/360 used a lock-and-key technique. Other systems used base and bounds registers which
contained the limits of the partition and flagged invalid accesses. The UNIVAC 1108 Storage
Limits Register had separate base/bound sets for instructions and data. The system took
advantage of memory interleaving to place what were called the i bank and d bank in separate
memory modules. Partitions may be either static, that is defined at Initial Program Load (IPL) or
boot time or by the computer operator, or dynamic, that is automatically created for a specific
job. IBM System/360 Operating System Multiprogramming with a Fixed Number of Tasks
(MFT) is an example of static partitioning, and Multiprogramming with a Variable Number of
Tasks (MVT) is an example of dynamic. MVT and successors use the term region to distinguish
dynamic partitions from static ones in other systems [3, p. 73].

 Paged memory management

Paged allocation divides the computer's primary memory into fixed-size units called page
frames, and the program's address space into pages of the same size. The hardware memory
management unit maps pages to frames. The physical memory can be allocated on a page basis
while the address space appears contiguous. Usually, with paged memory management, each job
runs in its own address space; however, IBM OS/VS2 SVS ran all jobs in a single 16 MiB virtual
address space. Paged memory can be demand-paged when the system can move pages as required
between primary and secondary memory.

 Segmented memory management

Segmented memory is the only memory management technique that does not provide the user's
program with a linear and contiguous address space. Segments are areas of memory that usually
correspond to a logical grouping of information such as a code procedure or a data array.
Segments require hardware support in the form of a segment table, which usually contains the
physical address of the segment in memory, its size, and other data such as access protection bits
and status (swapped in, swapped out, etc.).

2.2.2 Memory Allocation Techniques

The primary role of the memory management system is to satisfy requests for memory
allocation. Sometimes this is implicit, as when a new process is created. At other times,
processes explicitly request memory. Either way, the system must locate enough unallocated
memory and assign it to the process.

1. First Fit
The first of these is called first fit. The basic idea with first fit allocation is that we
begin searching the list and take the first block whose size is greater than or equal to the
request size (a C sketch of this policy appears after this list). If we reach the end of the list
without finding a suitable block, then the request fails. Because the list is often kept sorted in
order of address, a first fit policy tends to cause allocations to be clustered toward the low
memory addresses. The net effect is that the low memory area tends to get fragmented, while the
upper memory area tends to have larger free blocks.

2. Next Fit
If we want to spread the allocations out more evenly across the memory space, we often
use a policy called next fit. This scheme is very similar to the first fit approach, except
for the place where the search starts. In next fit, we begin the search with the free block
that was next on the list after the last allocation. During the search, we treat the list as
a circular one. If we come back to the place where we started without finding a suitable
block, then the search fails.

3. Best Fit
In many ways, the most natural approach is to allocate the free block that is closest in
size to the request. This technique is called best fit. In best fit, we search the list for the
block that is smallest but greater than or equal to the request size. Like first fit, best fit
tends to create significant external fragmentation, but
keeps large blocks available for potential large allocation requests.

4. Worst Fit
If best fit allocates the smallest block that satisfies the request, then worst fit allocates
the largest block for every request. Although the name would suggest that we would
never use the worst fit policy, it does have one advantage: If most of the requests are of
similar size, a worst fit policy tends to minimize external fragmentation.

5. Buddy System Allocation
There is another memory allocation system, which is very elegant and which tends to
have very little external fragmentation. This approach is called the buddy system and
is based on the idea that all allocated blocks are a power of 2 in size.
Buddy system allocation is very straightforward and tends to have very low external
fragmentation. However, the price we pay for that is increased internal fragmentation. In
the worst case, each allocation request is 1 byte greater than a power of 2. In this case,
every allocation is nearly twice as large as the size requested. Of course, in practice, the
actual internal fragmentation is substantially smaller, but still tends to be larger than
what we find with the other variable-sized block techniques.

6. Fragmentation
When allocating memory, we can end up with some wasted space. This happens in two
ways. First, if we allocate memory in such a way that we actually allocate more than
is requested, some of the allocated block will go unused. This type of waste is called
internal fragmentation. The other type of waste is unused memory outside of any
allocated unit. This can happen if there are available free blocks that are too small to
satisfy any request. Wasted memory that lies outside allocation units is called external
fragmentation.

7. Partitioning
The simplest methods of allocating memory are based on dividing memory into areas with
fixed partitions. Typically, we administratively define fixed partitions of
varying size. These partitions are in effect from the time the system starts to the time it
is shut down. Memory requests are all satisfied from the fixed set of defined partitions.
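
As a minimal C sketch of the first fit policy (block splitting, coalescing and the other policies
are omitted), the following fragment walks a linked list of free blocks and takes the first one
that is large enough:

    #include <stddef.h>
    #include <stdio.h>

    /* A free block: its start address, its size in bytes, and the next block. */
    struct free_block {
        size_t start;
        size_t size;
        struct free_block *next;
    };

    /* First fit: return the first block with size >= request, or NULL if the
       request fails because no block is large enough. */
    static struct free_block *first_fit(struct free_block *head, size_t request) {
        for (struct free_block *b = head; b != NULL; b = b->next)
            if (b->size >= request)
                return b;
        return NULL;
    }

    int main(void) {
        /* Free list kept sorted by address: blocks of 16 KB, 3 KB and 8 KB. */
        struct free_block b3 = { 28672, 8192,  NULL };
        struct free_block b2 = { 20480, 3072,  &b3 };
        struct free_block b1 = { 0,     16384, &b2 };

        struct free_block *hit = first_fit(&b1, 4096);
        if (hit != NULL)
            printf("satisfied 4096-byte request from block at %zu\n", hit->start);
        return 0;
    }

Next fit differs only in where the scan starts; best fit would scan the whole list and keep the
smallest block that still satisfies the request.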

2.2.3 Free Space Management

Before we can allocate memory, we must locate the free memory. Naturally, we want to
represent the free memory blocks in a way that makes the search efficient.
Before getting into the details, however, we should ask whether we are talking about
locating free memory in the physical memory space or the virtual memory space. Throughout
this chapter, we look at memory management techniques primarily from the perspective
of the operating system managing the physical memory resource. Consequently, these
techniques can all be viewed as operating in the physical memory space. However, many
of these techniques can also be used to manage virtual memory space. Application-level
dynamic memory allocators, using familiar operations such as the malloc( ) call in C or the
new operator in C++, often allocate large blocks from the OS and then subdivide them
into smaller allocations. They may well use some of these same techniques to manage
their own usage of memory. The operating system, however, does not concern itself with
that use of these techniques.

 Free Bitmaps
If we are operating in an environment with fixed-sized pages, then the search becomes
easy. We don’t care which page, because they’re all the same size. It’s quite common in
this case to simply store one bit per page frame, which is set to one if the page frame is
free, and zero if it is allocated. With this representation, we can mark a page as either
free or allocated in constant time by just indexing into this free bitmap. Finding a free
page is simply a matter of locating the first nonzero bit in the map. To make this search
easier, we often keep track of the first available page. When we allocate it, we search from
that point on to find the next available one.
The memory overhead for a free bitmap representation is quite small. For example,
if we have pages that are 4096 bytes each, the bitmap uses 1 bit for each 32,768 bits of
memory, a 0.003% overhead.
Generally, when we allocate in an environment that uses paging address translation,
we don’t care which page frame we give a process, and the process never needs to have
control over the physical relationship among pages. However, there are exceptions. One
exception is a case where we do allocate memory in fixed sized units, but where there is
no address translation. Another is where not all page frames are created equally. In both
cases, we might need to request a number of physically contiguous page frames. When we
allocate multiple contiguous page frames, we look not for the first available page, but for
a run of available pages at least as large as the allocation request.
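
A minimal C sketch of such a bitmap follows: one bit per page frame, a constant-time status
update, and a linear scan for the first free frame (the frame count and the scanning strategy are
simplified for illustration).

    #include <stdio.h>
    #include <stdint.h>

    #define NFRAMES 64                   /* page frames tracked by the bitmap */
    static uint8_t bitmap[NFRAMES / 8];  /* bit = 1 means the frame is free   */

    static void mark_free(int f)      { bitmap[f / 8] |=  (uint8_t)(1u << (f % 8)); }
    static void mark_allocated(int f) { bitmap[f / 8] &= (uint8_t)~(1u << (f % 8)); }

    /* Return the first free frame, or -1 if every frame is allocated. */
    static int find_free(void) {
        for (int f = 0; f < NFRAMES; f++)
            if (bitmap[f / 8] & (1u << (f % 8)))
                return f;
        return -1;
    }

    int main(void) {
        mark_free(3);
        mark_free(10);
        int f = find_free();   /* finds frame 3                */
        mark_allocated(f);     /* constant-time status update  */
        printf("allocated frame %d, next free frame is %d\n", f, find_free());
        return 0;
    }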

 Free Lists
We can also represent the set of free memory blocks by keeping them in a linked list.
When dealing with fixed-sized pages, allocation is again quite easy. We just grab the first
page off the list. When pages are returned to the free set, we simply add them to the list.
Both of these are constant time operations.
If we are allocating memory in variable-sized units, then we need to search the list to
find a suitable block. In general, this process can take an amount of time proportional
to the number of free memory blocks. Depending on whether we choose to keep the list
sorted, adding a new memory block to the free list can also take O(n) time (proportional
to the number of free blocks). To speed the search for particular sized blocks, we often
use more complex data structures. Standard data structures such as binary search trees
and hash tables are among the more commonly used ones.
Using the usual linked list representation, we have a structure that contains the starting
address, the size, and a pointer to the next element in the list. In a typical 32-bit system,
this structure takes 12 bytes. So if the average size of a block is 4096 bytes, the free list
would take about 0.3% of the available free space. However, there’s a classic trick we can
play to reduce this overhead to nothing except for a pointer to the head of the list. This
trick is based on finding some other way to keep track of the starting address, the size,
and the pointers that define the list structure. Because each element of the list represents
free space, we can store the size and pointer to the next one in the free block itself. (The
starting address is implicit.) This technique is illustrated in Example 9.1.

 Free List Structure
Consider a free list with three free blocks, of sizes 3, 8, and 16 KB, where the block of size 16
is first in memory, followed by the blocks of sizes 3 and 8.

 Free List Example
There are several things in this example we should note. First, except for the global
pointer free list, we store everything in the free blocks themselves. The only overhead is
this pointer. Second, if a block is of size 16 KB and the pointers and sizes are stored in 4-
byte integers, then the unused space while the block is in the free list is 16384 − 8 = 16376.
However, the full 16,384 bytes are available to be allocated to a requesting process.
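
A C sketch of this trick follows: a small header (size plus the offset of the next free block) is
written into the first bytes of each free block of a simulated memory array, so the only external
overhead is the head of the list. The offsets reproduce the layout above: the 16 KB block first,
then the 3 KB and 8 KB blocks.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Header stored inside the first bytes of each free block. */
    struct header {
        uint32_t size;   /* size of this free block in bytes      */
        uint32_t next;   /* offset of the next free block, or NIL */
    };

    #define NIL 0xFFFFFFFFu
    static uint8_t memory[27 * 1024];   /* simulated memory: 16K + 3K + 8K */

    static void make_free(uint32_t off, uint32_t size, uint32_t next) {
        struct header h = { size, next };
        memcpy(&memory[off], &h, sizeof h);  /* the header lives in the block */
    }

    int main(void) {
        uint32_t free_list = 0;              /* the only external overhead */
        make_free(0,     16384, 16384);      /* 16 KB block                */
        make_free(16384, 3072,  19456);      /*  3 KB block                */
        make_free(19456, 8192,  NIL);        /*  8 KB block                */

        for (uint32_t off = free_list; off != NIL; ) {
            struct header h;
            memcpy(&h, &memory[off], sizeof h);
            printf("free block at offset %u, size %u\n", off, h.size);
            off = h.next;
        }
        return 0;
    }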

Relocation

In systems with virtual memory, programs in memory must be able to reside in different parts of
the memory at different times. This is because when the program is swapped back into memory
after being swapped out for a while, it cannot always be placed in the same location. The virtual
memory management unit must also deal with concurrency. Memory management in the
operating system should therefore be able to relocate programs in memory and handle memory
references and addresses in the code of the program so that they always point to the right
location in memory.

Protection

Processes should not be able to reference the memory for another process without permission.
This is called memory protection, and prevents malicious or malfunctioning code in one program
from interfering with the operation of other running programs.

Sharing

Even though the memory for different processes is normally protected from each other, different
processes sometimes need to be able to share information and therefore access the same part of
memory. Shared memory is one of the fastest techniques for Inter-process communication.

Logical organization

Programs are often organized in modules. Some of these modules could be shared between
different programs, some are read only and some contain data that can be modified. The memory
management is responsible for handling this logical organization that is different from the
physical linear address space. One way to arrange this organization is segmentation.

2.3 CACHE MEMORY

Cache memory, also called cache, is a supplementary memory system that temporarily stores
frequently used instructions and data for quicker processing by the central processor of a
computer. The cache augments, and is an extension of, a computer's main memory. Both main
memory and cache are internal, random-access memories (RAMs) that use semiconductor-based
transistor circuits. Cache holds a copy of only the most frequently used information or program
codes stored in the main memory; the smaller capacity of the cache reduces the time required to
locate data within it and provide it to the computer for processing. When a computer's central
processor accesses its internal memory, it first checks to see if the information it needs is stored
in the cache. If it is, the cache returns the data to the processor. If the information is not in the
cache, the processor retrieves it from the main memory. Disk cache memory operates similarly,
but the cache is used to hold data that has been recently written on, or retrieved from, a magnetic
disk or other external storage device.

Cache memory is random access memory (RAM) that a computer microprocessor can access
more quickly than it can access regular RAM. As the microprocessor processes data, it looks first
in the cache memory and if it finds the data there (from a previous reading of data), it does not
have to do the more time-consuming reading of data from larger memory. Cache memory is
sometimes described in levels of closeness and accessibility to the microprocessor. An L1 cache
is on the same chip as the microprocessor. (For example, the PowerPC 601 processor has a 32
kilobyte level-1 cache built into its chip.) L2 is usually a separate static RAM (SRAM) chip. The
main RAM is usually a dynamic RAM (DRAM) chip.

2.3.1 Cache entries

Data is transferred between memory and cache in blocks of fixed size, called cache lines. When a
cache line is copied from memory into the cache, a cache entry is created. The cache entry will
include the copied data as well as the requested memory location (now called a tag).

When the processor needs to read or write a location in main memory, it first checks for a
corresponding entry in the cache. The cache checks for the contents of the requested memory
location in any cache lines that might contain that address. If the processor finds that the memory
location is in the cache, a cache hit has occurred. However, if the processor does not find the
memory location in the cache, a cache miss has occurred. In the case of:

 a cache hit, the processor immediately reads or writes the data in the cache line
 a cache miss, the cache allocates a new entry, and copies in data from main memory;
then, the request is fulfilled from the contents of the cache.

Cache performance

The proportion of accesses that result in a cache hit is known as the hit rate, and can be a
measure of the effectiveness of the cache for a given program or algorithm.
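
The lookup just described can be sketched in C with a direct-mapped cache, in which each
address maps to exactly one line and the stored tag decides hit or miss. The cache size, the
one-word line size and the address trace are invented for illustration, and no write policy is
modelled.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define NLINES 8   /* invented cache size for illustration */

    struct line { bool valid; uint32_t tag; };
    static struct line cache[NLINES];

    /* Returns true on a cache hit; on a miss, installs the new tag (the copy
       from main memory is implied). The line size is one word for simplicity. */
    static bool cache_access(uint32_t addr) {
        uint32_t index = addr % NLINES;  /* which line the address maps to    */
        uint32_t tag   = addr / NLINES;  /* identifies the block in that line */
        if (cache[index].valid && cache[index].tag == tag)
            return true;                 /* cache hit                         */
        cache[index].valid = true;       /* cache miss: allocate a new entry  */
        cache[index].tag = tag;
        return false;
    }

    int main(void) {
        uint32_t trace[] = { 0, 1, 2, 0, 1, 2, 8, 0 };  /* 8 collides with 0 */
        int hits = 0, n = (int)(sizeof trace / sizeof trace[0]);
        for (int i = 0; i < n; i++)
            if (cache_access(trace[i]))
                hits++;
        /* Prints 3/8: the second pass over 0,1,2 hits, then 8 evicts 0. */
        printf("hit rate = %d/%d\n", hits, n);
        return 0;
    }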

3.1 16/32/64 Bit Word Size Architectures

When a computer architecture is designed, the choice of a word size is of substantial importance.
There are design considerations which encourage particular bit-group sizes for particular uses
(e.g. for addresses), and these considerations point to different sizes for different uses. However,
considerations of economy in design strongly push for one size, or a very few sizes related by
multiples or fractions (submultiples) to a primary size. That preferred size becomes the word size
of the architecture. Early machine designs included some that used what is often termed a
variable word length. In this type of organization, a numeric operand had no fixed length; rather,
its end was detected when a character with a special marking was encountered.

In computer architecture, 32-bit integers, memory addresses, or other data units are those that are
at most 32 bits (4 octets) wide. Also, 32-bit CPU and ALU architectures are those that are based
on registers, address buses, or data buses of that size. 32-bit is also a term given to a generation
of computers in which 32-bit processors were the norm.
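
The difference between these data-unit widths is directly visible in C through the fixed-width
integer types; the pointer size printed below depends on whether the program is compiled for a
32-bit or a 64-bit target.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* Fixed-width units: 16, 32 and 64 bits on any machine. */
        printf("int16_t: %zu bytes\n", sizeof(int16_t));  /* 2 */
        printf("int32_t: %zu bytes\n", sizeof(int32_t));  /* 4 */
        printf("int64_t: %zu bytes\n", sizeof(int64_t));  /* 8 */

        /* The address width follows the target architecture: typically
           4 bytes on a 32-bit build and 8 bytes on a 64-bit build. */
        printf("pointer: %zu bytes\n", sizeof(void *));
        return 0;
    }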

3.1.1 16-bit file format


A 16-bit file format is a binary file format for which each data element is defined on 16 bits (or 2
bytes). Examples of such formats are UTF-16 and the Windows Metafile Format.

3.1.2 From 32-bit to 64-bit architectures


A change from a 32-bit to a 64-bit architecture is a fundamental alteration, as most operating
systems must be extensively modified to take advantage of the new
architecture. Other software must also be ported to use the new capabilities; older software is
usually supported through either a hardware compatibility mode (in which the
new processors support the older 32-bit version of the instruction set as well as the 64-bit
version), through software emulation, or by the actual implementation of a
32-bit processor core within the 64-bit processor (as with the Itanium processors from Intel,
which include an x86 processor core to run 32-bit x86 applications). The
operating systems for those 64-bit architectures generally support both 32-bit and 64-bit
applications.

3.1.3 64-bit architectures


While 64-bit architectures indisputably make working with large data sets in applications such as
digital video, scientific computing, and large databases easier, there has
been considerable debate as to whether they or their 32-bit compatibility modes will be faster
than comparably-priced 32-bit systems for other tasks. In x86-64 architecture
(AMD64), the majority of the 32-bit operating systems and applications are able to run smoothly
on the 64-bit hardware.

3.1.4 Speed
Speed is not the only factor to consider in a comparison of 32-bit and 64-bit processors.
Applications such as multi-tasking, stress testing, and clustering—for HPC
(high-performance computing)—may be more suited to a 64-bit architecture when deployed
appropriately. 64-bit clusters have been widely deployed in large organizations such as IBM, HP
and Microsoft.

3.2 INSTRUCTION PIPELINING

Instruction pipelining is a method for increasing the throughput of a digital circuit, particularly a
CPU, and implements a form of instruction level parallelism. The idea is to divide the logic into
stages, and to work on different data within each stage.

The concept of pipelines can be extended to various structures of interconnected processing
elements, including those in which data flows from more than one source or to more than one
destination, or may be fed back into an earlier stage. We will limit our attention to linear
sequential pipelines in which all data flows through the stages in the same sequence, and data
remains in the same order in which it originally entered.

Pipelining is most suited for tasks in which essentially the same sequence of steps must be
repeated many times for different data. This is true, for example, in many numerical problems
which systematically process data from arrays. Arithmetic pipelining is used in some specialized
computers discussed elsewhere. One action common to all computers, however, is the systematic
fetch and execute of instructions. This process can be effectively pipelined, and this instruction
pipelining is the subject considered in this section.

3.2.1 Instruction Processing

The first step in applying pipelining techniques to instruction processing is to divide the task into
steps that may be performed with independent hardware. The most obvious division is between the
FETCH cycle (fetch and interpret instructions) and the EXECUTE cycle (access operands and
perform operation). If these two activities are to run simultaneously, they must use independent
registers and processing circuits, including independent access to memory (separate MAR and
MBR).

It is possible to further divide FETCH into fetching and interpreting, but since interpreting is
very fast this is not generally done. To gain the benefits of pipelining it is desirable that each
stage take a comparable amount of time.

A more practical division would split the EXECUTE cycle into three parts: Fetch operands,
perform operation, and store results. A typical pipeline might then have four stages through
which instructions pass, and each stage could be processing a different instruction at the same
time. The result of each stage is passed on to the next stage.
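
This four-stage division can be sketched as a simulation in C: on every clock cycle each
instruction advances one stage, so once the pipe is full, one instruction completes per cycle.
Hazards and stalls are deliberately ignored here; they are the subject of the next section.

    #include <stdio.h>

    #define NSTAGES 4
    #define NINSTR  6
    static const char *stage_name[NSTAGES] =
        { "fetch", "operands", "execute", "store" };

    int main(void) {
        int stage[NSTAGES] = { -1, -1, -1, -1 };  /* instruction in each stage */
        int next = 0, done = 0;

        for (int cycle = 1; done < NINSTR; cycle++) {
            /* On every clock cycle, each instruction moves one stage forward. */
            for (int s = NSTAGES - 1; s > 0; s--)
                stage[s] = stage[s - 1];
            stage[0] = (next < NINSTR) ? next++ : -1;  /* fetch the next one */

            printf("cycle %d:", cycle);
            for (int s = 0; s < NSTAGES; s++)
                if (stage[s] >= 0)
                    printf("  I%d:%s", stage[s], stage_name[s]);
            printf("\n");

            if (stage[NSTAGES - 1] >= 0)
                done++;   /* the instruction in the last stage completes */
        }
        /* 6 instructions finish in 4 + 6 - 1 = 9 cycles rather than 24. */
        return 0;
    }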

3.2.2 Problems in Instruction Pipelining

Several difficulties prevent instruction pipelining from being as simple as the above description
suggests. The principal problems are:

 Timing variations:

Not all stages take the same amount of time. This means that the speed gain of a pipeline will be
determined by its slowest stage. This problem is particularly acute in instruction processing,
since different instructions have different operand requirements and sometimes vastly different
processing time. Moreover, synchronization mechanisms are required to ensure that data is
passed from stage to stage only when both stages are ready.

 Data Hazards

When several instructions are in partial execution, a problem arises if they reference the same
data. We must ensure that a later instruction does not attempt to access data sooner than a
preceding instruction, if this will lead to incorrect results. For example, instruction N+1 must not
be permitted to fetch an operand that is yet to be stored into by instruction N.

 Branching

In order to fetch the "next" instruction, we must know which one is required. If the present
instruction is a conditional branch, the next instruction may not be known until the current one is
processed.

 Interrupts

Interrupts insert unplanned "extra" instructions into the instruction stream. The interrupt must
take effect between instructions, that is, when one instruction has completed and the next has not
yet begun. With pipelining, the next instruction has usually begun before the current one has
completed. All of these problems must be solved in the context of our need for high speed
performance. If we cannot achieve sufficient speed gain, pipelining may not be worth the cost.

3.2.3 Pipelines Solutions

 Timing Variations

To maximize the speed gain, stages must first be chosen to be as uniform as possible in timing
requirements. However, a timing mechanism is needed. A synchronous method could be used, in
which a stage is assumed to be complete in a definite number of clock cycles. However,
asynchronous techniques are generally more efficient. A flag bit or signal line is passed forward
to the next stage indicating when valid data is available. A signal must also be passed back from
the next stage when the data has been accepted. In all cases there must be a buffer register
between stages to hold the data; sometimes this buffer is expanded to a memory which can hold
several data items. Each stage must take care not to accept input data until it is valid, and not to
produce output data until there is room in its output buffer.
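
A rough C sketch of this handshake follows; the names are invented, and real hardware would implement it with latches and control signals rather than function calls. The buffer's valid flag is the signal passed forward, and clearing it is the "accepted" signal passed back.

#include <stdbool.h>
#include <stdio.h>

/* One-entry buffer register between two pipeline stages. */
struct stage_buffer {
    int  data;
    bool valid;   /* set by the producer stage when the data is usable */
};

/* Producer stage: writes only when the buffer has room.
   Returns true if the transfer happened this cycle. */
bool stage_send(struct stage_buffer *buf, int data) {
    if (buf->valid)
        return false;        /* downstream has not accepted yet: stall */
    buf->data = data;
    buf->valid = true;
    return true;
}

/* Consumer stage: reads only when valid data is present, then frees
   the buffer, which acts as the acceptance signal to the producer. */
bool stage_receive(struct stage_buffer *buf, int *out) {
    if (!buf->valid)
        return false;        /* no valid input yet: wait */
    *out = buf->data;
    buf->valid = false;
    return true;
}

int main(void) {
    struct stage_buffer buf = { 0, false };
    int value;
    stage_send(&buf, 42);               /* stage N hands data forward  */
    stage_send(&buf, 43);               /* rejected: buffer still full */
    if (stage_receive(&buf, &value))    /* stage N+1 accepts it        */
        printf("stage N+1 received %d\n", value);
    return 0;
}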

 Data Hazards

To guard against data hazards it is necessary for each stage to be aware of the operands in use by
stages further down the pipeline. The type of use must also be known, since two successive reads
do not conflict and should not be cause to slow the pipeline. Only when writing is involved is
there a possible conflict.

The pipeline is typically equipped with a small associative check memory which can store the
address and operation type (read or write) for each instruction currently in the pipe. The concept
of "address" must be extended to identify registers as well. Each instruction can affect only a
small number of operands, but indirect effects of addressing must not be neglected. As each
instruction prepares to enter the pipe, its operand addresses are compared with those already
stored. If there is a conflict, the instruction (and usually those behind it) must wait. When there is
no conflict, the instruction enters the pipe and its operand addresses are stored in the check
memory. When the instruction completes, these addresses are removed. The memory must be
associative to handle the high-speed lookups required.
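
The following C sketch suggests how such a check works, in software terms. The table size and function names are invented, and a real check memory would perform all of the comparisons in parallel rather than in a loop.

#include <stdbool.h>
#include <stddef.h>

#define MAX_IN_FLIGHT 8

/* One operand reference of an instruction currently in the pipe. */
struct operand_ref {
    unsigned addr;      /* memory address or register number */
    bool     is_write;
    bool     in_use;
};

static struct operand_ref check_mem[MAX_IN_FLIGHT];

/* Two references conflict unless both are reads. */
static bool conflicts(const struct operand_ref *r, unsigned addr, bool is_write) {
    return r->in_use && r->addr == addr && (r->is_write || is_write);
}

/* Returns true if an instruction touching (addr, is_write) must wait. */
bool must_stall(unsigned addr, bool is_write) {
    for (size_t i = 0; i < MAX_IN_FLIGHT; i++)
        if (conflicts(&check_mem[i], addr, is_write))
            return true;
    return false;
}

/* Record an operand reference when the instruction enters the pipe;
   the entry is cleared (in_use = false) when the instruction completes. */
bool record_ref(unsigned addr, bool is_write) {
    for (size_t i = 0; i < MAX_IN_FLIGHT; i++) {
        if (!check_mem[i].in_use) {
            check_mem[i] = (struct operand_ref){ addr, is_write, true };
            return true;
        }
    }
    return false;   /* check memory full: the instruction must wait */
}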

 Branching

The problem in branching is that the pipeline may be slowed down by a branch instruction
because we do not know which branch to follow. In the absence of any special help in this area,
it would be necessary to delay processing of further instructions until the branch destination is
resolved. Since branches are extremely frequent, this delay would be unacceptable.

One solution which is widely used, especially in RISC architectures, is deferred branching. In
this method, the instruction set is designed so that after a conditional branch instruction, the next
instruction in sequence is always executed, and then the branch is taken. Thus every branch must
be followed by one instruction which logically precedes it and is to be executed in all cases. This
gives the pipeline some breathing room. If necessary this instruction can be a no-op, but frequent
use of no-ops would destroy the speed benefit. Use of this technique requires a coding method
which is confusing for programmers but not too difficult for compiler code generators.

 Interrupts

The fastest but most costly solution to the interrupt problem would be to include as part of the
saved "hardware state" of the CPU the complete contents of the pipeline, so that all instructions
may be restored to their original state in the pipeline. This strategy is too expensive in other ways
and is not practical. The simplest solution is to wait until all instructions in the pipeline complete,
that is, flush the pipeline from the starting point, before admitting the interrupt sequence. If

interrupts are frequent, this would greatly slow down the pipeline; moreover, critical interrupts
would be delayed.

3.2.4 Pipelining Developments


In order to make processors even faster, various methods of optimizing pipelines have been
devised.

Superpipelining refers to dividing the pipeline into more steps. The more pipe stages there are,
the faster the pipeline is because each stage is then shorter. Ideally, a pipeline with five stages
should be five times faster than a non-pipelined processor (or rather, a pipeline with one stage).
The instructions are executed at the speed at which each stage is completed, and each stage takes
one fifth of the amount of time that the non-pipelined instruction takes. Thus, a processor with an
8-step pipeline (the MIPS R4000) will be even faster than its 5-step counterpart. The MIPS
R4000 chops its pipeline into more pieces by dividing some steps into two. Instruction fetching,
for example, is now done in two stages rather than one. The stages are as shown:

1. Instruction Fetch (First Half)
2. Instruction Fetch (Second Half)
3. Register Fetch
4. Instruction Execute
5. Data Cache Access (First Half)
6. Data Cache Access (Second Half)
7. Tag Check
8. Write Back
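
The ideal gain from adding stages can be stated precisely. If a non-pipelined processor needs time T per instruction and the pipeline splits the work into k equal stages of length T/k, then n instructions occupy the pipeline for (k + n - 1) stage-times, giving a speedup of

S(n) = \frac{n\,T}{(k + n - 1)\,T/k} = \frac{n\,k}{k + n - 1}, \qquad \lim_{n \to \infty} S(n) = k

which approaches k for long instruction streams. In practice, unequal stage lengths and the buffer registers between stages keep the achieved speedup below this ideal, which is why dividing the pipeline ever more finely eventually stops paying off.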

Superscalar pipelining involves multiple pipelines in parallel. Internal components of the
processor are replicated so it can launch multiple instructions in some or all of its pipeline stages.
The RISC System/6000 has a forked pipeline with different paths for floating-point and integer
instructions. If there is a mixture of both types in a program, the processor can keep both forks
running simultaneously. Both types of instructions share two initial stages (Instruction Fetch and
Instruction Dispatch) before they fork. Often, however, superscalar pipelining refers to multiple
copies of all pipeline stages (in terms of the classic laundry analogy for pipelining, this would
mean four washers, four dryers, and four people who fold clothes). Many of today's machines
attempt to find two to six instructions that they can execute in every pipeline stage. If some of the
instructions are dependent, however, only the first instruction or instructions are issued.

Dynamic pipelines have the capability to schedule around stalls. A dynamic pipeline is divided
into three units: the instruction fetch and decode unit, five to ten execute or functional units, and
a commit unit. Each execute unit has reservation stations, which act as buffers and hold the
operands and operations.

Diagram of a Pipelining process

While the functional units have the freedom to execute out of order, the instruction fetch/decode
and commit units must operate in-order to maintain simple pipeline behavior. When the
instruction is executed and the result is calculated, the commit unit decides when it is safe to
store the result. If a stall occurs, the processor can schedule other instructions to be executed
until the stall is resolved. This, coupled with the efficiency of multiple units executing
instructions simultaneously, makes a dynamic pipeline an attractive alternative.

3.3 Reduced Instruction Set Computing (RISC) Pipelines

Reduced Instruction Set Computing (RISC), is a microprocessor CPU design philosophy that
favors a smaller and simpler set of instructions that all take about the same amount of time to
execute. A RISC processor pipeline operates in much the same way, although the stages in the
pipeline are different. While different processors have different numbers of steps, they are
basically variations of these five, used in the MIPS R3000 processor:

1. fetch instructions from memory
2. read registers and decode the instruction
3. execute the instruction or calculate an address
4. access an operand in data memory
5. write the result into a register

In the classic laundry analogy of pipelining, although the washer finishes in half an hour, the
dryer takes an extra ten minutes, and thus the wet clothes must wait ten minutes for the dryer to
free up. The speed of a pipeline is therefore determined by the length of its longest stage.
Because RISC instructions are simpler than those used in pre-RISC processors

(now called CISC, or Complex Instruction Set Computer), they are more conducive to
pipelining. While CISC instructions varied in length, RISC instructions are all the same length
and can be fetched in a single operation. Ideally, each of the stages in a RISC processor pipeline
should take 1 clock cycle so that the processor finishes an instruction each clock cycle and
averages one cycle per instruction (CPI).
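
This relationship is commonly summarized by the standard performance equation

\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}

A pipelined RISC design attacks the CPI term, driving it toward 1, while superpipelining attacks the clock cycle time; stalls caused by hazards and branches are what push the effective CPI back above 1.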

3.3.1 Harvard Architecture

The term Harvard architecture originally referred to computer architectures that used physically
separate storage and signal pathways for their instructions and data (in contrast to the von
Neumann architecture). In a computer with a von Neumann architecture, the CPU can be either
reading an instruction or reading/writing data from/to the memory. Both cannot occur at the
same time since the instructions and data use the same signal pathways and memory. In a
computer with Harvard architecture, the CPU can read both an instruction and data from memory
at the same time. A computer with Harvard architecture can be faster because it is able to fetch
the next instruction at the same time it completes the current instruction. Speed is gained at the
expense of more complex electrical circuitry.

4.1 INSTRUCTIONS, OPCODES AND OPERANDS

Instructions are operations performed by the CPU. Operands are entities operated upon by the
instruction. Addresses are the locations in memory of specified data.

4.1.1 Instructions

An Instruction is a statement that is executed at runtime. An x86 instruction statement can
consist of four parts:

 Label
 Instruction
 Operands
 Comment

The terms Instruction and Mnemonic are used interchangeably in this document to refer to the
names of x86 instructions. Although the term Opcode is sometimes used as a synonym for
Instruction, this document reserves the term Opcode for the hexadecimal representation of the
instruction value.

For most instructions, the Solaris x86 assembler mnemonics are the same as the Intel or AMD
mnemonics. However, the Solaris x86 mnemonics might appear to be different because the
Solaris mnemonics are suffixed with a one-character modifier that specifies the size of the
instruction operands. That is, the Solaris assembler derives its operand type information from the
instruction name and the suffix. If a mnemonic is specified with no type suffix, the operand type
defaults to long.

Assembly language, or just assembly, is a low-level programming language, which uses
mnemonics, instructions and operands to represent machine code. This enhances readability
while still giving precise control over the machine instructions. Most programming is currently
done using high-level programming languages, which are typically easier to read and write. These
languages need to be compiled (translated into assembly or machine language), or run by other
compiled programs such as interpreters.

4.1.2 Opcodes

In computer science, an Opcode (operation code) is the portion of a machine language
instruction that specifies the operation to be performed. The opcode is a short code which
indicates what operation is expected to be performed. Each operation has a unique opcode. The
operand, or operands, indicate where the data required for the operation can be found and how it
can be accessed (the addressing mode, which is discussed in full later). The length of a machine
instruction can vary; common lengths range from one to twelve bytes.

Opcodes are also given mnemonics (short names) so that they can be easily referred to in code
listings and similar documentation. For example, an instruction to store the contents of the
accumulator in a given memory address could be given the binary opcode 000001, which may
then be referred to using the mnemonic STA (short for STore Accumulator). Such mnemonics
will be used for the examples on upcoming pages.

4.1.3 Operand

In computers, an Operand is the part of a computer instruction that specifies data that is to be
operated on or manipulated and, by extension, the data itself. Basically, a computer instruction
describes an operation (add, subtract, and so forth) and the operand or operands on which the
operation is to be performed.

4.1.4 General format of Opcodes and Operands

An Opcode is an identifier that starts with a letter character and may be followed by up to
fourteen more characters. Each additional character may be a letter or a digit or the underscore
character. Traditionally, no uppercase letters are used in opcode names that are to be used by
more than one program.

An Operand is either a set of contiguous non-white space printing characters or a string. A string
is a set of contiguous printing characters delimited by a quote (ASCII code: 34 decimal, 0x22
hexadecimal) character at each end. A string value must have less than 256 bytes of data. If at
least one operand is present in an operation, there is a single space between the opcode and the
first operand. If more than one operand is present in an operation, there is a single blank
character between every two adjacent operands. If there are no operands, a semicolon character
is appended to the opcode to mark the end of the operation. If any operands appear, the last
operand has an appended semicolon that marks the end of the operation.

The exact format of the machine codes is again CPU dependent. For the purpose of these notes,
we will presume we are using a 24-bit CPU. This means that the minimum length of the machine
codes used here should be 24 binary bits, which in this instance are split as shown in the table
below:

Opcode      6 bits  (bits 18-23)  - allows for 64 unique opcodes (2^6)
Operand(s)  18 bits (bits 0-17)   - 16 bits (bits 0-15) for address values
                                  - 2 bits (bits 16-17) for specifying the
                                    addressing mode to be used
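
Under this 24-bit layout, decoding an instruction word is simply a matter of shifting and masking, as the following C sketch shows. The macro names and the example word are invented for illustration; the opcode value 000001 reuses the STA example given earlier.

#include <stdio.h>
#include <stdint.h>

/* Field layout of the 24-bit instruction word described above. */
#define OPCODE_SHIFT 18
#define OPCODE_MASK  0x3F      /* 6 bits: 64 possible opcodes      */
#define MODE_SHIFT   16
#define MODE_MASK    0x3       /* 2 bits: 4 addressing modes       */
#define ADDR_MASK    0xFFFF    /* 16 bits: the address value       */

int main(void) {
    /* Example word: opcode 000001 (STA), mode 01, address 0x0100. */
    uint32_t instr = 0x050100;

    unsigned opcode = (instr >> OPCODE_SHIFT) & OPCODE_MASK;
    unsigned mode   = (instr >> MODE_SHIFT)   & MODE_MASK;
    unsigned addr   =  instr                  & ADDR_MASK;

    printf("opcode = %u, addressing mode = %u, address = 0x%04X\n",
           opcode, mode, addr);
    return 0;
}

Running this prints opcode = 1, addressing mode = 1, address = 0x0100.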

Operands can be immediate (that is, constant expressions that evaluate to an inline value),
register (a value in the processor number registers), or memory (a value stored in memory). An
indirect operand contains the address of the actual operand value. Indirect operands are specified
by prefixing the operand with an asterisk (*) (ASCII 0x2A). Only jump and call instructions can
use indirect operands.

4.2 BASIC OPERAND TYPES


 Register Operands. An operand that refers to a register.

MOV AX, BX ; moves contents of register BX to register AX

 Immediate Operands. A constant is an immediate operand.

MOV AX, 5 ; moves the numeric value 5 into AX
MOV AX, @DATA ; moves the constant represented by @DATA into AX

 Direct Operands. A variable name that represents a memory address is a direct (memory) operand.

MOV BX, NUM ; moves the contents of the memory variable NUM into BX

 Indirect Operands. An indirect operand is a register that contains the (offset) memory address, which means the register is acting as a pointer to a variable (or procedure).

16-bit registers used for indirect addressing: SI, DI, BX, and BP.

32-bit registers used for indirect addressing: any of the general-purpose 32-bit registers may be used (for a .386 or higher processor).

4.3 INSTRUCTION CYCLE

An Instruction cycle (sometimes called fetch-and-execute cycle, fetch-decode-execute cycle,
or FDX) is the basic operation cycle of a computer. It is the process by which a computer
retrieves a program instruction from its memory, determines what actions the instruction
requires, and carries out those actions. This cycle is repeated continuously by the central
processing unit (CPU), from bootup to when the computer is shut down.

Diagram of the Fetch Execute Cycle.

4.3.1 Fetch Decode And Execute Cycle

The fetch execute cycle is the time period of which the computer reads and processes the
instructions from the memory, and executes them. This process is a continuous cycle which is
used until the computer is turned off or there are no more instructions to process.

4.3.2 Fetch Cycle

The fetch cycle reads the instruction at the address held in the program counter, stores that
instruction in the instruction register, and moves the program counter on one so that it points to
the next instruction.

The fetch part of the cycle starts with the instruction being collected from memory (or from the
cache, if it is already held closer to the processor). The control unit knows which instruction to
retrieve, and in which order, because every instruction has a unique memory address and the
address of the next one is always held in the program counter (similar to how a computer has a
unique IP address on a network).

4.3.3 Decode Cycle

Here, the control unit checks the instruction that is now stored within the instruction register. It
determines which opcode and addressing mode have been used, and as such what actions need to
be carried out in order to execute the instruction in question.

4.3.4 Execute

The actual actions which occur during the execute cycle of an instruction depend on both the
instruction itself, and the addressing mode specified to be used to access the data that may be
required. However, four main groups of actions do exist, which are discussed in full later on.

After the correct instructions have been fetched the CPU will then interpret what the instruction
is telling it to do then it will simply execute the instruction and the whole process will begin
again until there are no more instructions or the computer is turned off. Once a program is in
memory it has to be executed. To do this, each instruction must be looked at, decoded and acted
upon in turn until the program is completed. This is achieved by the use of what is termed the
'instruction execution cycle', which is the cycle by which each instruction in turn is processed.
However, to ensure that the execution proceeds smoothly, it is also necessary to synchronise
the activities of the processor.

4.3.5 Circuits Used

The circuits used in the CPU during the cycle are:

 Program counter (PC) - an incrementing counter that keeps track of the memory
address of the instruction that is to be executed next.
 Memory address register (MAR) - holds the address of a memory block to be read from
or written to.
 Memory data register (MDR) - a two-way register that holds data fetched from memory
(and ready for the CPU to process) or data waiting to be stored in memory
 Instruction register (IR) - a temporary holding ground for the instruction that has just
been fetched from memory

 Control unit (CU) - decodes the program instruction in the IR, selecting machine
resources such as a data source register and a particular arithmetic operation, and
coordinates activation of those resources
 Arithmetic logic unit (ALU) - performs mathematical and logical operations
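
As a minimal illustration of how these registers cooperate, the following C sketch simulates the fetch-decode-execute cycle on a toy machine. The instruction encoding, the opcodes and the memory contents are all invented for illustration.

#include <stdio.h>
#include <stdint.h>

/* Toy encoding: 8-bit opcode in the high byte, 8-bit operand address in the low. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

int main(void) {
    uint16_t memory[16] = {
        (OP_LOAD << 8) | 10,   /* 0: ACC = memory[10]  */
        (OP_ADD  << 8) | 11,   /* 1: ACC += memory[11] */
        (OP_HALT << 8),        /* 2: stop              */
        [10] = 7, [11] = 35    /* data operands        */
    };
    uint16_t pc = 0, mar, mdr, ir, acc = 0;

    for (;;) {
        /* FETCH: read the instruction the PC points at, advance the PC. */
        mar = pc;
        mdr = memory[mar];
        ir  = mdr;
        pc++;

        /* DECODE: the control unit splits the IR into opcode and address. */
        uint8_t opcode  = ir >> 8;
        uint8_t operand = ir & 0xFF;

        /* EXECUTE: fetch the operand and let the ALU do the arithmetic. */
        if (opcode == OP_HALT) break;
        mar = operand;
        mdr = memory[mar];
        if      (opcode == OP_LOAD) acc = mdr;
        else if (opcode == OP_ADD)  acc += mdr;
    }
    printf("ACC = %u\n", acc);   /* prints ACC = 42 */
    return 0;
}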

4.4 ADDRESSING MODES

Most Assembly language instructions require operands to be processed. An operand address
provides the location where the data to be processed is stored. Some instructions do not require
an operand, whereas some other instructions may require one, two or three operands.

When an instruction requires two operands, the first operand is generally the destination, which
contains data in a register or memory location and the second operand is the source. Source
contains either the data to be delivered (immediate addressing) or the address (in register or
memory) of the data. Generally, the source data remains unaltered after the operation.

The three basic modes of addressing are:

 Register addressing
 Immediate addressing
 Memory addressing

4.4.1 Register Addressing

A register operand is one of the eight general- and special-purpose 16-bit registers (AX, BX,
CX, DX, SI, DI, BP and SP), one of the eight general-purpose 8-bit registers (AL, AH, BL, BH,
CL, CH, DL and DH), or one of the four segment registers. The contents of the register are used
and/or modified by the operation. For example, in the instruction MOV AL, 13 (used again in the
next subsection), the destination operand is the low byte of the accumulator, AL; the effect of the
instruction is to store the binary number 00001101 into the bottom eight bits of AX (leaving the
other bits unchanged).

In this addressing mode, a register contains the operand. Depending upon the instruction, the
register may be the first operand, the second operand or both.

For example,

MOV DX, TAX_RATE ; Register in first operand
MOV COUNT, CX ; Register in second operand
MOV EAX, EBX ; Both the operands are in registers

4.4.2 Immediate Addressing

An immediate operand is just a number (or a label, which the assembler converts to the
corresponding address). An immediate operand is used to specify a constant for one of the
arithmetic or logical operations, or to give the jump address for a branching instruction. Most

assemblers, including NASM, allow simple arithmetic expressions when computing immediate
operands. For example, all of the following are equivalent:

MOV AL, 13
MOV AL, 0xD
MOV AL, 0Ah + 3 ; Note leading 0 to distinguish from register AH
MOV AL, George * 2 - 1

assuming that the label George is associated with the address 7.

An immediate operand has a constant value or an expression. When an instruction with two
operands uses immediate addressing, the first operand may be a register or memory location, and
the second operand is an immediate constant. The first operand defines the length of the data.

For example,

BYTE_VALUE DB 150 ; A byte value is defined
WORD_VALUE DW 300 ; A word value is defined
ADD BYTE_VALUE, 65 ; An immediate operand 65 is added
MOV AX, 45H ; Immediate constant 45H is transferred to AX

4.4.3 Direct Memory Addressing

A memory operand gives the address of a location in main memory to use in the operation. The
NASM syntax for this is very simple: put the address in square brackets. The address can be
given as an arithmetic expression involving constants and labels (the displacement), plus an
optional base or index register. Here are some examples:

MOV DX, [1234h]
ADD DX, [BX + 8]
MOV [BP + SI], DL
INC BYTE [0x100 + CS:DI]

When operands are specified in memory addressing mode, direct access to main memory,
usually to the data segment, is required. This way of addressing results in slower processing of
data. To locate the exact location of data in memory, we need the segment start address, which is
typically found in the DS register and an offset value. This offset value is also called effective
address.

In direct addressing mode, the offset value is specified directly as part of the instruction, usually
indicated by the variable name. The assembler calculates the offset value and maintains a symbol
table, which stores the offset values of all the variables used in the program.

In direct memory addressing, one of the operands refers to a memory location and the other
operand references a register.

For example,

ADD BYTE_VALUE, DL ; Adds the register contents to the memory location
MOV BX, WORD_VALUE ; Operand from the memory is moved to the register

Other types of Addressing modes include the following:

 Direct-Offset Addressing

This addressing mode uses the arithmetic operators to modify an address. For example, look at
the following definitions that define tables of data:

BYTE_TABLE DB 14, 15, 22, 45 ; Tables of bytes
WORD_TABLE DW 134, 345, 564, 123 ; Tables of words

The following operations access data from the tables in the memory into registers:

MOV CL, BYTE_TABLE[2] ; Gets the 3rd element of the BYTE_TABLE
MOV CL, BYTE_TABLE + 2 ; Gets the 3rd element of the BYTE_TABLE
MOV CX, WORD_TABLE[6] ; Gets the 4th element of the WORD_TABLE
MOV CX, WORD_TABLE + 6 ; Gets the 4th element (offsets count bytes; each word is 2 bytes)

 Indirect Memory Addressing

This addressing mode utilizes the computer's capability for Segment:Offset addressing. Generally,
the base registers EBX, EBP (or BX, BP) and the index registers (DI, SI), coded within square
brackets for memory references, are used for this purpose.

Indirect addressing is generally used for variables containing several elements, like arrays. The
starting address of the array is stored in, say, the EBX register.

The following code snippet shows how to access different elements of the variable.

MY_TABLE TIMES 10 DW 0 ; Allocates 10 words (2 bytes each), each initialized to 0

MOV EBX, MY_TABLE ; Effective address of MY_TABLE in EBX
MOV WORD [EBX], 110 ; MY_TABLE[0] = 110
ADD EBX, 2 ; EBX = EBX + 2
MOV WORD [EBX], 123 ; MY_TABLE[1] = 123
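
For comparison, a rough C analogue of the same access pattern, with an ordinary pointer playing the role of EBX:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t my_table[10] = {0};  /* like MY_TABLE TIMES 10 DW 0       */
    uint16_t *p = my_table;       /* like MOV EBX, MY_TABLE            */

    *p = 110;                     /* like MOV WORD [EBX], 110          */
    p++;                          /* like ADD EBX, 2 (one word onward) */
    *p = 123;                     /* like MOV WORD [EBX], 123          */

    printf("%u %u\n", my_table[0], my_table[1]);  /* prints 110 123 */
    return 0;
}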

5.1 INTERRUPT

Interrupt is a signal to the processor emitted by hardware or software indicating an event that
needs immediate attention. An interrupt alerts the processor to a high-priority condition requiring
the interruption of the current code the processor is executing, the current thread. The processor
responds by suspending its current activities, saving its state, and executing a small program
called an interrupt handler (or interrupt service routine, ISR) to deal with the event. This
interruption is temporary, and after the interrupt handler finishes, the processor resumes
execution of the previous thread. Put another way, an interrupt is a signal, from a device attached
to a computer or from a program within the computer, that causes the main program that operates
the computer (the operating system) to stop and work out what to do next.

Basically, a single computer can perform only one computer instruction at a time. But, because it
can be interrupted, it can take turns running the programs or sets of instructions that it performs.
This is known as multitasking. It allows the user to do a number of different things at the same
time. The computer simply takes turns managing the programs that the user starts. Of course, the
computer operates at speeds that make it seem as though all of the user's tasks are being
performed at the same time. (The computer's operating system is good at using little pauses in
operations and user think time to work on other programs.) An operating system usually has
some code that is called an interrupt handler. The interrupt handler prioritizes the interrupts and
saves them in a queue if more than one is waiting to be handled. The operating system has
another little program, sometimes called a scheduler, that works out which program to give
control to next.

Diagram of Interrupt sources and processor handling

5.1.1 TYPES OF INTERRUPTS

 Hardware interrupt: is an electronic alerting signal sent to the processor from an
external device, either a part of the computer itself such as a disk controller or an external
peripheral. For example, pressing a key on the keyboard or moving the mouse triggers
hardware interrupts that cause the processor to read the keystroke or mouse position.
Unlike the software type (below), hardware interrupts are asynchronous and can occur in
the middle of instruction execution, requiring additional care in programming. The act of
initiating a hardware interrupt is referred to as an interrupt request (IRQ).
 Software interrupt: is caused either by an exceptional condition in the processor itself,
or a special instruction in the instruction set which causes an interrupt when it is
executed. The former is often called a trap or exception and is used for errors or events

occurring during program execution that are exceptional enough that they cannot be
handled within the program itself. For example, if the processor's arithmetic logic unit is
commanded to divide a number by zero, this impossible demand will cause a divide-by-
zero exception, perhaps causing the computer to abandon the calculation or display an
error message. Software interrupt instructions function similarly to subroutine calls and
are used for a variety of purposes, such as to request services from low level system
software such as device drivers. For example, computers often use software interrupt
instructions to communicate with the disk controller to request data be read or written to
the disk.
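
At the operating-system level, much the same pattern appears in C's signal handling, which can serve as a software analogue of an interrupt service routine. A minimal sketch in standard C (the handler simply records the event, as a well-behaved ISR should, and the main program reacts later):

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t got_interrupt = 0;

/* The "interrupt service routine": kept minimal, it only records that
   the event happened; the interrupted program notices it afterwards. */
static void on_interrupt(int signum) {
    (void)signum;
    got_interrupt = 1;
}

int main(void) {
    signal(SIGINT, on_interrupt);   /* register the handler for Ctrl-C */

    while (!got_interrupt) {
        /* normal work would go here */
    }
    puts("interrupt received, exiting cleanly");
    return 0;
}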

An interrupt that leaves the machine in a well-defined state is called a precise interrupt. Such
an interrupt has four properties:

 The Program Counter (PC) is saved in a known place.
 All instructions before the one pointed to by the PC have fully executed.
 No instruction beyond the one pointed to by the PC has been executed (instructions
beyond that point may have begun execution, but any changes they make to registers or
memory must be undone before the interrupt is taken).
 The execution state of the instruction pointed to by the PC is known.

An interrupt that does not meet these requirements is called an imprecise interrupt.

The phenomenon where the overall system performance is severely hindered by excessive
amounts of processing time spent handling interrupts is called an interrupt storm.

5.1.2 Other Types of Interrupts

 Level-triggered: A level-triggered interrupt is an interrupt signalled by
maintaining the interrupt line at a high or low level. A device wishing to signal a Level-
triggered interrupt drives the interrupt request line to its active level (high or low), and
then holds it at that level until it is serviced. It ceases asserting the line when the CPU
commands it to or otherwise handles the condition that caused it to signal the interrupt.
 Edge-triggered: An edge-triggered interrupt is an interrupt signalled by a level
transition on the interrupt line, either a falling edge (high to low) or a rising edge (low to
high). A device, wishing to signal an interrupt, drives a pulse onto the line and then
releases the line to its inactive state. If the pulse is too short to be detected by polled I/O
then special hardware may be required to detect the edge.
 Hybrid: Some systems use a hybrid of level-triggered and edge-triggered signalling.
The hardware not only looks for an edge, but it also verifies that the interrupt signal stays
active for a certain period of time. A common use of a hybrid interrupt is for the NMI
(non-maskable interrupt) input. Because NMIs generally signal major – or even
catastrophic – system events, a good implementation of this signal tries to ensure that the
interrupt is valid by verifying that it remains active for a period of time. This 2-step
approach helps to eliminate false interrupts from affecting the system.

 Message-signaled: A message-signalled interrupt does not use a physical interrupt
line. Instead, a device signals its request for service by sending a short message over
some communications medium, typically a computer bus. The message might be of a
type reserved for interrupts, or it might be of some pre-existing type such as a memory
write. Message-signalled interrupts behave very much like edge-triggered interrupts, in
that the interrupt is a momentary signal rather than a continuous condition. Interrupt-
handling software treats the two in much the same manner. Typically, multiple pending
message-signalled interrupts with the same message (the same virtual interrupt line) are
allowed to merge, just as closely spaced edge-triggered interrupts can merge.

Performance issues

Interrupts provide low overhead and good latency at low load, but degrade significantly at high
interrupt rate unless care is taken to prevent several pathologies. These are various forms of
livelocks, when the system spends all of its time processing interrupts to the exclusion of other
required tasks. Under extreme conditions, a large number of interrupts (like very high network
traffic) may completely stall the system. To avoid such problems, an operating system must
schedule network interrupt handling as carefully as it schedules process execution.[2]

5.1.3 Typical uses of interrupts

Typical uses of interrupts include the following: system timers, disk I/O, power-off signals, and
traps. Other interrupts exist to transfer data bytes using UARTs or Ethernet; sense key-presses;
control motors; or anything else the equipment must do.

 One typical use is to generate interrupts periodically by dividing the output of a crystal
oscillator and having an interrupt handler count the interrupts in order to keep time.
These periodic interrupts are often used by the OS's task scheduler to reschedule the
priorities of running processes. Some older computers generated periodic interrupts from
the power line frequency because it was controlled by the utilities to eliminate long-term
drift of electric clocks.
 A disk interrupt signals the completion of a data transfer from or to the disk peripheral. A
process waiting to read or write a file starts up again.
 A power-off interrupt predicts or requests a loss of power. It allows the computer
equipment to perform an orderly shut-down.
 Interrupts are also used in typeahead features for buffering events like keystrokes.

5.2 BRANCHING INSTRUCTIONS

Branch instructions are those that tell the processor to make a decision about what the next
instruction to be executed should be based on the results of another instruction. Branch
instructions can be troublesome in a pipeline if a branch is conditional on the results of an
instruction which has not yet finished its path through the pipeline.

For example:

Loop: add $r3, $r2, $r1
      sub $r6, $r5, $r4
      beq $r3, $r6, Loop

The example above instructs the processor to add r1 and r2 and put the result in r3, then subtract
r4 from r5, storing the difference in r6. In the third instruction, beq stands for branch if equal. If
the contents of r3 and r6 are equal, the processor should execute the instruction labeled "Loop."
Otherwise, it should continue to the next instruction. In this example, the processor cannot make
a decision about which branch to take because neither the value of r3 nor that of r6 has been
written into the registers yet.

The processor could stall, but a more sophisticated method of dealing with branch instructions is
branch prediction. The processor makes a guess about which path to take - if the guess is wrong,
anything written into the registers must be cleared, and the pipeline must be started again with
the correct instruction. Some methods of branch prediction depend on stereotypical behavior.
Branches pointing backward are taken about 90% of the time since backward-pointing branches
are often found at the bottom of loops. On the other hand, branches pointing forward are only
taken approximately 50% of the time. Thus, it would be logical for processors to always follow
the branch when it points backward, but not when it points forward. Other methods of branch
prediction are less static: processors that use dynamic prediction keep a history for each branch
and use it to predict future branches. These processors are correct in their predictions roughly
90% of the time.
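
One common dynamic scheme, sketched here as an illustration rather than as any particular processor's design, is the two-bit saturating counter: each branch has a small counter that moves toward "taken" or "not taken" and only flips its prediction after two consecutive mispredictions. The table size and the hash of the branch address are invented.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 256

/* Counter states: 0 and 1 predict not-taken; 2 and 3 predict taken. */
static uint8_t counters[TABLE_SIZE];   /* all start at 0: predict not-taken */

static unsigned slot(uint32_t branch_addr) {
    return (branch_addr >> 2) % TABLE_SIZE;   /* crude hash of the address */
}

bool predict(uint32_t branch_addr) {
    return counters[slot(branch_addr)] >= 2;
}

/* After the branch resolves, nudge the counter toward the real outcome. */
void update(uint32_t branch_addr, bool taken) {
    uint8_t *c = &counters[slot(branch_addr)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

int main(void) {
    uint32_t loop_branch = 0x1000;
    /* A backward loop branch taken 9 times and then falling through,
       as at the bottom of a loop: the predictor misses twice, learns,
       then predicts correctly for the rest of the run. */
    for (int i = 0; i < 10; i++) {
        bool taken = (i < 9);
        printf("prediction %-9s actual %s\n",
               predict(loop_branch) ? "taken," : "not taken,",
               taken ? "taken" : "not taken");
        update(loop_branch, taken);
    }
    return 0;
}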
